On Wed, 9 May 2018 09:58:44 +1200, Peter Reutemann wrote:
The code wasn't the problem... The levels of management to go through
to approve changes to a core component of Windows, then the same for
applying for budget, approving budget (each time several rounds) ...
Well, guess what: flushed with that previous success, those daring
folks at Microsoft have gone even further:
The new and improved Notepad now has better Unicode support,
defaulting to saving files as UTF-8 _without_ a Byte Order Mark ...
You may or may not know, but “Byte Order Mark” is Unicode character
U+FEFF, while the character code with the bytes swapped, U+FFFE, is a
permanently reserved “noncharacter” that will never be assigned. The usefulness of this pair
dates back to the era when Unicode was only 16 bits, so what is now
“UTF-16” encoding was equivalent to fixed-length “UCS-2” encoding. You
may also know about the “big-endian” versus “little-endian” issue
between different processor architectures. So text encoded in UCS-2 or
UTF-16 is supposed to begin with a Byte Order Mark, and any program
reading that text can check that the first character is indeed U+FEFF.
If it sees U+FFFE instead, then it knows that the encoding comes from a
machine with the opposite endianness, and can automatically apply a
corresponding byte-swap adjustment to the text.
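
To make the check above concrete, here is a rough Python sketch of how a reader might use the first two bytes to pick the byte order. The function names are made up for illustration; only the codecs constants and the "utf-16-be" / "utf-16-le" codec names come from the standard library.

```python
import codecs


def detect_utf16_byte_order(data: bytes) -> str:
    """Guess the byte order of UTF-16 data from its leading BOM.

    Hypothetical helper for illustration only.
    """
    if data[:2] == codecs.BOM_UTF16_BE:  # bytes FE FF: U+FEFF stored big-endian
        return "big"
    if data[:2] == codecs.BOM_UTF16_LE:  # bytes FF FE: U+FEFF stored little-endian
        return "little"
    return "unknown"  # no BOM present, so the order cannot be inferred this way


def decode_utf16_with_bom(data: bytes) -> str:
    """Decode UTF-16 data, dropping the BOM and honouring the order it signals."""
    order = detect_utf16_byte_order(data)
    if order == "big":
        return data[2:].decode("utf-16-be")
    if order == "little":
        return data[2:].decode("utf-16-le")
    raise ValueError("no byte order mark found")
```

In practice Python's plain "utf-16" codec already does this BOM sniffing on decode, so the sketch only spells out what happens under the hood.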
Since UCS-2 is no longer sufficient to represent current versions of
Unicode, and UTF-16 is a pain to deal with, UTF-8 is considered a much
superior encoding. Furthermore, its definition is
endianness-independent, so software running on different architectures
always agrees about how the bytes are ordered. However, Microsoft in
their wisdom decided that their version of UTF-8 text should still begin
with a Byte Order Mark (UTF-8-encoded, of course), which is completely
pointless and ends up introducing a garbage character at the start when
read by non-Windows software.
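
For anyone who wants to see the garbage character for themselves, a small Python sketch follows. The helper name is hypothetical; "utf-8-sig" is the standard-library codec that skips a leading BOM if one is present.

```python
import codecs


def read_utf8_text(data: bytes, tolerate_bom: bool = True) -> str:
    """Decode UTF-8 bytes, optionally discarding a leading BOM.

    Hypothetical helper for illustration only.
    """
    # "utf-8-sig" strips one leading EF BB BF if it is there and otherwise
    # behaves like plain "utf-8".
    return data.decode("utf-8-sig" if tolerate_bom else "utf-8")


# What a BOM-prefixed file looks like on disk: EF BB BF, then the text.
with_bom = codecs.BOM_UTF8 + "hello".encode("utf-8")

naive = read_utf8_text(with_bom, tolerate_bom=False)  # keeps U+FEFF as the first character
tolerant = read_utf8_text(with_bom)                   # BOM removed
```

The naive decode ends up with six characters instead of five, the first being the stray U+FEFF, which is exactly what trips up tools that were never told to expect it.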