Step One: Open up Notepad
Step Two: Type "this app can break" (without the quotes)
Step Three: Save the document
Step Four: Close Notepad
Step Five: Double-click the text file you just saved to open it back up in Notepad.
You may be wondering why it behaves this way. Notepad uses IsTextUnicode to try to determine whether the text is Unicode or not, and that call is guessing wrong. That Win32 API is essentially broken (though it can be useful in some circumstances), and it's claiming that the text is really Unicode data. Basically, it boils down to the bytes stored in the file. If you look at them in a hex editor, you'll notice that each pair of bytes can be read as a legal CJK ideograph. So it is Unicode text. And ASCII text. At the same time! Just depends on how you look at it. ;-)
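If you'd rather not dig out a hex editor, here is a minimal Python sketch of the same observation. It only reinterprets the phrase's ASCII bytes as UTF-16LE code units; it does not call IsTextUnicode or reproduce whatever Notepad does internally, and the helper names are made up for illustration.

```python
# Illustrative sketch: reinterpret the ASCII bytes of the phrase as UTF-16LE.
# This is NOT the IsTextUnicode heuristic, just the "same bytes, two readings" idea.

PHRASE = "this app can break"


def reinterpret_as_utf16le(text: str) -> str:
    """Encode text as ASCII, then decode the very same bytes as UTF-16LE."""
    raw = text.encode("ascii")
    if len(raw) % 2:
        raise ValueError("UTF-16 needs an even number of bytes")
    return raw.decode("utf-16-le")


def is_cjk_unified(ch: str) -> bool:
    """True if ch falls in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return 0x4E00 <= ord(ch) <= 0x9FFF


decoded = reinterpret_as_utf16le(PHRASE)
for ch in decoded:
    # Print each resulting code point and whether it lands in the CJK block.
    print(f"U+{ord(ch):04X}", is_cjk_unified(ch))
```

The 18 ASCII bytes pair up into 9 code units, and every one of them lands in the CJK Unified Ideographs block, which is why the file can pass for either kind of text.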
I'm obviously not at my most inventive tonight. When can it be useful?
Some of the cases aren't utterly broken. For example, passing in the IS_TEXT_UNICODE_SIGNATURE flag ("The text contains the Unicode byte-order mark (BOM) 0xFEFF as its first character.") strikes me as one that's safe to use. However, since the spec for Unicode is constantly changing, most of the other flags seem to be tests that probably won't always be safe.
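To show why the signature test is the safe one, here is a rough Python sketch of a BOM-only check. It is an approximation of the idea behind IS_TEXT_UNICODE_SIGNATURE, not the Win32 call itself, and the function name is hypothetical.

```python
# Illustrative sketch: detect UTF-16 only when an explicit byte-order mark is present.
# This mimics the idea of a signature check; it does not call the Win32 API.
import codecs
from typing import Optional


def utf16_signature(data: bytes) -> Optional[str]:
    """Return the codec name implied by a leading UTF-16 BOM, or None if absent."""
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    return None  # no signature: make no claim about the encoding


print(utf16_signature(b"this app can break"))
print(utf16_signature(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")))
```

Because this looks only at an explicit marker and never guesses from the content, plain ASCII like the phrase above is simply reported as having no signature.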
How exactly does that phrase cause Notepad to break? Is it the first two characters, or...? If "th" caused problems, I'd imagine it would have been fixed by now.
This reminds me of those Google-bombs...
Did you discover this yourself, or is it something you saw somewhere else? If you found this, I must say that I am most impressed, and be prepared for a flood of traffic as this gets around.
@Mike -- I mentioned the cause in the post. It's the fact that the binary representation is both legal ASCII and Unicode at the same time.
@Phil -- actually, this is a pretty well-known issue with Notepad that's been around since NT 4 (I believe, though it may have been since NT 3.51 too). There are other ways to cause the same issue, such as "Bush hid the facts" (sans the quotes).
Is it Unicode as in UTF-16 (I mean UTF-16LE) or UTF-8?
I believe UTF-16LE (or BE), but I could be wrong.
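For what it's worth, a quick way to compare the candidates is to decode the same bytes three ways. This is only a sketch of the byte arithmetic under the assumption that the file holds the plain 18 ASCII bytes; it says nothing definitive about which path Notepad takes.

```python
# Illustrative sketch: decode the same 18 bytes as UTF-16LE, UTF-16BE, and UTF-8.
RAW = b"this app can break"


def count_cjk(text: str) -> int:
    """Count characters in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return sum(1 for ch in text if 0x4E00 <= ord(ch) <= 0x9FFF)


as_le = RAW.decode("utf-16-le")    # low byte first
as_be = RAW.decode("utf-16-be")    # high byte first
as_utf8 = RAW.decode("utf-8")      # ASCII is valid UTF-8, so this is the original text

for name, text in (("utf-16-le", as_le), ("utf-16-be", as_be)):
    points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(name, count_cjk(text), "of", len(text), "in CJK block:", points)
print("utf-8", repr(as_utf8))
```

Read little-endian, all nine code units land in the CJK block. Read big-endian, each pair that starts with a space becomes a code point around U+2061 to U+2063 instead, so only six of nine are ideographs. That fits the UTF-16LE guess, and as UTF-8 the bytes are just the original ASCII sentence.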
This is one of those neat tricks like when you point your two fingers in front of your eyes, and if you do it just right, it looks like a floating hot dog.
And I thought all these years that the bugs I was squashing were my own when they're really Microsoft's ;)