A partner recently sent me a RESX (.NET Resource) text file in Hebrew for a project I'm working on. When I opened it, it looked really bad, as seen in the screenshot below.
I assumed this was a classic ANSI vs. UTF-8 problem, so I tried to do a quick conversion within Notepad2, but that resulted in nonsense East Asian characters from all over:
This means that the original file was in fact in ASCII, but just not using an English codepage. I've talked a little about codepages in the Hanselminutes Podcast on Internationalization in the past and regularly point folks to the Joel on Software article on Internationalization.
In a nutshell, a codepage is a "view" that someone can use to look
Ordinarily, I recommend that everyone (including Israelis) use Unicode/UTF-8 to represent their Hebrew characters, however, a lot of folks who write Hebrew use the Windows 1255 codepage for their encoding. (BTW, go here, scroll down, and bask in the glory of this reference site.)
So, how do I convert from one codepage to another, or in my case, from Hebrew 1255 to UTF8?
We start with Visual Studio.NET.
Now you've successfully normalized your document to UTF-8, and it should be usable as a RESX file or whatever you like. Here's that same document loaded into Notepad2.
And it was Good.
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. I am a failed stand-up comic, a cornrower, and a book author.
Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.