How to convert a text file with a specific Codepage to UTF-8 using Visual Studio .NET
A partner recently sent me a RESX (.NET Resource) text file in Hebrew for a project I'm working on. When I opened it, it looked really bad, as seen in the screenshot below.
I assumed this was a classic ANSI vs. UTF-8 problem, so I tried to do a quick conversion within Notepad2, but that resulted in nonsense East Asian characters from all over:
This means that the original file was in fact in ASCII, but just not using an English codepage. I've talked a little about codepages in the Hanselminutes Podcast on Internationalization in the past and regularly point folks to the Joel on Software article on Internationalization.
In a nutshell, a codepage is a "view" that someone can use to look
Ordinarily, I recommend that everyone (including Israelis) use Unicode/UTF-8 to represent their Hebrew characters, however, a lot of folks who write Hebrew use the Windows 1255 codepage for their encoding. (BTW, go here, scroll down, and bask in the glory of this reference site.)
So, how do I convert from one codepage to another, or in my case, from Hebrew 1255 to UTF8?
We start with Visual Studio.NET.
- From the File menu, select Open to bring up the dialog. Note that you'll have to select a file before the Open button is un-grayed.
- Select the tiny down-arrow on the right side of the Open button and select "Open With..."
- -5 points for obscure UI design here
- Next, select "Source Code (Text) Editor with Encoding." The With Encoding part is crucial, otherwise the next dialog won't appear.
- From the Encoding Dialog, select the source document's encoding - in my case, that's Hebrew 1255.
- -10 points for obscure UI design here.
- At this point the document should appear correctly in Visual Studio.NET. Now, reverse the process by selecting File|Save As and clicking the tiny down arrow on the Save button, and selecting both the desired Encoding and Line Endings.
Now you've successfully normalized your document to UTF-8, and it should be usable as a RESX file or whatever you like. Here's that same document loaded into Notepad2.
And it was Good.