Scott Hanselman

Converting from a String Representation of a Unicode Character back into a char

April 18, '05 Comments [4] Posted in Internationalization

Hopefully Michael Kaplan will step in here and explain some edge case or just a general comment like "that's totally wrong, Scott" - but until he does:

A fellow emailed me this question:

I want to convert a string representation of a Unicode character back into a 'char' in .NET C#.  Can you help?
i.e. "U+0041", which is hexadecimal for 65, which is ASCII for "A"
There's got to be a built in function(s) for this, and I just can't seem to find them?
To give you an idea, the pseudocode would be something like:
string s = "U+0041";
char c = new ?Unicode.Decoder.Decode?(s);
textBox1.Text = c.ToString();

Now, I have no idea why this gentleman would want to do this, but it's interesting enough. Here's what I came up with. I'm sure there's a better way.

//Just a reminder that you can use \u to escape Unicode in C#
char c = '\u0063';

//Here's how you'd go from a string to stuff like
// U+0053 U+0063 U+006f
string scott = "Scott and the letter c";
foreach(char s in scott)
	Console.Write("U+{0:x4} ",(int)s);

//Here's how I converted a string (assuming it starts with U+)
// containing the representation of a char
// back to a char
// Is there a built in, or cleaner way? Would this work in Chinese?
string maybeC = "U+0063";
int p = int.Parse(maybeC.Substring(2), System.Globalization.NumberStyles.HexNumber);
char fromString = (char)p; //Only valid for code points up to U+FFFF
Console.WriteLine(fromString);
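
On the "built in, or cleaner way" question: .NET 2.0 and later ship char.ConvertFromUtf32, which returns a string rather than a char, so it also covers code points above U+FFFF (some rarer Chinese characters live there). Here is a minimal sketch, assuming the input always starts with "U+". The CodePointText class and FromNotation method are hypothetical names made up for illustration.

```csharp
using System;
using System.Globalization;

static class CodePointText
{
    // Hypothetical helper: turns "U+0063" or "U+20000" into the matching string.
    public static string FromNotation(string notation)
    {
        if (notation == null || !notation.StartsWith("U+", StringComparison.OrdinalIgnoreCase))
            throw new FormatException("Expected a value such as U+0041.");

        int codePoint = int.Parse(notation.Substring(2),
            NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture);

        // One char for U+0000..U+FFFF, a surrogate pair above that.
        // Throws ArgumentOutOfRangeException for surrogate code points
        // and for values past U+10FFFF.
        return char.ConvertFromUtf32(codePoint);
    }
}
```

Something like textBox1.Text = CodePointText.FromNotation("U+0041"); would then cover the original request. Returning a string instead of a char is the design choice that matters here, because a single char cannot hold anything above U+FFFF.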

Now playing: Craig Armstrong - Ray's Theme

Monday, 18 April 2005 02:11:55 UTC
Well, you do have to know what code page. To convert using the default code page, you can use


to get back a byte array containing the non-Unicode character(s).
Monday, 18 April 2005 04:19:17 UTC
Ok, I get it, the question is confusing. He's not asking "how do I convert a string representation", he's asking "how do I convert a BYTE representation". Hex is just a string-y way of representing bytes (along with base64, etc.).


b = Text.Encoding.Unicode.GetBytes(s)
s = Text.Encoding.Unicode.GetString(b)

where s = string and b = array of bytes
Monday, 18 April 2005 06:03:35 UTC
Depending on the requirement, you should also be aware of System.Globalization.StringInfo.
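
As a rough illustration of where StringInfo helps: it counts text elements (what a reader would call characters) instead of UTF-16 code units. The sketch below is only an assumption about the kind of requirement meant here, and TextElements.Count is a hypothetical helper name.

```csharp
using System;
using System.Globalization;

static class TextElements
{
    // Hypothetical helper: counts user-perceived characters
    // rather than UTF-16 code units.
    public static int Count(string text)
    {
        return new StringInfo(text).LengthInTextElements;
    }
}
```

For "e\u0301" (an e followed by a combining acute accent), string.Length is 2 while TextElements.Count gives 1. A surrogate pair is likewise counted as one element.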
Tuesday, 19 April 2005 23:46:34 UTC
Since Michael didn't say it: that works for a UCS-2 string, but not for a UTF-16 string. Granted, few strings have UTF-16 bits, but isn't it more fun to make it completely right?

// Completely untested

// String to Unicode code points
string scott = "Scott and the letter c";
int highbits = 0;
foreach (char ch in scott)
{
	int i = (int) ch;
	if (i < 0xD800 || i > 0xDFFF)
		Console.Write("U+{0:x4} ", i);
	else if (i < 0xDC00) // ... Surrogate high
		highbits = i - 0xD800;
	else // ... Surrogate low
		// + binds tighter than <<, so the shift needs its own parentheses
		Console.Write("U+{0:x6} ", (highbits << 10) + (i - 0xDC00) + 0x10000);
}

// Unicode code point to string
string codePoint = "U+12345";
int ordinal = int.Parse(codePoint.Substring(2), System.Globalization.NumberStyles.HexNumber);
if (ordinal < 0x10000)
	Console.WriteLine((char) ordinal);
else
	// Split into a high and a low surrogate; parenthesize >> and & before adding
	Console.WriteLine("{0}{1}", (char) (((ordinal - 0x10000) >> 10) + 0xD800), (char) (((ordinal - 0x10000) & 0x3FF) + 0xDC00));
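
For comparison, the string-to-code-points direction can also lean on the framework instead of hand-rolled surrogate math. This is a sketch, assuming .NET 2.0 or later, where char.ConvertToUtf32 and char.IsHighSurrogate are available. CodePoints.ToNotation is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;

static class CodePoints
{
    // Hypothetical helper: formats each code point of text as U+XXXX.
    public static List<string> ToNotation(string text)
    {
        var result = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            // Reads a surrogate pair as one code point;
            // throws ArgumentException on an unpaired surrogate.
            int codePoint = char.ConvertToUtf32(text, i);
            result.Add("U+" + codePoint.ToString("X4"));

            if (char.IsHighSurrogate(text[i]))
                i++; // skip the low half of the pair just consumed
        }
        return result;
    }
}
```

Unlike the manual loop, this version fails loudly on malformed input such as a lone surrogate rather than silently carrying stale high bits forward.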

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.