Converting from a String Representation of a Unicode Character back into a char

Posted in Internationalization.

Hopefully Michael Kaplan will step in here and explain some edge case or just a general comment like "that's totally wrong, Scott" - but until he does:

A fellow emailed me this question:

I want to convert a string representation of a Unicode character back into a 'char' in .NET C#.  Can you help?
 
i.e. "U+0041", which is hexadecimal for 65, which is ASCII for "A"
 
There's got to be a built in function(s) for this, and I just can't seem to find them?
 
To give you an idea, the pseudocode would be something like:
 
string s = "U+0041";
char c = new ?Unicode.Decoder.Decode?(s);
textBox1.Text = c.ToString();

Now, I have no idea why this gentleman would want to do this, but it's interesting enough. Here's what I came up with. I'm sure there's a better way.

//Just a reminder that you can use \u to escape Unicode in C#
char c = '\u0063';
Console.WriteLine(c.ToString());

//Here's how you'd go from a string to stuff like
// U+0053 U+0063 U+006f
string scott = "Scott and the letter c";
foreach(char s in scott)
{
	Console.Write("U+{0:x4} ",(int)s);
}
		
//Here's how I converted a string (assuming it starts with U+)
// containing the representation of a char
// back to a char
// Is there a built in, or cleaner way? Would this work in Chinese?
string maybeC = "U+0063";
int p = int.Parse(maybeC.Substring(2), System.Globalization.NumberStyles.HexNumber);
Console.WriteLine((char)p);
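
On the "built in, or cleaner way" and "Chinese" questions, here is a hedged sketch rather than a definitive answer. It assumes .NET 2.0 or later, where char.ConvertFromUtf32 exists. The cast to char above is fine for anything up to U+FFFF, which covers most Chinese text, but code points beyond that (such as U+20000 in CJK Extension B) need two chars, so the result has to be a string. The UnicodeNotation class and its Parse method are names made up for this illustration, not framework APIs.

```csharp
using System;
using System.Globalization;

public static class UnicodeNotation
{
    // Hypothetical helper (not a framework API): turns "U+0063" into "c".
    // Returns a string because code points above U+FFFF need two UTF-16
    // chars (a surrogate pair) and cannot fit in a single char.
    public static string Parse(string notation)
    {
        if (notation == null || !notation.StartsWith("U+", StringComparison.OrdinalIgnoreCase))
            throw new FormatException("Expected a value such as U+0041.");

        int codePoint = int.Parse(
            notation.Substring(2),
            NumberStyles.AllowHexSpecifier,
            CultureInfo.InvariantCulture);

        // char.ConvertFromUtf32 throws ArgumentOutOfRangeException for values
        // above 0x10FFFF and for lone surrogates (U+D800 through U+DFFF).
        return char.ConvertFromUtf32(codePoint);
    }

    public static void Main()
    {
        Console.WriteLine(Parse("U+0063"));         // c
        Console.WriteLine(Parse("U+4E2D").Length);  // 1 (a Chinese character inside the BMP)
        Console.WriteLine(Parse("U+20000").Length); // 2 (a surrogate pair)
    }
}
```

Returning a string instead of a char is the design choice that matters here: it keeps the caller from having to know whether the input was inside or outside the Basic Multilingual Plane.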


Sunday, April 17, 2005 6:11:55 PM (Pacific Standard Time, UTC-08:00)
Well, you do have to know what code page. To convert using the default code page, you can use

Encoding.Default.GetBytes(stUnicodeString)

to get back a byte array containing the non-Unicode character(s).

Sunday, April 17, 2005 8:19:17 PM (Pacific Standard Time, UTC-08:00)
OK, I get it, the question is confusing. He's not asking "how do I convert a string representation", he's asking "how do I convert a BYTE representation". Hex is just a string-y way of representing bytes (along with base64, etc).

So,

b = Text.Encoding.Unicode.GetBytes(s)
s = Text.Encoding.Unicode.GetString(b)

where s = string and b = array of bytes

Sunday, April 17, 2005 10:03:35 PM (Pacific Standard Time, UTC-08:00)
Depending on the requirement, you should also be aware of System.Globalization.StringInfo.
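
To make the StringInfo suggestion concrete, here is a small sketch, assuming .NET 2.0 or later for the StringInfo constructor and char.ConvertToUtf32. StringInfo walks a string by text element, so a surrogate pair comes back as one unit instead of two chars. The TextElements class and FirstCodePoints method are hypothetical names for this example. A text element can also be a base character plus combining marks, in which case this sketch reports only the first code point.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class TextElements
{
    // Hypothetical helper: lists the first code point of each text element
    // in s, formatted as U+XXXX.
    public static List<string> FirstCodePoints(string s)
    {
        List<string> result = new List<string>();
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(s);
        while (e.MoveNext())
        {
            string element = e.GetTextElement();
            // ConvertToUtf32 joins a surrogate pair into a single value.
            result.Add(string.Format("U+{0:X4}", char.ConvertToUtf32(element, 0)));
        }
        return result;
    }

    public static void Main()
    {
        // "a", then U+1D11E (musical G clef, outside the BMP), then "b".
        string s = "a" + char.ConvertFromUtf32(0x1D11E) + "b";

        Console.WriteLine(s.Length);                               // 4 UTF-16 chars
        Console.WriteLine(new StringInfo(s).LengthInTextElements); // 3 text elements
        Console.WriteLine(string.Join(" ", FirstCodePoints(s).ToArray()));
    }
}
```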

Tuesday, April 19, 2005 3:46:34 PM (Pacific Standard Time, UTC-08:00)
Since Michael didn't say it: that works for a UCS-2 string, but not for a UTF-16 string. Granted, few strings contain surrogate pairs (the UTF-16 encoding of code points above U+FFFF), but isn't it more fun to make it completely right?

// Completely untested

// String to Unicode code points
string scott = "Scott and the letter c";
int highbits = 0;
foreach (char ch in scott)
{
    int i = (int) ch;
    if (i < 0xD800 || i > 0xDFFF)
        Console.Write("U+{0:x4} ", i);
    else if (i < 0xDC00) // ... Surrogate high
        highbits = i - 0xD800;
    else // ... Surrogate low; parentheses needed because + binds tighter than <<
        Console.Write("U+{0:x6} ", (highbits << 10) + (i - 0xDC00) + 0x10000);
}

// Unicode code point to string
string codePoint = "U+12345";
int ordinal = int.Parse(codePoint.Substring(2), System.Globalization.NumberStyles.HexNumber);
if (ordinal < 0x10000)
    Console.WriteLine((char) ordinal);
else
{
    // Split into a surrogate pair; + binds tighter than >> and &, hence the parentheses
    char high = (char) (((ordinal - 0x10000) >> 10) + 0xD800);
    char low = (char) (((ordinal - 0x10000) & 0x3FF) + 0xDC00);
    Console.WriteLine(new string(new char[] { high, low }));
}
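
For comparison with the hand-rolled surrogate arithmetic above, here is a sketch that leans on framework helpers instead, assuming .NET 2.0 or later where char.ConvertToUtf32 and char.IsHighSurrogate are available. The CodePoints class and ToNotation method are hypothetical names for this example.

```csharp
using System;
using System.Text;

public static class CodePoints
{
    // Hypothetical helper: formats every code point in s as U+XXXX,
    // separated by spaces.
    public static string ToNotation(string s)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.Length; i++)
        {
            // ConvertToUtf32 combines a high/low surrogate pair into one
            // value and throws ArgumentException on an unpaired surrogate.
            int codePoint = char.ConvertToUtf32(s, i);
            if (char.IsHighSurrogate(s[i]))
                i++; // skip the low surrogate that was just consumed

            if (sb.Length > 0) sb.Append(' ');
            sb.AppendFormat("U+{0:X4}", codePoint);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ToNotation("Sc"));                           // U+0053 U+0063
        Console.WriteLine(ToNotation(char.ConvertFromUtf32(0x12345))); // U+12345
    }
}
```

Letting the framework do the pairing avoids the operator-precedence traps that shift and mask arithmetic invites, at the cost of an exception on malformed input.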