
Hopefully Michael Kaplan will step in here and explain some edge case, or just leave a general comment like "that's totally wrong, Scott," but until he does:

A fellow emailed me this question:

I want to convert a string representation of a Unicode character back into a 'char' in .NET C#.  Can you help?

i.e. "U+0041", which is hexadecimal for 65, which is ASCII for "A"

There's got to be a built-in function (or functions) for this, but I just can't seem to find it.

To give you an idea, the pseudocode would be something like:

string s = "U+0041";
char c = new ?Unicode.Decoder.Decode?(s);
textBox1.Text = c.ToString();

Now, I have no idea why this gentleman would want to do this, but it's interesting enough. Here's what I came up with. I'm sure there's a better way.

using System;
using System.Globalization;

//Just a reminder that you can use \u to escape Unicode in C#
char c = '\u0063';
Console.WriteLine(c.ToString());

//Here's how you'd go from a string to stuff like
// U+0053 U+0063 U+006f
string scott = "Scott and the letter c";
foreach(char s in scott)
{
	Console.Write("U+{0:x4} ",(int)s);
}
Console.WriteLine();

//Here's how to convert a string (assuming it starts with U+)
// containing the representation of a char
// back to a char
// Is there a built-in, or cleaner way? Would this work in Chinese?
// Note: the (char) cast only covers code points up to U+FFFF (the BMP)
string maybeC = "U+0063";
int p = int.Parse(maybeC.Substring(2), NumberStyles.HexNumber);
Console.WriteLine((char)p);
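For what it's worth, later versions of the framework (.NET 2.0 onward) do have a built-in piece for this: char.ConvertFromUtf32 takes an int code point and returns a string, which is one char for anything in the BMP (including most everyday Chinese characters) and a two-char surrogate pair above U+FFFF. Here's a minimal sketch, assuming the input is always "U+" followed by hex digits; FromUPlusNotation is just a made-up helper name.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: parses "U+XXXX" (assumed format) into the string for that code point.
string FromUPlusNotation(string text)
{
    if (text == null || !text.StartsWith("U+", StringComparison.OrdinalIgnoreCase))
        throw new FormatException("Expected text of the form U+XXXX.");

    // AllowHexSpecifier: the digits are hex, with no "0x" prefix and no whitespace.
    int codePoint = int.Parse(text.Substring(2), NumberStyles.AllowHexSpecifier,
                              CultureInfo.InvariantCulture);

    // Returns 1 char for U+0000..U+FFFF and 2 chars (a surrogate pair) above that.
    // Throws ArgumentOutOfRangeException above U+10FFFF or for lone surrogates (U+D800..U+DFFF).
    return char.ConvertFromUtf32(codePoint);
}

Console.WriteLine(FromUPlusNotation("U+0063"));             // c
Console.WriteLine(FromUPlusNotation("U+4E2D") == "\u4E2D"); // True (a CJK ideograph in the BMP)
Console.WriteLine(FromUPlusNotation("U+20000").Length);     // 2 (one code point, two UTF-16 chars)
```

Returning a string rather than a char is the design point: a single .NET char simply can't hold a code point above U+FFFF.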

Now playing: Craig Armstrong - Ray's Theme



Sunday, April 17, 2005 6:11:55 PM (Pacific Standard Time, UTC-08:00)
Well, you do have to know which code page. To convert using the default code page, you can use

Encoding.Default.GetBytes(stUnicodeString)

to get back a byte array containing the non-Unicode character(s).
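To see why the code page matters, here's a small sketch that encodes the same string three ways. One caveat worth knowing: on .NET Core and .NET 5+, Encoding.Default is always UTF-8 rather than the Windows ANSI code page, so naming the encoding explicitly is the safer habit. The sample text "café" is just an illustration.

```csharp
using System;
using System.Text;

string text = "caf\u00E9"; // "café"

// ASCII has no mapping for 'é', so the default replacement fallback substitutes '?' (0x3F).
byte[] ascii = Encoding.ASCII.GetBytes(text);

// ISO-8859-1 (Latin-1) stores 'é' as the single byte 0xE9.
byte[] latin1 = Encoding.GetEncoding("iso-8859-1").GetBytes(text);

// UTF-8 stores 'é' as the two bytes 0xC3 0xA9.
byte[] utf8 = Encoding.UTF8.GetBytes(text);

Console.WriteLine(BitConverter.ToString(ascii));  // 63-61-66-3F
Console.WriteLine(BitConverter.ToString(latin1)); // 63-61-66-E9
Console.WriteLine(BitConverter.ToString(utf8));   // 63-61-66-C3-A9
```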
Sunday, April 17, 2005 8:19:17 PM (Pacific Standard Time, UTC-08:00)
OK, I get it: the question is confusing. He's not asking "how do I convert a string representation?"; he's asking "how do I convert a BYTE representation?" Hex is just a string-y way of representing bytes (as is Base64, etc.).

So, in C#:

byte[] b = System.Text.Encoding.Unicode.GetBytes(s);
s = System.Text.Encoding.Unicode.GetString(b);

where s is a string and b is an array of bytes.
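Taking that "it's really bytes" view literally, a sketch could read "U+0041" as one 16-bit UTF-16 code unit, split it into two bytes, and let an Encoding decode them. This assumes exactly one BMP code unit (at most four hex digits; ushort.Parse throws OverflowException for anything larger). Note the byte order: Encoding.Unicode is little-endian, so the high-byte-first layout that matches the written hex needs Encoding.BigEndianUnicode.

```csharp
using System;
using System.Globalization;
using System.Text;

string hex = "U+0041";
ushort codeUnit = ushort.Parse(hex.Substring(2), NumberStyles.AllowHexSpecifier,
                               CultureInfo.InvariantCulture);

// High byte first, matching the order the hex digits are written in.
byte[] bytes = { (byte)(codeUnit >> 8), (byte)(codeUnit & 0xFF) };
string decoded = Encoding.BigEndianUnicode.GetString(bytes);
Console.WriteLine(decoded); // A

// Encoding.Unicode is little-endian UTF-16, so "A" comes back as 41-00, not 00-41.
Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetBytes(decoded))); // 41-00
```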
Sunday, April 17, 2005 10:03:35 PM (Pacific Standard Time, UTC-08:00)
Depending on the requirement, you should also be aware of System.Globalization.StringInfo.
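As a quick illustration of where StringInfo helps: string.Length counts UTF-16 chars, while StringInfo counts text elements, so a base letter plus a combining accent, or a surrogate pair, counts as one. A minimal sketch; the sample string is arbitrary.

```csharp
using System;
using System.Globalization;

// "A", then "e" + COMBINING ACUTE ACCENT (U+0301), then U+20000 (a CJK ideograph above U+FFFF).
string s = "A" + "e\u0301" + "\U00020000";

Console.WriteLine(s.Length);                               // 5 UTF-16 chars
Console.WriteLine(new StringInfo(s).LengthInTextElements); // 3 text elements

// Walk the string one text element at a time instead of one char at a time.
TextElementEnumerator it = StringInfo.GetTextElementEnumerator(s);
while (it.MoveNext())
{
    string element = it.GetTextElement();
    Console.WriteLine("{0} char(s) starting at index {1}", element.Length, it.ElementIndex);
}
```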
Tuesday, April 19, 2005 3:46:34 PM (Pacific Standard Time, UTC-08:00)
Since Michael didn't say it: that works for a UCS-2 string, but not for a UTF-16 string. Granted, few strings contain surrogate pairs (characters above U+FFFF), but isn't it more fun to make it completely right?

// Completely untested

// String to Unicode code points
string scott = "Scott and the letter c";
int highbits = 0;
foreach (char ch in scott)
{
	int i = (int) ch;
	if (i < 0xD800 || i > 0xDFFF)
		Console.Write("U+{0:x4} ", i);
	else if (i < 0xDC00) // ... Surrogate high
		highbits = i - 0xD800;
	else // ... Surrogate low; parentheses needed because << binds more loosely than +
		Console.Write("U+{0:x4} ", (highbits << 10) + (i - 0xDC00) + 0x10000);
}

// Unicode code point to string
string codePoint = "U+12345";
int ordinal = int.Parse(codePoint.Substring(2), System.Globalization.NumberStyles.HexNumber);
if (ordinal < 0x10000)
	Console.WriteLine((char) ordinal);
else
	Console.WriteLine(new string(new char[] {
		(char) (((ordinal - 0x10000) >> 10) + 0xD800),       // high surrogate
		(char) (((ordinal - 0x10000) & 0x3FF) + 0xDC00) })); // low surrogate
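The surrogate arithmetic above can also be left to the framework: char.ConvertToUtf32(string, int) and char.IsSurrogatePair(string, int) (both .NET 2.0 and later) combine a pair into one code point. A minimal sketch; ToCodePoints is a hypothetical helper name, and unlike the hand-rolled loop it throws on an unpaired surrogate instead of silently skipping it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: lists each code point in the string in U+ notation.
List<string> ToCodePoints(string text)
{
    var result = new List<string>();
    for (int i = 0; i < text.Length; i++)
    {
        // Reads one char, or two if text[i] starts a valid surrogate pair.
        // Throws ArgumentException on an unpaired surrogate.
        int codePoint = char.ConvertToUtf32(text, i);
        result.Add(string.Format("U+{0:x4}", codePoint));

        if (char.IsSurrogatePair(text, i))
            i++; // skip the low surrogate that was just consumed
    }
    return result;
}

Console.WriteLine(string.Join(" ", ToCodePoints("Sc\U00012345"))); // U+0053 U+0063 U+12345
```

On .NET Core 3.0 and later, string.EnumerateRunes() yields System.Text.Rune values that do the same job.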