Internationalized Regular Expressions
UPDATE: There's more on internationalized RegExes in this Stack Overflow question.
I was trying to make a regular expression for use in client-side JavaScript (using a PeterBlum Validator) that allowed a series of special characters:
-'.,&#@:?!()$\/
Plus letters and numbers and whitespace:
\w\d\s
However, I mistakenly assumed that \w meant truly "word characters." It doesn't. In JavaScript, \w means [A-Za-z0-9_]: ASCII letters, digits, and the underscore, and nothing else.
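A quick way to see this in a browser console or Node.js is to test a name against \w directly. The sample strings are just illustrations.

```javascript
// In JavaScript, \w is ASCII-only: it is shorthand for [A-Za-z0-9_]
// and does not depend on the browser's locale.
const wordOnly = /^\w+$/;

console.log(wordOnly.test("Jose"));     // true
console.log(wordOnly.test("José"));     // false: "é" is outside [A-Za-z0-9_]
console.log(wordOnly.test("user_42"));  // true: digits and "_" count as word characters

// Spelling out the ASCII class explicitly gives the same answers.
const asciiClass = /^[A-Za-z0-9_]+$/;
console.log(asciiClass.test("José"));   // false
```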
That sucks. What about José, when he wants to put his First Name into a form?
Well, I could do a RegEx that denies specific characters and allows all others, but I really just wanted to support Spanish, French, English, German, and any language that uses the general Latin Character Set.
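For contrast, a deny-list version might look something like the sketch below. The specific characters it forbids are a hypothetical choice for illustration, not a recommended set. It lets José through, but it also lets through every other script, which is broader than a Latin-only goal.

```javascript
// Hypothetical deny-list approach: forbid a few characters, allow everything else.
// The forbidden set (< > { } [ ] \ ; % and backtick) is an illustrative assumption.
const denyList = /^[^<>{}\[\]\\;%`]+$/;

console.log(denyList.test("José"));     // true
console.log(denyList.test("Дмитрий"));  // true: non-Latin scripts pass as well
console.log(denyList.test("<b>"));      // false: "<" is denied
```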
So, here's what I have.
^[
ÀÈÌÒÙ àèìòù ÁÉÍÓÚ Ý áéíóúý
ÂÊÎÔÛ âêîôû ÃÑÕ ãñõ ÄËÏÖÜŸ
äëïöüÿ ¡¿çÇŒœ ßØøÅå ÆæÞþ
Ðð ""\w\d\s-'.,&#@:?!()$\/
]+$
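Here's a sketch of wiring that class into client-side JavaScript, assuming the spaces and line breaks are only layout. It keeps the characters listed above, writes the lowercase partner of Ÿ as ÿ, pairs ø with an uppercase Ø, and escapes the hyphen so it can never be read as a range. The sample names are made up for illustration.

```javascript
// The whitelist, laid out with spaces and line breaks for readability.
// String.raw keeps the backslashes (\w, \d, \s, \-, \/) intact.
const spacedClass = String.raw`
  ÀÈÌÒÙ àèìòù ÁÉÍÓÚ Ý áéíóúý
  ÂÊÎÔÛ âêîôû ÃÑÕ ãñõ ÄËÏÖÜŸ
  äëïöüÿ ¡¿çÇŒœ ßØøÅå ÆæÞþ
  Ðð ""\w\d\s\-'.,&#@:?!()$\/
`;

// "Ignore the whitespace": strip the layout spaces and newlines, then compile.
// The escaped hyphen (\-) is always a literal "-", never a range operator.
const allowedName = new RegExp("^[" + spacedClass.replace(/\s+/g, "") + "]+$");

console.log(allowedName.test("José"));                      // true
console.log(allowedName.test("François Müller-Ødegaard"));  // true
console.log(allowedName.test("Łukasz"));                    // false: "Ł" is not in the list
console.log(allowedName.test("<script>"));                  // false: "<" and ">" are not allowed
```

Since \w already covers 0-9 in JavaScript, the \d is redundant but harmless; all of the accented-letter coverage comes from the explicit list.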
Did I miss anything? (Ignore the spaces and line breaks in this post's RegEx; they're only there for readability.)
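It's lame that \w on the client side doesn't follow your browser's locale; in JavaScript it is always plain ASCII. On an ASP.NET server, .NET's \w matches Unicode letters by default, so the same RegEx can accept "José" on the server and reject him in the browser. This makes it difficult for your RegExes to have parity between the client and server.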
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.