Scott Hanselman

Using Crowdsourcing for Expanding Localization of Products

November 13, '08 Comments [33] Posted in ASP.NET | Internationalization | MSDN | Programming | Tools | Windows Client

UPDATE: I wanted to add that these translation APIs are all part of Microsoft Translator services and are available for developers to use and build their own localized communities. The documentation is up on MSDN for AJAX/JSON, SOAP or POX (Plain old XML) APIs you can put in your apps today. Also, be sure to check out the Microsoft Translator Blog for more technical details on the V2 APIs and translator widget.

Not everyone in the world speaks English. Such a silly thing to say, but if you live in an English-speaking country it's easy to forget that many (most?) people in the world would prefer to do their work in the language of their choice.

Microsoft ships documentation in Visual Studio that is human-translated (a huge effort) into 9 major world languages. That's millions and millions of words * 9 languages. How can we cover more languages? How can we make documentation easier for folks who are trying to learn about our products and don't speak English fluently? How can we make English interfaces easier to use for non-English speakers who want to learn English?

Last month, I spoke to members of the internationalization/globalization team in DevDiv (Developer Division) about some of the little-known stuff they are doing. I think it deserves more attention, as there are some pretty innovative things being done. Some are experimental, but there's hope to expand them if they succeed.

MSDN uses Machine Translation and Crowdsourcing for Documentation

Doing a lot of work with a few people is hard. Doing a lot of work with a lot of people is confusing and expensive. However, doing a little bit of work with a LOT of interested people can be useful, cheap and fun if you "crowd-source" rather than outsource. Check out the screenshot below or visit the Brazilian MSDN site and check out the Translation Wiki v2.

BrazilianMSDN

You'll see there's the English MSDN documentation on the left, and Brazilian Portuguese on the right.

 LadoALado

Make sure to select "side-by-side" or "Lado a Lado." If you hover over a sentence on the Portuguese side, a small Edit button will appear.

image

Click Edit, and you can suggest a better translation, and they'll go into a queue for community moderators to approve. Notice also that under "Other Suggestions" you'll see existing suggested translations that are in the queue for moderation.
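
To make the suggest-then-moderate flow above concrete, here is a minimal sketch in Python. It is not the Translation Wiki's actual implementation: the class names, fields, and methods (`Sentence`, `Suggestion`, `suggest`, `pending`, `approve`) are all hypothetical, chosen only to illustrate how a machine-translated sentence might collect community suggestions that wait in a queue until a moderator approves one.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    PENDING = "pending"      # waiting in the moderation queue
    APPROVED = "approved"    # accepted by a community moderator


@dataclass
class Suggestion:
    author: str
    text: str
    status: Status = Status.PENDING


@dataclass
class Sentence:
    source: str        # the English original (left-hand side)
    translation: str   # the text readers currently see (starts as MT output)
    suggestions: List[Suggestion] = field(default_factory=list)

    def suggest(self, author: str, text: str) -> Suggestion:
        """Queue a community suggestion; the visible text does not change yet."""
        suggestion = Suggestion(author, text)
        self.suggestions.append(suggestion)
        return suggestion

    def pending(self) -> List[Suggestion]:
        """Roughly the 'Other Suggestions' list: everything still awaiting review."""
        return [s for s in self.suggestions if s.status is Status.PENDING]

    def approve(self, suggestion: Suggestion) -> None:
        """A moderator accepts a suggestion, which replaces the visible text."""
        suggestion.status = Status.APPROVED
        self.translation = suggestion.text
```

In this sketch, calling `suggest` twice leaves two entries in `pending()` and the displayed translation untouched; only `approve` swaps the visible text, which mirrors the idea that edits go through community moderators before readers see them.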

image

The initial Portuguese text comes from the Machine Translation team. For some reason, Portuguese is the best language that the Machine Translation team understands.

The text on the site is roughly 80% MT (Machine Translated) and 20% human-translated via this technique, and growing. There's a goal to include more languages for the next version of Visual Studio, including possibly Arabic, Czech, Polish and Turkish, although things are still a little up in the air.

If you know a Brazilian developer, spread the word about this project and encourage them to make edits to the Brazilian MSDN site and check out the Translation Wiki v2.

Big thanks to our community partners: a group of 30 CS students, partly from the team of Prof. Hirata and Prof. Forster of Instituto Tecnologico de Aeronautica and the team of Prof. Simone Barbosa from Pontifícia Universidade Católica who post-edited 1.8 million words of MT'ed content; the Brazilian Terminologist who managed the glossary project with our MVPs; and finally the Academic Evangelist Team in DPE in Brazil who gave us their support throughout the project.

It'll be interesting to see how far this project goes and what other languages can benefit from it.

Captions Language Interface Pack (CLIP) - includes 9 more partial language translations for Visual Studio

Here's a description of the CLIP from a launching page:

"The Microsoft Captions Language Interface Pack (CLIP) is a simple language translation solution that uses tooltip captions to display results. Use CLIP as a language aid, to see translations in your own dialect, update results in your own native tongue or use it as a learning tool."

This is pretty clever. It's a background application that will show balloon tooltip help in your language while you work in the English version of Visual Studio. For example, in the screenshot below, I'm hovering my mouse over Start Debugging, and the Arabic CLIP pops up with a human translation of that menu item.

clip

It'll even help with other applications within Windows if it thinks it's got a decent translation, but for now, it is focused on correct translation for common Visual Studio options.

Even better, you can add translations of your own. In future versions, there's talk about setting up sharing (I figure you can hack it today, though, unsupported, by sharing the language database).

image

Visual Studio CLIP is available in these languages so far, all created with community and student help!

In addition to the CLIP, there's also the ability to do a Language Pack for the Visual Studio interface itself, as exemplified by the Brazilian Visual Studio Express Language Pack for SP1 that does about a 70% translation of VS into Portuguese. There's talk to do more of these also. That should make Carlos Quintero happy!

There are a lot of cool possibilities for all this technology, expanding MSDN and VS to as many languages as possible!

If you think this kind of thinking is pretty cool, leave a comment or blog about it and maybe we'll be heard by *ahem* the boss when he next (soon) reviews plans for this kind of community involvement. ;)

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. I am a failed stand-up comic, a cornrower, and a book author.

Friday, November 14, 2008 12:25:49 AM UTC
I may be able to co-ordinate a Swahili language version. I daresay that it is way down the list of target markets, but hey, humanitarian motives: probably fits in with the Bill & Melinda Gates Foundation's vision. Unlock the creativity of a large area of the world. Think of all those Swahili speakers that we see in the camps in Goma on the news every night. Better to be playing with MVC than being shot by Rwandan troops.
Hexagon Global
Friday, November 14, 2008 1:54:44 AM UTC
Caution, rant coming up...
English is not my native language, but still I believe this effort to be somewhat misguided. I generally prefer to read documentation and programming articles in English, because it has become a uniform language that most, if not all, programmers out there can understand and communicate in. Most articles, knowledge bases, books and so on are in English, so if you want to read up on something in depth, you need to have at least basic reading skills in English. Translating tooltips inside Visual Studio could end up causing confusion, at least for new developers, as what they see on-screen might not match up with what the tutorial/book they are following shows.

And don't even get me started on the translation of .Net exceptions. Because of policy at work, I run the Norwegian version of Vista, and .Net produces exception error messages in Norwegian. First, they are poorly translated (perhaps by the aforementioned machine-translation team?), so they make absolutely no sense, and second, more importantly, there are ABSOLUTELY ZERO results when you try and google for them. With XP, we could at least install an English version of the .Net framework, but given that .Net is bundled with Vista, there's no way that can be done any more. I am seriously considering down-grading to XP so that I can get English error messages back. And if scottgu reads this, can I please have my English exception messages back??? Thanks.

End rant. I truly do appreciate that Microsoft is trying to make an effort, and I believe that MSDN has had a vast improvement in usability the past year or so. And the fact that MSFT are allowing community contribution is absolutely fantastic, but at least to me, the translation effort just seems a bit unnecessary.
Erling Paulsen
Friday, November 14, 2008 5:58:47 AM UTC
I absolutely agree with Erling Paulsen. If you don't know English, you're not a programmer!
Friday, November 14, 2008 11:59:51 AM UTC
The worst part is there is no application or add-on for Persian language support. I have no problem with this but some of my friends who're using Persian as their main language, are uncomfortable.

Does Microsoft have any plans to support the Persian language?
Friday, November 14, 2008 1:19:01 PM UTC
I love this tooling. Having worked on a number of multi-language projects - translation usually involves some awful tool you've thrown together internally just to "get the translation done".
Friday, November 14, 2008 2:08:20 PM UTC
I tend to agree with Erling as well;

Developers might as well get used to learning new languages (even if they aren't programming languages).

I found that gaining a more profound knowledge of the English language can also lead to a more solid understanding of certain concepts of a programming language. You'd be surprised how many developers use keywords for which they don't even know the English meaning (e.g. 'yield' in C#).
In most cases this won't cause any serious harm but I think you are a better dev if you have at least basic english reading skills.
Paul van de Loo
Friday, November 14, 2008 4:20:06 PM UTC
Nice article. You should consider it yourself when you are making demos... I became a little angry when I saw you used a CHAR in your PDC demo. I wonder how many people sent characters that SQL Server actually cannot store in that field.
Friday, November 14, 2008 6:35:34 PM UTC
"A programmer who doesn't at least understand English is not a programmer" that's an outrageous statement. That's like saying "a musician who is deaf is not a musician" patently untrue and ridiculous. plus pretty offensive to millions of programmers.
Friday, November 14, 2008 6:37:28 PM UTC
Most programming languages use keywords from the English language; but that's a tiny subset of the English language. To say "if you don't know English you're not a programmer" is a very ignorant thing to say.
Friday, November 14, 2008 6:42:52 PM UTC
I'm from Mexico, and I must say that even in the top private universities (where I was fortunate enough to study) most of the engineering students either don't speak English, or are simply lazy enough to avoid reading it. And the problem in the public universities (where most of the programmers in my country study) is even worse. I've been in a lot of consulting projects where the use of English really is a barrier, preventing other programmers from using the IDE to its fullest capacity, as well as from getting help from resources on the net.

I believe that in an ideal world every programmer should speak and read enough English to be able to work, learn and interact. However (and especially in Latin America) this is still a long-term goal. I really applaud the effort being put in by Microsoft and other companies to make resources more available for everyone.

Like Hexagon says, there is a lot of untapped talent out there, trapped by the lack of understanding of the English language. Let's make it easier for everyone, we'll get greater software in return.
Friday, November 14, 2008 6:47:55 PM UTC
Today I tried to find some info about SP1 for TFS 2008.
MSDN redirected me to the Polish page: http://msdn.microsoft.com/pl-pl/tfs2008/default.aspx (translated version: http://translate.google.com/translate?u=http://msdn.microsoft.com/pl-pl/tfs2008/default.aspx&hl=pl&ie=UTF-8&sl=pl&tl=en). Polish is my native language.
Do you see what is on top? "Team Foundation Server software is ready!". What great news. Using this page is a waste of time. I prefer the English version:
http://msdn.microsoft.com/en-us/tfs2008/default.aspx.
In this case I don't blame Microsoft. I think that it's not possible to keep all the national pages up to date.
In my opinion every programmer has to be able to read English documentation. Everything changes so fast now that waiting a few months for a translation is too much.
brzozow
Friday, November 14, 2008 6:49:40 PM UTC
I think the REALLY useful feature about this Translation Wiki is that the content is side by side with English on the left and your language on the right. I think that is a great way to learn English, while still being productive in your native language (Brazilian in this example).
Friday, November 14, 2008 6:58:19 PM UTC
It's perfectly obvious that in this time understanding English is an asset for most people. Absolutely. But I think saying "If you don't know English, you're not a programmer" is a bit strong. And if people are willing to help others, what's the problem?
Friday, November 14, 2008 7:15:22 PM UTC
As a programmer from a non-English-speaking country, I agree with the idea that a programmer must know English, for several reasons:
1. English is an international language;
2. I don't know how it is with translations into other languages, but when I read books in Russian (my native language) I'm shocked. Our translators make up new terms and I get lost in them, even though before reading the book I thought I knew the field. So my opinion is that it is better to read documentation in English.
Friday, November 14, 2008 7:27:55 PM UTC
Pedro
Friday, November 14, 2008 7:28:54 PM UTC
I agree with what Erling Paulsen said. I always install Windows in English even if I'm French, because I want the original messages, the original exceptions, etc, so I can find more information about those on the net. I also prefer English documentation so that exchanging with other developers around the world is easier. Take stackoverflow for example. Even if a French section would appear one day, I'd continue to use the English one, as the greatest pool of information is there.

Hurray to application localization and globalization, but the dev tools and docs can stay in English for me.

BTW, slightly OT: The OpenID box for commenting here is translated in French as "Clic au signe dedans", which is about as clear as "A click at sign into". This tells a lot about how inadequate computer-made translations are.
Friday, November 14, 2008 7:36:19 PM UTC
Moreover, I hate it when localization is turned on by default, as it is in Google. I don't want to use Google in Ukrainian, because I prefer the English version, but it always redirects to the local version :-(
Friday, November 14, 2008 7:43:14 PM UTC
I do think we developers need a common language. When you have a problem or get a strange exception, 9 times out of 10 just googling the error message will get you the answer. I have tried developing on a Swedish version of XP, but trying to search for those error messages doesn't work. Can't say I agree with the statement "If you don't know English, you're not a programmer" but it does make life easier.
Friday, November 14, 2008 7:45:07 PM UTC
I agree that having a good understanding of the English language helps in being a better programmer, and helps in learning new programming languages. But coding isn't everything in life. These days I am more in contact with customers and end users, training new developers, etc., and I feel all that reading and writing in English definitely didn't help my communication in my own language (Portuguese from Portugal).

Languages should coexist, side by side, and this is a good example.

Pedro Carvalho
Friday, November 14, 2008 8:17:38 PM UTC
I think it is a good idea if we can see English and our language together, because it can solve some programmers' problems. But do not forget: a programmer must learn English. I cannot speak and write English very well, but I'm taking classes and reading English books in my major to make it better, because I want to be a good programmer.
farhaneh
Friday, November 14, 2008 8:53:47 PM UTC
I never said, or meant to say, that you need to be fluent in English to be a good programmer. And as Scott points out, the side-by-side translation feature would actually be a great way of learning English. It is my belief that this is the reason most Scandinavians can speak English so well: most of the TV shows we watch are subtitled English/American shows. You read the words in your language and hear the foreign language, so you are actually learning another language without knowing it :) So this translation effort might actually help people learn English as a side effect, which isn't a bad thing after all.

But if there's a bluebadge from the CLR team reading this, any chance that you can shed some light on the reason for translating the error messages for .Net framework for localized OS versions? IMHO exception messages should NEVER be presented to end users. And developers and IT staff, which are the normal recipients for error messages, would have a much easier time finding answers on the internet if the error messages were given in a uniform language, ie. English.
Give us our exception messages back! :)
Erling Paulsen
Friday, November 14, 2008 9:01:59 PM UTC
One of the greatest challenges people face from one day to the next is the challenge of communicating with each other. Why not use computers to unlock the potential of reaching out to others who communicate in different languages? There is a lot of brain power and creativity that is yet to be tapped into.

S'all good. Translation: It's all good.
Chad596
Friday, November 14, 2008 9:42:58 PM UTC
I would disagree that you need to know english to program.

However, I would agree that you need to know english in order to work on any application I am currently on. That's because we try to use the Ubiquitous Language and Domain Driven Development. If you didn't know english, you'd have a hell of a time with most everything in our code. Then again, maybe it would be a really good way to learn.

Still, there'd be a lot of fighting over what to rename something a non-native english speaker would name some of their classes/methods in our app :D
Saturday, November 15, 2008 1:55:44 AM UTC
The english syntax that has been used in programming languages for the last 50 years.

Discussing (blogs, references, articles) in more languages makes it harder to share information (especially with technical words).

Therefore, I would say that you "SHOULD" (a very big "SHOULD") know at least some basic english, in order to be a programmer.

Of course, having references in both english and your own native language helps a lot :-)
Saturday, November 15, 2008 4:15:28 AM UTC
As a Brazilian developer I would be delighted to read this post. However, I refuse to read technical documents or use software translated to Portuguese. Some reasons:

1) After so many years I got used to the English terms. Portuguese words in this topic sound weird to my ears, like "depurar" instead of "debugging". Yuck! It is more common among Brazilian developers to turn those terms into Portuguese-ish words, like "debugar" (to debug as if it was a Portuguese verb);

2) A lot of books don't have a Portuguese version. And for those that have, translation quality varies a lot;

3) There is no consistency in translations. The translation for "build" may vary between softwares and documents.

4) 10 years ago It was common to say that the patches come first for English version. Is it true nowadays?

5) Global economy. What if you need to send screen shots or make a webcast for non Portuguese speakers?

Nevertheless, this effort of providing localized information is honorable. Congratulations to the team

Daniel Melo
Saturday, November 15, 2008 4:25:46 AM UTC
Scott, can developers use CLIP with their own applications? We localize our apps using Launchpad (and get lots of free translation by strangers). These apps are used to document minority languages, often by the native speakers themselves. But oftentimes the foreign advisers might be weak in the language chosen for the UI, and they would love to hover over items and see them in English (or French, or whatever). So I've always wanted to make use of CLIP, but never found any information on how a developer can use it.
Saturday, November 15, 2008 7:56:07 PM UTC
Great Start, for reaching most people. Thanks for your work.
Sunday, November 16, 2008 1:23:25 PM UTC
A minor correction, I think it's "Fryźlewicz", not "Fryžlewicz" (there's no "ž" in Polish alphabet).
MichalT
Monday, November 17, 2008 11:43:52 PM UTC
Here's another example of a successful crowdsourcing localization community project: Team System Web Access Translations

The Product Group enabled a group of MVPs and other Influencers to localize TSWA into more languages than what's available out of the box. It was a great team effort between the localization group and the PG.
Wednesday, November 19, 2008 4:20:51 PM UTC
To say that if you don't know English, you're not a programmer is a perfect example of ethnocentrism in this country. I think that being productive in your own language is awesome. Whoever wrote the comment, please learn another language, culture, expand your mind.
John Peek
Wednesday, November 19, 2008 8:21:28 PM UTC
@John

Not trying to sound like 'that guy'...but the author of the statement is not a native English speaker.

This discussion is very interesting. It would *seem* (totally non-scientific sampling) that the non-english speakers (as a first language anyway) tend to agree with the statement "If you don't know English, you're not a programmer" more than native english speakers.



Ryan
Monday, November 24, 2008 12:05:36 PM UTC
I think what complicates things is when the developer has to switch from a computer language to their own language: writing LINQ statements and commenting them in their own language so the next developer (with no English knowledge) will actually get the idea of what's happening there.
There is an opportunity to have DSL languages for translation of code maybe?

Knowing a different language pays, be that English, French, Spanish or Chinese. If you are over 60 it even helps your brain to remain healthy.

Maybe the statement was a rebuff?

Thanks for the info on CLIPS.
Adrian Stoica
Wednesday, November 26, 2008 3:59:56 PM UTC
I also very much agree with Erling. Furthermore, what makes it all that much worse is that when users (or a program with logging/reporting capabilities) report exceptions back to me from my own program, the exceptions appear in whatever language their system is localized to!

We're talking exceptions here! Presumably the point of these is to convey information back to the programmer so steps can be taken to correct the problem. These aren't user-directed generic errors like "invalid password", they are things like "value passed to argument in method xxx cannot be negative at yyy in zzz". Please explain to me the logic in localizing these to the user's language! Are they going to fix my bug for me?

There doesn't even appear to be any way to get an English version of an exception manually. Localizing exceptions was a silly idea and a poor design decision, imo.
Comments are closed.

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.