If you have ever had to localize your website or web app, you know that it can be time-consuming, difficult and expensive. This week we (www.worldwidelexicon.org) are rolling out a new service that makes this as simple as a server-side include. We have offered a platform for machine and community translation for several years, and have upgraded it to call out automatically to professional translation bureaus such as SpeakLike, ProZ.com and others.
This service is accessed via a simple web API that lives at www.worldwidelexicon.org/t.
You call it with a short list of parameters, including:
- (sl) The source language (ISO code, required)
- (tl) The language to translate to (ISO code, required)
- (st) The text to translate (required)
- (domain) The site's domain (e.g. www.foo.com; optional, limits the search scope to this domain)
- (url) The parent URL (optional, but recommended for search visibility)
- (allow_machine) Allow machine translations (y/n, default = y)
- (allow_anonymous) Include anonymous user translations (y/n, default = y)
- (require_professional) Require professional translations (y/n, default = n)
- (minimum_score) Minimum quality score for translations (0..5, default = 0/any)
- (lsp) The name of the professional translation service to use (optional, currently supports speaklike, more coming soon)
- (lspusername) The username for your professional translation account
- (lsppw) The password or API key for your professional translation account
- (mtengine) Machine translation engine to use (optional; the default is automatic selection; options are google, apertium, moses, and worldlingo, which requires an API key)
- (mtusername) Machine translation service username, if you are using an enterprise-grade translation server such as World Lingo (Language Weaver)
- (mtpw) Machine translation service password
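To make the parameter list concrete, here is a minimal sketch of how a client might assemble a /t request in Python. It assumes the parameters are sent as a plain HTTP GET query string to http://www.worldwidelexicon.org/t; the helper name `build_request_url` is hypothetical, not part of WWL.

```python
from urllib.parse import urlencode

# Assumption: /t is reached over plain HTTP with a GET query string.
WWL_T_ENDPOINT = "http://www.worldwidelexicon.org/t"


def build_request_url(sl, tl, st, **options):
    """Build a /t request URL from the required and optional parameters."""
    if not (sl and tl and st):
        raise ValueError("sl, tl and st are required")
    params = {"sl": sl, "tl": tl, "st": st}
    # Send only the optional parameters the caller set, so the server's
    # defaults (e.g. allow_machine=y, require_professional=n) apply otherwise.
    params.update({name: value for name, value in options.items() if value is not None})
    return WWL_T_ENDPOINT + "?" + urlencode(params)


# Example: Spanish translation scoped to one domain, machine output disallowed.
url = build_request_url("en", "es", "Hello world",
                        domain="www.foo.com", allow_machine="n")
```

Fetching `url` with `urllib.request.urlopen` would then perform the call. The sketch assumes the response body is simply the translated text; check the documentation at www.worldwidelexicon.org/api for the actual response format.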
Best Effort Translation : How It Works
WWL is a best-effort system, so it works differently from free web translation services and conventional translation management systems. When you request a translation, it looks for matching texts in the following order:
- Cached translations from previous requests for the same translation (memcached)
- Professional translations (e.g. SpeakLike, ProZ)
- Translations from trusted users
- Anonymous user translations
- Machine translations from best available service for a language pair
- If all else fails, it returns the original text
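The lookup order above amounts to a fallback chain. The sketch below models it in Python; the source names and lookup functions are illustrative stand-ins for WWL's internal stores, not its actual code, and the `skip` argument is a hypothetical way to mirror flags such as allow_machine=n.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple

# A lookup takes (text, source language, target language) and returns a
# translation, or None when that source has nothing for the text.
Lookup = Callable[[str, str, str], Optional[str]]


def best_effort_translate(st: str, sl: str, tl: str,
                          sources: Sequence[Tuple[str, Lookup]],
                          skip: Iterable[str] = ()) -> Tuple[str, str]:
    """Return (translation, source name) from the first source that answers.

    `sources` must be listed in priority order, e.g. cache, professional,
    trusted, anonymous, machine. Sources named in `skip` are ignored.
    """
    excluded = set(skip)
    for name, lookup in sources:
        if name in excluded:
            continue
        result = lookup(st, sl, tl)
        if result:
            return result, name
    # If all else fails, return the original text.
    return st, "original"
```

Because each source is just a function, disabling anonymous or machine translations for a given call only means skipping entries in the chain; the priority order itself never changes.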
You decide, at the time you make the API call, which types of translations you will accept, whether to request professional translations, and whether to override default behaviors (for example, by requiring professional translations for an important text). This gives you fine-grained control over how you manage translations in your website or web app.
The API is simple to use, and if you cache translations locally via memcached or similar tools, the performance hit is not significant except the first time a page is served in a new language. That first request can take some time, because the server may call out to several sources before returning a response. If this is an issue, you can mask it by having a spider crawl the most visible parts of your site in the languages you care most about.
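A pre-warming spider can be as simple as requesting every important page once per target language. This is a hypothetical sketch: `fetch(path, lang)` is a placeholder for however your site serves a page in a given language (a query parameter, a subdomain, etc.), and the thread pool is one possible design choice for overlapping the slow first-time requests.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Callable, Iterable, List, Tuple


def prewarm(paths: Iterable[str], languages: Iterable[str],
            fetch: Callable[[str, str], None],
            max_workers: int = 4) -> List[Tuple[str, str, Exception]]:
    """Request every (path, language) pair once so translations get cached.

    Returns the pairs that failed, together with their errors.
    """
    jobs = list(product(paths, languages))

    def run(job):
        path, lang = job
        try:
            fetch(path, lang)
            return None
        except Exception as exc:  # keep crawling even if one page fails
            return (path, lang, exc)

    # Threads let several slow first-time translations proceed in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return [r for r in pool.map(run, jobs) if r is not None]
```

Running this from a scheduled job after each deploy means real visitors are less likely to be the ones who wait on the first translation.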
Example #1 : Translating A Website
This API is a simple and cost-effective way to localize a website. If you have an active user community, you can invite your users to translate your site for you. On the other hand, you may not trust your users to do this consistently, or your community may not be active enough to support crowdsourced translation. In that case you can supply credentials for a professional translation service when requesting translations. WWL will then return a machine translation as a placeholder while it requests a professional translation. When the professional translation is completed, it is stored in the global translation memory and takes precedence over user and machine translations.
SpeakLike, for example, charges between 5 and 10 cents per word for most language pairs and provides good language coverage. Other services we will be adding to the network charge similar rates, making this a highly automated and cost-effective solution for localizing a site as well as translating dynamic content. Because you can request professional translations on a per-query basis, you control which languages you pay for, as well as which parts of your site get paid translations (you may be OK with users translating some pages, but not others).
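Since pricing is per word, the cost of a professional pass scales as source words times target languages times the per-word rate. A back-of-the-envelope estimator, assuming the 5 to 10 cent range quoted above applies to every target language (real rates vary by provider and language pair):

```python
def count_words(texts):
    """Rough word count: whitespace-separated tokens across all texts."""
    return sum(len(text.split()) for text in texts)


def estimate_cost(source_words, target_languages, cents_per_word=(5, 10)):
    """Return (low, high) cost in dollars to translate into each language."""
    low, high = cents_per_word
    total_words = source_words * target_languages
    # Multiply in whole cents, then convert to dollars once.
    return total_words * low / 100, total_words * high / 100


# A 2,000-word site translated into Spanish, French and German:
low, high = estimate_cost(2000, 3)  # (300.0, 600.0)
```

Restricting paid requests to a few languages or to selected pages reduces `source_words` or `target_languages` directly, which is where the per-query control pays off.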
We have developed a simple callback API that can be used to integrate WWL with other translation service providers, such as specialists that focus on a specific language or industry domain. It consists of an HTTP call out to send a job to the translation bureau, followed by a callback into WWL to submit the completed translation. It took about a day to implement the connection to SpeakLike, so if you are a language service provider and would like to be included in our network, see the documentation at www.worldwidelexicon.org/api.
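The placeholder-then-upgrade behavior and the send/callback shape can be modelled in a few lines. This is a toy, in-memory sketch with hypothetical names (`TranslationMemory`, `send_job`, `callback`); in a real deployment `send_job` would be the HTTP call out to the bureau and `callback` would be triggered by the bureau's HTTP request back into WWL, as documented at www.worldwidelexicon.org/api.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple

Key = Tuple[str, str, str]  # (source text, source language, target language)


@dataclass
class TranslationMemory:
    """Toy model of the placeholder-then-upgrade flow (names are hypothetical)."""
    machine_translate: Callable[[str, str, str], str]
    send_job: Callable[[Key], None]  # outbound call to the translation bureau
    professional: Dict[Key, str] = field(default_factory=dict)
    pending: Set[Key] = field(default_factory=set)

    def request(self, st: str, sl: str, tl: str) -> str:
        key = (st, sl, tl)
        if key in self.professional:
            return self.professional[key]
        if key not in self.pending:
            # Dispatch the job only once per text and language pair.
            self.pending.add(key)
            self.send_job(key)
        # A machine translation stands in until the bureau responds.
        return self.machine_translate(st, sl, tl)

    def callback(self, st: str, sl: str, tl: str, translation: str) -> None:
        """Called when the bureau submits the completed translation."""
        key = (st, sl, tl)
        self.pending.discard(key)
        self.professional[key] = translation


jobs = []
tm = TranslationMemory(machine_translate=lambda st, sl, tl: "[MT] " + st,
                       send_job=jobs.append)
first = tm.request("Hello", "en", "es")   # "[MT] Hello"; one job sent
tm.callback("Hello", "en", "es", "Hola")  # the bureau delivers
second = tm.request("Hello", "en", "es")  # "Hola"
```

Tracking pending jobs matters: without it, every request for an untranslated text during the bureau's turnaround would create (and bill) a new job.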
Wrapper Libraries, Widgets and More
We are developing wrapper libraries and widgets to front-end this API and to cache translations locally for better performance. If you are interested in contributing a library for your favorite programming language or content management system, we'd love to hear from you (bsmcconnell /at/ gmail). We are currently working on Python and C/C++ libraries, as well as an AJAX widget that can be embedded in a wide range of sites, and expect to begin a public beta for these tools in the near future.
If you'd like to build a wrapper library, you can find the API documentation at www.worldwidelexicon.org/api. A minimal implementation need only support the /t and /submit API calls, which are used to request and submit translations, although we recommend you also include support for comments, votes and account management if possible. Essentially, you mirror the web API to give users a simple set of functions they can call. For the /t API call, you should do the following:
- Check whether there is a locally cached translation, so you're not constantly pinging the web translation memory. A time to live of a few minutes keeps the most frequently requested texts in cache while still allowing new submissions to appear fairly quickly.
- Check whether the text has been translated via gettext(); if so, that translation should take precedence over WWL.
- Call out to /t to request a translation with the parameters listed previously
- Cache the returned translation locally and return it
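The four /t steps above can be sketched as a small Python class. This is a hypothetical wrapper, not an official WWL library: an in-process dictionary stands in for memcached, the 300-second time to live is an arbitrary choice within the "few minutes" suggested, and the HTTP details (plain GET, plain-text response body) are assumptions to verify against www.worldwidelexicon.org/api. `fetch` and `clock` are injectable so the logic can be exercised without network access.

```python
import time
from typing import Callable, Dict, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumptions: /t answers a GET query string over plain HTTP and the
# response body is the translated text.
WWL_T_ENDPOINT = "http://www.worldwidelexicon.org/t"


def fetch_from_wwl(params: Dict[str, str]) -> str:
    with urlopen(WWL_T_ENDPOINT + "?" + urlencode(params), timeout=30) as resp:
        return resp.read().decode("utf-8")


class WWLClient:
    """Hypothetical /t wrapper: local cache, then gettext, then WWL."""

    def __init__(self, catalogs=None,
                 fetch: Callable[[Dict[str, str]], str] = fetch_from_wwl,
                 ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        # catalogs maps a target language code to a gettext translations
        # object (e.g. from gettext.translation(...)); missing means none.
        self.catalogs = catalogs or {}
        self.fetch = fetch
        self.ttl = ttl  # seconds
        self.clock = clock
        self._cache = {}  # stand-in for memcached: key -> (text, expiry)

    def translate(self, st: str, sl: str, tl: str, **options: Optional[str]) -> str:
        params = {"sl": sl, "tl": tl, "st": st}
        params.update({k: v for k, v in options.items() if v is not None})
        key = tuple(sorted(params.items()))

        # 1. Use a locally cached translation if it has not expired.
        hit = self._cache.get(key)
        if hit is not None and hit[1] > self.clock():
            return hit[0]

        # 2. A gettext translation takes precedence over WWL. gettext()
        #    returns the message unchanged when it has no translation.
        catalog = self.catalogs.get(tl)
        if catalog is not None:
            local = catalog.gettext(st)
            if local != st:
                return local

        # 3. Ask /t, then 4. cache the result locally and return it.
        result = self.fetch(params)
        self._cache[key] = (result, self.clock() + self.ttl)
        return result
```

Because the request options are part of the cache key, a call with `require_professional="y"` is never answered from a cached result that was fetched under different settings.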
Following the steps above gives you good performance while still allowing dynamic translation, with a delay of just a few minutes in most cases (of course, you can adjust cache behavior as desired). It's also a good idea to have a spider periodically crawl your most popular pages in the top languages, to force translations into the cache and to request translations for new texts. One caveat: if your website generates a lot of dynamic text that varies slightly (e.g. timestamps), be careful not to request professional translations for it, or you'll be paying to translate every variation of "You have N widgets in your account."
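One way to act on this caveat, sketched below under the assumption that omitting the lsp credentials means WWL will not create a paid job: attach the professional-service parameters only when the text looks stable. Treating any digit as a sign of variable text is a deliberately crude, hypothetical heuristic, and `translation_options` is an illustrative name.

```python
import re

# Digits usually signal counts, dates or timestamps that vary per request.
_VARIABLE_PATTERN = re.compile(r"\d")


def translation_options(text, lsp=None, lspusername=None, lsppw=None):
    """Return extra /t options, requesting paid translation only for stable text."""
    if lsp and not _VARIABLE_PATTERN.search(text):
        return {"lsp": lsp, "lspusername": lspusername, "lsppw": lsppw}
    # Variable (or unconfigured) text: leave out the LSP credentials so the
    # request falls back to user and machine translations.
    return {}
```

A sturdier fix is to translate the template before filling it in: send `You have %(count)d widgets in your account.` once and substitute the number afterwards with Python's `%` formatting. That template contains no digits, so the heuristic lets it through for professional translation.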
In the coming months, we will be adding support for more language service providers and translation memories, while also releasing easy-to-use libraries for popular programming languages and publishing platforms. Look for our C/C++ library for high-performance applications, as well as a Drupal module, in the near future. We hope to receive contributions from developers for other languages, and to include those libraries in our toolset soon as well.