While the web has rendered time and distance moot, the language barrier is still standing. Although services like Babelfish and Google Translate have made a few dents in the wall, the web remains not one worldwide network but dozens of smaller networks, mostly isolated from each other. The language barrier can be overcome through a "best effort" technique that combines input from professional translators, volunteers (the user community), and machine-generated translations. I recently wrote about this in my essay, The End of the Language Barrier.
How It Works
A web server administrator who wants to use the translation service defines subdomains through which visitors request translated pages. For instance, if the main domain is yoursite.com, the administrator could configure the server to offer nn.yoursite.com, where nn is an ISO language code. Like our current Firefox browser-side translator, the translation server will also be able to handle general requests and determine the target language itself, so the web server could simply offer translate.yoursite.com to all visitors who want translation.
The translation proxy is a C/C++ program that acts as a tunneling agent and performs the following tasks:
- Looks for HTTP requests for a domain (e.g. translate.yoursite.com or nn.yoursite.com where nn is an ISO language code)
- Tunnels through to www.yoursite.com to obtain web content for the requested URL
I've described how the translation proxy works in further detail in the footnotes (procedural code).
The service is configured using two methods: DNS subdomains, and DIV tags embedded within web pages. This makes it possible to run this either as a hosted SaaS application, or as a DIY self-hosted service.
With DNS, a website owner maps domains such as translate.yoursite.com or fr.yoursite.com to the proxy server, while www.yoursite.com continues to point to the original site. The proxy server receives requests for fr.yoursite.com (French) and, unless otherwise specified, tunnels through to www.yoursite.com. Many other configurations are possible, but this is a good example of a default setup.
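As a rough illustration of the default mapping described above, here is a minimal Python sketch of how a proxy might derive the target language and the origin host from the requested hostname. The function name resolve_origin, the small set of language codes, and the rule that the origin is always "www." plus the base domain are assumptions made for this example; they are not taken from the actual proxy.

```python
from typing import Optional, Tuple

# Hypothetical subset of ISO 639-1 codes; a real deployment would load a full list.
LANGUAGE_CODES = {"de", "en", "es", "fr", "ja", "zh"}


def resolve_origin(host: str) -> Tuple[Optional[str], str]:
    """Map a requested hostname to (target_language, origin_host).

    target_language is None when the language is not encoded in the
    hostname and must be detected from the request itself.
    """
    host = host.lower().split(":")[0]  # drop any port suffix
    label, _, rest = host.partition(".")
    if label in LANGUAGE_CODES and rest:
        # nn.yoursite.com -> translate www.yoursite.com into language nn
        return label, "www." + rest
    if label == "translate" and rest:
        # translate.yoursite.com -> language decided later, per visitor
        return None, "www." + rest
    # Not a translation subdomain: pass the host through unchanged.
    return None, host


print(resolve_origin("fr.yoursite.com"))
print(resolve_origin("translate.yoursite.com"))
```

A site that needs a different origin for a given subdomain would replace the "www." rule with a lookup in its own configuration.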
You can also specify, on a per-page or per-text basis, settings like whether to allow machine translations, whether to allow anonymous translations, whether to allow the viewer to edit or score translations, minimum quality scores, etc. This allows a site to opt for a default, auto-pilot implementation with no HTML changes, or to exert a lot of control over how translations are displayed and edited simply by inserting DIV tags where desired.
This approach does make some compromises (it will not work for secure web pages, for example), but it eliminates the need to install translation software on either the browser or the server. It will therefore work automatically for the majority of users on the majority of pages, especially on open websites where secure connections are used only occasionally (for user authentication, for example).
Ultimately, the goal is to make best effort translation an ambient, mostly invisible service that is embedded in most web servers and browsers and is enabled by default. That will take a few years to sort itself out, but in the meantime we can use techniques like this to make almost any website translatable by both humans and machines. If you would like to beta test these tools when they are ready, send me an email at bsmcconnell /at/ gmail to sign up as a guinea pig (and fetch a copy of our Firefox Translator to get a sense of how inline translation works). The translation proxy, like our translation memory and Firefox Translator, will be published as open source (BSD license), so if you'd like to help test it, drop me a line.
Procedural Steps Executed By Proxy Server
- Wait for incoming HTTP request
- Examine URL and domain
- If domain is in the form nn.www.yoursite.com, where nn is an ISO language code, source site is www.yoursite.com
- If domain is in the form translate.www.yoursite.com, source site is www.yoursite.com
- If domain is in the form www.yoursite.com, source site is defined in config file
- If URL is in the form www.proxyserver.com?u=http://www.foo.com, proxy to the URL cited in the 'u' parameter
- Tunnel HTTP request to source website, relay response to client browser
- Examine Accept-Language header in HTTP request, and look for a domain specific cookie tl=iso_language_code, to detect user language preference
- If user language != web page language and content-type == text/html, replace in HTTP response with
- Serve rest of response unchanged
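
The language detection and translation decision in the steps above could be sketched in Python roughly as follows. This is only an illustration: the function names preferred_language and should_translate are hypothetical, and letting the tl cookie override the Accept-Language header is an assumption of the sketch, since the steps do not say which one wins.

```python
from http.cookies import SimpleCookie
from typing import Optional


def preferred_language(accept_language: str, cookie_header: str = "") -> Optional[str]:
    """Return the visitor's preferred language code, or None if unknown."""
    # Assumption: an explicit tl cookie takes precedence over the header.
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if "tl" in cookie and cookie["tl"].value:
        return cookie["tl"].value.lower()

    # Otherwise pick the Accept-Language entry with the highest q value.
    best_lang, best_q = None, 0.0
    for item in accept_language.split(","):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag or tag == "*":
            continue
        q = 1.0  # entries without an explicit q default to 1.0
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > best_q:
            # Keep only the primary subtag, e.g. "fr" from "fr-ch".
            best_lang, best_q = tag.split("-")[0], q
    return best_lang


def should_translate(user_lang: Optional[str], page_lang: str, content_type: str) -> bool:
    """Translate only HTML pages whose language differs from the visitor's."""
    media_type = content_type.split(";")[0].strip().lower()
    return user_lang is not None and user_lang != page_lang and media_type == "text/html"
```

For example, a request carrying "Accept-Language: fr-CH, fr;q=0.9, en;q=0.8" and no tl cookie yields "fr", so an English text/html page would be translated while an image would be relayed unchanged.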