TECH STEW

Google Speaks in Tongues

It takes lots of patience and millions of examples to teach a computer a new language. It's actually easier to program such tasks as landing on the moon than to help a computer figure out the meanings of words and how they translate into other languages, said Peter Norvig, Google's director of research.


Teaching a computer to understand languages isn't rocket science -- it's not nearly that easy, said Peter Norvig, director of research at Google (Nasdaq: GOOG).

It takes a limited number of calculations to send a spacecraft to the moon, Mars or other planets. And while the calculations aren't so simple, they are fairly easily managed by a computer, he said.

But learning what words mean, how they fit together and how they translate into other languages is much more challenging, he said.

Rules and Exceptions

"In physics, we've been able to use computers very well for a long time. We can get our spacecraft to the moon or Mars very accurately," Norvig said. "But part of the problem with language is there's lots and lots of rules, and there are lots and lots of exceptions to those rules."

About two years ago, Google started taking a different approach. Rather than relying on grammar rules, it teaches a computer to understand languages in a way that is more like the way humans learn them, he said.

Every Word Counts

The strategy comes down to programming the computer to learn through examples. Exposed to an abundance of texts in a specific language, the computer can learn to pick out patterns, Norvig said.

And if you teach it to compare two different languages side by side, it can figure out which words or characters generally correspond to one another.

"Most of the answer to how you do this is counting -- it's just the fancy phrase for counting is 'probability theory,'" Norvig said.
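The "counting" Norvig describes can be pictured with a small sketch. This is an illustration under stated assumptions, not Google's method: the function name, the toy English/Spanish sentence pairs and the simple relative-frequency estimate are all invented for this example. It counts how often each English word appears alongside each Spanish word in paired sentences, then turns those counts into rough probabilities.

```python
from collections import Counter, defaultdict


def cooccurrence_probabilities(sentence_pairs):
    """Estimate P(target word | source word) from co-occurrence counts.

    Hypothetical helper: each pair is (source sentence, target sentence),
    and the two sentences are assumed to be translations of each other.
    """
    pair_counts = defaultdict(Counter)
    for source, target in sentence_pairs:
        # Count each word pair once per sentence pair.
        for s in set(source.lower().split()):
            for t in set(target.lower().split()):
                pair_counts[s][t] += 1

    probabilities = {}
    for s, counts in pair_counts.items():
        total = sum(counts.values())
        probabilities[s] = {t: c / total for t, c in counts.items()}
    return probabilities


# Toy data, invented for illustration only.
PAIRS = [
    ("the dog", "el perro"),
    ("the cat", "el gato"),
    ("a dog", "un perro"),
]

probs = cooccurrence_probabilities(PAIRS)
best_for_dog = max(probs["dog"], key=probs["dog"].get)
print(best_for_dog, probs["dog"][best_for_dog])
```

With only three sentence pairs, "dog" already co-occurs with "perro" more often than with any other word. Real systems work from millions of pairs and more careful probability models.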

Google's language tools, for example, let you search for a word or phrase in English. They then find results for that search among Web sites written in Spanish and translate them, so the English-language user can sort through those links in English.

Building a Collection

So far, it works with about 15 languages, but the hope is to add more soon, he said.

The tools also let you translate Web pages and text, among other things.

The key to building the language tools program was to feed it lots and lots of texts, gathering them from groups that already have documents translated into several languages, such as international news sites and United Nations archives, Norvig said.

"Then we build a model that says, 'Here's all these translations, and we know this page is a translation of that page, but we don't know exactly which corresponds to which,'" Norvig said. "What we have, though, is probabilities. Like the first sentence in English is similar to the first sentence in Chinese, but it could be the first two sentences, the first three, or it could be one to one."

After one example, the computer is still confused. But after a million examples, it starts to make associations that make sense, he said.

For instance, a Chinese character may come up often in relation to the English word "dog" or "terrier." And from that the computer learns to make a connection, he said.
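One textbook way to learn word associations when nobody has marked which word corresponds to which is expectation-maximization, in the style of the classic IBM Model 1. The article does not say which algorithm Google uses, so the sketch below is only an assumption-laden illustration: the function name and the toy English/Spanish pairs are invented. It shows the effect Norvig describes: with one example every pairing looks equally likely, and with more examples the right pairing pulls ahead.

```python
from collections import defaultdict


def train_word_associations(sentence_pairs, iterations=10):
    """Estimate t[(f, e)], roughly P(foreign word f | English word e), with EM.

    Hypothetical helper in the style of IBM Model 1 (no NULL word,
    no smoothing); sentence_pairs holds (English, foreign) strings.
    """
    pairs = [(e.split(), f.split()) for e, f in sentence_pairs]
    foreign_vocab = {f for _, fs in pairs for f in fs}
    uniform = 1.0 / len(foreign_vocab)
    t = defaultdict(lambda: uniform)  # start with no preference at all

    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: split each foreign word's "credit" among the English
        # words in the same sentence pair, in proportion to current beliefs.
        for es, fs in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: re-estimate the associations from the shared-out credit.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


# Toy data, invented for illustration only.
one_example = train_word_associations([("the dog", "el perro")])
several = train_word_associations([
    ("the dog", "el perro"),
    ("the cat", "el gato"),
    ("a dog", "un perro"),
])

print(one_example[("perro", "dog")], one_example[("el", "dog")])
print(several[("perro", "dog")], several[("el", "dog")])
```

After a single sentence pair, "dog" is tied between "el" and "perro". After three pairs, "perro" is clearly the stronger association, because "el" is better explained by "the".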

"We've been able to do this, and our translation software is usually right at the top of a search," Norvig said. "And we've even been able to do this in some languages where nobody on the team speaks the language."

Not Perfect

The resulting translations aren't perfect, but they do get the general point across, he added.

"They come out understandable, but you don't go more than three or four sentences before you realize this was not written by a native speaker," Norvig said.

Still, the more examples it gets, the better it translates, although Norvig said he suspects there's a ceiling to how well it will work. But that ceiling is still pretty far away, he said.

Google is also working on a similar method to sort through images.

Vision of the Future

Right now, image search programs generally just look at words around images.

But the Google program will collect some features of images, like horizontal or vertical lines that might be similar in a million pictures that come up when somebody searches for an image of dolphins, Norvig said.

"It collects a range, then looks at which pictures are nearest to the center of that range," Norvig said. "And in a search we try to bring up the center of that range first."
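The idea of showing the pictures nearest the center of the range first can be sketched in a few lines. Everything here is an assumption made for illustration: the article does not describe Google's features or distance measure, so the two-number feature vectors (a horizontal-line score and a vertical-line score), the image names and the use of plain Euclidean distance to the average are all hypothetical.

```python
import math


def rank_by_centrality(features):
    """Order image names from nearest to farthest from the mean feature vector.

    Hypothetical helper: features maps an image name to a tuple of numbers,
    all of the same length; the mapping must not be empty.
    """
    names = list(features)
    dims = len(features[names[0]])
    center = [
        sum(features[name][d] for name in names) / len(names)
        for d in range(dims)
    ]
    return sorted(names, key=lambda name: math.dist(features[name], center))


# Invented scores: (horizontal-line score, vertical-line score).
FEATURES = {
    "dolphin_a": (0.8, 0.2),
    "dolphin_b": (0.7, 0.3),
    "dolphin_c": (0.9, 0.25),
    "outlier": (0.1, 0.9),
}

ranked = rank_by_centrality(FEATURES)
print(ranked)
```

The picture whose features sit closest to the average comes first, and the one that looks least like the rest comes last.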

The image programming is still a work in progress. That part is even harder than language, Norvig said.

"The vision stuff is not quite there yet," he said, adding that with more examples, accurate image searches perhaps won't be too far behind his company's language translation program.

© 2008 The Santa Fe New Mexican. All rights reserved.
© 2008 ECT News Network. All rights reserved.
