Last Tuesday (Oct. 11, the feast of St. Jerome) was International Translation Day. By chance on that day somebody gave me a link to an item in the Paris newspaper Le Figaro and, when clicked, the link jumped me to a Google translation of the article. As usual, every sentence of the translation seemed OK, if a little off, but the whole effect was disastrously bad. I've been trying to grasp what it is that the translation misses and, in keeping with the theme of this blog, where such a thing might have come from.
It is tempting to say the translation missed the article's spirit, but I'm trying to get a more concrete understanding of what went wrong. What is the spirit that turns a collection of sentences into a unit?
The article translated was a review of the new Tintin movie. (Opening in France this month, and in December in the USA.) Part of the problem was the translation machine's insensitivity to idioms. The article's title is given as "We saw Tintin and it does not disappoint," which is correct (although perhaps "have seen" is more accurate than "saw") but carries with it the taste of French idioms. If I were translating, I would make it, "We've seen Tintin and were not disappointed." Another version might be, "We saw Tintin and it did not disappoint us." The trick is that the French has a change in tense and a change in subject. A French writer can get away with these switches because the sentence uses a cliché ("does not disappoint") of French critical opinion. A translation that includes this cliché-based freedom might go, "We've seen Tintin and it holds up." There we've got the cliché, the tense shift, and the change in subject, but it sounds like English.
Every language has its clichés, and perhaps, like idioms, they need to be translated whole rather than by parsing them, but a phrase's context becomes even more important when translating. The cliché in "I had that meeting with my grouchy aunt and it did not disappoint" cannot be translated as "it held up."
There is an amusing and incomprehensible example of what you get when you translate an idiom's parts, piece by piece. What do you make of this passage: "Verdict hot most anticipated film of the year"? Idioms are, by definition, phrases that cannot be understood literally, but Google has translated an idiom ("Verdict à chaud") literally when it means something like "immediate reaction" or "first impression." It's a rare phrase (Google offers only 14 instances) and is typically used without a verb, making translation even harder. To make matters still more obscure, the translator also skipped a preposition, "du." The French translates as "Here's a first impression of the year's most awaited film." (Since Tintin is a hero of French comics, the coming movie about him has caused more excitement in France than elsewhere.) The passage is a bit of pawing the earth. The reviewer should just get on with it, but bad writing still deserves a clear translation.
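The difference between translating an idiom piece by piece and translating it whole can be shown with a toy sketch. Everything in it is an assumption made for illustration: the tiny word list, the one-entry idiom table, and the function names are invented, and this is not how Google's translator is actually built.

```python
# Toy word-for-word dictionary (illustrative entries only).
WORDS = {"verdict": "verdict", "à": "at", "chaud": "hot",
         "du": "of the", "film": "film"}

# Toy phrase table: idioms are stored whole, not word by word.
IDIOMS = {("verdict", "à", "chaud"): "first impression"}


def word_by_word(tokens):
    """Translate each token on its own, ignoring its neighbors."""
    return " ".join(WORDS.get(t, t) for t in tokens)


def idioms_first(tokens):
    """Try the longest known idiom at each position, then fall back to single words."""
    out = []
    i = 0
    while i < len(tokens):
        for n in range(len(tokens) - i, 1, -1):
            key = tuple(tokens[i:i + n])
            if key in IDIOMS:
                out.append(IDIOMS[key])
                i += n
                break
        else:
            # No idiom starts here, so translate the single word.
            out.append(WORDS.get(tokens[i], tokens[i]))
            i += 1
    return " ".join(out)


tokens = "verdict à chaud du film".split()
print(word_by_word(tokens))   # literal, piece by piece
print(idioms_first(tokens))   # idiom translated as a unit
```

Even the second version only works because somebody put the idiom in the table ahead of time. A phrase with only a handful of instances is unlikely to be there.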
Translating that simple little bit requires the translator to look beyond the French-English dictionary if he is to make a decent go of rendering a French sentence into English. But what might that beyond be? If a language can be defined by the rules that produce its sentences, then why should we ever need to check beyond the sentences themselves to get the best translation?
Regulars on this blog may feel a couple of familiar themes lurking just off stage. One is the speech triangle, specifically the corner with the topic. The translated essay is a review of a movie, moreover a movie about a beloved comic character familiar to every person who grew up in France, moreover a movie produced by a much admired filmmaker, Steven Spielberg. The more knowledge about Tintin and Tintin's place in French culture the translator can bring to the project, the better the translation is apt to be. This issue is specific to this bit of writing, but every translation task includes a topic and the translator. So every sentence generator needs syntactic rules, a dictionary, and a reference work on the topic. Rules plus dictionary won't suffice. And topics have, by definition, been part of language from the beginning. Even if we imagine a group of Homo erectus exchanging no more than phrases, we would have to know what they know about their topic to understand what they said. If they are looking for a good stone to work with, they might well be assuming a trove of knowledge about stone tools that most of us do not have. Thus, even with a protolanguage-English dictionary we might have a hard time getting exactly what they are saying.
A second recurring theme on this blog is how language directs attention. That is to say, language constantly turns the reader/listener's attention outside the sentences. A translation like "verdict hot" may be literally correct, but we know immediately that something is wrong because it provides no point, abstract or real, to direct our attention. Computers, of course, have no attention to direct and therefore need some work-around solution to discover when a translation is simply hopeless.
Attention directing too has been part of language from the beginning. Hopping into our time machine and watching that group of erectus discussing tools, we would do well to direct our attention to the stones. Closing your eyes and focusing just on the words spoken will not do.
Another element in the text but not in either the syntax or the dictionary is the author's excitement about a Tintin movie by Spielberg. If the author were speaking, we could hear that excitement in his voice. That is what we would have gone by when eavesdropping on our erectus. Do you think they were excited or matter-of-fact when they examined stones? In written language, tone of voice has to be replaced by something else.
In the Tintin review the tone is established in the title and persists through the whole review. The author wanted the movie to be fine but feared it would be bad; he is relieved to discover that the movie is pretty good. The Google translator misses the whole of this tension, naturally. This deafness to tone leads to some funny translations. One section is bafflingly headed, "Snowy to life also," when a glance at the French makes it clear that a better translation is, "Snowy too comes to life." Snowy is Tintin's dog, called Milou in French. It is typical that the machine was able to translate the dog's name correctly, but unable to give us a comprehensible sentence expressing the reviewer's enthusiasm for the movie's accomplishment.
Particularly striking is the machine's problem with "come to life." In the review's body, the machine gives us, "As for Snowy, he takes life too …" Takes life? Doesn't that mean kill? So there is another difficulty, metaphor. Metaphors are never literal. The French might say a fictitious character "takes life" but in English fictitious characters "come to life." Sadly, a dictionary cannot just say translate "take" as "come" when followed by "life." People can take to life as well: e.g., "Luke took to life in the two-street town surprisingly well." To translate the metaphor accurately the Google translator has to know what it is talking about, but of course it never does.
Meaning is the great mystery of language. This blog has a running hypothesis that it works by directing attention toward some detail of a topic. The problem is, we don't understand how attention works, so we cannot make a machine that can duplicate the function. The Google translator, like all translation machines, has been built to work around the fact that it doesn't know what the text means. So it uses a dictionary, a set of syntactical rules, and a statistical analysis of a large corpus of translated documents.
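The statistical part of that work-around can be pictured with a minimal sketch: count how often each source phrase was paired with each rendering in previously translated text, then pick the most frequent one. The aligned pairs below are invented for illustration, the function names are hypothetical, and a real system is far more elaborate than this.

```python
from collections import Counter, defaultdict

# Hypothetical aligned phrase pairs, invented for this example.
PAIRS = [
    ("prend vie", "comes to life"),
    ("prend vie", "comes to life"),
    ("prend vie", "takes life"),
    ("à chaud", "first impression"),
]


def build_table(pairs):
    """Count how often each source phrase was rendered each way."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def most_likely(table, phrase):
    """Return the most frequent rendering, or None if the phrase was never seen."""
    if phrase not in table:
        return None
    return table[phrase].most_common(1)[0][0]


table = build_table(PAIRS)
print(most_likely(table, "prend vie"))
print(most_likely(table, "verdict à chaud"))
```

Nothing in the sketch knows what any phrase means. It picks whatever was most common in the past, which is why a rare phrase, or a familiar phrase in an unfamiliar context, can come out badly.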
When language began, the human lineage could already pay attention, focus on simple topics, and express an attitude in tone of voice. So from the beginning the Google translator likely would have been missing the subtleties of the original speech.