An orangutan gives the thumbs up sign. (Story here.)
In his book Origins of Human Communication, Michael Tomasello argues that language began as gesture alone, that at some point vocalization joined in, and that it eventually took over as the main linguistic “modality,” although gesture has remained important. (This blog has made four posts about Tomasello’s book [#1, #2, #3, #4].) The evidence offered for this scenario is threefold:
First, many ape gestures are individually learned and flexibly used, including in combination, whereas this is not true of ape vocalizations.
Second, many ape gestures are used with attention to the attentional state of the recipient, which is mostly not even relevant in ape vocal communication. …
It is also important evolutionarily that gestural communication is more sophisticated in apes (humans’ closest relatives) than in monkeys and other mammals, whereas something close to the opposite is true of vocal communication. [p. 35]
A paper in the September 11 issue of Cerebral Cortex, however, suggests that Tomasello may have underestimated ape vocalizations.
The paper, “Visualizing Vocal Perception in the Chimpanzee Brain” (abstract here), by Jared P. Taglialatela and others at the Yerkes National Primate Research Center, presents brain-imaging evidence suggesting that chimpanzee vocalizations are more complex than most researchers have supposed. They do not all serve the same general function, nor are they all processed in the same region of a chimpanzee’s brain.
The paper also cites earlier work that challenges the view that ape vocalization has no flexibility. For example, a 2004 paper (abstract here) by Catherine Crockford and others at the Max Planck Institute for Evolutionary Anthropology (where Tomasello works!) reported that chimpanzees in different groups in the wild can vary the sounds of their calls in accordance with group membership:
These results could not be accounted for by genetic or habitat differences, suggesting that the male chimpanzees may be actively modifying the structure of their calls to facilitate group identification.
This finding also reminds us of the limitations of our own perceptions. When we see two groups of apes facing off in the forest and both sides are howling at each other, it sounds to human ears like loud noise. But for the apes the cries may be more informative, creating an immediate impression of which group is more energetically represented in the confrontation.
The central point here is that Tomasello, in his first argument, may have overstated both the inflexibility of ape vocalizations and the absence of learning behind them.
The second point, about attention, is more important in establishing that apes are aware of attention in others. Ape gestures do not use joint attention, and therefore are no more like human speech than are ape vocalizations.
It is the third point, about the backwardness of ape vocalization, that is most directly challenged by Taglialatela’s team. Their investigation of how ape brains process the vocalizations they hear suggests that ape vocalization is more complex than monkey vocalization. They distinguish ‘broadcast’ and ‘proximal’ vocalizations. Broadcast vocalizations are high-pitched calls that, if directed to anyone, appear to be directed to distant apes. Proximal vocalizations are lower-pitched and appear to be directed toward a nearby chimpanzee.
I e-mailed Taglialatela and requested clarification. He writes that broadcast calls are distance calls, what Jane Goodall calls pant hoots. Proximal calls are softer, more locally directed cries. (Taglialatela also generously provided this blog with recordings of different vocalizations. You can download the broadcast cry (BCV) or the proximal call (PRV) to compare them.)
Taglialatela’s team recorded broadcast and proximal vocalizations. For control purposes they also made a reverse playback recording of ape vocalizations. They fed the apes a marker that travels in the blood, played one of the test sounds, and then made a PET scan of the ape’s brain to determine which portions had been stimulated by hearing the different sound types. They then made a two-step comparison.
First, they identified the brain regions active for each type of stimulus and determined which regions were more active when hearing a normal vocalization than when hearing a backward version of the call.
Second, they compared the regions that were more active for broadcast vocalizations with the regions that were more active for proximal vocalizations.
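For readers who like to see the logic laid out, the two-step comparison can be sketched in a few lines of Python. This is only a toy illustration of the reasoning, not the team's actual analysis, which worked on PET images with proper statistics. The region names, the activity numbers, and the function names `more_active_regions` and `compare_call_types` are all invented for the example.

```python
from typing import Dict, Set

# Hypothetical representation: region name -> activity level (arbitrary units).
Activation = Dict[str, float]


def more_active_regions(normal: Activation, control: Activation,
                        threshold: float = 0.0) -> Set[str]:
    """Step 1: regions more active for a normal call than for the reversed control."""
    return {
        region
        for region, level in normal.items()
        if level - control.get(region, 0.0) > threshold
    }


def compare_call_types(broadcast: Activation, proximal: Activation,
                       control: Activation) -> Dict[str, Set[str]]:
    """Step 2: compare the step-1 regions for broadcast calls with those for proximal calls."""
    bcv = more_active_regions(broadcast, control)
    prv = more_active_regions(proximal, control)
    return {
        "broadcast_only": bcv - prv,
        "proximal_only": prv - bcv,
        "shared": bcv & prv,
    }


# Invented numbers, purely to exercise the logic; these are NOT the paper's data.
control = {"left_temporal": 1.0, "right_temporal": 1.0, "frontal": 1.0}
broadcast = {"left_temporal": 1.2, "right_temporal": 1.2, "frontal": 0.9}
proximal = {"left_temporal": 1.0, "right_temporal": 1.5, "frontal": 0.9}

result = compare_call_types(broadcast, proximal, control)
print(result)
```

The reversed-call control does the important work here: a region counts only if it responds more to the real call than to the same sounds played backward, so the second step compares responses to the calls as calls rather than to raw acoustics.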
The authors found that hearing proximal vocalizations produces more right-brain activity than left in the chimpanzees, but no such regional preference occurs when they hear broadcast vocalizations. Previous work with primates, notably macaque monkeys, has not found any regional preferences in the brain (known technically as lateralization), and the authors conclude:
These results suggest that there may be marked differences in the way in which chimpanzees and macaque monkeys perceive and process conspecific vocalizations.
Normally, no one would be astonished to learn that ape brains appear to handle sounds in a way that is somewhat between the way human and monkey brains function, but Tomasello suggests that, regarding vocalization, there has been some devolution between monkey and ape. Presumably, Tomasello’s argument rests on the fact that some monkeys make very specific calls, signaling the approach of a leopard or, with a different cry, a hawk. Folk linguistics can say these calls are monkey talk, but they don’t serve as words in the technical sense used on this blog (i.e., they do not pilot attention). As Tomasello reports:
vervet monkeys quite often persist in giving their alarm calls even when all the individuals of the group are already in some safe position looking at the predator. [pp. 20-21]
It is notable that the regional specialization in the apes is on the right side, whereas a popular idea has proposed that speech is a left-brain activity. I think I have even seen TV commercials that play around with this popular distinction. The authors point out, however, that both sides of the brain are important to handling speech. The two regions are sensitive to different time scales. Thus, the words and phonemes (which come quickly) are processed on one side (the left) while the emotional rhythm is processed on the other (right) side.
The paper’s authors doubt that this change in ape processing is a simple reflection of differences in a vocalization’s acoustical qualities. They note that the control vocalizations (cries played backward) have the same acoustical qualities, only in reverse order, yet they are processed differently. They conclude that “not all [chimpanzee] vocalizations are functionally equivalent, and [the results reported] point to a heretofore unrecognized level of complexity in the chimpanzee vocal repertoire.”
This finding seems to undercut a critical thesis in Tomasello’s new book, but as I have noted before, there is a strange contradiction in the book, and this makes it somewhat immune to such damage. The most exciting part of Tomasello’s account is in his opening chapter, where he describes the “infrastructure” that supports language. This brilliant piece stands unchallenged by Taglialatela’s research. Even the logical scenario Tomasello proposes (the bulk of the book) is undamaged by the Yerkes team, because the logic could just as easily be applied to vocalization as to gesture. It is a little hard to grasp why Tomasello is insistent on his gesture-only theory of beginnings, as nothing essential seems lost if we say that from its beginning straight on to today, the speech triangle has used both vocal and gestural elements.