Yesterday's post looked at theoretical, empirical, and pragmatic reasons for thinking of speech as an expansion of the powers of joint attention rather than as a system for communicating ideas. It was beginning to run long, so I'm continuing with further pragmatic reasons in this post.
Speech Evolution
Part of the mystery of the origins of speech has been the absence of any likely pathway. We could imagine a series of steps (speaking words, then phrases, then sentences), but at no point could we imagine a reason for taking any of them. Joint attention provides a reason.
Joint attention depends on three elements: a speaker, a listener, and a topic. If we think of speech as a means of communicating about a topic, the problem seems to be how to represent the topic so that the listener understands what the speaker says. If we think of it as a system of joint attention, however, the first step is something else. We need listeners who care what is on a speaker’s mind and speakers who care to express their thoughts. In other words, joint attention has to go from being a minor behavior in the ape world to something important in the Homo line. The first step is not intellectual but emotional: creating some kind of bond between potential speakers and listeners so that they will listen when spoken to and speak when they have something to say.
Only after that bond exists can the attention triplet expand to include a topic. The first task is to develop words and phrases that will direct attention to the salient element of a scene. The second is to develop a technique for combining both figure and ground into a sentence so that a listener can attend to a topic’s full gestalt.
Those are the steps to language that naturally flow from considering joint attention as a triplet: (a) existence of joint attention, (b) emotional interest in what the other attends to, (c) directing attention to a topic (figure), and (d) directing attention to figure and ground.
Step a is found in apes today, and presumably existed at the time of the last common ancestor of chimpanzees, bonobos, and humans. Perhaps it expanded under Australopithecus. Steps b through d can be neatly, and perhaps a little arbitrarily, tied to three stages of Homo evolution: (b) Homo habilis expands emotional relations, (c) Homo erectus draws attention to topics, and (d) Homo heidelbergensis draws attention to figure and ground.
Thus, we have a set of hypotheses about what happened and can start looking for supporting or contradictory evidence.
Syntax
As long as language was considered to be a system for the communication of ideas, syntax appeared to be strangely arbitrary. We’ve probably learned more about the abstract properties of syntax in the past 50 years than Western grammarians had learned in the previous 2,000 years, but this knowledge only deepens the puzzle. Why does it work this way? In terms of speech origins, the question is not just how we evolved syntax, but why we evolved the syntax we have. No obvious answer jumps out, because there is no logical reason to communicate ideas in this way. The syntax of logical languages, such as programming languages, is far less ambiguous and more utilitarian.
These confusions evaporate when we think of speech as a means of joint attention. A sentence can usually be phrased in a variety of ways, but whatever form is used, it must draw a listener’s attention to a figure-ground relationship. I happen to be reading some Elmore Leonard at the moment, so I’ll use the opening sentence of his novel to show what I mean:
Carlos Webster was fifteen the day he witnessed the robbery and killing at Deering’s drugstore.
At a minimum this sentence needs to say, “Carlos Webster witnessed a robbery and killing.” Carlos is the figure, the robbery and killing form the ground, and witnessed links the two. Leonard elaborates on the figure by giving us Carlos’s age and fleshes out the ground by locating the robbery and killing in Deering’s drugstore.
A sentence like this one suggests a variety of hypotheses that can either make sense of syntax or abolish the joint-attention hypothesis and send us all back to the drawing board.
- Sentences consist of minimal units where attention is focused, identifying a figure and a ground. Breaking up these units leads to confusion; e.g., Carlos the witness of a robbery Webster, oh yeah, and of a killing too, was fifteen that day. I can imagine a certain kind of literary writer offering a sentence of this type, but the result will be to force readers to go back and figure out what they are being told. We can call a unit that identifies figure and ground an attentional pair.
- Sentences use another minimal unit (always a verb or verb phrase?) to link the figure and ground. Omitting this unit costs a sentence its coherence; e.g., Carlos Webster was fifteen the day of the robbery and killing at Deering’s drugstore. Again, it is easy to imagine a certain type of literary writer putting out this sentence, leaving it to the reader to assume that eventually some sort of linkage will be provided, but the absence of a linking unit takes the robbery and killing out of the background and makes it a second figure. The sentence works like protolanguage: focus on one figure, Carlos Webster, then on a second figure, the robbery and killing. We can call the unit that links an attentional pair an attentional link.
- Sentences can elaborate on the minimal units, but anything else is a distraction; e.g., Carlos Webster was fifteen and Jake McGraff was twenty-six on the day he (Carlos) witnessed the robbery and killing at Deering’s drugstore. The business about Jake is, in some sense, syntactically correct, but it only muddles the sentence, as the resulting ambiguity of the pronoun shows. A grammarian might allow this sentence, but no editor would.
- Elaborations on part of an attentional pair highlight that unit, while elaborations on the linking unit highlight the whole sentence. In the sample sentence, “was fifteen” elaborates on the figure and “at Deering’s drugstore” elaborates on the ground, but “the day” elaborates on “witnessed” and frames the whole sentence. The coherence that “witnessed” gives the sentence by linking figure and ground is transferred to “the day,” making it plain that the day was a pivotal one in the life of Carlos Webster.
- Syntactic evolution concerned the introduction of attentional units and linking units. Infants today may begin speaking with either unit. Some children concentrate on figure units, saying single nouns like book, car, and dada. Other children favor linking units, speaking words that serve as verbs in infant speech: up (pick me up), go (something goes), and sit (somebody sat down). At this stage they are not linking figure and ground but concentrating on actions rather than things. Chimpanzees can use both kinds of words too, signing banana or tickle (tickle me), so there is no reason to assume that Homo erectus could not use either type. When children begin speaking two-word sentences, they typically use a figure unit and a linking unit, although the ground remains unspoken; e.g., dada up (daddy pick me up). This kind of proto-gestalt appears to be absent from chimpanzee signs and may have taken a long time to appear in the evolution of speech.
If any of these hypotheses stands up to the rigors of investigation, we will indeed be gaining ground.



You are equating syntactic structure with information structure here, but the two must be kept separate, however they may seem to line up in certain languages or constructions.
There is a lot about syntax that has nothing to do with structuring information; the structure of information in a sentence may be mapped onto certain syntactic constructions, but it doesn't require those constructions.
--------------------------------------
BLOGGER: What people choose to study is a question of taste, and my tastes have never run to pure abstractions (e.g., mathematics for mathematics’ sake). What interests me about syntax is the way it solves a problem, rather as the Mercator projection solves a problem. You could study Mercator maps closely and find many things, but the critical thing for me is what the projection maps and what it distorts. You could also ask why a Mercator projection is used instead of some other solution, and find that despite the distortions there are good reasons to sail the seas with such a map. The problem I see syntax solving is how to express a perception in linear form. The great thing about a blog, however, is the way other people with other interests can have their say as well.
Posted by: TLTB | April 04, 2007 at 01:38 PM