A caveat: equating gesture with hand signs sets up a false dichotomy that is potentially very misleading. Formal signed languages crucially use orofacial and body movements, and Taglialatela discusses orofacial gestures throughout the paper.

The important distinction is not between hands and voice, but between visual or auditory productions that are organized linguistically and those of either type that are not.

ASL, for example, uses the lips and tongue as articulators, both syntactically and lexically. Obviously, the same lips and tongue also produce random visual images and random noises, as well as linguistically organized sounds.

The main argument for the first language having been a sign language comes from the fact that chimpanzees make voluntary gestures but do not make voluntary vocalizations.

Is that your own assessment, or if not, whose is it? My ranking of arguments would look more like this:
(1) The massively greater iconicity in existing signed languages (not merely for concrete nouns and verbs, but for things like transitivity, verbal aspect, mental verbs, and about a bazillion others) shows that the manuofacial-visual channel is well suited for a gradual transition from simple and very iconic constructions, through various intermediate stages, through to more complex and arbitrary-seeming symbolic constructions. (Contemporary signed languages are of course massively more complex than a visual protolanguage could have been, a point which is in no way denied by pointing out that they are far more iconic than spoken languages. Languages in both channels exploit iconicity when they can, but it's available for a much broader range of meanings in the visual channel.)

(2) The very fact of native, unschooled visual language acquisition, following an even earlier time course than auditory language acquisition, strongly suggests that at some point the underlying perceptual and cognitive mechanisms involved were relevant to the whole species, not just to deaf people.

(3) The famous, substantial deficiencies of the human laryngeal tract are more understandable if there was already a really great cognitive system in place that precise speech sounds could tap into, than if the gradually increasing efficiency of spoken language had to overcome increasing selective obstacles at each step of the way. (If a rich symbolic manual-visual language were already in place, I take it that the evolutionary advantages of switching to speech in the ancestral environment are too obvious to mention.)

(4) Research on the effects of motion or visualization tasks on the choice of spoken-language constructions.

(5) The supposed involuntariness of ape vocalizations.

I suppose one's ranking has almost everything to do with what one considers to be the most challenging stage in the development of language. For me, it's mostly about complex, schematic constructions, many of which may be used to construe the same objective scene in very different ways. That's where I think the iconicity of the visual channel offers crucial bootstrapping. Voluntary calls could have existed in the hominid line for a long time, but so what? The gestural origin story would be that the calls remained mostly atomic and specific until the visual channel led to proto-syntax.

How does one determine whether the behaviors or sets of behavior of an animal, including human beings, are voluntary or involuntary? Or to put it another way, under what conditions do we say that a certain behavior is involuntary and under what conditions do we say that a certain behavior is voluntary? Also is it possible that some behaviors are voluntary and involuntary at the same time? If so, then under what conditions does this happen? I would prefer that these questions be discussed in a scientific, empirical framework, rather than a philosophical or metaphysical framework, unless those frameworks are empirically grounded.

To get things started, it might be worth listing some definitions of voluntary and involuntary. These are from the Merriam-Webster Dictionary online:

Definition of VOLUNTARY
1: proceeding from the will or from one's own choice or consent
2: unconstrained by interference : self-determining
3: done by design or intention : intentional
4: of, relating to, subject to, or regulated by the will
5: having power of free choice
6: provided or supported by voluntary action
7: acting or done of one's own free will without valuable consideration or legal obligation
Definition of INVOLUNTARY
1: done contrary to or without choice
2: compulsory
3: not subject to control of the will : reflex

The third definition of involuntary seems appropriate as a scientific definition of involuntary behavior. Psychologists have studied reflex behavior systematically since Pavlov. None of the definitions of voluntary, however, seems to be a good candidate for a scientific definition. They all seem to carry very subjective connotations. Certainly "free will" or "free choice" doesn't seem particularly helpful. Are they even scientific concepts? I know they are used a lot in reference to human beings, but are they applicable to the behavior of other animals?

I also checked some recent books on animal behavior and didn't even find the terms voluntary and involuntary, or even volitional, listed in their indices. The Dictionary of Ethology and Animal Learning did have an entry for voluntary behavior, but its discussion was about the differences in how behavioral scientists use the term.

So when we say that the action of a human being or other animal is voluntary, what do we mean? It seems to me that there is little point in arguing, in terms of voluntary and involuntary behavior, about the origin of language and about why other animals seem to lack language as we usually think of it, unless we agree to use these terms in the same scientifically sound way. I have my own view on what is called voluntary behavior, but I doubt that others on this list would necessarily agree with it.

