Wednesday 09/17

VOICE: NaturallySpeaking

by Tyler Coburn, tagged with VOICE, R&D Season
Cover Image:

Tyler Coburn, NaturallySpeaking, 2013–14 (still). Single-channel video with voiceover by Susan Bennett, the original voice of Siri; depicts Pantagruel’s ship thawing; 25 min

“The more you speak COMMA the better it learns to listen PERIOD”

The following VOICE contribution, NaturallySpeaking (2013–14) by artist and writer Tyler Coburn, takes the form of an experimental essay and was first commissioned and published in You Are Here: Art After the Internet (Cornerhouse Books, 2014). Coburn’s work often makes visible contemporary omniscient technologies and their effects on labor, communication, and language. In this piece, he interpolates the training script of a popular speech recognition program with various accounts of how technology has shaped human speech over time. At issue are the means by which certain forms of speech become dominant in different historical periods, overshadowing nonlinguistic, unintelligible, and affective communication. By denaturalizing these speech forms, Coburn reveals how notions of progress and efficiency inform their operations.

The text begins with the basics of how speech recognition software “learns” speech (through an algorithmic feedback system that depends on even pacing and tone) and sets them alongside the stages of childhood speech development, from a baby’s nonlinguistic utterances to babble to mimicry. Coburn then interrogates the difference between noise and speech (and what is lost in the filter), as well as between seeing and hearing (the hierarchy of the senses), while questioning how users assume increasingly oral relationships to technology.

As subjectification and communication are increasingly influenced by how we are rendered communicable to machines, NaturallySpeaking resuscitates various proverbial ghosts in the machine to examine this dynamic. One reference is to Thomas Edison, “the man who made a prisoner of echo,” who was memorably fictionalized in Auguste Villiers de L’Isle-Adam’s nineteenth-century novel L’Ève future [Tomorrow’s Eve] (1886). The author plotted Edison’s attempts to make his phonograph a device through which every sound in the history of the world might be heard again. Another reference is to a scene in Gargantua and Pantagruel, a sixteenth-century satirical novel by François Rabelais. As Pantagruel’s ship moves through a wintry climate, its crew encounters a landscape of frozen words and sounds. The air gradually warms, and the sounds of a battle from the previous year thaw to clamorous effect.

NaturallySpeaking was presented earlier in 2014 as an installation for “La Voix Humaine” at the Kunstverein Munich, where it consisted of a midcentury daybed and two computer monitors: one playing Coburn’s manipulated software training script and the other a screensaver of a melting ice ship. Following that, it screened as a single-channel video with voiceover by Susan Bennett, the original voice of Apple’s Siri, in “Art Post-Internet” at the Ullens Center for Contemporary Art, Beijing.

We would like you to read aloud for a few minutes

while the computer listens to you and learns how you speak PERIOD

When you have finished reading COMMA we’ll make some adjustments COMMA

and then you will be able to talk to your computer and see the words appear on your screen PERIOD

In the meantime COMMA we would like to explain why talking to a computer is not the same as talking to a person

and then give you a few tips about how to speak when dictating PERIOD


Understanding spoken language is something that people often take for granted PERIOD

Most of us develop the ability to recognize speech when we’re very young PERIOD

As infants COMMA we are experts at babble COMMA making noises unknown to any language PERIOD It is said that even a polyglot couldn’t approximate the articulations of a baby EXCLAMATION MARK

When learning to speak COMMA children continue to prattle and mimic the noises around them PERIOD The clang of a trolley and the buzz of a bee seem no different than the teaching voice PERIOD In becoming experts at intelligible speech COMMA however COMMA they gradually lose their capacity to babble PERIOD

This scenario demonstrates that there are two voices COMMA not one PERIOD There is the voice of logos and the voice of alterity COLON the acoustic mirror that initiates self-recognition COMMA and the medium that penetrates COMMA exposes COMMA and binds us together PERIOD The ears COMMA after all COMMA have no lids PERIOD


The first challenge in speech recognition is to identify what is speech and what is just noise PERIOD People can filter out noise fairly easily COMMA which lets us talk to each other almost anywhere PERIOD We have conversations in busy train stations COMMA across the dance floor COMMA and in crowded restaurants PERIOD It would be very dull if we had to sit in a quiet room every time we wanted to talk to one another EXCLAMATION MARK Unfortunately COMMA a quiet room is the optimal setting to talk to your computer PERIOD It is also the optimal setting to talk to many other machines PERIOD

In recording singers for his phonograph COMMA for example COMMA Thomas Edison tried to suppress the squeaking of flute keys COMMA the thumping of piano felts COMMA the turning of pages COMMA and especially breathing PERIOD Other sonic imperfections once attributed to the recording apparatus were actually caused COMMA the inventor argued COMMA by the human voice PERIOD It should come as no surprise that Edison was called OPEN QUOTE the man who made a prisoner of echo PERIOD CLOSE QUOTE

The phonograph could not heal the deficiencies of the body COMMA but it did improve them in significant ways PERIOD The nearly deaf Edison claimed to hear many things through his machine COMMA though lamented its inability to sound the depths of history PERIOD OPEN QUOTE Dead voices COMMA lost sounds COMMA forgotten noises COMMA vibrations lockstepping into the abyss COMMA CLOSE QUOTE he wrote COMMA were OPEN QUOTE too distant ever to be recaptured PERIOD CLOSE QUOTE

We don’t expect everyone to share his pessimism PERIOD In fact COMMA you may favor Petron’s belief that we live in one of many worlds COMMA which intersect in an equilateral triangle PERIOD The touching point is called OPEN QUOTE the dwelling of truth COMMA CLOSE QUOTE filled with words COMMA ideas COMMA copies COMMA and images of all things past and all to come PERIOD

The vibrations of a word may bring entire universes into existence PERIOD With a few sentences COMMA Agathos once birthed blazing spheres COMMA brilliant flowers COMMA and the oceans and volcanoes of wild stars PERIOD

A year before Edison invented the phonograph COMMA Florence McLandburgh wrote a story about a great OPEN QUOTE Ear of the World CLOSE QUOTE capable of hearing every past vibration PERIOD So secretive was the inventor of this marvelous device COMMA however COMMA that he solicited a mute COMMA illiterate woman to verify its workings PERIOD And when the woman was carried off in aural reverie DASH while carrying off the machine itself DASH the inventor took her life PERIOD

The device continued to operate after this incident COMMA indiscriminately transmitting the Alpine shepherd COMMA the organ fugue COMMA and the cries of the dying woman PERIOD Try as he might COMMA the inventor couldn’t filter out her death rattle COMMA turning his beloved machine into a worthless object DASH an OPEN QUOTE absolute horror PERIOD CLOSE QUOTE


Unlike people COMMA computers need help separating speech sounds from other sounds PERIOD When you speak to a computer COMMA you should be in a place without too much noise PERIOD Then COMMA you must speak clearly into a microphone that has been placed in the right position PERIOD If you do this COMMA the computer will hear you just fine COMMA and not get confused by the other noises around you PERIOD

A second challenge is to recognize speech from more than one speaker PERIOD People do this very naturally PERIOD We have no problem chatting one moment with Aunt Grace COMMA who has a high COMMA thin voice COMMA and the next moment with Cousin Paul COMMA who has a voice like a foghorn PERIOD People easily adjust to the unique characteristics of every voice PERIOD

One of the ways we can identify speakers is by following the movements of their lips PERIOD If Uncle Phil opens his mouth and a foghorn sounds COMMA we can reasonably assume that the foghorn is coming from him PERIOD This is as much as we can assume COMMA however COMMA for we can’t see the true origin of his voice PERIOD Speech COMMA in this sense COMMA is ventriloquial PERIOD Sound unseen is acousmatic PERIOD

In the sixth century BC COMMA Pythagoras was known to give acousmatic lectures from behind a veil to students sitting in the dark PERIOD We would say that a veil separates you from your computer PERIOD Darkness also divides you PERIOD


When people first start using speech recognition software COMMA they might be surprised that the computer makes mistakes PERIOD Maybe unconsciously we compare the computer to another person PERIOD But the computer is not like a person PERIOD What the computer does when it listens to speech is different from what a person does PERIOD

The computer has difficulty COMMA for example COMMA distinguishing between two or more phrases that sound alike PERIOD People use common sense and context DASH knowledge of the topic being talked about DASH to decide whether a speaker said OPEN QUOTE ice cream CLOSE QUOTE or OPEN QUOTE I scream PERIOD CLOSE QUOTE Unfortunately COMMA the computer can’t use common sense the way people do PERIOD

People might also be surprised that the computer has difficulty distinguishing between two or more words that sound alike PERIOD We say OPEN QUOTE acousmatic COMMA CLOSE QUOTE and the computer hears OPEN QUOTE acousmate PERIOD CLOSE QUOTE The difference is not insignificant PERIOD

OPEN QUOTE Acousmate CLOSE QUOTE first appeared in an article from 1730 describing a strange event in the parish of Ansacq PERIOD One night COMMA the air filled with a multitude of OPEN QUOTE human voices of different sounds COMMA sizes and brightness COMMA of all ages COMMA of all sexes COMMA speaking and crying all at once PERIOD CLOSE QUOTE

Several causes were proposed over the ensuing months PERIOD Natural scientists attributed the event to air masses striking the uneven landscape COMMA while the god fearing detected the work of demon spirits PERIOD Some parishioners even suspected that a ventriloquist had played an elaborate joke COMMA for the racket ended in OPEN QUOTE peals of delicate laughter COMMA as if there had been three or four hundred people who began to laugh with all their force PERIOD CLOSE QUOTE Certainly COMMA it would not be the last time that a multitude of human voices irrupted in the ether PERIOD


Speech recognition software works best when the computer has a chance to adjust to each new speaker PERIOD The process of teaching the computer to recognize your voice is called OPEN QUOTE training COMMA CLOSE QUOTE and it’s what you are doing now COLON You are Pythagoras COMMA and the computer is your pupil PERIOD The more you speak COMMA the better it learns to listen PERIOD

The computer will keep track of how frequently words occur by themselves and in the context of the other words PERIOD This information helps the computer choose the most likely word or phrase among several possibilities PERIOD Its method COMMA known as brute force computing COMMA relies on statistical learning algorithms to construct models from your data PERIOD Massive amounts of unintelligent computation COMMA in short COMMA gauge the probability of the sound samples we know as words PERIOD

Brute force computing was an unlikely child of the 1960s PERIOD At the time COMMA scientists still dreamed of true artificial intelligence that could learn and understand human languages PERIOD A mechanical model of the ear and vocal tract COMMA they proposed COMMA would someday perform to the measure of their biological counterparts PERIOD Until technology advances to match this dream COMMA however COMMA speech recognition software can ignorantly but competently listen PERIOD


The training process takes only a few minutes for most people PERIOD If COMMA after you begin using the program COMMA you find that the computer is making more mistakes than you expect COMMA use the tools provided in the TOOLS menu to improve the recognition accuracy PERIOD

Additionally COMMA people sometimes mumble COMMA slur their words COMMA or leave words out altogether PERIOD They assume COMMA usually correctly COMMA that their listeners will be able to fill in the gaps PERIOD Unfortunately COMMA computers won’t understand mumbled speech or missing words PERIOD They only understand what was actually spoken and don’t know enough to fill in the gaps by guessing what was meant PERIOD

In some cases COMMA what was spoken may not be heard PERIOD Chilling temperatures COMMA for example COMMA can freeze language PERIOD If you experience this problem COMMA try adjusting the room temperature or warming the words in your hands PERIOD As the ice melts COMMA you will hear what you said COMMA and the computer will too PERIOD

The last time words froze COMMA computers were still called people PERIOD Journeying through the northern climes of Nova Zembla COMMA Sir John Mandeville’s crew fell prey to this silent spectacle COMMA OPEN QUOTE nodding and gaping at one another COMMA every man talking and no man heard PERIOD CLOSE QUOTE Three weeks elapsed until a turn of wind warmed the air PERIOD If a modern computer were aboard the ship COMMA it would have transcribed the crackling of consonants COMMA the lovelorn sighs of lonesome sailors COMMA and the tardy epilogue of a bear PERIOD

If a computer accompanied Pantagruel through the Frozen Sea COMMA it would have run aground a land of prattle COMMA ignorantly COMMA competently COMMA and indiscriminately recording the thawing din of a great battle COLON OPEN QUOTE hin COMMA hin COMMA hin COMMA hin COMMA his COMMA tick COMMA tock COMMA taack COMMA brcdelin HYPHEN brededack COMMA frr COMMA frr COMMA frr COMMA bou COMMA bou COMMA bou COMMA bou COMMA bon COMMA bou COMMA track COMMA track COMMA trr COMMA trr COMMA trr COMMA trrr COMMA trrrrr on COMMA on COMMA on COMMA on COMMA on COMMA ououououon COMMA gog COMMA magog PERIOD CLOSE QUOTE

And if a computer were a young student of Plato COMMA then it would live life as a very long winter COMMA at the end of which COMMA old and obsolete COMMA it could finally warm to his teachings PERIOD

But a computer had none of these experiences PERIOD Computers are none of these things PERIOD


To understand what it means to speak both clearly and naturally COMMA listen to the way newscasters read the news PERIOD If you copy this style when you dictate COMMA the program should successfully recognize what you say PERIOD

One of the most effective ways to make speech recognition work better is to practice speaking clearly and evenly when you dictate PERIOD Try thinking about what you want to say before you start to speak PERIOD This will help you speak in longer COMMA more natural phrases PERIOD

Speak at your normal pace without slowing down PERIOD When another person is having trouble understanding you COMMA speaking more slowly usually helps PERIOD It doesn’t help COMMA however COMMA to speak at an unnatural pace when you are talking to a computer PERIOD This is because the program listens for predictable sound patterns when matching sounds to words PERIOD If you speak in syllables COMMA each syllable is likely to be transcribed as a separate word PERIOD

With a little practice COMMA you will develop the habit of dictating in a clear COMMA steady voice COMMA and the computer will understand you better PERIOD


When you read this training text COMMA the program adapts to the pitch and volume of your voice PERIOD For this reason COMMA when you dictate COMMA you should continue to speak at the pitch and volume you are speaking with right now PERIOD If you shout or whisper when you dictate COMMA the program won’t understand you as well PERIOD

With a shout or a whisper COMMA the program comes undone PERIOD Semes give way to intensities of force that expose COMMA penetrate COMMA and bind us together PERIOD So important are these aspects of human communication that Daniel Heller-Roazen can imagine OPEN QUOTE the primary form of human speech to be not a statement COMMA a question COMMA but an exclamation PERIOD CLOSE QUOTE Language is most itself COMMA he claims COMMA when it leaves OPEN QUOTE the terrain of its sound and sense COMMA CLOSE QUOTE opening itself to the surrounding babble PERIOD

This proposition runs counter to what most people learn COMMA so take a moment to consider its ramifications PERIOD If Aristotle felt compelled to exclude prayers and cries from the realm of logic COMMA for example COMMA then he must have sensed that there was something dangerous DASH even radical DASH about affect PERIOD PERIOD PERIOD

If in the beginning there was the exclamation COMMA what follows would be a history of the limit COLON the OPEN QUOTE murky speech COMMA CLOSE QUOTE Dina Al-Kassim writes COMMA that sometimes gathers itself into a counterdiscourse PERIOD This unsovereign COMMA unintelligible speech fills the mouths of ranters COMMA noisemakers COMMA and dissenters PERIOD It has failed to father a lineage COMMA though in different ages and for different peoples COMMA irrupts nonetheless PERIOD


We are not obligated to train our software COMMA though doing so can remind us of the norms we are dictated to keep PERIOD One of the first speech recognizers was a dog named Rex PERIOD Created in the 1920s COMMA he responded not only to his master’s voice COMMA but to any speaker who called his name at a prescribed frequency PERIOD Speech recognition software has remained on a tight leash to this day PERIOD It will not let you be a noisemaker COMMA but if you speak clearly at a normal pace COMMA it will understand and obey PERIOD

The true origin of the voice is hidden from view PERIOD Speech COMMA in this sense COMMA is acousmatic PERIOD Michel Chion has described how COMMA in the passage from acousmetre to acousmachine COMMA the image OPEN QUOTE peels off CLOSE QUOTE the person COLON A living person dies so that OPEN QUOTE the image that is pure mechanical recording may live PERIOD CLOSE QUOTE The computer may be less menacing than the acousmachine or the phonograph COMMA yet it takes something from us all the same PERIOD We are not all born to be newscasters PERIOD Something must be peeled off PERIOD

Who is the subject who is supposed to speak to the computer QUESTION MARK We know where she must place her microphone PERIOD We know how she must speak PERIOD A lingua franca COMMA writes Édouard Glissant COMMA OPEN QUOTE is always apoetical PERIOD CLOSE QUOTE The subject supposed to speak to the computer may be as well PERIOD


There are at least two voices COMMA not one PERIOD If we are citizens of the Monoglot Millennium COMMA we are also witnesses to The Great Thaw PERIOD The planet warms with a crackling of consonants COMMA and a multitude of voices irrupts in the air PERIOD Words melt in the palms of our hands COMMA as phrases never known and thus never forgotten ride the updrafts COMMA vibrating new worlds into existence PERIOD Even our machines are no longer silent scribes PERIOD

The multitude never begins nor ends COMMA and we enjoy losing our voice among the others COMMA though sometimes enjoy it less PERIOD At last COMMA dead noises can climb out of the abyss COLON the Laugh of the Augurs and the Song of the Swan play as if on the very chord of our being DASH intimate COMMA impersonal sounds PERIOD

Over time COMMA prattle once known and forcibly forgotten may also begin to melt PERIOD We will hear the echoes of unbounded babble PERIOD We may slowly unlearn to speak PERIOD

We hope you have enjoyed reading about the different ways that people and computers recognize spoken language as well as some tips for effective dictating PERIOD



The following are some of the source materials Coburn used for NaturallySpeaking.

MacSpeech. MacSpeech Dictate International (Version 1.5.9) [Computer program]. 2010.

Addison, Joseph. “No. 254. Thursday, November 23. 1710.” In The Tatler: By the Right Honourable Joseph Addison, Esq;. Ann Arbor, MI: University of Michigan Library, 2007: 220–24.

Al-Kassim, Dina. On Pain of Speech: Fantasies of the First Order and the Literary Rant. Berkeley, CA: University of California Press, 2010.

Chion, Michel. La voix au cinéma [The voice in cinema]. Paris: Cahiers du Cinéma, Editions de l’Etoile, 1982.

de Certeau, Michel. “Utopies vocales: glossolalies” [Vocal utopias: glossolalias]. Traverses 20 (November 1980): 26–37.

Dolar, Mladen. A Voice and Nothing More. Cambridge, MA: MIT Press, 2006.

Glissant, Édouard. “Transparency and Opacity.” In Poetics of Relation. Trans. Betsy Wing. Ann Arbor, MI: University of Michigan Press, 1997: 111–20.

Heller-Roazen, Daniel. Echolalias: On the Forgetting of Language. Brooklyn, NY: Zone Books, 2008.

Kahn, Douglas. Noise, Water, Meat: A History of Sound in the Arts. Cambridge, MA: MIT Press, 2001.

Kane, Brian. “Acousmate: History and de-visualised sound in the Schaefferian tradition.” Organised Sound 17, no. 2 (Fall 2012): 179–88.

McLandburgh, Florence. “The Automaton Ear.” In The Automaton Ear and Other Sketches. Chicago, IL: Jansen, McClurg and Co., 1876: 7–43.

Pieraccini, Roberto. The Voice in the Machine: Building Computers That Understand Speech. Cambridge, MA: MIT Press, 2012.

Poe, Edgar Allan. “The Power of Words.” In Thomas Ollive Mabbott, ed., The Collected Works of Edgar Allan Poe — Vol. III: Tales and Sketches 1843–1849. Cambridge, MA: Harvard University Press, 1978: 1210–17.

Rabelais, François. Gargantua and Pantagruel. c. 1532–64. Reprint: Trans. Burton Raffel. New York: W. W. Norton & Company, 1991.

Weiss, Allen S. “Narcissistic Machines and Erotic Prostheses.” In Richard Allen and Malcolm Turvey, eds., Camera Obscura, Camera Lucida: Essays in Honor of Annette Michelson. Amsterdam: Amsterdam University Press, 2003: 51–74.
