18 June 2012

Transcription: From Man or Machine?

One of the most common questions about the closed captioning process concerns transcription.  Many people ask whether transcription is an automated process using voice recognition, or whether it is still a job done best by humans.

There is certainly some great software and equipment available today that is taking voice recognition to a whole new level.  The most effective voice recognition, though, is trained and tailored to one specific speaker, with his or her own unique speaking style and dialect.

This can be a problem because television shows feature people from different locations and backgrounds, each with a different speaking style.  In fact, the variety of dialects in the United States has been featured in recent years on television programs such as the PBS special Do You Speak American? and an episode of How the States Got Their Shapes on History.  These programs demonstrate the breadth of American English, including differences among at least 24 regional accents and the different meanings that certain words hold in different parts of the country.

If that doesn’t make our language complicated enough, consider the results of a recent collaborative effort by Google and Harvard University.  The study found that the English language has doubled in size in the last century and continues to expand by 8,500 new words every year.  Constantly evolving vernaculars and sharp growth in vocabulary can certainly make it difficult to keep track of what is being said.

Think this isn’t an issue for computer software?  Run spell check the next time you type a document.  Product names, religious terms, and new words buzzing through popular culture are all examples of what may come up as unrecognized.  Voice recognition will likely misinterpret or skip over these same types of words.  Of course, the Internet will make it easier to update computers and software more frequently in the future, but this is definitely an obstacle for voice recognition at present.

While human transcriptionists may have to make an extra effort to decipher unfamiliar dialects or learn new terms, they still hold many advantages over computers.  The best voice recognition is trained to the pattern and speaking style of a specific voice, whereas most humans have experience listening to and interacting with people who speak in various dialects.  Humans also benefit from collective memory, which means that new terms and phrases can be communicated and remembered very quickly across a culture.

Voice recognition and other new technologies are certainly playing a large role in making media more accessible to all users.  For now, though, it is still preferable for a transcript used in closed captioning to be created by a professional human transcriptionist.
