How to Convert Audio Files Into Text

Whether you are transcribing a legal deposition or an interview for a newspaper, converting spoken-word audio into a text document is an age-old problem. Transcription services are expensive and not always prompt. Audio-to-text conversion software often produces documents so inaccurate that correcting them takes longer than transcribing manually. Though there is no magic bullet, a little technical savvy can streamline, if not altogether eliminate, the labor-intensive aspects of transcription.

Step

Acquire audio editing software. Pro Tools and GoldWave are two popular applications, but plenty of shareware and freeware applications will do the trick.

Step

Open your audio file in your chosen editing software. The application displays the file as a visual waveform, which lets you “scrub” back and forth across the recording without losing your place. If you cannot open the file, you may need audio conversion software to convert it into a format the application recognizes. Consult the link in the Resources section if you need assistance with this issue.
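If a conversion is needed, one possible route is the free ffmpeg command-line tool. This guide does not name a specific converter, so treat ffmpeg as an assumption. The Python sketch below builds the conversion command and runs it only if ffmpeg is installed. The function names and the example file name are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, target_ext=".wav"):
    """Return an ffmpeg command that converts `source` to `target_ext`."""
    src = Path(source)
    dst = src.with_suffix(target_ext)
    # "-i" names the input file; ffmpeg infers the output format
    # from the extension of the final argument.
    return ["ffmpeg", "-i", str(src), str(dst)]


def convert(source, target_ext=".wav"):
    """Run the conversion if ffmpeg is available.

    Returns the output path, or None when ffmpeg is not installed.
    """
    if shutil.which("ffmpeg") is None:
        return None
    command = build_convert_command(source, target_ext)
    subprocess.run(command, check=True)
    return command[-1]


if __name__ == "__main__":
    # Hypothetical file name, used only to show the generated command.
    print(" ".join(build_convert_command("interview.m4a")))
```

WAV is used as the default target here because most audio editors open it, but any extension your editor supports should work the same way.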

Step

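Acquire a computer headset with a microphone and plug it into the appropriate audio jack of your computer. You will be dictating in the steps that follow, so also make sure a voice-recognition application is installed and open.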

Step

With the audio file open in your editing software, play a few seconds of the recording and memorize the words.
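If you would rather work from short standalone clips than scrub through one long file, the idea of "a few seconds at a time" can be sketched in Python with the standard wave module. This is only an illustration, assuming an uncompressed WAV file; the function name and file names are hypothetical.

```python
import wave


def extract_clip(source_path, clip_path, start_seconds, length_seconds):
    """Copy a short stretch of a WAV file into a new WAV file.

    Returns the length of the clip actually written, in seconds.
    """
    with wave.open(source_path, "rb") as source:
        rate = source.getframerate()
        channels = source.getnchannels()
        sample_width = source.getsampwidth()
        # Clamp the start so a request past the end yields an empty clip.
        start_frame = min(int(start_seconds * rate), source.getnframes())
        source.setpos(start_frame)
        # readframes returns fewer frames if the clip runs past the end.
        frames = source.readframes(int(length_seconds * rate))

    with wave.open(clip_path, "wb") as clip:
        clip.setnchannels(channels)
        clip.setsampwidth(sample_width)
        clip.setframerate(rate)
        clip.writeframes(frames)

    return len(frames) / (channels * sample_width) / rate


if __name__ == "__main__":
    # Hypothetical file names: save seconds 20 through 28 as a separate clip.
    print(extract_clip("interview.wav", "clip.wav", 20, 8))
```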

Step

Repeat the words into your voice-recognition application. At first, the software may not transcribe your speech with complete accuracy. Don’t panic. Voice-recognition applications “learn” your speech habits over time until they transcribe correctly about 99 percent of the time. Consult the instructions for your chosen application to find out how to train it to recognize your voice.
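To see how close your application is to that level of accuracy, you can compare a passage you dictated against what the software typed. The Python sketch below is one rough way to do this, counting word-level edits; it is not how any particular voice-recognition product measures itself. The function names are hypothetical, and punctuation is not stripped, so remove it first for a fairer comparison.

```python
def word_errors(reference, hypothesis):
    """Count the word-level edits needed to turn `hypothesis` into `reference`."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Edit distance by dynamic programming, computed over whole words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # word missing from the transcript
                current[j - 1] + 1,      # extra word in the transcript
                previous[j - 1] + cost,  # matching or substituted word
            ))
        previous = current
    return previous[-1]


def word_accuracy(reference, hypothesis):
    """Return the share of reference words transcribed correctly (0.0 to 1.0)."""
    ref_len = len(reference.split())
    if ref_len == 0:
        return 1.0 if not hypothesis.split() else 0.0
    return max(0.0, 1.0 - word_errors(reference, hypothesis) / ref_len)


if __name__ == "__main__":
    spoken = "the witness left the building at noon"
    typed = "the witness left the building a noon"
    print(f"{word_accuracy(spoken, typed):.0%}")
```

Running the check on the same passage every few sessions gives a simple view of whether the software is improving as it adapts to your voice.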

Step

Repeat steps 4 and 5 (play and memorize a passage, then dictate it) until you finish your transcription. Try to gauge how many seconds of spoken-word audio you can memorize at a time. The more speech you can memorize, the faster you can transcribe, but don’t get impatient. Unless you have a photographic memory, attempting more than 30 seconds at a time may result in mistakes or omissions.
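To estimate how many passes a recording will take, the Python sketch below divides a recording into playback segments and caps each one at the 30-second limit suggested above. The function names and default values are hypothetical; adjust the segment length to whatever you can comfortably memorize.

```python
def plan_segments(total_seconds, chunk_seconds=10, max_chunk=30):
    """Split a recording into (start, end) segments, in seconds."""
    chunk = min(chunk_seconds, max_chunk)  # never exceed the cap
    if chunk <= 0:
        raise ValueError("chunk_seconds must be positive")
    segments = []
    start = 0
    while start < total_seconds:
        end = min(start + chunk, total_seconds)  # last segment may be shorter
        segments.append((start, end))
        start = end
    return segments


def as_timestamp(seconds):
    """Format a number of seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    # A 10-minute interview, memorized 15 seconds at a time.
    plan = plan_segments(600, chunk_seconds=15)
    print(len(plan), "passes")
    for start, end in plan[:3]:
        print(as_timestamp(start), "to", as_timestamp(end))
```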