
How to Improve Audio Quality for Transcription: A Complete Guide

15 min read

Audio quality is the single biggest factor that determines transcription accuracy. Even the most advanced speech recognition models, including OpenAI Whisper, produce significantly worse results on noisy, quiet, or distorted recordings. This guide covers concrete steps to record clean audio and prepare your files for transcription.


Why Audio Quality Matters

The relationship between recording quality and transcription accuracy is direct and measurable. The industry standard metric is WER (Word Error Rate): the number of substituted, deleted, and inserted words divided by the number of words in a correct reference transcript, expressed as a percentage.

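Typical WER benchmarks:

  • Quiet room, external microphone 15-30 cm away: 5-8% WER (minimal editing needed)
  • Noisy recordings: 15-25% WER
  • Poor quality (echo, quiet voice): 25-40% WER (extensive manual editing needed)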

The difference between 5% and 25% WER is the difference between "copy and use" and "spend an hour on manual corrections." Investing 10 minutes in recording preparation saves you hours of editing.
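To check where your own recordings fall, you can compare a machine transcript against a short passage you have corrected by hand. The following is a minimal sketch of the standard WER calculation (word-level edit distance divided by reference length). The function name is illustrative, and the lowercase-and-split tokenization is a simplifying assumption: punctuation is not stripped.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "please send the report by friday"
    hypothesis = "please send a report friday"
    print(f"WER: {word_error_rate(reference, hypothesis):.0%}")
```

A minute or two of hand-corrected text is usually enough to tell a 5% recording from a 25% one.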


How to Record Clean Audio

Choosing a Microphone

Your laptop's built-in microphone is the worst option for transcription. It picks up every room sound: keyboard clicks, fan noise, street sounds. Even a budget external microphone will produce dramatically better results.

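USB microphones (for desk recording):

  • Fifine K669 (~$25): budget option
  • Blue Yeti (~$100): top quality

Lavalier microphones (for interviews and conversations):

  • Boya BY-M1 (~$20)

For meetings and group recordings:

  • A speakerphone such as the Jabra Speak 510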

Recording Best Practices

Even with a great microphone, you can get a bad recording if you ignore basic rules.

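Room selection:

  • Choose a quiet room, close the windows, and switch off noisy devices
  • Prefer a room with soft furnishings; empty rooms with bare walls cause echo and reverb

Distance to microphone:

  • 15-30 cm from a desktop microphone, or a lavalier clipped 15-20 cm from the mouth

Volume levels:

  • Keep peaks between -12 and -6 dB and out of the red zone; clipping cannot be fixed after the fact

Recording format:

  • WAV or FLAC; avoid MP3 at 128 kbps and below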

Recording Meetings and Calls

In-person meetings:

Recording Zoom/Teams/Google Meet:

Recording phone calls:


Audio Preprocessing Before Transcription

If the recording is already done and the quality is not ideal, all is not lost. Basic processing can significantly improve transcription results.

Noise Reduction

Audacity (free, Windows/Mac/Linux):

Audacity is the most popular free audio editor. Here is a step-by-step noise reduction guide:

  1. Open your file in Audacity
  2. Find a section where nobody is speaking but background noise is audible (at least 1-2 seconds)
  3. Select that section with your mouse
  4. Menu: Effect → Noise Reduction (grouped under Noise Removal and Repair in recent Audacity versions) → "Get Noise Profile"
  5. Select the entire recording (Ctrl+A / Cmd+A)
  6. Menu: Effect → Noise Reduction → adjust parameters:
    • Noise reduction: 12-18 dB (start at 12, increase if noise persists)
    • Sensitivity: 6-8
    • Frequency smoothing: 3-6
  7. Click "Preview" to check, then "OK"

Adobe Podcast Enhance (free online tool):

Adobe offers a free speech enhancement tool at podcast.adobe.com/enhance. Upload your file — the AI automatically removes noise, adds voice clarity, and normalizes volume. Limit: files up to 1 hour. The results are impressive — often better than manual processing.

FFmpeg (command line):

For those who prefer automation, FFmpeg offers powerful filters. The afftdn filter performs FFT-based noise reduction; its nr option sets the amount of reduction in dB (default 12), and raising it to 30-40 removes noise more aggressively at the cost of more audible artifacts. The silenceremove filter trims long pauses, which saves processing time.
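As a rough sketch of how these two filters can be combined, the helper below builds an FFmpeg command line from Python. The function name and file names are hypothetical. The afftdn options nr and nf and the silenceremove options stop_periods, stop_duration, and stop_threshold are real FFmpeg filter options, but the specific values (1-second pauses, -50 dB threshold) are assumptions to tune by ear.

```python
import shlex


def build_denoise_command(src, dst, noise_reduction_db=12.0,
                          noise_floor_db=-50.0, trim_silence=False):
    """Build an ffmpeg command that denoises src and writes dst."""
    # afftdn accepts nr between 0.01 and 97 dB.
    if not 0.01 <= noise_reduction_db <= 97:
        raise ValueError("noise_reduction_db must be between 0.01 and 97")

    filters = [f"afftdn=nr={noise_reduction_db:g}:nf={noise_floor_db:g}"]
    if trim_silence:
        # Remove every pause longer than 1 s that stays below -50 dB.
        filters.append(
            "silenceremove=stop_periods=-1:stop_duration=1:stop_threshold=-50dB"
        )
    return ["ffmpeg", "-i", src, "-af", ",".join(filters), dst]


if __name__ == "__main__":
    command = build_denoise_command("meeting.wav", "meeting_clean.wav",
                                    noise_reduction_db=30, trim_silence=True)
    # Print the command for inspection; to execute it, pass the list to
    # subprocess.run(command, check=True) on a machine with ffmpeg installed.
    print(shlex.join(command))
```

Note that trimming pauses shifts every later timestamp, so skip trim_silence if the transcript needs to line up with the original recording.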

Volume Normalization

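Normalization applies a single gain to the whole recording so that its loudest peak reaches a target level. Quiet recordings get louder, but the balance between quiet and loud passages stays the same.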

Why it matters:

How to do it in Audacity:

  1. Open your file
  2. Select the entire recording (Ctrl+A / Cmd+A)
  3. Menu: Effect → Normalize
  4. Set peak amplitude to: -1.0 dB
  5. Click "OK"

To even out quiet and loud sections as well, use the Compressor (Effect → Compressor): it reduces the difference between quiet and loud passages without clipping peaks.
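The arithmetic behind peak normalization is small enough to sketch directly. The example below assumes 16-bit PCM samples already loaded into a sequence of integers; the function names are illustrative, and this is not Audacity's implementation. It measures the current peak in dB relative to full scale (dBFS), then applies the one gain that moves that peak to -1.0 dB.

```python
import math
from array import array

FULL_SCALE = 32768  # magnitude of the largest 16-bit PCM sample


def peak_dbfs(samples):
    """Return the peak level in dBFS (0 dBFS = full scale)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")  # pure digital silence
    return 20 * math.log10(peak / FULL_SCALE)


def normalize_peak(samples, target_dbfs=-1.0):
    """Scale all samples by one gain so the peak lands on target_dbfs."""
    current = peak_dbfs(samples)
    if current == float("-inf"):
        return array("h", samples)  # nothing to amplify
    gain = 10 ** ((target_dbfs - current) / 20)
    # Clamp to the 16-bit range to guard against rounding overshoot.
    return array("h", (max(-32768, min(32767, round(s * gain)))
                       for s in samples))


if __name__ == "__main__":
    quiet = array("h", [0, 1000, -2000, 500])
    louder = normalize_peak(quiet)
    print(f"before: {peak_dbfs(quiet):.1f} dBFS, after: {peak_dbfs(louder):.1f} dBFS")
```

Because every sample is multiplied by the same factor, background noise is amplified along with the voice, which is why noise reduction usually comes first.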

Format Conversion

There is an optimal audio format for transcription. Diktovka automatically converts uploaded files, but if you are processing manually, here are the ideal parameters:

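Optimal parameters for transcription:

  • Channels: mono
  • Sample rate: 16 kHz
  • Bit depth: 16-bit
  • Format: WAV

Why mono is better than stereo:

  • Speech recognition models work with a single-channel signal
  • Voice is stronger relative to background noise
  • The file is half the size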

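In Audacity: Tracks → Mix → Mix Stereo Down to Mono. Then set the sample rate to 16000 Hz (Tracks → Resample, or the Project Rate setting; its location depends on the Audacity version). Export: File → Export → WAV, signed 16-bit PCM.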
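The same conversion can be scripted. The sketch below builds an FFmpeg command using the standard -ac (channels), -ar (sample rate), and -c:a pcm_s16le (16-bit PCM) options, and uses Python's built-in wave module to confirm that a WAV file matches the target parameters. The function names and file names are hypothetical.

```python
import wave


def build_convert_command(src, dst):
    """Build an ffmpeg command that writes mono, 16 kHz, 16-bit PCM WAV."""
    return ["ffmpeg", "-i", src,
            "-ac", "1",           # downmix to one channel
            "-ar", "16000",       # resample to 16 kHz
            "-c:a", "pcm_s16le",  # signed 16-bit little-endian PCM
            dst]


def describe_wav(path):
    """Return (channels, sample_rate_hz, bit_depth) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getframerate(), wav.getsampwidth() * 8


def is_transcription_ready(path):
    """True if the file is already mono, 16 kHz, 16-bit."""
    return describe_wav(path) == (1, 16000, 16)


if __name__ == "__main__":
    # Pass the list to subprocess.run(..., check=True) to execute it.
    print(" ".join(build_convert_command("interview.mp3", "interview.wav")))
```

Checking before converting avoids re-encoding files that are already in the target format.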


Common Problems and Solutions

Problem | Cause | Solution
Background noise (hum, hiss) | HVAC, electronics, traffic | Noise reduction in Audacity or Adobe Enhance
Echo and reverb | Empty room, bare walls | De-reverb filter; for future recordings, use a room with soft furnishings
Quiet voice | Too far from microphone | Normalization; when recording, move closer to the mic
Overlapping speakers | People talking simultaneously | Cannot be fully fixed, but diarization in Diktovka helps separate speakers
Background music | Radio, ambient music | Vocal isolation tools (UVR5, Demucs); best solution: turn off music during recording
Pops and clicks | Too close to mic, no pop filter | De-click filter in Audacity; use a pop filter or angle the mic 45 degrees
Distortion (clipping) | Microphone overload | Cannot be fixed after the fact; lower the input level before recording
Phone quality | Compressed voice codec | Normalization + light noise reduction; use VoIP when possible for better quality
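Since clipping cannot be repaired, it helps to detect it in a test recording before the real session. The sketch below is one simple heuristic, not a standard algorithm: it counts runs of consecutive 16-bit samples that sit at full scale. The function name and the min_run threshold of 3 samples are assumptions.

```python
def count_clipped_runs(samples, min_run=3, limit=32767):
    """Count runs of at least min_run consecutive samples at full scale."""
    runs = 0
    current = 0  # length of the full-scale run in progress
    for sample in samples:
        if abs(sample) >= limit:
            current += 1
        else:
            if current >= min_run:
                runs += 1
            current = 0
    if current >= min_run:  # a run that lasts until the final sample
        runs += 1
    return runs


if __name__ == "__main__":
    test_take = [0, 12000, 32767, 32767, 32767, 32767, 9000, -400]
    if count_clipped_runs(test_take) > 0:
        print("Clipping detected: lower the input level and record again.")
```

A single full-scale sample can occur in a clean recording, which is why the check looks for flat runs rather than isolated peaks.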

Diktovka Automatically Optimizes Your Audio

The Diktovka platform automatically performs key preparation steps when you upload a file:

The platform handles even imperfect recordings — phone calls, noisy meeting recordings, voice messages. But the better the source quality, the more accurate the result. Investing 10 minutes in preparation yields a significantly more accurate transcription.


Pre-Recording Checklist

Print this out or save it — check before every important recording:

  1. Microphone is connected and selected as the input device in your system settings
  2. Test recording done — listen to 10 seconds, verify the audio is clean
  3. Room is quiet — windows closed, noisy devices off
  4. Distance to microphone — 15-30 cm (or lavalier clipped 15-20 cm from mouth)
  5. Recording level — peaks between -12 and -6 dB, not hitting the red zone
  6. Recording format — WAV or FLAC (not MP3 128 kbps)
  7. Sufficient disk space (16-bit stereo WAV at 44.1 kHz uses ~10 MB/min; 16-bit mono at 16 kHz uses ~2 MB/min)
  8. Ask participants not to interrupt and to speak clearly
  9. Pop filter in place (for desktop microphones)
  10. Recording is running — sounds obvious, but it gets forgotten more often than you think
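For the disk space item, the size of uncompressed WAV audio follows directly from sample rate, bit depth, and channel count. This small sketch (the function name is illustrative, and the small WAV header is ignored) shows the arithmetic, so you can estimate the space needed for a long session.

```python
def wav_megabytes_per_minute(sample_rate_hz, bit_depth, channels):
    """Approximate uncompressed PCM size in MB (10^6 bytes) per minute."""
    bytes_per_second = sample_rate_hz * (bit_depth // 8) * channels
    return bytes_per_second * 60 / 1_000_000


if __name__ == "__main__":
    cd_quality = wav_megabytes_per_minute(44100, 16, 2)  # about 10.6 MB/min
    speech_mono = wav_megabytes_per_minute(16000, 16, 1)  # about 1.9 MB/min
    print(f"44.1 kHz stereo: {cd_quality:.1f} MB/min")
    print(f"16 kHz mono:     {speech_mono:.1f} MB/min")
    print(f"2-hour meeting, 44.1 kHz stereo: {cd_quality * 120 / 1000:.2f} GB")
```

The roughly 10 MB/min figure therefore applies to CD-quality stereo; a recording already made in mono at 16 kHz needs about a fifth of that.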

Conclusion

Improving audio quality for transcription is not rocket science. A decent microphone for $25-100, a quiet room, and proper recording settings deliver 80% of the result. The remaining 20% is post-processing in Audacity or Adobe Enhance.

Upload your prepared audio to Diktovka — and get a transcription that barely needs editing.

FAQ

What microphone is best for transcription?

For desk recording, a USB microphone works best: the budget Fifine K669 (~$25) or Blue Yeti (~$100) for top quality. For interviews, a lavalier like Boya BY-M1 (~$20). For meetings, a speakerphone like Jabra Speak 510. Even a budget external microphone is dramatically better than a laptop's built-in mic.

How do I remove noise from an audio recording before transcription?

In free Audacity: find a silent section with background noise, select it, apply 'Get Noise Profile', then select the entire recording and run 'Noise Reduction' (12-18 dB). An easier option is Adobe Podcast Enhance (free online tool), which automatically cleans audio using AI.

What is the minimum audio quality needed for good transcription?

For 5-8% WER (minimal editing needed), record in a quiet room with an external microphone 15-30 cm away. Use WAV or MP3 320 kbps format. With noisy recordings, WER rises to 15-25%, and with poor quality (echo, quiet voice) to 25-40%, requiring extensive manual editing.

What audio format is best for transcription?

Optimal settings: mono, 16 kHz, 16-bit WAV. Mono is better than stereo — speech recognition models work with single-channel signal, voice is stronger relative to background noise, and the file is half the size. Avoid MP3 128 kbps and below due to noticeable quality loss.

How can I improve a recording using FFmpeg?

FFmpeg offers the afftdn filter for FFT-based noise reduction. For aggressive noise reduction, raise its nr option (noise reduction in dB, default 12) to 30-40. The silenceremove filter removes long pauses, saving processing time. For optimal format conversion: mono, 16 kHz, 16-bit.