Why Neat Audio Is the Best For Video Transcription
Øystein Birkenes, Dec 5, 2024
The results speak for themselves! We tested the audio capabilities of four competing video devices to determine their compatibility with the transcription features of several leading video platforms. Neat Bar Pro was the clear winner compared to the three competitor devices.
Neat’s mission to perfect natural speech and verbal expression in video meetings is brilliant for both humans and AI. Here’s why:
As Neat’s audio team lead, I can affirm that Neat’s audio vision is simple. As a Neat user, you should hear and understand people on the other end of a video call clearly without effort. You should also be confident they can hear and understand you equally well. It should feel as though you’re physically in the same room. In other words, as natural as possible.
That people should be able to hear each other is perhaps obvious. However, I also want to emphasize the importance of people understanding each other. When we engage in person, we’re not always able to focus 100% on the conversation. Everyone’s brain has limited processing capacity, so when your mind drifts off to thoughts about what you’ll have for dinner, whether your dog is OK being alone in the house, or maybe that last email you received, you have less capacity for mentally grasping what people are saying.
Addressing audio details for more natural-sounding video meetings
In a video meeting, you have extra distractions, like “Can you hear me?” or “What’s that annoying echo?” In addition, your brain uses more processing power because of a lesser degree of “naturalness,” which can have several causes. For example, it could be due to speech processing artifacts in the video device. It could also be because the speech is not coming from the mouth of the person speaking or because of an unnaturally long delay, which makes it harder to have a lively, cohesive, free-flowing conversation.
Conquering echo, noise, and reverberation for greater clarity
The fraction of our brain power left over for understanding speech in a video meeting is smaller than in an in-person meeting. Hence, in addition to providing solutions so people can hear each other well, Neat’s mission is to maximize people’s ability to grasp the moment, at least up to the level we have when we’re physically together. The first step is overcoming fundamental audio challenges, including echo, noise, and reverberation.
Since founding Neat in February 2019, we’ve taken deliberate steps toward ensuring everyone understands every word spoken during a video meeting. Along the way, we’ve suppressed echo to an inaudible level and carefully reduced noise and reverberation without compromising the naturalness of speech. We’re continuously working toward maximizing hearing and understanding while minimizing people’s listening effort.
Achieving a high degree of natural speech for both people and AI to understand
After all, our human neural networks have evolved over millions of years to understand natural speech. Similarly, people typically train artificial neural networks with millions of hours of natural speech. That’s why our goal of achieving a high degree of natural speech in video meetings is equally superb for humans and AI. Maximizing hearing and understanding while minimizing listening effort benefits both.
Hence, Neat Audio works incredibly well with the AI transcription features of many of today’s leading video platforms. Such tools can present information for upcoming meetings, transcribe and caption meetings, and provide recaps if anyone joins a meeting late. After a meeting, they can summarize discussion points, list key actions considered, and answer questions covered. They also offer live translations in numerous languages and enable AI-generated notes and tasks.
When people use these features on Neat devices, the transcripts, captions, and summaries are more accurate because of our innovative audio capabilities.
Making it easy for speech recognition tools to transcribe dialogue accurately
Although the level at which AI can understand human speech is a matter of ongoing research, AI systems must at least be able to quickly and easily transcribe speech into text or interpret spoken language into actionable intents. Superb audio clarity without interference is therefore vital. To put this to the test, we conducted a small experiment with four competing devices to assess the accuracy of the speech-to-text feature on several leading video platforms.
With low ambient noise levels, there were few errors. However, the number of mistakes grew as we increased the ambient noise. The tests corresponded well with our subjective ability to understand speech with minimal effort: the higher the noise level, the more difficult it was to understand what someone was saying. Double-talk (when two or more people talk over each other and the audio becomes jumbled and incomprehensible) with loud output volumes was the most difficult for both human hearing and AI due to the low naturalness caused by aggressive audio processing. In short, like humans, AI gets confused by scrambled dialogue.
Testing four video devices to see which was most compatible with AI
We repeated the same experiment with our competitors’ devices (at the same loudness) to learn whether Neat’s unique audio processing capabilities have also paid off in terms of AI readiness. We happily discovered that Neat Bar Pro was the clear winner among the four video devices we tested, especially regarding double-talk, which all the other devices struggled to handle effectively.
When a device’s double-talk performance is poor, people may not always hear what you’re saying. Or worse, they may not even realize you’re saying anything at all. With most other vendors’ devices, that means you can’t quickly jump into a conversation without your voice being clipped or dampened. Neat devices enable you to share in lively debates without that worry.
The other devices’ transcriptions were often quite funny. For example, using one particular video platform’s transcription feature, when we tested for audio clarity with the spoken line, “The shaky barn fell with a loud crash,” Neat Bar Pro was spot on, and the platform’s transcription feature captioned it perfectly. However, the results from the three competitor devices missed the mark to varying degrees. Competitor A’s device’s audio led to “Fell with a loud crutch.” Meanwhile, competitor B’s audio resulted in “Shaking allowed,” and competitor C’s “A shaky bone fell with a loud crap.” I kid you not!
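One common way to put a number on results like these is the word error rate (WER): the fewest word substitutions, insertions, and deletions needed to turn a transcript into the spoken line, divided by the number of words in the spoken line. The sketch below is only an illustration of that idea. Our experiment counted mistakes informally, so treating WER as the metric is an assumption, and the function names are hypothetical. The sentences are the ones quoted above.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance computed over words instead of characters."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the transcript
                curr[j - 1] + 1,     # extra word in the transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Edit distance between the two sentences divided by the reference length."""
    ref_words = tokenize(reference)
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref_words, tokenize(hypothesis)) / len(ref_words)


# Spoken line and the captions quoted in the article.
REFERENCE = "The shaky barn fell with a loud crash"
TRANSCRIPTS = {
    "Neat Bar Pro": "The shaky barn fell with a loud crash",
    "Competitor A": "Fell with a loud crutch",
    "Competitor B": "Shaking allowed",
    "Competitor C": "A shaky bone fell with a loud crap",
}

if __name__ == "__main__":
    for device, caption in TRANSCRIPTS.items():
        print(f"{device}: WER = {word_error_rate(REFERENCE, caption):.3f}")
```

A lower value means a more faithful transcript, with 0 meaning every word matched. A single sentence is far too little data to rank devices, so a score like this only becomes meaningful over a much larger set of spoken lines.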
Maximizing hearing and understanding while minimizing effort
Our test was just a tiny experiment with fewer than 100 words. Still, it at least indicates that we’re on the right track with Neat’s laser focus on maximizing hearing and understanding while simultaneously minimizing listening effort. David Bowie once said, “The future belongs to those who can hear it coming.”
Sounds like Neat!
Discover our award-winning devices at neat.no, or book a demo to hear how Neat Audio makes video meetings more understandable and AI-compatible.