The Art of Audio

Øystein Birkenes, Sep 30, 2021

How Neat purposefully designs devices with audio as close as possible to natural in-person speech and why we stand out from other video hardware solutions.

As we talked about a while ago in our Design as a Way of Working blog, Neat thrives amid a design-driven culture. We design EVERYTHING with the sole purpose of delivering delightful customer experiences while striving to make video communications more and more true to life. And a vital part of that is audio.

Our approach to designing world-class audio is twofold. First, as a Neat user, you should clearly and easily hear what people say. Second, you should also be confident they can understand you equally well.

Needless to say, millions of years of biological evolution mean that nothing truly beats meeting someone face-to-face. Primarily, that’s because our human auditory system is brilliant at focusing on whoever we’re talking to while ignoring all other noises around us. So Neat’s mission is to get as close as possible to replicating a natural in-person auditory encounter through our video devices.

Eliminating acoustic echoes

In general, hardware providers must overcome several challenges to replicate natural auditory encounters with video devices. Most important is eliminating acoustic echo, where you hear your own voice come back at you after a short delay. Echoes are perfectly natural when you shout towards a mountain or across a valley, but in conversation they’re unnatural, annoying and distracting, even at a low level.

All Neat devices have an excellent echo canceller that suppresses echo without suppressing someone’s speech. Many other echo cancellers do a great job of suppressing echo but unintentionally squash speech too, which typically happens when people talk over each other, a phenomenon known as ‘double-talk.’

When ‘double-talk’ performance is poor, people may not always hear what you’re saying. Or worse, they probably won’t even realize you’re saying anything at all. With most devices, that means you can’t quickly jump into a conversation without your voice being dampened. Neat devices enable you to share in lively debates without that worry.
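
To make the idea of echo cancellation concrete, here is a minimal sketch of a textbook adaptive echo canceller based on the normalized least-mean-squares (NLMS) filter. It is an illustration only and not Neat’s algorithm: the function name, the tap count, the step size and the made-up room response `echo_path` are all assumptions. The filter learns how the loudspeaker signal shows up in the microphone and subtracts that estimate. The sketch does nothing special during double-talk, which is exactly where simple cancellers tend to struggle.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo from the microphone signal."""
    w = [0.0] * taps        # current estimate of the echo path
    history = [0.0] * taps  # most recent loudspeaker (far-end) samples
    out = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(wi * hi for wi, hi in zip(w, history))
        e = d - echo_estimate  # this is what would be sent to the far end
        norm = sum(h * h for h in history) + eps
        # Normalized update: step size scales with recent loudspeaker energy.
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, history)]
        out.append(e)
    return out, w


random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(4000)]
echo_path = [0.0, 0.6, 0.0, -0.3]  # hypothetical room response (assumption)
echo = [
    sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(far_end))
]

# Only the far end is talking here, so the microphone picks up pure echo.
residual, w = nlms_echo_cancel(far_end, echo)

tail_echo = sum(v * v for v in echo[-500:])
tail_residual = sum(v * v for v in residual[-500:])
print(f"echo power {tail_echo:.3g}, residual power {tail_residual:.3g}")
```

In this idealized, noise-free setup the residual shrinks to almost nothing once the filter has adapted. Real rooms are longer, noisier and change over time, and a practical canceller also has to keep near-end speech intact when both sides talk at once.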

Reducing the effect of background noise

Background noises are another issue. When you transmit sound to a far-end participant, you need to compress it by sampling the sound field in time and space. During this process, you lose a lot of information, such as spatial cues (knowing where a sound originates from and where to focus your attention), which our auditory system heavily relies on to hear someone talk while ignoring any peripheral noises.

On the far end of a video call, these noises become more noticeable, which can be extra unsettling, so they need to be removed or suppressed. Other conferencing systems tend to focus on completely erasing background noises, often to the detriment of vocal clarity, leaving speech muffled.

But Neat’s primary objective is striking the right balance between reducing background noises and preserving high-quality speech. That way, although still present, background noises are far less prominent, and people can always easily hear each other.
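
One common way to express this kind of balance, shown here purely as an illustration and not as Neat’s method, is a per-frequency-band gain with a floor. The gain turns down bands dominated by noise, while the floor stops the suppressor from erasing a band entirely, which is one of the things that can make speech sound muffled or unnatural. The function name and the `floor` value are assumptions.

```python
def suppression_gain(signal_power, noise_power, floor=0.1):
    """Wiener-style gain for one frequency band, limited from below by a floor.

    signal_power is the measured power in the band (speech plus noise);
    noise_power is the current estimate of the noise alone.
    """
    if signal_power <= 0.0:
        return floor
    gain = 1.0 - noise_power / signal_power
    return max(floor, gain)


# A band where speech clearly dominates is passed almost untouched.
speech_band = suppression_gain(signal_power=10.0, noise_power=1.0)

# A band that contains only noise is turned down, but only as far as the floor.
noise_band = suppression_gain(signal_power=1.0, noise_power=1.0)

print(speech_band, noise_band)
```

Raising the floor keeps more of the background but protects speech, while lowering it removes more noise at the risk of audible artifacts.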

Tightly combining hardware and software

Neat firmly believes that the best approach for ensuring outstanding audio quality and speech intelligibility is to tightly combine hardware and software. That’s why we always design our video devices from scratch. We create our hardware for our software and vice versa. In other words, we develop our hardware and software together to fulfil a common goal.

We choose high-quality hardware components such as loudspeakers and microphones, then carefully integrate them into the device so that the software can use so-called online machine learning algorithms.

These simple yet super effective algorithms continuously learn and adapt to a room’s acoustics and noises, only on the device, not in the cloud. Again, this is different from other system providers, who use conventional signal processing algorithms (with all sorts of false assumptions and sub-optimal heuristics) and offline machine learning methods (that can’t adapt to the actual acoustic environment).
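
As a rough picture of what ‘online’ means here, the sketch below updates an estimate of the room’s background noise level one audio frame at a time, using only what the device has just heard. It is a deliberately simple stand-in rather than a machine learning model, and it is not Neat’s algorithm. The function name and the `rise` and `fall` rates are assumptions chosen for illustration.

```python
def track_noise_floor(frame_powers, rise=0.05, fall=0.5):
    """Update a running estimate of background noise power, frame by frame."""
    estimate = frame_powers[0]
    trace = []
    for p in frame_powers:
        # Follow drops quickly, but rise slowly so short bursts of speech
        # do not get mistaken for background noise.
        rate = fall if p < estimate else rise
        estimate += rate * (p - estimate)
        trace.append(estimate)
    return trace


quiet_room = [1.0] * 50  # hypothetical frame powers in a quiet room
fan_on = [4.0] * 200     # a fan switches on and the noise level rises
trace = track_noise_floor(quiet_room + fan_on)

print(f"before fan: {trace[49]:.2f}, after adapting: {trace[-1]:.2f}")
```

The estimate settles on the new noise level without any retraining or cloud round trip, which is the property that an offline-trained, fixed model does not have on its own.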

To sum up, Neat is edging ever closer to replicating the realness of people meeting face-to-face. And although we’re not quite there yet, certain aspects are already making meeting online a more rewarding experience than meeting in person, such as automatic captioning and live translations.

As they say, the best way to predict the future is to design it. Or, as pioneering rock artist David Bowie once put it, “Tomorrow belongs to those who can hear it coming!”

To learn how to get the best out of your Neat meeting space, read our Sound Advice from Neat article. For all other info on Neat devices, visit neat.no. We offer worldwide shipping and can deliver in just a few days.