audio

Lenovo Tab Plus review. An audiophile’s tablet 

If watching movies, listening to music and catching up on social media are your primary needs, then the Lenovo Tab Plus is definitely worth considering




audio

Netflix, Sennheiser together to boost audio experience

Spatial audio feature rolls out on Netflix for shows like Stranger Things, The Witcher, Locke & Key, and more




audio

Belkin expands its audio portfolio in India

Belkin's recently launched wireless earbuds offer a comfortable in-ear fit and 31 hours of playtime, and are sweat and splash resistant




audio

Thomson invests ₹50 crore in Noida plant for audio speakers manufacturing

With a focus on capturing a 10% market share by 2028, Thomson’s CEO, Avneet Singh Marwah, has revealed ambitious expansion plans for the brand’s speaker range, aiming to offer over 80 models within the next three years




audio

Sennheiser PXC 550-II headphones: Now travel with superior audio

The PXC 550-II Wireless features a triple microphone array that makes it easy to stay connected while on the move.




audio

Aha™ Cranks up the Entertainment Factor with Dozens of News, Music, Talk, Lifestyle and Children's Audio Stations

LAS VEGAS-- Aha by HARMAN today announced further expansion of entertainment and lifestyle programming available on its platform through partnerships with streaming innovators Entertainment Radio Network, the Kaliki Audio Newsstand, and Storynory. Aha brings a world of infotainment to its users on their smart phones and in their cars with more than 30,000 stations of content spanning from the most popular mainstream programs to unique niche interests. By the end of 2013, Aha will be installed into vehicles by more than 10 auto manufacturers which in total represent more than 50 percent of all cars sold in the USA/Canada and up to 30 percent in Europe.




audio

Jeep Grand Cherokee drivers sit back and enjoy – with new HARMAN infotainment and audio systems

Sciacca, Italy, May 2013 – A brand new connectivity experience is awaiting drivers of the new Jeep Grand Cherokee: The new Uconnect™ infotainment system by HARMAN will help drivers stay connected to their vehicles and the world around them, featuring increased voice recognition capabilities and more realistic navigation. They can even stream off-board entertainment and other content through the Connected Media Center (CMC).




audio

New Harman Kardon® Audio/Video Receivers Accomplish Flawless Versatility and Performance

STAMFORD, Conn. – HARMAN International Industries, Incorporated, introduces three new audio/video receivers that seamlessly mesh versatility, quality and efficiency to create a peerless multimedia experience. The Harman Kardon® AVR 1510, AVR 1610 and AVR 1710 feature the brand’s iconic styling and unmatched sound reproduction in addition to enhanced support for streaming and external devices. Harman Kardon launched the world’s first audio receiver in 1953 and the first stereo receiver in 1958.




audio

Harman Enriches Harman Kardon® AVR 1x1s Series of Audio/Video Receivers with Spotify Connect, HDMI 2.0

CES 2015, LAS VEGAS – HARMAN, the premium global audio, infotainment and enterprise automation group (NYSE:HAR), introduces its enhanced Harman Kardon AVR 1x1 Series with the additions of Spotify Connect in all models and HDMI 2.0 in the two top models. The AVR 1x1s series, which includes Harman Kardon® AVR 1510, AVR 1610 and AVR 1710, was released last year and stands unprecedented in its versatility, quality and efficiency. Each Harman Kardon AVR 1x1s features the brand's iconic styling and unmatched sound reproduction and offers enhanced support for streaming and external devices.




audio

windows server and hyper v no audio




audio

Windows Server 2016: Audio In/Out through Remote Desktop to Thinclients




audio

Audio not working at all on ThinkPad R30




audio

How To Change Your Audio Volume




audio

How To Mute Your Audio




audio

Changing Audio File Formats

MP3 -> Wave or Wave -> MP3




audio

Using Audacity To Convert Audio Files




audio

Why Fake Video, Audio May Not Be As Powerful In Spreading Disinformation As Feared

"Deepfakes" are digitally altered images that make incidents appear real when they are not. Such altered files could have broad implications for politics. Credit: Marcus Marritt for NPR

Philip Ewing | NPR

Sophisticated fake media hasn't emerged as a factor in the disinformation wars in the ways once feared — and two specialists say it may have missed its moment.

Deceptive video and audio recordings, often nicknamed "deepfakes," have been the subject of sustained attention by legislators and technologists, but so far have not been employed to decisive effect, said two panelists at a video conference convened on Wednesday by NATO.

One speaker borrowed Sherlock Holmes' reasoning about the significance of something that didn't happen.

"We've already passed the stage at which they would have been most effective," said Keir Giles, a Russia specialist with the Conflict Studies Research Centre in the United Kingdom. "They're the dog that never barked."

The perils of deepfakes in political interference have been discussed too often and many people have become too familiar with them, Giles said during the online discussion, hosted by NATO's Strategic Communications Centre of Excellence.

Following all the reports and revelations about election interference in the West since 2016, citizens know too much to be hoodwinked in the way a fake video might once have fooled large numbers of people, he argued: "They no longer have the power to shock."

Tim Hwang, director of the Harvard-MIT Ethics and Governance of AI Initiative, agreed that deepfakes haven't proven as dangerous as once feared, although for different reasons.

Hwang argued that users of "active measures" (efforts to sow misinformation and influence public opinion) can be much more effective with cheaper, simpler and just as devious types of fakes — mis-captioning a photo or turning it into a meme, for example.

Influence specialists working for Russia and other governments also imitate Americans on Facebook, for another example, worming their way into real Americans' political activities to amplify disagreements or, in some cases, try to persuade people not to vote.

Other researchers have suggested this work continues on social networks and has become more difficult to detect.

Defense is stronger than attack

Hwang also observed that the more deepfakes are made, the better machine learning becomes at detecting them.

A very sophisticated, real-looking fake video might still be effective in a political context, he acknowledged — and at a cost to create of around $10,000, it would be easily within the means of a government's active measures specialists.

But the risks of attempting a major disruption with such a video may outweigh an adversary's desire to use one. People may be too media literate, as Giles argued, and the technology to detect a fake may mean it can be deflated too swiftly to have an effect, as Hwang said.

"I tend to be skeptical these will have a large-scale impact over time," he said.

One technology boss told NPR in an interview last year that years' worth of work on corporate fraud protection systems has given an edge to detecting fake media.

"This is not a static field. Obviously, on our end we've performed all sorts of great advances over this year in advancing our technology, but these synthetic voices are advancing at a rapid pace," said Brett Beranek, head of security business for the technology firm Nuance. "So we need to keep up."

Beranek described how systems developed to detect telephone fraudsters could be applied to verify the speech in a fake clip of video or audio.

Corporate clients that rely on telephone voice systems must be wary about people attempting to pose as others with artificial or disguised voices. Beranek's company sells a product that helps to detect them and that countermeasure also works well in detecting fake audio or video.

Machines using neural networks can detect known types of synthetic voices. Nuance also says it can analyze a recording of a real known voice — say, that of a politician — and then contrast its characteristics against a suspicious recording.

Although the world of cybersecurity is often described as one in which attackers generally have an edge over defenders, Beranek said he thought the inverse was true in terms of this kind of fraud detection.

"For the technology today, the defense side is significantly ahead of the attack side," he said.

Shaping the battlefield

Hwang and Giles acknowledged in the NATO video conference that deepfakes likely will proliferate and become lower in cost to create, perhaps becoming simple enough to make with a smartphone app.

One prospective response is the creation of more of what Hwang called "radioactive data" — material earmarked in advance so that it might make a fake easier to detect.

If images of a political figure were so tagged beforehand, they could be spotted quickly if they were incorporated by computers into a deceptive video.

Also, the sheer popularity of new fakes, if that is what happens, might make them less valuable as a disinformation weapon. More people could become more familiar with them, as well as being detectable by automated systems — plus they may also have no popular medium on which to spread.

Big social media platforms already have declared affirmatively that they'll take down deceptive fakes, Hwang observed. That might make it more difficult for a scenario in which a politically charged fake video went viral just before Election Day.

"Although it might get easier and easier to create deepfakes, a lot of the places where they might spread most effectively, your Facebooks and Twitters of the world, are getting a lot more aggressive about taking them down," Hwang said.

That won't stop them, but it might mean they'll be relegated to sites with too few users to have a major effect, he said.

"They'll percolate in these more shady areas."

Copyright 2020 NPR. To see more, visit https://www.npr.org.

This content is from Southern California Public Radio. View the original story at SCPR.org.




audio

Audio Clipping




audio

Should I be concerned that "WsAudioDevice_383S(1)" is UNSIGNED?




audio

Better audio meetings from BT MeetMe with Dolby Voice

Make your audio meetings more inclusive, easier to participate in and easier to manage. With HD quality voice, noise suppression and voice separation, our new BT MeetMe with Dolby Voice service takes audio conferencing to a different level. And because this is an IP call, it complements your Unified Communications strategy. It integrates with Cisco WebEx and Microsoft Lync, so you can use it with what you have already invested in and save money on access costs.




audio

Are audiobooks the secret to a calmer dog?

Celebrity dog trainer Cesar Millan and Audible have partnered to offer audiobooks for dogs.




audio

Beyonce's use of Challenger disaster audio clip upsets astronauts' families

Beyoncé ended the year on a sour note with members of the NASA community.




audio

Online audio book rental services - Who needs it anyway?

So what exactly are these online audio book rental services, and will you benefit from them? Read on to find the answers to your questions...




audio

Are audio books expensive?

Are audio books expensive? Well, it's a tough question. What kind of audio books - downloaded audio books, audio books on CD or books on tape, which titles? And compared to what - the old fashioned book or to other audio titles? Let us examine the prices and then take a look at the factors that affect the price of audio books...




audio

High-End Audio Surveillance Equipment Uncovered

This article will familiarize you with the technology behind audio surveillance. What kinds of phone bugs exist? How is it possible for parents to spy on their teens, and even know where they are at the moment, with inconspicuous gadgets?




audio

JBL Reflect Fit Superbly Designed Sports Earphones with Excellent Audio Performance and Accurate Heart Rate Sensing Technology

Using probably the World's Smallest Heart Rate Sensing Technology (ActivHearts)




audio

American Council of the Blind Presents Audio Description Award to Dr. Brett Oppegaard, University of Hawaii

Award Honors Work in Audio Description with National Park Service




audio

DGI Communications Acquires ACT Associates; Strengthens Its National Leadership in Audio Visual Design and Consulting

DGI Communications continues growth as they join the wave of AV industry consolidation and expansion.




audio

Ondesoft 2017 Halloween Treats - Audio Recorder Giveaway and iTunes DRM Removal Tools 50% OFF

Ondesoft Software offers giveaway of Ondesoft Audio Recorder and a 50% discount on iTunes DRM Audio/Video Converter for both Mac and Windows during 2017 Halloween season.




audio

COMMONTARY Receives Patent For New Alternate Audio Syncing Technology

Patent Covers Ability for Replacing a Portion of Video Broadcast Audio With a Customized, Live-Streaming Alternate Media Stream




audio

Microsoft/Xbox Receives ACB's 2019 Achievement Award in Audio Description-Media

Inside Xbox events include audio description for blind, visually impaired




audio

375- Audio Guide to the Imperfections of a Perfect Masterpiece

To help celebrate its 60th anniversary, the Guggenheim Museum teamed up with 99% Invisible to offer visitors a guided audio experience of the museum. Even if you've never been to the Guggenheim Museum, you probably recognize it. From the outside, the building is a light gray spiral, and from the inside, the art is displayed on one long ramp that curves up towards a glass skylight in the ceiling. We’re going to take the greatness of this building as a given. What we’re going to focus on are the oddities, the accretions, the interventions that reveal a different kind of genius. Not just the genius of Frank Lloyd Wright, and his bold, original vision, but the genius of all the people that made this building function, adapt, and grow over the decades.





audio

Video: Behind the beautiful audio of PSVR game Paper Beast

In this 2020 GDC Virtual Talk, Pixel Reef's Clement Duquesne shares how they crafted the detailed systems that breathe aural life into Paper Beast's animated creatures and dynamic landscapes. ...




audio

Edison Research, NPR Release 2020 Smart Audio Report

EDISON RESEARCH and NPR released the findings in its 2020 Smart Audio Report on smart speaker and voice-controlled device usage THURSDAY (4/30)  in a webinar hosted by EDISON's TOM … more




audio

TopLine By Futuri Presents Nielsen Audio April '20 PPMs Released Monday

NIELSEN AUDIO PPM APRIL '20 MONTHLY results arrive MONDAY, MAY 11th for NEW YORK; LOS ANGELES; CHICAGO; SAN FRANCISCO; DALLAS; HOUSTON; PHILADELPHIA; ATLANTA; NASSAU-SUFFOLK; … more




audio

AudioSweets Make New PopCore Volume Available

AUDIOSWEETS has released the latest in its imaging POPCORE series, POPCORE VOL. 14 from ASX. POPCORE VOL. 14 features 220 imaging elements with 11 categories in the update including Artist … more




audio

John Harrington-WHAT WE USE - Audio and sound kit

Here's a video segment on the Audio and Sound Kit that we use. A transcription of the video is available after the jump.




audio

John Harrington-WHAT WE USE - Audio Entertainment Kit

Here's a video segment on the Audio Entertainment Kit that we use. A transcription of the video is available after the jump.




audio

TopLine By Futuri Presents Nielsen Audio March '20 Ratings Released Today

NIELSEN AUDIO MARCH '20 results arrive TODAY for SYRACUSE; AKRON; MONTEREY-SALINAS-SANTA CRUZ; and CHARLESTON, SC. Find the ratings for the subscribing stations in the ALLACCESS.COM … more




audio

TopLine By Futuri Presents Nielsen Audio March '20 Ratings Released Today

NIELSEN AUDIO MARCH '20 results arrive TODAY for DES MOINES; COLORADO SPRINGS; MOBILE; WICHITA; and SPOKANE. Find the ratings for the subscribing stations in the ALLACCESS.COM NIELSEN … more




audio

TopLine By Futuri Presents Nielsen Audio March '20 Ratings Released Today

NIELSEN AUDIO MARCH '20 results arrive TODAY for CHATTANOOGA; MADISON; HUNTSVILLE, AL; and JACKSON, MS. Find the ratings for the subscribing stations in the ALLACCESS.COM NIELSEN AUDIO … more




audio

WordPress Audio Player Plugin

I recently went looking for a good audio player for WordPress and came across WPAudioPlayer from 1 pixel out. The plugin is extremely simple to use and has a really awesome automatic color detection tool that matches the player to your site with ease. For more info visit the demo page at http://www.1pixelout.net/code/audio-player-wordpress-plugin/

The post WordPress Audio Player Plugin appeared first on WPCult.




audio

Facebook Live Streaming and Audio/Video Hosting connected to Auphonic

Facebook is not only a social media giant; the company also provides valuable tools for broadcasting. Today we release a connection to Facebook, which allows you to use the Facebook tools for video/audio production and publishing within Auphonic and our connected services.

The following workflows are possible with Facebook and Auphonic:
  • Use Facebook for live streaming, then import, process and distribute the audio/video with Auphonic.
  • Post your Auphonic audio or video productions directly to the news feed of your Facebook Page or User.
  • Use Facebook as a general media hosting service and share the link or embed the audio/video on any webpage (also visible to non-Facebook users).

Connect to Facebook

First, connect to a Facebook account on our External Services Page by clicking the "Facebook" button.

Select if you want to connect to your personal Facebook User or to a Facebook Page:

It is always possible to remove or edit the connection in your Facebook Settings (Tab Business Integrations).

Import (Live) Videos from Facebook to Auphonic

Facebook Live is an easy (and free) way to stream live videos:

We implemented an interface to use Facebook as an Incoming External Service. Please select a (live or non-live) video from your Facebook Page/User as the source of a production and then process it with Auphonic:

This workflow allows you to use Facebook for live streaming, import and process the audio/video with Auphonic, then publish a podcast and video version of your live video to any of our connected services.

Export from Auphonic to Facebook

Similar to YouTube, it is possible to use Facebook for media file hosting.
Please add your Facebook Page/User as an External Service in your Productions or Presets to upload the Auphonic results directly to Facebook:

Options for the Facebook export:
  • Distribution Settings
    • Post to News Feed: The exported video is posted directly to your news feed / timeline.
    • Exclude from News Feed: The exported video is visible in the videos tab of your Facebook Page/User (see for example Auphonic's video tab), but it is not posted to your news feed (you can do that later if you want).
    • Secret: Only you can see the exported video, it is not shown in the Facebook video tab and it is not posted to your news feed (you can do that later if you want).
  • Embeddable
    Choose if the exported video should be embeddable in third-party websites.

It is always possible to change the distribution/privacy and embeddable options later directly on Facebook. For example, you can export a video to Facebook as Secret and publish it to your news feed whenever you want.
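
The three distribution settings differ only in where the exported video shows up. As a rough illustration, the mapping can be written down in a few lines of Python. The class and field names below are hypothetical and are not part of the Auphonic or Facebook API; the assumption that a posted video is also listed in the videos tab is ours.

```python
from dataclasses import dataclass
from enum import Enum


class Distribution(Enum):
    """The three export options described above (illustrative names)."""
    POST_TO_NEWS_FEED = "post_to_news_feed"
    EXCLUDE_FROM_NEWS_FEED = "exclude_from_news_feed"
    SECRET = "secret"


@dataclass(frozen=True)
class Visibility:
    in_news_feed: bool   # posted to the news feed / timeline
    in_videos_tab: bool  # listed in the videos tab of the Page/User
    direct_link: bool    # reachable through the direct video link


def visibility(setting: Distribution) -> Visibility:
    """Return where an exported video appears for a given setting."""
    if setting is Distribution.POST_TO_NEWS_FEED:
        # Assumption: a posted video is also listed in the videos tab.
        return Visibility(in_news_feed=True, in_videos_tab=True, direct_link=True)
    if setting is Distribution.EXCLUDE_FROM_NEWS_FEED:
        return Visibility(in_news_feed=False, in_videos_tab=True, direct_link=True)
    # Secret: hidden from feed and videos tab, but the direct link still works.
    return Visibility(in_news_feed=False, in_videos_tab=False, direct_link=True)
```

In every case the direct link remains usable, which is what makes the Secret setting suitable for plain media hosting.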


If your production is audio-only, we automatically generate a video track from the Cover Image and any Chapter Images.
Alternatively you can select an Audiogram Output File, if you want to add an Audiogram (audio waveform visualization) to your Facebook video - for details please see Auphonic Audiogram Generator.

Auphonic Title and Description metadata fields are exported to Facebook as well.
If you add Speech Recognition to your production, we create an SRT file with the speech recognition results and add it to your Facebook video as captions.
See the example below.

Facebook Video Hosting Example with Audiogram and Automatic Captions

Facebook can be used as a general video hosting service: even if you export videos as Secret, you will get a direct link to the video which can be shared or embedded in any third-party websites. Users without a Facebook account are also able to view these videos.

In the example below, we automatically generate an Audiogram Video for an audio-only production, use our integrated Speech Recognition system to create captions and export the video as Secret to Facebook.
Afterwards it can be embedded directly into this blog post (enable Captions if they don't show up by default) - for details please see How to embed a video:

It is also possible to just use the generated result URL from Auphonic to share the link to your video (also visible to non-Facebook users):
https://www.facebook.com/auphonic/videos/1687244844638091/

Important Note:
Facebook needs some time to process an exported video (up to a few minutes) and the direct video link won't work before the processing is finished - please try again a bit later!
On Facebook Pages, you can see the processing progress in your Video Library.

Conclusion

Facebook has many broadcasting tools to offer and is a perfect addition to Auphonic.
Both systems and our other external services can be used to create automated processing and publishing workflows. Furthermore, export to and import from Facebook are also fully supported in the Auphonic API.

Please contact us if you have any questions or further ideas!




audio

Auphonic Audio Inspector Release

At the Subscribe 9 Conference, we presented the first version of our new Audio Inspector:
The Auphonic Audio Inspector is shown on the status page of a finished production and displays details about what our algorithms are changing in audio files.

A screenshot of the Auphonic Audio Inspector on the status page of a finished Multitrack Production.
Please click on the screenshot to see it in full resolution!

It is possible to zoom and scroll within audio waveforms, and the Audio Inspector can be used to manually check production results and input files.

In this blog post, we will discuss the usage and all current visualizations of the Inspector.
If you just want to try the Auphonic Audio Inspector yourself, take a look at this Multitrack Audio Inspector Example.

Inspector Usage

Control bar of the Audio Inspector with scrollbar, play button, current playback position and length, button to show input audio file(s), zoom in/out, toggle legend and a button to switch to fullscreen mode.

Seek in Audio Files
Click or tap inside the waveform to seek in files. The red playhead will show the current audio position.
Zoom In/Out
Use the zoom buttons ([+] and [-]), the mouse wheel or zoom gestures on touch devices to zoom in/out the audio waveform.
Scroll Waveforms
If zoomed in, use the scrollbar or drag the audio waveform directly (with your mouse or on touch devices).
Show Legend
Click the [?] button to show or hide the Legend, which describes details about the visualizations of the audio waveform.
Show Stats
Use the Show Stats link to display Audio Processing Statistics of a production.
Show Input Track(s)
Click Show Input to show or hide input track(s) of a production: now you can see and listen to input and output files for a detailed comparison. Please click directly on the waveform to switch/unmute a track - muted tracks are grayed out slightly:

Showing four input tracks and the Auphonic output of a multitrack production.

Please click on the fullscreen button (bottom right) to switch to fullscreen mode.
Now the audio tracks use all available screen space to see all waveform details:

A multitrack production with output and all input tracks in fullscreen mode.
Please click on the screenshot to see it in full resolution.

In fullscreen mode, it’s also possible to control playback and zooming with keyboard shortcuts:
Press [Space] to start/pause playback, use [+] to zoom in and [-] to zoom out.

Singletrack Algorithms Inspector

First, we discuss the analysis data of our Singletrack Post Production Algorithms.

The audio levels of output and input files, measured according to the ITU-R BS.1770 specification, are displayed directly as the audio waveform. Click on Show Input to see the input and output file. Only one file is played at a time, click directly on the Input or Output track to unmute a file for playback:

Singletrack Production with opened input file.
See the first Leveler Audio Example to try the audio inspector yourself.
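
For readers unfamiliar with ITU-R BS.1770, the core of the level measurement is a mean-square energy converted to a logarithmic scale. The following Python sketch shows only that formula for a single mono block. It deliberately omits the K-weighting pre-filter and the gating that the specification defines, and it is not the Auphonic implementation; the function name is made up for this example.

```python
import math
from typing import Sequence


def block_loudness(samples: Sequence[float]) -> float:
    """Level of one mono block: -0.691 + 10 * log10(mean square).

    Simplification: the K-weighting filter and the gating stages of
    ITU-R BS.1770 are left out, so values differ from a real meter.
    """
    if not samples:
        raise ValueError("empty block")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    return -0.691 + 10.0 * math.log10(mean_square)


# One second of a full-scale 1 kHz sine at 48 kHz.
RATE = 48_000
sine = [math.sin(2 * math.pi * 1000 * n / RATE) for n in range(RATE)]
level = block_loudness(sine)  # about -3.70 without K-weighting
```

Computing such a value for successive short blocks yields a level curve over time, which is the kind of data a level-based waveform display can draw.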

Waveform Segments: Music and Speech (gold, blue)
Music/Speech segments are displayed directly in the audio waveform: Music segments are plotted in gold/yellow, speech segments in blue (or light/dark blue).
Waveform Segments: Leveler High/No Amplification (dark, light blue)
Speech segments can be displayed in normal, dark or light blue: Dark blue means that the input signal was very quiet and contains speech, therefore the Adaptive Leveler has to use a high amplification value in this segment.
In light blue regions, the input signal was very quiet as well, but our classifiers decided that the signal should not be amplified (breathing, noise, background sounds, etc.).
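
The color coding above boils down to three yes/no decisions per segment. A minimal sketch in Python, assuming the classifier results are already available as booleans (the function and parameter names are hypothetical, not taken from the Auphonic code):

```python
from enum import Enum


class SegmentColor(Enum):
    GOLD = "music"
    BLUE = "speech at normal level"
    DARK_BLUE = "very quiet speech, high amplification"
    LIGHT_BLUE = "very quiet input, not amplified"


def segment_color(is_music: bool, is_quiet: bool, amplify: bool) -> SegmentColor:
    """Map classifier decisions for one segment to its waveform color."""
    if is_music:
        return SegmentColor.GOLD
    if not is_quiet:
        return SegmentColor.BLUE
    # Quiet input: dark blue if the leveler amplifies it (speech),
    # light blue if it is left alone (breathing, noise, background sounds).
    return SegmentColor.DARK_BLUE if amplify else SegmentColor.LIGHT_BLUE
```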

Yellow/orange background segments display leveler fades.

Background Segments: Leveler Fade Up/Down (yellow, orange)
If the volume of an input file changes quickly, the Adaptive Leveler volume curve must increase or decrease quickly as well (a fade), and these fades should be placed in speech pauses. If fades are too slow or fall within active speech, one will hear pumping speech artifacts.
Exact fade regions are plotted as yellow (fade up, volume increase) and orange (fade down, volume decrease) background segments in the audio inspector.

Horizontal red lines display noise and hum reduction profiles.

Horizontal Lines: Noise and Hum Reduction Profiles (red)
Our Noise and Hiss Reduction and Hum Reduction algorithms segment the audio file into regions with different background noise characteristics, which are displayed as red horizontal lines in the audio inspector (top lines for noise reduction, bottom lines for hum reduction).
Then a noise print is extracted in each region and a classifier decides if and how much noise reduction is necessary - this is plotted as a value in dB below the top red line.
The hum base frequency (50Hz or 60Hz) and the strength of all its partials are also classified in each region. The value in Hz above the bottom red line indicates the base frequency; if no red line is shown, no hum reduction is necessary.
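
To make the term "partials" concrete: mains hum usually consists of the base frequency plus components at integer multiples of it. The Python sketch below lists those frequencies up to the Nyquist limit. Treating the partials as exact integer multiples is an assumption of this example, and the function name is hypothetical.

```python
def hum_partials(base_hz: float, sample_rate: int, max_partials: int = 10) -> list[float]:
    """Frequencies of the hum base tone and its partials below Nyquist."""
    if base_hz not in (50.0, 60.0):
        raise ValueError("hum base frequency is expected to be 50 Hz or 60 Hz")
    nyquist = sample_rate / 2
    # Assumption: partials sit at integer multiples of the base frequency.
    return [
        float(base_hz * k)
        for k in range(1, max_partials + 1)
        if base_hz * k < nyquist
    ]
```

For example, `hum_partials(50, 44100, 4)` returns `[50.0, 100.0, 150.0, 200.0]`.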

You can try the singletrack audio inspector yourself with our Leveler, Noise Reduction and Hum Reduction audio examples.

Multitrack Algorithms Inspector

If our Multitrack Post Production Algorithms are used, additional analysis data is shown in the audio inspector.

The audio levels of the output and all input tracks are measured according to the ITU-R BS.1770 specification and are displayed directly as the audio waveform. Click on Show Input to see all the input files with track labels and the output file. Only one file is played at a time, click directly into the track to unmute a file for playback:

Input Tracks: Waveform Segments, Background Segments and Horizontal Lines
Input tracks are displayed below the output file including their track names. The same data as in our Singletrack Algorithms Inspector is calculated and plotted separately in each input track:
Output Waveform Segments: Multiple Speakers and Music
Each speaker is plotted in a separate, blue-like color - in the example above we have 3 speakers (normal, light and dark blue) and you can see directly in the waveform when and which speaker is active.
Audio from music input tracks is always plotted in gold/yellow in the output waveform. Please try not to mix music and speech parts in music tracks (see also Multitrack Best Practice)!

You can try the multitrack audio inspector yourself with our Multitrack Audio Inspector Example or our general Multitrack Audio Examples.

Ducking, Background and Foreground Segments

Music tracks can be set to Ducking, Foreground, Background or Auto - for more details please see Automatic Ducking, Foreground and Background Tracks.

Ducking Segments (light, dark orange)
In Ducking, the level of a music track is reduced if one of the speakers is active, which is plotted as a dark orange background segment in the output track.
Foreground music parts, where no speaker is active and the music track volume is not reduced, are displayed as light orange background segments in the output track.
Background Music Segments (dark orange background)
Here the whole music track is set to Background and won’t be amplified when speakers are inactive.
Background music parts are plotted as dark orange background segments in the output track.
Foreground Music Segments (light orange background)
Here the whole music track is set to Foreground and its level won’t be reduced when speakers are active.
Foreground music parts are plotted as light orange background segments in the output track.
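
The three cases above can be summarized as a small rule: the track setting and the speaker activity together determine the background color. A minimal Python sketch, assuming speaker activity is available as one boolean per time frame (all names are hypothetical, and the Auto setting is left out because its behavior is not described here):

```python
from enum import Enum
from itertools import groupby


class TrackMode(Enum):
    DUCKING = "ducking"
    FOREGROUND = "foreground"
    BACKGROUND = "background"


def music_segment_color(mode: TrackMode, speaker_active: bool) -> str:
    """Background color of the output track for one time frame."""
    if mode is TrackMode.BACKGROUND:
        return "dark orange"
    if mode is TrackMode.FOREGROUND:
        return "light orange"
    # Ducking: music level is reduced only while a speaker is active.
    return "dark orange" if speaker_active else "light orange"


def color_segments(mode: TrackMode, activity: list[bool]) -> list[tuple[int, int, str]]:
    """Merge per-frame colors into (start_frame, end_frame, color) runs."""
    colors = [music_segment_color(mode, active) for active in activity]
    segments = []
    start = 0
    for color, run in groupby(colors):
        length = len(list(run))
        segments.append((start, start + length, color))
        start += length
    return segments
```

With a Ducking track and the activity `[False, False, True, True, True, False]`, this yields a light orange run, a dark orange run while the speaker talks, and a final light orange run.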

You can try the ducking/background/foreground audio inspector yourself: Fore/Background/Ducking Audio Examples.

Audio Search, Chapter Marks and Video

Audio Search and Transcriptions
If our Automatic Speech Recognition Integration is used, a time-aligned transcription text will be shown above the waveform. You can use the search field to search and seek directly in the audio file.
See our Speech Recognition Audio Examples to try it yourself.
Chapter Marks
Chapter Mark start times are displayed in the audio waveform as black vertical lines.
The current chapter title is written above the waveform - see “This is Chapter 2” in the screenshot above.
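
Both features rely on time-aligned data: each transcribed word and each chapter mark carries a start time. The Python sketch below shows one way such lookups could work, assuming times in seconds and chapters sorted by start time. The data layout and function names are assumptions for illustration only.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class Word(NamedTuple):
    start: float  # start time in seconds
    text: str


def search(transcript: list[Word], query: str) -> list[float]:
    """Start times of all words containing the query (case-insensitive)."""
    needle = query.lower()
    return [word.start for word in transcript if needle in word.text.lower()]


def current_chapter(chapters: list[tuple[float, str]], position: float) -> Optional[str]:
    """Title of the chapter that contains the playhead position.

    `chapters` must be sorted by start time; returns None before the
    first chapter mark.
    """
    starts = [start for start, _ in chapters]
    index = bisect_right(starts, position) - 1
    return chapters[index][1] if index >= 0 else None
```

A player would seek to one of the times returned by `search` and call `current_chapter` with the playhead position to decide which title to show above the waveform.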

A video production with output waveform, input waveform and transcriptions in fullscreen mode.
Please click on the screenshot to see it in full resolution.

Video Display
If you add a Video Format or Audiogram Output File to your production, the audio inspector will also show a separate video track in addition to the audio output and input tracks. The video playback will be synced to the audio of output and input tracks.

Supported Audio Formats

We use the native HTML5 audio element for playback and the aurora.js JavaScript audio decoders to support all common audio formats:

WAV, MP3, AAC/M4A and Opus
These formats are supported in all major browsers: Firefox, Chrome, Safari, Edge, iOS Safari and Chrome for Android.
FLAC
FLAC is supported in Firefox, Chrome, Edge and Chrome for Android - see FLAC audio format.
In Safari and iOS Safari, we use aurora.js to decode FLAC files directly in JavaScript, which works but uses much more CPU than native decoding!
ALAC
ALAC is not supported by any browser so far, therefore we use aurora.js to decode ALAC files directly in JavaScript. This works but uses much more CPU than native decoding!
Ogg Vorbis
Only supported by Firefox, Chrome and Chrome for Android - for details please see Ogg Vorbis audio format.

We suggest using a recent Firefox or Chrome browser for best performance.
Decoding FLAC and ALAC files also works in Safari and iOS with the help of aurora.js, but JavaScript decoders need a lot of CPU and sometimes have problems with exact scrolling and seeking.
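
The support matrix above can be read as a simple decision: use native playback where the browser supports the format, otherwise fall back to aurora.js where a JavaScript decoder is mentioned. The Python sketch below encodes the list as data; the browser identifiers are made up for this example, and treating Ogg Vorbis in other browsers as unsupported is an assumption, since no fallback is described for it.

```python
ALL_BROWSERS = {"firefox", "chrome", "safari", "edge", "ios-safari", "chrome-android"}

# Native support per format, as listed above.
NATIVE_SUPPORT = {
    "wav": ALL_BROWSERS,
    "mp3": ALL_BROWSERS,
    "aac": ALL_BROWSERS,
    "opus": ALL_BROWSERS,
    "flac": {"firefox", "chrome", "edge", "chrome-android"},
    "alac": set(),
    "vorbis": {"firefox", "chrome", "chrome-android"},
}

# Formats for which a JavaScript decoder fallback is described.
AURORA_FALLBACK = {"flac", "alac"}


def decoder_for(audio_format: str, browser: str) -> str:
    """Return 'native', 'aurora.js' or 'unsupported' for a format/browser pair."""
    if browser in NATIVE_SUPPORT[audio_format]:
        return "native"
    if audio_format in AURORA_FALLBACK:
        return "aurora.js"
    return "unsupported"
```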

Please see our blog post Audio File Formats and Bitrates for Podcasts for more details about audio formats.

Mobile Audio Inspector

Multiple responsive layouts were created to optimize the screen space usage on Android and iOS devices, so that the audio inspector is fully usable on mobile devices as well: tap into the waveform to set the playhead location, scroll horizontally to scroll waveforms, scroll vertically to scroll between tracks, use zoom gestures to zoom in/out, etc.

Unfortunately the fullscreen mode is not available on iOS devices (thanks to Apple), but it works on Android and is a really great way to inspect everything using all the available screen space:

Audio inspector in horizontal fullscreen mode on Android.

Conclusion

Try the Auphonic Audio Inspector yourself: take a look at our Audio Example Page or play with the Multitrack Audio Inspector Example.

The Audio Inspector will be shown in all productions which are created in our Web Service.
It can be used to manually check production results and input files, and to send us detailed feedback about audio processing results.

Please let us know if you have any feedback or questions - more visualizations will be added in the future!







audio

Audio Manipulations and Dynamic Ad Insertion with the Auphonic API

We are pleased to announce a new Audio Inserts feature in the Auphonic API: audio inserts are separate audio files (like intros/outros), which will be inserted into your production at a defined offset.
This blog post shows how one can use this feature for Dynamic Ad Insertion and discusses other Audio Manipulation Methods of the Auphonic API.

API-only Feature

For the general podcasting hobbyist, or even for someone producing a regular podcast, the features that are accessible via our web interface are more than sufficient.

However, some of our users, like podcasting companies who integrate our services as part of their products, asked us for dynamic ad insertions. We teamed up with them to develop a way of making this work within the Auphonic API.

We are therefore pleased to announce audio inserts - a new feature that has been made part of our API. This feature is not available through the web interface; it requires the use of our API.

Before we talk about audio inserts, let's talk about what you need to know about dynamic ad insertion!

Dynamic Ad Insertion

There are two ways of dealing with adverts within podcasts. In the first, adverts are recorded or edited into the podcast and are fixed, or baked in. The second method is to use dynamic insertion, whereby the adverts are not part of the podcast recording/file but can be inserted into the podcast afterwards, at any time.

This second approach would allow you to run new ad campaigns across your entire catalog of shows. As a podcaster this allows you to potentially generate new revenue from your old content.

As a hosting company, dynamic ad insertion allows you to choose up to date and relevant adverts across all the podcasts you host. You can make these adverts relevant by subject or location, for instance.

Your users can define the positions of the ads within their podcast episodes, while you stay in control of which adverts are inserted.

Audio Inserts in Auphonic

Whichever approach to adverts you are taking, using audio inserts can help you.

Audio inserts are separate audio files which will be inserted into your main single or multitrack production at your defined offset (in seconds).

When a separate audio file is inserted as part of your production, it creates a gap in the podcast audio file, shifting the audio back by the length of the insert. Helpfully, chapters and other time-based information like transcriptions are also shifted back when an insert is used.
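
The time shift described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Auphonic's implementation: the function name shift_times, the (offset, duration) layout for inserts and the rule that a mark lying exactly on an insert offset is moved behind the insert are all assumptions made for this example.

```python
def shift_times(marks, inserts):
    """Shift time marks (in seconds) to account for audio inserts.

    marks   -- times in the original main audio, in seconds
    inserts -- (offset, duration) pairs; offsets refer to the original
               main audio (hypothetical layout, chosen for this sketch)
    """
    shifted = []
    for t in marks:
        # Every insert placed at or before this mark pushes the mark
        # back by the full length of that insert (assumed rule).
        delay = sum(duration for offset, duration in inserts if offset <= t)
        shifted.append(t + delay)
    return shifted


chapters = [0.0, 60.0, 300.0]        # chapter start times before inserting
ads = [(20.5, 30.0), (120.3, 15.0)]  # a 30 s insert and a 15 s insert
print(shift_times(chapters, ads))    # [0.0, 90.0, 345.0]
```

The same function could be applied to any other time-based information, for example the start times of transcript segments.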

The biggest advantage of this is that Auphonic will apply loudness normalization to the audio insert so, from an audio point of view, it matches the rest of the podcast.

Although created with dynamic ad insertion in mind, this feature can be used for any type of audio insert: adverts, songs, individual parts of a recording, etc. In the case of baked-in adverts, you could upload your already processed advert audio as an insert, without having to edit it into your podcast recording using a separate audio editing application.

Please note that audio inserts should already be edited and processed before using them in a production (this is usually the case with pre-recorded adverts anyway). The only algorithm that Auphonic applies to an audio insert is loudness normalization, in order to match the loudness of the entire production. Auphonic does not apply any other processing (no leveling, noise reduction, etc.).

Audio Inserts Coding Example

Here is a brief overview of how to use our API for audio inserts. Be warned, this section is coding heavy, so if this isn't your thing, feel free to move along to the next section!

You can add audio insert files with a call to https://auphonic.com/api/production/{uuid}/multi_input_files.json, where uuid is the UUID of your production.
Here is an example with two audio inserts from an https URL. The offset/position in the main audio file must be given in seconds:

curl -X POST -H "Content-Type: application/json" \
    https://auphonic.com/api/production/{uuid}/multi_input_files.json \
    -u username:password \
    -d '[
            {
                "input_file": "https://mydomain.com/my_audio_insert_1.wav",
                "type": "insert",
                "offset": 20.5
            },
            {
                "input_file": "https://mydomain.com/my_audio_insert_2.wav",
                "type": "insert",
                "offset": 120.3
            }
        ]'

More details showing how to use audio inserts in our API can be seen here.
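
For readers who prefer a scripting language over curl, the same request can be prepared in Python with the standard library only. This is a rough sketch: the helper name build_inserts_request and the placeholder UUID and credentials are made up for illustration, and the request is only built, not sent.

```python
import base64
import json
import urllib.request


def build_inserts_request(uuid, username, password, inserts):
    """Build (but do not send) the POST request that adds audio inserts.

    inserts -- list of (file_url, offset_in_seconds) pairs
    """
    url = f"https://auphonic.com/api/production/{uuid}/multi_input_files.json"
    payload = [
        {"input_file": file_url, "type": "insert", "offset": offset}
        for file_url, offset in inserts
    ]
    # HTTP basic auth, the equivalent of curl's "-u username:password".
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


req = build_inserts_request(
    "my-production-uuid",  # placeholder, use the UUID of your production
    "username",
    "password",
    [
        ("https://mydomain.com/my_audio_insert_1.wav", 20.5),
        ("https://mydomain.com/my_audio_insert_2.wav", 120.3),
    ],
)
print(req.get_method(), req.full_url)
# Sending is left out on purpose; urllib.request.urlopen(req) would do it.
```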

Additional API Audio Manipulations

In addition to audio inserts, using the Auphonic API offers a number of other audio manipulation options, which are not available via the web interface:

Cut start/end of audio files: See Docs
In Single-track productions, this feature allows the user to cut the start and/or the end of the uploaded audio file. Crucially, time-based information such as chapters etc. will be shifted accordingly.
Fade In/Out time of audio files: See Docs
This allows you to set the fade in/out time (in ms) at the start/end of output files. The default fade time is 100ms, but values can be set between 0ms and 5000ms.
This feature is also available in our Auphonic Leveler Desktop App.
Adding intro and outro: See Docs
Automatically add intros and outros to your main audio input file. This option is also available in our web interface.
Add multiple intros or outros: See Docs
Using our API, you can also add multiple intros or outros to a production. These intros or outros are played in series.
Overlapping intros/outros: See Docs
This feature allows intros/outros to overlap either the main audio or the following/previous intros/outros.

Conclusion

If you haven't explored our API already, the new audio inserts feature allows for greater flexibility and also dynamic ad insertion.
If you offer online services to podcasters, the Auphonic API would also then allow you to pass on Auphonic's audio processing algorithms to your customers.

If this is of interest to you or you have any new feature suggestions that you feel could benefit your company, please get in touch. We are always happy to extend the functionality of our products!







audio

Leveler Presets, LRA Target and Advanced Audio Parameters (Beta)

Lots of users have asked us about more customization and control over the sound of our audio algorithms in the past, so today, we have introduced some advanced algorithm parameters for our singletrack version in a private beta program!

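The following new parameters are available:
Leveler Presets
Maximum Loudness Range (LRA) Target
Maximum True Peak Level
Better Hum and Noise Reduction Controls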

UPDATE Nov. 2018:
We released a complete rework of the Adaptive Leveler parameters and the description here is not valid anymore!
Please see Auphonic Adaptive Leveler Customization (Beta Update)!

Please join our private beta program and let us know how you use these new features or if you need even more control!

Leveler Presets

Our Adaptive Leveler corrects level differences between speakers, between music and speech and will also apply dynamic range compression to achieve a balanced overall loudness. If you don't know about the Leveler yet, take a look at our Audio Examples.

Leveler presets are basically completely new leveling algorithms, which we have been working on in the past few months:
Our current Leveler tries to normalize all speakers to the same loudness. However, in some cases, you might want more or less loudness differences (dynamic range / loudness range) between the speakers and music segments, or more or less compression, etc.
For these use cases, we have developed additional Leveler Presets and the parameter Maximum Loudness Range.

The following Leveler presets are now available:
Preset Medium:
This is our current leveling algorithm as demonstrated in the Audio Examples.
Preset Hard:
The hard preset reacts faster and applies more gain and compression compared to the medium preset. It is built for recordings with extreme loudness differences, for example very quiet questions from the audience in a lecture recording, extremely soft and loud voices within one audio track, etc.
Preset Soft:
This preset reacts slower, applies less gain and compression compared to the medium preset. Use it if you want to keep more loudness differences (dynamic narration), if you want your voices to sound "less compressed/processed", for dynamic music (concert/classical recordings), background music, etc.
Preset Softer:
Like soft, but softer :)
Preset Speech Medium, Music Soft:
Uses the medium preset in speech segments and the soft preset in music segments. It is built for music live recordings or dynamic music mixes, where you want to amplify all speakers but keep the loudness differences within and between music segments.
Preset Medium, No Compressor:
Like the medium preset, but only (mid-term) leveling and no (short-term) compression is applied. This preset is optimal if you just use a Maximum Loudness Range Target and want to avoid any additional compression as much as possible.
Please let us know your use case, if you need more/other controls or if anything is confusing. The Leveler presets are still in private beta and can be changed as necessary!

Maximum Loudness Range (LRA) Target

The loudness range (LRA) indicates the variation of loudness over the course of a program and is measured in LU (loudness units) - for more details see Loudness Measurement and Normalization or EBU Tech 3342.

The parameter Max Loudness Range controls how much leveling is applied:
volume changes of our Adaptive Leveler will be restricted so that the loudness range of the output file is below the selected value.
High loudness range values will result in very dynamic output files, low loudness range values in compressed output audio. If the LRA value of your input file is already below the maximum loudness range value, no leveling at all will be applied.

The selected Leveler Preset matters as well: for example, if you use the soft(er) preset, it won't be possible to achieve very low loudness range targets.

Also, the Max Loudness Range parameter is not as precise a target value as the Loudness Target. The LRA of your output file might be off by a few LU, because it is not reasonable to force the exact target value.

Use Cases: The Maximum LRA parameter allows you to control the strength of our leveling algorithms, in combination with the parameter Leveler Preset. This might be used for automatic mixdowns with different LRA values for different target platforms (very compressed ones like mobile devices or Alexa, very dynamic ones like home cinema, etc.).

Maximum True Peak Level

This parameter sets the maximum allowed true peak level of the processed output file, which is controlled by the True Peak Limiter after our Global Loudness Normalization algorithms.

If set to Auto (which is the current default), a reasonable value according to the selected loudness target is used: -1dBTP for -23 LUFS (EBU R128) and higher, -2dBTP for -24 LUFS (ATSC A/85) and lower loudness targets.
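
The Auto rule can be written down as a small function. This is a sketch of the rule as described here, taking -23 LUFS as the EBU R128 target; the function name is hypothetical and the handling of non-integer targets between -24 and -23 LUFS is an assumption.

```python
def auto_true_peak(loudness_target_lufs):
    """Return the maximum true peak level (dBTP) picked in Auto mode."""
    # Assumption: targets of -23 LUFS and louder get -1 dBTP,
    # all quieter targets (-24 LUFS and below) get -2 dBTP.
    return -1.0 if loudness_target_lufs >= -23 else -2.0


for target in (-16, -23, -24, -31):
    print(target, "LUFS ->", auto_true_peak(target), "dBTP")
```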

The maximum true peak level parameter is already available in our desktop program.

Better Hum and Noise Reduction Controls

In addition to the parameter (Noise) Reduction Amount, we now offer two more parameters to control the combination of our Noise and Hum Reduction algorithms:
Hum Base Frequency:
Set the hum base frequency to 50Hz or 60Hz (if you know it), or use Auto to automatically detect the hum base frequency in each speech region.
Hum Reduction Amount:
Maximum hum reduction amount in dB, higher values remove more noise.
In Auto mode, a classifier decides how much hum reduction is necessary in each speech region. Set it to a custom value (> 0), if you prefer more hum reduction or want to bypass our classifier. Use Disable Dehum to disable hum reduction and use our noise reduction algorithms only.

Behavior of noise and hum reduction parameter combinations:

Noise Reduction Amount | Hum Base Frequency | Hum Reduction Amount | Resulting behavior
Auto | Auto | Auto | Automatic hum and noise reduction
Auto or > 0 | * | Disabled | No hum reduction, only denoise
Disabled | 50Hz | Auto or > 0 | Force 50Hz hum reduction, no denoise
Disabled | Auto | Auto or > 0 | Automatic dehum, no denoise
12dB | 60Hz | Auto or > 0 | Always do dehum (60Hz) and denoise (12dB)
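
The table can also be read as a small decision function. The sketch below only mirrors the rows of the table; the function name, the string values "auto" and "disabled" and the result for the case where everything is disabled are assumptions and do not reflect real API parameter names.

```python
def describe_processing(noise_amount, hum_frequency, hum_amount):
    """Describe which reduction steps run for a parameter combination.

    noise_amount, hum_amount -- "auto", "disabled" or a dB value > 0
    hum_frequency            -- "auto", 50 or 60 (Hz)
    """
    steps = []
    # Hum reduction runs unless it is disabled; the base frequency is
    # either detected automatically or forced to 50Hz/60Hz.
    if hum_amount != "disabled":
        if hum_frequency == "auto":
            steps.append("dehum (automatic base frequency)")
        else:
            steps.append(f"dehum ({hum_frequency}Hz)")
    # Noise reduction runs unless it is disabled.
    if noise_amount != "disabled":
        if noise_amount == "auto":
            steps.append("denoise (automatic amount)")
        else:
            steps.append(f"denoise ({noise_amount}dB)")
    # Not covered by the table: both disabled (assumed to do nothing).
    return steps or ["no hum or noise reduction"]


print(describe_processing(12, 60, "auto"))
# ['dehum (60Hz)', 'denoise (12dB)']
```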

Advanced Parameters Private Beta and Feedback

At the moment the advanced algorithm parameters are for beta users only. This is to allow us to get user feedback, so we can change the parameters to suit user needs.
Please let us know your case studies, if you need any other algorithm parameters or if you have any questions!

Here are some private beta invitation codes:

y6KCBI4yo0 ksIFEsmI1y BDZec2a21V i4XRGLlVm2 0UDxuS0vbu aaBxi35sKN aaiDSZUbmY bu8lPF80Ih eMsSl6Sf8K DaWpsUnyjo
2YM00m8zDW wh7K2pPmSa jCX7mMy2OJ ZGvvhzCpTF HI0lmGhjVO eXqVhN6QLU t4BH0tYcxY LMjQREVuOx emIogTCAth 0OTPNB7Coz
VIFY8STj2f eKzRSWzOyv 40cMMKKCMN oBruOxBkqS YGgPem6Ne7 BaaFG9I1xZ iSC0aNXoLn ZaS4TykKIa l32bTSBbAx xXWraxS40J
zGtwRJeAKy mVsx489P5k 6SZM5HjkxS QmzdFYOIpf 500AHHtEFA 7Kvk6JRU66 z7ATzwado6 4QEtpzeKzC c9qt9Z1YXx pGSrDzbEED
MP3JUTdnlf PDm2MOLJIG 3uDietVFSL 1i7jZX0Y9e zPkSgmAqqP 5OhcmHIZUP E0vNsPxZ4s FzTIyZIG2r 5EywA0M7r5 FMhpcFkVN5
oRLbRGcRmI 2LTh8GlN7h Cjw6Z3cveP fayCewjE55 GbkyX89Lxu 4LpGZGZGgc iQV7CXYwkH pGLyQPgaha e3lhKDRUMs Skrei1tKIa
We are happy to send further invitation codes to all interested users - please do not hesitate to contact us!

If you have an invitation code, you can enter it here to activate the advanced audio algorithm parameters:
Auphonic Algorithm Parameters Private Beta Activation







audio

Advanced Multitrack Audio Algorithms Release (Beta)

Last weekend, at the Subscribe10 conference, we released Advanced Audio Algorithm Parameters for Multitrack Productions:

We launched our advanced audio algorithm parameters for Singletrack Productions last year. Now these settings (and more) are available for Multitrack Algorithms as well, which gives you detailed control for each track of your production.

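The following new parameters are available:
Fore/Background Settings
Leveler Parameters
Better Hum and Noise Reduction Controls
Maximum True Peak Level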

Please join our private beta program and let us know how you use these new features or if you need even more control!

Fore/Background Settings

The parameter Fore/Background controls whether a track should be in foreground, in background, ducked, or unchanged, which is especially important for music or clip tracks.
For more details, please see Automatic Ducking, Foreground and Background Tracks.

We have now added the new option Unchanged and a new parameter to set the level of background segments/tracks:
Unchanged (Foreground):
Users who produce very complex music or clip tracks sometimes complained that Auphonic changes the levels too aggressively.
If you set the parameter Fore/Background to the new option Unchanged (Foreground), level relations within this track won’t be changed at all. It will be added to the final mixdown so that foreground/solo parts of this track will be as loud as (foreground) speech from other tracks.
Background Level:
It is now possible to set the level of background segments/tracks (compared to foreground segments) in background and ducking tracks. By default, background and ducking segments are 18dB softer than foreground segments.
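
To get a feeling for what 18dB softer means in terms of signal amplitude, the usual dB-to-linear conversion can be used. This is general audio arithmetic, not a description of how Auphonic applies the background level internally.

```python
def db_to_gain(level_db):
    """Convert a level difference in dB to a linear amplitude factor."""
    return 10 ** (level_db / 20)


# A background segment that is 18dB softer than the foreground is
# scaled to roughly one eighth of the foreground amplitude.
print(round(db_to_gain(-18), 3))  # 0.126
```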

Leveler Parameters

Similar to our Singletrack Advanced Leveler Parameters (see this previous blog post), we also released leveling parameters for Multitrack Productions now.
The following advanced parameters for our Multitrack Adaptive Leveler can be set for each track and allow you to customize which parts of the audio should be leveled, how much they should be leveled, how much dynamic range compression should be applied and to set the stereo panorama (balance):

Leveler Preset:
Select the Speech or Music Leveler for this track.
If set to Automatic (default), a classifier will decide if this is a music or speech track.
Dynamic Range:
The parameter Dynamic Range controls how much leveling is applied: Higher values result in more dynamic output audio files (less leveling). If you want to increase the dynamic range by 3dB (or LU), just increase the Dynamic Range parameter by 3dB.
For more details, please see Multitrack Leveler Parameters.
Compressor:
Select a preset for Micro-Dynamics Compression: Auto, Soft, Medium, Hard or Off.
The Compressor adjusts short-term dynamics, whereas the Leveler adjusts mid-term level differences.
For more details, please see Multitrack Leveler Parameters.
Stereo Panorama (Balance):
Change the stereo panorama (balance for stereo input files) of the current track.
Possible values: L100, L75, L50, L25, Center, R25, R50, R75 and R100.

If you understand German and want to know more about our Advanced Leveler Parameters and audio dynamics in general, watch our talk at the Subscribe10 conference:
Video: Audio Lautheit und Dynamik.

Better Hum and Noise Reduction Controls

We now offer three parameters to control the combination of our Multitrack Noise and Hum Reduction Algorithms for each input track:
Noise Reduction Amount:
Maximum noise and hum reduction amount in dB, higher values remove more noise.
In Auto mode, a classifier decides if and how much noise reduction is necessary (to avoid artifacts). Set to a custom (non-Auto) value if you prefer more noise reduction or want to bypass our classifier.
Hum Base Frequency:
Set the hum base frequency to 50Hz or 60Hz (if you know it), or use Auto to automatically detect the hum base frequency in each speech region.
Hum Reduction Amount:
Maximum hum reduction amount in dB, higher values remove more noise.
In Auto mode, a classifier decides how much hum reduction is necessary in each speech region. Set it to a custom value (> 0), if you prefer more hum reduction or want to bypass our classifier. Use Disable Dehum to disable hum reduction and use our noise reduction algorithms only.

Behavior of noise and hum reduction parameter combinations:

Noise Reduction Amount | Hum Base Frequency | Hum Reduction Amount | Resulting behavior
Auto | Auto | Auto | Automatic hum and noise reduction
Auto or > 0 | * | Disabled | No hum reduction, only denoise
Disabled | 50Hz | Auto or > 0 | Force 50Hz hum reduction, no denoise
Disabled | Auto | Auto or > 0 | Automatic dehum, no denoise
12dB | 60Hz | Auto or > 0 | Always do dehum (60Hz) and denoise (12dB)

Maximum True Peak Level

In the Master Algorithm Settings of your multitrack production, you can set the maximum allowed true peak level of the processed output file, which is controlled by the True Peak Limiter after our Loudness Normalization algorithms.

If set to Auto (which is the current default), a reasonable value according to the selected loudness target is used: -1dBTP for -23 LUFS (EBU R128) and higher, -2dBTP for -24 LUFS (ATSC A/85) and lower loudness targets.

Full API Support

All advanced algorithm parameters, for Singletrack and Multitrack Productions, are available in our API as well, which allows you to integrate them into your scripts, external workflows and third-party applications.

Singletrack API:
Documentation on how to use the advanced algorithm parameters in our singletrack production API: Advanced Algorithm Parameters
Multitrack API:
Documentation of advanced settings for each track of a multitrack production:
Multitrack Advanced Audio Algorithm Settings

Join the Beta and Send Feedback

Please join our beta and let us know your case studies, if you need any other algorithm parameters or if you have any questions!

Here are some private beta invitation codes:

8tZPc3T9pH VAvO8VsDg9 0TwKXBW4Ni kjXJMivtZ1 J9APmAAYjT Zwm6HabuFw HNK5gF8FR5 Do1MPHUyPW CTk45VbV4t xYOzDkEnWP
9XE4dZ0FxD 0Sl3PxDRho uSoRQxmKPx TCI62OjEYu 6EQaPYs7v4 reIJVOwIr8 7hPJqZmWfw kti3m5KbNE GoM2nF0AcN xHCbDC37O5
6PabLBRm9P j2SoI8peiY olQ2vsmnfV fqfxX4mWLO OozsiA8DWo weJw0PXDky VTnOfOiL6l B6HRr6gil0 so0AvM1Ryy NpPYsInFqm
oFeQPLwG0k HmCOkyaX9R G7DR5Sc9Kv MeQLSUCkge xCSvPTrTgl jyQKG3BWWA HCzWRxSrgW xP15hYKEDl 241gK62TrO Q56DHjT3r4
9TqWVZHZLE aWFMSWcuX8 x6FR5OTL43 Xf6tRpyP4S tDGbOUngU0 5BkOF2I264 cccHS0KveO dT29cF75gG 2ySWlYp1kp iJWPhpAimF
We are happy to send further invitation codes to all interested users - please do not hesitate to contact us!

If you have an invitation code, you can enter it here to activate the Multitrack Advanced Audio Algorithm Parameters:
Auphonic Algorithm Parameters Private Beta Activation







audio

Dynamic Range Processing in Audio Post Production

If listeners find themselves using the volume up and down buttons a lot, level differences within your podcast or audio file are too big.
In this article, we discuss why audio dynamic range processing (or leveling) is more important than loudness normalization, why it depends on factors like the listening environment and the individual character of the content, and why the loudness range descriptor (LRA) is only reliable for speech programs.

Photo by Alexey Ruban.

Why loudness normalization is not enough

Everybody who has lived in an apartment building knows the problem: you want to enjoy a movie late at night, but you're constantly on the edge - not only because of the thrilling story, but because your index finger is hovering over the volume down button of your remote. The next loud sound effect is going to come sooner rather than later, and you want to avoid waking up your neighbors with some gunshot sounds blasting from your TV.

In our previous post, we talked about the overall loudness of a production. While that's certainly important to keep in mind, the loudness target is only an average value, ignoring how much the loudness varies within a production. The loudness target of your movie might be in the ideal range, yet the level differences between a gunshot and someone whispering can still be enormous - having you turn the volume down for the former and up for the latter.

While the average loudness might be perfect, level differences can lead to an unpleasant listening experience.

Of course, this doesn't apply to movies alone. The image above shows a podcast or radio production. The loud section is music, the very quiet section just breathing, and the remaining sections are different voices.

To be clear, we're not saying that the above example is problematic per se. There are many situations where a big difference in levels - a high dynamic range - is justified: for instance, in a movie theater, optimized for listening and without any outside noise, or in classical music.
Also, if the dynamic range is too small, listening can be tiring.

But if you watch the same movie in an outdoor screening in the summer on a beach next to the crashing waves or in the middle of a noisy city, it can be tricky to hear the softer parts.
Spoken word usually has a smaller dynamic range, and if you produce your podcast for a target audience of train or car commuters, the dynamic range should be even smaller, adjusting for the listening situation.

Therefore, hitting the loudness target has less impact on the listening experience than level differences (dynamic range) within one file!
What makes a suitable dynamic range does not only depend on the listening environment, but also on the nature of the content itself. If the dynamic range is too small, the audio can be tiring to listen to, whereas more variability in levels can make a program more interesting, but might not work in all environments, such as a noisy car.

Dynamic range experiment in a car

Wolfgang Rein, audio technician at SWR, a public broadcaster in Germany, did an experiment to test how drivers react to programs with different dynamic ranges. They monitored to what level drivers set the car stereo depending on speed (thus noise level) and audio dynamic range.
While the results are preliminary, it seems like drivers set the volume as low as possible so that they can still understand the content, but don't get distracted by loud sounds.

As drivers adjust the volume to the loudest voice in a program, they won't understand quieter speakers in content with a high dynamic range anymore. To some degree and for short periods of time, they can compensate by focusing more on the radio program, but over time that's tiring. Therefore, if the loudness varies too much, drivers tend to switch to another program rather than adjusting the volume.
Similar results have been found in a study conducted by NPR Labs and Towson University.

On the other hand, the perception was different in pure music programs. When drivers set the volume according to louder parts, they weren't able to hear softer segments or the beginning of a song very well. But that did not matter to them as much and didn't make them want to turn up the volume or switch the program.

Listener's reaction in response to frequent loudness changes. (from John Kean, Eli Johnson, Dr. Ellyn Sheffield: Study of Audio Loudness Range for Consumers in Various Listening Modes and Ambient Noise Levels)

Loudness comfort zone

The reaction of drivers to variable loudness hints at something that BBC sound engineer Mike Thornton calls the loudness comfort zone.

Tests (...) have shown that if the short-term loudness stays within the "comfort zone" then the consumer doesn’t feel the need to reach for the remote control to adjust the volume.
In a blog post, he highlights how the series Blue Planet 2 and Planet Earth 2 might not always have been the easiest to listen to. The graph below shows an excerpt with very loud music, followed by commentary just at the bottom of the green comfort zone. Thornton writes: "with the volume set at a level that was comfortable when the music was playing we couldn’t always hear the excellent commentary from Sir David Attenborough and had to resort to turning on the subtitles to be sure we knew what Sir David was saying!"

Planet Earth 2 Loudness Plot Excerpt. Colored green: comfort zone of +3 to -5LU around the loudness target. (from Mike Thornton: BBC Blue Planet 2 Latest Show In Firing Line For Sound Issues - Are They Right?)

As already mentioned above, a good mix considers the maximum and minimum possible loudness in the target listening environment.
In a movie theater the loudness comfort zone is big (loudness can vary a lot), and loud music is part of the fun, while quiet scenes work just as well. The opposite was true in the aforementioned experiment with drivers, where the loudness comfort zone is much smaller and quiet voices are difficult to understand.

Hence, the loudness comfort zone determines how much dynamic range an audio signal can use in a specific listening environment.
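
As a minimal sketch of this idea, the following Python function flags short-term loudness values that leave a comfort zone of +3 to -5 LU around the loudness target, the zone shown in the plot above. The function name and the example values are invented for illustration, and the suitable zone size depends on the listening environment.

```python
def outside_comfort_zone(short_term_lufs, target_lufs, upper_lu=3.0, lower_lu=5.0):
    """Return the indices of loudness values outside the comfort zone."""
    high = target_lufs + upper_lu
    low = target_lufs - lower_lu
    return [i for i, value in enumerate(short_term_lufs) if not low <= value <= high]


# Made-up short-term loudness values (LUFS) for a target of -23 LUFS:
values = [-23.0, -19.5, -27.0, -29.0, -21.0]
print(outside_comfort_zone(values, target_lufs=-23.0))  # [1, 3]
```

For a noisy car one would pass smaller values for upper_lu and lower_lu, for a movie theater larger ones.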

How to measure dynamic range: LRA

When producing audio for various environments, it would be great to have a target value for dynamic range (the difference between the smallest and largest signal values of an audio signal) as well. Then you could just set a dynamic range target, similar to a loudness target.

Theoretically, the maximum possible dynamic range of a production is defined by the bit-depth of the audio format. A 16-bit recording can have a dynamic range of 96 dB; for 24-bit, it's 144 dB - which is well above the approx. 120 dB the human ear can handle. However, most of those bits are typically being used to get to a reasonable base volume. Picture a glass of water: you want it to be almost full, with some headroom so that it doesn't spill when there's a sudden movement, i.e. a bigger amplitude wave at the top.
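
The 96 dB and 144 dB figures follow from the number of amplitude steps a linear PCM sample can represent: each additional bit doubles the number of steps, which adds about 6 dB. A quick check in Python (the function name is just for this example):

```python
import math


def theoretical_dynamic_range_db(bits):
    """Ratio between the largest and smallest amplitude step, in dB."""
    return 20 * math.log10(2 ** bits)


for bits in (16, 24):
    print(bits, "bit:", round(theoretical_dynamic_range_db(bits), 1), "dB")
```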

Determining the dynamic range of a production is easier said than done, though. It depends on which signals are included in the measurement: for example, if something like background music or breathing should be considered at all.
The currently preferred method for broadcasting is called Loudness Range, LRA. It is measured in Loudness Units (LU), and takes into account everything between the 10th and the 95th percentile of a loudness distribution, after an additional gating method. In other words, the loudest 5% and quietest 10% of the audio signal are being ignored. This way, quiet breathing or an occasional loud sound effect won't affect the measurement.
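
The percentile idea can be sketched as follows. This is a simplified illustration, not a compliant meter: it assumes the short-term loudness values (in LUFS) have already been measured, it uses the gating thresholds from EBU Tech 3342 (an absolute gate at -70 LUFS and a relative gate 20 LU below the average of the remaining values), and the percentile lookup is a simple nearest-rank approximation.

```python
import math


def loudness_range(short_term_lufs):
    """Estimate the loudness range (LU) from short-term loudness values."""
    # Absolute gate: drop everything below -70 LUFS (e.g. digital silence).
    gated = [v for v in short_term_lufs if v >= -70.0]
    if not gated:
        return 0.0
    # Relative gate: 20 LU below the power-averaged level of what is left.
    mean_power = sum(10 ** (v / 10) for v in gated) / len(gated)
    relative_threshold = 10 * math.log10(mean_power) - 20.0
    gated = sorted(v for v in gated if v >= relative_threshold)

    def percentile(p):
        # Nearest-rank style lookup in the sorted values (approximation).
        return gated[round((len(gated) - 1) * p / 100)]

    # Ignore the quietest 10% and the loudest 5% of the gated values.
    return percentile(95) - percentile(10)


# Made-up example: a program ranging from -30 to -10 LUFS, plus a silent
# block (-80) and quiet breathing (-60), which are both gated out.
values = [float(v) for v in range(-30, -9)] + [-80.0, -60.0]
print(loudness_range(values))  # 17.0
```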

Loudness distribution and LRA for the film 'The Matrix'. Figure from EBU Tech Doc 3343 (p.13).

However, the main difficulty is which signals should be included in the loudness range measurement and which ones should be gated. This is unfortunately often very subjective and difficult to define with a purely statistical method like LRA.

Where LRA falls short

Therefore, only pure speech programs give reliable LRA values that are comparable!
For instance, a typical LRA for news programs is 3 LU; for talks and discussions 5 LU is common. LRA values for features, radio dramas, movies or music very much depend on the individual character and might be in the range between 5 and 25 LU.

To further illustrate this, here are some typical LRA values, according to a paper by Thomas Lund (table 2):

Program | Loudness Range (LU)
Matrix, full movie | 25.0
NBC Interstitials, Jan. 2008, all together (3:30) | 9.4
Friends Episode 16 | 6.6
Speak Ref., Male, German, SQUAM Trk 54 | 6.2
Speak Ref., Female, French, SQUAM Trk 51 | 4.8
Speak Ref., Male, English, Sound Check | 3.3
Wish You Were Here, Pink Floyd | 22.1
Gilgamesh, Battle of Titans, Osaka Symph. | 19.7
Don’t Cry For Me Arg., Sinead O’Conner | 13.7
Beethoven Son in F, Op17, Kliegel & Tichman | 12.0
Rock’n Roll Train, AC/DC | 6.0
I.G.Y., Donald Fagen | 3.6

LRA values of music are very unpredictable as well.
For instance, Tom Frampton measured the LRA of songs in multiple genres, and the differences within each genre are quite big. The ten pop songs that he analyzed varied in LRA between 3.7 and 12 LU, country songs between 3.6 and 14.9 LU. In the Electronic genre the individual LRAs were between 3.7 and 15.2 LU. Please see the tables at the bottom of his blog post for more details.

We at Auphonic also tried to base our Adaptive Leveler parameters on the LRA descriptor. Although it worked, it turned out to be very difficult to set a loudness range target for diverse audio content that includes speech, background sounds, music parts, etc. The results were not predictable and it was hard to find good target values. Therefore we developed our own algorithm to measure the dynamic range of audio signals.

In conclusion, LRA comparisons are only useful for productions with spoken word only and the LRA value is therefore not applicable as a general dynamic range target value. The more complex a production gets, the more difficult it is to make any judgment based on the LRA.
This is because the definition of LRA is purely statistical. There's no smart measurement using classifiers that distinguish between music, speech, quiet breathing, background noises and other types of audio. One would need a more intelligent algorithm (as we use in our Adaptive Leveler) that knows which audio segments should be included in and excluded from the measurement.

From theory to application: tools

Loudness and dynamic range are clearly complicated topics. Luckily, there are tools that can help. To keep short-term loudness in range, a compressor can help control sudden changes in loudness - such as p-pops or consonants like t or k. To achieve a good mid-term loudness, i.e. a signal that doesn't go outside the comfort zone too much, a leveler is a good option. Or, just use a fader or manually adjust volume curves. And to make sure that separate productions sound consistent, loudness normalization is the way to go. We have covered all of this in-depth before.

Looking at the audio from above again, with an adaptive leveler applied it looks like this:

Leveler example. Output at the top, input with leveler envelope at the bottom.

Now, the voices are evened out and the music is at a comfortable level, while the breathing has not been touched at all.
We recently extended Auphonic's adaptive leveler, so that it is now possible to customize the dynamic range - please see adaptive leveler customization and advanced multitrack audio algorithms.
If you wanted to increase the loudness comfort zone (or dynamic range) of the standard preset by 10 dB (or LU), for example, the envelope would look like this:

Leveler with higher dynamic range, only touching sections with extremely low or extremely high loudness to fit into a specific loudness comfort zone.
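The idea of a wider comfort zone can be sketched in a few lines. The example below is a deliberately simplified, hypothetical model and not Auphonic's leveler: it assumes each segment has a single loudness value, a -16 LUFS target and a comfort zone centred on that target, and it ignores classifiers entirely (so it would also amplify breathing). The names `leveler_gain_db` and `comfort_zone_lu` are made up for this illustration.

```python
def leveler_gain_db(segment_lufs, target_lufs=-16.0, comfort_zone_lu=0.0):
    """Gain (in dB) that brings a segment to the nearest comfort-zone edge.

    Segments that are already inside the zone are left untouched.
    """
    low = target_lufs - comfort_zone_lu / 2
    high = target_lufs + comfort_zone_lu / 2
    if segment_lufs < low:
        return low - segment_lufs    # too quiet: amplify up to the lower edge
    if segment_lufs > high:
        return high - segment_lufs   # too loud: attenuate down to the upper edge
    return 0.0


# Made-up segment loudness values in LUFS.
segments = [-30.0, -18.0, -14.0, -6.0]

# No comfort zone: every segment is pushed to the target.
print([leveler_gain_db(s) for s in segments])
# [14.0, 2.0, -2.0, -10.0]

# 10 LU comfort zone: only the extreme segments are touched.
print([leveler_gain_db(s, comfort_zone_lu=10.0) for s in segments])
# [9.0, 0.0, 0.0, -5.0]
```

With the wider zone, the two middle segments keep their original level differences, which matches the envelope described above: gain is applied only where the loudness is extremely low or extremely high.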

When a production is done, our adaptive leveler uses classifiers to also calculate the integrated loudness and loudness range of dialog and music sections separately. This way it is possible to just compare the dialog LRA and loudness of complex productions.

Assessing the LRA and loudness of dialog and music separately.

Conclusion

Getting audio dynamics right is not easy. Yet it is important to keep in mind, because focusing on loudness normalization alone is not enough. In fact, hitting the loudness target often has less impact on the listening experience than level differences, i.e. audio dynamics.

If the dynamic range is too small, the audio can be tiring to listen to, whereas a bigger dynamic range can make a program more interesting, but might not work in loud environments, such as a noisy train.
Therefore, a good mix adapts the audio dynamic range according to the target listening environment (different loudness comfort zones in cinema, at home, in a car) and according to the nature of the content (radio feature, movie, podcast, music, etc.).

Furthermore, because the definition of the loudness range / LRA is purely statistical, only speech programs give reliable LRA values that are comparable.
More "intelligent" algorithms are in development, which use classifiers to decide which signals should be included and excluded from the dynamic range measurement.

If you understand German, take a look at our presentation about audio dynamic processing in podcasts for further information.







audio

Remix and make music with audio from the Library of Congress

Brian Foo is the current Innovator-in-Residence at the Library of Congress. His latest…
