Voice news from UPLINQ 2013
This year, Qualcomm organized its annual UPLINQ Developer Conference in San Diego, California, for the fourth time. VoiceGurus were on site looking for mobile voice-related news. Here’s what they found.
NXP Software showing VoiceExperience
NXP Software demonstrated its new VoiceExperience 4.0, which features a 2-microphone noise suppressor that works in both handset and speaker mode. Most smartphones and tablets nowadays have 2 or more microphones. In theory, the second and third microphones can suppress more background noise than is generally achievable with 1 microphone while keeping the voice undistorted. The demo compared two phones, one with and one without 2-mic noise suppression, making a live recording in the UPLINQ exhibition hall. The high ambient noise level in the hall showed how much a good noise suppressor can improve a phone call. Although the demo revealed some degradation of the speech quality, the noise suppression clearly improved the overall calling experience. A second demo was a live VoIP call using Samsung ChatON between two Galaxy S4s over Wi-Fi. Even though this demo featured only 1-mic noise suppression, the call was crisp and clear, as you would expect from a wideband VoIP call. Both demos ran NXP Software’s voice solution on the Hexagon DSP in the Snapdragon platform, where Qualcomm’s default Fluence voice solution had been replaced by VoiceExperience. More information can be found at http://www.nxpsoftware.com/uplinq2013.
Qualcomm Fluence Pro
Qualcomm demonstrated a new version of Fluence Pro in combination with Google Now. As in a voice call, speech recognition systems like Google Now typically apply noise suppression to the microphone signal before the actual recognition takes place. At the Fluence demo we could not compare results with and without Fluence enabled, but Google Now seemed to work reasonably well under the noisy conditions at UPLINQ. With a bit of practice, the recognition was mostly correct.
A second Fluence Pro demo showed how 3 microphones on a Snapdragon Mobile Development Platform can be used for beam forming, a feature that will be available on the new Snapdragon 800 platform. The most impressive part of this demo was the user interface: a polar diagram that shows in real time the direction of the talker relative to the phone. The user can also control the direction from which the phone captures audio by enlarging or shrinking a sector on the 360-degree polar diagram, and can verify the result by listening to the processed audio over headphones. The concept of the demo is really cool, but the voice quality failed to impress. More often than not, the speech signal was heavily attenuated, even when speaking into the phone at the right angle according to the polar diagram. The purpose of such a beam former is to make the phone sensitive to audio from one direction and less sensitive to audio from other directions, which can be useful for capturing camcorder audio. However, this solution appears to need more work before it becomes practically usable.
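The sector control in the Fluence Pro demo can be illustrated with a small sketch. Given an estimated talker direction in degrees on the 360-degree polar diagram, plus a user-chosen sector center and width, it returns a gain that passes audio inside the sector and attenuates it outside. The function names and the -20 dB attenuation floor are hypothetical assumptions for illustration only. This is not Qualcomm's API, and a real beam former shapes the spatial response of the microphone array rather than switching a single gain. Note the wrap-around handling, so that a sector centered at 0 degrees also covers 350 degrees.

```python
import math


def angular_distance_deg(a_deg: float, b_deg: float) -> float:
    """Smallest absolute angle between two directions, in degrees (0..180)."""
    d = (a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def sector_gain(doa_deg: float, center_deg: float, width_deg: float,
                floor_db: float = -20.0) -> float:
    """Linear gain for a talker at doa_deg, given a user-selected sector.

    Hypothetical rule: unity gain inside the sector, a fixed attenuation
    floor (floor_db, in dB) outside it.
    """
    if angular_distance_deg(doa_deg, center_deg) <= width_deg / 2.0:
        return 1.0
    return 10.0 ** (floor_db / 20.0)


def apply_sector(frame, doa_deg, center_deg, width_deg):
    """Scale one frame of samples by the sector gain for its estimated direction."""
    g = sector_gain(doa_deg, center_deg, width_deg)
    return [g * s for s in frame]


if __name__ == "__main__":
    # Hypothetical sector: centered straight ahead (0 deg), 60 deg wide.
    for doa in (0, 20, 350, 90, 180):
        print(doa, sector_gain(doa, center_deg=0.0, width_deg=60.0))
```

Enlarging the sector on the diagram corresponds to increasing `width_deg`. A smoother design would taper the gain near the sector edges instead of switching abruptly, which avoids audible jumps when the talker moves across the boundary.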
Fraunhofer Cingo and Full-HD Voice
Not directly related to voice processing, but interesting nonetheless, was Fraunhofer’s demo of Cingo, their virtual surround sound solution for headphones and speakers, based on HE-AAC. Cingo was launched simultaneously with the new Google Nexus 7. Google now offers movies in the Play Store that are encoded using Cingo, so if you own a new Nexus 7 you can enjoy virtual surround in these movies. That is, if you happen to be a US resident; elsewhere, the Play Store is unfortunately limited to applications and books. Fraunhofer is also promoting the low-delay variant of HE-AAC for VoIP calling, supporting sampling rates up to 48 kHz. Although this is interesting from a technical point of view, commercial deployment is not expected in the coming year because of the lack of a standardized end-to-end solution.
Fortemedia voice solution
Fortemedia showed a 2-microphone beam forming solution on a Lenovo laptop, designed for far-field capture at distances of up to approximately 5 meters. The output of the beam former can be monitored in real time on the laptop screen. The result was absolutely impressive. Despite the severe noise conditions at UPLINQ, the beam former restored most of the speech arriving head-on at the microphones and rejected a considerable amount of noise from other directions.
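To see why a 2-microphone beam former favors sound arriving head-on, consider the simplest textbook design, a delay-and-sum beam former. We don't know which algorithm Fortemedia actually uses. For a plane wave reaching two microphones spaced d apart at angle θ from broadside, the inter-microphone delay is τ = d·sin θ / c. Averaging the two time-aligned signals gives the magnitude response computed below. The 343 m/s speed of sound, the 10 cm spacing and the 1 kHz test tone are assumptions for illustration.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: speed of sound in air at about 20 degrees C


def das_response(theta_deg: float, freq_hz: float, spacing_m: float,
                 steer_deg: float = 0.0, c: float = SPEED_OF_SOUND_M_S) -> float:
    """Magnitude response of a 2-mic delay-and-sum beam former.

    theta_deg: arrival angle of a plane wave, measured from broadside
               (0 = straight ahead of the microphone pair).
    steer_deg: look direction; its arrival delay is compensated before summing.
    """
    theta = math.radians(theta_deg)
    steer = math.radians(steer_deg)
    # Residual delay between the two microphones after steering compensation.
    tau = spacing_m * (math.sin(theta) - math.sin(steer)) / c
    # Average mic 1 (reference phasor 1) and the delayed mic 2 phasor.
    return abs((1 + cmath.exp(-2j * math.pi * freq_hz * tau)) / 2)


if __name__ == "__main__":
    # Hypothetical setup: 10 cm spacing, 1 kHz tone, beam steered straight ahead.
    for angle in (0, 30, 60, 90):
        gain = das_response(angle, freq_hz=1000.0, spacing_m=0.10)
        print(f"{angle:3d} deg: {20 * math.log10(gain):6.1f} dB")
```

Sweeping `theta_deg` shows unity gain at broadside and a falling response off-axis. The attenuation grows with frequency and microphone spacing, so a plain 2-mic delay-and-sum design has little directivity at low frequencies, and practical products typically add adaptive processing on top. A linear 2-mic array also cannot distinguish front from back, since sin θ = sin(180° − θ).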