Voice Processing

Comprehensive software that enables voice processing at the edge.

NXP offers a range of voice control, audio and communications software and solutions that provide high-quality, reliable embedded speech processing for human-to-human and human-to-machine local voice applications. NXP voice communication software offerings are designed for small-footprint, low-power applications running on our portfolio of MCUs, MPUs and DSPs.

Voice Processing Software

Voice Software	Features	Key Applications
Voice Communication Software
Conversa	Advanced full-duplex voice processing software that leverages traditional learning and machine learning; includes a real-time PC-based tuning tool that provides control of the audio signal path	Smart Watches and Wearables; Gaming Headphones; Smart Home Intercoms; Personal and Group Conferencing; Automotive Voice Communications; Industrial Intercoms / Building Access
Voice UI Software
Voice Intelligent Technology Wake Word and Voice Commands	A complete audio front end, wake word and voice command solution; free and ready to use on supported NXP devices; wake words and commands are easily customized using online tools; available through the MCUXpresso SDK	Smart Watches and Wearables; Gaming Headphones; Smart Home Intercoms; Personal and Group Conferencing; Automotive Voice Communications; Industrial Intercoms / Building Access
Voice Intelligent Technology Speech to Intent	A natural language understanding (NLU) engine for local voice control; allows devices to understand a user’s intent without requiring exact phrasing or commands; no cloud connection required	Smart Watches and Wearables; Smart Appliances; Home Automation; Robots; Industrial HMI systems; Automotive HMI systems
VoiceSpot	A highly accurate, highly optimized wake word and acoustic event detection engine; based on deep learning neural network techniques; suited to ultralow power states while waiting for the voice or acoustic trigger	Smart Watches and Wearables; Smart Appliances; Home Automation; Robots; Remote Controls; Industrial HMI systems; Automotive HMI systems
VoiceSeeker	Multi-microphone audio front-end signal processing solution for low-power, always-on devices; optimized for noisy near-field and far-field voice pickup in the presence of playback audio; can be tightly integrated with VoiceSpot for keyword detection and payload capture	Smart Watches and Wearables; Far-field Voice Control Systems; Smart Home Controls; Automotive Voice Controls; Industrial Voice Controls

Voice Control and Communications Software Overview

Full-service, comprehensive suite of customizable voice enablement technologies for accelerated voice-based application development.

Voice Intelligent Technology

Voice Intelligent Technology (VIT) is a comprehensive local voice user interface software suite from NXP. The VIT suite includes the free-to-use Wake Word engine (WWE) and Voice Command engine (VCE), as well as a premium Speech to Intent (S2I) engine.

The Wake Word and Command Engines are supported by online model creation tools, which allow users to quickly create customized model files according to their desired voice user experience.

VIT’s S2I engine is designed to behave more like natural language understanding: a user can phrase a request in many ways and still get the desired action from the device. The VCE requires a specific voice command (for example, a user must say “Lights On”), whereas S2I lets users speak naturally (for example, “I need more light”, “It’s too dark in here” or “Please turn on the lights”). The S2I engine will be supported with online model creation tools in mid-2024. In the meantime, NXP’s professional voice services team will support qualifying customers in creating customized voice user experiences.
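
The contrast between the two engines can be illustrated with a toy sketch. This is not VIT code and does not reflect the VIT API: the command table, cue words and function names below are hypothetical, and a real S2I model is far more capable than counting cue words. It only shows why an exact-phrase matcher rejects natural phrasing that an intent-style matcher can accept.

```python
import re
from typing import List, Optional

# Hypothetical command table for an exact-phrase (VCE-style) matcher.
COMMANDS = {"lights on": "LIGHTS_ON", "lights off": "LIGHTS_OFF"}

# Hypothetical cue words for an intent-style matcher (illustration only).
INTENT_CUES = {
    "LIGHTS_ON": {"on", "dark", "more", "brighter"},
    "LIGHTS_OFF": {"off", "bright", "less", "dimmer"},
}


def normalize(utterance: str) -> List[str]:
    """Lowercase the utterance and split it into words."""
    return re.findall(r"[a-z']+", utterance.lower().replace("’", "'"))


def match_command(utterance: str) -> Optional[str]:
    """Return an action only if the whole utterance is a known command."""
    return COMMANDS.get(" ".join(normalize(utterance)))


def match_intent(utterance: str) -> Optional[str]:
    """Return the intent whose cue words appear most often, if any appear."""
    words = set(normalize(utterance))
    best, best_hits = None, 0
    for intent, cues in INTENT_CUES.items():
        hits = len(words & cues)
        if hits > best_hits:
            best, best_hits = intent, hits
    return best
```

With these assumed tables, `match_command("I need more light")` returns `None` because the phrase is not in the command table, while `match_intent` maps the same utterance to `LIGHTS_ON`.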

EdgeReady Voice Solutions

Complete, production-grade software and hardware platforms, certified by NXP, for fast development of turnkey solutions.

SmartVoice UI

The SmartVoice solution for both local and online voice control uses the i.MX RT106V crossover MCU with the integrated Voice Intelligent Technology (VIT) Wake Word and Voice Command engines to provide a voice user interface for touchless applications.

SmartHMI

The SmartHMI solution allows developers to quickly and easily add multimodal, intelligent, hands-free capabilities to their products, including machine learning (ML), vision for face and gesture recognition, far-field voice control and a 2D graphical user interface (GUI).

NXP Devices for VIT, VoiceSeeker, VoiceSpot and Conversa

NXP’s Voice Processing Portfolio is suitable for use on i.MX RT crossover MCUs, LPC55S6x and RW61x MCUs, and i.MX 8M Mini, i.MX 8M Plus and i.MX 9x applications processors.

Design Resources

Application Software Pack: Conversa Voice Calling on i.MX RT1170

Software

Enable a complete voice call application using NXP's i.MX RT1170 crossover MCU and Conversa Voice Suite.

VIT Tool

Development Tool

VIT is based on state-of-the-art deep learning and speech recognition technologies and provides a complete audio front end / wake word / voice commands solution.

Voice Trainings

Training

Explore our trainings to learn more about voice enablement.

Voice Videos

Video

Learn more about our voice control and communications software and solutions.

Application Software Pack: Low-Power Voice UI Enablement on i.MX MPUs

Software

Deploy low-power voice UI enablement on an i.MX MPU.

EdgeReady Solutions

Technology

Learn more about our turnkey voice solutions.

NXP Voice Processing Software FAQs

How do I get started with VIT?

The VIT Wake Word and Voice Command Engines can be accessed through online tools and our MCUXpresso SDK. For VIT Speech to Intent, please contact us at voice@nxp.com with your specific requests.

Does NXP have voice software application examples?

Yes, visit our application software pack page or our Application Code Hub. You can also view demo videos showcasing our voice software.

What is the difference between voice UI and voice communications?

Voice UI refers to “voice-first” devices that use voice as a user interface. NXP’s Voice UI software technologies are VIT, VoiceSpot and VoiceSeeker.

Voice communications refers to two-way, person-to-person communication using voice; i.e., telephony. NXP’s voice communications software technology is Conversa.

What is the difference between VoiceSpot and VIT? When should you use one versus the other?

VoiceSpot is a very accurate, highly optimized wake word and acoustic event detection engine. It is based on deep learning neural network techniques and requires large datasets for training. VoiceSpot is appropriate for customers who need the highest response rates with the fewest false alarms and is also appropriate for customers who need to run in ultralow power states while waiting for the voice / acoustic trigger.

The VIT software suite is built on phoneme-based automatic speech recognition technology. This technology maps spoken phonemes (the basic building blocks of speech) into words, which can then be recognized as wake words and commands and then transformed into intents and actions. Because VIT is based on phonemes, it is possible to create wake word and command models quickly with a keyboard and NXP’s online model creation tools. The VIT Wake Word and Voice Command Engines are appropriate for customers who want to build custom wake words and voice commands independently or who want to quickly experiment with voice as a user interface. VIT Speech to Intent is for customers who want to create an experience similar to natural language understanding on edge processors without cloud connectivity or cloud ASR transcription services.
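
The phoneme-to-word-to-command idea can be sketched in a few lines. This is a toy illustration and not how VIT is implemented: the lexicon, the ARPAbet-style phoneme symbols, the command model and the function names are all hypothetical, and the input is assumed to be an already-decoded phoneme stream. It shows why typed text is enough to define a new command in a phoneme-based system: adding a phrase only means adding lexicon and model entries.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical pronunciation lexicon: word -> phoneme sequence.
LEXICON: Dict[str, Tuple[str, ...]] = {
    "lights": ("L", "AY", "T", "S"),
    "on": ("AA", "N"),
    "off": ("AO", "F"),
}

# Hypothetical command model: word sequence -> action.
COMMAND_MODEL: Dict[Tuple[str, ...], str] = {
    ("lights", "on"): "LIGHTS_ON",
    ("lights", "off"): "LIGHTS_OFF",
}


def phonemes_to_words(phonemes: List[str]) -> List[str]:
    """Greedily segment a phoneme stream into lexicon words, longest first."""
    by_length = sorted(LEXICON.items(), key=lambda kv: -len(kv[1]))
    words: List[str] = []
    i = 0
    while i < len(phonemes):
        for word, pron in by_length:
            if tuple(phonemes[i:i + len(pron)]) == pron:
                words.append(word)
                i += len(pron)
                break
        else:
            i += 1  # Skip a phoneme that does not start any known word.
    return words


def recognize(phonemes: List[str]) -> Optional[str]:
    """Map a phoneme stream to an action, or None if no command matches."""
    return COMMAND_MODEL.get(tuple(phonemes_to_words(phonemes)))
```

For example, `recognize(["L", "AY", "T", "S", "AA", "N"])` segments the stream into `lights` and `on` and looks up the corresponding action in the assumed command model.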

What is VoiceSeeker and how do you use it?

VoiceSeeker is a multi-microphone beamforming audio front end signal processing solution for voice user interfaces. VoiceSeeker discriminates between signal and noise and is especially effective in far-field, reverberant conditions. VoiceSeeker is offered in a standard free-to-use option and a premium option. VoiceSeeker without AEC is freely available via NXP’s MCUXpresso SDK and integrates easily with VoiceSpot or VIT. The premium VoiceSeeker option includes an acoustic echo canceller (AEC) and is available via controlled distribution from NXP. VoiceSeeker is frequently used in far-field voice control applications like smart speakers and home controllers but can also be used in the mid- and near-field where interfering noise needs to be cancelled.
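
The page does not describe VoiceSeeker's algorithm, so the sketch below substitutes the textbook delay-and-sum beamformer to show the general idea behind multi-microphone beamforming. The function name and its parameters are hypothetical, and the per-microphone delays are assumed to be known, non-negative whole-sample values.

```python
from typing import List, Sequence


def delay_and_sum(
    channels: Sequence[Sequence[float]], delays: Sequence[int]
) -> List[float]:
    """Align each microphone channel by its sample delay, then average.

    delays[m] is how many samples later the target sound reaches
    microphone m than the reference microphone (assumed known here).
    """
    if len(channels) != len(delays):
        raise ValueError("one delay per channel is required")
    # Only produce samples for which every shifted channel has data.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]
```

When the delays match the direction of the talker, the aligned copies of the voice add coherently, while sound arriving from other directions is misaligned and partly cancels in the average.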

Contact Us

Explore our Voice community for support or contact us at voice@nxp.com.

Additional Documents

Software for Voice Processing at the Edge

NXP offers a range of voice control, audio and communications software and systems solutions that provide high-quality, reliable embedded speech processing for human-to-human and human-to-machine voice applications.

Read the factsheet