Blog about voice enhancements

Noise 101

This post gives a brief overview of the noise environment in which a mobile phone is used to make a voice call. Obviously, the most important thing to keep in a call is the talker's speech, and we want to get rid of all the unwanted noises. But this is not always so simple, and the differing characteristics of the noises around the device are one of the reasons why it can be difficult to achieve.

1. Noise

This post introduces the myriad noises that can be present during a call and gives an overview of their distinguishing features. In a mobile phone the speech and noises are captured with one or more microphones. But, strange as it may seem, not all of the signal captured by the microphones comes from audible sources. To give a more complete idea of all the sources of noise and speech, we can visualize them as in the picture below.

Figure: Sources of noise and speech

2. Unwanted Noises

It is useful to categorize noises and there are two very important features that we use to do this:

Stationary / Non-Stationary. A Stationary noise is one that is not changing (or at least not quickly), like a hiss, the sound of air-conditioning or a machine hum. A Non-Stationary noise is one that is constantly changing such as music, someone talking or the sound of traffic on the road.

Diffuse / Non-Diffuse. Diffuse noise comes from many directions or many sources at once, and the sources are far from the device. This can be many people talking at once in a cafe, road traffic or the noise inside a factory. Non-Diffuse noise is a localized sound coming from a single source near the device. This can be someone talking, a radio or a beamer (video projector). The key thing about diffuse noise is that it is impossible to locate where it comes from, because it comes from everywhere.
Of course, noises are not just one or the other; they can fall anywhere in between on both of these features.
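
To make the stationary / non-stationary distinction a little more concrete, here is a minimal sketch of one possible way to measure it: split the signal into short frames and look at how much the frame level varies over time. The 8 kHz sampling rate, the 20 ms frame length and the two synthetic test signals are assumptions chosen for illustration, not values from any real product.

```python
import math
import random

def frame_levels_db(signal, frame_len=160):
    """Return the level in dB of each non-overlapping frame."""
    levels = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        levels.append(10.0 * math.log10(energy + 1e-12))
    return levels

def level_spread_db(signal, frame_len=160):
    """Standard deviation of the frame levels; a small value suggests stationary noise."""
    levels = frame_levels_db(signal, frame_len)
    mean = sum(levels) / len(levels)
    return math.sqrt(sum((lv - mean) ** 2 for lv in levels) / len(levels))

rng = random.Random(0)
n = 8000  # one second at an assumed 8 kHz sampling rate

# Stationary example: a steady hiss (white noise at a constant level).
hiss = [rng.gauss(0.0, 0.1) for _ in range(n)]

# Non-stationary example: the same hiss attenuated by 40 dB every other 100 ms.
bursty = [x if (i // 800) % 2 == 0 else 0.01 * x for i, x in enumerate(hiss)]

hiss_spread = level_spread_db(hiss)
bursty_spread = level_spread_db(bursty)
print(f"hiss spread:   {hiss_spread:.1f} dB")
print(f"bursty spread: {bursty_spread:.1f} dB")
```

The steady hiss gives a spread of well under a couple of dB, while the bursty signal gives a much larger one. Real noises sit somewhere in between, which is why a single threshold is rarely enough in practice.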

Figure: Unwanted noises

2.1 Localized Noise Sources

A localized noise source is one that comes from a single well-defined point, such as a radio, a beamer or someone typing on a keyboard. Because it comes from one place the microphones will capture a different signal depending on the orientation of the device. If the device is close enough to the noise source, then some microphones may have a stronger signal than others.

2.2 Interfering Speaker

An interfering speaker is a localized noise source: a single individual talking near the device. This is a special noise type since it has many of the characteristics of the desired speech we want to keep. In handset mode this noise type is undesirable. In speakerphone mode it is indistinguishable from the speech we want to keep. Imagine a group of people in a meeting room using the device for a conference call: everyone is a possible talker!

2.3 Instrumental Noise

Instrumental noise is not acoustic noise but noise that has somehow got onto the microphone signal. In a well-designed phone this will be at a low level, but it will be there. Depending on how this noise is caused it can be stationary or non-stationary, diffuse or non-diffuse and anywhere in between.

2.4 Wind Noise

Wind noise is interesting since wind is not a noise itself. It causes turbulence around the microphones, which creates noise. Since this turbulence is very unstable, the noise changes frequently: often loud on one microphone, then on another, sometimes on several microphones together. It tends to be loud, especially at lower frequencies.

2.5 Body Noise

Just holding the phone in the hand can create noise, for example by scratching the device or changing the grip on the phone.

2.6 Echo

Echo arises when the voice from the other end of the call is played back over the phone’s internal speaker and picked up by the microphones. This is a localized source, since it comes from a single well-defined point. It is inside the case of the device and hence very close to the microphones. It is also a special localized noise source because we already know a lot about it: we have the signal itself as the far-end reference. In handset mode the playback volume is very low since the speaker is very close to the ear. In speakerphone mode the playback volume is as high as possible since the speaker is far from the ear of the listener(s). So, you can imagine that echo is more of a problem in speakerphone mode.
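
Having the far-end reference is what makes echo easier to attack than the other noises. As a rough illustration, here is a minimal sketch of a normalized LMS (NLMS) adaptive filter, one textbook way to use such a reference; this post does not say which method any particular phone uses. The echo path, the filter length and the step size `mu` are all hypothetical values chosen for the example, and the microphone signal contains only echo (no near-end speech or noise).

```python
import random

def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the echo from the microphone signal."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate  # what is left after removing the estimated echo
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual

def power(samples):
    """Mean squared value of a list of samples."""
    return sum(v * v for v in samples) / len(samples)

rng = random.Random(1)
far_end = [rng.gauss(0.0, 1.0) for _ in range(4000)]

# Hypothetical echo path from the loudspeaker to the microphone.
echo_path = [0.0, 0.6, 0.3, -0.1]
mic = [
    sum(g * far_end[n - k] for k, g in enumerate(echo_path) if n - k >= 0)
    for n in range(len(far_end))
]

residual = nlms_echo_canceller(far_end, mic)

# Compare the echo level before and after cancellation, once the filter has adapted.
echo_power = power(mic[-1000:])
residual_power = power(residual[-1000:])
print(f"echo power: {echo_power:.4f}, residual power: {residual_power:.2e}")
```

In this idealized setup the residual becomes very small once the filter has adapted. A real device has to cope with a much longer and changing echo path, loudspeaker distortion and people talking at both ends at once, all of which this sketch ignores.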

3. Desired Speech

To conclude this discussion, we must consider the desired speech itself: the little girl talking. How her speech is captured by the phone depends very much on how the phone is used. We generally consider two main use cases: handset mode and speakerphone mode.
In handset mode we have some advantages:

  • The phone is held as shown in the picture.
  • The received level of the desired speech is high so we normally have a good Signal to Noise Ratio, i.e. the speech is loud compared with the noise.
  • There is a significant difference in level between the microphones, because the distance between the microphones is large compared to the distance to the girl’s mouth. This difference in level is very useful when trying to discriminate between the girl’s speech and the more distant noise sources.

In speakerphone mode it is more challenging:

  • The phone is held in the hand at arm’s length in front of the face or placed on a table.
  • The desired speech is received at a lower level than in handset mode. The Signal to Noise Ratio is much worse, or the environment must be much quieter to achieve the same Signal to Noise Ratio.
  • There is little or no difference in the captured speech level at the different microphones. The distance between the microphones is small compared with the distance to the girl talking.
  • The character of the voice changes with distance. As the talker moves further away from the device the voice sounds more reverberant; this is particularly obvious in a large room.
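
The level differences described in the two lists above can be estimated with a simple calculation. The sketch below assumes free-field propagation, where sound pressure falls off as 1/distance, and uses made-up distances: 5 cm and 15 cm from the mouth to the two microphones in handset mode, and 50 cm and 60 cm in speakerphone mode. Real phones, hands and rooms will behave less ideally.

```python
import math

def level_difference_db(near_m, far_m):
    """Level drop in dB between two distances, assuming a 1/distance pressure law."""
    return 20.0 * math.log10(far_m / near_m)

# Assumed mouth-to-microphone distances in metres (illustrative only).
handset_diff = level_difference_db(0.05, 0.15)
speakerphone_diff = level_difference_db(0.50, 0.60)

# Drop in speech level at the nearest microphone when moving from handset
# to speakerphone use; with unchanged noise, the SNR falls by the same amount.
speech_drop = level_difference_db(0.05, 0.50)

print(f"handset mic-to-mic difference:      {handset_diff:.1f} dB")
print(f"speakerphone mic-to-mic difference: {speakerphone_diff:.1f} dB")
print(f"speech level drop in speakerphone:  {speech_drop:.1f} dB")
```

With these assumed distances the difference between the microphones is about 9.5 dB in handset mode but only about 1.6 dB in speakerphone mode, and the speech arrives 20 dB weaker, so the noise would have to be 20 dB quieter to keep the same Signal to Noise Ratio.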

Reverberation is not a noise source, but an accumulation of sounds bouncing around the room, being reflected again and again while slowly fading away. We want to remove this as well as the unwanted noises since it also degrades the quality of the speech. The further the talker moves from the phone, the more noticeable the (talker) reverberation becomes. When the talker is very close to the phone, the talker is a localized, non-diffuse source. As the talker moves away from the phone, the talker becomes more diffuse and less localized, and much more like many of the unwanted noise sources.
