Our contributions
- Audio-signal conditioning
- Adaptive beam-forming technology
- Speech recognition
- Digital noise reduction
- Text-to-speech synthesis

Voice-control systems are set to revolutionize
diagnostic and interventional procedures by allowing
clinicians to interact with and control medical equipment
via speech, even from several meters away. Clinical
trials of such systems are already proving highly
successful by providing clinicians with a
convenient, reliable and accurate means of
hands-free control that allows them greater freedom
to concentrate on their patients.
Audio-signal conditioning
The use of headsets or clip-ons to communicate with
the equipment, however, is inconvenient as it risks
compromising the sterile conditions. While
‘distant-talk’ microphones overcome this
problem, they risk picking up other signals such as
background noise, conversations and reverberation,
all of which degrade the audio quality of the voice
commands, often to the level where they can no
longer be reliably processed by a speech-recognizer.
Drawing on its world-class expertise in audio-signal
conditioning, Philips Applied Technologies has
addressed this issue by developing a range of audio
preconditioning solutions to maintain the high
quality of the signal path from the speaker to the
microphones.
Adaptive beam-forming technology
The voice-control system we developed incorporates
sophisticated adaptive beam-forming technology that
uses an array of microphones to locate and track the
person speaking. The system is triggered to lock on
to the speaker by means of a spoken wake-up call,
after which it can follow the speaker over the
limited distance necessary to perform the clinical
procedure. Once locked on, digital noise reduction
algorithms enhance the user’s voice commands and
filter out background noise and other people’s
voices, thus greatly improving the performance of
the voice recognizer. The head-tracking system
combined with the eye-tracking system allows the
orientation of the head to be measured.
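As an illustration only, the idea of steering a microphone array towards a talker can be sketched in Python with a basic two-microphone delay-and-sum beam-former. This is a textbook technique, not the adaptive algorithm used in the system described here. The function names, the simulated signals, the noise level and the 3-sample delay are all assumptions made for the example.

```python
import random

def estimate_delay(ref, other, max_lag):
    """Return the lag in samples at which `other` best lines up with `ref`.

    A positive lag means the sound reaches `other` later than `ref`.
    """
    n = len(ref)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation of the two channels at this candidate lag.
        score = sum(ref[i] * other[i + lag]
                    for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def delay_and_sum(channels, lags):
    """Advance each channel by its lag, then average the aligned channels."""
    n = len(channels[0])
    out = [0.0] * n
    for channel, lag in zip(channels, lags):
        for i in range(n):
            if 0 <= i + lag < n:
                out[i] += channel[i + lag]
    return [v / len(channels) for v in out]

def simulate_two_mics(n=2000, true_lag=3, noise=0.5, seed=0):
    """Hypothetical test data: one talker, two microphones, independent noise."""
    rng = random.Random(seed)
    clean = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mic_a = [s + rng.gauss(0.0, noise) for s in clean]
    mic_b = [(clean[i - true_lag] if i >= true_lag else 0.0)
             + rng.gauss(0.0, noise) for i in range(n)]
    return clean, mic_a, mic_b

def mean_square_error(signal, reference, count):
    """Average squared difference over the first `count` samples."""
    return sum((signal[i] - reference[i]) ** 2 for i in range(count)) / count

if __name__ == "__main__":
    clean, mic_a, mic_b = simulate_two_mics()
    lag = estimate_delay(mic_a, mic_b, max_lag=10)
    steered = delay_and_sum([mic_a, mic_b], [0, lag])
    usable = len(clean) - 10  # skip the tail, where only one channel contributes
    print("estimated lag:", lag)
    print("single-mic error:", mean_square_error(mic_a, clean, usable))
    print("beam-formed error:", mean_square_error(steered, clean, usable))
```

Because the talker's signal adds coherently once the channels are aligned while the independent noise does not, the averaged output should sit closer to the clean signal than either microphone alone. A practical array would use more microphones and update the lags continuously as the speaker moves.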
An extra level of tracking reliability is provided
by supplementing the beam-forming technology with
video tracking. We are currently conducting
feasibility studies on a new face-identification
algorithm developed by Philips Research.
To allow natural communication via the voice-control
system, for example between a surgeon and a patient
or another member of the clinical staff, Philips
Applied Technologies has also introduced techniques
to allow full duplex communication. These include a
proprietary acoustic echo cancellation technique for
suppressing acoustic feedback or ringing. This was
initially developed by Philips for the mobile-phone
and conference phone markets and has been optimized
by Philips Applied Technologies for use in voice
control.
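The Philips echo cancellation technique is proprietary and is not described here. As a generic illustration of the principle, the following Python sketch uses the standard normalised least-mean-squares (NLMS) adaptive filter to estimate the echo of a loudspeaker signal and subtract it from the microphone signal. The function names, filter length, step size and simulated echo path are assumptions chosen for the example, and the simulation contains no near-end speech.

```python
import random

def nlms_echo_canceller(far_end, microphone, taps=8, step=0.5, eps=1e-6):
    """Subtract an adaptively estimated echo of `far_end` from `microphone`.

    Returns the echo-reduced signal and the final filter weights.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # recent loudspeaker samples, newest first
    cleaned = []
    for x, d in zip(far_end, microphone):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate  # what remains after removing the estimate
        energy = eps + sum(h * h for h in history)
        # NLMS update: the step is scaled by the recent loudspeaker energy.
        weights = [w + step * error * h / energy
                   for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned, weights

def simulate_echo(n=2000, path=(0.0, 0.6, 0.3, -0.1), seed=0):
    """Hypothetical room: the microphone hears only a filtered loudspeaker signal."""
    rng = random.Random(seed)
    far_end = [rng.gauss(0.0, 1.0) for _ in range(n)]
    microphone = [sum(h * far_end[i - k]
                      for k, h in enumerate(path) if i - k >= 0)
                  for i in range(n)]
    return far_end, microphone

if __name__ == "__main__":
    far_end, microphone = simulate_echo()
    cleaned, weights = nlms_echo_canceller(far_end, microphone)
    print("learned echo path:", [round(w, 3) for w in weights[:4]])
```

In full duplex use, the local talker's voice is also present in the microphone signal, so a practical canceller typically slows or freezes adaptation while both parties speak. That logic is omitted from this sketch.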
Text-to-speech
A sophisticated text-to-speech synthesizer, which we
developed in close cooperation with Philips
Research, is also being incorporated into the
voice-control system. This vocalizes selected
on-screen text so that the clinician can receive
prompts and other information without needing to
look at the monitor screen. The synthesizer is
highly flexible, generating natural-sounding speech
in a variety of voices, accents and moods. It also
has very low implementation complexity, running not
only on PCs but also on telephones and other
equipment that uses relatively low-cost processors
(e.g. ARM7).