ISSP 2006: Coupling electromagnetic sensors and ultrasound images for tongue tracking: acquisition set up and preliminary results

Electromagnetic sensors

The tracking system is an Aurora miniature electromagnetic (EM) system. This system includes a magnetic field generator (MFG), a system control unit that communicates with a PC through the serial port, and miniature coils. Each coil provides five degrees of freedom (DOF): three in position and two in orientation (reported in quaternion format). The manufacturer's specifications quote a positional accuracy of 1-2 mm and an angular accuracy of 0.6 degrees within the sensitive volume covered by the MFG field (approx. 50 cm x 50 cm x 50 cm).
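
As an illustration of how a 5DOF reading can be interpreted, the sketch below converts a unit quaternion into the direction of the coil axis. It assumes a (w, x, y, z) component order and that the coil axis is the sensor's local z axis. Both are assumptions made for this example and should be checked against the tracker documentation. The function name is hypothetical.

```python
import math

def coil_axis_from_quaternion(w, x, y, z):
    """Rotate the local z axis (assumed coil axis) by the quaternion (w, x, y, z)."""
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("zero quaternion has no orientation")
    # Normalise so that small numerical drift does not scale the result.
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    # Third column of the rotation matrix built from a unit quaternion.
    return (
        2.0 * (x * z + w * y),
        2.0 * (y * z - w * x),
        1.0 - 2.0 * (x * x + y * y),
    )

# Example: a 90 degree rotation about the x axis.
half = math.radians(90.0) / 2.0
axis = coil_axis_from_quaternion(math.cos(half), math.sin(half), 0.0, 0.0)
print(axis)
```

A rotation about the coil's own axis leaves this direction unchanged, which is why such a coil yields only two orientation DOF rather than three.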

Ultrasound images

A Logiq5 ultrasound (US) machine from GE Healthcare is used. Our transducer is a microconvex 8C probe, which operates between 5 MHz and 9 MHz. The probe is placed under the speaker's chin to acquire images of the shape of the tongue. As the tongue is located between 3 cm and 7 cm from the chin during speech production, the acquisition rate for US images can vary between 50 Hz and 100 Hz, depending on the scanning area, the depth of penetration, and the required image precision, among other settings.
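
To see why the frame rate depends on the penetration depth and the scanning area, the sketch below computes a simple upper bound on the frame rate from the round-trip travel time of one scan line. It assumes one transmit pulse per line and the conventional average speed of sound in soft tissue (1540 m/s). The value of 128 lines per frame is a hypothetical setting, not a specification of the machine used here.

```python
SPEED_OF_SOUND_M_PER_S = 1540.0  # conventional average value for soft tissue

def max_frame_rate_hz(depth_m, lines_per_frame):
    """Upper bound on the frame rate when each scan line waits for its echo."""
    if depth_m <= 0 or lines_per_frame <= 0:
        raise ValueError("depth and number of lines must be positive")
    # Time for a pulse to reach the maximum depth and come back.
    round_trip_s = 2.0 * depth_m / SPEED_OF_SOUND_M_PER_S
    return 1.0 / (lines_per_frame * round_trip_s)

# Depths spanning the tongue distances quoted above (3 cm to 7 cm).
for depth_cm in (3, 5, 7):
    rate = max_frame_rate_hz(depth_cm / 100.0, lines_per_frame=128)
    print(f"depth {depth_cm} cm: at most {rate:.0f} frames per second")
```

A shallower depth or a narrower scanning area (fewer lines) raises the bound. Real machines add processing overhead, so actual rates are lower.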

Why mix US and EM data?

During speech production, there is often an air cavity below the tongue. Since ultrasound cannot cross air, the US technique cannot recover the whole shape of the tongue: areas such as the tip of the tongue (the apex) cannot be tracked with ultrasound alone. We therefore place an EM coil where the US cannot track the tongue, in order to recover a larger part of it.
The EM data are superimposed on the US images. One 5DOF coil is glued on the tongue apex. To check whether the EM sensors can track the movements of the tongue on US images, another 5DOF coil is glued on the middle of the tongue.

EM-US coupling

EM data need to be spatially and temporally calibrated with respect to the US system. This means that the spatial transformation between the US and EM coordinate systems needs to be computed (this is a rigid transformation, i.e. a translation and a rotation), and that a temporal calibration has to be performed so that the two modalities share the same time sampling.
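
The two calibration steps can be sketched as follows. This is a minimal illustration, not the calibration procedure used in the experiments: it assumes the rotation R and translation t are already known, and it uses linear interpolation to resample EM positions at the US frame times. All function names, the sample values, and the timestamps are hypothetical.

```python
import math
from bisect import bisect_left

def rotation_z(angle_rad):
    """3x3 rotation matrix about the z axis (axis chosen only for the example)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_rigid(R, t, p):
    """Map a point p from the EM frame to the US frame: p_us = R p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def resample_linear(times, values, query_times):
    """Linearly interpolate 3D samples (times sorted, strictly increasing).

    Queries outside the sampled interval are clamped to the nearest sample.
    """
    out = []
    for q in query_times:
        k = bisect_left(times, q)
        if k == 0:
            out.append(values[0])
        elif k == len(times):
            out.append(values[-1])
        else:
            t0, t1 = times[k - 1], times[k]
            w = (q - t0) / (t1 - t0)
            out.append(tuple((1.0 - w) * a + w * b
                             for a, b in zip(values[k - 1], values[k])))
    return out

# Hypothetical EM samples (seconds, millimetres) and US frame times.
em_times = [0.000, 0.025, 0.050]
em_points = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (2.0, 4.0, 0.0)]
us_frame_times = [0.010, 0.030]

R = rotation_z(math.radians(90.0))  # hypothetical calibration result
t = (10.0, 0.0, 0.0)                # hypothetical calibration result

em_at_us_times = resample_linear(em_times, em_points, us_frame_times)
em_in_us_frame = [apply_rigid(R, t, p) for p in em_at_us_times]
print(em_in_us_frame)
```

Resampling before or after the spatial mapping gives the same positions here, because a rigid transformation is affine and commutes with linear interpolation.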

Experiments on the tongue

  • Sequence 1: /au/, /atu/, /aku/, /ao/, /ako/, /ae/, /ake/, /ate/.
  • Sequence 2: "La bise et le soleil se disputaient, chacun assurant qu'il était le plus fort"
Sequence 1 with sound
Sequence 1 mixed with a video acquisition of the experimental set-up
Sequence 2 without sound
Sequence 2 with sound