Robust speech perception

Robust Speech Perception – Arman Savran

The fine temporal dynamics of ED vision can be exploited to build a speech recognition system that uses speech-production-related information, such as lip movement, opening, closure, and shape. This information can improve models of the temporal dynamics of speech and compensate for poor acoustic information in noisy environments. The temporal features extracted from the ED visual signal will be used for cross-modal ED speech segmentation, which is as yet unexplored and will drive the processing of speech. To increase robustness to acoustic noise and atypical speech, acoustic and visual features will be combined to recover the phonetic gestures of the inner vocal tract (articulatory features). Visual, acoustic, and recovered articulatory features will together form the observation domain of a novel speech recognition system for the robust recognition of key phrases.
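As a rough illustration of two ideas in the paragraph above, the sketch below segments speech from visual activity and then concatenates feature streams into one observation vector per frame. It is not the project's method. It assumes that ED visual events from the lip region arrive as integer microsecond timestamps, that a simple per-bin event count is an adequate activity measure, and that the three feature streams are already frame-aligned. The function names, parameters, and threshold are all hypothetical.

```python
from typing import List, Tuple


def segment_by_lip_activity(
    event_times_us: List[int], bin_width_us: int, threshold: int
) -> List[Tuple[int, int]]:
    """Return (start_us, end_us) intervals whose per-bin event count
    reaches `threshold`. Hypothetical stand-in for visual segmentation."""
    if not event_times_us:
        return []
    n_bins = max(event_times_us) // bin_width_us + 1
    counts = [0] * n_bins
    for t in event_times_us:
        counts[t // bin_width_us] += 1

    segments = []
    start = None  # index of the first active bin in the current segment
    for i, count in enumerate(counts):
        if count >= threshold and start is None:
            start = i
        elif count < threshold and start is not None:
            segments.append((start * bin_width_us, i * bin_width_us))
            start = None
    if start is not None:  # activity continued up to the last bin
        segments.append((start * bin_width_us, n_bins * bin_width_us))
    return segments


def fuse_observations(
    acoustic: List[List[float]],
    visual: List[List[float]],
    articulatory: List[List[float]],
) -> List[List[float]]:
    """Concatenate per-frame feature vectors (early fusion).
    Assumes the three streams have one vector per shared frame."""
    if not (len(acoustic) == len(visual) == len(articulatory)):
        raise ValueError("feature streams must have the same number of frames")
    return [a + v + r for a, v, r in zip(acoustic, visual, articulatory)]


if __name__ == "__main__":
    events = [100, 200, 300, 10_500, 10_600, 10_700, 10_800, 25_000]
    print(segment_by_lip_activity(events, bin_width_us=10_000, threshold=3))
    print(fuse_observations([[1.0, 2.0]], [[0.5]], [[0.1]]))
```

Concatenation is only the simplest way to combine modalities; a real system might weight or model the streams separately, and the source does not say which approach is intended.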