Prof. Hervé Bourlard

SCALE Workshop, Saarbrücken, January 12, 2010

Idiap Research Institute, EPFL

Idiap Research Institute
Centre du Parc, P.O. Box 592
CH – 1920 Martigny
+41 27 721 77 11
http://www.idiap.ch

Idiap Profile

Independent, not-for-profit research institute
• Founded in 1991
• Around 100 collaborators (> 25 countries)
• Budget: around 10 MCHF
• Centre du Parc in Martigny (2300 m²)
• 37 research programs (> 130 publications/year)
• Affiliated with EPFL (joint development plan) and University of Geneva
• Accredited (and co-funded) by the Federal Government, State and City, as part of the "ETH Strategic Domain"
• Host institution of the Swiss National Centre of Competence in Research on "Interactive Multimodal Information Management" (IM2)

HUMAN AND MEDIA COMPUTING

• Perceptual and cognitive systems
– Speech processing
– Document and text processing
– Natural language understanding and translation
– Vision and scene analysis
– Multimodal processing
– Computational cognitive science
• Online learning & categorization
• Social/human behavior
– Web social media
– Mobile social media
– Social interaction sensing
– Social signal processing
– Verbal and nonverbal communication analysis
• Information interfaces and presentation
– Multimedia information systems
– User interfaces
– System evaluation
• Biometric person recognition
– Speaker identification & verification
– Face detection, tracking & recognition
– Multimodal fusion
• Machine learning
– Statistical and neural network based ML (strong)
– Computational efficiency, targeting real-time applications
– Very large datasets
– Online learning

All details of current activities available at: http://www.idiap.ch/scientific-research/themes

Activities in Perceptual and Cognitive Systems

http://www.idiap.ch/scientific-research/themes/perceptual-and-cognitive-systems

• Natural language understanding and translation
– Semantic disambiguation using networks of concepts extracted from Wikipedia [started 2008]
– Identification of discourse markers in dialogues [finished 2009]
– Normalizing the evaluation of machine translation
– Improving statistical machine translation using discourse-level information [Sinergia just accepted]
• Multimodal object modeling
• Semantic robot localization
• Vision and scene analysis
• Speech processing (next slide)

Activities in Speech Processing

• Speech/non-speech detection (including approaches discarding all lexical and speaker ID information)
• Speaker turn detection, segregation, and diarization
– Based on acoustic features (new BIC, information bottleneck)
– Based on sound source localization (mic array)
– Based on both (fusion)
• Speech localization, beamforming, overlapping and reverberant speech
• Speaker identification
• Conversational speech recognition
– Improvement of the real-time Juicer LVCSR system, released as an open-source public library: http://juicer.amiproject.org/juicer/
– New acoustic features based on subword (phone) posterior distributions
– New ways to use those posterior features
• Extraction of audio metadata, dialog acts, hesitations, etc.
• HMM-based speech synthesis

Template-based associative memories
PhD student: Serena Soldo

Perceptual studies on humans suggest:
• Both verbal and non-verbal information are stored as templates and used during speech recognition
• Speech perception is usually explained in terms of associations to concepts

Project:
• Jointly investigate template-based approaches and associative memory techniques

Template-based recognition

Task
• Isolated word recognition using the Phonebook (PB) speech corpus

Posterior features estimated by MLP
• MLP trained on PB
• MLP trained on an auxiliary corpus (Conversational Telephone Speech, CTS)

New type of template/HMM parametrized by posterior distributions

Investigated distance measures
• Geometric measures (Euclidean distance, cosine angle)
• Probabilistic measures (Kullback-Leibler divergence, Bhattacharyya distance, Hellinger distance)
• Linguistic class based measures (scalar product, cross entropy)
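The measures listed above can be sketched for two posterior vectors p and q (one probability per phone class). This is a minimal illustration, not the implementation used in the project: the function names, the EPS floor, and the choice to turn the scalar product into a cost with a negative log are assumptions, since the slides give only the names of the measures.

```python
import numpy as np

EPS = 1e-12  # assumed floor to avoid log(0) and division by zero


def _norm(p):
    """Clip to EPS and renormalize so the vector is a valid distribution."""
    p = np.clip(np.asarray(p, dtype=float), EPS, None)
    return p / p.sum()


def euclidean(p, q):
    return float(np.linalg.norm(_norm(p) - _norm(q)))


def cosine_distance(p, q):
    p, q = _norm(p), _norm(q)
    return float(1.0 - p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))


def kl_divergence(p, q):
    """KL(p || q); asymmetric, so the argument order matters."""
    p, q = _norm(p), _norm(q)
    return float(np.sum(p * np.log(p / q)))


def bhattacharyya(p, q):
    p, q = _norm(p), _norm(q)
    return float(-np.log(np.sum(np.sqrt(p * q))))


def hellinger(p, q):
    p, q = _norm(p), _norm(q)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


def neg_log_scalar_product(p, q):
    """Scalar product turned into a cost via -log (an assumption)."""
    p, q = _norm(p), _norm(q)
    return float(-np.log(p @ q))


def cross_entropy(p, q):
    p, q = _norm(p), _norm(q)
    return float(-np.sum(p * np.log(q)))
```

Note that the scalar-product and cross-entropy costs are not zero for two identical non-degenerate posteriors, unlike the geometric and probabilistic measures.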

Some results
• Although the scalar product is "theoretically optimal", KL-based measures yield better performance.
• With a sufficient amount of training data, an MLP trained on the auxiliary corpus can achieve performance comparable to matched conditions. The amount of data required also depends on the choice of local score.
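To make the role of the local score concrete, here is a minimal sketch of template matching over sequences of posterior vectors. It assumes dynamic time warping (DTW) as the alignment method and KL divergence as the local score; the slides do not name the alignment algorithm, the KL argument order, or the path constraints, so all of these, along with the function names, are illustrative assumptions.

```python
import numpy as np


def kl_local_score(p, q, eps=1e-12):
    """KL(p || q) between a template frame p and a test frame q."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))


def dtw_cost(template, test, local_score=kl_local_score):
    """Length-normalized DTW cost between two posterior sequences.

    template, test: arrays of shape (frames, classes).
    Allowed steps (assumed): vertical, horizontal, diagonal.
    """
    T, U = len(template), len(test)
    acc = np.full((T + 1, U + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, U + 1):
            d = local_score(template[i - 1], test[j - 1])
            acc[i, j] = d + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[T, U] / (T + U)


def recognize(test, templates, local_score=kl_local_score):
    """Return the word whose template has the lowest DTW cost."""
    return min(templates, key=lambda w: dtw_cost(templates[w], test, local_score))
```

Swapping `local_score` for another measure changes only the frame-level cost, which is the knob the results above refer to.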

Future work
• Continue the work on template-based ASR, extending it towards binary representations and investigating associative memory techniques.

Sparse Component Analysis for Robust DSR

• Distant Speech Recognition (DSR) difficulties
– Overlapping speech
– Reverberation
• Sparse Component Analysis
– Number of sensors < number of speakers
– The sparser the representation, the more efficient the separation is expected to be
• What is the best sparse representation?
– Time-frequency representations
– Gabor features

Auditory Sparsity and Sparse Component Analysis

Long-term goal: incorporating auditory sparsity in SCA
• Gabor filtering of the spectro-temporal representation of speech
• Deploy the detected Gabor patterns in blind source separation
• So far: DUET algorithm
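As a rough illustration of Gabor filtering on a spectro-temporal representation, the sketch below builds a 2D complex Gabor kernel (Gaussian envelope times a complex sinusoid) and correlates it with a spectrogram. The parametrization, the DC removal, the edge padding, and all names are assumptions; the slides do not specify the filter design.

```python
import numpy as np


def gabor_kernel(omega_f, omega_t, sigma_f, sigma_t, half_f, half_t):
    """Complex 2D Gabor kernel of shape (2*half_f + 1, 2*half_t + 1).

    omega_f, omega_t: modulation frequencies in radians per bin / per frame.
    sigma_f, sigma_t: widths of the Gaussian envelope.
    """
    f = np.arange(-half_f, half_f + 1)[:, None]
    t = np.arange(-half_t, half_t + 1)[None, :]
    envelope = np.exp(-0.5 * ((f / sigma_f) ** 2 + (t / sigma_t) ** 2))
    carrier = np.exp(1j * (omega_f * f + omega_t * t))
    kernel = envelope * carrier
    # Remove the mean so a flat spectrogram gives a zero response.
    return kernel - kernel.mean()


def gabor_filter(spectrogram, kernel):
    """Magnitude of the correlation between spectrogram and kernel.

    spectrogram: real array of shape (frequency bins, time frames).
    Output has the same shape (edges are padded by replication).
    """
    kf, kt = kernel.shape
    padded = np.pad(spectrogram, ((kf // 2, kf // 2), (kt // 2, kt // 2)), mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kf, kt))
    return np.abs(np.einsum("ftij,ij->ft", windows, np.conj(kernel)))
```

A kernel tuned to a given temporal modulation responds strongly where the spectrogram ripples at that rate and weakly elsewhere, which is the kind of localized pattern detection the bullets above describe.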

[Figure: Distant Speech Recognition front-end. Block labels: Auditory Sparse Representation, Sparse Component Analysis (SCA), Speech Recognition]

Degenerate Unmixing Estimation Technique (DUET)

• Clustering of each source's components based on delay and attenuation, and separation by masking in the spectro-temporal domain
• Synthesized stereo mixtures from Aurora2
– M1 = S1 + S2 + S3
– M2 = a1×S1 + a2×S2 + a3×S3
– a1 = 1/1.3, a2 = 1.3, a3 = 1.08/1.23
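The mixing model and the masking step can be sketched as follows. This is a simplified, DUET-style illustration rather than the full algorithm: the attenuations are taken as known (DUET would estimate them from peaks in a delay/attenuation histogram), delay is ignored because the mixtures above contain none, and the STFT settings and function names are assumptions.

```python
import numpy as np

# Attenuations from the slide; the synthesized mixtures have no delay.
ATTENUATIONS = np.array([1 / 1.3, 1.3, 1.08 / 1.23])


def mix(sources, attenuations=ATTENUATIONS):
    """Build the stereo mixture M1 = sum(S_k), M2 = sum(a_k * S_k)."""
    sources = np.asarray(sources, dtype=float)
    m1 = sources.sum(axis=0)
    m2 = (attenuations[:, None] * sources).sum(axis=0)
    return m1, m2


def stft(x, n_fft=256, hop=128):
    """Hann-windowed STFT, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def assign_bins(m1, m2, attenuations=ATTENUATIONS, eps=1e-12):
    """Label each time-frequency bin with the source whose attenuation
    is closest (in the log domain) to the observed |M2| / |M1| ratio."""
    M1, M2 = stft(m1), stft(m2)
    log_ratio = np.log((np.abs(M2) + eps) / (np.abs(M1) + eps))
    labels = np.abs(log_ratio[..., None] - np.log(attenuations)).argmin(axis=-1)
    return M1, labels


def separate(m1, m2, attenuations=ATTENUATIONS):
    """Apply one binary mask per source to the STFT of the first channel."""
    M1, labels = assign_bins(m1, m2, attenuations)
    return [M1 * (labels == k) for k in range(len(attenuations))]
```

The approach relies on the sources rarely overlapping in the same time-frequency bin; where they do overlap, the observed ratio falls between the attenuations and the bin is assigned to the nearest one.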

Gabor-Posteriors, Aurora2

                            Baseline   DUET
Clean Training              14.18      93.38
Multi-Condition Training    19.35      91.66
