SCALE Workshop, Saarbrücken, January 12, 2010 Prof. Hervé Bourlard Idiap Research Institute EPFL Idiap Research Institute Centre du Parc P.O Box 592 CH – 1920 Martigny +41 27 721 77 11 http://www.idiap.ch


Page 1: Prof. Hervé Bourlard

SCALE Workshop, Saarbrücken, January 12, 2010

Prof. Hervé Bourlard
Idiap Research Institute
EPFL

Idiap Research Institute
Centre du Parc
P.O. Box 592
CH – 1920 Martigny
+41 27 721 77 11
http://www.idiap.ch

Page 2: Prof. Hervé Bourlard

Copyright © 2009 Idiap – www.idiap.ch 2

Idiap Profile

Independent, not-for-profit research institute
• Founded in 1991
• Around 100 collaborators (> 25 countries)
• Budget: around 10 MCHF
• Centre du Parc in Martigny (2300 m²)
• 37 research programs (> 130 publications/year)
• Affiliated with EPFL (joint development plan) and University of Geneva
• Accredited (and co-funded) by the Federal Government, State and City, as part of the « ETH Strategic Domain »
• Host institution of the CH National Centre of Competence in Research on « interactive multimodal information management » (IM2)

Page 3: Prof. Hervé Bourlard

HUMAN AND MEDIA COMPUTING

• Perceptual and cognitive systems
  – Speech processing
  – Document and text processing
  – Natural language understanding and translation
  – Vision and scene analysis
  – Multimodal processing
  – Computational cognitive science
• Online learning & Categorization
• Social/human behavior
  – Web social media
  – Mobile social media
  – Social interaction sensing
  – Social signal processing
  – Verbal and nonverbal communication analysis
• Information interfaces and presentation
  – Multimedia information systems
  – User interfaces
  – System evaluation
• Biometric person recognition
  – Speaker identification & verification
  – Face detection, tracking & recognition
  – Multimodal fusion
• Machine learning
  – Statistical and neural network based ML (strong)
  – Computational efficiency, targeting real-time applications
  – Very large datasets
  – Online learning

All details of current activities available at: http://www.idiap.ch/scientific-research/themes

Page 4: Prof. Hervé Bourlard

Activities in Perceptual and Cognitive Systems

http://www.idiap.ch/scientific-research/themes/perceptual-and-cognitive-systems

• Natural language understanding and translation
  – Semantic disambiguation using networks of concepts extracted from Wikipedia [started 2008]
  – Identification of discourse markers in dialogues [finished 2009]
  – Normalizing the evaluation of machine translation
  – Improving statistical machine translation using discourse-level information [Sinergia just accepted]
• Multimodal object modeling
• Semantic robot localization
• Vision and scene analysis
• Speech Processing (next slide)

Page 5: Prof. Hervé Bourlard

Activities in Speech Processing

• Speech/non-speech detection (including approaches discarding all lexical and speaker ID information)
• Speaker turn detection, segregation, and diarization
  – Based on acoustic features (new BIC, information bottleneck)
  – Based on sound source localization (mic array)
  – Based on both (fusion)
• Speech localization, beamforming, overlapping and reverberant speech
• Speaker identification
• Conversational speech recognition
  – Improvement of the realtime Juicer LVCSR system, released as open source public library: http://juicer.amiproject.org/juicer/
  – New acoustic features based on subword (phone) posterior distributions
  – New ways to use those posterior features
• Extraction of audio metadata, dialog acts, hesitations, etc.
• HMM-based speech synthesis

Page 6: Prof. Hervé Bourlard

Template-based associative memories
PhD student: Serena Soldo

Perceptual studies on humans suggest:
• Both verbal and non-verbal information are stored as templates and used during speech recognition
• Speech perception is usually explained in terms of associations to concepts

Project:
• Jointly investigate the use of template-based approaches along with the application of associative memory techniques.

Page 7: Prof. Hervé Bourlard

Template-based recognition

Task
• Isolated word recognition using the Phonebook (PB) speech corpus

Posterior features estimated by MLP
• MLP trained on PB
• MLP trained on auxiliary corpus (Conversational Telephone Speech, CTS)

New type of template/HMM parametrized by posterior distributions

Investigated distance measures
• Geometric measures (Euclidean distance, cosine angle)
• Probabilistic measures (Kullback-Leibler divergence, Bhattacharyya distance, Hellinger distance)
• Linguistic class based measures (scalar product, cross entropy)
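
The distance measures listed above can be sketched for a pair of posterior vectors as follows. This is a minimal illustration, not the implementation used in the project: the function names, the `EPS` floor, and the example vectors are hypothetical, and the scalar-product score is assumed here to be the negative log of the dot product, which the slides do not specify.

```python
import numpy as np

EPS = 1e-12  # hypothetical floor to avoid log(0); not taken from the slides


def as_distribution(p):
    """Floor a posterior vector and renormalize it so it sums to 1."""
    p = np.asarray(p, dtype=float) + EPS
    return p / p.sum()


# Geometric measures
def euclidean(p, q):
    p, q = as_distribution(p), as_distribution(q)
    return float(np.linalg.norm(p - q))


def cosine_angle(p, q):
    p, q = as_distribution(p), as_distribution(q)
    cos = np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


# Probabilistic measures
def kl_divergence(p, q):
    """KL(p || q); note that it is not symmetric in p and q."""
    p, q = as_distribution(p), as_distribution(q)
    return float(np.sum(p * np.log(p / q)))


def bhattacharyya(p, q):
    p, q = as_distribution(p), as_distribution(q)
    return float(-np.log(np.sum(np.sqrt(p * q))))


def hellinger(p, q):
    p, q = as_distribution(p), as_distribution(q)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


# Class based measures (the exact definitions are an assumption)
def neg_log_scalar_product(p, q):
    p, q = as_distribution(p), as_distribution(q)
    return float(-np.log(np.dot(p, q)))


def cross_entropy(p, q):
    p, q = as_distribution(p), as_distribution(q)
    return float(-np.sum(p * np.log(q)))


# Hypothetical 3-class phone posteriors for two frames
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
for name, fn in [("euclidean", euclidean), ("cosine angle", cosine_angle),
                 ("KL", kl_divergence), ("Bhattacharyya", bhattacharyya),
                 ("Hellinger", hellinger),
                 ("-log scalar product", neg_log_scalar_product),
                 ("cross entropy", cross_entropy)]:
    print(f"{name}: {fn(p, q):.4f}")
```

Each function returns a value that grows as the two posterior vectors diverge, so any of them could serve as a local score between a template frame and a test frame.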

Page 8: Prof. Hervé Bourlard

Some results
• Although the scalar product is “theoretically optimal”, KL-based measures yield better performance.
• With a sufficient amount of training data, an MLP trained on the auxiliary corpus can achieve performance comparable to the matched condition. The amount of data needed also depends on the choice of local score.

Future work
• Continuing the work on template-based ASR and extending it towards binary representations and the investigation of associative memory techniques.
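
How a local score enters template matching can be illustrated with a small sketch. It assumes dynamic time warping (DTW) as the matching procedure and KL divergence as the local score; the slides do not state which alignment method was used, and the word labels, template values, and function names below are hypothetical.

```python
import numpy as np


def kl_local_score(p, q, eps=1e-12):
    """KL divergence between two posterior frames (eps is a hypothetical floor)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)))


def dtw_cost(template, test, local_score=kl_local_score):
    """Accumulated cost of the best monotonic alignment of two posterior sequences."""
    n, m = len(template), len(test)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_score(template[i - 1], test[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])


def recognize(templates, test, local_score=kl_local_score):
    """Return the word whose template has the lowest DTW cost to the test sequence."""
    return min(templates, key=lambda w: dtw_cost(templates[w], test, local_score))


# Hypothetical posterior templates over 3 phone classes, one per word
templates = {
    "yes": np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "no": np.array([[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]),
}
# A time-stretched version of "yes": the first frame is repeated
test_utterance = np.array([[0.8, 0.1, 0.1], [0.8, 0.1, 0.1], [0.1, 0.8, 0.1]])
result = recognize(templates, test_utterance)
print(result)
```

Swapping `kl_local_score` for another measure changes only the per-frame cost, which is one way to compare local scores under the same alignment procedure.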

Page 9: Prof. Hervé Bourlard

Sparse Component Analysis for Robust DSR

• Distant Speech Recognition (DSR) difficulties
  – Overlapping speech
  – Reverberation
• Sparse Component Analysis
  – Number of sensors < Number of speakers
  – The sparser the representation, the more efficient the separation performance is expected to be
• What is the best sparse representation?
  – Time frequency representations
  – Gabor features

Page 10: Prof. Hervé Bourlard

Auditory Sparsity and Sparse Component Analysis

Long term goal: Incorporating Auditory Sparsity in SCA
• Gabor filtering of the spectro-temporal representation of speech
• Deploy the detected Gabor patterns in blind source separation
• So far: DUET algorithm

[Diagram labels: Speech Recognition; Sparse Component Analysis (SCA); Auditory Sparse Representation; Distant Speech Recognition Front-End]

Page 11: Prof. Hervé Bourlard

Degenerate Unmixing Estimation Technique (DUET)

• Clustering the components of each source based on delay and attenuation
• Separation by masking in the spectro-temporal domain
• Synthesized stereo mixtures from Aurora2
  – M1 = S1 + S2 + S3
  – M2 = a1×S1 + a2×S2 + a3×S3
  – a1 = 1/1.3, a2 = 1.3, a3 = 1.08/1.23

Gabor-Posteriors         Aurora2 Baseline    DUET
Clean Training           14.18               93.38
Multi-Condition Training 19.35               91.66
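
The stereo mixing model and the masking step on this slide can be sketched as follows. This is a deliberately simplified illustration, not the experimental setup: it works directly on synthetic time-frequency coefficients rather than Aurora2 speech, it assumes exactly one source is active per time-frequency bin, it uses attenuation only (the mixtures on the slide contain no delay), and it assigns bins to the known attenuations a1, a2, a3 instead of finding histogram peaks as a full DUET implementation would. All function names are hypothetical.

```python
import numpy as np


def mix_stereo(S, a):
    """M1 = sum_k S_k and M2 = sum_k a_k * S_k for sources S of shape (K, F, T)."""
    M1 = S.sum(axis=0)
    M2 = np.tensordot(a, S, axes=1)
    return M1, M2


def attenuation_masks(M1, M2, a, eps=1e-12):
    """Assign each active bin to the source whose attenuation best matches |M2/M1|."""
    active = np.abs(M1) > eps
    ratio = np.zeros(M1.shape)
    ratio[active] = np.abs(M2[active]) / np.abs(M1[active])
    # Compare in the log domain so attenuations above and below 1 are treated evenly
    dist = np.abs(np.log(ratio[None] + eps) - np.log(a)[:, None, None])
    labels = dist.argmin(axis=0)
    return np.stack([(labels == k) & active for k in range(len(a))])


rng = np.random.default_rng(0)
F, T = 64, 50
a = np.array([1 / 1.3, 1.3, 1.08 / 1.23])  # attenuations from the slide

# Synthetic sparse sources: every bin belongs to exactly one source (assumption)
owner = rng.integers(0, 3, size=(F, T))
coeffs = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
S = np.stack([np.where(owner == k, coeffs, 0) for k in range(3)])

M1, M2 = mix_stereo(S, a)
masks = attenuation_masks(M1, M2, a)
S_hat = masks * M1  # separation by binary masking of the first channel
print("max reconstruction error:", np.abs(S_hat - S).max())
```

With real speech the sources overlap in some bins, so the recovered signals are only approximations and the quality depends on how sparse the chosen representation is.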