SCALE Workshop, Saarbrücken, January 12, 2010
Prof. Hervé Bourlard
Idiap Research Institute, EPFL
Idiap Research Institute, Centre du Parc, P.O. Box 592, CH – 1920 Martigny
+41 27 721 77 11
http://www.idiap.ch
Copyright © 2009 Idiap – www.idiap.ch 2
Idiap Profile
Independent, not-for-profit research institute
• Founded in 1991
• Around 100 collaborators (> 25 countries)
• Budget: around 10 MCHF
• Centre du Parc in Martigny (2300 m²)
• 37 research programs (> 130 publications/year)
• Affiliated with EPFL (joint development plan) and University of Geneva
• Accredited (and co-funded) by the Federal Government, State and City, as part of the « ETH Strategic Domain »
• Host institution of the CH National Centre of Competence in Research on « interactive multimodal information management » (IM2)
HUMAN AND MEDIA COMPUTING
• Perceptual and cognitive systems
  – Speech processing
  – Document and text processing
  – Natural language understanding and translation
  – Vision and scene analysis
  – Multimodal processing
  – Computational cognitive science
• Online learning & categorization
• Social/human behavior
  – Web social media
  – Mobile social media
  – Social interaction sensing
  – Social signal processing
  – Verbal and nonverbal communication analysis
• Information interfaces and presentation
  – Multimedia information systems
  – User interfaces
  – System evaluation
• Biometric person recognition
  – Speaker identification & verification
  – Face detection, tracking & recognition
  – Multimodal fusion
• Machine learning
  – Statistical and neural network based ML (strong)
  – Computational efficiency, targeting real-time applications
  – Very large datasets
  – Online learning
All details of current activities available at: http://www.idiap.ch/scientific-research/themes
Activities in Perceptual and Cognitive Systems
http://www.idiap.ch/scientific-research/themes/perceptual-and-cognitive-systems
• Natural language understanding and translation
  – Semantic disambiguation using networks of concepts extracted from Wikipedia [started 2008]
  – Identification of discourse markers in dialogues [finished 2009]
  – Normalizing the evaluation of machine translation
  – Improving statistical machine translation using discourse-level information [Sinergia just accepted]
• Multimodal object modeling
• Semantic robot localization
• Vision and scene analysis
• Speech processing (next slide)
Activities in Speech Processing
• Speech/non-speech detection (including approaches discarding all lexical and speaker ID information)
• Speaker turn detection, segregation, and diarization
  – Based on acoustic features (new BIC, information bottleneck)
  – Based on sound source localization (mic array)
  – Based on both (fusion)
• Speech localization, beamforming, overlapping and reverberant speech
• Speaker identification
• Conversational speech recognition
  – Improvement of the real-time Juicer LVCSR system, released as an open-source public library: http://juicer.amiproject.org/juicer/
  – New acoustic features based on subword (phone) posterior distributions
  – New ways to use those posterior features
• Extraction of audio metadata, dialog acts, hesitations, etc.
• HMM-based speech synthesis
Template-based associative memories
PhD student: Serena Soldo
Perceptual studies on humans suggest:
• Both verbal and non-verbal information are stored as templates and used during speech recognition
• Speech perception is usually explained in terms of associations to concepts
Project:
• Jointly investigate template-based approaches and associative memory techniques.
Template-based recognition
• Task
  – Isolated word recognition using the Phonebook (PB) speech corpus
• Posterior features estimated by MLP
  – MLP trained on PB
  – MLP trained on an auxiliary corpus (Conversational Telephone Speech, CTS)
• New type of template/HMM parametrized by posterior distributions
• Investigated distance measures
  – Geometric measures (Euclidean distance, cosine angle)
  – Probabilistic measures (Kullback-Leibler divergence, Bhattacharyya distance, Hellinger distance)
  – Linguistic class based measures (scalar product, cross entropy)
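The local distance measures listed above can be illustrated on two posterior vectors. The following is a minimal sketch using the textbook definitions, not the implementation used in the project: the function names, the `EPS` floor, the example posteriors, and the choice to turn the scalar product into a cost with a negative logarithm are all assumptions made for illustration.

```python
import numpy as np

EPS = 1e-12  # assumed floor that keeps logarithms and divisions finite


def as_posterior(p):
    """Clip to a small positive floor and renormalize so the vector sums to 1."""
    p = np.clip(np.asarray(p, dtype=float), EPS, None)
    return p / p.sum()


# Geometric measures
def euclidean(p, q):
    return float(np.linalg.norm(as_posterior(p) - as_posterior(q)))


def cosine_angle(p, q):
    p, q = as_posterior(p), as_posterior(q)
    cos = p @ q / (np.linalg.norm(p) * np.linalg.norm(q))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


# Probabilistic measures
def kl_divergence(p, q):
    p, q = as_posterior(p), as_posterior(q)
    return float(np.sum(p * np.log(p / q)))


def bhattacharyya(p, q):
    p, q = as_posterior(p), as_posterior(q)
    return float(-np.log(np.sum(np.sqrt(p * q))))


def hellinger(p, q):
    p, q = as_posterior(p), as_posterior(q)
    return float(np.sqrt(max(0.0, 1.0 - np.sum(np.sqrt(p * q)))))


# Class-based measures
def scalar_product_score(p, q):
    # Assumption: the scalar product (a similarity) becomes a cost via -log.
    return float(-np.log(as_posterior(p) @ as_posterior(q)))


def cross_entropy(p, q):
    p, q = as_posterior(p), as_posterior(q)
    return float(-np.sum(p * np.log(q)))


# Made-up phone posteriors for one template frame and one test frame.
template_frame = [0.7, 0.2, 0.1]
test_frame = [0.6, 0.3, 0.1]
for measure in (euclidean, cosine_angle, kl_divergence, bhattacharyya,
                hellinger, scalar_product_score, cross_entropy):
    print(f"{measure.__name__}: {measure(template_frame, test_frame):.4f}")
```

Note that KL divergence and cross entropy are asymmetric, so the order of template and test frame matters for those two measures.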
Some results
• Although the scalar product is “theoretically optimal”, KL-based measures yield better performance.
• With a sufficient amount of training data, the auxiliary corpus can achieve performance comparable to the matched condition. The amount of data needed also depends on the choice of local score.
Future work
• Continue the work on template-based ASR, extending it towards binary representations and the investigation of associative memory techniques.
Sparse Component Analysis for Robust DSR
• Distant Speech Recognition (DSR) difficulties
  – Overlapping speech
  – Reverberation
• Sparse Component Analysis
  – Number of sensors < number of speakers
  – The sparser the representation, the better the separation performance is expected to be
• What is the best sparse representation?
  – Time-frequency representations
  – Gabor features
Auditory Sparsity and Sparse Component Analysis
Long-term goal: incorporating auditory sparsity in SCA
• Gabor filtering of the spectro-temporal representation of speech
• Deploy the detected Gabor patterns in blind source separation
• So far: DUET algorithm
[Diagram: Distant Speech Recognition Front-End, with blocks labelled Auditory Sparse Representation, Sparse Component Analysis (SCA), and Speech Recognition]
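Gabor filtering of a spectro-temporal representation can be sketched as follows. This is an illustrative example only: the slides do not specify the filter shapes, so the Gaussian envelope width, the kernel size, the DC removal step, and the names `gabor_kernel` and `gabor_filter` are assumptions, and the input is a synthetic spectrogram rather than speech.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gabor_kernel(omega_f, omega_t, half_f=7, half_t=7):
    """Complex 2D Gabor kernel: Gaussian envelope times a plane wave.

    omega_f, omega_t are spectral and temporal modulation frequencies
    in radians per frequency bin and per frame (assumed parametrization).
    """
    f = np.arange(-half_f, half_f + 1)[:, None]
    t = np.arange(-half_t, half_t + 1)[None, :]
    envelope = np.exp(-0.5 * ((f / (half_f / 2)) ** 2 + (t / (half_t / 2)) ** 2))
    kernel = envelope * np.exp(1j * (omega_f * f + omega_t * t))
    # Remove the DC component so a flat spectrogram gives no response.
    return kernel - envelope * (kernel.sum() / envelope.sum())


def gabor_filter(spec, kernel):
    """Correlate a (frequency x time) spectrogram with the kernel; return magnitudes."""
    kf, kt = kernel.shape
    padded = np.pad(spec, ((kf // 2, kf // 2), (kt // 2, kt // 2)), mode="edge")
    windows = sliding_window_view(padded, kernel.shape)  # shape (F, T, kf, kt)
    return np.abs(np.einsum("ftij,ij->ft", windows, kernel))


# Synthetic spectrogram: 20 bands, 50 frames, energy modulated over time.
frames = np.arange(50)
spec = np.ones((20, 1)) * (1.0 + np.cos(np.pi / 4 * frames))[None, :]
response = gabor_filter(spec, gabor_kernel(omega_f=0.0, omega_t=np.pi / 4))
print(response.shape)
```

A bank of such kernels with different `omega_f` and `omega_t` values would respond to different spectro-temporal patterns; which patterns are useful for source separation is the open question raised on the slide.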
Degenerate Unmixing Estimation Technique (DUET)
• Clustering of each source's components based on delay and attenuation, followed by separation by masking in the spectro-temporal domain
• Synthesized stereo mixtures from Aurora2
  – M1 = S1 + S2 + S3
  – M2 = a1×S1 + a2×S2 + a3×S3
  – a1 = 1/1.3, a2 = 1.3, a3 = 1.08/1.23
Gabor-Posteriors           Aurora2 Baseline    DUET
Clean Training             14.18               93.38
Multi-Condition Training   19.35               91.66
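The mixing model and the masking step can be sketched in a few lines. This is a simplified illustration, not the full DUET algorithm: it uses attenuation only (the mixtures above contain no delays), it assumes the attenuation values are known instead of estimating them by histogram clustering, and it replaces Aurora2 speech with toy sinusoids that do not overlap in frequency. The function names and the non-overlapping rectangular framing are assumptions chosen to keep the example short.

```python
import numpy as np


def frames_fft(x, n):
    """FFT of consecutive non-overlapping length-n frames (rectangular window)."""
    x = np.asarray(x, dtype=float)
    usable = len(x) - len(x) % n
    return np.fft.rfft(x[:usable].reshape(-1, n), axis=1)


def frames_ifft(X, n):
    """Inverse of frames_fft: concatenate the inverse-transformed frames."""
    return np.fft.irfft(X, n=n, axis=1).reshape(-1)


def separate_by_attenuation(m1, m2, attenuations, n=256, eps=1e-12):
    """Assign each time-frequency bin to the source with the closest attenuation."""
    X1, X2 = frames_fft(m1, n), frames_fft(m2, n)
    ratio = np.abs(X2) / (np.abs(X1) + eps)  # per-bin attenuation estimate
    log_a = np.log(np.asarray(attenuations, dtype=float))
    labels = np.argmin(np.abs(np.log(ratio + eps)[..., None] - log_a), axis=-1)
    # Binary masking of the first mixture, one mask per source.
    return [frames_ifft(X1 * (labels == k), n) for k in range(len(log_a))]


# Toy sources: sinusoids centred on FFT bins 10, 30 and 50 (disjoint in frequency).
n, num_frames = 256, 8
t = np.arange(n * num_frames)
sources = [np.sin(2 * np.pi * k * t / n) for k in (10, 30, 50)]

# Mixing coefficients taken from the slide.
a = [1 / 1.3, 1.3, 1.08 / 1.23]
m1 = sum(sources)
m2 = sum(ai * si for ai, si in zip(a, sources))

estimates = separate_by_attenuation(m1, m2, a, n=n)
for k, (est, src) in enumerate(zip(estimates, sources), start=1):
    print(f"S{k} max abs error: {np.max(np.abs(est - src)):.2e}")
```

With real speech, sources overlap in some time-frequency bins, so the masks are only approximately correct; the sparser the representation, the fewer such collisions occur.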