Ministère de l’Education Nationale, de l’Enseignement Supérieur et de la Recherche Language Technologies for a Multilingual Europe Joseph Mariani Director

Ministère de l’Education Nationale, de l’Enseignement Supérieur et de la Recherche

Language Technologies for a Multilingual Europe

Joseph Mariani
Director, « Information & Communication Technologies » Department
French Ministry of Research

May 28, 2006 Cocosda / WRITE Workshop 2

Support to LT: Techno-langue

• Report to the Prime Minister (November 2000)
• Techno-langue Action
  – Language technology survey and evaluation
• Articulate with related existing programs
  – ICT Research & Innovation Technological Networks (RRIT)
    • Telecommunications, Software Engineering, Audiovisual & Multimedia
  – Ministry of Research action on Business Intelligence Tools (VSE)

Techno-langue structure

Infrastructure program to support core LT progress, while innovative application projects stay with RRIT (110 M€ / year)

[Diagram labels: TELECOM, SOFT, AMM, VSE]

Techno-langue Call

– Language resources (data, tools)
– Evaluation (technology, application)
– Standards
– Technological survey

Techno-langue Call

• Launched in 2002, 3-year duration
• Funding by 3 ministries (Research, Industry, Culture)
• Same on Vision Technology (Techno-vision) in 2005 (MoD)
• International cooperation
  – Foreign entities may participate in the projects, with their own funding
• All funded projects completed in 2006
  – Joint Techno-X workshop (ASTI conference, October 2005)
  – Paper at LREC’2006 (S. Chaudiron, J. Mariani) + 16 papers
  – Book under preparation
  – Public presentation of results (Fall 2006)
    • Feedback to research and industry (RRIT, VSE/Business Intelligence)
    • Presentation to administration Agencies (DoD, MAE…)
• LT in 2006 « Data Masses and Ambient Intelligence » CfP
  – Managed by ANR
  – 3 M€ funding for LT

Results of the Call

• 52 proposals submitted
• 21 projects funded
• 94 participants
  – 33 industry
  – 39 public research
  – 11 other categories (Associations, CEA, French DoD…)
  – 11 foreign (Bell Labs (USA), NII (Japan), EPFL, LATL…)
• Budget: 20 M€ effort, 7.5 M€ public funding (3 years)
• Special attention to the distribution of Language Resources and Evaluation packages

21 funded projects

• 10 on Language Resources (data and tools)
• 2 on Standards (Spoken / Written): support to ISO TC37-SC4
• 1 on Technological survey (Portal): http://www.technolangue.net
• 8 on Technology Evaluation
  – Written language processing (5)
    • EASY: Syntactic parsing
    • ARCADE 2: Text alignment
    • CESART: Terminology extraction
    • EQUER: Information query
    • CESTA: Machine translation
  – Spoken language processing (3)
    • EVASY: Speech synthesis
    • MEDIA: Spoken dialog
    • ESTER: Speech transcription / automatic indexing

ESTER

• Task: «Rich» speech transcription and indexing evaluation
  – Broadcast news data in French (radio/TV)
    • 100 h manually transcribed (1 MW, 350 speakers) + 1600 h untranscribed
    • Second largest worldwide
  – 13 participants (3 companies)
    • Written transcription (RT / non RT)
    • Segmentation (sound, speaker recognition / diarization)
    • Named Entity recognition (from speech / transcribed text)
    • Topic detection and tracking for indexing: postponed
• Final internal Workshop in March 2005
• Distribution of Evaluation Package
  – Development and Test data, scoring, results. Data used in EASY.
• Workshop for linguists in May 2005
  – Data and tools available, Results
  – Open issues necessitating Basic Scientific Research investigations

LT for a Multilingual Europe

• Language as a specific issue for Europe
  – Economic, cultural and political challenge with 2 dimensions:
  – A) Preserve the cultures of the EU Member States
    • Preference for native language (Web sites in German (75%)...)
    • 50% of European citizens speak only one language
    • (3% of Japanese people speak a foreign language)
  – B) Allow for communication across Member States
    • 1170 translators at the EC, 1.3 Mpages translated in 2001
    • 30% of the European Parliament budget (300 M€), 500 translators
    • EU: 25 countries, 20 languages / 380 language pairs
  – Enormous cost for the EU, while mandatory
  – Need for the assistance of Language Technologies
• Huge effort (# LT * # languages), too large for the EC alone
• Should be shared with EU Member States (subsidiarity)
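
The figure of 380 language pairs is consistent with counting directed translation pairs among 20 languages (20 × 19). A minimal Python sketch, assuming each source-to-target direction counts as a separate pair; the function name is illustrative and not taken from the talk:

```python
def directed_language_pairs(n_languages: int) -> int:
    """Number of ordered (source, target) pairs among n distinct languages.

    Assumption: translating A -> B and B -> A are counted as two pairs.
    """
    if n_languages < 0:
        raise ValueError("n_languages must be non-negative")
    return n_languages * (n_languages - 1)


# 20 official EU languages, as stated on the slide
print(directed_language_pairs(20))
```

Under this counting, the number of pairs grows quadratically with the number of languages.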

Language Technologies EU Program

• European Research Area (ERA)
  – Coordinate EC (< 15%) and MS (> 85%) research efforts
  – ERA-Net initiative in FP6 to coordinate MS national programs
• LT well fitted with ERA
  – EC prime responsibility:
    • the coordination: management, standards, technology evaluation, communication...
    • the development cost of generic Language Technologies:
      – Speech recognition, synthesis, understanding, spoken dialog, language tagging, parsing, analysis, generation, text retrieval, document understanding, machine translation...
  – Each Member State would primarily have the responsibility of ensuring a proper coverage of its language(s):
    • Language Resources (essential): (annotated) corpus (spoken / written), lexicon (including pronunciations), dictionaries…
    • Language-specific technology development/adaptation

Lang-Net proposal

• Build-up ERA-Net proposal of infrastructural nature
  – Language Resources, LT evaluation, Standards, Survey
    • Share of information
    • Strategic activities and Best Practice
    • Implementation of joint activities
    • Transnational research activities
  – Identify EU countries or regions having similar programs
    • 11 countries / regions in partnership: Germany, France, Italy, Trento region, Czech Republic, Denmark, Norway, The Netherlands / Belgium-Flanders (Dutch Language Union), Spain, Basque region, Sweden
    • Austria, Catalonia, Finland, Greece, Iceland, Portugal, Switzerland, UK (contacts)
  – Extendable to other partners
    • NMS (Slovenia, Cyprus, Poland, Hungary, Malta, Baltic countries…)
    • AS (Romania, Bulgaria…)
    • USA, Japan, South Africa, Israel, Canada… (contacts)

Joint LT program proposal

• DG Research (ERA-Net program)
  – Lang-Net proposal submitted in March 2005, not selected
  – Look forward to Thematic ERA-Net+ in FP7
• DG INFSO + Media
  – « Science & Technology Forum on Multilingualism »
    • June 2005 and February 2006 in Luxembourg
• DG Education, culture and multilingualism
  – « A new framework strategy for multilingualism » (Nov. 2005)
    • http://europa.eu.int/languages/ Web site in the 20 EU languages
    • EC will set up a High Level Group on Multilingualism
    • An EU ministerial conference will be held
    • Further communication will be presented by EC to Parliament and Council
  – Committee of the regions (use of regional Spanish languages)
• TC-Star report: Introduction signed by V. Reding & J. Figel

French support to LT in FP7

• Visit of a French delegation to EC E Directorate
  – H. Forster & B. Smith (September 2005)
• French Memorandum for a Digital Europe (i2010)
• European Digital Library
• EU ICT Directors meeting (Vienna, March 2006)
• FP7 ICT program (2007-2013)
  – Technology pillar: Simulation, Visualization, Interaction, mixed realities
    • « Multilingual and automatic machine translation systems »
  – Replace / add LT
    • « Language technology, including multilingual and automatic MT systems »
  – FP7 Budget reduction (12 B€ to 9 B€ for ICT)
    • « language-enabled … interaction & communication »

LT in FP7

• Large Article 169 program (several 100 M€; EC + MS + industry) on LT in FP7?
  – Present topics: SMEs, Metrology, Research in the Baltic Sea…
• Joint support to LT in FP7 from MS