Ministère de l’Education Nationale, de l’Enseignement Supérieur et de la Recherche

Language Technologies for a Multilingual Europe

Joseph Mariani
Director, « Information & Communication Technologies » Department
French Ministry of Research

May 28, 2006, Cocosda / WRITE Workshop

Support to LT: Techno-langue

• Report to the Prime Minister (November 2000)
• Techno-langue Action
  – Language technology survey and evaluation
• Articulate with related existing programs
  – ICT Research & Innovation Technological Networks (RRIT)
    • Telecommunications, Software Engineering, Audiovisual & Multimedia
  – Ministry of Research action on Business Intelligence Tools (VSE)

Techno-langue structure

Infrastructure program to support core LT progress, while innovative application projects stay with RRIT (110 M€ / year)

[Diagram labels: TELECOM, SOFT, AMM, VSE]

Techno-langue Call

– Language resources (data, tools)
– Evaluation (technology, application)
– Standards
– Technological survey

Techno-langue Call

• Launched in 2002, 3-year duration
• Funding by 3 ministries (Research, Industry, Culture)
• Similar call on Vision Technology (Techno-vision) in 2005 (MoD)
• International cooperation
  – Foreign entities may participate in the projects, with their own funding
• All funded projects completed in 2006
  – Joint Techno-X workshop (ASTI conference, October 2005)
  – Paper at LREC’2006 (S. Chaudiron, J. Mariani) + 16 papers
  – Book under preparation
  – Public presentation of results (Fall 2006)
    • Feedback to research and industry (RRIT, VSE/Business Intelligence)
    • Presentation to administration agencies (DoD, MAE…)
• LT in 2006 « Data Masses and Ambient Intelligence » CfP
  – Managed by ANR
  – 3 M€ funding for LT

Results of the Call

• 52 proposals submitted
• 21 projects funded
• 94 participants
  – 33 industry
  – 39 public research
  – 11 other categories (Associations, CEA, French DoD…)
  – 11 foreign (Bell Labs (USA), NII (Japan), EPFL, LATL…)
• Budget: 20 M€ effort, 7.5 M€ public funding (3 years)
• Special attention to the distribution of Language Resources and Evaluation packages
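
The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below only restates the slide's numbers; the variable names are illustrative and the derived ratios (selection rate, public funding share) are computed here, not quoted from the presentation.

```python
# Figures as given on the "Results of the Call" slide.
participants = {
    "industry": 33,
    "public research": 39,
    "other categories": 11,
    "foreign": 11,
}
proposals_submitted = 52
projects_funded = 21
total_effort_meur = 20.0      # total effort, in millions of euros
public_funding_meur = 7.5     # public funding over 3 years, in millions of euros

# The four participant categories should add up to the stated total of 94.
total_participants = sum(participants.values())

# Derived ratios (not stated on the slide).
selection_rate = projects_funded / proposals_submitted
public_share = public_funding_meur / total_effort_meur

print(f"participants: {total_participants}")
print(f"selection rate: {selection_rate:.1%}")
print(f"public funding share: {public_share:.1%}")
```

The category counts sum to the 94 participants stated on the slide, which suggests each participant is counted in exactly one category.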

21 funded projects

• 10 on Language Resources (data and tools)
• 2 on Standards (Spoken / Written): support to ISO TC37-SC4
• 1 on Technological survey (Portal): http://www.technolangue.net
• 8 on Technology Evaluation
  – Written language processing (5)
    • EASY: Syntactic parsing
    • ARCADE 2: Text alignment
    • CESART: Terminology extraction
    • EQUER: Information query
    • CESTA: Machine translation
  – Spoken Language processing (3)
    • EVASY: Speech synthesis
    • MEDIA: Spoken dialog
    • ESTER: Speech transcription / automatic indexing

ESTER

• Task: «Rich» speech transcription and indexing evaluation
  – Broadcast news data in French (radio/TV)
    • 100 h manually transcribed (1 MW, 350 speakers) + 1600 h untranscribed
    • Second largest worldwide
  – 13 participants (3 companies)
    • Written transcription (RT / non RT)
    • Segmentation (sound, speaker recognition / diarization)
    • Named Entity recognition (from speech / transcribed text)
    • Topic detection and tracking for indexing: postponed
• Final internal Workshop in March 2005
• Distribution of Evaluation Package
  – Development and Test data, scoring, results. Data used in EASY.
• Workshop for linguists in May 2005
  – Data and tools available, Results
  – Open issues necessitating Basic Scientific Research investigations
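
The slide mentions "scoring" for the transcription task without naming a metric. As an illustration only, the sketch below computes word error rate (WER), a common measure for speech transcription; treating WER as the metric is an assumption here, and the function name and sample sentences are hypothetical, not part of the ESTER evaluation package.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, deletions, insertions)
    divided by the number of reference words. Illustrative sketch only."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one reference word is missing from the hypothesis.
reference = "le chat dort sur le canapé"
hypothesis = "le chat dort sur canapé"
print(f"WER = {word_error_rate(reference, hypothesis):.3f}")
```

A real evaluation would also normalise text (case, punctuation, numbers) before alignment; that step is omitted from this sketch.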

LT for a Multilingual Europe

• Language as a specific issue for Europe
  – Economic, cultural and political challenge with 2 dimensions:
  – A) Preserve the EU Member States' cultures
    • Preference for native language (Web sites in German (75%)...)
    • 50% of European citizens only speak one language
    • (3% of Japanese people speak a foreign language)
  – B) Allow for communication across Member States
    • 1170 translators at the EC - 1.3 Mpages translated in 2001
    • 30% European Parliament budget (300 M€) – 500 translators
    • EU: 25 countries, 20 languages / 380 language pairs
  – Enormous cost for the EU, while mandatory
  – Need for the assistance of Language Technologies
• Huge effort (# LT * # languages), too large for the EC alone
• Should be shared with EU Member States (subsidiarity)
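
The "380 language pairs" figure is consistent with counting every ordered (source, target) pair among 20 languages, and the "# LT * # languages" remark describes effort growing with the product of the two counts. The sketch below works through both; the function names are illustrative, and the technology count used in the last line is a made-up value for demonstration, not a figure from the presentation.

```python
def directed_language_pairs(n_languages: int) -> int:
    """Number of ordered (source, target) translation pairs
    among n_languages distinct languages."""
    if n_languages < 0:
        raise ValueError("n_languages must be non-negative")
    return n_languages * (n_languages - 1)


def coverage_effort(n_technologies: int, n_languages: int) -> int:
    """Number of (technology, language) combinations to cover,
    following the slide's '# LT * # languages' remark."""
    return n_technologies * n_languages


# 20 official languages, as stated on the slide.
print(directed_language_pairs(20))

# Hypothetical illustration: 10 technologies across 20 languages.
print(coverage_effort(10, 20))
```

Because the pair count grows roughly with the square of the number of languages, each additional official language adds more pairs than the one before it.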

Language Technologies EU Program

• European Research Area (ERA)
  – Coordinate EC (< 15%) and MS (> 85%) research efforts
  – ERA-Net initiative in FP6 to coordinate MS national programs
• LT well fitted with ERA
  – EC prime responsibility:
    • the coordination: management, standards, technology evaluation, communication...
    • the development cost of generic Language Technologies:
      – Speech recognition, synthesis, understanding, spoken dialog, language tagging, parsing, analysis, generation, text retrieval, document understanding, machine translation...
  – Each Member State would primarily have the responsibility of ensuring a proper coverage of its language(s):
    • Language Resources (essential): (annotated) corpus (spoken / written), lexicon (including pronunciations), dictionaries…
    • Language-specific technology development/adaptation

Lang-Net proposal

• Build-up ERA-Net proposal of infrastructural nature
  – Language Resources, LT evaluation, Standards, Survey
    • Share of information
    • Strategic activities and Best Practice
    • Implementation of joint activities
    • Transnational research activities
  – Identify EU countries or regions having similar programs
    • 11 countries / regions in partnership: Germany, France, Italy, Trento region, Czech Republic, Denmark, Norway, The Netherlands / Belgium-Flanders (Dutch Language Union), Spain, Basque region, Sweden
    • Austria, Catalonia, Finland, Greece, Iceland, Portugal, Switzerland, UK (contacts)
  – Extendable to other partners
    • NMS (Slovenia, Cyprus, Poland, Hungary, Malta, Baltic countries…)
    • AS (Romania, Bulgaria…)
    • USA, Japan, South Africa, Israel, Canada… (contacts)

Joint LT program proposal

• DG Research (ERA-Net program)
  – Lang-Net proposal submitted in March 2005, not selected
  – Look forward to Thematic ERA-Net+ in FP7
• DG INFSO + Media
  – «Science & Technology Forum on Multilingualism»
    • June 2005 and February 2006 in Luxembourg
• DG Education, culture and multilingualism
  – « A new framework strategy for multilingualism » (Nov. 2005)
    • http://europa.eu.int/languages/ Web site in the 20 EU languages
    • EC will set up a High Level Group on Multilingualism
    • An EU ministerial conference will be held
    • Further communication will be presented by EC to Parliament and Council
  – Committee of the Regions (use of regional Spanish languages)
• TC-Star report: Introduction signed by V. Reding & J. Figel

French support to LT in FP7

• Visit of a French delegation to EC E Directorate
  – H. Forster & B. Smith (September 2005)
• French Memorandum for a Digital Europe (i2010)
• European Digital Library
• EU ICT Directors meeting (Vienna, March 2006)
• FP7 ICT program (2007-2013)
  – Technology pillar: Simulation, Visualization, Interaction, mixed realities
    • « Multilingual and automatic machine translation systems »
  – Replace / add LT
    • « Language technology, including multilingual and automatic MT systems »
  – FP7 Budget reduction (12 B€ to 9 B€ for ICT)
    • «language-enabled … interaction & communication»

LT in FP7

• Large Article 169 program (several 100 M€; EC + MS + industry) on LT in FP7?
  • Present topics: SMEs, Metrology, Research in the Baltic Sea…
• Joint support to LT in FP7 from MS