Ministère de l’Education Nationale, de l’Enseignement Supérieur et de la Recherche
Language Technologies
for a Multilingual Europe
Joseph Mariani
Director
« Information & Communication Technologies » Department
French Ministry of Research
May 28, 2006 Cocosda / WRITE Workshop 2
Support to LT: Techno-langue
• Report to the Prime Minister (November 2000)
• Techno-langue Action
  – Language technology survey and evaluation
• Articulate with related existing programs
  – ICT Research & Innovation Technological Networks (RRIT)
    • Telecommunications, Software Engineering, Audiovisual & Multimedia
  – Ministry of Research action on Business Intelligence Tools (VSE)
Techno-langue structure
Infrastructure program to support core LT progress, while innovative application projects stay with RRIT (110 M€ / year)
(Diagram labels: TELECOM, SOFT, AMM, VSE)
Techno-langue Call
– Language resources (data, tools)
– Evaluation (technology, application)
– Standards
– Technological survey
Techno-langue Call
• Launched in 2002, 3-year duration
• Funding by 3 ministries (Research, Industry, Culture)
• Same on Vision Technology (Techno-vision) in 2005 (MoD)
• International cooperation
  – Foreign entities may participate in the projects, with their own funding
• All funded projects completed in 2006
  – Joint Techno-X workshop (ASTI conference, October 2005)
  – Paper at LREC’2006 (S. Chaudiron, J. Mariani) + 16 papers
  – Book under preparation
  – Public presentation of results (Fall 2006)
    • Feedback to research and industry (RRIT, VSE/Business Intelligence)
    • Presentation to administration Agencies (DoD, MAE…)
• LT in 2006 « Data Masses and Ambient Intelligence » CfP
  – Managed by ANR
  – 3 M€ funding for LT
Results of the Call
• 52 proposals submitted
• 21 projects funded
• 94 participants
  – 33 industry
  – 39 public research
  – 11 other categories (Associations, CEA, French DoD…)
  – 11 foreign (Bell Labs (USA), NII (Japan), EPFL, LATL…)
• Budget: 20 M€ effort, 7.5 M€ public funding (3 years)
• Special attention to the distribution of Language Resources and Evaluation packages
21 funded projects
• 10 on Language Resources (data and tools)
• 2 on Standards (Spoken / Written): support to ISO TC37-SC4
• 1 on Technological survey (Portal): http://www.technolangue.net
• 8 on Technology Evaluation
  – Written language processing (5)
    • EASY: Syntactic parsing
    • ARCADE 2: Text alignment
    • CESART: Terminology extraction
    • EQUER: Information query
    • CESTA: Machine translation
  – Spoken Language processing (3)
    • EVASY: Speech synthesis
    • MEDIA: Spoken dialog
    • ESTER: Speech transcription / automatic indexing
ESTER
• Task: «Rich» speech transcription and indexing evaluation
  – Broadcast news data in French (radio/TV)
    • 100 h manually transcribed (1 MW, 350 speakers) + 1600 h untranscribed
    • Second largest worldwide
  – 13 participants (3 companies)
    • Written transcription (RT / non RT)
    • Segmentation (sound, speaker recognition / diarization)
    • Named Entity recognition (from speech / transcribed text)
    • Topic detection and tracking for indexing: postponed
• Final internal Workshop in March 2005
• Distribution of Evaluation Package
  – Development and Test data, scoring, results. Data used in EASY.
• Workshop for linguists in May 2005
  – Data and tools available, Results
  – Open issues necessitating Basic Scientific Research investigations
LT for a Multilingual Europe
• Language as a specific issue for Europe
  – Economic, cultural and political challenge with 2 dimensions:
  – A) Preserve the EU Member States' cultures
    • Preference for native language (Web sites in German (75%)...)
    • 50% of European citizens only speak one language
    • (3% of Japanese people speak a foreign language)
  – B) Allow for communication across member states
    • 1170 translators at the EC; 1.3 million pages translated in 2001
    • 30% of the European Parliament budget (300 M€); 500 translators
    • EU: 25 countries, 20 languages / 380 language pairs
  – Enormous cost for the EU, while mandatory
  – Need for the assistance of Language Technologies
• Huge effort (# LT * # languages), too large for the EC alone
• Should be shared with EU Member States (subsidiarity)
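The figure of 380 language pairs for 20 languages follows if each translation direction is counted separately (source to target), i.e. n × (n − 1). A minimal sketch of that arithmetic, assuming directed pairs; the function name and the placeholder language labels are illustrative and not from the slides:

```python
from itertools import permutations


def directed_language_pairs(n_languages: int) -> int:
    """Count ordered (source, target) pairs among n distinct languages."""
    return n_languages * (n_languages - 1)


# 20 official EU languages, as stated on the slide
print(directed_language_pairs(20))  # 380

# Cross-check by explicit enumeration with placeholder labels (hypothetical names)
languages = [f"lang{i}" for i in range(20)]
assert len(list(permutations(languages, 2))) == directed_language_pairs(20)
```

The same multiplication explains why the overall effort (# LT * # languages) grows quickly: every additional language adds work for each technology, and adds a pair in both directions with every existing language.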
Language Technologies EU Program
• European Research Area (ERA)
  – Coordinate EC (< 15%) and MS (> 85%) research efforts
  – ERA-Net initiative in FP6 to coordinate MS national programs
• LT well fitted with ERA
  – EC prime responsibility:
    • the coordination: management, standards, technology evaluation, communication...
    • the development cost of generic Language Technologies:
      – Speech recognition, synthesis, understanding, spoken dialog, language tagging, parsing, analysis, generation, text retrieval, document understanding, machine translation...
  – Each Member State would primarily have the responsibility of ensuring a proper coverage of its language(s):
    • Language Resources (essential): (annotated) corpus (spoken / written), lexicon (including pronunciations), dictionaries…
    • Language specific technology development/adaptation
Lang-Net proposal
• Build up an ERA-Net proposal of infrastructural nature
  – Language Resources, LT evaluation, Standards, Survey
    • Sharing of information
    • Strategic activities and Best Practice
    • Implementation of joint activities
    • Transnational research activities
  – Identify EU countries or regions having similar programs
    • 11 countries / regions in partnership: Germany, France, Italy, Trento region, Czech Republic, Denmark, Norway, The Netherlands / Belgium-Flanders (Dutch Language Union), Spain, Basque region, Sweden
    • Austria, Catalonia, Finland, Greece, Iceland, Portugal, Switzerland, UK (contacts)
  – Extendable to other partners
    • NMS (Slovenia, Cyprus, Poland, Hungary, Malta, Baltic countries…)
    • AS (Romania, Bulgaria…)
    • USA, Japan, South Africa, Israel, Canada… (contacts)
Joint LT program proposal
• DG Research (ERA-Net program)
  – Lang-Net proposal submitted in March 2005, not selected
  – Look forward to Thematic ERA-Net+ in FP7
• DG INFSO + Media
  – «Science & Technology Forum on Multilingualism»
    • June 2005 and February 2006 in Luxembourg
• DG Education, culture and multilingualism
  – « A new framework strategy for multilingualism » (Nov. 2005)
    • http://europa.eu.int/languages/ Web site in the 20 EU languages
    • EC will set up a High Level Group on Multilingualism
    • An EU ministerial conference will be held
    • Further communication will be presented by EC to Parliament and Council
  – Committee of the regions (use of regional Spanish languages)
• TC-Star report: Introduction signed by V. Reding & J. Figel
French support to LT in FP7
• Visit of a French delegation to EC E Directorate
  – H. Forster & B. Smith (September 2005)
• French Memorandum for a Digital Europe (i2010)
• European Digital Library
• EU ICT Directors meeting (Vienna, March 2006)
• FP7 ICT program (2007-2013)
  – Technology pillar: Simulation, Visualization, Interaction, mixed realities
    • « Multilingual and automatic machine translation systems »
  – Replace / add LT
    • « Language technology, including multilingual and automatic MT systems »
  – FP7 Budget reduction (12 B€ to 9 B€ for ICT)
    • «language-enabled … interaction & communication»
LT in FP7
• Large Article 169 program (several 100 M€; EC + MS + industry) on LT in FP7?
  • Present topics: SMEs, Metrology, Research in the Baltic sea…
• Joint support to LT in FP7 from MS