Multilingual Data Value Chain for CEF Automated Translation: Interoperability Plan
CEF.AT workshop
Luxembourg, 22 Sept 2015
Dave Lewis, ADAPT Centre
dave.lewis@adaptcentre.ie
Sustainable ML Data Value Chain
[Diagram linking Translators and CEF.AT. Labels: Improved Productivity; Low Cost MT; Domain Adapted; Approved Terminology; Language Pairs; Language Resources (TM, Term base, Domain knowledge); Quality Assurance]
Value Chain Interoperability
[Diagram. Language Resources: Discover, Check Rights, Select & Use. Translation Productivity: Postedit, Consume, Quality Assure. LR Enrichment: Enrichment Services, Validate, Annotation, Self-build Micro-domains]
LR: Discovery & Usage Rights
•  Move from archival curation to active curation
•  Open meta-data published at source:
–  W3C Data Catalog Vocabulary (DCAT)
–  Legacy meta-data conversion & validation
–  Concretely: Meta-Share linked data mapping
•  http://www.w3.org/community/ld4lt/wiki/Meta-Share_OWL_metamodel
•  Searchable cataloguing service:
–  Concretely: LingHub
•  http://linghub.lider-project.eu/
•  Machine-readable rights/licences
–  W3C Open Digital Rights Language (ODRL)
•  https://www.w3.org/community/odrl/
–  Use for translation IP
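
As a sketch of "open meta-data published at source" with machine-readable rights, the Python below builds a DCAT dataset record with an attached ODRL policy, serialised as JSON-LD. The example.org URIs, the helper name `describe_parallel_text`, the choice of an attribution duty and the use of plain language tags are all assumptions for illustration, not a profile agreed for CEF.AT.

```python
import json

# Prefixes for the W3C vocabularies named on the slide.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "odrl": "http://www.w3.org/ns/odrl/2/",
}


def describe_parallel_text(uri, title, languages, download_url, policy_uri):
    """Return a JSON-LD DCAT dataset record with an attached ODRL policy.

    Hypothetical helper: the field choices are illustrative, not an agreed profile.
    """
    return {
        "@context": CONTEXT,
        "@id": uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        # Simplification: plain language tags instead of language URIs.
        "dct:language": languages,
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": download_url},
            "dcat:mediaType": "application/xml",  # TMX is an XML format
        },
        # Machine-readable rights: permission to use, with a duty to attribute.
        "odrl:hasPolicy": {
            "@id": policy_uri,
            "@type": "odrl:Set",
            "odrl:permission": {
                "odrl:target": {"@id": uri},
                "odrl:action": {"@id": "odrl:use"},
                "odrl:duty": {"odrl:action": {"@id": "odrl:attribute"}},
            },
        },
    }


# Example with hypothetical example.org identifiers.
record = describe_parallel_text(
    "http://example.org/dataset/en-de-reports",
    "EN-DE annual reports parallel text",
    ["en", "de"],
    "http://example.org/files/en-de-reports.tmx",
    "http://example.org/policy/attribution",
)
print(json.dumps(record, indent=2))
```

Keeping the licence inside the same record as the catalogue entry means a search service can filter on rights without fetching the data itself.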
LR Select & Use
•  Linked data from existing formats:
–  TMX, XLIFF to W3C CSV on the Web (CSVW) to RDF
•  Selection meta-data
–  Provenance (raw MT or post-edited, PE) & translation language codes
–  Dereferenceable segments for open annotation of terms
•  MT web service APIs
–  Forced decoding with term translations
–  Iterative re-training API
–  MT log data (out-of-vocabulary words & forced terms) to inform PE productivity
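
One reading of the first two bullets: flatten TMX into a table whose rows carry provenance and language codes, then attach CSVW metadata so that a CSVW-to-RDF converter can mint one URI per segment. This is a sketch under assumptions: the `x-provenance` prop type, the column names, the `example.org` segment base URI and the helper functions are hypothetical, and only bilingual translation units are handled.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
FIELDS = ["segment_id", "source_lang", "source", "target_lang", "target", "provenance"]

# A one-unit TMX sample; "x-provenance" is a hypothetical user-defined prop type.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="demo" creationtoolversion="0" segtype="sentence"
          o-tmf="demo" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu tuid="1">
      <prop type="x-provenance">PE</prop>
      <tuv xml:lang="en"><seg>Annual report</seg></tuv>
      <tuv xml:lang="de"><seg>Jahresbericht</seg></tuv>
    </tu>
  </body>
</tmx>"""


def tmx_to_rows(tmx_text):
    """Flatten each bilingual translation unit into one table row."""
    rows = []
    for tu in ET.fromstring(tmx_text).iter("tu"):
        tuvs = tu.findall("tuv")
        if len(tuvs) != 2:
            continue  # this sketch skips multilingual units
        src, tgt = tuvs
        prov = tu.find("prop[@type='x-provenance']")
        rows.append({
            "segment_id": tu.get("tuid"),
            "source_lang": src.get(XML_LANG),
            "source": src.findtext("seg"),
            "target_lang": tgt.get(XML_LANG),
            "target": tgt.findtext("seg"),
            "provenance": prov.text if prov is not None else "unknown",
        })
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def csvw_metadata(csv_url, segment_base):
    """Minimal CSVW metadata: aboutUrl gives every row its own segment URI."""
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        "tableSchema": {
            "columns": [{"name": name, "titles": name} for name in FIELDS],
            "aboutUrl": segment_base + "{segment_id}",
        },
    }


rows = tmx_to_rows(SAMPLE_TMX)
print(rows_to_csv(rows))
print(json.dumps(csvw_metadata("segments.csv", "http://example.org/segments/"), indent=2))
```

The `aboutUrl` template is what makes each segment dereferenceable, so that later term annotations have a stable target.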
Translation Productivity
•  Bottom line: did MT make translation more productive?
•  Measure #1: Post-editing effort
–  A/B test on total segment post-editing time
–  Open Edit Vector format
–  iOmegaT: instrumented open-source CAT tool
–  Edit vector analysis tool (licensable)
•  Measure #2: ML web site analysis
–  A/B test on translated web pages (raw MT vs post-edited MT vs human translation)
–  Easyling web translation proxy
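
Measure #1 can be sketched in Python: derive token-level edit operations between the MT output and the post-edited segment, and compare throughput between the MT and no-MT arms. Assumptions: `difflib` opcodes stand in for the Open Edit Vector format (whose schema is not given here), the helper names are hypothetical, and the timings are hypothetical sample values rather than results.

```python
import difflib


def edit_ops(mt_output, post_edited):
    """Token-level edits that turn the MT output into the post-edited segment.

    difflib opcodes are a stand-in, not the Open Edit Vector format.
    """
    a, b = mt_output.split(), post_edited.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def words_per_hour(sessions):
    """Throughput over (source_words, seconds_spent) pairs."""
    words = sum(w for w, _ in sessions)
    seconds = sum(s for _, s in sessions)
    return 3600 * words / seconds


ops = edit_ops("the annual report are published", "the annual report is published")
print(ops)  # [('replace', ['are'], ['is'])]

# Hypothetical timings for the two arms of an A/B test.
with_mt = [(120, 300), (80, 200)]
without_mt = [(100, 400), (90, 380)]
print(words_per_hour(with_mt), words_per_hour(without_mt))
```

Edit operations say where post-editors intervened; the timing comparison answers the "bottom line" question directly.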
LR Enrichment
•  Enrich segments with links to open lexical-conceptual resources
–  Word sense disambiguation, entity linking, automated term extraction
•  Babelfy API
•  DBpedia Spotlight, TaaS APIs
•  Open validation
–  Publish positive/negative validation of enrichments from translation projects
–  In-context validation by project post-editors and terminologists using TBX status flags
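
The enrichment and open-validation bullets can be illustrated with standoff annotations in the W3C Web Annotation Data Model, the successor to the Open Annotation model. In this sketch an entity-linking result becomes one annotation and a post-editor's positive or negative judgement becomes a second annotation that targets the first. The example.org identifiers, the helper names and the "+"/"-" body convention are assumptions; no Babelfy or DBpedia Spotlight API is called.

```python
import json

ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld"


def link_annotation(anno_id, segment_uri, text, start, end, resource_uri):
    """Standoff annotation linking a character span of a segment to a resource."""
    return {
        "@context": ANNO_CONTEXT,
        "id": anno_id,
        "type": "Annotation",
        "motivation": "identifying",
        "body": resource_uri,
        "target": {
            "source": segment_uri,
            # Two equivalent selectors: offsets plus the quoted text itself.
            "selector": [
                {"type": "TextPositionSelector", "start": start, "end": end},
                {"type": "TextQuoteSelector", "exact": text[start:end]},
            ],
        },
    }


def validation_annotation(anno_id, enrichment_id, approved, validator):
    """A positive/negative judgement that targets an enrichment annotation."""
    return {
        "@context": ANNO_CONTEXT,
        "id": anno_id,
        "type": "Annotation",
        "motivation": "assessing",
        "creator": {"type": "Person", "name": validator},
        "body": {"type": "TextualBody", "value": "+" if approved else "-"},
        "target": enrichment_id,
    }


segment = "The European Commission publishes the report."
surface = "European Commission"
start = segment.index(surface)
enrichment = link_annotation(
    "http://example.org/anno/1", "http://example.org/segments/1",
    segment, start, start + len(surface),
    "http://dbpedia.org/resource/European_Commission",
)
check = validation_annotation("http://example.org/anno/2", enrichment["id"], True, "Post-editor 1")
print(json.dumps([enrichment, check], indent=2))
```

Because validations are themselves published annotations, positive and negative judgements from many projects can be aggregated without altering the original enrichment.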
Interoperability Plan: Parallel Text
•  Goal: reduce the cost of collecting and selecting parallel data
•  Agree & promote a DCAT profile for publishing public-sector parallel text
•  Establish a suite of common machine-readable licences (ODRL)
•  DCAT and licence meta-data profile for standardised parallel text formats
–  XLIFF 2.0 module
–  TMX update: new OASIS TC
–  CSV on the Web
•  LingHub as the basis for a public index/search service
•  Minimise the distance between published parallel text and the meta-data passed along the translation value chain
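
To keep published parallel text and its meta-data close together, the dataset and licence identifiers can travel inside the XLIFF file itself. The sketch below assumes the existing XLIFF 2.0 Metadata module (`mda:`) as the carrier; the DCAT/licence module proposed on the slide is not specified, so the `category` and `type` values, the helper name and the example.org URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

XLF = "urn:oasis:names:tc:xliff:document:2.0"
MDA = "urn:oasis:names:tc:xliff:metadata:2.0"
ET.register_namespace("", XLF)
ET.register_namespace("mda", MDA)


def xliff_with_dataset_metadata(src_lang, trg_lang, pairs, dataset_uri, licence_uri):
    """Serialise parallel text as XLIFF 2.0 with dataset/licence meta-data on <file>."""
    root = ET.Element(f"{{{XLF}}}xliff",
                      {"version": "2.0", "srcLang": src_lang, "trgLang": trg_lang})
    file_el = ET.SubElement(root, f"{{{XLF}}}file", {"id": "f1"})
    # Module elements come before the units inside <file>.
    group = ET.SubElement(ET.SubElement(file_el, f"{{{MDA}}}metadata"),
                          f"{{{MDA}}}metaGroup", {"category": "dataset"})
    for meta_type, value in (("dcat:Dataset", dataset_uri), ("odrl:Policy", licence_uri)):
        ET.SubElement(group, f"{{{MDA}}}meta", {"type": meta_type}).text = value
    for n, (source, target) in enumerate(pairs, start=1):
        unit = ET.SubElement(file_el, f"{{{XLF}}}unit", {"id": f"u{n}"})
        segment = ET.SubElement(unit, f"{{{XLF}}}segment")
        ET.SubElement(segment, f"{{{XLF}}}source").text = source
        ET.SubElement(segment, f"{{{XLF}}}target").text = target
    return ET.tostring(root, encoding="unicode")


doc = xliff_with_dataset_metadata(
    "en", "de", [("Annual report", "Jahresbericht")],
    "http://example.org/dataset/en-de-reports",
    "http://example.org/policy/attribution",
)
print(doc)
```

A CAT tool that preserves unknown module elements would then pass the dataset and licence identifiers along with the segments.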
Interoperability Plan: Productivity
•  Goal: make it easy for public bodies to measure the impact of MT on their translation processes
•  Agree/promote Open Edit Vector format
–  Encourage integration in CAT tools
•  Guidelines on A/B testing, analysis and interpretation
•  Open feedback channels to CEF.AT
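
The A/B guidelines are a plan item, not given on the slide. As one possible starting point, the sketch below assigns segments to arms by hashing a project and segment identifier, so every tool instance makes the same choice, and compares post-editing times with Welch's t statistic, which does not assume equal variances in the two arms. Segment-level randomisation, the identifiers and the timings are assumptions; converting t to a p-value needs a Student's t CDF, which the Python standard library does not provide.

```python
import hashlib
import math
from statistics import mean, variance


def assign_arm(project_id, segment_id):
    """Stable A/B assignment: 'A' shows MT suggestions, 'B' does not."""
    digest = hashlib.sha256(f"{project_id}:{segment_id}".encode("utf-8")).digest()
    return "A" if digest[0] % 2 == 0 else "B"


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


arms = {seg: assign_arm("proj-1", seg) for seg in ("s1", "s2", "s3", "s4")}
print(arms)

# Hypothetical post-editing seconds per segment in each arm.
seconds_a = [30, 32, 28, 35]
seconds_b = [40, 42, 38, 45]
t, df = welch_t(seconds_a, seconds_b)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Interpretation still needs care: translator and text effects can outweigh the MT effect, which is why shared guidelines are on the plan.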
Interoperability Plan: LR Enrichment
•  Goal: annotate segments with links to terms and lexical-conceptual resources
•  Agree/promote Open Annotation links
–  XLIFF 2.1: inline ITS Terminology and Text Analysis attributes, or standoff with an XLIFF fragment
–  Need a similar ITS profile and fragment for TMX
–  Profile W3C CSV on the Web with Open Annotation
•  Guidelines on dereferencing links to term bases or lexical-conceptual resources
–  W3C OntoLex group
•  Validation workflow and feedback
–  Trials with FREME, Babelfy, others
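
For the inline option, XLIFF 2.0 core already offers a term annotation (`<mrk type="term">` with a `ref` pointing to term information); the additional ITS attributes of the XLIFF 2.1 ITS module are not shown here. The sketch assumes the term base or lexical-conceptual resource is addressable by URI; the DBpedia URI, the helper name `mark_term` and the first-occurrence matching rule are illustrative only.

```python
import xml.etree.ElementTree as ET

XLF = "urn:oasis:names:tc:xliff:document:2.0"
ET.register_namespace("", XLF)


def mark_term(source_text, term, ref_uri, mrk_id="m1"):
    """Return an XLIFF <source> with the first occurrence of `term` wrapped in
    <mrk type="term"> whose ref points at the term's external resource."""
    source = ET.Element(f"{{{XLF}}}source")
    start = source_text.find(term)
    if start < 0:
        source.text = source_text  # term not found: leave the segment unannotated
        return source
    source.text = source_text[:start]
    mrk = ET.SubElement(source, f"{{{XLF}}}mrk",
                        {"id": mrk_id, "type": "term", "ref": ref_uri})
    mrk.text = term
    mrk.tail = source_text[start + len(term):]
    return source


text = "The European Commission publishes the report."
src = mark_term(text, "European Commission",
                "http://dbpedia.org/resource/European_Commission")
print(ET.tostring(src, encoding="unicode"))
```

The inline form keeps the link visible to CAT tools during post-editing, while the standoff form leaves the segment text untouched; the slide lists both as options.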
THANK YOU!
Dave.Lewis@adaptcentre.ie
