23 February: [jfoster@computing.dcu.ie: DCLRS, Friday February 24th, 4pm]

Dublin Computational Linguistics Research Seminar: Index of February 2012 | Index of year: 2012 | Full index



Date: Mon, 20 Feb 2012 18:41:40 +0000
From: Jennifer Foster
To: Robert Ross, John Kelleher, Noel Fitzpatrick, Carl Vogel, Arthur Cater,
Sarah Jane Delany, Brian Mac Namee, Colm Sloan, Niels Schütte, Mark Dunne,
Yan Li, Kenneth Kennedy, Patrick Lindstrom
Subject: DCLRS, Friday February 24th, 4pm

*Apologies for multiple postings*

Hi all

You are very welcome to the second talk in this year's Dublin
Computational Linguistics Research Seminar Series. The speaker is Dr.
Johann Roturier from Symantec (title and abstract below).

The talk will take place on Friday 24th February at 4pm in L2.21 in
the School of Computing, Dublin City University.

Hope to see you all there,

Jennifer


Dr. Jennifer Foster
L2.16
National Centre for Language Technology
School of Computing
Dublin City University
Dublin 9
Ireland
Phone: 003531 700 5263
Mobile: 00353 868135701
Web: http://nclt.computing.dcu.ie/~jfoster



**********************************************************************
MT evaluation: an industrial perspective

Dr. Johann Roturier

Principal Research Engineer, Symantec
Industry Partner, Centre for Next Generation Localisation

This talk is divided into two parts. First, we present the results of a
study conducted to evaluate the ability of various Machine Translation
systems to translate User-Generated Content, particularly online forum
content. Four systems are compared, focusing on the English>German and
English>French language pairs. After describing some of the
characteristics of these systems, the methodological framework used
during a medium-scale evaluation campaign is presented.
A careful analysis of both human and automated scores shows that one
system is overall significantly better than the other three for the
English>German language pair, but that very little difference exists
for specific post types (such as questions and solutions). The
results are also much more balanced for the English>French language
pair, suggesting that all systems could be useful in a multi-system
deployment scenario. Our results also show that human scores and
automated scores do not consistently correlate, penalizing certain
systems more than others.
The second part of this talk focuses on SymEval, which is an
open-source, cross-platform, graphical translation evaluation toolkit
written using the Python programming language. SymEval can be used to
compare a sample translation (usually the output of a Machine
Translation system) with a reference translation. The sentence-level
comparison is achieved using a third-party library (General Text
Matcher) as well as the Python standard library. The reports produced
by the toolkit can be useful for comparing MT systems, assessing MT
output in pre-production runs, and detecting over- or under-editing.
The original version of the toolkit was developed as an internal
prototype in 2007. In 2010 an open-source version was made available
to external users. We will outline and discuss the challenges
encountered during this journey, as well as the future of this
community-based project.
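The kind of sentence-level comparison the abstract describes can be
sketched in a few lines of Python. This is an illustrative sketch only,
not SymEval's actual implementation: SymEval uses the third-party
General Text Matcher library, whereas here we approximate a similarity
score with difflib.SequenceMatcher from the Python standard library
(which the abstract notes SymEval also draws on); the function and
variable names are hypothetical.

```python
# Illustrative sketch: score MT output against reference translations,
# roughly in the spirit of SymEval's sentence-level comparison.
# Uses difflib from the Python standard library as a stand-in for the
# General Text Matcher library that SymEval actually relies on.
from difflib import SequenceMatcher


def sentence_similarity(candidate: str, reference: str) -> float:
    """Return a 0..1 similarity ratio between two tokenised sentences."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    return SequenceMatcher(None, cand_tokens, ref_tokens).ratio()


def compare_outputs(mt_output: list, references: list) -> list:
    """Score each MT sentence against its reference, as one might in a
    pre-production evaluation run."""
    return [sentence_similarity(c, r) for c, r in zip(mt_output, references)]


# A near-match scores below 1.0; an exact match scores 1.0.
scores = compare_outputs(
    ["the cat sat on the mat"],
    ["the cat sat on a mat"],
)
```

Reports like the ones the abstract mentions could then aggregate such
per-sentence scores to compare systems or flag heavily edited output.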

----- End forwarded message -----
