7 March: DCLRS -- Alfredo Maldonado, Friday, March 10, 16:00 (TCD Lloyd
Dublin Computational Linguistics Research Seminar: Index of March 2017 | Dublin Computational Linguistics Research Seminar - Index of year: 2017 | Full index
Friday of this week (March 10), at 16:00, in the Lloyd Lecture Theatre
01, in the Lloyd Basement (TCD; note the change of TCD venue from the
last DCLRS), Dr. Alfredo Maldonado (ADAPT) speaks on:
Title: Detection of Verbal Multi-Word Expressions via Conditional Random
Fields with Syntactic Dependency Features and Semantic Re-Ranking
Abstract: The automatic identification of Multi-Word Expressions (MWEs) has
long been recognised as an important but challenging task in Natural
Language Processing. An effort in response to this challenge was the Shared
Task on detecting Verbal MWEs (VMWEs) organised by PARSEME (www.parseme.eu).
In the context of this Shared Task, the five types of VMWEs to be identified
were 1) idiomatic and fixed expressions involving verbs, such as "let the cat
out of the bag", "kick the bucket" or the French "il faut" ('one must'), 2)
reflexives such as the French "se dérouler" ('to unfold', lit. 'to unroll
itself'), "se trouver" ('to be located', lit. 'to find itself'), "se battre"
('to fight', 'to battle', 'to strive'), 3) light-verb constructions like
"make a decision", "have a conversation", "take a nap", 4) verb-particle
constructions (a.k.a. phrasal verbs) like "take off", "put up", "ask
[somebody] out" and 5) any other type of language-specific VMWE not covered
by 1-4. This talk describes the ADAPT Centre's participation in this Shared
Task with a system that exploits universal syntactic dependency features
through a Conditional Random Fields (CRF) sequence model, and an optional
post-processing step that re-ranks the 10 best CRF-predicted label sequences
via semantic vector regression. As part of this description, the talk
introduces the intuitions that motivated our solution and gives a brief
technical overview of the concepts and techniques we exploited (syntactic
dependency tree structures, CRF learning and decoding, semantic word vectors
and decision tree regression). I shall also present an analysis of our
system's performance and compare it to the performance of the other
participant systems. In particular, I shall show that all systems would
struggle to beat a simple lookup baseline system and argue for a more
purpose-specific evaluation scheme. Our system was applied to data from 15
different languages, ranking 2nd in most of them under full VMWE
evaluation, and 1st in three languages under single-word evaluation.
ABOUT THE SPEAKER
Dr. Alfredo Maldonado is a researcher at the Science Foundation
Ireland-funded ADAPT Centre. He formerly worked as a Linguistic
Engineer at Microsoft. Dr. Maldonado earned his PhD in the
Computational Linguistics Group of the School of Computer
Science and Statistics at Trinity College Dublin.
------
The Dublin Computational Linguistics Research Seminar series is a
cooperation among Trinity College Dublin, Dublin City University,
University College Dublin and the Dublin Institute of Technology, a
long-standing collaboration which overlaps with the SFI CNGL/ADAPT
centres.
www.scss.tcd.ie/disciplines/intelligent_systems/clg/clg_web/DCLRS