15 December: FYI -- post-doctoral research (Orleans, France)
The PARSEME-FR (http://parsemefr.lif.univ-mrs.fr/doku.php) project
offers a 1.5-year post-doc position in Natural Language Processing,
starting in April 2018. Candidates should send their application
before February 1st, 2018 (see contact information below).
* Duration: 18 months, starting in April 2018 (open until filled)
* Location: to be discussed with the members of the PARSEME-FR consortium (Nancy, Orléans or Paris)
* Employer: University of Orléans
* Contract: fixed-term position
* Remuneration: approx. 2,300€ per month net income (in addition to the salary, the contract includes health benefits)
## Topic:
**French MultiWord Expressions representation and parsing**
Many NLP applications require a fine-grained representation of the
syntactic (and sometimes semantic) structure of texts. The process of
building such a representation is called deep parsing. Recent work
combining symbolic and data-driven techniques has led to significant
advances in this field, notably in terms of robustness and
efficiency. Nevertheless, multiword expressions (MWEs), that is, groups of
(not always contiguous) words that exhibit some idiosyncratic
properties, such as "hot dog", "hard disk", "kick the bucket", "pay
attention", etc., remain a major bottleneck for deep parsing (Sag et
al. 2001, Baldwin and Kim 2010). This is due, among other things, to
their unpredictable behavior at several levels (irregular
morpho-syntax, non-compositional semantics, ...) and to the lack of
annotated training data.
One of the goals of the PARSEME-FR project is to enhance the support
of MWEs in French parsing. To do so, four work packages have been
defined, dealing respectively with (i) MWE annotation in texts or
treebanks, (ii) MWE lexicons, (iii) statistical MWE parsing and
(iv) symbolic MWE parsing. The recruited post-doc will work in the
last work package. Two complementary aspects will be considered:
- the representation of MWEs in linguistic resources (including
electronic grammars, see e.g. (Abeillé, 2002)),
- the use of these MWE-aware resources in deep (symbolic and hybrid)
parsing (see e.g. (Foth and Menzel, 2006)).
Among existing resources for French, one may cite the FRMG (FRench
MetaGrammar) resource which corresponds to a linguistically motivated
abstract and modular description of the syntax of French (De La
Clergerie, 2010). FRMG has been successfully used to compute deep
representations of French texts. The first phase of the postdoc
project will consist in extending the expressive power of metagrammars
to provide compact representations of MWEs. A second step will consist
in extending FRMG with information about MWEs automatically extracted
from treebanks (e.g. syntactic or lexical constraints, distributional
information, etc.) and from external resources (e.g. lexicons and
grammars).
This extension of the linguistic description fed to the parser may
raise some efficiency issues. Indeed, the larger the input
grammar, the larger the parsing search space (due to
syntactic and/or lexical ambiguities). To control the exploration of
this search space, several techniques have been proposed, including A*
algorithms for MWEs (Waszczuk et al., 2017). The second phase of the
postdoctoral project will focus on the extension of existing
algorithms dedicated to MWE parsing and their application to the
DyALog engine used to run FRMG (De La Clergerie, 2013).
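To make the search-space issue concrete, here is a minimal Python sketch of A* search over a toy segmentation problem. It is only an illustration under stated assumptions: it is not the TAG-based algorithm of Waszczuk et al. (2017), and it has no connection to DyALog or FRMG. The function name `astar_segment`, the cost values, and the restriction to contiguous MWEs are all hypothetical choices made for the example.

```python
import heapq


def astar_segment(tokens, mwe_lexicon, word_cost=1.0, mwe_cost=0.5):
    """Find the cheapest split of `tokens` into single words and known MWEs.

    Hypothetical toy model: each single word costs `word_cost`, each
    contiguous MWE listed in `mwe_lexicon` (a set of token tuples) costs
    `mwe_cost`. Returns (total_cost, list_of_segments).
    """
    n = len(tokens)
    max_len = max((len(m) for m in mwe_lexicon), default=1)
    # Lower bound on the price of covering one token; it never
    # overestimates, so the heuristic below is admissible (and consistent).
    per_token_floor = min(word_cost, mwe_cost / max_len)

    def heuristic(pos):
        return (n - pos) * per_token_floor

    # Frontier entries: (estimated total, cost so far, position, segments).
    frontier = [(heuristic(0), 0.0, 0, ())]
    best_cost = {}
    while frontier:
        _, cost, pos, segments = heapq.heappop(frontier)
        if pos == n:
            return cost, list(segments)
        if best_cost.get(pos, float("inf")) <= cost:
            continue  # this position was already reached at least as cheaply
        best_cost[pos] = cost
        # Candidate arcs: the next single word, plus any MWE starting here.
        arcs = [(pos + 1, word_cost)]
        for length in range(2, max_len + 1):
            end = pos + length
            if end <= n and tuple(tokens[pos:end]) in mwe_lexicon:
                arcs.append((end, mwe_cost))
        for end, arc_cost in arcs:
            new_cost = cost + arc_cost
            heapq.heappush(
                frontier,
                (new_cost + heuristic(end), new_cost, end,
                 segments + (tuple(tokens[pos:end]),)),
            )
    return None


lexicon = {("kick", "the", "bucket"), ("pay", "attention")}
print(astar_segment("he will kick the bucket soon".split(), lexicon))
```

The heuristic steers the search toward analyses that use MWE arcs without giving up optimality. A real parser explores derivations of a grammar rather than a flat token sequence, and would also have to handle discontinuous MWEs, which this sketch ignores.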
## Profile:
* PhD in computer science or computational linguistics
* Good knowledge of French and English (not necessarily native)
* Interest in linguistics and familiarity with language technology
* Capacity to work independently and as part of a team
## Important dates:
Application deadline: February 1, 2018 (or until filled)
Position starts: April 2018
Duration: 18 months
## Contact information:
Enquiries and/or applications should be sent to Yannick Parmentier
(yannick.parmentier@loria.fr) and Eric de la Clergerie
(eric.de_la_clergerie@inria.fr).
Applications should contain an extended CV (mentioning the PhD
defense date and the names and contact information of 2 to 3
references) and a cover letter.
## References:
Abeillé A. (2002) « Une grammaire électronique du français », CNRS
Editions, Paris.
Baldwin T. and Kim S. N. (2010) « Multiword Expressions », in Nitin
Indurkhya and Fred J. Damerau (eds.), Handbook of Natural Language
Processing, Second Edition, CRC Press, Boca Raton, USA, pp. 267–292.
De La Clergerie E. (2010) « Building factorized TAGs with
meta-grammars », in The 10th International Conference on Tree
Adjoining Grammars and Related Formalisms - TAG+10, New Haven, CT,
USA, pp. 111-118.
De La Clergerie E. (2013) « Improving a symbolic parser through
partially supervised learning », in The 13th International
Conference on Parsing Technologies (IWPT), Nara, Japan.
Foth K. and Menzel W. (2006) « Hybrid parsing: using probabilistic
models as predictors for a symbolic parser », in Proceedings of the
21st International Conference on Computational Linguistics and the
44th annual meeting of the Association for Computational
Linguistics, Sydney, Australia, pp. 321-328.
Sag I., Baldwin T., Bond F., Copestake A. and Flickinger D. (2001) «
Multiword Expressions: A Pain in the Neck for NLP », in Proceedings
of CICLing 2002: Computational Linguistics and Intelligent Text
Processing, Mexico, pp. 1-15.
Waszczuk J., Savary A. and Parmentier Y. (2017) « Multiword
expression-aware A* TAG parsing revisited », in 13th International
Workshop on Tree-Adjoining Grammar and Related Formalisms (TAG+13),
Umeå, Sweden, pp. 84-93.