13 February: fyi -- Polish text mining, Italy




Call Reference Number: 2012-IPSC-16 - ISPRA

Title: Multilingual Text Mining and Evaluation (Polish)

Duration: 6 months

Location: Joint Research Centre (JRC), Ispra, Italy

URL on rules and conditions:
http://ec.europa.eu/dgs/jrc/downloads/jrc_trainee_rules_en.pdf

Application via staff recruitment application tool ESRA:
http://recruitment.jrc.ec.europa.eu





We are:



The mission of the Joint Research Centre (JRC) is to provide
customer-driven scientific and technical support for the conception,
development, implementation and monitoring of EU policies. Being a
Directorate-General of the European Commission, the JRC functions both
as the in-house science service of the Commission and as a reference
centre for science and technology for the Union. With 7 Scientific
Institutes, 3 Corporate Directorates and the DG/DDG Office, the JRC is
located in 5 Member States (Belgium, Germany, Italy, the Netherlands
and Spain). Further information is available at:
http://www.jrc.ec.europa.eu

The current vacancy is in the Institute for the Protection and
Security of the Citizen (located in Ispra, Italy).

The Institute provides research results and supports EU policy-makers
in their effort towards global security and protection of European
citizens from accidents, deliberate attacks, fraud and illegal actions
against EU policies. More details on IPSC can be found at:
http://ipsc.jrc.ec.europa.eu.



The vacancy is within the Global Security and Crisis Management Unit
(GlobeSec), in the OPTIMA Action (Open Source Text Information Mining
and Analysis). Research and development efforts in the OPTIMA group
produce novel and unique approaches and software that gather and
analyse an average of 100,000 media reports per day from online news
portals world-wide in 50 languages. The tools classify articles
according to subject domain, cluster related articles, summarise the
news clusters, extract information from them, aggregate the extracted
information, track topics over time, issue breaking news alerts and
produce visual presentations of the information found. See
http://emm.newsbrief.eu/overview.html to access the public Europe
Media Monitor (EMM) portals.



We propose:



We propose a trainee position in Ispra, Italy.



We are looking for a person to help us analyse Polish language news
and social media posts, and specifically to help us adapt EMM’s
multilingual suite of text mining tools to the Polish language. EMM’s
tools (currently developed for up to 20 languages) include the
following functionality: Named Entity Recognition and disambiguation
(persons, organisations, locations, dates); co-reference resolution of
definite descriptions; quotation recognition; document clustering;
document categorisation using Boolean search expressions;
multi-document summarisation; Statistical Machine Translation.






The selected person will be a member of an international and highly
motivated team of researchers and developers. They will learn about
the inner workings of some of the most highly multilingual text
analysis applications world-wide, and they are likely to become
co-authors of scientific publications on the applications they work
on.



The successful candidate will be asked to contribute to the group
effort by working on the following tasks:



· Creating lexical resources for Information Extraction, by using
semi-automatic methods;

· Exploiting externally available dictionaries and corpora, which
requires format conversion, data cleaning and consistency checking;

· Adapting the currently existing language-independent rule set to
Polish, if necessary;

· Evaluating the output of the Polish text mining tools and helping to
improve them;

· Possibly, producing gold-standard annotations for various
information extraction tasks for evaluation purposes;

· Contributing to scientific publications (with co-authorship).



We look for:



We look for a candidate who fits the following description:



· University degree in Computational Linguistics or a related field,
either completed or near completion;

· Hands-on Java programming skills;

· Knowledge of Polish morphology;

· Ability to work in a predominantly English-speaking team;

· Willingness to contribute hands-on to produce working online
applications.



One or more of the following skills would be an asset:



· Programming skills in a scripting language like Perl or Python;

· Knowledge of, and hands-on experience with, a variety of text mining
tools;

· Hands-on experience with using databases;

· Hands-on experience with using Polish linguistic resources;

· Experience in morphology, lexicology and/or text annotation;

· Knowledge, even passive, of further natural languages;

· Experience with XML and with text data format conversion.



Mandatory language skills:



· For EU nationals: knowledge of at least 2 Community official
languages, of which one should be English, French or
German. Required 2nd language level is B2 according to the
Common European Framework of Reference for Languages.

· For non-EU nationals: very good knowledge of English, French or
German. Required level of the language is C2 according to
the Common European Framework of Reference for Languages.

· Other requirements are according to the Rules Governing the
Traineeship Scheme of the Joint Research Centre.



To apply, please follow the directions next to the published call:

http://recruitment.jrc.ec.europa.eu/

Please note that only online applications via the ESRA tool will be
considered.
