13 August: fyi -- corpus linguistics, Portugal

Index of August 2008 | Index of year: 2008 | Full index

ILTEC (Institute for Theoretical and Computational Linguistics) is looking
for a post-doctoral fellow for a period of five years, starting before the
end of 2008. More information about ILTEC can be found on the institute's
website.

The main purpose of the fellowship is to develop projects in corpus
linguistics that meet high international standards in the field.

Over the last few years, ILTEC has developed several corpora for different
purposes: Termináutica, Judo, Cinema, Comércio Electrónico and
Nanotecnologia, all corpora of specialized written texts; REDIP, a corpus
of written and spoken media productions; CORPORAL, a large corpus of
informal speech; and several newspaper corpora collected to track
neologisms. Other corpora are also being assembled for individual research
purposes, primarily as part of several PhD projects.

With all these corpora at the institute, ILTEC aims to build a corpus data
infrastructure that integrates the existing corpora and allows the corpus
material to be extended with new parts compiled in the future. The
infrastructure should support various information retrieval and knowledge
extraction tasks. It should also enable the corpora to be used for teaching
in Lexicology, Lexicography, Terminology, and Translation Studies.
Furthermore, the objective is to make these resources available not only
to all linguists at ILTEC but also to the scientific community at large.

The researcher is expected to strengthen our capacity to gather and manage
large-scale corpora of different types of discourse (oral/written,
general/specialized language, etc.) and to perform advanced searches on
them. One of the main applications of the researcher's work at ILTEC will
be the development of corpus-based terminological work, both monolingual
and multilingual. His/her specific tasks will include: developing an
integrated infrastructure for large-scale resources and other corpora and
lexicons; ensuring their availability; developing resources to support
terminological work; establishing automatic information extraction
processes; applying statistical NLP methods; and developing corpus
annotation systems.

More information about this position can be found at
http://www.eracareers.pt/opportunities/index.aspx?task=global&jobId=11083

Candidates should have a PhD in corpus linguistics, computational
linguistics, or a related field, and preferably at least three years of
research experience after completing their PhD. Apart from a strong
background in corpus linguistics, computational semantics, natural language
processing, statistical NLP and/or machine learning, candidates are
expected to have some working experience with database systems and some
programming experience (Perl, C/C++, PHP). Candidates should also have a
good command of English and should have, or be willing to develop, an
active knowledge of spoken and written Portuguese, the target language of
most projects developed at ILTEC.

The position is offered on a fixed-term contract of five years and is a
full research position with no teaching obligations. The annual gross
salary is just over €43,000, equal to that of an assistant professor.

Application Deadline: 30-Sep-2008
Mailing Address for Applications:
ILTEC
Rua Conde de Redondo, 74, 5º andar
1150-109 Lisboa
Portugal
Email Address for Applications: direc@iltec.pt
Contact Information
ILTEC
Email: direc@iltec.pt
