7 September: fyi -- jobs, UK
Index of September 2005 | Index of year: 2005 | Full index
Three new research posts are now available in the University of
Sheffield's Natural Language Processing group in the Department of
Computer Science. The post holders will join a team centred around GATE,
a General Architecture for Text Engineering. The posts will be for 1
year in the first instance (but funds will be available for extension
on new projects starting in 2006), on salary scales RA1A or RA1B. The
work will be in the areas of statistical and symbolic language
processing and knowledge technologies, and their support using GATE.
Full details here; contact Hamish Cunningham for further information
if required.
UK research workers have excellent working conditions, with flexible
hours and 6 weeks' holiday per year (plus public holidays). There is the
opportunity to pursue a PhD as part of your work, and there are no
fees for doing so. The international nature of European research means
that travel is often part of the job. The Sheffield NLP group has
around 50 members and an excellent track record in providing a
stable and innovative research environment.
The following skills (and others) will be relevant:
* language processing, information retrieval, speech recognition
* machine learning and information theory
* technical authoring
* research project administration
* ontological data modelling and knowledge management
* use of GATE
* programming in Java and in script languages
* software engineering, components, systems modelling and design
* programming tools like CVS, JBuilder, make, ANT, ...
* database programming in SQL and JDBC
* finite state language analysis
* natural language generation
* computational linguistics
* finance industries
The GATE team is a close-knit group with a world-class reputation. Our
research partners include large organisations such as IBM,
Hewlett-Packard, Bertelsmann, British Telecom and the BBC, and are
located in cities as diverse as Athens and Galway, Sofia and
Paris. One of our distinctive contributions to the field is to make
our results available as open source software that is in use by many
of our colleagues in language- and knowledge-related fields. This
means we also benefit from strong relationships with leading groups in
a number of areas, and from feedback and improvements from users of
our software. The tools that we use to support our own work are now
quite advanced, and this means that we can move quickly in new
projects and new areas of research without continually having to
re-invent the wheel. We have a division of labour within the team, so
we can each play to our respective strengths. We participate in
leading evaluation forums such as ACE, Pascal and TIDES.
Sheffield lies on the eastern slopes of the Pennine mountain range in
the centre of Northern England, with excellent outdoor pursuits very
near to the city centre and to the University. Connections are
excellent to nearby Manchester, where you can find the best UK
nightlife outside London.
These posts will initially be attached to one or more of the following projects:
* PrestoSpace, an €11m four year Integrated Project (IP) led by Institut National de l'Audiovisuel (the French national multimedia archive) and the BBC, addresses the pressing need to digitise 20th century analog media before they irretrievably decay. Justifying the costs of digitisation requires new models of content access and automated metadata production.
* SEKT, Semantic Knowledge Technologies, a €9m three year IP led by British Telecom, will investigate the synergies of Human Language Technology, Data Mining and Ontologies for Knowledge Management.
* KnowledgeWeb, a €7m four year Network of Excellence led by the University of Innsbruck, is the successor to the OntoWeb project, and will be the leading forum for knowledge technology networking in Europe.
* LIRICS, a €3m approx. 30 month project led by the INRIA, is an eContent project developing Linguistic Infrastructure for Interoperable Resources and Systems.
* ETCSL is an AHRB £150k three year project. ETCSL is the world's largest Electronic Text Corpus of the Sumerian Language. State of the art language technology will facilitate new models of access for researchers via a web portal at the University of Oxford.
Four new projects begin in 2006, and successful applicants will be
well-placed to work on these beyond their initial contract.
Background (1): GATE and IE
Information Extraction (IE) analyses ordinary texts (or speech
transcripts) and produces formal data that can be used to populate
models in COTS products such as databases, spreadsheets or link
analysis engines. IE allows the mining of natural language data based
on the entities, events and relations present in that data.
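As a rough illustration of what "producing formal data" from ordinary text means, here is a minimal Python sketch. It is not GATE and does not use GATE's API; the organisation list, the money pattern and the `Annotation` record are all assumptions made up for this example. It simply shows text going in and typed, offset-anchored records coming out.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical gazetteer: a tiny hand-made list of known organisations.
ORGANISATIONS = ["British Telecom", "IBM", "BBC"]

# Toy pattern for amounts written like "€9m" or "£150k".
MONEY = re.compile(r"[€£]\d+(?:\.\d+)?[mk]")


@dataclass(frozen=True)
class Annotation:
    kind: str   # entity type, e.g. "Organisation"
    start: int  # character offset where the mention starts
    end: int    # character offset just past the mention
    text: str   # the matched surface string


def extract(text: str) -> List[Annotation]:
    """Return typed annotations over `text`, sorted by start offset."""
    found = []
    for name in ORGANISATIONS:
        # Whole-word match of each gazetteer entry.
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append(Annotation("Organisation", m.start(), m.end(), m.group()))
    for m in MONEY.finditer(text):
        found.append(Annotation("Money", m.start(), m.end(), m.group()))
    return sorted(found, key=lambda a: a.start)


if __name__ == "__main__":
    sample = "SEKT is a €9m project led by British Telecom."
    for row in extract(sample):
        print(row)
```

Each record could be written straight into a database table or spreadsheet row. Real IE systems replace the hand-made list and pattern with trained or rule-based components, but the output has the same general shape.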
GATE, a General Architecture for Text Engineering, is a development
environment and middleware framework for creating, adapting and
deploying HLT components, plus a collection of components for various
HLT tasks. GATE is in use across the research spectrum from blue sky
experimentation through technology transfer and into productisation,
with users from large corporates and SMEs to academic research. GATE
is probably the best system of its type currently available, and
almost certainly the world leader amongst open source systems.
Recent applications of IE and GATE include:
* organising scientific abstracts by chemical compound for medical informatics;
* discovering new industry groupings in commodities trading;
* generating personalised presentations for museum visitors;
* indexing sports videos to allow conceptual search of content;
* automating customer care call centres;
* summarisation of key information from company reports;
* digital libraries for language scholars and public cultural heritage.
The IE tools in GATE have been proven in a wide range of contexts and
have participated in all the major quantitative evaluation
competitions since 1995 (including MUC, TREC/QA, ACE, DUC, Pascal and
TIDES/Surprise Language), so we can now estimate quite precisely the
development resource required to do a particular task, and the
accuracy that can be expected as a result. We have built or are in
the process of developing IE systems in languages including: Arabic,
Bengali, Bulgarian, Chinese, English, French, German, Greek, Russian,
Spanish and Swedish.
Background (2): HLT for the Semantic Web
The Semantic Web (SW) is adding a machine-tractable layer to the
natural language web of HTML. The Grid initiative is constructing
infrastructure for distributed collaborative science, or
e-science. Web Services are driving the decomposition of monolithic
software into flexible component sets that can be reconfigured to keep
ahead in rapidly changing markets. The three areas are closely linked:
web technology is essential to the Grid; the Semantic Web and the Grid
are co-penetrating to form the Semantic Grid; Web Services underpin
the next generation of the Grid in the Open Grid Services
Architecture; Semantic Web Services (SWSs) allow dynamic construction
of applications from component services, and better service
description and discovery.
Together these developments represent the next stage of evolution for
the web, distributed computing and collaborative science. Key to the
success of the enterprise is the production and maintenance of formal
data. The SW and SWSs rely on formal semantics in the shape of
ontologies and related instance sets, or knowledge bases. Whereas the
simplicity of HTML and the ubiquity of natural language led to the
organic growth of the hypertext web, semantic data is harder to create
and maintain. HLT provides the missing link between language and
formal data, the glue to fix web services to their user constituency
and enable easier enterprise integration.
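To make "ontologies and related instance sets" a little more concrete, the following Python sketch stores one extracted fact as subject-predicate-object triples and runs a simple query over them. The `ex:` class and property names, and the helper functions, are hypothetical and chosen only for illustration; a real Semantic Web application would use a published ontology and an RDF store.

```python
from typing import List, NamedTuple, Set


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def to_knowledge_base(project: str, leader: str, budget: str) -> List[Triple]:
    """Express one extracted record as instance data against a toy ontology."""
    return [
        Triple(project, "rdf:type", "ex:Project"),      # class membership
        Triple(leader, "rdf:type", "ex:Organisation"),  # class membership
        Triple(project, "ex:ledBy", leader),            # relation between instances
        Triple(project, "ex:budget", budget),           # literal-valued property
    ]


def instances_of(kb: List[Triple], cls: str) -> Set[str]:
    """Query: all subjects declared to be instances of class `cls`."""
    return {t.subject for t in kb if t.predicate == "rdf:type" and t.obj == cls}


if __name__ == "__main__":
    kb = to_knowledge_base("SEKT", "British Telecom", "€9m")
    print(instances_of(kb, "ex:Project"))
```

The point of the sketch is the direction of travel: a sentence a person can read becomes statements a program can query.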
GATE is also capable of solving all common relationship problems,
walking your dog, and removing even the most stubborn stains.