Dr. Jennifer Foster (DCU)

#hardtoparse: The Challenges of Parsing the Language of Social Media

The emergence of social media represents a significant challenge for natural language processing researchers. How suitable are existing NLP tools, often trained on newswire and expecting grammatically well-formed input, for processing the linguistically diverse mix of genres and domains that constitutes the modern web? How robust are these tools to the non-standard forms found in unedited, casually written language? To what extent can domain adaptation techniques be used to improve performance? How important are data pre-processing and normalisation?

In this talk, I will focus on the problem of syntactic parsing and describe the work carried out to date by researchers at the National Centre for Language Technology at Dublin City University on parsing the language of social media. This work includes an evaluation of four widely used statistical parsers on a new dataset of tweets and discussion forum posts, as well as experiments which aim to improve parsing performance using the following methods:

1. Modelling the target domain, i.e. transforming the parser training data (in our case, the Penn Treebank) so that it more closely resembles the data to be parsed, and then training a new parsing model
2. Self-training and up-training using large quantities of automatically labelled data
3. A combination of data normalisation, parser accuracy prediction to select suitable training data, genre classification, and self-training using products of random latent variable grammars

The third approach proved very effective in the recent shared task on parsing English web data (https://sites.google.com/site/sancl2012/home/shared-task/results).
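To give a flavour of the first method, here is a minimal illustrative sketch (not the actual DCU pipeline; the specific transformations and function names below are assumptions) of how edited treebank tokens might be rewritten to look more like casually written social media text before retraining a parser:

```python
# Hypothetical "tweetification" transforms for treebank-style token lists.
# These are illustrative guesses at the kind of surface changes involved,
# not the transformations actually used in the work described in the talk.

def lowercase_tokens(tokens):
    """Casual writers often skip standard capitalisation."""
    return [t.lower() for t in tokens]

def drop_final_punct(tokens):
    """Social media text frequently omits sentence-final punctuation."""
    if tokens and tokens[-1] in {".", "!", "?"}:
        return tokens[:-1]
    return tokens

def tweetify(tokens):
    """Apply the transforms in sequence to one tokenised sentence."""
    return drop_final_punct(lowercase_tokens(tokens))

print(tweetify(["The", "parser", "failed", "."]))
# ['the', 'parser', 'failed']
```

A transformed copy of the treebank (with the original trees kept aligned to the rewritten tokens) would then serve as training data for a new parsing model better matched to the target domain.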