Dublin Computational Linguistics Research Seminar Series
Since 1997, the Dublin Computational Linguistics Research Seminar series has been run jointly by DCU (Dublin City University), DIT (Dublin Institute of Technology), TCD (Trinity College Dublin) and UCD (University College Dublin).
The 2012/2013 seminar series is hosted by Trinity College with the support of the Department of Computer Science, the Centre for Language and Communication Studies, the Department of Germanic Studies, the School of Irish, the Department of French, the Centre for Computing and Language Studies and the Centre for Next Generation Localisation.
The talks are scheduled for 4:00 to 6:00 on Friday.
Schedule 2012 -- 2013
-
SPEAKER: Brian Murphy (CMU)
TITLE: Learning Mental Representations from the Web
VENUE: LCR, O'Reilly Bldg, TCD
DATE: 24 April 2013, 2-3pm
-
SPEAKER: Dorit Abusch (Cornell)
TITLE: Anaphoric relations in sequential and conflated pictures
VENUE: LCR, O'Reilly Bldg, TCD
DATE: 19 April 2013
-
SPEAKER: Francesca Bonin (TCD)
TITLE: Social Signal and discourse function
VENUE: LB01, Lloyd Bldg, TCD
DATE: 22 March 2013
-
SPEAKER: Mark Keane (UCD)
TITLE: Power Laws & Language: Can We Predict Population-Level Decision Making?
VENUE: LB01, Lloyd Bldg, TCD
DATE: 15 March 2013
-
SPEAKER: Joachim Wagner (DCU)
TITLE: Detecting grammatical errors using probabilistic parsing with treebank-induced grammars
VENUE: LB01, Lloyd Bldg, TCD
DATE: 8 March 2013
ABSTRACT: Today's dominant parsing technology uses grammars that have been automatically induced from treebanks, i.e. text annotated with syntactic structures. Given sufficiently large treebanks, such grammars tend to be highly robust to unexpected input and achieve wide coverage of unrestricted text. These are desirable properties in many applications. However, the robustness also covers grammatical errors. Almost all input is parsed into a (more or less plausible) parse tree, meaning that parsability cannot be used as a criterion for grammaticality. In this talk, I present three methods for applying probabilistic, treebank-induced grammars to the task of automatically judging the grammaticality of an input string. The best-performing method exploits the differences between parse results for grammars trained on grammatical and ungrammatical treebanks. This method combines well with n-gram and deep grammar-based approaches, as well as combinations thereof, in a machine learning-based framework. To address uncertain misclassification costs and varying error densities, methods are evaluated with accuracy curves (which are related to ROC curves) and, during training, a set of optimal classifiers is selected from the ROC convex hull.
-
SPEAKER: Jennifer Foster (DCU)
TITLE: #hardtoparse: The Challenges of Parsing the Language of Social Media
VENUE: LCR, O'Reilly, TCD
DATE: 1 March 2013
-
SPEAKER: Tony Veale (UCD)
TITLE: Exploding the Creativity Myth: The Computational Foundations of Linguistic Creativity
VENUE: LCR, O'Reilly, TCD
DATE: 22 February 2013
To mark the publication of Tony's new book on the topic, we plan to host a reception after his talk (hence the change from the normal venue).
-
SPEAKER: Robin Cooper (University of Gothenburg)
TITLE: Judgement, truth and probability: taste, (dis)agreement and compromise
VENUE: Room 3074, Arts Building, TCD
DATE: 15 February 2013
-
SPEAKER: Dan Ventura (Brigham Young University, USA)
TITLE: Art[ificial]: Computational Creativity for Communicating Intention
VENUE: Room 3074, Arts Building, TCD
DATE: 8 February 2013
ABSTRACT: The question of computational creativity is as fundamental as the question of machine intelligence. One approach to answering the question is to attempt to build computational systems to which creativity may be attributed. I will discuss one such system, called DARCI, that we are developing to produce visual art that communicates intention. A major component of the DARCI project is a method for making visuo-linguistic associations, the goal being to map linguistic content (intention) onto visual representation. I will discuss some of the details of DARCI's implementation, results from studies of the system done to date that suggest DARCI can already communicate intention in limited ways, and an art exhibit, entitled Fitness Function, that contained human-produced artwork for which DARCI was the sole juror. This subject should interest both scientific and humanistic audiences, with the goals of the talk being to introduce the field of computational creativity, to get the audience thinking about some interesting questions and to be fun.
BIO: Dan Ventura is a Professor in the Computer Science Department at Brigham Young University. Prior to joining the faculty at BYU, he was a member of the Information Sciences and Technology division of the Applied Research Laboratory and a member of the Graduate Faculty of Computer Science and Engineering at Penn State University. Dan has also spent time in industry as a Research Scientist with fonix corporation, working on the development of state-of-the-art technology for large vocabulary continuous speech recognition. His research focuses on creating artificially intelligent systems that incorporate robustness, adaptation and creativity in their approaches to problem solving, and it draws on neural models, machine learning techniques, and evolutionary computation.
-
SPEAKER: Liliana Mamani Sanchez (TCD)
TITLE: Epistemic Signals and Emoticons Affect Kudos
VENUE: Room 3074, Arts Building, TCD
DATE: 1 February 2013
ABSTRACT: Community fora are increasingly used by companies as consumer-communication channels. In a community forum, users contribute by writing posts and by giving ratings (kudos) to posts they find relevant or useful. The focus of this work is on the interaction between emoticon use and epistemic hedges in the perception of individual contributions to discourse (and posters of those contributions) as deserving of kudos for their input. The communities with English as a lingua franca that we explore consist of self-motivated contributors to user fora supported by a major multinational software technology company. User categories are determined by a few orthogonal classifications: employees, novice users, and experts; recipients of kudos vs. non-recipients of kudos; etc.
We explore the interaction between social signals and signals of certainty in content. Among the effects reported are the negative influence of epistemic hedges used in posting on propensity for others in the community to accord kudos to such postings, but a positive influence of the same in interaction with the use of emoticons.
-
SPEAKER: Henk Zeevat (University of Amsterdam)
TITLE: Automatic Self-Monitoring
VENUE: LB08 (Lloyd Bldg Basement)
DATE: 7 December 2012
-
SPEAKER: Gerard Lynch (TCD)
TITLE: Detecting the source language of a literary translation
VENUE: LB08 (Lloyd Bldg Basement), TCD
DATE: 30 November 2012
ABSTRACT: In recent times there has been increased interest in problems of translation stylistics among researchers in computational linguistics. Baroni and Bernardini (2006) spearheaded this new collaboration between translation studies and the computational sciences with their study, which applied machine learning techniques from the text classification literature to learn textual features that distinguish translated from non-translated Italian journalistic text. Their work was also novel for an experiment comparing human identification of translated text with the performance of computational methods on the same task. A related task was examined by van Halteren (2008), who used similar methods to detect the source language of translated text from the Europarl corpus in several European languages. Our work examines the same question for literary translations: can one detect the source language of a literary translation, a genre for which automatic classification could be considered more complex due to the varying nature of literary style? A corpus of 19th century literary works was assembled for experimental purposes, including translations from German, French and Russian. In reference to Bernardini et al., English original texts were also included in the classification task. We present results of our classification experiments, including an analysis of the textual features found to be discriminatory in our task (word and POS n-grams and document statistics such as type-token ratio, etc.). Classification results were found to be comparable to the state of the art (ca. 80%) based on 10-fold cross-validation experiments and testing on a held-out set. Testing on unseen data resulted in lower accuracy; however, results were still well above the baseline.