Invited Talks

Title: Words and People

Speaker: Rada Mihalcea, University of Michigan

Abstract

What do the words we use say about us and about how we view the world around us? And what do we, as speakers of those words with our own defining attributes, imply about the words we utter? In this talk, I will explore the relation between words and people and show how we can develop cross-cultural word models to identify words with cultural bias, i.e., words that are used in significantly different ways by speakers from different cultures. Further, I will show how we can effectively use information about the speakers of a word (e.g., their gender or culture) to build better word models.

Short bio

Rada Mihalcea is a Professor in the Computer Science and Engineering department at the University of Michigan. Her research interests are in computational linguistics, with a focus on lexical semantics, multilingual natural language processing, and computational social sciences. She serves or has served on the editorial boards of the journals Computational Linguistics, Language Resources and Evaluation, Natural Language Engineering, Research on Language and Computation, IEEE Transactions on Affective Computing, and Transactions of the Association for Computational Linguistics. She was a program co-chair for the Conference of the Association for Computational Linguistics (2011) and the Conference on Empirical Methods in Natural Language Processing (2009), and a general chair for the Conference of the North American Chapter of the Association for Computational Linguistics (2015). She is the recipient of a National Science Foundation CAREER award (2008) and a Presidential Early Career Award for Scientists and Engineers awarded by President Obama (2009). In 2013, she was made an honorary citizen of her hometown of Cluj-Napoca, Romania.


Title: Learning large and small: How to transfer NLP successes to low-resource languages

Speaker: Trevor Cohn, The University of Melbourne

Abstract

Recent advances in NLP have predominantly been based upon supervised learning over large corpora, where rich, expressive models, such as deep learning methods, can perform exceptionally well. However, these state-of-the-art approaches tend to be very data-hungry, and consequently do not scale down elegantly to smaller corpora, which are more typical in many NLP applications.

In this talk I will describe the importance of small data in our field, drawing particular attention to so-called “low-” or “under-resourced” languages, for which corpora are scarce, and linguistic annotations scarcer still. One of the key problems for our field is how to translate successes on the few high-resource languages into practical technologies for the remaining majority of the world’s languages. I will cover several research problems in this space, including transfer learning between high- and low-resource languages, active learning for selecting text for annotation, and speech processing in a low-resource setting, namely learning to translate audio inputs without transcriptions. I will finish by discussing open problems in natural language processing that will be critical in porting highly successful NLP work to the myriad of less well-studied languages.

Short bio

Trevor Cohn is an Associate Professor and ARC Future Fellow at the University of Melbourne, in the School of Computing and Information Systems. He received Bachelor degrees in Software Engineering and Commerce, and a PhD in Engineering, from the University of Melbourne. He was previously based at the University of Sheffield, and before that worked as a Research Fellow at the University of Edinburgh. His research interests focus on probabilistic and statistical machine learning for natural language processing, with applications in several areas including machine translation, parsing, and grammar induction. Current projects include translating diverse and noisy text sources, deep learning of semantics in translation, rumour diffusion over social media, and algorithmic approaches for scaling to massive corpora. Dr. Cohn’s research has been recognised by several best paper awards, including best short paper at EMNLP 2016. He will be jointly organising ACL 2018 in Melbourne.


Title: Strategies for Discovering Underlying Linguistic Structure

Speaker: Jason Eisner, Johns Hopkins University

Abstract

A goal of computational linguistics is to automate the kind of reasoning that linguists do. Given text in a new language, can we determine the underlying morphemes and the grammar rules that arrange and modify them?

The Bayesian strategy is to devise a joint probabilistic model that is capable of generating the descriptions of new languages.  Given data from a particular new language, we can then seek explanatory descriptions that have high prior probability.  This strategy leads to fascinating and successful algorithms in the case of morphology.

Yet the Bayesian approach has been less successful for syntax.  It is limited in practice by our ability to (1) design accurate models and (2) solve the computational problem of posterior inference. I will demonstrate some remedies: build only a partial (conditional) model, and use synthetic data to train a neural network that simulates correct posterior inference.

Short bio

Jason Eisner is Professor of Computer Science at Johns Hopkins University, where he is also affiliated with the Center for Language and Speech Processing, the Machine Learning Group, the Cognitive Science Department, and the national Center of Excellence in Human Language Technology. His goal is to develop the probabilistic modeling, inference, and learning techniques needed for a unified model of all kinds of linguistic structure. His 100+ papers have presented various algorithms for parsing, machine translation, and weighted finite-state machines; formalizations, algorithms, theorems, and empirical results in computational phonology; and unsupervised or semi-supervised learning methods for syntax, morphology, and word-sense disambiguation. He is also the lead designer of Dyna, a new declarative programming language that provides an infrastructure for AI research. He has received two school-wide awards for excellence in teaching.

Program at a Glance

Pre-Conference Workshops and Tutorials:
November 27, 2017

Main Conference:
November 28-30, 2017

Post-Conference Workshops and Shared Tasks:
December 1, 2017