Location: Facebook London, 1 Rathbone Square, Fitzrovia, London W1T 1FB, UK

Date: October 11, 2019

08:00-09:00 Registration + Breakfast

09:00-09:15 Opening

09:15-10:10 Keynote, Nando de Freitas: Exploration and Imitation

Session 1: Personalisation and responsible use of AI - Chair: Guillaume Bouchard

10:10-10:30 Vera Demberg: Must NLP consider individual differences in language processing more?

10:30-10:50 Dirk Hovy: With more Layers comes more Responsibility

10:50-11:20 Coffee break

Session 2: Knowledge representation and parsing - Chair: Ryan McDonald

11:20-11:40 Joakim Nivre: Is the End of Supervised Parsing in Sight? Twelve Years Later

11:40-12:00 Bonnie Webber: Implicit Discourse Relations co-exist with Explicit Relations

12:20-12:40 Ion Androutsopoulos: Embedding Biomedical Ontologies by Jointly Encoding Network Structure and Textual Node Descriptors

12:40-13:00 Natalie Schluter: Neural syntactic parsing seems so simple. Is it?

13:00-14:00 Lunch

Session 3: Grounding and common sense - Chair: Raquel Fernández

14:00-14:20 Gemma Boleda: Generic and situation-specific information in distributed representations

14:20-14:40 Reut Tsarfaty: The Empty Elements Project

14:40-15:00 Lucia Specia: Towards more holistic approaches to human-machine communication

15:00-15:30 Coffee break

Session 4: Transformers - Chair: Matthias Gallé

15:30-15:50 André Martins: Beyond Sparsemax: Adaptively Sparse Transformers

15:50-16:10 Angela Fan: Pruning Transformers at Inference Time

16:10-16:30 Rico Sennrich: What Do Transformers Learn in NLP? Recent Insights from Model Analysis

16:30-17:15 Panel on Academia and Industry in NLP - Chair: Tim Rocktäschel, Panel members: Bonnie Webber, Lucia Specia, André Martins, Dirk Hovy, Angela Fan and Phil Blunsom

17:15-17:30 Closing ceremony

17:30-19:30 Poster Session & Happy Hour

Speaker bios and abstracts:

Nando de Freitas is a machine learning professor at Oxford University, a lead research scientist at Google DeepMind, and a Fellow of the Canadian Institute For Advanced Research (CIFAR) in the successful Neural Computation and Adaptive Perception program. He received his PhD from Trinity College, Cambridge University in 2000 on Bayesian methods for neural networks. From 1999 to 2001, he was a postdoctoral fellow at UC Berkeley in the AI group of Stuart Russell. He was a professor at the University of British Columbia from 2001 to 2014. He has spun off a few companies, most recently Dark Blue Labs acquired by Google. Among his recent awards are best paper awards at IJCAI 2013, ICLR 2016, ICML 2016, and the Yelp Dataset award for a multi-instance transfer learning paper at KDD 2015. He also received the 2012 Charles A. McDowell Award for Excellence in Research, and the 2010 Mathematics of Information Technology and Complex Systems (MITACS) Young Researcher Award.

Nando de Freitas: Exploration and Imitation

Research on imitation has progressed rapidly and considerably in recent years. I will cover some of the recent advances in one-shot imitation, the imitation-meta-learning connection, the impact of generative models on imitation, hierarchical imitation, high-fidelity imitation, intentional imitation, and third-person imitation. I will provide demonstrations using simulated environments and robots. Finally, I will discuss some areas where research is needed, including selective imitation (i.e., who and what to imitate), very-long-term imitation, and the emergence of abstract cultural concepts.

Vera Demberg is a Professor of Computer Science and Computational Linguistics at Saarland University in Germany. She obtained her PhD from the School of Informatics at Edinburgh University in 2010. From 2010 to 2016, she was the head of an independent research group at the Cluster of Excellence, Saarland University. From 2001 to 2006, she studied Linguistics at Stuttgart University, and in 2005 she obtained her MSc in Artificial Intelligence, also from Edinburgh University.

Vera Demberg: Must NLP consider individual differences in language processing more?

Today, variation in language comprehension by humans (as evidenced by disagreement between annotators or observed in misunderstandings) is usually regarded as noise that can be ignored. I will argue in my talk that some of the variation that we observe may be systematic, and that it may be related to individual differences in cognitive processing of language. Being able to model the underlying individual differences and predict their effects on human language comprehension may be the key to resolving problems for machine learning that result from seemingly inconsistent labels, and may enable us to build more adaptive NLP systems.

Dirk Hovy is associate professor of computer science at Bocconi University in Milan, Italy. He is interested in the interaction between language, society, and machine learning, or what language can tell us about society, and what computers can tell us about language. He has authored over 50 articles on these topics, including 3 best paper awards. He has organized one conference and several workshops on abusive language, ethics in NLP, and computational social science.

Outside of work, Dirk enjoys cooking, running, and leather-crafting.

Dirk Hovy: With more Layers comes more Responsibility

Neural networks have revolutionized NLP: they have increased performance across the board and enabled a number of applications that were not possible before. All this is a great opportunity, but also a new responsibility for NLP: never before has it been so easy to build a powerful NLP system simply because you can. However, these systems are increasingly used in applications they were not intended for, by people who treat them as interchangeable black boxes. The results can be performance drops, but also systematic biases against various user groups.

As a consequence, we as NLP practitioners suddenly have a new role in addition to developer: considering the ethical implications of our systems, and educating the public about the possibilities and limitations of our work.

I will talk about the possibilities deep learning has opened up, the caveats that come with them, and where this might lead NLP as a field. This includes case examples and some provocations for future directions.

Joakim Nivre is Professor of Computational Linguistics at Uppsala University. He holds a Ph.D. in General Linguistics from the University of Gothenburg and a Ph.D. in Computer Science from Växjö University. His research focuses on data-driven methods for natural language processing, in particular for morphosyntactic and semantic analysis. He is one of the main developers of the transition-based approach to syntactic dependency parsing, described in his 2006 book Inductive Dependency Parsing and implemented in the widely used MaltParser system, and one of the founders of the Universal Dependencies project, which aims to develop cross-linguistically consistent treebank annotation for many languages and currently involves over 80 languages and over 300 researchers around the world. He has produced over 250 scientific publications and has more than 14,000 citations according to Google Scholar (July, 2019). He was the president of the Association for Computational Linguistics in 2017.

Joakim Nivre: Is the End of Supervised Parsing in Sight? Twelve Years Later

At ACL in Prague in 2007, Rens Bod asked whether supervised parsing models would soon be a thing of the past, giving way to superior unsupervised models. For the next decade or so, the answer seemed to be negative, as supervised approaches to syntactic parsing continued to outperform their unsupervised counterparts by large margins. However, recent developments in our field have made the question relevant again, in at least two different ways. First, there is the question of whether the end of all parsing (supervised or other) is in sight, simply because systems that are trained end-to-end for real applications have no room (or need) for traditional linguistic representations of the kind that parsers produce, a question that I will not directly discuss in this talk. However, there is also the question of whether we need traditional supervised parsers even if we want to compute discrete syntactic representations of natural language sentences. To put this question into perspective, I will survey developments in dependency parsing since 2007, showing that most of the advanced parsing models proposed during the first half of this period have been made obsolete by advances in deep learning during the second half. Moreover, the advent of deep contextualized word embeddings appears to have eliminated the last remaining differences between different algorithmic approaches, suggesting that specialized parsing algorithms are largely superfluous in the state-of-the-art systems of today.

Bonnie Webber is best known for her computational and corpus work on discourse anaphora and discourse relations. Along with Aravind Joshi, Rashmi Prasad, Alan Lee and Eleni Miltsakaki, she is co-developer of the Penn Discourse TreeBank, whose latest release in 2019 contains information on over 53K discourse relations. In service to the field of NLP, she has served as President of the Association for Computational Linguistics (ACL) and Deputy Chair of the European COST action IS1312, "TextLink: Structuring Discourse in Multilingual Europe". She now serves as General Chair of EMNLP 2020 and Senior Chair for Language and Computation for ESSLLI 2020. She is a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), the Association for Computational Linguistics (ACL), and the Royal Society of Edinburgh (RSE), and Professor Emeritus at the University of Edinburgh. Wherever she can, she works to promote women's visibility in the NLP community and in Science and Technology more generally.

Bonnie Webber: Implicit Discourse Relations co-exist with Explicit Relations

It is well-known that relations between sentences or clauses can be signalled explicitly (e.g., with conjunctions, discourse adverbials, lexico-syntactic constructions, etc.) or left to hearer inference (based on what the hearer takes to be the arguments of the relation). But texts containing a sequence of adjacent explicit connectives (e.g., "but instead" / "so instead", "because otherwise" / "but otherwise", "because then" / "but then" / "so then", etc.) suggest that, even if only one discourse relation between clausal/sentential arguments is signalled explicitly, other collocated evidence might lead a hearer to infer other relations as well.

Experiments that Hannah Rohde, other colleagues, and I have carried out over the past three years, with support from the Nuance Foundation, have shown that this is indeed the case. It also turns out to be a way of making sense of "disagreements" between human annotators who have been asked to indicate which discourse connective "best expresses" the relation that holds between two sentences [Malmi et al., LREC 2018]. As this has implications for predicting discourse connectives and relations in language technology applications, I will briefly describe some of our experiments and their results.

Ion Androutsopoulos is Associate Professor in the Department of Informatics, Athens University of Economics and Business (AUEB), Director of AUEB's Information Processing Lab, and head of AUEB's NLP Group. He is also Research Associate of the Research Centre "Athena" and NCSR "Demokritos". He holds a Diploma in Electrical Engineering from the National Technical University of Athens, an MSc in Information Technology from the University of Edinburgh, and a PhD in AI from the University of Edinburgh. Before joining AUEB, he was a researcher at NCSR "Demokritos" and the former MSR Institute of Macquarie University in Sydney. His current research interests include QA for structured and unstructured data, especially biomedical QA; NLG from databases, ontologies, and more recently medical images; text classification, including filtering spam and abusive content; IE and opinion mining, including legal text analytics and sentiment analysis; NLP tools for Greek. His group recently had the best document and snippet retrieval results in BioASQ 2018 and 2019, and also the best results in ImageCLEFmed Caption 2019. Ion co-organizes the 2019 Athens NLP Summer School, and co-organized EACL 2009 in Athens, the Large Scale Hierarchical Text Classification challenges (2010-14), the BioASQ challenges (2012-14), and the SemEval Aspect-Based Sentiment Analysis task (2014-16).

Ion Androutsopoulos: Embedding Biomedical Ontologies by Jointly Encoding Network Structure and Textual Node Descriptors

Network Embedding (NE) methods, which map network nodes to low-dimensional feature vectors, have wide applications in network analysis and bioinformatics. Many existing NE methods rely only on network structure, overlooking other information associated with the nodes, e.g., text describing the nodes. Recent attempts to combine the two sources of information only consider local network structure. We extend NODE2VEC, a well-known NE method that considers broader network structure, to also consider textual node descriptors using recurrent neural encoders. Our method is evaluated on link prediction in two networks derived from UMLS. Experimental results demonstrate the effectiveness of the proposed approach compared to previous work. This is joint work with Sotiris Kotitsas, Dimitris Pappas, Ryan McDonald, and Marianna Apidianaki, which was also recently presented at BioNLP 2019.
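As a rough illustration of the network-structure side of the abstract above, the second-order biased random walk at the heart of NODE2VEC can be sketched as follows (a minimal sketch, not the authors' code; the dict-of-sets graph encoding and parameter names are assumptions made here for illustration):

```python
import random

def node2vec_walk(graph, start, walk_length, p=1.0, q=1.0, rng=random):
    """One second-order biased random walk in the NODE2VEC style.

    graph: dict mapping node -> set of neighbour nodes (undirected).
    p: return parameter (higher -> less likely to step back to the previous node).
    q: in-out parameter (higher -> stay near the previous node, BFS-like;
       lower -> wander outward, DFS-like).
    """
    walk = [start]
    while len(walk) < walk_length:
        cur = walk[-1]
        nbrs = sorted(graph[cur])
        if not nbrs:
            break  # dead end
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is unbiased
            continue
        prev = walk[-2]
        # Unnormalised transition weights, based on the distance
        # between each candidate x and the previously visited node.
        weights = []
        for x in nbrs:
            if x == prev:              # distance 0: returning
                weights.append(1.0 / p)
            elif x in graph[prev]:     # distance 1: staying local
                weights.append(1.0)
            else:                      # distance 2: moving outward
                weights.append(1.0 / q)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk
```

Walks like these feed a skip-gram objective to produce node vectors; the extension described in the talk additionally encodes each node's textual descriptor with a recurrent encoder and trains the two jointly.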

Natalie Schluter is Associate Professor at the IT University of Copenhagen. She is also Head of Programme for the Data Science BSc at the IT University of Copenhagen. Previously, she was Chief Analyst and Lead Data Scientist at MobilePay by DanskeBank, Copenhagen, Postdoctoral researcher at the University of Copenhagen, and Postdoctoral researcher at Malmö University, Sweden. She obtained her PhD from Dublin City University, Ireland, an MSc in Mathematics from Trinity College, Dublin, an MA in Linguistics and an Honours BSc in Mathematics from Université de Montréal, and a BA in French and Spanish from the University of British Columbia. Her main topics of interest are in Theoretical Computer Science (Graph Algorithms, Automata and Machine Learning) and applications for these theoretical findings in Natural Language Processing and other Data Science.

Natalie Schluter: Neural syntactic parsing seems so simple. Is it?

Neural network approaches to syntactic parsing achieve great success with rather simple models. In this talk I will explore the power of neural networks for this task and whether these models really are so simple.

Gemma Boleda is a tenure-track researcher at Universitat Pompeu Fabra in Barcelona, where she heads the Computational Linguistics and Linguistic Theory group (COLT). She previously held post-doctoral positions at the Department of Linguistics of The University of Texas at Austin and the CIMEC Center for Brain/Mind Sciences of the University of Trento. In her research, currently funded by an ERC Starting Grant, Dr. Boleda uses quantitative and computational methods to better understand the semantics of natural languages. She is a member of the standing review committee of TACL and an elected Information Officer of the SIGSEM Board. She acted as area co-chair of ACL 2016, program co-chair of *SEM 2015, and local co-chair at ESSLLI 2015.

Gemma Boleda: Generic and situation-specific information in distributed representations

For both humans and computational models, it is essential to be able to abstract away information from specific instances, e.g. to build a generic concept for "bird", such that it can be applied to different individuals. It is equally essential to model specific situations, for instance to understand the sentence "that bird is about to poke you" and react accordingly. Traditional distributional semantics handles generic word information very well, but struggles to account for situation-specific information. Deep learning models represent a step forward, since they are endowed with mechanisms to process contextual aspects and integrate them with generic word representations; however, it is at present unclear to what extent they represent situation-specific knowledge, and what aspects they can account for. I will present recent work at the interface between generic and situation-specific information that suggests that there is still a long way to go.

Reut Tsarfaty is Associate Professor at Bar-Ilan University and a research scientist at AI2 Israel. Reut holds a BSc from the Technion and an MSc and PhD from the Institute for Logic, Language and Computation (ILLC) at the University of Amsterdam. She also held postdoctoral fellowships at Uppsala University in Sweden and at the Weizmann Institute in Israel. Her research focuses on parsing, broadly interpreted to cover morphological, syntactic and semantic phenomena, of typologically different languages. Applications she has worked on include (but are not limited to) natural language programming, natural language navigation, automated essay scoring, analysis and generation of social media content, and more. Reut's research is funded by an ERC Starting Grant (677352) and an ISF grant (1739/26).

Reut Tsarfaty: The Empty Elements Project

We know, at least since Grice's Maxim of Quantity (Grice, 1975), that human speakers try to be as informative as they can. That is, they try to give as much information as is needed, but no more than that. In actuality this means that speakers drop a lot of content from their utterances, while hearers quietly, and without a fuss, adjust their interpretation to accommodate the missing information. For example, speakers may drop a single word, as in "I started the book, it is fascinating" (started <reading>), a phrase: "I went into the room, the walls were black." (the walls <of the room>), or an entire clause: "My wife called." (<I am married.>). Most of the NLP pipeline nowadays (tagging, parsing, NER, co-reference resolution) focuses on interpreting what's in the text. However, the success of NLP resides precisely in identifying and resolving the elements that are outside the text and between the lines, yet are conceived as a natural part of natural language utterance interpretation by humans. In this talk I present the Empty Elements (EE) project, where we aim to automatically infer, expand, and resolve the elements that have been dropped yet are unambiguously inferred by humans. I start by surveying these phenomena and their importance for language technology, I then discuss the challenges in soliciting EE information from human speakers and in explicitly modeling them, and I finally sketch our approach, along with some preliminary results, towards their resolution.

Lucia Specia is Professor of Natural Language Processing at Imperial College London and University of Sheffield. Her research focuses on various aspects of data-driven approaches to language processing, with a particular interest in multimodal and multilingual context models and work at the intersection of language and vision. Her work can be applied to various tasks such as machine translation, image captioning, quality estimation and text adaptation. She is the recipient of the MultiMT ERC Starting Grant on Multimodal Machine Translation (2016-2021) and is currently involved in other funded research projects on machine translation, multilingual video captioning and text adaptation. In the past she worked as Senior Lecturer at the University of Wolverhampton (2010-2011), and research engineer at the Xerox Research Centre, France (2008-2009, now Naver Labs). She received a PhD in Computer Science from the University of São Paulo, Brazil, in 2008.

Lucia Specia: Towards more holistic approaches to human-machine communication

In this talk I will provide an overview of recent work on multimodal machine learning, where images or videos are used to build richer context models for natural language tasks, and argue for building more holistic approaches to human-machine communication.

André Martins is the VP of AI Research at Unbabel, a research scientist at Instituto de Telecomunicações, and an invited professor at Instituto Superior Técnico in the University of Lisbon. He received his dual-degree PhD in Language Technologies in 2012 from Carnegie Mellon University and Instituto Superior Técnico. His research interests include natural language processing, machine learning, deep learning, and optimization. He received a best paper award at the Annual Meeting of the Association for Computational Linguistics (ACL) for his work in natural language syntax, and a SCS Honorable Mention at CMU for his PhD dissertation. He is one of the co-founders and organizers of the Lisbon Machine Learning Summer School (LxMLS). He holds an ERC starting grant for the project DeepSPIN (Deep Structured Prediction in Natural Language Processing).

André Martins: Beyond Sparsemax: Adaptively Sparse Transformers

Attention mechanisms have become ubiquitous in NLP. Recent architectures, notably the Transformer, learn powerful context-aware word representations through layered, multi-headed attention. The multiple heads learn diverse types of word relationships. However, with standard softmax attention, all attention heads are dense, assigning a non-zero weight to all context words. In this talk, I will introduce the adaptively sparse Transformer, wherein attention heads have flexible, context-dependent sparsity patterns. This sparsity is accomplished by replacing softmax with alpha-entmax: a differentiable generalization of softmax that allows low-scoring words to receive precisely zero weight. Moreover, we derive a method to automatically learn the alpha parameter, which controls the shape and sparsity of alpha-entmax, allowing attention heads to choose between focused or spread-out behavior. Our adaptively sparse Transformer improves interpretability and head diversity when compared to softmax Transformers on machine translation datasets. Our quantitative and qualitative analysis shows that heads in different layers learn different sparsity preferences and tend to be more diverse in their attention distributions than those of softmax Transformers. Furthermore, at no cost in accuracy, sparsity in attention heads helps to uncover different head specializations.

Joint work with Gonçalo Correia, Ben Peters, and Vlad Niculae.
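For reference, the sparsemax mapping that the talk title starts from (the alpha = 2 special case of alpha-entmax) has a simple closed form: project the score vector onto the probability simplex, so that low scores are clipped to exactly zero. A minimal NumPy sketch, not the authors' implementation:

```python
import numpy as np

def sparsemax(z):
    """Sparsemax: Euclidean projection of the score vector z onto the
    probability simplex. Equivalent to alpha-entmax with alpha = 2;
    unlike softmax, it can assign exactly zero probability."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]          # scores in decreasing order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    # Support size: the largest k with 1 + k * z_sorted[k-1] > cumsum[k-1].
    k_z = k[1 + k * z_sorted > cumsum][-1]
    tau = (cumsum[k_z - 1] - 1.0) / k_z  # threshold
    return np.maximum(z - tau, 0.0)
```

For example, sparsemax([2.0, 1.0, 0.1]) concentrates all mass on the first word ([1, 0, 0]), whereas softmax would leave every weight non-zero; values of alpha between 1 and 2 interpolate between the two, and the talk's method learns alpha per attention head. General alpha-entmax has no closed form and is typically computed by bisection.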

Angela Fan is a PhD student at INRIA Nancy and Facebook AI Research Paris, advised by Antoine Bordes, Chloe Braud, and Claire Gardent. She is interested in text generation. Before starting her PhD, she was a research engineer at FAIR for two and a half years and received a Bachelor's degree in statistics from Harvard.

Angela Fan: Pruning Transformers at Inference Time

Overparametrized Transformer networks have obtained state-of-the-art results in various natural language processing tasks, such as machine translation, language modeling, and question answering. These models contain hundreds of millions of parameters, necessitating a large amount of computation and making them prone to overfitting. In this work, we present a mechanism that allows for efficient pruning to the desired Transformer depth at inference time. We show that it is possible to select smaller sub-networks from a large network, without having to finetune them and with limited impact on performance.
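The sub-network selection step in the abstract above can be illustrated with an evenly spaced keep rule: when a model has been trained with structured layer dropout, a shallower network is extracted at inference time simply by skipping the other layers. A hypothetical helper (the function name and exact spacing rule are illustrative assumptions, not the talk's actual method):

```python
def select_layers(num_layers, target_depth):
    """Choose which layers of a deep Transformer to keep when truncating
    it to a shallower depth at inference time, spacing the kept layers
    evenly through the network (an 'every other'-style selection)."""
    assert 1 <= target_depth <= num_layers
    stride = num_layers / target_depth
    # Evenly spaced, strictly increasing layer indices.
    return [int(i * stride) for i in range(target_depth)]


# A 12-layer model truncated to half depth would then run forward
# through only these layers:
kept = select_layers(12, 6)  # [0, 2, 4, 6, 8, 10]
```

Because training already dropped layers at random, the remaining layers have learned to function without their neighbours, which is what lets such sub-networks work without finetuning.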

Rico Sennrich is an SNSF Professor at the University of Zurich and a Lecturer at the University of Edinburgh. His main research interests are multilingual NLP and machine translation; his recent research has focused on modelling linguistically challenging phenomena in machine translation, including grammaticality, productive morphology, domain effects, discourse, and pragmatic aspects, and on analysing NLP models.

Rico Sennrich: What Do Transformers Learn in NLP? Recent Insights from Model Analysis

The Transformer architecture and the BERT training scheme have enjoyed great empirical success in NLP, but their inner workings are still poorly understood. What happens in multi-head self-attention? Why does the training objective in model pre-training matter so much, and why is masked language modelling such a good choice? I will discuss recent research that sheds some light on these questions, analysing the function of attention heads in multi-head self-attention, and comparing the evolution of representations in the Transformer with different objective functions.