Date: 22 January 2013

Speaker: Kristian Woodsend (Institute for Language, Cognition and Computation, University of Edinburgh)

Venue: Room 1.03, ETSI Informática, UNED

Abstract: Recent years have witnessed increased interest in data-driven methods for text rewriting, e.g., writing a document in a simpler style, or a sentence in a more concise manner. It is frequently the case, when performing inference in these natural language tasks, that the decisions involved are mutually dependent. Local decision makers (such as machine-learning classifiers) have a role to play, but in order to make coherent decisions, it is essential that inference takes these interdependencies into account. I will be giving a tutorial on how to develop Integer Linear Programming (ILP) models for inference, using models that we developed for text rewriting as examples. In these models, we combined the rules and predictions made through data-driven and machine learning methods with declarative knowledge expressed as constraints. In the second part, I will go on to describe our application of these techniques to two rather old and well-studied text generation problems: simplification and multi-document summarization. Leveraging large-scale corpora such as Wikipedia, we automatically induced a quasi-synchronous tree-substitution grammar, a formalism that can naturally capture structural mismatches and complex rewrite operations. I will then present ILP models that select the most appropriate content from the space of possible rewrites generated by the grammar. Finally, I will present experimental results to show that this approach is able to produce grammatical and meaningful output. Joint work with Mirella Lapata.