Date: Monday, 22 September 2014, at 11:30.
Speaker: Dirk Hovy, Center for Language Technology, University of Copenhagen.
Venue: Room 1.03, ETSI Informática, UNED
In NLP, we rely on annotated data to train models. This implicitly assumes that the annotations represent the truth. However, this basic assumption can be violated in two ways: either because the annotators exhibit a certain bias (label bias), or because there simply is no single truth (bias in ground truth). In this talk, I will present approaches to address both problems.
In the case of label bias, we can collect multiple annotations and aggregate them to infer both the underlying truth and the reliability of the individual annotators. We present a software package, MACE (Multi-Annotator Competence Estimation), which considerably improves over majority-voting baselines in terms of both predicted-label accuracy and competence estimates. Additionally, it allows us to trade precision for recall, achieving even higher performance, and to incorporate control items.
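To make the aggregation idea concrete, here is a minimal sketch of competence estimation with EM, assuming a MACE-style generative story: each annotator either reports the true label (with probability `theta`) or "spams" a label drawn from a personal distribution `xi`. This is a simplified illustration, not the MACE package itself: it uses plain EM with small pseudo-counts in place of MACE's own priors and inference options, and it omits control items and the precision/recall threshold. The function names (`majority_vote`, `competence_em`), the initial value of `theta`, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(annotations):
    """Baseline: most frequent label per item; ties go to the smallest label."""
    votes = defaultdict(Counter)
    for item, _annotator, label in annotations:
        votes[item][label] += 1
    return {item: min(c, key=lambda lab: (-c[lab], lab)) for item, c in votes.items()}


def competence_em(annotations, num_labels, iterations=50, smooth=0.5):
    """Simplified MACE-style EM over (item, annotator, label) triples.

    P(label | true) = theta * [label == true] + (1 - theta) * xi[label]
    Returns (predicted label per item, estimated competence theta per annotator).
    """
    annotators = sorted({a for _, a, _ in annotations})
    items = sorted({i for i, _, _ in annotations})
    by_item = defaultdict(list)
    for item, annotator, label in annotations:
        by_item[item].append((annotator, label))
    theta = {a: 0.8 for a in annotators}  # initial guess (hypothetical)
    xi = {a: [1.0 / num_labels] * num_labels for a in annotators}

    for _ in range(iterations):
        # E-step: posterior over each item's true label (uniform prior).
        posterior = {}
        for item in items:
            scores = []
            for t in range(num_labels):
                p = 1.0
                for a, lab in by_item[item]:
                    p *= theta[a] * (lab == t) + (1 - theta[a]) * xi[a][lab]
                scores.append(p)
            z = sum(scores)
            posterior[item] = [s / z for s in scores]

        # M-step: expected counts of "knew the answer" vs. "spammed" per annotator.
        know, total = Counter(), Counter()
        spam = defaultdict(lambda: [0.0] * num_labels)
        for item, a, lab in annotations:
            p_know = posterior[item][lab] * theta[a] / (theta[a] + (1 - theta[a]) * xi[a][lab])
            know[a] += p_know
            total[a] += 1
            spam[a][lab] += 1 - p_know
        for a in annotators:
            # Pseudo-counts keep theta inside (0, 1) and xi strictly positive.
            theta[a] = (know[a] + smooth) / (total[a] + 2 * smooth)
            s = sum(spam[a])
            xi[a] = [(c + 0.1) / (s + 0.1 * num_labels) for c in spam[a]]

    labels = {i: max(range(num_labels), key=lambda t: posterior[i][t]) for i in items}
    return labels, theta


# Toy data: A and B are always right; C is right on only half the items.
truth = [0, 1, 0, 1, 0, 1, 0, 1]
c_labels = [0, 0, 1, 1, 0, 0, 1, 1]
data = []
for i, t in enumerate(truth):
    data += [(i, "A", t), (i, "B", t), (i, "C", c_labels[i])]
data += [(8, "A", 1), (8, "C", 0)]  # item 8: only A and C label it, and they disagree

labels, theta = competence_em(data, num_labels=2)
print("majority:", majority_vote(data)[8], "EM:", labels[8])
print({a: round(th, 2) for a, th in theta.items()})
```

On this toy data, C's estimated competence drops well below A's and B's. On item 8, where only A and C vote and they disagree, majority voting can only fall back on an arbitrary tie-break, while the competence-weighted posterior sides with the more reliable annotator A.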
In the second case, where no single truth exists, we can learn which categories are easily confused and incorporate this knowledge into the training of NLP models. We use small samples of doubly annotated POS data from Twitter to estimate annotation reliability and incorporate inter-annotator reliability into the loss function of a structured perceptron. We find that this cost-sensitive model performs better both across annotation projects and on data annotated according to the same guidelines. Finally, we show that cost-sensitive models also perform better on downstream tasks.
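One simple way to put annotator disagreement into training is sketched below, under explicit assumptions: the cost of predicting `pred` when the gold tag is `gold` is taken as `1 - P(pred | gold)`, estimated from symmetric counts over doubly annotated tokens, and each perceptron update is scaled by that cost, so mistakes that annotators themselves often make are penalized less. For brevity this is a per-token multiclass perceptron rather than a structured (sequence) perceptron with Viterbi decoding, and the cost formula, function names, and toy tags are hypothetical, not the exact loss used in this work.

```python
from collections import Counter, defaultdict


def confusion_cost(pairs):
    """Build cost(gold, pred) = 1 - P(pred | gold) from doubly annotated tag pairs.

    Counts are symmetric, so tags that annotators often confuse get a low cost.
    (Hypothetical cost definition, chosen for illustration.)
    """
    counts = defaultdict(Counter)
    for a, b in pairs:
        counts[a][b] += 1
        counts[b][a] += 1

    def cost(gold, pred):
        if gold == pred:
            return 0.0
        row = counts.get(gold, Counter())
        total = sum(row.values())
        # Tags never seen in the double annotation get the full cost of 1.
        return 1.0 if total == 0 else 1.0 - row[pred] / total

    return cost


def predict(weights, features, labels):
    """Highest-scoring label; ties go to the earliest label in `labels`."""
    return max(labels, key=lambda y: sum(weights[y][f] for f in features))


def train(data, labels, cost, epochs=5):
    """Per-token multiclass perceptron with cost-scaled updates."""
    weights = {y: defaultdict(float) for y in labels}
    for _ in range(epochs):
        for features, gold in data:
            pred = predict(weights, features, labels)
            if pred != gold:
                step = cost(gold, pred)  # smaller step for confusions annotators also make
                for f in features:
                    weights[gold][f] += step
                    weights[pred][f] -= step
    return weights


# Toy doubly annotated sample: annotators occasionally disagree on NOUN vs PROPN.
pairs = ([("NOUN", "NOUN")] * 3 + [("NOUN", "PROPN")]
         + [("PROPN", "PROPN")] * 3 + [("VERB", "VERB")] * 4)
cost = confusion_cost(pairs)

labels = ["NOUN", "PROPN", "VERB"]
data = [({"w=dog"}, "NOUN"), ({"w=run"}, "VERB"), ({"w=Paris"}, "PROPN")]
weights = train(data, labels, cost)

print(cost("NOUN", "PROPN"), cost("NOUN", "VERB"))
print([predict(weights, feats, labels) for feats, _ in data])
```

Because NOUN/PROPN disagreements occur in the doubly annotated sample, the update for mislabeling *Paris* as NOUN is scaled down to 6/7, while the never-confused VERB/NOUN mistake on *run* receives the full update of 1.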