Replicability Analysis for Natural Language Processing - NLP Highlights Podcast


A talk about the TACL 2017 paper by Rotem Dror, Gili Baumer, Marina Bogomolov, and Roi Reichart.

Roi comes on to talk to us about how to make better statistical comparisons between two methods when the comparison spans multiple datasets. The paper shows that there are more powerful procedures available than the occasionally used Bonferroni correction, and that using them lets you draw stronger, statistically valid conclusions. We also talk a bit about how the assumptions you make about your data affect which statistical tests you can perform, and briefly mention other issues in replicability and reproducibility, like variance across training runs.
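To make the multiple-comparisons point concrete, here is a minimal illustrative sketch (not the paper's exact procedure, which builds on partial conjunction testing) contrasting the Bonferroni correction with Holm's step-down procedure, a classic example of a uniformly more powerful alternative. The p-values are hypothetical per-dataset significance results.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / m, where m is the number of tests."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def holm(pvals, alpha=0.05):
    """Holm's step-down procedure: sort the p-values, compare the k-th
    smallest to alpha / (m - k), and stop at the first failure.
    It never rejects fewer hypotheses than Bonferroni."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if pvals[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break
    return reject

# Hypothetical p-values from comparing two systems on five datasets.
pvals = [0.004, 0.011, 0.020, 0.030, 0.24]
print(bonferroni(pvals))  # [True, False, False, False, False]
print(holm(pvals))        # [True, True, False, False, False]
```

With the same inputs, Bonferroni rejects only the smallest p-value (threshold 0.05/5 = 0.01), while Holm also rejects the second (0.011 ≤ 0.05/4 = 0.0125), showing how a better procedure supports stronger conclusions at the same error rate.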