Papers

Discrepancy Ratio: Evaluating Model Performances When Even Experts Disagree on the Truth PDF

Igor Lovchinsky, Alon Daks, et al. “Discrepancy Ratio: Evaluating Model Performances When Even Experts Disagree on the Truth.” ICLR 2020.

Abstract

In most machine learning tasks, unambiguous ground truth labels can easily be acquired. However, this luxury is often not afforded in high-stakes, real-world scenarios such as medical image interpretation, where even expert human annotators typically exhibit very high levels of disagreement with one another. While prior works have focused on overcoming noisy labels during training, the question of how to evaluate models when annotators disagree about ground truth has remained largely unexplored. To address this, we propose the discrepancy ratio: a novel, task-independent and principled framework for validating machine learning models in the presence of high label noise. Conceptually, our approach evaluates a model by comparing its predictions to those of human annotators, taking into account the degree to which annotators disagree with one another. While our approach is entirely general, we show that in the special case of binary classification, our proposed metric can be evaluated in terms of simple, closed-form expressions that depend only on aggregate statistics of the labels and not on any individual label. Finally, we demonstrate how this framework can be used effectively to validate machine learning models using two real-world tasks from medical imaging. The discrepancy ratio metric reveals what conventional metrics do not: that our models not only vastly exceed the average human performance, but even exceed the performance of the best human experts in our datasets.
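
The conceptual idea in the abstract (compare model-versus-annotator disagreement against annotator-versus-annotator disagreement) can be illustrated with a rough sketch. This is not the paper's definition: the function names, the use of simple label mismatch as the discrepancy, and the plain averaging over annotators and annotator pairs are all assumptions made here for illustration only.

```python
from itertools import combinations


def disagreement(labels_a, labels_b):
    """Fraction of items on which two label sequences differ (assumed discrepancy)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    return sum(a != b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def discrepancy_ratio_sketch(model_preds, annotator_labels):
    """Hypothetical ratio: mean model-annotator disagreement divided by
    mean annotator-annotator disagreement.

    annotator_labels is a list with one label sequence per annotator.
    Under this illustrative definition, a value below 1 means the model
    disagrees with the annotators less than they disagree with each other.
    """
    if len(annotator_labels) < 2:
        raise ValueError("need at least two annotators")

    # Average disagreement between the model and each annotator.
    model_vs_human = sum(
        disagreement(model_preds, labels) for labels in annotator_labels
    ) / len(annotator_labels)

    # Average disagreement over all unordered annotator pairs.
    pairs = list(combinations(annotator_labels, 2))
    human_vs_human = sum(disagreement(a, b) for a, b in pairs) / len(pairs)

    if human_vs_human == 0:
        raise ValueError("annotators never disagree; ratio is undefined")
    return model_vs_human / human_vs_human


# Toy data, invented for this example.
model = [1, 0, 1, 1]
annotators = [[1, 0, 1, 0], [1, 1, 1, 1], [0, 0, 1, 1]]
ratio = discrepancy_ratio_sketch(model, annotators)
```

In the toy data the model differs from each annotator on one of four items, while every pair of annotators differs on two of four, so the sketched ratio falls below 1.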

Do the Golden State Warriors Have Hot Hands? PDF, Arxiv, Scientific American

Alon Daks, Nishant Desai, and Lisa R. Goldberg. “Do the Golden State Warriors Have Hot Hands?” The Mathematical Intelligencer 2018. Republished in Scientific American.

Abstract

Star Golden State Warriors Steph Curry, Klay Thompson, and Kevin Durant are great shooters, but they are not streak shooters. Only rarely do they show signs of a hot hand. This conclusion is based on an empirical analysis of field goal and free throw data from the 82 regular season and 17 postseason games played by the Warriors in 2016–2017. Our analysis is inspired by the iconic 1985 hot-hand study by Thomas Gilovich, Robert Vallone and Amos Tversky, but uses a permutation test to automatically account for Josh Miller and Adam Sanjurjo’s recent small-sample correction. In this study we show how long-standing problems can be reexamined using nonparametric statistics to avoid faulty hypothesis tests due to misspecified distributions.
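
A permutation test of the kind mentioned in the abstract can be sketched as follows. This is a generic illustration, not the paper's code: the choice of test statistic (hit rate after a hit minus hit rate after a miss), the function names, and the toy shot sequences are assumptions. Because the null distribution is built by recomputing the same statistic on shuffled copies of the same sequence, any small-sample bias in the statistic affects the observed value and the null values alike.

```python
import random


def streak_statistic(shots):
    """P(hit | previous hit) - P(hit | previous miss) for a 0/1 sequence.

    Returns None when either conditional rate is undefined.
    """
    after_hit = [b for a, b in zip(shots, shots[1:]) if a == 1]
    after_miss = [b for a, b in zip(shots, shots[1:]) if a == 0]
    if not after_hit or not after_miss:
        return None
    return sum(after_hit) / len(after_hit) - sum(after_miss) / len(after_miss)


def permutation_p_value(shots, n_permutations=10000, seed=0):
    """One-sided permutation p-value for streakiness.

    Shuffling keeps the number of hits and misses fixed and destroys
    only their order, which is what the null hypothesis assumes.
    """
    observed = streak_statistic(shots)
    if observed is None:
        raise ValueError("statistic is undefined for this sequence")

    rng = random.Random(seed)
    shuffled = list(shots)
    valid = 0
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        stat = streak_statistic(shuffled)
        if stat is None:
            continue  # skip orderings where the statistic is undefined
        valid += 1
        if stat >= observed:
            at_least_as_extreme += 1

    if valid == 0:
        raise ValueError("no valid permutations were drawn")
    return observed, at_least_as_extreme / valid


# Toy sequences, invented for this example (1 = make, 0 = miss).
streaky = [1] * 10 + [0] * 10
alternating = [1, 0] * 10
streaky_stat, streaky_p = permutation_p_value(streaky)
alternating_stat, alternating_p = permutation_p_value(alternating)
```

The fully clustered sequence yields a very small p-value, while the strictly alternating one yields a p-value of 1, since no shuffle can look less streaky than it does.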

Unsupervised Authorial Clustering Based on Syntactic Structure PDF

Alon Daks and Aidan Clark. “Unsupervised Authorial Clustering Based on Syntactic Structure.” ACL — SRW 2016, 114.

Abstract

This paper proposes a new unsupervised technique for clustering a collection of documents written by distinct individuals into authorial components. We highlight the importance of utilizing syntactic structure to cluster documents by author, and demonstrate experimental results that show the method we outline performs on par with state-of-the-art techniques. Additionally, we argue that this feature set outperforms previous methods in cases where authors consciously emulate each other’s style or are otherwise rhetorically similar.