Discrepancy Ratio: Evaluating Model Performances When Even Experts Disagree on the Truth PDF

Igor Lovchinsky, Alon Daks, et al. “Discrepancy Ratio: Evaluating Model Performances When Even Experts Disagree on the Truth.” ICLR 2020.


In most machine learning tasks, unambiguous ground truth labels can easily be acquired. However, this luxury is not afforded to many high-stakes, real-world scenarios such as medical image interpretation, where even expert human annotators typically exhibit high levels of disagreement with one another. While prior work has focused on overcoming noisy labels during training, the question of how to evaluate models when annotators disagree about ground truth has remained largely unexplored. To address this, we propose the discrepancy ratio: a novel, task-independent, and principled framework for validating machine learning models in the presence of high label noise. Conceptually, our approach evaluates a model by comparing its predictions to those of human annotators, taking into account the degree to which the annotators disagree with one another. While our approach is entirely general, we show that in the special case of binary classification, our proposed metric can be evaluated in terms of simple, closed-form expressions that depend only on aggregate statistics of the labels and not on any individual label. Finally, we demonstrate how this framework can be used to validate machine learning models on two real-world tasks from medical imaging. The discrepancy ratio metric reveals what conventional metrics do not: our models not only vastly exceed average human performance, but even exceed the performance of the best human experts in our datasets.
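The binary-classification case can be sketched concretely. The snippet below is a minimal reading of the idea, not the paper’s exact formulation: it uses 0/1 disagreement, and divides the model’s mean disagreement with the annotators by the mean pairwise disagreement among the annotators. A ratio below 1 means the model disagrees with the experts less than the experts disagree with one another.

```python
import itertools
import numpy as np

def disagreement(a, b):
    """Fraction of items on which two binary label vectors differ (0/1 loss)."""
    return float(np.mean(np.asarray(a) != np.asarray(b)))

def discrepancy_ratio(model_preds, annotator_labels):
    """Ratio of model-annotator discrepancy to inter-annotator discrepancy.

    annotator_labels: list of label vectors, one per annotator.
    A ratio below 1 means the model disagrees with the annotators
    less than the annotators disagree with one another.
    """
    model_disc = np.mean([disagreement(model_preds, a) for a in annotator_labels])
    pair_discs = [disagreement(a, b)
                  for a, b in itertools.combinations(annotator_labels, 2)]
    return model_disc / np.mean(pair_discs)

# Toy example: 3 annotators, 8 binary labels each.
annotators = [
    [1, 0, 1, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
]
model = [1, 0, 1, 1, 0, 1, 0, 0]
print(round(discrepancy_ratio(model, annotators), 3))  # prints 0.5
```

The 0/1 disagreement function is one choice among many; the framework in the paper is stated generally, so any task-appropriate discrepancy measure could be substituted.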

Do the Golden State Warriors Have Hot Hands? PDF, Arxiv, Scientific American

Alon Daks, Nishant Desai, and Lisa R. Goldberg. “Do the Golden State Warriors Have Hot Hands?” The Mathematical Intelligencer 2018. Republished in Scientific American.


Star Golden State Warriors Steph Curry, Klay Thompson, and Kevin Durant are great shooters, but they are not streak shooters. Only rarely do they show signs of a hot hand. This conclusion is based on an empirical analysis of field goal and free throw data from the 82 regular season and 17 postseason games played by the Warriors in 2016–2017. Our analysis is inspired by the iconic 1985 hot-hand study by Thomas Gilovich, Robert Vallone, and Amos Tversky, but uses a permutation test to automatically account for Josh Miller and Adam Sanjurjo’s recent small-sample correction. In this study we show how long-standing problems can be reexamined using nonparametric statistics, avoiding faulty hypothesis tests caused by misspecified distributions.
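The permutation test described above can be sketched as follows. This is an illustrative implementation, not the paper’s code: the statistic (make rate after a run of k makes, minus the overall make rate) and the 0/1 shot encoding are assumptions. Because each shuffle preserves the sequence’s make/miss counts, the small-sample bias identified by Miller and Sanjurjo is baked into the null distribution rather than corrected by hand.

```python
import random

def streak_stat(shots, k=3):
    """P(make | previous k shots were makes) minus the overall make rate.
    Returns None if the sequence contains no run of k consecutive makes."""
    hits_after = attempts_after = 0
    for i in range(k, len(shots)):
        if all(shots[i - k:i]):
            attempts_after += 1
            hits_after += shots[i]
    if attempts_after == 0:
        return None
    return hits_after / attempts_after - sum(shots) / len(shots)

def permutation_pvalue(shots, k=3, n_perm=2000, seed=0):
    """One-sided p-value for a hot hand: how often does a shuffled
    sequence show a streak effect at least as large as the observed one?"""
    observed = streak_stat(shots, k)
    if observed is None:
        raise ValueError("no run of k makes in the observed sequence")
    rng = random.Random(seed)
    perm = list(shots)
    count = valid = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        s = streak_stat(perm, k)
        if s is None:
            continue  # this shuffle has no qualifying streaks
        valid += 1
        if s >= observed:
            count += 1
    return count / valid if valid else float("nan")
```

A small p-value would indicate streakiness beyond what chance ordering of the same makes and misses produces; for the Warriors’ data, the paper finds such evidence only rarely.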

Deep Factor Graphs for Bayesian Prediction of High-Dimensional Games PDF

Alon Daks “Deep Factor Graphs for Bayesian Prediction of High-Dimensional Games.” EECS Department, University of California, Berkeley. 2017.


This paper offers an extension to TrueSkill, a Bayesian method for ranking players and predicting outcomes of multiplayer games, for cases where a game is high-dimensional. TrueSkill was originally developed by Microsoft Research to rank and match Xbox Live players, but it offers a general method for inferring player skill based almost exclusively on the win-loss outcome of a match. Although such a method works well for relatively simple games like Halo, the framework is limited in its ability to incorporate the information-rich features (often called boxscores) commonly used to describe high-dimensional games such as basketball. Our work extends TrueSkill to these types of games by reformulating its underlying graphical model as the internal dynamics of a recurrent neural network cell, and by using neural networks as expressive function approximators that map between high-dimensional boxscores and a player’s weight when conducting TrueSkill updates. Experimental results on NBA data show that our method improves upon the original TrueSkill algorithm for predicting the outcomes of basketball games.
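The extension is easier to picture with the base update in hand. Below is a minimal sketch of the closed-form two-player win/loss update that TrueSkill’s factor graph reduces to in the simplest case: each player’s skill is a Gaussian (mu, sigma), and the winner’s mean moves up by a truncated-Gaussian correction v while both variances shrink by a factor involving w. The BETA constant is an assumed performance-noise value, and the paper’s contribution, not shown here, is to replace fixed per-player contributions with weights produced by a neural network from boxscore features.

```python
import math

BETA = 25.0 / 6  # performance noise (assumed; a common TrueSkill default)

def _phi(x):
    """Standard normal pdf."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def _Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def v(t):
    """Additive mean correction for the winner (truncated Gaussian)."""
    return _phi(t) / _Phi(t)

def w(t):
    """Multiplicative variance correction."""
    return v(t) * (v(t) + t)

def update(winner, loser):
    """One win/loss update for two players, each given as (mu, sigma)."""
    mu_w, sig_w = winner
    mu_l, sig_l = loser
    c = math.sqrt(sig_w**2 + sig_l**2 + 2 * BETA**2)
    t = (mu_w - mu_l) / c
    mu_w_new = mu_w + sig_w**2 / c * v(t)
    mu_l_new = mu_l - sig_l**2 / c * v(t)
    sig_w_new = sig_w * math.sqrt(max(1 - sig_w**2 / c**2 * w(t), 1e-9))
    sig_l_new = sig_l * math.sqrt(max(1 - sig_l**2 / c**2 * w(t), 1e-9))
    return (mu_w_new, sig_w_new), (mu_l_new, sig_l_new)
```

Starting two players at the conventional prior (25, 25/3), a single win pushes the winner’s mean above the loser’s while tightening both uncertainty estimates.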

Unsupervised Authorial Clustering Based on Syntactic Structure PDF

Alon Daks and Aidan Clark. “Unsupervised Authorial Clustering Based on Syntactic Structure.” ACL 2016 Student Research Workshop, 114.


This paper proposes a new unsupervised technique for clustering a collection of documents written by distinct individuals into authorial components. We highlight the importance of utilizing syntactic structure to cluster documents by author, and demonstrate experimental results that show the method we outline performs on par with state-of-the-art techniques. Additionally, we argue that this feature set outperforms previous methods in cases where authors consciously emulate each other’s style or are otherwise rhetorically similar.
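The pipeline described above can be sketched in miniature. The feature extractor below is a crude stand-in for the paper’s syntactic features: it counts bigrams over a fixed set of function words rather than parse-tree substructures, and plain k-means is used as one possible clustering backend. All names and constants here are illustrative assumptions.

```python
import numpy as np

FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "is", "was", "he", "it"]

def syntax_features(text):
    """Bigram frequencies over function words: a simplified proxy for the
    syntactic-structure features used in the paper."""
    idx = {word: i for i, word in enumerate(FUNCTION_WORDS)}
    tokens = [t for t in text.lower().split() if t in idx]
    vec = np.zeros(len(FUNCTION_WORDS) ** 2)
    for a, b in zip(tokens, tokens[1:]):
        vec[idx[a] * len(FUNCTION_WORDS) + idx[b]] += 1
    total = vec.sum()
    return vec / total if total else vec

def kmeans(X, k, iters=50):
    """Plain k-means with deterministic init (first k rows as centroids)."""
    centroids = X[:k].copy()
    for _ in range(iters):
        # Assign each row to its nearest centroid by squared distance.
        labels = np.argmin(
            ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(0)
    return labels
```

Stacking one feature vector per document into a matrix and running `kmeans` with k equal to the presumed number of authors yields the authorial components; the paper’s point is that syntax-derived features keep this separation even when authors imitate one another’s surface style.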