Evaluating Evaluation of AI Systems (Meta-Eval 2020)

Workshop at the Thirty-Fourth AAAI Conference on Artificial Intelligence

Submission Deadline: November 27, 2019

Notification of acceptance: December 13, 2019

Final camera-ready papers due: December 18, 2019

Call for Papers


The last decade has seen massive progress in AI research, powered by crowdsourced datasets and benchmarks such as ImageNet, Freebase, and SQuAD, as well as the widespread adoption and increasing use of AI in deployed systems. A crucial ingredient has been the role of crowdsourcing in operationalizing empirical ways to evaluate, compare, and assess this progress.

The focus of this workshop is not on evaluating AI systems, but on evaluating the quality of evaluations of AI systems. When these evaluations rely on crowdsourced datasets or methodologies, we are interested in the meta-questions around characterizing those methodologies. Some of the expected activities in the workshop include:

Crowdsourced datasets for evaluating AI systems’ success at tasks such as image labeling and question answering have proven to be powerful enablers for research. However, the adoption of such datasets is typically driven by the mere existence and size of a dataset, without proper scrutiny of its scope, quality, and limitations. While crowdsourcing has enabled a burst of published work on specific problems, determining whether that work reflects real progress requires a deeper understanding of how a dataset supports the scientific or performance claims of the AI systems it is used to evaluate. This workshop will provide a forum for growing our collective understanding of what makes a dataset good, the value of improved datasets and collection methods, and how to inform decisions about when to invest in more robust data acquisition.

Topics of Interest

We invite scientific contributions and position papers on the following topics:

Program Committee

Organizing Committee

Email: rigorous-evaluation AT googlegroups.com