The efficacy of machine learning (ML) models depends on both algorithms and data. Training data defines what we want our models to learn, and testing data provides the means by which their empirical progress is measured. Benchmark datasets define the entire world within which models exist and operate, yet research continues to focus on critiquing and improving the algorithmic aspect of the models rather than critiquing and improving the data with which our models operate. If “data is the new oil,” we are still missing work on the refineries by which the data itself could be optimized for more effective use.
Measurement of AI success today is often metrics-driven, with emphasis on rigorous model measurement and A/B testing. However, measuring how well a model fits a dataset says nothing about how well the dataset fits the real-world problem. Goodness-of-fit metrics such as F1, accuracy, and AUC tell us little about data fidelity (i.e., how well the dataset represents reality) or validity (how well the data explains things related to the phenomena captured by the data). No standardised metrics exist today for characterising the goodness-of-data.
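As a rough illustration of that gap, the sketch below scores a trivial majority-class model on two made-up label sets: a hypothetical benchmark and a hypothetical deployment population. All names and numbers here are assumptions chosen for illustration. The total variation distance between label distributions is used only as one possible proxy for mismatch; it is not a standard goodness-of-data metric.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def label_distribution(labels):
    """Map each label to its relative frequency."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)


def majority_model(n):
    """A trivial 'model' that always predicts class 0 (illustrative only)."""
    return [0] * n


# Hypothetical data: the benchmark is 95% class 0, while the population
# the model is deployed on is only 60% class 0.
benchmark_labels = [0] * 95 + [1] * 5
deployment_labels = [0] * 60 + [1] * 40

benchmark_acc = accuracy(benchmark_labels, majority_model(len(benchmark_labels)))
deployment_acc = accuracy(deployment_labels, majority_model(len(deployment_labels)))
label_gap = total_variation(
    label_distribution(benchmark_labels),
    label_distribution(deployment_labels),
)

print(f"benchmark accuracy:  {benchmark_acc:.2f}")
print(f"deployment accuracy: {deployment_acc:.2f}")
print(f"label distribution gap (total variation): {label_gap:.2f}")
```

The benchmark score looks strong even though the model has learned nothing, and nothing in that score reveals the mismatch with the deployment data. Only a measurement taken on the data itself, here a simple comparison of label distributions, brings it to light.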
This page gathers resources, workshops, challenges, and communities focused on data for building and testing AI/ML systems.