Sample Size Sensitivity in Descriptive Baseball Statistics
John Kulas, Marlee Wanamaker, Diuky Padron-Marrero, Hui Xu

This paper presents one element of a larger project that probes for systematic
and predictable patterns of variability/volatility in baseball's descriptive
statistics. The larger project standardizes many baseball indices along an
event metric and provides relative estimates of each index’s point of inflection
toward an empirical asymptote. Specifically, these estimates reflect differences
in sensitivity to “sample size” (i.e., which descriptive statistics are more or
less robust to the number of events observed). The end goal of this broader
investigation is a qualifier to accompany such statistics: sample size
sensitivity (Triple S). We propose it not because it is strictly needed, but
because discussions of baseball statistics are colloquially qualified by the
cautionary statement, "well, it's a
small sample size." The current presentation highlights the process and results
of estimating the logarithmic event function for one statistic, batting average,
and will provide real-time projections of accuracy (our estimated function
compared against incoming baseball data generated during the CARMA conference).
Results have implications for integrating big data applications into digestible
summary statistics that appeal to a broad audience and carry practical meaning.
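To make the idea of a logarithmic event function concrete, the Python sketch below fits a curve of the form y = a + b * ln(n) to the sampling variability of batting average across numbers of at-bats n. It is a hypothetical illustration, not the authors' estimation procedure. It assumes each at-bat is an independent Bernoulli trial with a fixed, hypothetical true average of .250, uses the binomial standard error sqrt(p(1 - p)/n) as the volatility measure, and the helper names `binomial_se` and `fit_log` are our own.

```python
import math


def binomial_se(p: float, n: int) -> float:
    """Standard error of a batting average over n at-bats.

    Assumes (hypothetically) that every at-bat is an independent
    Bernoulli trial with the same true hit probability p.
    """
    return math.sqrt(p * (1.0 - p) / n)


def fit_log(xs, ys):
    """Ordinary least-squares fit of y = a + b * ln(x); returns (a, b)."""
    lx = [math.log(x) for x in xs]
    k = len(xs)
    mean_x = sum(lx) / k
    mean_y = sum(ys) / k
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ys))
    sxx = sum((u - mean_x) ** 2 for u in lx)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical event counts and true-talent average (not data from the paper).
at_bats = [25, 50, 100, 200, 400, 600]
p_true = 0.250

ses = [binomial_se(p_true, n) for n in at_bats]
a, b = fit_log(at_bats, ses)

for n, se in zip(at_bats, ses):
    fitted = a + b * math.log(n)
    print(f"{n:4d} AB  SE={se:.4f}  log-fit={fitted:.4f}")
```

The fitted slope b comes out negative, reflecting that batting average becomes less volatile as events accumulate. Because the binomial standard error actually shrinks as n^(-1/2), a logarithmic curve is only an approximation over the chosen range of at-bats, so any fitted parameters depend on that range.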

