
Discardable data: Spring training as sabermetrics

What does and doesn't matter when it comes to spring training stats?

A wide shot of Goodyear Ballpark (Photo: Jake Dungan)
March 12, 2014

Domonic Brown of the Philadelphia Phillies stood alone at the summit of the 2013 spring training leaderboards. He collected 32 hits, the most of any hitter, and posted a .376 batting average over the course of his torrid spring campaign.

The value of spring training, evidently, is not in its surface-level statistics. Hit totals, slugging percentage, home run totals – each of these is idle amusement for those waiting for opening day, to be relegated to trivia when that day finally arrives. A cursory glance over the leaderboards says nothing about a batter’s plate discipline or quality of contact. In short, spring training stats are excellent descriptive stats, in that they faithfully describe what occurred, but they are very poor predictive stats. Hits are the effect, not the cause, of good hitting, and as baseball fans who watch spring training games recognize, there is a gap between cause and effect: a spring can be considered successful without an impressive stat line.

The focus of spring training, then, is on the component mechanisms of success: a batter’s swing, the strength of a batter’s contact, the rate at which balls are hit well. These are the factors on which MLB coaches and executives focus their attention. They, like baseball fans at large, realize that there are factors beneath the stats, factors that forecast a player’s success far more effectively than a distilled tally of hits. It is for this reason that baseball fans – saber-oriented or otherwise – generally disregard spring training numbers.

The question that naturally results, then, is the following: is this true only during spring training?

The earlier description of hits and batting average – that they are effects, and that a gap separates these effects from a solid plate approach – still applies during the regular season. There are more plate appearances and the talent pool is more concentrated than in spring training, two factors that narrow the gap between cause and effect. Yet the gap remains, and hits and batting average remain descriptive statistics.
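To put a rough number on how more plate appearances narrow that gap, here is a minimal sketch. It assumes, as a simplification that is not part of the argument above, that every at-bat is an independent trial with a fixed chance of a hit. The .270 true average and the 550 at-bat season are hypothetical; 85 at-bats is roughly what 32 hits at a .376 average implies.

```python
import math


def batting_average_standard_error(true_average: float, at_bats: int) -> float:
    """Standard error of an observed batting average under a binomial model.

    Assumes each at-bat is an independent trial with the same hit
    probability, which is a simplification of real baseball.
    """
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return math.sqrt(true_average * (1.0 - true_average) / at_bats)


# Hypothetical hitter whose underlying talent is a .270 average.
TRUE_AVERAGE = 0.270

spring = batting_average_standard_error(TRUE_AVERAGE, 85)    # a spring-sized sample
season = batting_average_standard_error(TRUE_AVERAGE, 550)   # a hypothetical full season

print(f"spring training noise: +/- {spring:.3f}")
print(f"full season noise:     +/- {season:.3f}")
```

Under these assumptions the spring figure comes out at roughly .048 and the season figure at roughly .019. The noise shrinks with more at-bats, but it does not disappear.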

If these descriptive stats, then, say little about how the hits did or did not occur, why not create statistics that better describe a hitter’s talent? Instead of only counting hits, why not count how many balls in play were struck well? Why not describe how frequently hitters swing at balls out of the zone? Why not ask about the rate at which hitters, when they swing at pitches inside the zone, actually make contact?
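Two of these questions reduce to simple rates over pitch-by-pitch records. The sketch below is illustrative only: the `Pitch` record, the function names, and the sample data are hypothetical, and real tracking data is considerably messier. The two rates are commonly published under names like O-Swing% and Z-Contact%.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Pitch:
    """One pitch seen by a batter (hypothetical record layout)."""
    in_zone: bool   # the pitch was in the strike zone
    swung: bool     # the batter swung
    contact: bool   # the bat touched the ball; only meaningful if swung


def _rate(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when there are no qualifying pitches.
    return numerator / denominator if denominator else float("nan")


def chase_rate(pitches: Sequence[Pitch]) -> float:
    """Share of pitches outside the zone that the batter swung at."""
    outside = [p for p in pitches if not p.in_zone]
    return _rate(sum(p.swung for p in outside), len(outside))


def zone_contact_rate(pitches: Sequence[Pitch]) -> float:
    """Share of swings at pitches inside the zone that made contact."""
    zone_swings = [p for p in pitches if p.in_zone and p.swung]
    return _rate(sum(p.contact for p in zone_swings), len(zone_swings))


# Made-up sample: four pitches in the zone, four outside it.
sample = [
    Pitch(in_zone=True, swung=True, contact=True),
    Pitch(in_zone=True, swung=True, contact=False),
    Pitch(in_zone=True, swung=False, contact=False),
    Pitch(in_zone=True, swung=True, contact=True),
    Pitch(in_zone=False, swung=True, contact=False),
    Pitch(in_zone=False, swung=False, contact=False),
    Pitch(in_zone=False, swung=False, contact=False),
    Pitch(in_zone=False, swung=False, contact=False),
]

print(f"chase rate:        {chase_rate(sample):.3f}")
print(f"zone contact rate: {zone_contact_rate(sample):.3f}")
```

Neither rate depends on whether a ball in play happened to fall for a hit, which is what makes this kind of number a candidate for describing cause rather than effect.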

In short, why not track predictive statistics?

There is little, in this sense, that separates sabermetrics from the behavior of anyone trying to project the future effectiveness of players. Scouts care far less about the numbers that a hitter puts up in any given game than they do about the approach that player took to get there. For pitchers, repeatability, velocity, movement, and makeup eclipse things like runs allowed or batting average allowed; cause, rather than effect, is the center of a scout’s attention.

Yet scouting is manpower-intensive; no team could possibly acquire faithful, in-depth scouting reports for every player in affiliated baseball for every game of the year. Such a task would require an army and would be impossible even if the pool were narrowed to Major League Baseball. One can acquire scouting reports on each player in baseball, but these are samples of how a player approached his task on the particular day that the scouting report was taken – a faithful, uniquely in-depth glimpse of root causes, but a glimpse nonetheless.

Over the long term, sabermetric statistics are indispensable. Using statistics, one can find out, for instance, how much movement a pitch had. Early in 2013, Corey Kluber’s ERA, an effect statistic, was abysmal. Looking deeper, however, one finds that his strikeout rate, walk rate, and the PITCHf/x measurement of his slider’s average horizontal break – cause stats, stats within Kluber’s control – were excellent. According to these more predictive statistics, Kluber was due for a breakout in run prevention. Judging by his incredible performance in June and July, those predictions came to fruition.

To trust sabermetric statistics is to do nothing more than separate cause from effect. To view Jose Iglesias’ .300 batting average in 2013 with skepticism, based on a high Batting Average on Balls in Play (BABIP) despite a dearth of hard-hit line drives, is to note that the effect (batting average) doesn’t match up with the cause (rate of solid contact). To view Jason Kipnis’ .284 batting average in 2013 with optimism, knowing that his extremely high line drive rate supports his extraordinarily high BABIP, is to note that cause and effect line up.
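For readers unfamiliar with BABIP, it is conventionally computed as (H - HR) / (AB - K - HR + SF): hits that stayed in the park, divided by at-bats that ended with a ball in play. The sketch below uses that standard formula; the stat line is hypothetical and is not Iglesias’ or Kipnis’ actual 2013 line.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits per at-bat."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return hits / at_bats


def babip(hits: int, home_runs: int, at_bats: int,
          strikeouts: int, sacrifice_flies: int) -> float:
    """Batting Average on Balls in Play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sacrifice_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical season line, chosen only to make the arithmetic easy to follow.
line = dict(hits=150, home_runs=5, at_bats=500, strikeouts=80, sacrifice_flies=5)

average = batting_average(line["hits"], line["at_bats"])
in_play_average = babip(**line)

print(f"AVG:   {average:.3f}")          # 150 / 500
print(f"BABIP: {in_play_average:.3f}")  # 145 / 420
```

A high BABIP on its own does not say whether a hitter was fortunate or simply hit the ball hard. It has to be read next to a contact-quality measure such as line drive rate, which is the comparison made above.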

The skepticism with which saber-oriented writers view certain regular season statistics is informed by precisely the same rationale that prompts all baseball fans to view spring training statistics with skepticism. The phrase ‘New School’ as applied to sabermetrics, then, isn’t totally accurate: it’s merely the Old School’s skepticism combined with the baseball-tracking technologies that have arisen in the last decade – the 21st-century Old School.

John can be reached on Twitter at @JHGrimm. He can also be reached by e-mail at

User Comments

BG ohio
March 14, 2014 - 11:31 PM EDT
all these number are Greek to me but baseball is as American as Pi,
March 12, 2014 - 12:21 PM EDT
interesting article, but I had hoped that it would shine some light on struggling starters like Bourn or the competition for the 5th spot in the rotation.
