Thursday, July 21, 2011

Michael Humphreys' Wizardry: Imperfect But Critical Analytics

Errors using inadequate data
are much less than those using no data at all.

-- Charles "The Dudmaston Devastator" Babbage

Social systems are determined
by technological systems

-- Leslie A. White

In business as in Baseball, technology triggers innovation, which affects comparative
advantage. The spread of relatively inexpensive business analytics tools in the late
1990s produced an immense cadre of people with various combined levels of skill, of
insight, and of training to attack the kinds of large-dataset problems that have much to
yield to such technology. Business, as always, lagged behind Baseball
(Government did too, but, as usual, not quite as much), because contemporary
corporate governance almost always follows financial aspirations, while
Baseball's aspiration portfolio is broader.

Because Baseball is intrinsically far more accountable than the corporate
workplace, it has embraced more advanced accountability engines, such as the
video capture technologies, only recently deployable, that can identify
events such as true balls and strikes; the exact trajectory, speed and rotation
of a pitched or batted ball; and the distance and vector paths a fielder takes to
get to a batted ball.

The majority of the data loaded and analysed from these systems has decoded
pitching for umpires, coaches and pitchers themselves. But the most important
data loaded and analysed from these systems has been that aimed at decoding fielding.
That's because judging fielding has been Baseball's most gaping
knowledge lag. There are good data to have a glimpse at the value of batters
and, to a lesser degree, pitchers, but all pre-contemporary high-tech analysis
to judge fielding has been interesting but under-infused with hard data.

So the arrival of that technology has been good. But there's a sad,
Internet-inspired trend in the background that dulls the innovative
advantage of this magnificent innovation. The data needs to be proprietary, because
the Internet has enabled people who don't respect intellectual property (the
idea that inventors deserve compensation for their inventions, creators deserve
compensation for their creations) to expropriate public instances of others'
private property for their own private profit.

"Content wants to be free" is the mantra of the non-creative
free-market types who want to reap the creations of creative people at their own
whim for their own profit. It's a perfectly parasitical paradigm, perniciously
peddled by pseudo-intellectual free-market rent-boys like Lawrence Lessig. In a
society that values money > creativity, creativity will gravitate towards
serving the purpose of money so people with money but no creativity will buy
creation while people with creativity will tend to constrain their focus
to serve uncreative people with money.

This twin-killing has made it very difficult to achieve much with the
Business Analytics tools our technological innovators have made possible, in
part because of the ubiquity of the Internet intellectual-property-theft tools our
technological innovators have made possible. Beyond Baseball, probably in your
own organization, innovation and mission advancement are stunted by the same

If the companies that invested in the high-tech creations that have brought
so much actionable information to baseball were not very protective of their
data, they would be rich in insight and poor in money, with no chance of earning
back their investment. So all this wonderful "batted ball" data that
decodes fielding skill and enables baseball teams to make better, more informed
decisions is kept proprietary, not shared with the vast cadre of analysts I
described earlier. And so fewer informed ideas get tested, vetted, argued for
and against -- that is, refined with the scientific method.


Into this scientific gap plunged Michael Humphreys, with an attempt to see what
could be synthesised using only publicly-available data, and using what
fragments of the proprietary data had been publicly-shared to "test"

The result was a few years of peer-review and dialogue that culminated in a
book, Wizardry: Baseball's All-Time Great Fielders Revealed (Oxford University
Press, 2011, New York). In it, Humphreys devises a system that approximates the
knowledge that could be uncovered using proprietary systems.

It's a most noble effort, one I found flawed in some ways, but one that I believe
achieves its very useful mission: putting ideas that benefit from the scientific
and analytic method into a public dialogue. The book, therefore, is not an end in
itself, but a means towards that end, an end that's very difficult to achieve
in our finance-led society, which gravitates in the other direction.

Wizardry has two parts. Part I is a detailed, open-book description of
Humphreys' analytic methods (which I like much for its insight and openness).
Part II is a position-by-position and "era"-by-"era"
application of the methods to name names and build stacked lists of bests and
worsts (which I didn't like much).

On the eccentric side, he proposes pitchers' fielding be credited with
infield pop-outs and shallow outfield flies with long hang time, rather than any
individual fielders' numbers, as he views them as automatic outs and not really
something with which you should credit an individual fielder or the rest of the
team. Unless I mis-read his intent, I suspect this should be credited to the
pitcher's pitching instead (like a strikeout is credited to his pitching and not
to the catcher's or the pitcher's fielding). But it's an interesting and
thought-provoking assertion.

And in tribute to the now-widely accepted but laughably wrong "Wisdom
of Crowds" cult, he proposes at one point that the only way to posit
one aspect of outfield defense is to take two existing obviously-flawed systems
and make a simple average between the two. Yikes... that's like suggesting
that averaging the coordinates of two pitches called balls is the best way to
determine what a strike is.

Disputes like this aside, he's made his analysis something others can build
on by making it an open-systems effort and basing it on publicly-available data.
It doesn't need to be perfect; it needs to be sufficient and something good
enough that others can build on it, and Humphreys' work is both.


I want to encourage you, if you are an analyst or have any management influence over
analysis departments, to grab a copy of Wizardry and read it for ideas for
your own efforts.

First, absorb how he spread his ideas around to different people with very
different points of view and used their critiques to synthesise refinements to
his own system. In your own shop, that could mean circulating the answers and
questions they engender to other departments with very different kinds of
insights or could mean combining with other organizations that are not direct
competitors to synthesise your mutual wisdom. It's not fully open source (though
going fully open source is a strategy that was at least as effective as secured
analysis for the Oakland A's Moneyball strategy), but it pushes the energy
in that direction, toward the comparative gains that has to offer.

Second, see how you can use available non-proprietary data to blend in with
your own, the way Humphreys has. Analysts, I've found, too often restrict their
span to their own perimeters of collected data.

Third, try posing naïve questions, Paul DePodesta style. Play around with your
questions in eccentric (not out of this universe) ways, parallel to Humphreys'
crediting pitchers' fielding with pop-up outs.


In Baseball, at least, there are solutions to finding a workable balance. Don
Malcolm has some ideas, hinted at here and to be described in full at some later
date, I hope.

Beyond Baseball, as long as we as a society find money more worthy than
anything else, the Internet and skilled lobbyists for Red China's industrial
plutocrats ("we mustn't offend them, and free markets demand we respect their
needs, and it makes everything so cheap to buy, so it feels good") make it seem
inevitable that innovation and creativity must serve as the midwife to the
uncreativity of
finance. I'm pretty sure it's not inevitable, but it does require people working
and thinking and acting like Humphreys, as well as being willing to pay
innovators and creators instead of cheap idea cloners and purveyors of cheap
toxic crap.

It just takes will and enforcing accountability, something Baseball does
every minute.

