Statheads getting out of control


Fenway
06-05-2010, 03:02 PM
BOSOX@APPLE.EASE.LSOFT.COM

A warning now: hit "next" if you aren't interested in abstruse
mathematics laden with simplifying assumptions. :) Just thought I'd
lay out a few formulas that might be useful as a "first order"
approximation.

I was looking for a framework from which to estimate the playoff odds
in the simplest possible situation: a race for a single playoff spot
between two teams of comparable quality without head-to-head games to
confuse the situation. Assume that the performance of each team in
each game is independently binomially distributed.

What are the odds of a team that is trailing by X games making up the
difference if there are N games remaining in the season?

The variance of a binomial distribution is calculated by n * p * (1-p)
where n is the number of trials/games and p is the probability of
success/winning. Because the product p * (1-p) varies only slightly
in the range we are discussing (teams between .500 and .600), we can
approximate this as being n * .5 * .5 = n/4. With 106 games left, the
variance in the Red Sox final win total is approximately 106/4 = 26.5.

When calculating the sum or difference of variables (i.e. comparing
two teams over the remainder of the season) the variances add. Thus
the variance in the number of games the Rays win compared to the
number of games the Red Sox win is roughly 26.5 + 26.75 = 53.25 (the
26.75 being 107/4 for the Rays). The standard deviation is the square
root of the variance, or ~7.3 games in the standings.

The Rays are presently 4.5 games ahead of the Red Sox. Thus the Red
Sox only catch the Rays if they are 4.5 games better over the
remainder of the season. Now I'm assuming that the Red Sox and Rays
are of equal quality, thus that would be 4.5 games better than the
expected mean. Dividing by the standard deviation, we see this is
+0.6 standard deviations.

Looking that up in a Z-score table, we see that the Red Sox would fall
short 72.6% of the time while catching the Rays 27.4% of the time.
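
For anyone who would rather check the table lookup than trust it, the
steps above can be sketched in a few lines of Python. This is only a
rough sketch under the same simplifying assumptions (evenly matched
teams, independent games, normal approximation to the binomial). The
function name and the "edge" parameter are made up for illustration;
the 106 and 107 games remaining are the counts implied by the 26.5 and
26.75 variances above.

```python
import math

def catch_up_probability(deficit, games_left_trailer, games_left_leader, edge=0.0):
    """Normal-approximation chance that the trailing team erases `deficit`.

    `edge` is how many games better the trailing team is expected to be
    over the rest of the season (0 means the teams are evenly matched).
    """
    # Each team's remaining win total has variance of about n * .5 * .5 = n / 4,
    # and the variances add when the two totals are compared.
    variance = games_left_trailer / 4 + games_left_leader / 4
    sd = math.sqrt(variance)
    # How many standard deviations above expectation the trailer must finish.
    z = (deficit - edge) / sd
    # Upper tail of the standard normal, 1 - Phi(z), via the error function.
    return 0.5 * math.erfc(z / math.sqrt(2))

if __name__ == "__main__":
    # 4.5 games back, 106 and 107 games remaining: about 0.27
    print(round(catch_up_probability(4.5, 106, 107), 3))
```

The unrounded Z-score is closer to 0.62 than 0.6, so the function
returns a shade under 27% rather than the 27.4% read off the table.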

Some thoughts...
* The head-to-head games clearly break the assumption of independence.
A swing in THOSE games in effect counts twice.

* The Red Sox were thought to be the better team in March. If their
expectation were two games better over the remainder of the season,
then they would only need to be 2.5 games above expectation to catch
the Rays. That would improve the Red Sox chances of catching the Rays
to 37%.

* Because the standard deviation scales with the square root of the
games remaining, it will shrink only slowly in the coming months. With
64 games left to play (last week of July?), the variance for a single
team would be 16 games and for a pair of teams compared would be 32
games. That would shrink the standard deviation from its present 7.3
games to "just" 5.7 games. With one month left of play, the standard
deviation when comparing two teams is still 4 games. This is why
old-time baseball fans will continually remind you, "there's a lot of
baseball left".
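
The slow shrinkage in that last point is easy to tabulate. A minimal
sketch, assuming both teams have the same number of games left and
taking "one month left" to mean roughly 30 games (my guess, not a
schedule lookup); the function name is hypothetical.

```python
import math

def pair_sd(games_left):
    """Standard deviation of the win difference between two .500-ish teams.

    Each team contributes a variance of games_left / 4, and the two add.
    """
    return math.sqrt(2 * games_left / 4)

if __name__ == "__main__":
    # 106 games is roughly the present situation, 64 is late July,
    # and 30 stands in for "one month left".
    for n in (106, 64, 30):
        print(f"{n:3d} games left: sd = {pair_sd(n):.1f} games")
```

Cutting the schedule from 106 games to 30 removes over 70% of the
remaining season but only shaves the standard deviation from about
7.3 games to about 3.9.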

In conclusion, I'm liking the Red Sox chances much better these days.
It still won't come as a surprise to me if they fall short, but they
clearly have a realistic shot at the playoffs at this point.

-Ted

getonbckthr
06-05-2010, 03:42 PM
What the **** did I just read and what does it all mean? I have never read something and lacked so much comprehension before. Haha.:D:

Fenway
06-05-2010, 04:24 PM
What the **** did I just read and what does it all mean? I have never read something and lacked so much comprehension before. Haha.:D:

The Bosox list is where some of the founders of Baseball Prospectus first met...


One of the founders of the list now works for the Indians

Keith Woolner, Manager, Baseball Research & Analytics

TommyJohn
06-05-2010, 04:36 PM
BOSOX@APPLE.EASE.LSOFT.COM

In conclusion, I'm liking the Red Sox chances much better these days.
It still won't come as a surprise to me if they fall short, but they
clearly have a realistic shot at the playoffs at this point.

-Ted
He went through all that incomprehensible horse**** just to reach this conclusion???????? All you have to do is say "they are 4.5 games back and there are X number of games left, so they have a shot at catching the Rays." All this other stuff is utter, mindless horsecrap.

This is why I don't like stat-geekiness. They suck the joy out of every damn thing. It doesn't require calculus or quantum physics to figure out something as simple as "if you are 4 games back with about 80 to go, you have a shot at winning the division." Oy vey.

doublem23
06-05-2010, 04:39 PM
This is why I don't like stat-geekiness. They suck the joy out of every damn thing. It doesn't require calculus or quantum physics to figure out something as simple as "if you are 4 games back with about 80 to go, you have a shot at winning the division." Oy vey.

:scratch:

There's no quantum calculus or geometric physics or anything in there. This is BASIC statistics. Anybody with a high school diploma should be able to read and understand what this guy is talking about.

voodoochile
06-05-2010, 05:27 PM
:scratch:

There's no quantum calculus or geometric physics or anything in there. This is BASIC statistics. Anybody with a high school diploma should be able to read and understand what this guy is talking about.

Yeah, but HS stats aren't nearly good enough to predict baseball seasons. Not even close, not by a long shot.

TommyJohn
06-05-2010, 06:30 PM
:scratch:

There's no quantum calculus or geometric physics or anything in there. This is BASIC statistics. Anybody with a high school diploma should be able to read and understand what this guy is talking about.

OK, BASIC statistics. But even those were not required for the guy to reach his BASIC point, which was "the Red Sox have a chance of catching the Rays." No sit.

And I was just exaggerating when I said calculus and quantum physics.

voodoochile
06-05-2010, 06:52 PM
OK, BASIC statistics. But even those were not required for the guy to reach his BASIC point, which was "the Red Sox have a chance of catching the Rays." No sit.

And I was just exaggerating when I said calculus and quantum physics.

You'd actually probably have to get into some aspects of chaos theory to accurately model baseball, IMO. Too many discrete acts control the outcome of the game. Unlike in other team sports, where the teamwork is easy to see and understand how it affects the outcome, baseball relies on a series of individual actions that are completely discrete. Yes, there are moments when teamwork comes into play (relay throws, backing up plays defensively, etc.) but the vast majority of the game is controlled by individual players making individual plays. The ball is put into play on average over 250 times each and every game. That's before you factor in changes in weather, individual playing surfaces, field dimensions and whether the cleanup hitter drank a fifth of Jack and stayed up until 9 AM with the woman he met at the bar.

There are simply too many variables and people who try to break it down by batting averages pretty soon find their predictions have such a wide distribution as to be almost meaningless once you get to any degree of certainty. This guy isn't even trying to use RS/RA which at least tries to make an effort to incorporate actual outcomes to make predictions. He's doing entry level stats and simplifying the situation so far it's not even worth discussing.

I'd bet you'd get better results simply looking at all the teams who have made up a 4.5 game deficit with ~ 4 months to play and calculating the odds.

He's putting that chance at 27+% which is ridiculously low, IMO and he's not even factoring in the other teams Boston has to leapfrog too.

It's ridiculously simplistic and utterly meaningless...

Fenway
06-05-2010, 07:16 PM
He responds to you

Kevin, I don't know what your White Sox friend thinks I was trying to
do, but given what I *was* trying to do, his comments are ridiculously
nonsensical. If I'm trying to understand what portion of the calculated
odds stem from the position in the standings and the point in the
season, then why the heck would I be interested in RS/RA?!?

So please give him my disdain in return.

-Ted


Fenway
06-05-2010, 07:26 PM
you struck a nerve LOL

To comment specifically...

On 6/5/2010 6:59 PM, Fenway wrote:
> You'd actually probably have to get into some aspects of chaos theory
> to accurately model baseball, IMO.

So if we can't accurately model it, we shouldn't even try? There are
some pretty decent models out there (the Coolstandings model isn't
bad). Just because they aren't perfect doesn't mean they are worthless.

> There are simply too many variables and people who try to break it
> down by batting averages pretty soon find their predictions have such
> a wide distribution as to be almost meaningless once you get to any
> degree of certainty.

Batting averages? Say, what?!?

> This guy isn't even trying to use RS/RA which at least tries to make
> an effort to incorporate actual outcomes to make predictions. He's
> doing entry level stats and simplifying the situation so far it's not
> even worth discussing.

No, I wasn't interested in coming up with a general model. I was
attempting to answer a much more specific question -- if two teams are
evenly matched but one starts "hot" and the other starts "cold", how
much does that shift the odds?

And I firmly maintain that the standard deviation in outcome for the
season should be at LEAST the standard deviation of the "overly
simplistic" binomial model, for the simple reason that independence is
violated. Thus the overly simplistic model can serve as a lower bound.
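
For the head-to-head games in particular, the lower-bound claim can be
put in numbers. This is a hedged sketch rather than a full model: it
assumes evenly matched teams, and the 12 head-to-head games in the
example are an invented figure, not the actual remaining schedule. A
head-to-head game moves the win difference by one game either way, so
it carries a variance of 1, whereas the two independent games it
replaces would have carried only 1/4 + 1/4.

```python
import math

def diff_sd(games_a, games_b, head_to_head=0):
    """SD of the win difference when some remaining games are head-to-head."""
    # Games against other opponents: independent, variance 1/4 apiece.
    independent = (games_a - head_to_head) + (games_b - head_to_head)
    # Each head-to-head game swings the difference +1 or -1: variance 1.
    variance = independent / 4 + head_to_head * 1.0
    return math.sqrt(variance)

if __name__ == "__main__":
    print(round(diff_sd(106, 107), 2))                    # no head-to-head games
    print(round(diff_sd(106, 107, head_to_head=12), 2))   # hypothetical 12 games
```

Under these assumptions any head-to-head games push the standard
deviation above the purely independent figure, which is the sense in
which the simple binomial number acts as a floor.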

> I'd bet you'd get better results simply looking at all the teams who
> have made up a 4.5 game deficit with ~ 4 months to play and
> calculating the odds.

Coolstandings incorporates a bit of this into their model, which is why
they project greater uncertainty than the "purely statistical" models
that Baseball Prospectus uses.


> He's putting that chance at 27+% which is ridiculously low, IMO and
> he's not even factoring in the other teams Boston has to leapfrog too.

Uh, no. For what I was calculating, I'm pretty certain that number was
accurate. It wasn't meant as the probability that the Red Sox make the
playoffs. If your friend can find a flaw in my mathematics, he is
welcome to point it out. If not, then he's pissing in the wind.


> It's ridiculously simplistic and utterly meaningless...

I.e. your friend can't find the answer to HIS question in it, so didn't
bother to consider whether or not it has some other meaning.

-Ted


Daver
06-05-2010, 07:35 PM
Tell Ted to enjoy his sessions of mental masturbation, those of us that are firmly rooted in reality will simply watch the games, because they aren't played on paper.

voodoochile
06-05-2010, 07:43 PM
Ted's a hoot...

His math is just fine, his assumptions are so deeply flawed it's not worth discussing. GIGO...

doublem23
06-05-2010, 11:04 PM
Tell Ted to enjoy his sessions of mental masturbation, those of us that are firmly rooted in reality will simply watch the games, because they aren't played on paper.

I don't understand why those can't be enjoyed together. I always thought baseball was "the thinking man's game." Now apparently, you can't enjoy a game without being a dumb ass troglodyte who cannot comprehend math that just about every person should understand.

I'll admit the OP's friend is probably some douchey nerd who just likes hearing himself talk. His rambling post could have been chopped down to roughly 3 sentences. But trust me, guys, it is possible to enjoy both the actual art of baseball as it's played and the science of the numbers that play out behind the game. You really don't have to be all one or the other.

voodoochile
06-05-2010, 11:13 PM
I don't understand why those can't be enjoyed together. I always thought baseball was "the thinking man's game." Now apparently, you can't enjoy a game without being a dumb ass troglodyte who cannot comprehend math that just about every person should understand.

I'll admit the OP's friend is probably some douchey nerd who just likes hearing himself talk. His rambling post could have been chopped down to roughly 3 sentences. But trust me, guys, it is possible to enjoy both the actual art of baseball as it's played and the science of the numbers that play out behind the game. You really don't have to be all one or the other.

That's fine, but this isn't any kind of serious numerical analysis. It adds absolutely nothing to the discussion. The guy even admits his entire premise is flawed because he doesn't account for H2H play (and can't be bothered to figure out how it influences his numbers). He doesn't even use the Rays' and Red Sox' current winning percentages; he just uses .500, calls the whole thing an approximation, then gets pissy when people call him on what is some of the lamest stats work I've ever seen, all to get a probability curve that has a 21+ game range to get anything close to a 99% confidence interval.

I repeat... GIGO...

Gavin
06-05-2010, 11:38 PM
If only I knew that AP Stats was mental masturbation.

I could have taken shop instead. :(

voodoochile
06-05-2010, 11:43 PM
If only I knew that AP Stats was mental masturbation.

I could have taken shop instead. :(

It isn't... as a stepping stone, but it's not good enough to do what this guy's trying to do with it...

Craig Grebeck
06-05-2010, 11:54 PM
It isn't... as a stepping stone, but it's not good enough to do what this guy's trying to do with it...
Bull****. You'll deride anything like this because it doesn't jive with your own aesthetic. Go ahead and deride sabermetrics because you think it's bull****, but it's not going away.

voodoochile
06-06-2010, 12:01 AM
Bull****. You'll deride anything like this because it doesn't jive with your own aesthetic. Go ahead and deride sabermetrics because you think it's bull****, but it's not going away.

What the **** are you talking about. This crap is about as far away from sabermetrics as you can get and still call it stats. This is entry level stuff that adds very little to the conversation if anything at all. It's got flawed premises and rounds everything to super simple numbers and admits as much. If someone from one of the SM sites passed this crap off as valid, I'd hammer them too.

I think there are some interesting new stats coming out - for example the BABIP stuff looks promising, IMO, but that doesn't mean that every single statistical analysis is good, valid and true.

I'd think you'd know by now that I use stats too, just don't think they are as useful as others do, but go ahead, fly off the handle, throw a bunch of swear words around, don't get the point I'm trying to make and then accuse me of having an agenda... well done...

Daver
06-06-2010, 05:29 PM
Bull****. You'll deride anything like this because it doesn't jive with your own aesthetic. Go ahead and deride sabermetrics because you think it's bull****, but it's not going away.

I don't think it's crap, I know it's crap, but you're right, it's not going away, there will always be people that think they can break down anything, including human error, into pure numbers. The whole concept is laughable, but propellerheads seem to enjoy it.

october23sp
06-06-2010, 05:32 PM
I don't think it's crap, I know it's crap, but you're right, it's not going away, there will always be people that think they can break down anything, including human error, into pure numbers. The whole concept is laughable, but propellerheads seem to enjoy it.

This. Just go out and play.

Gavin
06-06-2010, 05:34 PM
I don't think it's crap, I know it's crap, but you're right, it's not going away, there will always be people that think they can break down anything, including human error, into pure numbers. The whole concept is laughable, but propellerheads seem to enjoy it.

What is a pure number?

Seriously, do you just hate the subject of math or something? Are you threatened (http://blogs.sfweekly.com/thesnitch/donald-gibb-ogre-revenge-of-the-nerds.jpg) by the fact that people with calculators and spreadsheets are encroaching on your sport? Does anything qualify your opinion aside from the fact that you are an internet messageboard moderator?

Daver
06-06-2010, 05:43 PM
Does anything qualify your opinion aside from the fact that you are an internet messageboard moderator?

Nope.

I know nothing about baseball.

jabrch
06-06-2010, 05:51 PM
Holy simplifications...

The variance in the range of .500 to .600 is HUGE. Also the Z-Score table is nearly irrelevant because of all the human factors not incorporated into it.

This is nothing more than forcing numbers into formulas and then poorly interpreting the outcomes.

I love the use of statistics in baseball. This is not use - it is misuse and abuse. It is not truly predictive of the odds of Boston coming back.

Gavin
06-06-2010, 06:19 PM
Holy simplifications...

The variance in the range of .500 to .600 is HUGE. Also the Z-Score table is nearly irrelevant because of all the human factors not incorporated into it.

This is nothing more than forcing numbers into formulas and then poorly interpreting the outcomes.

I love the use of statistics in baseball. This is not use - it is misuse and abuse. It is not truly predictive of the odds of Boston coming back.

It's one thing to denounce a poor application of statistics/math in baseball and it's quite another to denounce the whole idea of statistics/math in baseball.

Right size, wrong shape, all around.

TheOldRoman
06-06-2010, 06:52 PM
I don't understand why those can't be enjoyed together. I always thought baseball was "the thinking man's game." Now apparently, you can't enjoy a game without being a dumb ass troglodyte who cannot comprehend math that just about every person should understand

:clap: Excellent. People disagreeing with you are only in disagreement because they aren't intelligent enough to understand your opinion. Bravo.

Gavin
06-06-2010, 07:14 PM
:clap: Excellent. People disagreeing with you are only in disagreement because they aren't intelligent enough to understand your opinion. Bravo.

Once again.. mockery: the best way to assert and defend yourself in the absence of a logical explanation of yourself.

Daver
06-06-2010, 07:23 PM
Once again.. mockery: the best way to assert and defend yourself in the absence of a logical explanation of yourself.

Who is mocking whom?

How is this for logic, you can't quantify human error, and everything in baseball is subject to human interpretation, even things as simple as balls and strikes.

Gavin
06-06-2010, 07:31 PM
Who is mocking whom?

How is this for logic, you can't quantify human error, and everything in baseball is subject to human interpretation, even things as simple as balls and strikes.

You can quantify anything.. interpretation is more subjective.

If you don't want to accept someone's interpretation, you would make a better point to say why they are wrong instead of simply saying they are wrong for even attempting.

Daver
06-06-2010, 07:54 PM
You can quantify anything.. interpretation is more subjective.

If you don't want to accept someone's interpretation, you would make a better point to say why they are wrong instead of simply saying they are wrong for even attempting.

What formula does one use to quantify an error?

jabrch
06-06-2010, 07:56 PM
It's one thing to denounce a poor application of statistics/math in baseball and it's quite another to denounce the whole idea of statistics/math in baseball.

Right size, wrong shape, all around.

Are you ****ing serious? Read my post and show me where I "denounced the whole idea of statistics/math in baseball"...

You can't be ****ing serious...

I'm all for good use of statistics...to tell me what happened. I am a firm believer, and this is from someone with sufficient graduate level statistics to have an informed opinion, that there is very little predictive value to statistics as is shown by the crap that was posted in this thread. There is some...but very little. A good qualitative analyst is likely just as accurate as a good quantitative analyst when it comes to baseball. And unfortunately, there is no technical vehicle available to combine the two disciplines.

Again - show me where I denounced the whole idea of statistics/math in baseball...Great historical value comes from the use of statistics. Much less predictive value comes from them. Care to have a discussion about that without mischaracterizing my position? Cool. Want to go back to being snide and nasty and misrepresenting my position? That's fine. Then I got a deal for you....ignore me - I'll reciprocate.

Oblong
06-06-2010, 09:16 PM
This is a perfect example of taking something simple and making it complicated for no reason other than to show off.

Craig Grebeck
06-06-2010, 09:47 PM
This is a perfect example of taking something simple and making it complicated for no reason other than to show off.
Is it really that complicated?

Johnny Mostil
06-06-2010, 10:16 PM
Is it really that complicated?

No, but I go back to doublem's point: the original point Fenway quoted could have been made in perhaps three sentences. (And, for the record, I'm sympathetic to sabermetrics.)

jabrch
06-06-2010, 11:27 PM
This is a perfect example of taking something simple and making it complicated for no reason other than to show off.

I don't mind complicated - if you have modeled reality. Just using modeling tools that don't represent reality is ignorant.

voodoochile
06-06-2010, 11:39 PM
I don't mind complicated - if you have modeled reality. Just using modeling tools that don't represent reality is ignorant.

GIGO...

jabrch
06-07-2010, 12:14 AM
GIGO...


That's one issue...but even if they have good data going in, if they use a crappy model, all they will model is crap.

voodoochile
06-07-2010, 12:19 AM
That's one issue...but even if they have good data going in, if they use a crappy model, all they will model is crap.

Crappy model, crappy data... it's all G of one form or another...

kittle42
06-07-2010, 01:15 AM
Without making one comment on whether or not I care about statistical analysis in baseball...

Anyone who thinks this is complicated or rocket science is just silly.

TommyJohn
06-07-2010, 09:06 AM
Without making one comment on whether or not I care about statistical analysis in baseball...

Anyone who thinks this is complicated or rocket science is just silly.

You and other professors like Grebeck can sneer all you want to, but my point remains the same: he took a simple premise of "the Red Sox are 4.5 games back with 90-odd games left, so they have a chance to catch the Rays" and took 5-6 paragraphs to say it by tossing in a bunch of needless statistical mumbo jumbo. That, to me, is what is plain silly.

Oblong
06-07-2010, 09:06 AM
No, but I go back to doublem's point: the original point Fenway quoted could have been made in perhaps three sentences. (And, for the record, I'm sympathetic to sabermetrics.)

Yes, that's what I meant. I just have an aversion to unnecessary fluff. I believe in brevity. I understand some people can't help it but others do it just to make what they say sound more impressive. I'm also sympathetic to it and stuff like this doesn't help win any converts.

kittle42
06-07-2010, 12:30 PM
Yes, that's what I meant. I just have an aversion to unnecessary fluff. I believe in brevity. I understand some people can't help it but others do it just to make what they say sound more impressive. I'm also sympathetic to it and stuff like this doesn't help win any converts.

Sounds like the perfect argument for media sound bites. Brilliant!

voodoochile
06-07-2010, 12:58 PM
You and other professors like Grebeck can sneer all you want to, but my point remains the same: he took a simple premise of "the Red Sox are 4.5 games back with 90-odd games left, so they have a chance to catch the Rays" and took 5-6 paragraphs to say it by tossing in a bunch of needless statistical mumbo jumbo. That, to me, is what is plain silly.


The fact he tried to quantify that "chance" using entry level statistical analysis makes it all the more worthless. Ignoring things like actual current winning percentage, head to head matchups, recent trends, winning percentage of opponents remaining and the fact there are another 2 teams between the Rays and Bosox doesn't help either.

Then he got huffy when his "deep thoughts" got hammered as the garbage they are. If you're going to get emotional when people start ripping on your scientific method, methinks you don't understand what the scientific method is all about and you shouldn't be using anything close to it.

This is just bad stats, period.