punter Posted June 18, 2013

Hi. Below is a sample of a type of problem I'm struggling with. I'd be enormously grateful for a step-by-step worked solution to it in simple terms. I'm embarrassed to admit that I'm Bayesianly challenged.

During a free trial period, an Internet horse race tipster has given out 100 tips as to the winner in each of the same number of races. 25% of these won, and the associated starting odds were high enough to enable his followers to turn a profit. Based on his success, he now decides to charge for the service. Over his next 60 predictions his success rate drops to 10% winners, and his customers now lose. At this point, how do they calculate the a posteriori probability of this decrease in successful predictions? They would want to do this in order to have a quantitative basis (presumably a significance level) for deciding whether to continue with the service.

An aside: I've read that the good Reverend developed his theorem to calculate horse race probabilities.

Many thanks in advance, Jim
Popcorn Sutton Posted June 19, 2013

You count the number of successful attempts and divide it by the total number of attempts for both cases. Subtract the latter from the former and you'll have the change in percentage.
Bignose Posted June 20, 2013 (edited)

Let x = the true percentage of picking the correct horse. x = 0 would mean that the race is picked incorrectly every time; x = 1 would mean that the race is picked correctly every time. x will thus be bounded between 0 and 1. What you are interested in is: what is the probability that x takes a certain value, given the data and your initial guess? Let me denote the set of data as {data} and your initial or a priori distribution by I. In equation form, Bayes' Theorem can be written thus:

[math] P(x|\{data\},I) \propto P(\{data\}|x,I) P(x|I) [/math]

There is a denominator term P({data}|I), but it does not depend on x, and really just serves to normalize the distribution on the LHS so that its integral over all possibilities is exactly 1.0.

As for the a priori distribution, P(x|I), you've got some leeway. I would suggest you start with P(x|I) = 1, that is, all possibilities are equally likely. In short, you don't trust any prior info.

Now, since you are only interested in correct or incorrect, the distribution P({data}|x,I) can be represented by the binomial distribution:

[math] P(\{data\}|x,I) \propto x^R (1-x)^{N-R} [/math]

where N is the total number of races and R is the number of those that were predicted correctly.

So, putting this all together: after 100 races, the probability that the 'true' prediction rate is a certain value will be given by

[math] P_{100}(x) = C_{100}x^{25}(1-x)^{75} [/math]

(where C_{100} is again the normalization). If you plot this, it will be very peaked around x = 0.25, because after 100 races with 25 of them predicted correctly, you've got a fair amount of evidence that 25% is close to his true rate. But as data keeps coming in -- your next 60 races -- the distribution of his true prediction probability becomes

[math] P_{160}(x) = C_{160}x^{31}(1-x)^{129} [/math]

(again, C_{160} is a normalization factor).
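(A minimal numerical sketch of the above in Python -- my own addition, not part of the original post; the function name is made up. With a flat prior, the posterior is just the binomial likelihood, and scanning it on a grid recovers the peak at R/N:)

```python
def unnormalised_posterior(x, n, r):
    """Binomial likelihood x^r * (1-x)^(n-r) under a flat prior P(x|I) = 1."""
    return x**r * (1 - x)**(n - r)

# After 100 races with 25 predicted correctly, scan a grid of candidate
# values of x and find where the posterior peaks.
xs = [i / 1000 for i in range(1, 1000)]
peak = max(xs, key=lambda x: unnormalised_posterior(x, 100, 25))
print(peak)  # 0.25, i.e. the mode sits at R/N
```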
This distribution will be even more peaked, at a value less than 0.25 (0.19375 = 31/160, to be exact). That is, as more and more data comes in, the confidence in the value after 100 races is being eroded.

You can also use these formulas to calculate the relative likelihoods of what the 'true' picking percentage is. For example, if you compare x = 0.25 versus x = 0.20, the value at x = 0.20 is more than 4 times higher than the value at x = 0.25. That is, it is 4 times more likely that the picker is truly a 20% correct picker than a 25% picker. And so on.

But note that it isn't like it is impossible for a true 25% picker to have hit a streak where he only gets 10% right. If it is truly random, things like that will happen. If you are a fan of baseball, this is like a true 0.300 hitter having an off week and only going 2 for 26. The guy may still be a 0.300 hitter, but there will just be times when they will be worse, and there will be times when they will go like 13 for 26 in a week. It happens.

This is about the simplest Bayesian analysis there is, and is essentially the 'is this a fair coin?' question. I don't know much about horse racing, but I know that there are bets that are deeper than simply did-this-horse-win-or-not: win, place, show, etc. I don't really know what that means.

You count the number of successful attempts and divide it by the total number of attempts for both cases. Subtract the latter from the former and you'll have the change in percentage.

Popcorn, this really doesn't address the guy's issue at all. This doesn't help answer the question (paraphrasing here): how likely is it that the guy is really a 25% picker but hit a streak of really bad luck, or how likely is it that the guy was really a 10% picker that got super lucky to get to 25%? Using your method really skews the analysis. Let me pose a very similar but different question.
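(As a quick numerical check of the "more than 4 times" claim above -- a sketch in Python, my own addition, not part of the original post. The normalization constant cancels in the ratio, so the unnormalized posterior suffices:)

```python
def unnorm_posterior(x, n, r):
    """Unnormalized posterior x^r * (1-x)^(n-r) under a flat prior."""
    return x**r * (1 - x)**(n - r)

# After all 160 races (31 correct), compare x = 0.20 against x = 0.25.
# C_160 cancels, so the unnormalized values give the same ratio.
ratio = unnorm_posterior(0.20, 160, 31) / unnorm_posterior(0.25, 160, 31)
print(round(ratio, 2))  # about 4.09: just over 4 times more likely
```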
A casino runs a roulette wheel with only red and black spaces (no 0 or 00 spaces), an equal number of both, and you can bet on red or black. Watching for 24 hours, you record 487 reds and 513 blacks: 48.7% red, or really close to the 50:50 you'd expect if it was fair. What if you watched for an hour, and you saw 56 reds and 44 blacks, or 56.0% red over that hour? Do you really think that the wheel changed by 7.3% just over an hour? Or was it just that luck saw a little more reds than blacks that hour? That's why one has to be very, very careful doing statistical analysis. Just subtracting one frequency from another is rarely a meaningful answer. Edited June 20, 2013 by Bignose
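(A sketch of why the one-hour result is unremarkable -- my own addition, not part of the original post. For a fair wheel, the number of reds in n spins is Binomial(n, 0.5), so the natural yardstick is the binomial standard deviation sqrt(n * 0.5 * 0.5):)

```python
import math

def sd_reds(n):
    """Standard deviation of the red count over n spins of a fair wheel."""
    return math.sqrt(n * 0.5 * 0.5)

# 56 reds in 100 spins: expected 50, standard deviation sqrt(25) = 5.
# That is only 1.2 standard deviations out -- well within ordinary luck.
z = (56 - 50) / sd_reds(100)
print(round(z, 1))  # 1.2
```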
Popcorn Sutton Posted June 20, 2013 Posted June 20, 2013 Your right, sorry, I'm used to calculating probabilities for now and whatever is within the closest proximity. I would, however, like to distinguish a clear difference between probability and likelihood, and in this case, it would seem to me that we are calculating likelihood rather than probability. Likelihood requires that someone knows reward and punishment odds on top of probability.
Bignose Posted June 20, 2013 Posted June 20, 2013 Likelihood requires that someone knows reward and punishment odds on top of probability.I don't really know why you'd say something like this. Likelihood and probability are used pretty interchangeably in a lot of the math literature. e.g. the probability distribution I wrote above P({data}|x,I) is often called the 'likelihood function'. No mention of 'reward' or 'punishment' there. See Sivia's Data Analysis A Bayesian Tutorial
Popcorn Sutton Posted June 20, 2013 Posted June 20, 2013 I've made it a point in my professional career in science to make likeability a function distinct from probability. It usually happens after the fact that something occurs, but it does seem to have significance in defining the future behavior of an agent. In AI, it's probably in our best interest to incorporate likeability for the sole purpose of avoiding future dilemmas such as putting a nuclear weapon in the hands of someone irresponsible, and other clearly analogous behavior. -3
Bignose Posted June 20, 2013 Posted June 20, 2013 (edited) I've made it a point in my professional career in science to make likeability a function distinct from probability.likeability... adj. easy to like; pleasing http://dictionary.reference.com/browse/likeability Besides the fact that you used the wrong word, I don't really know how your post there answers my question about why you think likelihood and probability are different, and where reward and punishment come in. You're drifting much more into game theory here than probability. They are related topics, but not really applicable to the question at hand. Also, if you are going to cite your 'professional career in science', please cite some published works to back it up. I did. Edited June 20, 2013 by Bignose
Popcorn Sutton Posted June 20, 2013 Posted June 20, 2013 I guess if you put it that way, I don't have a professional career in science. I wish I did though. He did suggest that the math is being calculated after the fact that the predictions are adjusted.