Ken Fabian Posted April 7, 2017

If there is statistical evidence that the draw is not getting random results, that may be circumstantial evidence of fraud or faulty methodology - which may be legally actionable against the operators of the lottery. Unless some kind of insider knowledge is involved, those who won under those arrangements should be immune from legal action. The statistical evidence will not be proof by itself; rather, it would be cause to investigate, and a small number of small draws may not give a large enough sample to apply statistics to unless the fraud or fault is major.

The only proven method of winning with genuine "fair" lotteries that I am aware of is with accumulating-jackpot lotteries that end up with prize money exceeding the cost of all the tickets, by attempting to buy them all (or as close to all as possible). I'm not sure, but I think the rules often work to prevent that happening, by preventing mass purchases at the scale that requires and by allocating tickets to over-the-counter sales that can't easily be bought in bulk. I have heard of syndicates set up to take advantage of large jackpots in jurisdictions with rules that enable it.
Delta1212 Posted April 7, 2017

The problem, of course, is that often with jackpots that large, you get multiple winners and wind up having to split the pot, and that can very easily take it from profit to major loss unless the jackpot is worth the ticket value several times over.
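To put rough numbers on that, here is a small R sketch. The figures and the Poisson model for the number of rival winners are purely illustrative assumptions, not anything from this thread:

# Expected value of buying every ticket when the jackpot may be shared.
# All figures and the Poisson model for rival winners are made up for illustration.
jackpot      <- 10e6     # advertised jackpot
cost_all     <- 3e6      # cost of buying every combination
other_lambda <- 1.5      # assumed mean number of other winning tickets

k <- 0:50                                                  # possible numbers of rival winners
expected_share  <- sum(dpois(k, other_lambda) * jackpot / (k + 1))
expected_profit <- expected_share - cost_all
expected_share; expected_profit                            # share is only about half the jackpot here

Even with a jackpot worth more than three times the ticket outlay in this made-up case, allowing for sharing cuts the expected share to roughly half the advertised figure, which is the effect being described.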
Ken Fabian Posted April 7, 2017

I think that depends on the specific lottery. The NSW government-authorised jackpot lotteries popular here are won by single tickets under a two-draw system: the jackpot ticket number is drawn from all tickets (200,000 at $5.50 per ticket or 270,000 at $2.20, for the two common offerings, if I recall correctly), but it only wins if that ticket was also a winner of an ordinary prize (1 in 17 or 1 in 24 wins a prize). If you buy in as part of a syndicate, then yes, you share with other syndicate members, and it won't be cheap to buy every ticket. I don't think it would work under the gaming regulations here, but in theory that would be $AU1.1M at $5.50 a ticket or $AU594K for the $2.20 version. But the jackpot prize has reached $16M for the latter, much more than the cost of all the tickets.
Prometheus Posted April 7, 2017

I ran a quick analysis on some data from the lottery in question.

Data obtained from https://www.palottery.state.pa.us/Draw-Games/PICK-2.aspx
Dates searched: 01/01/15 - 05/04/17, both draws a day, representing all historical draws to the date of analysis.

If the game is fair, every number has an equal chance of being drawn each draw, so we expect the distribution to be uniform over the range of numbers. Here's a quick visual check of the draws; in red is the expected number of draws if each number is selected with equal probability. There is quite a bit of variance between the observed and expected counts, but we would be surprised if the observed frequencies matched the expectation very closely.

To quantify this a little, I performed a chi-square goodness-of-fit test under the null hypothesis that each number is selected with equal probability. This gives p = 0.003: the probability of seeing results at least this far from uniform if the null hypothesis is true. This probability is quite low, so we have evidence that the numbers are not being selected with equal probability. Worth checking properly: Raider may be onto something.

I did this in R. Can provide a text file of the data if anyone wants it.
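For anyone who wants to reproduce that kind of check, a minimal sketch of the same idea in R follows. The file name and object names are illustrative assumptions, not Prometheus's actual script:

# Goodness-of-fit check: are all 100 two-digit results equally likely?
# "pick2.txt" is a placeholder for whatever file you saved the draws into.
draws  <- read.table("pick2.txt")[[1]]           # one drawn number (0-99) per line
counts <- table(factor(draws, levels = 0:99))    # include numbers never drawn

barplot(counts)                                  # quick visual check of the draws
abline(h = length(draws) / 100, col = "red")     # expected count if perfectly uniform

chisq.test(counts, p = rep(1/100, 100))          # chi-square goodness-of-fit test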
Raider5678 Posted April 7, 2017

What program did you use to make that graph? I'm using Microsoft Excel and that's killer.
Prometheus Posted April 7, 2017

I used R. It's a popular language among statisticians. I've never used Excel for any analysis; I've heard it's rubbish.
Lord Antares Posted April 7, 2017

"It's fairly easy to test for bias."

No, it isn't. It's easy to suspect bias, but technically impossible to be certain about it.

And to Prometheus: good analysis. Can you list the peak numbers? The graph doesn't define them. He mentioned something about numbers ending in 5 coming up more often.
Raider5678 Posted April 7, 2017

No, of all the numbers ending in 5, only certain ones repeatedly showed up. Numbers like 85, 25 and 35 never showed up, while 15, 55 and 05 showed up numerous times in a row.
Delta1212 Posted April 7, 2017

I made a couple of notes on the train yesterday and also thought that something looked a bit wonky, but didn't want to comment yet because I hadn't done anything more than back-of-the-envelope math without an envelope, and so didn't feel robustly confident in that conclusion.

For the 96 draws on record, this was the number of times each digit was drawn in the first position:

0 - 11
1 - 13
2 - 12
3 - 9
4 - 11
5 - 11
6 - 7
7 - 5
8 - 6
9 - 11

And the number of times each digit was drawn in the second position:

0 - 10
1 - 9
2 - 14
3 - 6
4 - 11
5 - 8
6 - 5
7 - 11
8 - 10
9 - 12

And the total number of times each was drawn altogether:

0 - 21
1 - 22
2 - 26
3 - 15
4 - 22
5 - 19
6 - 13
7 - 16
8 - 16
9 - 23

Additionally, there were 22 numbers that were drawn twice, 5 numbers drawn three times, and the number 22 was drawn four times. This leaves 33 numbers that were drawn a single time and 39 numbers that were never drawn. Was going to look into how many repeats you would expect with 96 draws out of a sample of 100 with replacement but forgot by the time I got home.
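The expected number of never-drawn numbers can be worked out directly or by simulation; the R sketch below is just an illustration, not from Delta1212's notes:

# Drawing 96 times from 00-99 with replacement: how many values never appear?
n_draws <- 96
expected_unseen <- 100 * (99/100)^n_draws        # analytic expectation, about 38.1

# Simulation check of the same quantity
set.seed(1)
unseen <- replicate(10000, 100 - length(unique(sample(0:99, n_draws, replace = TRUE))))
mean(unseen)

Roughly 38 of the 100 numbers are expected never to appear in 96 draws, so the 39 never-drawn numbers reported above are right in line with chance.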
Raider5678 Posted April 7, 2017

Picked numbers, in order from most recent to oldest:

42-17-69-22-10-69-45-93-22-15-08-17-29-71-21-19-16-74-31-28-44-62-39-02-15-55-32-50-79-48-44-50-31-24-05-34-00-91-50-39-07-92-43-49-30
31-82-52-87-54-22-97-08-92-93-68-09-58-72-67-36-97-46-48-19-57-29-00-89-55-28-04-46-54-81-15-05-91-13-84-74-86-18-90-60-07-62-91-22-57-47-93-24-13-20

Data set 1 - numbers picked, grouped by tens:

0s: 08-02-05-00-07-08-09-00-04-05-07
10s: 17-10-15-17-19-16-15-19-15-13-18-13
20s: 22-22-29-21-28-24-22-29-28-22-24-20
30s: 31-39-32-31-34-39-30-31-36
40s: 42-45-44-48-44-43-49-46-48-46-47
50s: 55-50-50-50-52-54-58-57-55-57
60s: 69-69-62-68-67-60-62
70s: 71-74-79-72-74
80s: 82-87-89-81-84-86
90s: 91-92-97-92-93-90-91-93

Data set 2 - numbers picked, grouped by final digit:

Ending in 0: 10-50-50-00-50-30-00-90-60-20
Ending in 1: 71-21-31-31-91-31-81-91-91
Ending in 2: 42-22-22-62-02-32-92-82-52-22-92-72-62-22
Ending in 3: 93-43-93-13-93-13
Ending in 4: 74-44-44-24-34-54-04-54-84-74-24
Ending in 5: 45-15-15-55-05-55-15-05
Ending in 6: 16-36-46-46-86
Ending in 7: 17-17-07-87-97-67-97-57-07-57-47
Ending in 8: 08-28-48-08-68-58-48-28-18
Ending in 9: 69-69-29-19-39-79-39-49-09-19-29-89

Data set 3 - gaps between picks (draws since the previous hit), by tens group:

0s: 11,13,11,2,4,12,4,11,4,5,9
10s: 2,3,5,2,4,1,8,40,11,3,4,11
20s: 4,5,4,2,5,14,17,16,4,18,4,2
30s: 19,4,4,6,3,4,5,1,15
40s: 1,6,14,9,1,12,1,19,1,9,18
50s: 25,2,4,7,9,2,8,8,4,4,16
60s: 3,3,16,34,4,25,2
70s: 14,4,11,30,22
80s: 47,2,20,6,5,2
90s: 8,30,4,10,2,1,7,16,6,4,4

Data set 4 - gaps between picks (draws since the previous hit), by final digit:

Ending in 0: 5,23,4,5,2,6,23,16,1,10
Ending in 1: 14,1,4,14,5,8,29,3,10
Ending in 2: 1,3,5,13,2,3,15,5,1,3,3,5,28,2
Ending in 3: 8,35,12,24,13,2
Ending in 4: 18,3,10,3,2,14,22,2,6,1,12
Ending in 5: 7,3,15,1,9,35,6,1
Ending in 6: 17,44,2,10,9
Ending in 7: 2,10,29,8,3,8,2,4,20,4,1
Ending in 8: 11,9,10,23,3,2,6,7,12
Ending in 9: 3,3,7,3,7,6,11,4,13,10,2

Sorry - originally these were all nice little organized colored tables, but at least it's the data. Data set 1 lists all the numbers that were picked, grouped into the 0s, 10s, 20s, 30s, and so on. Data set 2 lists all the numbers grouped by what they end in (0, 1, 2, 3, etc.). Data set 3 shows the gaps between the times numbers from each tens group were picked, and data set 4 shows the gaps between picks grouped by final digit. If you compile all the frequencies, certain groups seem to show up on a regular basis, such as the 20s.
Prometheus Posted April 7, 2017

You need to be careful performing numerous analyses on the same set of data. It's best practice to construct your hypotheses before looking at the data; otherwise you will fall into the trap of looking at the data every which way until you find something out of the ordinary. It's known as torturing the data. I'll do some more analysis on this if I get time.
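As a quick illustration of that multiple-comparisons trap (simulated data only, nothing from the lottery), here is a short R sketch:

# Each individual test on genuinely random draws rejects about 5% of the time,
# but across 20 separate looks at the data, at least one "hit" becomes likely.
set.seed(2)
p_values <- replicate(1000, chisq.test(table(factor(sample(0:9, 200, replace = TRUE),
                                                    levels = 0:9)))$p.value)
mean(p_values < 0.05)    # roughly 0.05 for any single test...
1 - 0.95^20              # ...but about 0.64 that at least one of 20 such tests "hits"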
imatfaal Posted April 7, 2017

I get a chi-squared of 77, which with 99 degrees of freedom is well within the accepted chance of being drawn from the same distribution (i.e. random).
Prometheus Posted April 7, 2017

I got something like a chi-squared statistic of 155 on 99 DF; will check when I get home. What dataset did you use? What distribution did you test against?
Raider5678 Posted April 7, 2017

"I get a chi-squared of 77 which with 99 degrees of freedom is well within accepted chance of being drawn from the same distribution (ie random)"

Um, non-genius here. What does that mean exactly?
imatfaal Posted April 7, 2017

My data are on my PC at work, so I cannot recheck. There were 1602 data points. I tested against an even distribution - i.e. the sum over the buckets 00 to 99 of (bucket count - expected count under an even distribution)^2 / (expected count under an even distribution).

The chi-squared test is a test which allows you to take a group of observations and see how well they fit a theoretical distribution. In this case the theoretical distribution is an even (i.e. all the same) distribution resulting from true randomness (although we know the counts would almost never come out exactly even). The figure we both quoted is basically the sum of the squares of the differences between expected count and observed count (normalized by dividing by the expected count before summing). The degrees of freedom is a statistical term which has lots of meanings; in chi-squared it means the number of categories minus 1. The chi-squared figure and the degrees of freedom will give you (via a look-up table or a function) a probability that the observed data and the theoretical data could both come from the same single distribution.

I also checked singly - i.e. the first number against buckets from 0-9 and the second number against buckets from 0-9. The second number was down in the 40% range, which is not enough to void the null but still damn worrying for a lottery.

On Monday I shall see if I can manage a minimum difference survey (followed by chi-squared) - that is the normal way to check the legitimacy of lottery draws. On a 6-49 lottery there is a 49% chance of a difference of one between two of the numbers, and there are known chances for the other differences as well. You take the minimum difference for each set of 6 numbers and then Pearson chi-squared the results for difference = 1, = 2, = 3, etc. But I would first have to Monte Carlo / calculate analytically a set of probabilities for each of the differences for 2 draws from 10.
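In code, the by-hand calculation described above is only a few lines. A sketch in R, assuming a counts object holding the 100 bucket totals from earlier (the names are placeholders):

# Pearson chi-squared against an even distribution, done "by hand".
observed <- as.numeric(counts)                   # 100 bucket counts, e.g. from table()
expected <- rep(sum(observed) / 100, 100)        # even split across the 100 buckets

chi_sq <- sum((observed - expected)^2 / expected)
df     <- 100 - 1                                # number of categories minus one
p_val  <- pchisq(chi_sq, df, lower.tail = FALSE) # the look-up step, done by the function
chi_sq; p_val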
Raider5678 Posted April 7, 2017

Geeze. Why don't you calculate this lottery and win a few grand? Seems like you'd be able to do it a lot more efficiently than we did.
imatfaal Posted April 7, 2017

The differences (D) are easily calculable, of course - there are only 100 outcomes:

D    F(d)
0    10
1    18
2    16
3    14
4    12
5    10
6    8
7    6
8    4
9    2

But I don't have my data.

This was the Pennsylvania state lottery, but I don't gamble. All the large state and national lotteries are very well checked by some serious statisticians for loopholes and for bias.
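That table is easy to reproduce by brute force, since there really are only 100 outcomes; a throwaway check in R:

# Absolute difference between the two digits over all 100 equally likely draws.
digits <- 0:9
diffs  <- abs(outer(digits, digits, "-"))   # 10 x 10 grid of |first digit - second digit|
table(diffs)                                # gives 10, 18, 16, 14, 12, 10, 8, 6, 4, 2 for D = 0..9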
Prometheus Posted April 8, 2017

Got a test statistic of 141, p-value 0.003, even after MC simulation (2000 replicates). But maybe a different dataset, as mine had 3032 observations. I've attached my data as a text file for anyone to check themselves.

Never heard of a minimum difference survey; look forward to seeing it.

Attachment: PA_Draw_2_lottery_results.txt
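For reference, the Monte Carlo version of the test is a single argument to chisq.test in R. The sketch below assumes a counts table like the one built earlier; the exact call is an assumption, not Prometheus's actual code:

# Chi-squared test with the p-value estimated from 2000 Monte Carlo replicates,
# which sidesteps worries about the large-sample approximation.
chisq.test(counts, simulate.p.value = TRUE, B = 2000)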
Raider5678 Posted April 8, 2017

And this means?
imatfaal Posted April 8, 2017

If you have done the sorting into categories correctly (and I presume you have), then that result is correct. I do chi-squared by hand in Excel and get a statistic of 141.6807 and a p-value of 0.003186.

"And this means?"

With a huge pinch of salt: that there is only a very, very small chance that the numbers are randomly generated. I do not think Pearson chi-squared is the best test.
Raider5678 Posted April 8, 2017

What would be the best test?
imatfaal Posted April 8, 2017

OK Prometheus - you may like to run this check as well on our methodology. I just downloaded 3032 randomly generated numbers between 1 and 100 from an internet random number generator. Via this method I got a set of p-values of 9%, 46%, 1%, 98%, 74%, etc. It is not a sound method.
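One way to push that sanity check further: if the test is behaving properly, p-values computed on genuinely uniform draws should themselves be spread roughly evenly between 0 and 1, so an occasional 1% is expected by chance. A sketch in R, with simulated numbers standing in for the downloaded ones:

# Repeat the whole analysis many times on simulated fair draws and look at the
# spread of p-values; for a calibrated test on fair data they should be roughly uniform.
set.seed(3)
p_values <- replicate(500, {
  sim <- sample(0:99, 3032, replace = TRUE)
  chisq.test(table(factor(sim, levels = 0:99)))$p.value
})
hist(p_values)            # roughly flat if the test is calibrated
mean(p_values < 0.01)     # about 1% of runs dip this low even with fair numbers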
Raider5678 Posted April 8, 2017

Internet random numbers aren't very random.
imatfaal Posted April 8, 2017

Yes they are. Quoting RANDOM.ORG's own description of its certifications:

"The numbers produced by RANDOM.ORG have been evaluated by eCOGRA, which is a non-profit regulatory body that acts as the independent standards authority of the online gaming industry. For a typical gambling site, eCOGRA will oversee many aspects of its operation, including financial aspects, such as payout percentages. RANDOM.ORG is not a gambling site, so in our case, eCOGRA only evaluated the quality of the random numbers. They found that RANDOM.ORG consistently produced random numbers across scaling intervals and issued a certificate with their conclusion: ecogra-2009-06-25.pdf (1 page, 52 Kb)

The numbers and software have also been evaluated by TST Global (part of Gaming Labs International), who in 2011 examined the generator for use in games hosted on Malta. TST's report stated that RANDOM.ORG 'distributes numbers with sufficient non-predictability and fair distribution to particular outcomes' and concluded that it 'complies with the requirements of the applicable Technical Standard in the jurisdiction of Malta as regulated by The Lotteries and Gaming Authority (LGA).'

Most recently, our service was evaluated by Gaming Labs International, who in 2012 examined the generator for use in lottery games in the UK. Their report concluded that it 'distributes numbers with sufficient non-predictability, fair distribution and lack of bias to particular outcomes' and that it 'complies with the requirements of the applicable Technical Standard in the UK Remote Gambling jurisdiction, as regulated by the United Kingdom Gambling Commission (UKGC).' Further details are available upon request.

Additionally, RANDOM.ORG is specifically accredited to generate randomness for use in games regulated by the following: the Gambling Supervision Commission, Isle of Man; Consumer and Business Services (formerly the Office of the Liquor and Gambling Commissioner), Government of South Australia; and the Lotteries and Gaming Authority, Malta. Certification documents for specific jurisdictions are available upon request."

And here is how NIST tests the randomness of random numbers: http://csrc.nist.gov/groups/ST/toolkit/rng/stats_tests.html

I am going to bed to try to forestall the desire to start testing.
Prometheus Posted April 8, 2017

"It is not a sound method"

Or the method is sensitive enough to detect the pseudo-randomness of the internet-generated numbers! Ha, doubt it though: I'll generate some random numbers too and have a play later.

The other test I considered was the Kolmogorov-Smirnov test, but I thought it would have less power, as it is a non-parametric method and would require us to model a discrete distribution as continuous. I stumbled across the Cramér-von Mises test, but I don't know a thing about it. That's why I'm interested to see the minimum difference survey method: how it works and how it performs.