Ken Fabian Posted April 7, 2017

If there is statistical evidence that the draw is not getting random results, that may be circumstantial evidence of fraud or faulty methodology - which may be legally actionable against the operators of the lottery. Unless some kind of insider knowledge is involved, those who won under those arrangements should be immune from legal action. The statistical evidence will not be proof by itself; rather, it would be cause to investigate, and a small number of small draws may not give a large enough sample to apply statistics to unless the fraud or fault is major.

The only proven method of winning with genuine "fair" lotteries that I am aware of is with accumulating-jackpot lotteries that end up with prize money exceeding the cost of all the tickets, by attempting to buy them all (or as close to all as possible). I'm not sure, but I think the rules often work to prevent that happening, by preventing mass purchases at the scale that requires and by allocating tickets to over-the-counter sales that can't easily be bought in bulk. I have heard of syndicates set up to take advantage of large jackpots in jurisdictions with rules that enable it.
Delta1212 Posted April 7, 2017

The problem, of course, is that often with jackpots that large, you get multiple winners and wind up having to split the pot, and that can very easily take it from profit to major loss unless the jackpot is worth the ticket value several times over.
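To put rough numbers on that, here is a small R sketch. The figures and the Poisson model for the number of rival winners are purely illustrative assumptions, not anything from this thread:

# Expected value of buying every ticket when the jackpot may be shared.
# All figures and the Poisson model for rival winners are made up for illustration.
jackpot      <- 10e6     # advertised jackpot
cost_all     <- 3e6      # cost of buying every combination
other_lambda <- 1.5      # assumed mean number of other winning tickets

k <- 0:50                                                  # possible numbers of rival winners
expected_share  <- sum(dpois(k, other_lambda) * jackpot / (k + 1))
expected_profit <- expected_share - cost_all
expected_share; expected_profit                            # share is only about half the jackpot here

Even with a jackpot worth more than three times the ticket outlay in this made-up case, allowing for sharing cuts the expected share to roughly half the advertised figure, which is the effect being described.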
Ken Fabian Posted April 7, 2017

I think that depends on the specific lottery. The NSW government-authorised jackpot lotteries popular here are won by single tickets under a two-draw system: the jackpot ticket number is drawn from all tickets (200,000 at $5.50 per ticket or 270,000 at $2.20, for the two common offerings, if I recall correctly), but it only wins if that ticket was also a winner of an ordinary prize (1 in 17 or 1 in 24 wins a prize). If you buy in as part of a syndicate, then yes, you share with other syndicate members, and it won't be cheap to buy every ticket. I don't think it would work under the gaming regulations here, but in theory that would be $AU1.1M at $5.50 a ticket or $AU594K for the $2.20 version. But the jackpot prize has reached $16M for the latter, much more than the cost of all the tickets.
Prometheus Posted April 7, 2017

I ran a quick analysis on some data from the lottery in question.

Data obtained from https://www.palottery.state.pa.us/Draw-Games/PICK-2.aspx
Dates searched: 01/01/15 - 05/04/17, both draws a day, representing all historical draws to the date of analysis.

If the game is fair, every number has an equal chance of being drawn each draw, so we expect the distribution to be uniform over the range of numbers. Here's a quick visual check of the draws; in red is the expected number of draws if each number is selected with equal probability. There is quite a bit of variance between the observed and expected counts, but we would be surprised if the observed frequencies matched the expectation very closely.

To quantify this a little, I performed a chi-square goodness-of-fit test under the null hypothesis that each number is selected with equal probability. This gives p = 0.003: the probability of seeing results at least this far from uniform if the null hypothesis is true. This probability is quite low, so we have evidence that the numbers are not being selected with equal probability. Worth checking properly: Raider may be onto something.

I did this in R. Can provide a text file of the data if anyone wants it.
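For anyone who wants to reproduce that kind of check, a minimal sketch of the same idea in R follows. The file name and object names are illustrative assumptions, not Prometheus's actual script:

# Goodness-of-fit check: are all 100 two-digit results equally likely?
# "pick2.txt" is a placeholder for whatever file you saved the draws into.
draws  <- read.table("pick2.txt")[[1]]           # one drawn number (0-99) per line
counts <- table(factor(draws, levels = 0:99))    # include numbers never drawn

barplot(counts)                                  # quick visual check of the draws
abline(h = length(draws) / 100, col = "red")     # expected count if perfectly uniform

chisq.test(counts, p = rep(1/100, 100))          # chi-square goodness-of-fit test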
Raider5678 Posted April 7, 2017

What program did you use to make that graph? I'm using Microsoft Excel and that's killer.
Prometheus Posted April 7, 2017

I used R. It's a popular language among statisticians. I've never used Excel for any analysis; I've heard it's rubbish.
Lord Antares Posted April 7, 2017

"It's fairly easy to test for bias."

No, it isn't. It's easy to suspect bias, but technically impossible to be certain about it.

And to Prometheus: good analysis. Can you list the peak numbers? The graph doesn't define them. He mentioned something about numbers ending in 5 coming up more often.
Raider5678 Posted April 7, 2017

No, of all the numbers ending in 5, only certain ones repeatedly showed up. Numbers like 85, 25 and 35 never showed up, while 15, 55 and 05 showed up numerous times in a row.
Delta1212 Posted April 7, 2017

I made a couple of notes on the train yesterday and also thought that something looked a bit wonky, but didn't want to comment yet because I hadn't done anything more than back-of-the-envelope math without an envelope, and so didn't feel robustly confident in that conclusion.

For the 96 draws on record, this was the number of times each digit was drawn in the first position:

0 - 11
1 - 13
2 - 12
3 - 9
4 - 11
5 - 11
6 - 7
7 - 5
8 - 6
9 - 11

And the number of times each digit was drawn in the second position:

0 - 10
1 - 9
2 - 14
3 - 6
4 - 11
5 - 8
6 - 5
7 - 11
8 - 10
9 - 12

And the total number of times each was drawn altogether:

0 - 21
1 - 22
2 - 26
3 - 15
4 - 22
5 - 19
6 - 13
7 - 16
8 - 16
9 - 23

Additionally, there were 22 numbers that were drawn twice, 5 numbers drawn three times, and the number 22 was drawn four times. This leaves 33 numbers that were drawn a single time and 39 numbers that were never drawn. Was going to look into how many repeats you would expect with 96 draws out of a sample of 100 with replacement but forgot by the time I got home.
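The expected number of never-drawn numbers can be worked out directly or by simulation; the R sketch below is just an illustration, not from Delta1212's notes:

# Drawing 96 times from 00-99 with replacement: how many values never appear?
n_draws <- 96
expected_unseen <- 100 * (99/100)^n_draws        # analytic expectation, about 38.1

# Simulation check of the same quantity
set.seed(1)
unseen <- replicate(10000, 100 - length(unique(sample(0:99, n_draws, replace = TRUE))))
mean(unseen)

Roughly 38 of the 100 numbers are expected never to appear in 96 draws, so the 39 never-drawn numbers reported above are right in line with chance.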
Raider5678 Posted April 7, 2017

Picked numbers, in order from most recent to oldest:

42-17-69-22-10-69-45-93-22-15-08-17-29-71-21-19-16-74-31-28-44-62-39-02-15-55-32-50-79-48-44-50-31-24-05-34-00-91-50-39-07-92-43-49-30
31-82-52-87-54-22-97-08-92-93-68-09-58-72-67-36-97-46-48-19-57-29-00-89-55-28-04-46-54-81-15-05-91-13-84-74-86-18-90-60-07-62-91-22-57-47-93-24-13-20

Data set 1 - numbers picked, grouped by tens:

0s: 08-02-05-00-07-08-09-00-04-05-07
10s: 17-10-15-17-19-16-15-19-15-13-18-13
20s: 22-22-29-21-28-24-22-29-28-22-24-20
30s: 31-39-32-31-34-39-30-31-36
40s: 42-45-44-48-44-43-49-46-48-46-47
50s: 55-50-50-50-52-54-58-57-55-57
60s: 69-69-62-68-67-60-62
70s: 71-74-79-72-74
80s: 82-87-89-81-84-86
90s: 91-92-97-92-93-90-91-93

Data set 2 - numbers picked, grouped by final digit:

Ending in 0: 10-50-50-00-50-30-00-90-60-20
Ending in 1: 71-21-31-31-91-31-81-91-91
Ending in 2: 42-22-22-62-02-32-92-82-52-22-92-72-62-22
Ending in 3: 93-43-93-13-93-13
Ending in 4: 74-44-44-24-34-54-04-54-84-74-24
Ending in 5: 45-15-15-55-05-55-15-05
Ending in 6: 16-36-46-46-86
Ending in 7: 17-17-07-87-97-67-97-57-07-57-47
Ending in 8: 08-28-48-08-68-58-48-28-18
Ending in 9: 69-69-29-19-39-79-39-49-09-19-29-89

Data set 3 - gaps between picks (draws since the previous hit), by tens group:

0s: 11,13,11,2,4,12,4,11,4,5,9
10s: 2,3,5,2,4,1,8,40,11,3,4,11
20s: 4,5,4,2,5,14,17,16,4,18,4,2
30s: 19,4,4,6,3,4,5,1,15
40s: 1,6,14,9,1,12,1,19,1,9,18
50s: 25,2,4,7,9,2,8,8,4,4,16
60s: 3,3,16,34,4,25,2
70s: 14,4,11,30,22
80s: 47,2,20,6,5,2
90s: 8,30,4,10,2,1,7,16,6,4,4

Data set 4 - gaps between picks (draws since the previous hit), by final digit:

Ending in 0: 5,23,4,5,2,6,23,16,1,10
Ending in 1: 14,1,4,14,5,8,29,3,10
Ending in 2: 1,3,5,13,2,3,15,5,1,3,3,5,28,2
Ending in 3: 8,35,12,24,13,2
Ending in 4: 18,3,10,3,2,14,22,2,6,1,12
Ending in 5: 7,3,15,1,9,35,6,1
Ending in 6: 17,44,2,10,9
Ending in 7: 2,10,29,8,3,8,2,4,20,4,1
Ending in 8: 11,9,10,23,3,2,6,7,12
Ending in 9: 3,3,7,3,7,6,11,4,13,10,2

Sorry - originally these were all nice little organized colored tables, but at least it's the data. Data set 1 lists all the numbers that were picked, grouped into the 0s, 10s, 20s, 30s, and so on. Data set 2 lists all the numbers grouped by what they end in (0, 1, 2, 3, etc.). Data set 3 shows the gaps between the times numbers from each tens group were picked, and data set 4 shows the gaps between picks grouped by final digit. If you compile all the frequencies, certain groups seem to show up on a regular basis, such as the 20s.
Prometheus Posted April 7, 2017

You need to be careful performing numerous analyses on the same set of data. It's best practice to construct your hypotheses before looking at the data; otherwise you will fall into the trap of looking at the data every which way until you find something out of the ordinary. It's known as torturing the data. I'll do some more analysis on this if I get time.
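As a quick illustration of that multiple-comparisons trap (simulated data only, nothing from the lottery), here is a short R sketch:

# Each individual test on genuinely random draws rejects about 5% of the time,
# but across 20 separate looks at the data, at least one "hit" becomes likely.
set.seed(2)
p_values <- replicate(1000, chisq.test(table(factor(sample(0:9, 200, replace = TRUE),
                                                    levels = 0:9)))$p.value)
mean(p_values < 0.05)    # roughly 0.05 for any single test...
1 - 0.95^20              # ...but about 0.64 that at least one of 20 such tests "hits"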
imatfaal Posted April 7, 2017

I get a chi-squared of 77, which with 99 degrees of freedom is well within the accepted chance of being drawn from the same distribution (i.e. random).
Prometheus Posted April 7, 2017

I got something like a chi-squared statistic of 155 on 99 DF; will check when I get home. What dataset did you use? What distribution did you test against?
Raider5678 Posted April 7, 2017

"I get a chi-squared of 77 which with 99 degrees of freedom is well within accepted chance of being drawn from the same distribution (ie random)"

Um, non-genius here. What does that mean exactly?
imatfaal Posted April 7, 2017

My data are on my PC at work, so I cannot recheck. There were 1602 data points. I tested against an even distribution - i.e. the sum over the buckets 00 to 99 of (bucket count - expected count under an even distribution)^2 / (expected count under an even distribution).

The chi-squared test is a test which allows you to take a group of observations and see how well they fit a theoretical distribution. In this case the theoretical distribution is an even (i.e. all the same) distribution resulting from true randomness (although we know the counts would almost never come out exactly even). The figure we both quoted is basically the sum of the squares of the differences between expected count and observed count (normalized by dividing by the expected count before summing). The degrees of freedom is a statistical term which has lots of meanings; in chi-squared it means the number of categories minus 1. The chi-squared figure and the degrees of freedom will give you (via a look-up table or a function) a probability that the observed data and the theoretical data could both come from the same single distribution.

I also checked singly - i.e. the first number against buckets from 0-9 and the second number against buckets from 0-9. The second number was down in the 40% range, which is not enough to void the null but still damn worrying for a lottery.

On Monday I shall see if I can manage a minimum difference survey (followed by chi-squared) - that is the normal way to check the legitimacy of lottery draws. On a 6-49 lottery there is a 49% chance of a difference of one between two of the numbers, and there are known chances for the other differences as well. You take the minimum difference for each set of 6 numbers and then Pearson chi-squared the results for difference = 1, = 2, = 3, etc. But I would first have to Monte Carlo / calculate analytically a set of probabilities for each of the differences for 2 draws from 10.
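In code, the by-hand calculation described above is only a few lines. A sketch in R, assuming a counts object holding the 100 bucket totals from earlier (the names are placeholders):

# Pearson chi-squared against an even distribution, done "by hand".
observed <- as.numeric(counts)                   # 100 bucket counts, e.g. from table()
expected <- rep(sum(observed) / 100, 100)        # even split across the 100 buckets

chi_sq <- sum((observed - expected)^2 / expected)
df     <- 100 - 1                                # number of categories minus one
p_val  <- pchisq(chi_sq, df, lower.tail = FALSE) # the look-up step, done by the function
chi_sq; p_val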
Raider5678 Posted April 7, 2017

Geeze. Why don't you calculate this lottery and win a few grand? Seems like you'd be able to do it a lot more efficiently than we did.
imatfaal Posted April 7, 2017

The differences (D) are easily calculable, of course - there are only 100 outcomes:

D    F(d)
0    10
1    18
2    16
3    14
4    12
5    10
6    8
7    6
8    4
9    2

But I don't have my data.

This was the Pennsylvania state lottery, but I don't gamble. All the large state and national lotteries are very well checked by some serious statisticians for loopholes and for bias.
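That table is easy to reproduce by brute force, since there really are only 100 outcomes; a throwaway check in R:

# Absolute difference between the two digits over all 100 equally likely draws.
digits <- 0:9
diffs  <- abs(outer(digits, digits, "-"))   # 10 x 10 grid of |first digit - second digit|
table(diffs)                                # gives 10, 18, 16, 14, 12, 10, 8, 6, 4, 2 for D = 0..9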
Prometheus Posted April 8, 2017

Got a test statistic of 141, p-value 0.003, even after MC simulation (2000 replicates). But maybe a different dataset, as mine had 3032 observations. I've attached my data as a text file for anyone to check themselves.

Never heard of a minimum difference survey; look forward to seeing it.

Attachment: PA_Draw_2_lottery_results.txt
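For reference, the Monte Carlo version of the test is a single argument to chisq.test in R. The sketch below assumes a counts table like the one built earlier; the exact call is an assumption, not Prometheus's actual code:

# Chi-squared test with the p-value estimated from 2000 Monte Carlo replicates,
# which sidesteps worries about the large-sample approximation.
chisq.test(counts, simulate.p.value = TRUE, B = 2000)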
Raider5678 Posted April 8, 2017

And this means?
imatfaal Posted April 8, 2017

If you have done the sorting into categories correctly (and I presume you have), then that result is correct. I do chi-squared by hand in Excel and get a statistic of 141.6807 and a p-value of 0.003186.

"And this means?"

With a huge pinch of salt: that there is only a very, very small chance that the numbers are randomly generated. I do not think Pearson chi-squared is the best test.
Raider5678 Posted April 8, 2017

What would be the best test?
imatfaal Posted April 8, 2017

OK Prometheus - you may like to run this check as well on our methodology. I just downloaded 3032 randomly generated numbers between 1 and 100 from an internet random number generator. Via this method I got a set of p-values of 9%, 46%, 1%, 98%, 74%, etc. It is not a sound method.
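One way to push that sanity check further: if the test is behaving properly, p-values computed on genuinely uniform draws should themselves be spread roughly evenly between 0 and 1, so an occasional 1% is expected by chance. A sketch in R, with simulated numbers standing in for the downloaded ones:

# Repeat the whole analysis many times on simulated fair draws and look at the
# spread of p-values; for a calibrated test on fair data they should be roughly uniform.
set.seed(3)
p_values <- replicate(500, {
  sim <- sample(0:99, 3032, replace = TRUE)
  chisq.test(table(factor(sim, levels = 0:99)))$p.value
})
hist(p_values)            # roughly flat if the test is calibrated
mean(p_values < 0.01)     # about 1% of runs dip this low even with fair numbers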
Raider5678 Posted April 8, 2017

Internet random numbers aren't very random.
imatfaal Posted April 8, 2017

Yes they are. Quoting RANDOM.ORG's own description of its certifications:

"The numbers produced by RANDOM.ORG have been evaluated by eCOGRA, which is a non-profit regulatory body that acts as the independent standards authority of the online gaming industry. For a typical gambling site, eCOGRA will oversee many aspects of its operation, including financial aspects, such as payout percentages. RANDOM.ORG is not a gambling site, so in our case, eCOGRA only evaluated the quality of the random numbers. They found that RANDOM.ORG consistently produced random numbers across scaling intervals and issued a certificate with their conclusion: ecogra-2009-06-25.pdf (1 page, 52 Kb)

The numbers and software have also been evaluated by TST Global (part of Gaming Labs International), who in 2011 examined the generator for use in games hosted on Malta. TST's report stated that RANDOM.ORG 'distributes numbers with sufficient non-predictability and fair distribution to particular outcomes' and concluded that it 'complies with the requirements of the applicable Technical Standard in the jurisdiction of Malta as regulated by The Lotteries and Gaming Authority (LGA).'

Most recently, our service was evaluated by Gaming Labs International, who in 2012 examined the generator for use in lottery games in the UK. Their report concluded that it 'distributes numbers with sufficient non-predictability, fair distribution and lack of bias to particular outcomes' and that it 'complies with the requirements of the applicable Technical Standard in the UK Remote Gambling jurisdiction, as regulated by the United Kingdom Gambling Commission (UKGC).' Further details are available upon request.

Additionally, RANDOM.ORG is specifically accredited to generate randomness for use in games regulated by the following: the Gambling Supervision Commission, Isle of Man; Consumer and Business Services (formerly the Office of the Liquor and Gambling Commissioner), Government of South Australia; and the Lotteries and Gaming Authority, Malta. Certification documents for specific jurisdictions are available upon request."

And here is how NIST tests the randomness of random numbers: http://csrc.nist.gov/groups/ST/toolkit/rng/stats_tests.html

I am going to bed to try to forestall the desire to start testing.
Prometheus Posted April 8, 2017

"It is not a sound method"

Or the method is sensitive enough to detect the pseudo-randomness of the internet-generated numbers! Ha, doubt it though: I'll generate some random numbers too and have a play later.

The other test I considered was the Kolmogorov-Smirnov test, but I thought it would have less power, as it is a non-parametric method and would require us to model a discrete distribution as continuous. I stumbled across the Cramér-von Mises test, but I don't know a thing about it. That's why I'm interested to see the minimum difference survey method: how it works and how it performs.