Significance test query


Guest d.brent80
Posted

Consider the following:

 

Period  Shop  Sales  Customers  Region
1       A     100    10         North
2       A     250    40         North
1       B     500    35         North
2       B     600    20         North
1       C     100    5          South

 

and so on...

 

I have calculated the average sales per customer for each region by adding up the sales and customers columns for each region and dividing total sales by total customers, e.g. the average sales per customer for the North is 13.8.
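A rough sketch of that calculation, in Python with the standard library only. The rows simply restate the five rows listed above, and the function name `ratio_of_totals` is made up for illustration:

```python
from collections import defaultdict

# (period, shop, sales, customers, region), copied from the rows listed above
rows = [
    (1, "A", 100, 10, "North"),
    (2, "A", 250, 40, "North"),
    (1, "B", 500, 35, "North"),
    (2, "B", 600, 20, "North"),
    (1, "C", 100, 5, "South"),
]

def ratio_of_totals(rows):
    """Total sales divided by total customers, per region."""
    sales = defaultdict(float)
    customers = defaultdict(float)
    for _period, _shop, s, c, region in rows:
        sales[region] += s
        customers[region] += c
    return {region: sales[region] / customers[region] for region in sales}

averages = ratio_of_totals(rows)
print(round(averages["North"], 1))  # 13.8, the figure quoted for the North
```

Note that this gives one number per region, so on its own it carries no information about spread within a region.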

 

I have then produced a bar chart displaying these averages for each region, and I want to test whether they are statistically significantly different.

 

Can this be done with just the data I have got?

 

Many thanks for your help.

Posted

Yes it can. If you want to compare just two regions (i.e. test for a difference between two samples of data) you can use an independent t-test. If you have three or more levels of your factor (i.e. region; e.g. north, south, east and west) you need to use a One-way Analysis of Variance (ANOVA).

Guest d.brent80
Posted

Thanks for replying Glider,

 

Yes those are the tests that I was thinking about. But, how can I actually use these with the data in the above format?

 

For example, how can I get the standard deviation with the data above?

 

It's a case of I know what method I want to use but don't know how to go about it!

Posted

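For the standard deviation of the ungrouped data, just use the formula

$$\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N - 1}$$ *

where x = each individual observation and N = sample size. This gives the variance; take the square root to get the standard deviation.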

 

Do that for each region, then you'll have enough info to do the tests.

 

*sample standard deviation, for population use N instead of N-1
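As a quick check of the computational formula, here is a minimal Python sketch. It takes x in the sums to be each individual observation, and the four values in `xs` are hypothetical numbers chosen only for illustration. The result is cross-checked against the standard library's `statistics.variance` and `statistics.stdev`, which also use N - 1.

```python
import math
import statistics

def sample_variance(xs):
    """Computational formula: [sum(x^2) - (sum(x))^2 / N] / (N - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Hypothetical observations, for illustration only
xs = [10.0, 6.25, 30.0, 20.0]

var = sample_variance(xs)
sd = math.sqrt(var)  # the standard deviation is the square root of the variance

# Cross-check against the standard library (both divide by N - 1)
assert math.isclose(var, statistics.variance(xs))
assert math.isclose(sd, statistics.stdev(xs))
print(var, sd)
```

For the population version, divide by N instead; `statistics.pvariance` and `statistics.pstdev` do that.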

Guest d.brent80
Posted

Yes, I am aware of the sd formula. I am obviously not explaining myself properly - apologies.

 

I suppose the best way to understand the problem would be to ask - What would I use for 'x'.

 

If I create a new variable called average sales per customer (sales/customers) and average this variable to get the overall mean, I will get a different average than if I sum the sales column and the customers column and then divide total sales by total customers.
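To make the discrepancy concrete, here is a small Python sketch using the four North rows quoted at the start of the thread. The variable names are illustrative only.

```python
# (sales, customers) for the four North rows quoted at the start of the thread
north = [(100, 10), (250, 40), (500, 35), (600, 20)]

# Option 1: one sales-per-customer value per row, then a plain average
ratios = [s / c for s, c in north]
mean_of_ratios = sum(ratios) / len(ratios)

# Option 2: total sales divided by total customers
ratio_of_totals = sum(s for s, _ in north) / sum(c for _, c in north)

print(round(mean_of_ratios, 2))   # 15.13
print(round(ratio_of_totals, 2))  # 13.81
```

The second figure is a customer-weighted average of the row ratios, so rows with many customers count for more. The first treats every row equally, and it is the one that comes with a set of individual values from which a standard deviation can be taken.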

Posted
Originally posted by d.brent80

I suppose the best way to understand the problem would be to ask - What would I use for 'x'.

 

If I create a new variable called average sales per customer (sales/customers) and average this variable to get the overall mean, I will get a different average than if I sum the sales column and the customers column and then divide total sales by total customers.

 

If you want to see if the sales are statistically different for each region, what you're going to want to do is take the mean for each specific region.

 

You will get four different means (assuming four regions: north,south,east,west)

 

Then, take the standard deviation for each region. You will get four different standard deviations.

 

Then you can use an independent t-test to compare one sample to another (north to south, east to west, etc.) at whatever significance level you wish.

 

I'm still not sure if I answered your question. If I read it correctly, you weren't sure which xbar you wanted to use. Since you're comparing regions, calculate a separate xbar for each region. You'll need those for the t-test. But remember the t-test can only compare one region to another, not all the regions at once.

 

I'm not sure how to go about the ANOVA analysis, but I can look it up if that's the way you want to go.

Posted

How many regions do you have? Are you using computer software or doing it by hand?

 

If you are using a spreadsheet or statistics package (I use SPSS, one of the most widely used statistical packages), the way to perform an independent t-test is to enter the data in two columns, where column 1 is the grouping variable (e.g. where 1 = north and 2 = south). Column 2 would contain the raw data, i.e. the sales for each individual customer (if you have those data). Then you simply ask whatever software you are using to compare the sales for group 1 against the sales for group 2.
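The two-column layout described above can be sketched in Python like this. The group codes (1 = north, 2 = south) follow the example in the post; the three values are hypothetical placeholders, not real sales figures.

```python
# Hypothetical coding of the grouping variable: 1 = north, 2 = south
GROUP_CODES = {"North": 1, "South": 2}

# Hypothetical raw observations as (region, value) pairs
records = [("North", 10.0), ("North", 6.25), ("South", 20.0)]

# Column 1 = group code, column 2 = raw value (one row per observation)
long_format = [(GROUP_CODES[region], value) for region, value in records]

# Splitting by group code recovers the two samples to be compared
by_group = {}
for code, value in long_format:
    by_group.setdefault(code, []).append(value)

print(long_format)
print(by_group)
```

The point of this layout is that each row is one observation, so the software can work out the group sizes, means and standard deviations itself.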

 

If you only have the averages per customer, you can still do the test. If you only have the averages per region, you will need to do it by hand. It would help to see an example of the data you collected.

 

As I say, if you have more than two regions, you will need to use ANOVA. You could use multiple t-tests, but it's time consuming (if you had 4 regions, you would have to test 1 by 2, 1 by 3, 1 by 4, 2 by 3, 2 by 4 and so on) and doesn't account for the whole model (differing degrees of freedom per pair). Each extra test also adds to the chance of at least one false positive, so it would not accurately show which region had the best sales compared to all other regions.
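To put numbers on the multiple t-test problem, here is a short Python sketch. The region names are the four used as an example earlier in the thread, and the error calculation assumes the tests are independent, which is a simplification.

```python
from itertools import combinations

def pairwise_tests(regions):
    """Every pair of regions that would need its own t-test."""
    return list(combinations(regions, 2))

def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive across n_tests,
    assuming the tests are independent (a simplifying assumption)."""
    return 1 - (1 - alpha) ** n_tests

pairs = pairwise_tests(["north", "south", "east", "west"])
print(len(pairs))                              # 6
print(round(familywise_error(len(pairs)), 3))  # 0.265
```

So with four regions and alpha = 0.05 per test, the chance of at least one spurious "significant" result is roughly one in four, which is the usual argument for running a single ANOVA first.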

 

If you need help with ANOVA, I could attach a teaching booklet I wrote that was published by the Open University (I have a copy in *.pdf format on this system), but it's aimed mainly at psychologists who tend to use SPSS (Statistical Package for the Social Sciences).

Guest d.brent80
Posted

Thanks guys for your help, all of what you say makes sense and I agree with it.

 

I have attached an example of the data at the bottom of this message. Would it be possible for you to set out the workings manually? (I will be using SPSS but I just want to see the manual workings first.)

 

Remember, I am comparing the averages of sales per customer by region.

 

I do have more than two groups and will eventually use ANOVA, but for now, I have just included 2 groups - so the independent t-test will suffice.

 

Many thanks once again

 

 

Example of data

 

Period  Shop  Sales  Customers  Region
1       a     100    10         North
2       a     250    40         North
1       b     500    35         North
2       b     600    20         North
1       c     100    5          South
2       c     300    15         South
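One possible way to set out the manual workings is sketched below in Python. It rests on an assumption that is not settled in the thread: each row (one shop in one period) is treated as a single observation of sales per customer. The function names `ratios_for` and `pooled_t` are made up for illustration, and the t statistic is the standard pooled-variance (equal variances assumed) version.

```python
import math
import statistics

# (sales, customers, region) copied from the example data above
data = [
    (100, 10, "North"), (250, 40, "North"), (500, 35, "North"),
    (600, 20, "North"), (100, 5, "South"), (300, 15, "South"),
]

def ratios_for(region):
    """Sales per customer for each row of one region (assumed unit of analysis)."""
    return [s / c for s, c, r in data if r == region]

def pooled_t(a, b):
    """Independent-samples t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # N - 1 versions
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

t, df = pooled_t(ratios_for("North"), ratios_for("South"))
print(t, df)
```

With only four and two observations, and two identical South values, this is purely a demonstration of the arithmetic; a real analysis would need far more rows per region.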

Posted

There is a problem here. You can't really do a t-test by hand without tables of critical values. These tables can be found in the appendices of most statistics books. As you are asking for the formula to do the test manually, I have to assume you don't have these books (as the formulae for most inferential tests are presented in such books; Coolican or Howell for example).

 

Without these tables, you could calculate the value for 't' (the t-statistic), but that wouldn't tell you whether there was a significant difference between your samples. The value for t is only an indication of the magnitude of the difference between the samples (the further from zero, irrespective of sign, the greater the difference). If you wanted to know whether the difference was statistically significant, you would need to compare the t value against the tables under alpha = 0.05 (by convention).
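The table lookup described above amounts to the following Python sketch. The critical values are the standard two-tailed figures for alpha = 0.05 as printed in most textbook appendices, restricted here to 1 to 10 degrees of freedom; the t values passed in are hypothetical, and `is_significant` is an illustrative name.

```python
# Two-tailed critical values of t at alpha = 0.05, keyed by degrees of freedom
T_CRIT_05 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
}

def is_significant(t, df):
    """True if |t| exceeds the tabled critical value for this df."""
    return abs(t) > T_CRIT_05[df]

# Hypothetical t values, for illustration only
print(is_significant(-0.62, 4))  # False
print(is_significant(3.1, 4))    # True
```

The sign of t is ignored because the test is two-tailed; only the distance from zero matters.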

 

Your best bet would be to get straight into SPSS. That will calculate t, perform a test for equality of variance (Levene's test), which checks a basic assumption of the t-test, tell you whether the t value is statistically significant or not, and also provide the exact value for p (probability under the t distribution).

 

Finally, for your sample data, I need variable headers. I don't know what three of the columns represent, so I can't even help with data entry.
