Jump to content

Recommended Posts

Posted

Hi all

 

Have a quick question... I'm writing up my year's project work, and shockingly enough it involves a handful of straight line graphs. Excel kindly provides me with a line of best fit and its corresponding equation, along with a value called R-squared. (Mine is 0.537... not too good)

 

I know this is a measure of how good the fit is, but I'm not sure how to explain this in my write-up. I won't need detail on it, just a brief statement to the tune of "The correlation isn't very good". How should I refer to this R-squared value? I know very little about statistics.

 

Cheers

Kaeroll

Posted

The R-squared value is used to decide if the two values are correlated. This is not the same as the data falling into a straight line. Suppose that the data plots into a horizontal line. The R-squared would be low. The reason is that changing the horizontal variable, the independent variable, doe snot affect the dependent variable. The two are unlikely to be related since changing one does not change the other.

 

The R-squared value is used to tell whether or not one variable can be used to predict another variable. In your case it sounds undecided, a flip of the coin.

 

R-squared is not the quality of the fit. The variance measures the quality of the fit. There are other statistical measures that are used to measure the quality of the fit.

Posted

Thanks for your reply, that clarifies things a lot.

 

How would you suggest I use the R-squared value in discussing my results? Is it fairly typical to refer to it as R-squared without elaboration? (The project is not based on math or statistics at all.)

Posted

sterologist, you are mistaken about what Excel refers to it's [math]R^2[/math] value as. Excel's [math]R^2[/math] is simply a "goodness of fit" measure (and an awfully poor one at that).

 

Do your example up in Excel -- put in y values that don't change with changing x values, and ask Excel for the best-fit line and the [math]R^2[/math] value will equal exactly one.

 

[math]R^2[/math] in Excel is simply a measure of the sums of the squares between the "best-fit" line and the actual data.

 

http://en.wikipedia.org/wiki/Coefficient_of_determination has a more thorough discussion (as well as it's statistical roots which you are referring too)

 

But, you should not use it to determine which model is best (i.e. between linear and logarithmic and quadratic, etc.) because you can always manipulate the results to whatever [math]R^2[/math] value you want. I.e. if you have 20 data points, a 19th order polynomial going through every point will have an [math]R^2[/math] value of 1, but is in all likelihood a terrible representation of what is actually happening.

 

Too many people place way too much value on [math]R^2[/math] measures. There are many more meaningful measure of goodness of fit out there. A chi-squared ([math]\chi ^2[/math]) is one of the simpler and much more meaningful measures of goodness of fit.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.