Kaeroll Posted April 24, 2009 Posted April 24, 2009 Hi all Have a quick question... I'm writing up my year's project work, and shockingly enough it involves a handful of straight line graphs. Excel kindly provides me with a line of best fit and its corresponding equation, along with a value called R-squared. (Mine is 0.537... not too good) I know this is a measure of how good the fit is, but I'm not sure how to explain this in my write-up. I won't need detail on it, just a brief statement to the tune of "The correlation isn't very good". How should I refer to this R-squared value? I know very little about statistics. Cheers Kaeroll
stereologist Posted April 24, 2009 Posted April 24, 2009 The R-squared value is used to decide if the two values are correlated. This is not the same as the data falling into a straight line. Suppose that the data plots into a horizontal line. The R-squared would be low. The reason is that changing the horizontal variable, the independent variable, doe snot affect the dependent variable. The two are unlikely to be related since changing one does not change the other. The R-squared value is used to tell whether or not one variable can be used to predict another variable. In your case it sounds undecided, a flip of the coin. R-squared is not the quality of the fit. The variance measures the quality of the fit. There are other statistical measures that are used to measure the quality of the fit.
Kaeroll Posted April 24, 2009 Author Posted April 24, 2009 Thanks for your reply, that clarifies things a lot. How would you suggest I use the R-squared value in discussing my results? Is it fairly typical to refer to it as R-squared without elaboration? (The project is not based on math or statistics at all.)
Bignose Posted April 24, 2009 Posted April 24, 2009 sterologist, you are mistaken about what Excel refers to it's [math]R^2[/math] value as. Excel's [math]R^2[/math] is simply a "goodness of fit" measure (and an awfully poor one at that). Do your example up in Excel -- put in y values that don't change with changing x values, and ask Excel for the best-fit line and the [math]R^2[/math] value will equal exactly one. [math]R^2[/math] in Excel is simply a measure of the sums of the squares between the "best-fit" line and the actual data. http://en.wikipedia.org/wiki/Coefficient_of_determination has a more thorough discussion (as well as it's statistical roots which you are referring too) But, you should not use it to determine which model is best (i.e. between linear and logarithmic and quadratic, etc.) because you can always manipulate the results to whatever [math]R^2[/math] value you want. I.e. if you have 20 data points, a 19th order polynomial going through every point will have an [math]R^2[/math] value of 1, but is in all likelihood a terrible representation of what is actually happening. Too many people place way too much value on [math]R^2[/math] measures. There are many more meaningful measure of goodness of fit out there. A chi-squared ([math]\chi ^2[/math]) is one of the simpler and much more meaningful measures of goodness of fit.
hermanntrude Posted April 30, 2009 Posted April 30, 2009 excel can do that too... but you have to use a function to calculate it
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now