Validity of Statistics

AdvRoboticsE529 · April 1, 2014

No, I don't understand quantum mechanics. I have asked about the application of probability in quantum mechanics and, as I have repeated, I am told that it is confirmed by experiments.

Bignose · April 1, 2014

and hence only poses to minimise the difference of the mean value from every-single value within the set of data.

How is that any different than coordinate points in geometry? The average coordinate is also the one that minimizes the total squared distance to all the points. I.e. it is the exact same calculation. How is one more 'pure' than the other?
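The geometric analogy can be checked numerically. The sketch below is only an illustration with made-up coordinates (the `points` list and the helper name are assumptions, not from the thread). It shows that the coordinate-wise mean minimizes the sum of squared distances to the points; the point minimizing the sum of plain distances is a different one, the geometric median.

```python
# Illustrative points; the coordinates are invented for this sketch.
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 5.0)]


def sum_squared_distances(candidate, pts):
    """Sum of squared Euclidean distances from candidate to every point."""
    cx, cy = candidate
    return sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in pts)


# The centroid is just the coordinate-wise mean.
centroid = (
    sum(x for x, _ in points) / len(points),
    sum(y for _, y in points) / len(points),
)

best = sum_squared_distances(centroid, points)

# Nudging the candidate away from the centroid can only increase the total.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1), (0.5, -0.5)]
nudged_totals = [
    sum_squared_distances((centroid[0] + dx, centroid[1] + dy), points)
    for dx, dy in nudges
]

print(centroid, best, min(nudged_totals))
```

The same calculation in one dimension is the ordinary mean of a data set, which is the sense in which the two cases are "the exact same calculation".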

In statistics mostly, especially in applying to reality, the values within the set of data are very much different with unknown variables affecting them.

You can try to test the function in economics with statistics, it usually will not work and is quite inaccurate.

Ah, see, now you're on to a different subject. Data measuring and gathering. And a little bit of model fitting. Statistics has mathematics to deal with each of these, actually. But, I agree, that if the samples from the population are not done with care, then just blindly using the mean of those samples as the mean of the entire population can lead to large errors.

Look, if you are saying that an average doesn't capture all the details of the data, you're exactly right. But, it's not intended to. It is intended to mush everything together and get one single value.

You're right in that if people take that mean out of context, as sacrosanct, or attribute more value to it than there really is, bad things happen and poor interpretations follow.

It especially gets bad since in most cases, you can't measure every element of the population, so you take samples from it, and the mean of that sample also has to be interpreted.

But, here's the great thing -- the math of statistics handles that. If you sample in a certain way, you can calculate how close the mean of the samples you took is likely to be to the mean of the entire population. Will you know with 100% certainty? Of course not. But this goes back to issues brought up 70 posts ago, like destructive testing.
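As a rough illustration of that claim, the sketch below simulates a hypothetical population (uniformly distributed ages, an assumption made purely for the example), repeatedly draws random samples of 50, and builds an approximate 95% interval around each sample mean using the standard error. The helper name `interval_for_mean` and all the numbers are invented; the normal-approximation multiplier 1.96 is a common choice rather than the only one.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 ages drawn uniformly from 0 to 90.
population = [random.uniform(0, 90) for _ in range(100_000)]
true_mean = statistics.fmean(population)


def interval_for_mean(sample, z=1.96):
    """Approximate 95% confidence interval for the population mean."""
    m = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / len(sample) ** 0.5
    return m - z * standard_error, m + z * standard_error


trials = 2000
hits = 0
for _ in range(trials):
    sample = random.sample(population, 50)  # simple random sample
    low, high = interval_for_mean(sample)
    if low <= true_mean <= high:
        hits += 1

# Fraction of intervals that contained the true population mean.
coverage = hits / trials
print(f"coverage: {coverage:.3f}")
```

If the sampling really is random, the printed coverage should land close to 0.95. That is the sense in which the error of a sample mean can itself be quantified, and it stops holding when the sampling is biased.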

I completely, 100%, totally, and without any hesitation agree that interpretation of these numbers is all too often done poorly. But this doesn't affect the 'purity' of the mathematics. Just because one is geometry and the other is samples doesn't make one more right or wrong than the other.

Edited April 1, 2014 by Bignose

studiot · April 1, 2014

You started this thread with a perfectly reasonable proposition for debating purposes.

However you also added an extremely insulting allegation against professionals, which you have further promoted throughout the thread.

The truly professional way to discuss something when there is a difference in point of view is to pool all the known facts and information and examine them dispassionately to try to jointly arrive at the truth.

I have tried to offer this process, but without cooperation.

I therefore reluctantly come to the conclusion that the true purpose of this thread is to foster an argument.

Finally, your use of the word certain seems to me to mean what is commonly called 'deterministic' in Science rather than the real meaning of the word certain.

I can count three tablets and establish that there are three.

There is nothing deterministic about this.

However 3 times 2 makes 6 is deterministic, but can also be called certain.

The following link may help you with some scientific terms and also has some interesting things to say about the thread subject matter.

http://abyss.uoregon.edu/~js/glossary/clockwork_universe.html

Edited April 1, 2014 by studiot

AdvRoboticsE529 · April 1, 2014

Determining how likely the sample mean is to be the mean of the entire set of data by using statistics itself seems to be going in circles. I only talked about averages; however, many other definitions and their applications still seem artificial.

@studiot

You're trying to be personal again.

Edited April 1, 2014 by AdvRoboticsE529

Bignose · April 1, 2014

Adv, here's a good example of what I am thinking you are saying.

Set1: [1, 2, 3, 4, 5, 6, 7, 8, 9]

Set2: [1, 1, 1, 1, 5, 9, 9, 9, 9]

Two very different distributions, but both have a mean of exactly 5.0.

And you are correct that if someone said both sets are the same because each set's mean is 5.0, they would be in very significant error.

But, that is also why other statistics about the sets were invented, like the variance (and skewness, and kurtosis, etc.).

Math isn't going to fix people poorly attributing meaning to the calculations. But that doesn't make the math 'un-pure'.
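The two sets above can be checked directly with Python's standard `statistics` module. This is a minimal sketch; the variable names are chosen only for illustration.

```python
import statistics

set1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
set2 = [1, 1, 1, 1, 5, 9, 9, 9, 9]

# Both means come out the same.
mean1 = statistics.mean(set1)
mean2 = statistics.mean(set2)

# Population variance: the average squared deviation from the mean.
var1 = statistics.pvariance(set1)
var2 = statistics.pvariance(set2)

print(mean1, mean2)
print(round(var1, 2), round(var2, 2))
```

Both means are 5, but the population variances are 60/9 (about 6.67) and 128/9 (about 14.22), so the second statistic separates two sets that the mean alone cannot tell apart.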

AdvRoboticsE529 · April 1, 2014

The application of mean in statistics is not the only definition I had problems with, it is as if my posts weren't read at all.

Variance and skewness are very much artificial. I never talked much about skewness; skewness focuses specifically on the difference in quartiles, which are set at 25%, 50% and 75%, again, specifically engineered so to suit the ideal world of whoever created such methods, this is very much unpure.

Edited April 1, 2014 by AdvRoboticsE529

Bignose · April 1, 2014

Determining how likely the sample mean is to be the mean of the entire set of data by using statistics itself seems to be going in circles. I only talked about averages; however, many other definitions and their applications still seem artificial.

You can keep calling it artificial if you like, but the mathematics is well developed for it, and well verified over and over and over again.

For example, if you wanted to know the average age of every citizen of the United States... do you try to find every single person and ask them? Or do you ask 500 people? Or do you pull 1 state's DMV records? Or do you make a poll on Facebook?

Each of these is valid in its own way, and each will have different errors. But statistics can actually quantify those errors.

The application of mean in statistics is not the only definition I had problems with, it is as if my posts weren't read at all.

No, I've read them, and am spending significant time trying to understand your issues, and trying to help fix what I think are misconceptions. I wanted to start with the mean since it is a simple beginning point.

As I wrote earlier, let's start with the problems about mean before we tackle something more complex like variance.

And now I can be a smart-ass too and write "it is as if my posts weren't read at all."

-----------------------------

So, are you willing to do that, or should I just pack it in because you've made your mind up and you don't really want to discuss this, but are here to try to preach your personal gospel about what you think are the failings of statistics? I think I've shown I'm willing to have a discussion if you are. But you need to do your part of it, friend. You need to quit just calling things names, like 'pure' or 'artificial' and actually tell me what you mean by that. Or give examples of what you think is wrong, etc.

Edited April 1, 2014 by Bignose

AdvRoboticsE529 · April 1, 2014

The question is loaded, by finding the average age of every citizen of the USA, you would apply the method of average, of course.

However, what is your purpose? What relationship do you intend to look at? Artificial definitions and limits whereby the calculations conform to such.

Quantifying the errors of statistics with statistics does not seem to be practical.

studiot · April 1, 2014

specifically engineered so to suit the ideal world of whoever created such methods,

What is this if not a continued personal attack against professionals?

AdvRoboticsE529 · April 1, 2014

That is not an attack, stop bothering me.

@studiot

Also, you need to stop arguments from authority.

Bignose · April 1, 2014

However, what is your purpose? What relationship do you intend to look at? Artificial definitions and limits whereby the calculations conform to such.

Think of it in terms of the new health care act. Gov't needs to put a certain amount of money in the system to pay for all that health care. Older people require more health care.

So, as a first try, I want to know the average age of all the people so that I can put a certain amount of money aside for all people.

AdvRoboticsE529 · April 1, 2014

Interesting thought, although the income / wealth of specific households is quite important.

Statistical analysis will definitely be quite important in this case, hopefully, in future, better computing technology will assist in mass electronic identification to result in more precise calculations, this uncertainty bugs me.

Bignose · April 1, 2014

Interesting thought, although the income / wealth of specific households is quite important.

Statistical analysis will definitely be quite important in this case, hopefully, in future, better computing technology will assist in mass electronic identification to result in more precise calculations, this uncertainty bugs me.

Sure. It ought to. Because you are right that the details do matter. That is what I was driving at with "do you try to find every single person and ask them? Or do you ask 500 people? Or do you pull 1 state's DMV records? Or do you make a poll on Facebook?"

Each of those different ways of trying to answer the question has different assumptions built in, different biases, etc.

That doesn't mean that the answer each gets is wrong. It just means that each answer has its own context and meaning. That meaning does have to be understood by anyone who tries to use that number. And I agree that far too often it is not.

But that doesn't mean that the math itself is wrong. It means that the way the math was used is wrong.

Do you see what I am driving at? Or are you still stuck on the math itself being wrong somehow?

AdvRoboticsE529 · April 1, 2014

I don't think I am able to get my point across, good talk anyways.

John Cuthber · April 1, 2014

Quantifying the errors of statistics with statistics does not seem to be practical.

seem to whom?

davidivad · April 1, 2014

No.

The use of "squared" is not an arbitrary choice, it follows from the properties of the distribution.

Do you actually know the maths behind statistics, or are you criticising it blindly?

is this no more than for the sake of argument?

poke them and get a point heh...

make a forward moving contribution to the argument please.

i like the idea of having someone on my side but they need to show civility.

this is a discussion not a sword fight.

Bignose · April 1, 2014

I don't think I am able to get my point across, good talk anyways.

My suggestion is, then, to work on your message. Because just describing one case as "pure" and another as "ambiguous" doesn't really help. This is a science forum, and you need to provide evidence to back it up. That is part of why I kept asking you for examples of what was a "pure" calculation and what was an "ambiguous" one.

Examples will make it concrete of exactly what you're talking about and put us all on the same footing.

I invite you to come back and try again any time you want. I do want to help clear up this confusion for you, because I do think that to summarily dismiss all statistics is doing yourself a disservice. It is my opinion, that we generally don't do enough statistical work, because correctly calculated and interpreted statistics can be extremely meaningful and powerful descriptors. See, for example, the list of 6 questions I brought up quite some time ago. All of those are very important questions to answer correctly, and with judicious use of statistics, answers to those questions are very possible. But, I think a main point of yours -- that all too often statistics use is not very judicious -- does serve as a good reminder to us all.

Edited April 1, 2014 by Bignose

Cap'n Refsmmat · April 1, 2014

No.

The use of "squared" is not an arbitrary choice, it follows from the properties of the distribution.

Do you actually know the maths behind statistics, or are you criticising it blindly?

I'm not sure what your argument is here. I can write the normal distribution in terms of the precision (1/variance), the standard deviation, or any other weird quantity I'd like; the original normal distribution was differently parametrized than the one we have now. So while the sample variance is a good way of estimating the variance parameter, we could equally well calculate the sample precision or sample standard deviation.
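The parametrization point can be made concrete. The following sketch writes the normal density three ways, in terms of the variance, the standard deviation, and the precision (1/variance), and checks that they agree. The function names are invented for this illustration; `statistics.NormalDist` from the standard library is used only as an independent cross-check.

```python
import math
from statistics import NormalDist


def pdf_from_variance(x, mu, variance):
    """Normal density written in terms of the variance."""
    return math.exp(-((x - mu) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)


def pdf_from_sd(x, mu, sd):
    """Normal density written in terms of the standard deviation."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def pdf_from_precision(x, mu, precision):
    """Normal density written in terms of the precision, 1 / variance."""
    return math.sqrt(precision / (2 * math.pi)) * math.exp(-precision * (x - mu) ** 2 / 2)


mu, sd = 1.0, 2.0
variance = sd ** 2
precision = 1 / variance

xs = [-3.0, 0.0, 1.0, 2.5, 6.0]
densities = [
    (pdf_from_variance(x, mu, variance), pdf_from_sd(x, mu, sd), pdf_from_precision(x, mu, precision))
    for x in xs
]
reference = [NormalDist(mu, sd).pdf(x) for x in xs]

for x, triple in zip(xs, densities):
    print(x, triple)
```

All three forms describe the same curve, so which quantity is treated as "the" parameter is a matter of convention rather than of substance.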

But I also don't understand AdvRoboticsE529's point.

CharonY · April 1, 2014

I think the point is that someone needs to read up more on principles of statistics and stochastics.

AdvRoboticsE529 · April 2, 2014

Well, I mean I would've loved to see in statistics the extent of proofs found in pure maths. Such claims require better proofs. What started my distrust of statistics is the lack of proofs, and proofs which are only true given an invented context, whereby when the context is removed the proof no longer applies.

Like I said, I don't get this problem when integrating the area of the circle, to give an example; there is no context needed.

Also, I wasn't able to get my point across on the ambiguous application of methods and defined definitions. Also, probability theory and such, unfortunately, does not see the rigorous testing as other concepts in math in which I would love to see, for the application of probability in gambling games can be true following the rules of probability theory, yet when applied will not hold, in this case I'm not talking about the application itself but the concept of probability itself. As repeated, am told that it is applicable in quantum physics.

I also had many problems with definitions, which I don't think we'll be able to discuss...

Anyways, the point of this thread is the validity of statistics, whether statistics is applicable or not (which in most cases, it is, and can be quite useful) does not mean it is validated, and statistics have on occasions failed even when applied by statisticians and agreed amongst statisticians, I gave an example in economics previously with works extensively in correlation.

Edited April 2, 2014 by AdvRoboticsE529

Bignose · April 2, 2014

Anyways, the point of this thread is the validity of statistics, whether statistics is applicable or not (which in most cases, it is, and can be quite useful) does not mean it is validated, and statistics have on occasions failed even when applied by statisticians and agreed amongst statisticians, I gave an example in economics previously with works extensively in correlation.

I think you'll find that every science has this same thing. That models in all branches of science have failed on occasion. Statistics is no different in that science learns from its mistakes and adapts.

For example, a lot of times what people took as valid assumptions when calculating statistics weren't really valid. Again, it is not just statistics where this is a concern.

Bridges collapse. Cars catch on fire. People die from drug interactions. Do these invalidate statics, automotive engineering, and pharmacology? Because I feel you're doing the exact same thing to statistics. Sure, there are times when the predictions it makes don't bear out. But that is true of every single field of science. Hell, one could make a decent argument that statistics is the most upfront science about this, because by its very nature it fully admits that its predictions will not be 100%.

I do think you owe it to yourself to study probability theory more in depth. There are many, many proofs in this branch of mathematics, and to claim otherwise just shows an ignorance of the field. Similarly with your comment about how it is non-rigorous. You are claiming something about the field that just isn't true. It may be ambiguous to you, but that is reflective of your level of knowledge on the subject, not the subject itself. And really, that's some of why this forum is here -- to help people exchange knowledge. I do hope you'll take some time to review what I and the many others have posted in this thread to demonstrate that before you just declare it non-valid, you ought to learn more about it. Better yet, if you have questions, ask. We'll do our best to answer them for you.

But ultimately, I'm sorry, but before I'm going to take your or anyone else's word that statistics isn't valid, you're going to have to provide extraordinary evidence to support this extraordinary claim. Because I've seen a great deal of evidence that statistics is valid. Your word alone isn't enough. This is a science forum. Statistics has proven itself very valid, very non-ambiguous, and very, very useful to a great deal of people. If you wish to change that, start providing evidence. As I wrote above, your simply declaring it so doesn't make it so. My declaring that the sky is polka dotted and my car runs on unicorn wishes doesn't make that so, either. This thread is nearly 100 posts long now. It's put up or shut up time. If you wish to continue to declare statistics 'invalid', you had best start providing evidence of it.

AdvRoboticsE529 · April 2, 2014

You defend statistics by contrasting it to the faults of the failures in science, the scientific example is not necessarily the best method for confirmation.

And, you took the argument of authority by claiming I lack knowledge and am ignorant in the field.

Other comments aside, I never made any extraordinary claim, it is statistics that made the claim of deduction / induction (specifically) from uncertainty, yet lack proofs that holds to be true without context or definitions, they are only as true as the definitions and theories of which conform to them.

I do not need to provide evidence, it is time for statistics and statisticians to live up to their own word, my study in economics is consistent in uncertainty in that statistics fails to address true relationships or fails to determine the true variables.

Again, I will concede that I do not know of the application of probability in quantum mechanics, however, I will not concede that statistics is as true as pure maths or certain branches of maths of which proofs does not require context to be true, and addresses the true relationship.

Application in reality does not validate the method, we are going in circles.

Further, its application in reality does not always hold true, and in certain cases is extremely inaccurate even if correct on paper.

The best example is probably the paper: "On Default Correlation: A Copula Function Approach."

Its application in certain areas is accurate enough, yet in others it is quite inaccurate, and it is claimed that applying the statistical method resulted in a market crash. Before the crash, the method was validated by statisticians within the context and definitions of statistics.

It is those complexity which makes me think that statistics is a field to be respected, I used to think that even if initial definitions and unproven mathematical theories (to repeat, only proven by their own context, and to be untrue when context is removed) may be ambiguous at least it resulted in a field of great deduction from induction, I don't think so.

I don't think we'll be able to discuss, because you are quite confident in your view as am I, and I don't want to go in circles.

Unity+ · April 2, 2014

You defend statistics by contrasting it to the faults of the failures in science, the scientific example is not necessarily the best method for confirmation.

And, you took the argument of authority by claiming I lack knowledge and am ignorant in the field.

Other comments aside, I never made any extraordinary claim, it is statistics that made the claim of deduction / induction (specifically) from uncertainty, yet lack proofs that holds to be true without context or definitions, they are only as true as the definitions and theories of which conform to them.

I do not need to provide evidence, it is time for statistics and statisticians to live up to their own word, my study in economics is consistent in uncertainty in that statistics fails to address true relationships or fails to determine the true variables.

Again, I will concede that I do not know of the application of probability in quantum mechanics, however, I will not concede that statistics is as true as pure maths or certain branches of maths of which proofs does not require context to be true, and addresses the true relationship.

Application in reality does not validate the method, we are going in circles.

I don't know if you have before, but can you point out the supposed flaws in statistics? I think statistics provides a great and accurate way to predict with probabilities of events.

If you learn about bell curves, error bounds, and confidence levels you will be thankful for statistics because it allows one to say "Because nature is so unpredictable with its values, let us provide a range in which the value can lie in so we can classify it as X instead of Y"

Without statistics, we would be dealing with uncertainties.

I do not need to provide evidence, it is time for statistics and statisticians to live up to their own word, my study in economics is consistent in uncertainty in that statistics fails to address true relationships or fails to determine the true variables.

In my book, that statement is an enemy to mathematics. You MUST always provide evidence if you claim something to be true. You cannot simply say 0 = 1 and not present any proof.

Statistics accounts for the economy because it accounts for uncertainties (or at least most). In bell-curve analysis, there is a huge ability to classify data by probability using the 68%, 95%, and 99.7% rule.
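For reference, the empirical rule for the normal distribution (roughly 68%, 95% and 99.7% of values within one, two and three standard deviations of the mean) can be reproduced from the normal cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `probability_within` is made up for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def probability_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


within = {k: probability_within(k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} sd: {p:.4f}")
```

These figures apply only when the data are well described by a normal distribution, which is itself an assumption that has to be checked for any real data set.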

Other comments aside, I never made any extraordinary claim, it is statistics that made the claim of deduction / induction (specifically) from uncertainty, yet lack proofs that holds to be true without context or definitions, they are only as true as the definitions and theories of which conform to them.

That may be because a majority of statistics contains axioms rather than theorems because of the nature of probability and statistics. You do realize in mathematics that axioms do not have proofs because they are simply seen as being correct without proof.

You defend statistics by contrasting it to the faults of the failures in science, the scientific example is not necessarily the best method for confirmation.

Technically, it is the best method and proven throughout history.

One important use of statistics is for particle physics. At CERN, when looking for particles, they look for ranges rather than for precise values because of course the values they get will not fit the particle exactly. Therefore, they must set a range in order to determine if the test is successful or not.

Edited April 2, 2014 by Unity+

AdvRoboticsE529 · April 2, 2014

Yes, I talked before about some of the definitions which fail to address the true relationship; they're artificial.

Exactly, you should not say that 0 = 1, as the fact of probability theory should not be claimed if insufficient evidence or true pure mathematical proofs are unavailable, this is the exact case, I am not making the claim, statistics is where the claims are made.

Also, the "nature" of statistics and probability included may not validate it as a method.

The scientific method fails to confirm if there is a lack of technology, one example is the constant modification of the atomic model, lack of technology results in uncertainty, long-term it will amend itself, short-term it can be uncertain.

John Cuthber · April 2, 2014

I'm not sure what your argument is here. I can write the normal distribution in terms of the precision (1/variance), the standard deviation, or any other weird quantity I'd like; the original normal distribution was differently parametrized than the one we have now. So while the sample variance is a good way of estimating the variance parameter, we could equally well calculate the sample precision or sample standard deviation.

But I also don't understand AdvRoboticsE529's point.

The point I'm making addresses his assertion:

" In the definition of variance, I have made this statement before, a component is the summation of the difference of 'x' squared values minus the mean, depending on the person I have spoken with before, the set of reasons are different, most common answers is that it inflates the values, ensures positive value"

The SD is the root mean square deviation from the mean.

It seems he's been told that the reason for using the square of the difference between the mean and the value is that it "inflates the values, ensures positive value", which sounds arbitrary.

If you just want positive numbers you could take the absolute value of the difference and if you wanted to exaggerate it you could take the 4th power.

My point is that the choice of averaging (x - the mean value of x) squared is not arbitrary; it's the one that actually gives the right answer (for the normal distribution).

He seems to have been misinformed and thinks that stats is a set of choices and you pick the definitions arbitrarily.

To some extent that's true.

But they have to be chosen in such a way as to be consistent.

He seems not to realise that.
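One way to read the point about squared deviations, offered here as an illustration and not necessarily the poster's exact argument: for normally distributed data, the root mean square deviation estimates the distribution's sigma parameter, while the mean absolute deviation estimates a different quantity, sigma times sqrt(2/pi). The sketch below simulates this with an arbitrary sigma of 2 and a fixed random seed; both choices are assumptions made for the example.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is repeatable

sigma = 2.0  # the "true" parameter of the simulated normal distribution
sample = [random.gauss(0.0, sigma) for _ in range(100_000)]

m = sum(sample) / len(sample)

# Average of the squared deviations, then its square root (the SD).
mean_squared_deviation = sum((x - m) ** 2 for x in sample) / len(sample)
root_mean_square = math.sqrt(mean_squared_deviation)

# Average of the absolute deviations: positive too, but a different quantity.
mean_absolute_deviation = sum(abs(x - m) for x in sample) / len(sample)

# For a normal distribution the mean absolute deviation is sigma * sqrt(2 / pi).
expected_mad = sigma * math.sqrt(2 / math.pi)

print(f"root mean square deviation: {root_mean_square:.3f}")
print(f"mean absolute deviation:    {mean_absolute_deviation:.3f}")
print(f"sigma * sqrt(2/pi):         {expected_mad:.3f}")
```

Both measures are positive and both describe spread, but only the squared version recovers the sigma that appears in the normal density, which is why the definitions have to be chosen consistently with the distribution in use.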
