Validity of Statistics

Arete · April 2, 2014

At the moment, I'm working on a genomic dataset where we have approximately 1.5 billion nucleotides of data. The sequencer has an approximately 0.5 to 1% error rate in nucleotide calling. This means that when I align this 1.5 billion base pair dataset to a reference genome, approximately 7.5 to 15 million mutations will actually be false positives due to base-calling error.
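The 7.5 to 15 million figure is just the expected value N × p. A minimal sketch, assuming every base is miscalled independently at one fixed rate (the function name `expected_miscalls` is illustrative):

```python
# Expected number of miscalled bases, assuming every base is miscalled
# independently at one fixed rate (a simplification: real error rates vary
# by read position, sequence context and platform).
def expected_miscalls(total_bases: int, error_rate: float) -> float:
    """Expected count of erroneous base calls, N * p."""
    return total_bases * error_rate


TOTAL_BASES = 1_500_000_000  # ~1.5 billion nucleotides, as in the post above

for rate in (0.005, 0.01):
    print(f"{rate:.1%} error rate -> {expected_miscalls(TOTAL_BASES, rate):,.0f} expected miscalls")
```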

Therefore we calculate summary statistics, such as read coverage depth and quality scores, and use them to estimate the likelihood that a mutation is genuine rather than a false positive, and thus reduce the number of false positives in the final dataset. However, because sequencing error exists at all, there is always the possibility that a mutation is a false positive, and when you're talking about a dataset with millions or billions of individual data points, it's almost certain that some false positives slip through.
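How depth and quality scores cut down false positives can be illustrated with a simple binomial model. This is an assumption-laden sketch, not any particular variant caller's algorithm: it converts a Phred quality score Q to an error probability 10^(-Q/10), assumes reads err independently, and assumes an error is equally likely to produce any of the three other bases. It then asks how likely it is that at least k of n reads show the same wrong base purely by chance (function names are hypothetical).

```python
from math import comb


def phred_to_error(q: float) -> float:
    """Phred quality Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def prob_spurious_variant(depth: int, alt_reads: int, error_prob: float) -> float:
    """P(at least `alt_reads` of `depth` reads show the same wrong base by chance).

    Assumes independent reads and that an error is equally likely to produce
    any of the 3 other bases (both are simplifications)."""
    q = error_prob / 3  # chance a single read shows one *specific* wrong base
    return sum(
        comb(depth, i) * q**i * (1 - q) ** (depth - i)
        for i in range(alt_reads, depth + 1)
    )


p = phred_to_error(20)  # Q20 corresponds to a 1% error probability
print(prob_spurious_variant(depth=30, alt_reads=1, error_prob=p))
print(prob_spurious_variant(depth=30, alt_reads=5, error_prob=p))
```

Requiring several supporting reads at reasonable depth drives the chance of a purely error-driven call down sharply, but never to exactly zero, which is why some false positives are expected to survive in a dataset of this size.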

As such, with this type of data we will never have certainty, and statistical summary and analysis are integral to making it useful. By the OP's logic, it would seem that the answer is to abandon the entire field as lacking absolute certainty, which I strongly disagree is a reasonable position, given the practical outcomes and discoveries that genomics has led to.

Bignose · April 2, 2014

You defend statistics by contrasting it to the faults of the failures in science; the scientific example is not necessarily the best method for confirmation.

I disagree. If you don't think so, then propose a better one. I'm going to stick with the tried-and-true scientific method.

And, you took the argument of authority by claiming I lack knowledge and am ignorant in the field.

I disagree. My claim is not an appeal to authority because I did not tell you that statistics is right because Einstein, Newton, Hawking, or the Tooth Fairy says it is right. I am saying that you are demonstrating a great deal of ignorance on the subject by continuing to claim that statistics doesn't have rigorous proofs. If anything, the logical fallacy here is your appeal to ignorance, because you are trying to tell us that because YOU don't understand the proofs in probability theory, those proofs must clearly be wrong (or not 'pure' or 'rigorous' or whatever other adjective you try to put here). It is no longer an appeal to authority if an entire community has demonstrated the usefulness of the work.

Other comments aside, I never made any extraordinary claim. It is statistics that made the claim of deduction / induction (specifically) from uncertainty, yet it lacks proofs that hold true without context or definitions; they are only as true as the definitions and theories that conform to them.

I have already addressed this supposed "lack of proofs" above. And it IS your extraordinary claim that statistics isn't valid, when it has been demonstrated valid many, many, many times over. It is on you to show this.

I do not need to provide evidence; it is time for statistics and statisticians to live up to their own word. My study in economics is consistent in uncertainty, in that statistics fails to address true relationships or fails to determine the true variables.

Statisticians live up to their word every single day. It is you that is not living up to your word of invalidity.

Again, I will concede that I do not know of the application of probability in quantum mechanics; however, I will not concede that statistics is as true as pure maths, or certain branches of maths whose proofs do not require context to be true and address the true relationship.

Fortunately, whether statistics is useful or not doesn't rely on your concession. If you choose not to seek greater understanding, that is your problem. But this claim of invalidity needs to be dropped unless it is backed by extraordinary evidence.

I also want to ask: what proofs in math don't require 'context'? You mentioned the calculation of the area of a circle earlier. That requires context, too, you know, i.e. the context of plane geometry. In my mind, this claim that statistics is lesser because it requires context demonstrates an ignorance of all mathematics.

Application in reality does not validate the method, we are going in circles.

You're on a science forum. Application to reality is THE primary metric of the success of science.

Furthermore, if you think it doesn't validate it, then propose some metric that would. It is easy to lob hand grenades at something to tear it down. But unless you are willing to replace it with something, all it is is spiteful destruction. You want to know the best thing? If you can actually propose something even better, science will gladly embrace it. BECAUSE SCIENCE ALWAYS GOES TOWARD THE IDEA WITH THE BEST APPLICATION TO REALITY.

So, if you hate statistics so much, what should we replace it with? You need to demonstrate that your replacement will be just as successful or more than statistics, though. And that will be a hard climb. Because whether you like it or not, statistics has proven supremely valuable and accurate.

Even if you think the definitions aren't 'real' or 'representative of reality', the fact that it works still means a great, great deal. A good example is entropy. Entropy isn't 'real' in that you can't measure it directly with an entropy-meter like you can measure temperature with a thermometer. And it has what initially appears to be a very goofy definition. But it has proven its usefulness over and over, just like the values calculated in statistics. You may not like entropy or variance, but their usefulness has been demonstrated time and time and time again.

Further, its application in reality does not always hold true, and in certain cases is extremely accurate even if correct on paper.

Again, this is an interpretation issue. Statistics itself will actually even tell you this. It can even tell you how likely its predictions are to be wrong. Pretty amazing, isn't it?
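One concrete sense in which statistics reports its own error rate: a 95% confidence interval procedure is constructed to miss the true value about 5% of the time. A minimal simulation, assuming normally distributed data with a known standard deviation (all parameter values and function names are arbitrary illustrations), checks that claim empirically.

```python
import random
import statistics


def interval_covers_truth(rng, true_mean=10.0, sigma=2.0, n=25, z=1.96):
    """Draw one normal sample and check whether its 95% z-interval
    (sigma assumed known) contains the true mean."""
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    centre = statistics.fmean(sample)
    half_width = z * sigma / n ** 0.5
    return centre - half_width <= true_mean <= centre + half_width


def empirical_coverage(trials=10_000, seed=1):
    """Fraction of repeated experiments whose interval captured the truth."""
    rng = random.Random(seed)
    hits = sum(interval_covers_truth(rng) for _ in range(trials))
    return hits / trials


print(empirical_coverage())  # should land close to 0.95
```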

The best example is probably the paper: "On Default Correlation: A Copula Function Approach."

Its application in certain areas is accurate enough, yet it is quite inaccurate in others, and it is claimed that the application of this statistical method resulted in a market crash. Before the crash, the method was validated by statisticians within the context and definitions of statistics.

I have not read the paper. It would be nice if you cited it completely (i.e. who the authors are, what journal it came in, what year, etc.) so that others can look it up.

Nonetheless, I think some questions about this are: was the statistics wrong, or the interpretation wrong? Was there a wrong assumption? Are you sure it was statistics that was wrong, or was the economic model wrong? Does the community agree that it was 'statistics' that was wrong or some misinterpretation thereof? Or is it just you?

I have invited you several times now to discuss specific issues. If you want to present more of the background of this paper and discuss it, I am willing to do that. But just throwing out a single paper, with no greater framework around it, doesn't demonstrate anything.

Nor does the market crash demonstrate that statistics was wrong. Again, if anything, statistics in the broad sense would have demonstrated that market crashes will always happen. If this is the best evidence you can dig up of statistics being invalid, I'm not sure you're going to get much traction.
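For readers who have not met the copula approach mentioned above, the core idea can be sketched in a few lines. This is a deliberately simplified one-period, two-borrower Gaussian copula, not a reconstruction of the cited paper's model: each borrower defaults when a latent standard normal variable falls below the inverse normal CDF of its default probability p, and the two latent variables share a correlation rho. The probabilities, correlations and the function name are illustrative assumptions.

```python
import random
from statistics import NormalDist


def joint_default_prob(p1, p2, rho, trials=100_000, seed=0):
    """Monte Carlo estimate of P(both borrowers default) under a
    one-period Gaussian copula with latent correlation rho."""
    std_normal = NormalDist()
    # Thresholds chosen so each borrower on its own defaults with probability p_i.
    t1, t2 = std_normal.inv_cdf(p1), std_normal.inv_cdf(p2)
    rng = random.Random(seed)
    both = 0
    for _ in range(trials):
        z1 = rng.gauss(0.0, 1.0)
        # Build a second standard normal with correlation rho to z1.
        z2 = rho * z1 + (1.0 - rho**2) ** 0.5 * rng.gauss(0.0, 1.0)
        if z1 < t1 and z2 < t2:
            both += 1
    return both / trials


for rho in (0.0, 0.3, 0.6):
    print(rho, joint_default_prob(0.05, 0.05, rho))
```

With rho = 0 the estimate sits near 0.05 × 0.05 = 0.0025, and it rises as rho grows, so the output is only as good as the correlation estimate fed into it.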

It is that complexity which makes me think that statistics is a field to be respected. I used to think that even if the initial definitions and unproven mathematical theories (to repeat, only proven in their own context, and untrue when the context is removed) may be ambiguous, at least they resulted in a field of great deduction from induction. I don't think so.

Again, I object strongly to the use of the word 'unproven' here. Just because you don't understand the proofs, doesn't mean they are unproven.

I don't think we'll be able to discuss, because you are quite confident in your view as am I, and I don't want to go in circles.

Then why did you come to a discussion forum?!?

It's actually a bigger question about this whole thread. Namely, what was the point?

If you are here to fill some of your gaps about statistics, then why the hostility to what I and many others have been saying?

If you are here to demonstrate to us that statistics aren't 'valid', then as I wrote above, it's put up or shut up time. Start bringing evidence.

If you are here just to preach from your own gospel of your personal views against statistics, then you are in violation of the rules of this science forum, and the mods ought to close the thread.

If something else, please enlighten us. Because the discussion part of this thread has been very lacking on your side. You keep just saying things are 'invalid' or 'not pure' or 'ambiguous' but refuse to take any time to understand when someone tries to give you a different point of view. Or define the terms or give examples when asked. Are you being deliberately obtuse? Or are you actually here to try to understand why I and many, many other people have no problem with accepting its validity?

Statistics isn't just something I accept because I was taught it in school, or read it in a book. I accept it because I use statistics every single day, and by golly it works! That's the highest praise for a scientific idea. And, again, when models don't work perfectly, that doesn't mean you just completely toss them out. You refine those models, and see if you can't improve their predictions.

I, for one, am glad that all of statics wasn't abandoned the first time a bridge collapsed.

I, for one, am glad that all of pharmacology wasn't abandoned the first time a drug didn't work.

I am not saying that statistics and its interpretation is perfect -- I think I've been very clear on this all thread. But, I am also not willing to just nuke the entire thing just because it hasn't worked perfectly every single time. Again, name a single science anywhere at any time that has lived up to that standard.

So, in conclusion, if you have an improvement to statistics to make it better in your mind, let's see it. Otherwise, I don't see what the point in continuing this thread is just so you can continue to baselessly claim that statistics doesn't have proofs and rigor or any other of your wishy-washy ill-defined adjectives. This forum doesn't exist so you can spout your personal gripes. You can start your own webpage for that. This forum exists for us to all learn from one another. I have tried to learn why you have a problem with statistics, though you haven't been very forthcoming and clear, so I don't think I've helped much if any. But it is becoming rather clear that you aren't here to learn from us, either.

----------------------------------

Added: In response to the paper above, here is a good quote:

Chris Rogers from the University of Cambridge sums up the view of many academics working in financial maths: “the problem is not that mathematics was used by the banking industry, the problem was that it was abused by the banking industry”

from here: http://www.actuaries.org/ASTIN/Colloquia/Helsinki/Presentations/Embrechts.pdf

Which is what I've been saying from the very beginning. Everything you've brought up is a problem with the use of the statistics, not statistics itself. Yet, you are trying to throw the mathematics under the bus. I don't see why.

studiot · April 2, 2014

+1 to Bignose for having far more patience than I do and for putting many of my thoughts more politely than I could.

There is one further point, however, that I mentioned in post #36 that has not received sufficient attention.

All the discussion about statistics so far in this thread has been about analysis.

It is far, far easier to find formulae to analyse something that is already there (or given), such as a rocket of mass m or a cylinder of gas of volume V, than it is to put something specific in place. This process is called synthesis, and statistics has greatly helped improve Man's ingenuity in the real world to synthesise.

For instance, if you have a valley, there is no formula that says bridge = XXXX or roadway embankment = that.

The design of such a bridge or embankment is part of the process of synthesis, and statistics plays a vital part in the modern world of design.

Again when someone goes to construct this embankment, there is nothing there to start with.

Just uneven, sloping ground.

He knows where the top of the embankment is to end up.

But he has to start at the bottom and build up.

So what does he say to the first lorry driver who says, "Here is your first load of fill material, where shall I dump it?"

In other words where is the edge of the base of the embankment?

Incidentally, statistics will not help answer this question, any more than deterministic geometry or mechanics.

In school most questions are about analysis and usually deterministic.

It can be a great shock on entering the real world, where most activity is about synthesis and is neither deterministic nor even capable of being expressed by formula.

Unity+ · April 2, 2014

To be honest, I don't know why this thread is still in the Mathematics section. It should be moved to speculation.

EDIT: And if not that, it should be deleted.

Edited April 2, 2014 by Unity+

John Cuthber · April 2, 2014

Quantifying the errors of statistics with statistics does not seem to be practical.

I'm still waiting for an answer to my earlier question.

To whom does that seem impractical?

Because people do it a lot.

Cap'n Refsmmat · April 2, 2014

Indeed. A great deal of theoretical statistics is about quantifying the errors of different methods and determining which ones perform the best under different circumstances.

There's been a lot of work on robustness, for instance, where estimators are checked to see how well they behave when their assumptions are not met.
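A small illustration of that kind of robustness check, assuming a simple contamination model in which each observation has a 10% chance of coming from a much wider distribution (the model, sample size and function name are illustrative choices): the sample mean is the better estimator of the centre for clean normal data, while the median holds up far better once outliers are mixed in.

```python
import random
import statistics


def mse(estimator, contaminated, n=51, trials=4000, seed=0):
    """Monte Carlo mean squared error of `estimator` when the true centre is 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [
            # With contamination on, ~10% of points come from a sd-10 normal.
            rng.gauss(0.0, 10.0) if contaminated and rng.random() < 0.1
            else rng.gauss(0.0, 1.0)
            for _ in range(n)
        ]
        total += estimator(sample) ** 2  # squared error, since the truth is 0
    return total / trials


for contaminated in (False, True):
    print(
        "contaminated" if contaminated else "clean",
        "mean MSE:", round(mse(statistics.fmean, contaminated), 4),
        "median MSE:", round(mse(statistics.median, contaminated), 4),
    )
```

Using the same seed for both estimators means they are compared on identical simulated samples.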

studiot · April 2, 2014

It is interesting how this relates to the safety factors I was talking about.

John Cuthber · April 2, 2014

It will be interesting iff the poster concerned actually answers.

I'm not holding my breath.

davidivad · April 2, 2014

does anyone need a stick because i have one here.

but lord, they may mistake us for christians.

Genecks · April 3, 2014

OK so smileys over we can return to rational discussion of the topic.

Robowhatsit, there are many uses (and abuses) of statistics.

You seem to have embraced the so called 'clockwork universe' in your thinking. If you have not heard of this you should look it up, it was a milestone in scientific development several hundred years ago.

In its day it was a fine concept, but it has been surpassed in the theoretical world.
...

But it also has been brought into the design process so that now it is fully integrated.

You may, or may not, have heard the phrase 'Limit State Design'.

It is the philosophy that underlies modern design codes and actual practices.
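Limit state design codes are generally calibrated against a target probability of failure. The simplest textbook version of that idea assumes independent, normally distributed resistance R and load effect S, which gives a reliability index beta = (mu_R - mu_S) / sqrt(sigma_R^2 + sigma_S^2) and a failure probability Phi(-beta). Real code calibration uses more elaborate distributions and partial factors; the member and its numbers below are hypothetical.

```python
from statistics import NormalDist


def reliability(mu_r, sd_r, mu_s, sd_s):
    """Reliability index beta and P(R < S) for independent normal R and S."""
    beta = (mu_r - mu_s) / (sd_r**2 + sd_s**2) ** 0.5
    return beta, NormalDist().cdf(-beta)


# Hypothetical member: mean resistance 300 kN (sd 30), mean load effect 150 kN (sd 30).
beta, p_fail = reliability(300.0, 30.0, 150.0, 30.0)
print(f"beta = {beta:.2f}, P(failure) = {p_fail:.1e}")
```

The central safety factor here is 300/150 = 2, but the failure probability also depends on the scatter in R and S; doubling both standard deviations with the same means makes failure far more likely.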

...

I will grant you something that no designer has ever possessed or is ever likely to possess.

A perfectly accurate model at the time of design.

How are your calculations coming along?

In relation to the last three lines you write, perhaps it would be best to argue that entropy has prevented any possibility of understanding reality due to the increase of chaos or complexity from time "zero." However, the clockwork universe is simply another way of saying "growing block universe." And feel free to present your arguments and evidence to refute the block universe, which Albert Einstein believed. At the least, I assume you're making an age fallacy in relation to the clockwork universe. I only mention Einstein because he was born after the development of the "clockwork universe" philosophy. So, maybe you're saying special relativity defeats the idea of the clockwork universe or growing block universe, and that it is better to say we're in a block universe (everything already is) rather than a growing block universe (determinism; things are flowing from some beginning point in time).

And as the original poster has discussed, there are different levels of math where things break down. And I keep coming to see the problem of "the measurement problem." These issues with statistics and philosophy are starting to cause me to question Benjamin Libet's free will experiments due to the measurement problem. I don't know why no one ever taught me about "the measurement problem" except the Heisenberg uncertainty principle (which was a riddle; and I believed I once had a solution; but then I got distracted). Perhaps if children learned about it, they'd all write off modern science, get high on drugs, and tune out. Yes, there may indeed be some level of professionalism to all of it, to not project a block universe view, and argue for indeterminism as if to gather money from those willing to believe in the philosophy of indeterminism.

Much of me is starting to feel that statistics is a "social invention." It was a way of discussing one's observation of a particular phenomenon. But more foundationally, it looks like statistics and various mathematics are social: they're meant to describe individual experience to others. Statistics has utility if we have free will, there is some aspect of our universe that is indeterministic, and by having free will, we can take advantage of those statistics to change the value of our "lives." Otherwise, if it's the block universe and everything is fixed, then statistics are little less than noise. Statistics has allegedly been around for a long time. However, it developed more academically and intellectually in the 1800s. So, we've started to discover, if not invent, statistics.

You build a bridge and statistics are involved: The bridge is a social invention. The drugs are social inventions. And even if you build a bridge for yourself, perhaps as some hobby, it would appear to be individually relevant. However, all presented statistics are individually relevant; but they can be used to discuss something in society. Historically, statistics have been used to discuss things in the environment: Gambling.

No, I assume I understand what the original poster is getting at. It's a philosophical issue rather than a scientific one, and perhaps that would have been the best reply, while explicitly or implicitly mentioning that it goes back to the philosophy of science. It's a philosophical issue, thread creator.

Also, since I'm typing about statistics, and not to derail the thread, it bothers me that the universe is measured to be flat with something like a ±0.5% error. Like, wtf? That a triangle exists means it exists with 180 degrees ± 0.5% × 180 degrees???? That seems to imply indeterminism is true.

I posted a thread about the Solomon curve in the past week. But deriving the theory "Driving closer to the speed of other drivers in my lane will reduce the likelihood of getting into a car accident" does not necessarily mean that I will not get into an accident. Furthermore, being able to derive a theory does not mean that the theory is correct. And it does not mean that there is such a thing as "likelihood."

All of it is mind-boggling, indeed.

I think if there is something that the thread creator should walk away with, it's this thought: "It's just theory."

You do know that the binomial distribution holds whatever the probability of getting heads is, don't you?

It assumes that there is a probability and that the probability is constant with time- that's all.

After that, it's just maths and, as such, objective.

Perhaps it would be better if you said what problem you are actually trying to solve. Otherwise, I think we will just go round in circles.
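The point quoted above, that the binomial distribution holds whatever the (constant) probability of heads is, can be checked directly: for any p, the binomial probabilities sum to 1 and have mean n × p. A minimal sketch (n = 10 and the values of p are arbitrary):

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k heads in n independent flips with constant P(heads) = p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 10
for p in (0.5, 0.51, 0.9):  # fair, slightly biased, heavily biased
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    total = sum(probs)
    mean = sum(k * q for k, q in enumerate(probs))
    print(f"p={p}: total={total:.6f}, mean={mean:.4f}, n*p={n * p:.4f}")
```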

In theory.

Isn't it the case in the "real world" that heads is more likely to show up on a U.S. coin?

And in the real world, we're more likely to have males, despite it being alleged that there is a 50% chance of having either a boy or girl as a child, independent of how many times a person has had a child?

- The Y chromosome weighs less
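Whether a particular coin (or a birth sex ratio) departs from 50% is itself a statistical question. One standard tool is the exact two-sided binomial test, sketched below assuming independent trials with a constant probability; the counts in the example are hypothetical, not real coin or birth data, and the function names are illustrative.

```python
from math import exp, lgamma, log


def binom_logpmf(k, n, p):
    """Log binomial probability, computed in log space to avoid overflow (needs 0 < p < 1)."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def binom_test_two_sided(k, n, p=0.5):
    """Exact two-sided p-value: total probability, under P(success) = p, of all
    outcomes no more likely than the observed count k."""
    observed = binom_logpmf(k, n, p)
    slack = 1e-7  # tolerance so floating-point ties count as 'equally likely'
    total = sum(exp(lp) for lp in (binom_logpmf(i, n, p) for i in range(n + 1))
                if lp <= observed + slack)
    return min(1.0, total)


# Hypothetical data: 54 heads in 100 flips, and 5,400 heads in 10,000 flips.
print(binom_test_two_sided(54, 100))
print(binom_test_two_sided(5400, 10_000))
```

The same observed proportion of 54% is unremarkable in 100 flips but overwhelming evidence against a fair coin in 10,000, which is the kind of distinction the test is meant to make.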

In theory, a six-sided die will land on each of its sides with equal probability...

But then again, it depends on initial conditions of the die...

And as Einstein argued in the past, "God does not play dice."

http://physicsworld.com/cws/article/multimedia/2013/mar/04/why-did-einstein-say-god-doesnt-play-dice

It could be that we're the dice: Hence, the measurement problem.

Again, as I've stated, this is all a philosophical issue.

I will grant you something that no designer has ever possessed or is ever likely to possess.

A perfectly accurate model at the time of design.

That is, unfortunately, saying that we've never truly repeated any experiment ever done in science, because we lacked the initial conditions.

Edited April 3, 2014 by Genecks

studiot · April 3, 2014

Genecks, thank you for taking the interest in some of my posts in this long thread.

They were all intended to promote discussion, not stifle it.

As regards your comments:

That is, unfortunately, saying that we've never truly repeated any experiment ever done in science, because we lacked the initial conditions.

Good point.

In relation to the last three lines you write, perhaps it would be best to argue that entropy has prevented any possibility of understanding reality due to the increase of chaos or complexity from time "zero." However, the clockwork universe is simply another way of saying "growing block universe."

I am at a loss on this one since your reply was not about the last three lines I wrote. These you also quoted elsewhere.

So there may have been some mix up somewhere.

I will deal first with the material you appear to have replied to, namely the clockwork universe.

I do not agree that this is the same as the block universe, particularly as described by Einstein.

The essence of the clockwork universe is cause and effect and predictability (= determinism). That is a given cause will always give rise to a specific, calculable and predictable effect. So given enough information a superbeing could, in principle, predict the future course of the entire universe. This view requires that time flows from past to future.

Einstein's view of the block universe was that time does not flow from past to future; in his words, "it just is". That is the essence of the block universe.

But this is a digression from statistics.

Back to my last three lines. These were about the same subject expanded on in my post #103.

Namely the difference between analysis and synthesis.

In particular they were about the use of statistics in the synthesis process.

Finally, the OP's philosophy.

So far as I can see, this may be summed up as follows:

The OP rejects statistics and seeks a deterministic world.

That part is fair enough and makes for a good discussion.

However the OP goes further and accuses the professionals of a world wide conspiracy to suppress objections to statistics.

IMHO, this part is both seriously misguided and insulting to professionals who spend their daily toil dealing with the fact that they know all too well they do not have (and never can have) all the information. This situation is most evident in the synthesis process.

Genecks · April 3, 2014

Well, if the original poster wants to accuse professionals of a conspiracy, then the original poster would need to prove intent. However, I think that most professionals, at least scientific professionals, are skeptics, and I believe many if not most are skeptics by nature. So, even though scientific professionals are using statistics, there may be a level of doublethink involved from not holding absolute skepticism while allowing for the possibility that a particular philosophy of time, if not philosophy of space and time, is possible. As an aside, proving intent may come with some kind of free will argument and acceptance of free will, unless the OP is arguing for legal compatibilism. I don't think it's really doublethink. I think it's skepticism with hope, and the hope is that something will pan out to lead to a greater truth. Otherwise, yeah, there is the problem of induction, and everything we know could indeed be a lie. https://en.wikipedia.org/wiki/Leap_of_faith

What has bothered me about the validity of statistics, if we were to take into account "free will" or choice, is how to account for "free will" as a variable in an experiment. It seems like a confounding variable if it exists, thus preventing any possibility of reproducing an experiment. But even that could be said for a superdeterministic universe. Statistics has some utility from an indeterminist perspective. No, it's not perfect, exact, or absolute. At least, if it's not a social invention for people to act smart and talk as though to increase Darwinian fitness, then there is some utility and greater theory that can lead to a better understanding of nature and the universe. With that, from my scientific perspective, that means there is a better method out there.

Edited April 3, 2014 by Genecks

nellythedog · May 2, 2014

Laplace once said that

The theory of chance consists in reducing all the events of the same kind to a certain number of cases equally possible, that is to say, to such as we may be equally undecided about in regard to their existence, and in determining the number of cases favorable to the event whose probability is sought. The ratio of this number to that of all the cases possible is the measure of this probability, which is thus simply a fraction whose numerator is the number of favorable cases and whose denominator is the number of all the cases possible.
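Laplace's classical definition (favourable cases divided by all equally possible cases) translates directly into counting. A minimal sketch, assuming two fair, distinguishable six-sided dice so that the 36 ordered outcomes are equally possible (the helper name is illustrative):

```python
from fractions import Fraction
from itertools import product


def classical_probability(event, outcomes):
    """Laplace's ratio: number of favourable cases / number of possible cases."""
    outcomes = list(outcomes)
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


two_dice = list(product(range(1, 7), repeat=2))  # 36 equally possible cases
print(classical_probability(lambda o: sum(o) == 7, two_dice))   # 1/6
print(classical_probability(lambda o: sum(o) == 12, two_dice))  # 1/36
```

The calculation rests entirely on the premise that the 36 cases are equally possible; that premise is the part Laplace's second paragraph addresses.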

Before going further, it is important to pin down the sense of the words chance and probability. We look upon a thing as the effect of chance when we see nothing regular in it, nothing that manifests design, and when furthermore we are ignorant of the causes that brought it about. Thus, chance has no reality in itself. It is nothing but a term for expressing our ignorance of the way in which the various aspects of a phenomenon are interconnected and related to the rest of nature.

Does it make sense?

Edited May 2, 2014 by nellythedog
