GnothiSeauton, posted August 5, 2017:
Hi all, I have a question about detecting outliers in small samples. My results are 14.1, 4.1, 8.1, 9.1, 8.2 and 8.6, and I suspect the outliers are 14.1 and 4.1. Which statistical test should I use to analyze this data? I thought about the extreme studentized deviate test, but it turned out to be intended for large sample sizes. Thank you so much in advance ^^
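Dixon's Q test is one test that is often quoted for very small samples. The sketch below is only an illustration and is not something the thread recommends. It assumes roughly normal data and a single suspect value. The critical value 0.625 is the figure commonly tabulated for n = 6 at 95% confidence, and you should check it against a published table before relying on it.

```python
# Dixon's Q test sketch: one suspected outlier at either end of a small sample.
# Assumption: Q_CRIT_95_N6 = 0.625 is the commonly tabulated 95% critical value
# for n = 6; verify it against a published table before relying on it.
Q_CRIT_95_N6 = 0.625


def dixon_q(values):
    """Return (q_low, q_high): gap-to-range ratios for the smallest and largest values."""
    xs = sorted(values)
    spread = xs[-1] - xs[0]
    if spread == 0:
        raise ValueError("all values identical; Q is undefined")
    q_low = (xs[1] - xs[0]) / spread     # gap between the minimum and its neighbour
    q_high = (xs[-1] - xs[-2]) / spread  # gap between the maximum and its neighbour
    return q_low, q_high


two_min = [14.1, 4.1, 8.1, 9.1, 8.2, 8.6]  # g/L, readings from the first post
q_low, q_high = dixon_q(two_min)
print(f"Q for 4.1:  {q_low:.3f}  flagged: {q_low > Q_CRIT_95_N6}")
print(f"Q for 14.1: {q_high:.3f}  flagged: {q_high > Q_CRIT_95_N6}")
```

For these six readings Q is 0.4 for 4.1 and 0.5 for 14.1, so neither exceeds the tabulated value. Testing both ends at once, as done here, stretches the test's single-outlier assumption. That is one reason the replies below ask what the outliers are for before recommending any test.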
Cap'n Refsmmat, posted August 5, 2017:
Why specifically do you want to detect outliers, and what are you going to do with the data once you identify them? There's no one hard-and-fast test that identifies outliers, because there's no one reason for something to be an outlier. Do you have reason to believe the outlying results are somehow erroneous, or are they as valid as the others?
Klaynos, posted August 6, 2017:
What are the error characteristics of your measurement system? You have 6 data points. Are they all supposed to be repeat measurements of the same quantity? With the information you've posted and only 6 data points, it's impossible to give any good advice (with the possible exception of: repeat the experiment some more).
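Here is a sketch that puts a number on Klaynos's point about six data points. It computes a 95% confidence interval for the mean of the 2-minute readings. It assumes the six values are independent repeats drawn from a roughly normal distribution. The t critical value 2.571 for 5 degrees of freedom is taken from standard t tables, not computed.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for n - 1 = 5 degrees of freedom (standard t tables).
T_CRIT_95_DF5 = 2.571


def mean_ci95_n6(values):
    """95% confidence interval for the mean of exactly six repeat measurements."""
    if len(values) != 6:
        raise ValueError("T_CRIT_95_DF5 is only valid for n = 6")
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))  # standard error of the mean
    return m - T_CRIT_95_DF5 * se, m + T_CRIT_95_DF5 * se


two_min = [14.1, 4.1, 8.1, 9.1, 8.2, 8.6]  # g/L
low, high = mean_ci95_n6(two_min)
print(f"95% CI for the 2-minute mean: {low:.2f} to {high:.2f} g/L")
```

With these readings the interval runs from roughly 5.3 to 12.1 g/L. Six scattered repeats pin the mean down only loosely.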
studiot, posted August 6, 2017:
Quoting Klaynos's post above:
A first-class answer. +1
I would add that your description needs to identify these data points properly, so that the appropriate modelling distribution and sidedness (one- or two-sided test) can be chosen.
GnothiSeauton (author), posted August 6, 2017:
In my experiment I was observing how much product was formed in an enzymatic reaction at different reaction times. The experiment was done in 6 parallels.
Reaction lasted 2 minutes: 14.1, 4.1, 8.1, 9.1, 8.2, 8.6 g/L
Reaction lasted 10 minutes: 8, 14.9, 14, 17, 14.4, 15 g/L
Reaction lasted 30 minutes: 22, 20, 33, 16, 21.8, 23 g/L
I need to construct a graph with concentration on the y axis and time on the x axis.
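Here is a minimal sketch of the per-time summary that would sit behind a concentration-versus-time graph. It uses only Python's standard `statistics` module. The dictionary name `readings` is illustrative, and so is the choice of mean, median and sample standard deviation. The poster did not specify either.

```python
import statistics

# Readings copied from the post above, in g/L, keyed by reaction time in minutes.
readings = {
    2: [14.1, 4.1, 8.1, 9.1, 8.2, 8.6],
    10: [8, 14.9, 14, 17, 14.4, 15],
    30: [22, 20, 33, 16, 21.8, 23],
}


def summarise(data):
    """Return {time: (mean, median, sample_sd)} for each time point, in time order."""
    return {
        t: (statistics.mean(xs), statistics.median(xs), statistics.stdev(xs))
        for t, xs in sorted(data.items())
    }


for t, (mean, median, sd) in summarise(readings).items():
    print(f"{t:>3} min  mean {mean:6.2f}  median {median:6.2f}  sd {sd:5.2f}  g/L")
```

The median is less affected by extreme readings than the mean. At 2 minutes the mean is 8.70 g/L, while the median is 8.40 g/L.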
studiot, posted August 6, 2017:
That's better, but I would start by plotting the course of the 6 individual trials of the reaction. That is, rearrange your data properly so it doesn't look as if you have mixed up the reading from flask 1 with flask 3, etc. I say this because the first entry in the 2 minute row is 14.1, yet the first entry in the 10 minute row is 8, suggesting the reaction went backwards.
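studiot's check can be automated. It relies on his assumption that each position in a row is the same flask, which the original post does not state. Transposing the table then gives one time series per flask, and any drop in product concentration marks a suspect trial. The function name `non_monotonic_trials` is hypothetical.

```python
# Assumption (as in studiot's reading): position i in every row is the same flask,
# so transposing the rows gives one time series per flask.
rows = [
    [14.1, 4.1, 8.1, 9.1, 8.2, 8.6],  # 2 min
    [8, 14.9, 14, 17, 14.4, 15],      # 10 min
    [22, 20, 33, 16, 21.8, 23],       # 30 min
]


def non_monotonic_trials(rows):
    """Return 1-based flask numbers whose product concentration ever decreases."""
    trials = list(zip(*rows))  # transpose: one tuple per flask
    bad = []
    for i, series in enumerate(trials, start=1):
        if any(later < earlier for earlier, later in zip(series, series[1:])):
            bad.append(i)
    return bad


print(non_monotonic_trials(rows))  # [1, 4]
```

Flask 1 (14.1, 8, 22) is the one studiot noted. Flask 4 also falls, from 17 to 16 g/L between 10 and 30 minutes.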
Cap'n Refsmmat, posted August 6, 2017:
I'm a big fan of boxplots for this type of data, although your samples are small enough that you can just plot each point directly. What kind of analysis do you intend to do once you've removed the outliers? Are you trying to test for significant differences, or to model the data somehow?
I ask because, to me as a statistician, there isn't really such a thing as an "outlier". Unless some data points arise from a mistake in the experiment, they're all real measurements from the true distribution of possible outcomes. Removing some data can be valid for analysis, but it depends on what your goals are. I'd be able to give better advice if I knew what you were trying to achieve.
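Most boxplot software uses Tukey's fences to decide which points to draw individually. A point is drawn on its own if it lies below Q1 - 1.5 IQR or above Q3 + 1.5 IQR. The sketch below applies this rule to the 2-minute readings under two common quartile conventions. It assumes Python 3.8+ for `statistics.quantiles`. It illustrates the rule and is not a recommendation to delete whatever it flags.

```python
import statistics


def tukey_outliers(values, method):
    """Flag points outside Q1 - 1.5*IQR or Q3 + 1.5*IQR (the usual boxplot whisker rule).

    method is passed to statistics.quantiles: "exclusive" (default) or "inclusive".
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method=method)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sorted(x for x in values if x < low or x > high)


two_min = [14.1, 4.1, 8.1, 9.1, 8.2, 8.6]  # g/L
for method in ("exclusive", "inclusive"):
    print(method, tukey_outliers(two_min, method))
```

With six points the verdict depends on the convention. The "exclusive" quartiles flag nothing, while the "inclusive" ones flag both 4.1 and 14.1. That sensitivity is part of why there is no one hard-and-fast outlier test.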
Klaynos, posted August 6, 2017:
Cap'n Refsmmat said: "I'm a big fan of boxplots for this type of data, although you have small enough samples that you can just plot each point directly."
Are you familiar with violin plots?
https://en.m.wikipedia.org/wiki/Violin_plot
http://www.sthda.com/english/wiki/ggplot2-violin-plot-quick-start-guide-r-software-and-data-visualization
I can't add anything else to the last two replies to help.
Cap'n Refsmmat, posted August 6, 2017:
Klaynos said: "Are you familiar with violin plots?"
I am, but I'm not a big fan. They can be hard to read when you've got several lined up, whereas you can still interpret a row of boxplots together.
studiot, posted August 6, 2017:
Well, this looks like a typical rate-of-reaction determination to me, but I'm sorry to say it was rather sloppily recorded.
In the first place, we have 24 data points, not 18, since at time zero there should be no product in any of the 6 reaction flasks, i.e. all the curves must pass through the origin.
Secondly, the results are described as quantities of product but are recorded as concentrations.
Thirdly, as I have already pointed out, the readings are tabulated in an odd, seemingly impossible, manner.
If this last issue were sorted out so that each reading could be properly attributed to one flask of reactants or another, then I'm sure the curves (they look like a power law to me) would appear more sensible. We could then propose a rate law and deduce the deviations of each trial from it for statistical analysis.
Since this is about chemical calculations, which are rather specialised, perhaps this thread should be brought to the attention of our chemistry experts.
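studiot's two remarks fit together. A power law C = a t^b with b > 0 gives C = 0 at t = 0, so every curve passes through the origin. The zero-time point therefore does not need to enter the fit, and it could not anyway, because ln 0 is undefined. The sketch below fits that form by least squares on ln C against ln t, using all 18 readings. The power law is his guess, not an established rate law for this reaction. Fitting in log space is one choice among several, and `fit_power_law` is a hypothetical helper.

```python
import math

# All readings from the thread (g/L), keyed by reaction time in minutes.
data = {
    2: [14.1, 4.1, 8.1, 9.1, 8.2, 8.6],
    10: [8, 14.9, 14, 17, 14.4, 15],
    30: [22, 20, 33, 16, 21.8, 23],
}


def fit_power_law(data):
    """Least-squares fit of ln C = ln a + b ln t; returns (a, b) for C = a * t**b."""
    xs, ys = [], []
    for t, cs in data.items():
        for c in cs:
            xs.append(math.log(t))
            ys.append(math.log(c))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx               # slope in log-log space = power-law exponent
    a = math.exp(my - b * mx)   # intercept back-transformed to the prefactor
    return a, b


a, b = fit_power_law(data)
print(f"C(t) ~ {a:.2f} * t^{b:.2f}")
```

On these readings the fitted exponent comes out well below 1 (about 0.36), so product formation slows over time. The suspect trials are still included, so treat the result as a rough description only.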
GnothiSeauton (author), posted August 6, 2017:
Thank you all so much for your replies.
Ideally, the concentrations in all 6 parallels at each time should be roughly the same. For example, after 2 minutes the concentration in all 6 flasks should be around 8 g/L, after 10 minutes around 16 g/L, and after 30 minutes around 20 g/L. My results scatter quite a bit, so I can't deduce much from them. It is quite possible that I did some of these parallels sloppily or that some of my enzyme denatured. I think the best thing would be to repeat the experiment with more parallels and more time points.
Klaynos, posted August 7, 2017:
If I were you, I'd plot each parallel individually as lines between points, all on the same plot. I'd probably also include a box plot for each time, possibly on the same plot, though that might be too messy. Those are just the first steps; what comes next would depend on how they look. Caveat: I'm a physicist rather than a statistician or chemist.
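Here is a sketch of the figure Klaynos suggests, using matplotlib (assumed to be installed). It draws one line per parallel plus a boxplot at each time. The boxes are placed at the real times so the x axis stays to scale. Like studiot, it assumes that column i of each row is flask i. The function name `plot_trials` and the output file name are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; remove this line for an interactive window
import matplotlib.pyplot as plt

times = [2, 10, 30]  # minutes
# Assumption: column i of each row is flask i, as studiot suggested earlier in the thread.
rows = [
    [14.1, 4.1, 8.1, 9.1, 8.2, 8.6],
    [8, 14.9, 14, 17, 14.4, 15],
    [22, 20, 33, 16, 21.8, 23],
]


def plot_trials(times, rows):
    """Overlay each parallel's trajectory with one boxplot per time point."""
    fig, ax = plt.subplots()
    # One line per parallel, joining its readings in time order.
    for i, series in enumerate(zip(*rows), start=1):
        ax.plot(times, series, marker="o", alpha=0.6, label=f"trial {i}")
    # One box per time point, drawn at the actual time on the x axis.
    ax.boxplot(rows, positions=times, widths=1.5)
    ax.set_xlabel("time (min)")
    ax.set_ylabel("product concentration (g/L)")
    ax.legend(fontsize="small")
    return fig, ax


fig, ax = plot_trials(times, rows)
fig.savefig("trials.png")
```

If the boxes clutter the lines, you can draw them on a second subplot instead.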
studiot, posted August 7, 2017:
Quoting GnothiSeauton's post above:
Before rushing off to repeat the experiment (and perhaps the mistakes), you should consider the method very carefully.
Firstly, the mechanics of doing the trials.
Are the trials carried out in six different flasks at the same time, or is a trial repeated in one flask six times?
Are the flasks clean? This matters especially if you repeat in the same flask, which risks cross-contamination.
How are you measuring product concentration? How long does it take to make a concentration determination? You are recording instantaneous concentrations. What difference would a 15-second error in timing make at the 2 min, 10 min and 30 min marks?
The recorded concentration is only valid if the reaction mixture is homogeneous. Is it stirred, or how else do you ensure this?
How temperature-sensitive is this reaction? How much heat is evolved? Are you monitoring to check that all the trials have the same conditions?
How are you noting down the results? I presume that each column in your table of results is meant to represent a single trial. If so, the sequence 14.1, 8, 22 indicates some sort of recording error. If you can't sort this out looking back, then this trial needs to be discarded; it is worse than an outlier.
Secondly, the reaction itself.
You say it is a catalysed reaction. Is it autocatalysed, or are you adding a catalyst?
Assume the reaction is [A] + [B] = [C].
Is either [A] or [B] very large compared to the other, so that it is effectively constant?
How about [C]? Is it always small, or does the reaction approach completion?
What reaction rate equation are you assuming to give the figures you have stated (8, 16, 20 g/L)?
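studiot's timing question can be estimated to first order. If the product curve behaves locally like C = a t^b, then dC/C ≈ b · dt/t, so a fixed 15 s error matters most at the earliest mark. The sketch uses the three sampling times from the experiment (2, 10 and 30 minutes). The exponent b = 0.5 below is a hypothetical placeholder. With b = 1, product grows linearly with time, and the concentration error equals the relative timing error.

```python
# First-order effect of a fixed timing error on a measured concentration,
# assuming the product curve is locally C = a * t**b, so dC/C ~= b * dt/t.
# Assumption: the exponent b passed in below is a hypothetical placeholder.
TIMING_ERROR_S = 15.0


def timing_sensitivity(marks_min, b, dt_s=TIMING_ERROR_S):
    """Return {mark: (dt/t, approx dC/C)} for each sampling time given in minutes."""
    out = {}
    for t_min in marks_min:
        rel_t = dt_s / (t_min * 60.0)  # timing error relative to elapsed reaction time
        out[t_min] = (rel_t, b * rel_t)
    return out


for mark, (rel_t, rel_c) in timing_sensitivity([2, 10, 30], b=0.5).items():
    print(f"{mark:>2} min: timing error {rel_t:.2%}, concentration error ~{rel_c:.2%}")
```

At 2 minutes, 15 s is already 12.5% of the reaction time. At 30 minutes it is under 1%.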
studiot, posted August 7, 2017:
My apologies, the last post was a victim of forum timeouts. The equations should have been:
Chemical equation: A + B = C. Since you mention only one product, I assume it is not a dissociation reaction.
With rate equation:
studiot, posted August 7, 2017:
Damn stupid editor changed my typing:
$\frac{d[C]}{dt} = k\,[A]^{a}[B]^{b}$
studiot, posted August 7, 2017:
$\frac{d[C]}{dt} = k\,[A]^{a}[B]^{b}$
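The rate equation can be integrated numerically to see what it implies. The sketch assumes 1:1 stoichiometry for A + B → C, so that [A] = [A]0 - [C] and [B] = [B]0 - [C]. It uses a fixed-step fourth-order Runge-Kutta scheme. The rate constant, orders and starting concentrations are all hypothetical. The thread never establishes the actual rate law for this enzymatic reaction.

The sketch also addresses studiot's question about one reactant being in large excess. When [B]0 is much larger than [A]0, the result should approach the pseudo-first-order form [C] = [A]0 (1 - e^(-k[B]0 t)).

```python
import math


def integrate_rate_law(k, order_a, order_b, a0, b0, t_end, steps=10000):
    """RK4 integration of d[C]/dt = k [A]^order_a [B]^order_b for A + B -> C.

    Assumes 1:1:1 stoichiometry, so [A] = a0 - [C] and [B] = b0 - [C].
    Returns [C] at t_end, starting from [C] = 0 at t = 0.
    """
    def rate(c):
        a = max(a0 - c, 0.0)  # clamp so concentrations never go negative
        b = max(b0 - c, 0.0)
        return k * a ** order_a * b ** order_b

    h = t_end / steps
    c = 0.0  # no product at t = 0
    for _ in range(steps):
        s1 = rate(c)
        s2 = rate(c + 0.5 * h * s1)
        s3 = rate(c + 0.5 * h * s2)
        s4 = rate(c + h * s3)
        c += h * (s1 + 2 * s2 + 2 * s3 + s4) / 6
    return c


# Hypothetical numbers: B in large excess, first order in each reactant.
k, a0, b0 = 0.01, 0.1, 10.0  # L/(mol*min), mol/L, mol/L
t = 30.0                     # minutes
numeric = integrate_rate_law(k, 1, 1, a0, b0, t)
pseudo_first_order = a0 * (1 - math.exp(-k * b0 * t))  # treats [B] as constant at b0
print(f"numerical [C] = {numeric:.5f} mol/L, pseudo-first-order [C] = {pseudo_first_order:.5f} mol/L")
```

With these numbers the two results agree to within about 0.1%. [C] stays below [A]0 and approaches it as the reaction nears completion.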
GnothiSeauton (author), posted August 7, 2017:
I will now carefully reconsider each step of both the experiment and the further analysis. Thank you all so much for your help and time.