
Posted

I am about to start work on the final effluent of a wastewater treatment plant. I will be collecting samples for a period of 12 months, one sample per month, so the collections will span autumn, winter, spring and summer. Physicochemical parameters, namely temperature, pH, electrical conductivity (EC), turbidity, total dissolved solids (TDS) and dissolved oxygen (DO), will be measured, along with microbiological components: bacteria (faecal coliforms, E. coli and Vibrio) and viruses (enterovirus, rotavirus, norovirus and adenovirus). The data collected will be used to assess the quality of the final effluent and to determine what interactions exist between the physicochemical and microbiological components.

 

I want to know which statistical analysis method would best analyse the data collected, and why.

 

If more than one statistical method can be used, please keep your explanations of your choice of methods simple and clear. Advice and suggestions will be highly appreciated. Thank you.

Posted

First of all, I agree that sampling once a month increases the variability of the data set and makes detailed comparisons very complicated. To assess what kinds of analyses may be useful, you need to add the single most important bit to your post: what do you actually want to see? What are the parameters? Do you want to find associations between parameters (e.g. temperature and bacterial content)? Or seasonal changes, or...? The more parameters you collect, the more likely it is that you will find random associations that turn out significant in specific tests.

 

Thus, instead of going on a fishing expedition (which is prone to false-positive identifications), you should think about which hypotheses you want to test. Random data mining rarely yields results that survive scrutiny. In that regard I would recommend reading the Ioannidis paper (2005, PLoS Medicine).
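To put a rough number on the "fishing expedition" risk: if every null hypothesis is true and each test uses a 5% significance level, the chance of at least one false positive among m tests is 1 - (1 - 0.05)^m. The sketch below is a minimal illustration under two stated assumptions: the tests are independent (they are not, since all parameters come from the same sample, so treat the figures as indicative only), and the family is every pairwise comparison among the 13 parameters listed in the opening post.

```python
from math import comb

# Assumption: the 13 parameters from the opening post (6 physicochemical,
# 3 bacterial, 4 viral), each tested against every other parameter.
N_PARAMETERS = 13
ALPHA = 0.05  # conventional per-test significance level (assumed)


def familywise_error_rate(num_tests: int, alpha: float = ALPHA) -> float:
    """Probability of at least one false positive among num_tests tests,
    assuming every null hypothesis is true and the tests are independent."""
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    num_pairs = comb(N_PARAMETERS, 2)  # every parameter against every other
    for m in (1, 5, num_pairs):
        print(f"{m:3d} tests -> P(at least one false positive) = "
              f"{familywise_error_rate(m):.3f}")
```

With all 78 pairwise tests, the chance of at least one spurious "significant" association is about 0.98, which is why a short list of pre-stated hypotheses matters.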

Posted

Here's something that may help. Looks like a lot of work but interesting. I have often thought that statistical analysis is a great tool to figure out what the heck is really going on. The ultimate truth filter.

 

Here are some links:

 

http://davidmlane.com/hyperstat/Statistical_analyses.html#gen

 

http://spotfire.tibco.com/products/s-plus/statistical-analysis-software.aspx

 

http://www.ehow.com/how_5666129_conduct-water-quality-statistical-analysis.html

 

 

 

Posted

One sample a month? I would want at least one sample a day to do that effectively...

 

 

The reason for collecting only one sample per month is the distance from the treatment works to the laboratory: it is about a 5-hour journey.

 


Thank you for the links. The first two are not really relevant since I will be using SPSS, but the third was helpful. I would welcome more guides like it.

 


I thought I had stated the parameters (pH, BOD, etc.), and yes, I want to find associations between the parameters. As for collecting more parameters, the research is limited to the stipulated time. Could you please shed more light on "random data mining"? (I am just reading the Ioannidis paper.) You seem well versed in statistics, so please do give more guidance. Thank you.

Posted

Stating a list of parameters without a hypothesis about their interdependencies is precisely one of the problems; in this case, it is the multiple-hypothesis problem. If you test every parameter against everything else (which is very tempting, as it is rather easy to amass a lot of data), you are basically treating the parameters as independent, even though they all come from the same sample. In other words, if you test enough parameters, you are bound to find something that correlates with something else by pure chance.

 

Think of flipping a fair (unloaded) coin. If you do it 20 times, you expect roughly 50% heads and 50% tails. However, if you let, say, hundreds of people do it, you are likely to find at least a few for whom the ratio is skewed. Had you tested each result blindly, you would have concluded that these coins were loaded, but the skew is simply due to repeated testing. (In this particular case the remedy would be to average over all individuals, something you cannot do with the different parameters of the water sample.)
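The coin example can be simulated directly. This is a minimal sketch, assuming each person flips the same fair coin 20 times and runs an exact two-sided binomial test at the 5% level; the function names and the fixed random seed are illustrative choices, not anything from the thread.

```python
import random
from math import comb


def two_sided_binomial_p(heads: int, flips: int) -> float:
    """Exact two-sided p-value for a fair coin (p = 0.5).
    Binomial(n, 0.5) is symmetric, so double the smaller tail."""
    k = min(heads, flips - heads)
    tail = sum(comb(flips, i) for i in range(k + 1)) / 2 ** flips
    return min(1.0, 2 * tail)


def count_false_alarms(people: int, flips: int = 20, alpha: float = 0.05,
                       seed: int = 1) -> int:
    """Each person flips a fair coin; count how many would wrongly
    'conclude' that their coin is loaded at significance level alpha."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    alarms = 0
    for _ in range(people):
        heads = sum(rng.random() < 0.5 for _ in range(flips))
        if two_sided_binomial_p(heads, flips) < alpha:
            alarms += 1
    return alarms


if __name__ == "__main__":
    print(count_false_alarms(people=300), "of 300 fair coins look 'loaded'")
```

With 20 flips, only 5 or fewer (or 15 or more) heads reach p < 0.05, which happens about 4% of the time for a fair coin, so among 300 people a dozen or so "loaded" coins are expected purely by chance.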

 

Note that you will find many publications (especially epidemiological or environmental engineering papers) in which this is completely ignored. What you could do is adjust the outcome of, e.g., regression analyses for the number of parameters tested. The most conservative adjustment is the Bonferroni correction. However, since it throws away a lot of true positives, one of its modified versions is usually used instead.
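For illustration, here is a minimal sketch of the Bonferroni correction next to Holm's step-down procedure, which is one commonly used less conservative modification (my choice of example; the post does not name a specific variant). The p-values in the usage block are made up purely to show the mechanics. SPSS and statistical libraries offer their own implementations, so this is only meant to show what the adjustment does.

```python
def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni: multiply each p-value by the number of tests (cap at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values: list[float]) -> list[float]:
    """Holm step-down adjustment; returns adjusted p-values in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical p-values, illustration only
    print("Bonferroni:", bonferroni(raw))
    print("Holm:      ", holm(raw))
```

On these made-up values, Bonferroni leaves two p-values below 0.05, while Holm keeps the same two but with smaller adjusted values, showing why the modified versions lose fewer true positives.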

 

In my opinion, however, the correct way to do these kinds of analyses is to pre-define the dependencies you expect (e.g. pH is temperature-dependent), then use only the relevant data and test for those. The corrections are smaller that way, as you test fewer hypotheses on the same data set.
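The effect of pre-defining hypotheses on the size of the correction can be sketched by counting tests. This assumes the 13 parameters from the opening post and a Bonferroni per-test threshold of 0.05 divided by the number of tests. The three "pre-defined" pairs are hypothetical placeholders (only temperature vs pH comes from the post); the real list should come from your own reasoning about expected mechanisms.

```python
from itertools import combinations, product

PHYSICOCHEMICAL = ["temperature", "pH", "EC", "turbidity", "TDS", "DO"]
MICROBIOLOGICAL = ["faecal coliforms", "E. coli", "Vibrio",
                   "enterovirus", "rotavirus", "norovirus", "adenovirus"]

# Hypothetical pre-defined hypotheses, for illustration only; the first
# follows the pH/temperature example given in the post.
PREDEFINED = [("temperature", "pH"),
              ("turbidity", "faecal coliforms"),
              ("temperature", "E. coli")]

ALPHA = 0.05


def bonferroni_threshold(num_tests: int, alpha: float = ALPHA) -> float:
    """Per-test level that keeps the family-wise error rate at alpha."""
    return alpha / num_tests


strategies = {
    "every parameter vs every other":
        list(combinations(PHYSICOCHEMICAL + MICROBIOLOGICAL, 2)),
    "every physicochemical vs every microbiological":
        list(product(PHYSICOCHEMICAL, MICROBIOLOGICAL)),
    "pre-defined hypotheses only": PREDEFINED,
}

if __name__ == "__main__":
    for name, pairs in strategies.items():
        print(f"{name}: {len(pairs)} tests, per-test alpha = "
              f"{bonferroni_threshold(len(pairs)):.4f}")
```

Testing everything against everything means 78 tests and a per-test threshold below 0.001, whereas a short pre-stated list keeps the threshold near 0.017.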

Posted

One sample a month for 12 months gives 12 samples, and depending on the statistics you use, 12 samples may be too few to draw reliable conclusions from. I always understood 30 samples to be the minimum, which for you would mean one sample roughly every 12 days.
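One way to see what 12 versus 30 samples means in practice is to look at how strong a correlation must be before it reaches significance. The sketch below assumes a Pearson correlation tested two-sided at the 5% level and uses the Fisher z-transformation as an approximation (the exact t-based cutoff is slightly different); it also assumes samples are spread evenly over a 365-day year.

```python
from math import sqrt, tanh
from statistics import NormalDist

DAYS_PER_YEAR = 365


def sampling_interval_days(num_samples: int, days: int = DAYS_PER_YEAR) -> float:
    """Spacing between samples spread evenly over the study period."""
    return days / num_samples


def approx_critical_r(n: int, alpha: float = 0.05) -> float:
    """Approximate smallest |Pearson r| that is significant (two-sided) with
    n paired observations, via the Fisher z-transformation."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return tanh(z_crit / sqrt(n - 3))


if __name__ == "__main__":
    for n in (12, 30):
        print(f"n = {n}: one sample every {sampling_interval_days(n):.1f} days, "
              f"|r| must exceed about {approx_critical_r(n):.2f}")
```

With 12 monthly samples a correlation needs to be roughly |r| > 0.57 to reach significance even before any multiple-testing correction, compared with about 0.36 for 30 samples.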

 

As for showing cause and effect, would a Pareto chart be appropriate?

