Statistical tool to identify outliers in a data set

pavelcherepan · October 16, 2016

Hi all!

At the moment I'm looking at a huge array of geological data and need to analyse it quickly. What I'm mainly trying to do is partition the data set into smaller chunks in such a way that the resulting chunks have the lowest possible variability.

I have some statistical software in the office, but I'm in the middle of nowhere and won't have access to it before the due date.

Any thoughts on what statistical measure I can use?

Thanks in advance!

**Klaynos** · October 16, 2016

I'd use R for the processing (it's free, and RStudio is a pretty good IDE for it).

My first attempt here would be to remove anything more than two standard deviations from the mean. That's the traditional first stab at removing outliers.
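Here is a minimal sketch of that two-standard-deviation rule. It uses Python instead of R, and the grade values are made up purely for illustration. One extreme value also inflates the standard deviation it is judged against, so on small samples the rule can miss outliers.

```python
import statistics


def remove_outliers(values, k=2.0):
    """Keep only values within k sample standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [v for v in values if abs(v - mean) <= k * sd]


# Hypothetical grade values with one obvious high outlier
grades = [1.2, 1.4, 1.1, 1.3, 1.5, 1.2, 9.8]
print(remove_outliers(grades))  # → [1.2, 1.4, 1.1, 1.3, 1.5, 1.2]
```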

pavelcherepan · October 16, 2016

Thanks for the help, although I think I phrased the discussion title the wrong way.

I'll try to be clearer this time. I have a set of points with spatial locations and grade values. I need to split the entire data set into an arbitrary number of spatially correlated chunks. Obviously, I can do this in many ways. So what statistical measure could I apply to the resulting "chunks" to compare the variability of grade between the different ways of splitting the data?

I hope this makes more sense.
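One common way to score a partition is the within-group sum of squares used in ANOVA and k-means. Nobody in this thread proposed it. The idea is to pool the squared deviations of each grade from its own chunk mean, then divide by the total number of points. The sketch below assumes each chunk is already a plain list of grade values (spatial contiguity is not checked), and the two example splits are hypothetical. Both scores improve as the number of chunks grows (one chunk per point gives zero variance). They are therefore only a fair comparison between splits with the same number of chunks, or with some penalty on the chunk count.

```python
from statistics import mean


def sum_of_squares(values):
    """Sum of squared deviations from the mean of `values`."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def pooled_within_variance(chunks):
    """Size-weighted within-chunk variance; lower means more homogeneous chunks."""
    n_total = sum(len(c) for c in chunks)
    return sum(sum_of_squares(c) for c in chunks) / n_total


def variance_explained(chunks):
    """Share of total variance removed by the split: 1 - SS_within / SS_total."""
    all_values = [v for c in chunks for v in c]
    return 1.0 - sum(sum_of_squares(c) for c in chunks) / sum_of_squares(all_values)


# Two hypothetical ways of splitting the same six grades into two chunks
split_a = [[1, 2, 3], [10, 11, 12]]
split_b = [[1, 2, 10], [3, 11, 12]]
for name, split in (("A", split_a), ("B", split_b)):
    print(name, round(pooled_within_variance(split), 3), round(variance_explained(split), 3))
```

Split A keeps similar grades together, so it has the lower pooled variance and the higher share of variance explained.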

studiot · October 16, 2016

I guess you are attempting something like this?

pavelcherepan · October 16, 2016

Not exactly, but similar. Going by your picture, I would need to separate, for example, those peaks in the bottom-right corner. But since there are various ways to do it, I need some way to judge whether one is better than another. I tried keeping it simple with the standard deviation, but because the data are not entirely random, I get a lower standard deviation when I use the entire data set. Once a chunk has more than 20-30 points, the standard deviation levels off, then starts decreasing, and reaches its lowest value when the entire data set is used.
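Data that are "not entirely random" in space are spatially autocorrelated. Global Moran's I is one standard statistic for quantifying this, although it was not mentioned in the thread. The sketch below is rough: it assumes inverse-distance weights and distinct point locations, and the transect coordinates and grades are invented. Values well above the null expectation of -1/(n-1) suggest that similar grades sit near each other.

```python
import math


def morans_i(points, values):
    """Global Moran's I with inverse-distance weights w_ij = 1/d_ij (w_ii = 0).

    points: list of distinct (x, y) locations; values: grade at each location.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0    # sum over i != j of w_ij * dev_i * dev_j
    w_total = 0.0  # W, the sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j:
                w = 1.0 / math.dist(points[i], points[j])
                cross += w * dev[i] * dev[j]
                w_total += w
    return (n / w_total) * cross / sum(e * e for e in dev)


# Hypothetical transect: six samples 1 m apart
pts = [(x, 0) for x in range(6)]
print(morans_i(pts, [1, 1, 1, 5, 5, 5]))  # grades clustered: positive
print(morans_i(pts, [1, 5, 1, 5, 1, 5]))  # grades alternating: negative
```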

studiot · October 16, 2016

This is a difficult and modern subject to hit me with on a Sunday morning, especially with so little information about what your goal is, although I understand what you have told us so far.

Is this an exercise in geomorphology, terrain analysis, ground feature analysis or what?

It looks as though I have guessed correctly about the need for recognising features described by your contours and that you are looking for a way to best partition your 'map' into zones that best show these features?

pavelcherepan · October 16, 2016

> It looks as though I have guessed correctly about the need for recognising features described by your contours and that you are looking for a way to best partition your 'map' into zones that best show these features?

Pretty much exactly this.

> Is this an exercise in geomorphology, terrain analysis, ground feature analysis or what?

It's more practical grade control, or rather trying to show colleagues that the way they are doing things is wrong. I can intuitively see that it's wrong and that I can do it better, but I can't figure out which statistical tool could be used on a regular basis to compare the approaches.

studiot · October 16, 2016

OK, so have you already developed the contours, or do you just have a grid of numerical values?

**Klaynos** · October 16, 2016

Ok, I understand a bit better now. Sorry.

Depending on what your data is and what you're trying to show, would something like hot spot analysis work? http://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-statistics-toolbox/hot-spot-analysis.htm
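ArcGIS's Hot Spot Analysis tool is based on the Getis-Ord Gi* statistic. For anyone without an ArcGIS licence, the sketch below computes Gi* z-scores using only a fixed distance band: point j counts as a neighbour of point i, including i itself, when it lies within `band`. ESRI's tool offers other ways of defining neighbours, and this is not its implementation. The band must leave at least one point outside each neighbourhood, otherwise the denominator is zero. The example data are hypothetical.

```python
import math


def gi_star(points, values, band):
    """Getis-Ord Gi* z-score for every point, using binary fixed-distance weights.

    w_ij = 1 if point j is within `band` of point i (i counts as its own
    neighbour), else 0. Positive scores mark clusters of high grades (hot
    spots), negative scores clusters of low grades (cold spots).
    """
    n = len(values)
    x_bar = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - x_bar ** 2)
    scores = []
    for i in range(n):
        w = [1.0 if math.dist(points[i], points[j]) <= band else 0.0 for j in range(n)]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        num = sum(wj * xj for wj, xj in zip(w, values)) - x_bar * sum_w
        den = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(num / den)
    return scores


# Hypothetical transect with a high-grade patch at the east end
pts = [(x, 0) for x in range(6)]
grades = [1, 1, 1, 1, 9, 9]
print([round(z, 2) for z in gi_star(pts, grades, band=1.0)])
```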

studiot · October 16, 2016

ESRI have a good, long-standing reputation for all sorts of GIS (geographic information systems), though it is nearly twenty years since I last used their products.

It is good to be brought up to date.

pavelcherepan · October 16, 2016

Thanks heaps! I'll have a look at this, see if it will work.

blue89 · November 2, 2016

If you are asking how to calculate a type of "probability", then you have to give us more information and choose a suitable distribution.

Probably a Poisson distribution or one of the continuous distributions (e.g. beta, gamma, Cauchy, normal, standard normal, etc.) would be suitable.

But I would point out that analysis is quite a different part of mathematics.

A note: these are commonly used in statistical analysis: 1) variance, Var(X) = E[X^2] - (E[X])^2; 2) covariance; 3) correlation k, where -1 <= k <= 1 (k can equal -1 or +1); 4) deviation; 5) standard deviation; 6) trends, etc.
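Here are the first three quantities in that list written out as a small sketch. It uses the population versions (dividing by n), and the paired samples are made up. Computing variance as E[X^2] - (E[X])^2 follows the definition directly but can lose precision when values are large relative to their spread. For real data, Python's `statistics` module is a safer choice.

```python
import math


def summary(xs, ys):
    """Population variance, standard deviation, covariance and correlation of paired samples."""
    n = len(xs)
    ex = sum(xs) / n  # E[X]
    ey = sum(ys) / n  # E[Y]
    var_x = sum(x * x for x in xs) / n - ex ** 2  # Var(X) = E[X^2] - (E[X])^2
    var_y = sum(y * y for y in ys) / n - ey ** 2
    cov = sum(x * y for x, y in zip(xs, ys)) / n - ex * ey  # Cov(X, Y) = E[XY] - E[X]E[Y]
    corr = cov / math.sqrt(var_x * var_y)  # correlation lies in [-1, 1]
    return {"var_x": var_x, "sd_x": math.sqrt(var_x), "cov": cov, "corr": corr}


# Hypothetical paired measurements, e.g. grade vs. depth
print(summary([1, 2, 3, 4], [2, 4, 6, 8]))
```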

Edited November 2, 2016 by blue89
