Hi all, what I want should be really simple for somebody here. I want to remove a row from a numpy array in a loop, like:
for i in range(len(self.Finalweight)):
    if self.Finalweight[i] >= self.cutoffOutliers:
        "remove line[i] from self.wData"
I'm trying to remove outliers from a dataset. My full code for the method is:
def calculate_Outliers(self):
    def calcWeight(Value):
        pFinal = abs(Value - self.pMed)/ self.pDev_abs_Med
        gradFinal = abs(gradient(Value) - self.gradMed) / self.gradDev_abs_Med
        return pFinal * gradFinal
    self.pMed = median(self.wData[:,self.yColum-1])
    self.pDev_abs_Med = median(abs(self.wData[:,self.yColum-1] - self.pMed))
    self.gradMed = median(gradient(self.wData[:,self.yColum-1]))
    self.gradDev_abs_Med = median(abs(gradient(self.wData[:,self.yColum-1]) - self.gradMed))
    self.workingData= self.wData[calcWeight(self.wData)<self.cutoffOutliers]
    self.xData = self.workingData[:,self.xColum-1]
    self.yData = self.workingData[:,self.yColum-1]
I'm getting the following error:
  File "bin/dmtools", line 201, in plot_gride
    self.calculate_Outliers()
  File "bin/dmtools", line 188, in calculate_Outliers
    self.workingData= self.wData[calcWeight(self.wData)>self.cutoffOutliers]
ValueError: too many indices for array
1 Answer
There is actually a tool in NumPy specifically made to mask out outliers and invalid data points: masked arrays. Example from the linked page:
import numpy

x = numpy.array([1, 2, 3, -1, 5])
# mask=1 marks the fourth element (-1) as invalid
mx = numpy.ma.masked_array(x, mask=[0, 0, 0, 1, 0])
print(mx.mean())
prints
2.75
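For a dataset like the one in the question, the mask would usually come from a condition rather than being typed by hand. Here is a minimal sketch of that idea; `data`, `weights` and `cutoff` are made-up stand-ins for `self.wData`, `self.Finalweight` and `self.cutoffOutliers`, not the asker's actual values:

```python
import numpy as np

# Hypothetical stand-ins for self.wData, self.Finalweight, self.cutoffOutliers.
data = np.array([[0.0, 1.0],
                 [1.0, 2.0],
                 [2.0, 50.0],
                 [3.0, 4.0]])
weights = np.array([0.1, 0.2, 9.0, 0.3])
cutoff = 1.0

# Mask the y column wherever the row's weight reaches the cutoff.
y_masked = np.ma.masked_where(weights >= cutoff, data[:, 1])

y_mean = y_masked.mean()        # statistics ignore the masked entry
y_kept = y_masked.compressed()  # plain ndarray holding only the unmasked values
```

The original array is left untouched, so the outliers can still be inspected later; `compressed()` is there for when a plain array without them is needed.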
1 Comment
x[x!=-1].mean().
The `for i in range(...` construction could be replaced with a single statement, `self.wData = self.wData[self.Finalweight < self.cutoffOutliers]`, which keeps only the rows whose weight is below the cutoff. Right? Another observation: if your calculation variables are temporary in nature, there is no need to treat them as instance variables. Anyway, these are my general guidelines for avoiding deleting rows, columns, or elements of a numpy array in loops. I haven't considered your actual `self.Finalweight` calculation logic at all. Please clarify if my suggested way of handling 'deleting items' from a numpy array is not applicable to you. Thanks
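To make the loop-free approach concrete, here is a small sketch that keeps the rows whose weight is below the cutoff. The arrays are invented for illustration (`data`, `weights` and `cutoff` stand in for `self.wData`, `self.Finalweight` and `self.cutoffOutliers`). Note that the mask has to be 1-D with one entry per row; a mask computed from the whole 2-D array, as `calcWeight(self.wData)` appears to do, would not have that shape, which may be what triggers the "too many indices" error in the question.

```python
import numpy as np

# Hypothetical stand-ins for self.wData, self.Finalweight, self.cutoffOutliers.
data = np.array([[0.0, 1.0],
                 [1.0, 2.0],
                 [2.0, 50.0],
                 [3.0, 4.0]])
weights = np.array([0.1, 0.2, 9.0, 0.3])
cutoff = 1.0

# Boolean mask: one True/False per row, True for the rows to keep.
keep = weights < cutoff
cleaned = data[keep]

# Same result with np.delete, given the indices of the rows to drop.
outlier_rows = np.nonzero(weights >= cutoff)[0]
cleaned_alt = np.delete(data, outlier_rows, axis=0)
```

Both forms build a new array in one step, so there is no index shifting to worry about as there would be when deleting rows one at a time inside a loop.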