scikit_posthocs.outliers_gesd

scikit_posthocs.outliers_gesd(x: Union[List, numpy.ndarray], outliers: int = 5, hypo: bool = False, report: bool = False, alpha: float = 0.05) → Union[numpy.ndarray, str]

The generalized ESD (Extreme Studentized Deviate) test is used to detect one or more outliers in a univariate data set that follows an approximately normal distribution [1].
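For orientation, the quantities behind the test can be written out as below. This is the procedure as it is usually stated following Rosner's paper cited in the Notes; the symbols n (sample size), r (the value passed as outliers) and t (Student's t percentage point) are notation introduced here, and this page does not confirm that the implementation follows the formulas term by term.

```latex
% Step i = 1, ..., r works on the n - i + 1 observations still in the sample.
% \bar{x} and s are the mean and sample standard deviation of those observations;
% t_{p, \nu} is the 100p percentage point of Student's t with \nu degrees of freedom.
\[
R_i = \frac{\max_j \lvert x_j - \bar{x} \rvert}{s},
\qquad
\lambda_i = \frac{(n - i)\, t_{p,\, n-i-1}}
                 {\sqrt{\left(n - i - 1 + t_{p,\, n-i-1}^{2}\right)(n - i + 1)}},
\qquad
p = 1 - \frac{\alpha}{2\,(n - i + 1)}
\]
```

After each step the observation that maximizes |x_j − x̄| is removed before the next statistic is computed. The number of outliers is taken to be the largest i for which R_i > λ_i.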

Parameters
  • x (Union[List, np.ndarray]) – An array, or any object exposing the array interface, containing the data to test for outliers.

  • outliers (int = 5) – Maximum number of potential outliers to test for. The test is two-tailed, i.e. both the maximum and the minimum values are checked as potential outliers.

  • hypo (bool = False) – Specifies whether to return the result of the hypothesis test as a bool value: True when the null hypothesis can be rejected, False otherwise. Available options are: 1) True - return the hypothesis test result. 2) False - return the array with outliers removed (default).

  • report (bool = False) – Specifies whether to return a summary table of the test. Available options are: 1) True - return a summary table. 2) False - return the array with outliers removed (default).

  • alpha (float = 0.05) – Significance level for a hypothesis test.

Returns

Returns the filtered array if the alternative hypothesis holds (outliers are detected), otherwise the unfiltered (input) array. If the report argument is True, the test report is returned instead of the result.

Return type

Union[np.ndarray, str]

Notes

[1] Rosner, Bernard (May 1983), Percentage Points for a Generalized ESD Many-Outlier Procedure, Technometrics, 25(2), pp. 165-172.

Examples

>>> import numpy as np
>>> from scikit_posthocs import outliers_gesd
>>> data = np.array([-0.25, 0.68, 0.94, 1.15, 1.2, 1.26, 1.26, 1.34,
...     1.38, 1.43, 1.49, 1.49, 1.55, 1.56, 1.58, 1.65, 1.69, 1.7, 1.76,
...     1.77, 1.81, 1.91, 1.94, 1.96, 1.99, 2.06, 2.09, 2.1, 2.14, 2.15,
...     2.23, 2.24, 2.26, 2.35, 2.37, 2.4, 2.47, 2.54, 2.62, 2.64, 2.9,
...     2.92, 2.92, 2.93, 3.21, 3.26, 3.3, 3.59, 3.68, 4.3, 4.64, 5.34,
...     5.42, 6.01])
>>> outliers_gesd(data, 5)
array([-0.25,  0.68,  0.94,  1.15,  1.2 ,  1.26,  1.26,  1.34,  1.38,
        1.43,  1.49,  1.49,  1.55,  1.56,  1.58,  1.65,  1.69,  1.7 ,
        1.76,  1.77,  1.81,  1.91,  1.94,  1.96,  1.99,  2.06,  2.09,
        2.1 ,  2.14,  2.15,  2.23,  2.24,  2.26,  2.35,  2.37,  2.4 ,
        2.47,  2.54,  2.62,  2.64,  2.9 ,  2.92,  2.92,  2.93,  3.21,
        3.26,  3.3 ,  3.59,  3.68,  4.3 ,  4.64])
>>> print(outliers_gesd(data, outliers=5, report=True))
H0: no outliers in the data
Ha: up to 5 outliers in the data
Significance level:  α = 0.05
Reject H0 if Ri > Critical Value (λi)
Summary Table for Two-Tailed Test
---------------------------------------
      Exact           Test     Critical
  Number of      Statistic    Value, λi
Outliers, i      Value, Ri          5 %
---------------------------------------
          1          3.119        3.159
          2          2.943        3.151
          3          3.179        3.144 *
          4           2.81        3.136
          5          2.816        3.128
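
The test statistics in the summary table can be re-derived by hand, which may help when reading the report. The following is a minimal sketch using only the Python standard library, not the library's implementation: the helper gesd_statistics is a hypothetical name introduced for illustration, and the critical values are copied from the table above rather than computed.

```python
from statistics import mean, stdev

# Example data from this page (54 observations).
data = [-0.25, 0.68, 0.94, 1.15, 1.2, 1.26, 1.26, 1.34,
        1.38, 1.43, 1.49, 1.49, 1.55, 1.56, 1.58, 1.65, 1.69, 1.7, 1.76,
        1.77, 1.81, 1.91, 1.94, 1.96, 1.99, 2.06, 2.09, 2.1, 2.14, 2.15,
        2.23, 2.24, 2.26, 2.35, 2.37, 2.4, 2.47, 2.54, 2.62, 2.64, 2.9,
        2.92, 2.92, 2.93, 3.21, 3.26, 3.3, 3.59, 3.68, 4.3, 4.64, 5.34,
        5.42, 6.01]

# Critical values copied from the summary table (alpha = 0.05), not computed here.
critical = [3.159, 3.151, 3.144, 3.136, 3.128]


def gesd_statistics(values, max_outliers):
    """Hypothetical helper: return (R_i, removed value) for i = 1..max_outliers."""
    remaining = list(values)
    steps = []
    for _ in range(max_outliers):
        centre = mean(remaining)
        # Observation farthest from the mean of the remaining sample.
        extreme = max(remaining, key=lambda v: abs(v - centre))
        # R_i uses the sample standard deviation (n - 1 in the denominator).
        steps.append((abs(extreme - centre) / stdev(remaining), extreme))
        # Drop that observation before computing the next statistic.
        remaining.remove(extreme)
    return steps


steps = gesd_statistics(data, len(critical))

# The number of outliers is the largest i for which R_i exceeds its critical value.
exceeds = [i for i, ((r, _), lam) in enumerate(zip(steps, critical), start=1)
           if r > lam]
n_outliers = max(exceeds, default=0)

for i, ((r, value), lam) in enumerate(zip(steps, critical), start=1):
    print(f"{i}  R = {r:.3f}  critical = {lam:.3f}  removed = {value}")
print("outliers detected:", n_outliers)
```

Only the third row exceeds its critical value, so three observations are treated as outliers even though rows 1 and 2 do not reject on their own. This matches the filtered array shown earlier, from which 5.34, 5.42 and 6.01 are missing.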