scikit_posthocs.outliers_gesd

scikit_posthocs.outliers_gesd(x: Union[List[T], numpy.ndarray], outliers: int = 5, hypo: bool = False, report: bool = False, alpha: float = 0.05) → Union[numpy.ndarray, str]

The generalized Extreme Studentized Deviate (ESD) test detects one or more outliers in a univariate data set that follows an approximately normal distribution [1].
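As a rough illustration of the procedure, the test statistics can be sketched with the standard library alone. The helper name `gesd_statistics` is hypothetical and not part of scikit_posthocs; the sketch assumes the usual definition R_i = max|x - mean| / s, recomputed each time after the most extreme remaining observation has been removed.

```python
from statistics import mean, stdev


def gesd_statistics(values, max_outliers):
    """Return the test statistics R_1..R_r of the generalized ESD test.

    Hypothetical helper for illustration; not part of scikit_posthocs.
    """
    remaining = list(values)
    r_values = []
    for _ in range(max_outliers):
        center = mean(remaining)
        spread = stdev(remaining)  # sample standard deviation (n - 1 denominator)
        # Find the observation farthest from the current mean, in either tail.
        deviations = [abs(v - center) for v in remaining]
        worst = max(range(len(remaining)), key=deviations.__getitem__)
        r_values.append(deviations[worst] / spread)
        # Drop that observation before computing the next statistic.
        del remaining[worst]
    return r_values


sample = [2.1, 2.3, 1.9, 2.0, 2.2, 9.5]
print([round(r, 3) for r in gesd_statistics(sample, 2)])
```

Each statistic is computed on a progressively smaller sample, which is why a later R_i can be larger than an earlier one.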

Parameters:
  • x (Union[List, np.ndarray]) – An array, or any object exposing the array interface, containing the data to test for outliers.
  • outliers (int = 5) – Maximum number of potential outliers to test for. The test is two-tailed, i.e. both the maximum and minimum values are checked as potential outliers.
  • hypo (bool = False) – Specifies whether to return the result of the hypothesis test as a bool value. Available options are: 1) True - return the hypothesis test result: True when the null hypothesis can be rejected, otherwise False. 2) False - return the array with outliers removed (default).
  • report (bool = False) – Specifies whether to return a summary table of the test. Available options are: 1) True - return a summary table. 2) False - return the array with outliers removed (default).
  • alpha (float = 0.05) – Significance level for a hypothesis test.
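To show how alpha and the sample size enter the test, the critical values λi can be sketched as follows. This assumes SciPy is installed and uses the standard generalized ESD formula; the helper name `gesd_critical_values` is hypothetical and is not part of scikit_posthocs.

```python
from math import sqrt

from scipy.stats import t


def gesd_critical_values(n, max_outliers, alpha=0.05):
    """Return critical values lambda_1..lambda_r for a sample of size n.

    Hypothetical helper for illustration; not part of scikit_posthocs.
    """
    critical = []
    for i in range(1, max_outliers + 1):
        m = n - i + 1                 # observations still in the sample at step i
        p = 1 - alpha / (2 * m)       # probability level for the t quantile
        t_quantile = t.ppf(p, m - 2)  # Student t quantile with m - 2 degrees of freedom
        critical.append((m - 1) * t_quantile / sqrt((m - 2 + t_quantile ** 2) * m))
    return critical


lambdas = gesd_critical_values(n=54, max_outliers=5, alpha=0.05)
print([round(float(v), 3) for v in lambdas])
```

A smaller alpha raises every critical value, so fewer observations are flagged as outliers.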
Returns:

Returns the filtered array if the alternative hypothesis is accepted (i.e. outliers were detected), otherwise the unfiltered input array. If the report argument is True, the test report is returned as a string instead.

Return type:

Union[np.ndarray, str]

Notes

[1] Rosner, Bernard (May 1983), Percentage Points for a Generalized ESD Many-Outlier Procedure, Technometrics, 25(2), pp. 165-172.

Examples

>>> import numpy as np
>>> from scikit_posthocs import outliers_gesd
>>> data = np.array([-0.25, 0.68, 0.94, 1.15, 1.2, 1.26, 1.26, 1.34,
...     1.38, 1.43, 1.49, 1.49, 1.55, 1.56, 1.58, 1.65, 1.69, 1.7, 1.76,
...     1.77, 1.81, 1.91, 1.94, 1.96, 1.99, 2.06, 2.09, 2.1, 2.14, 2.15,
...     2.23, 2.24, 2.26, 2.35, 2.37, 2.4, 2.47, 2.54, 2.62, 2.64, 2.9,
...     2.92, 2.92, 2.93, 3.21, 3.26, 3.3, 3.59, 3.68, 4.3, 4.64, 5.34,
...     5.42, 6.01])
>>> outliers_gesd(data, 5)
array([-0.25,  0.68,  0.94,  1.15,  1.2 ,  1.26,  1.26,  1.34,  1.38,
        1.43,  1.49,  1.49,  1.55,  1.56,  1.58,  1.65,  1.69,  1.7 ,
        1.76,  1.77,  1.81,  1.91,  1.94,  1.96,  1.99,  2.06,  2.09,
        2.1 ,  2.14,  2.15,  2.23,  2.24,  2.26,  2.35,  2.37,  2.4 ,
        2.47,  2.54,  2.62,  2.64,  2.9 ,  2.92,  2.92,  2.93,  3.21,
        3.26,  3.3 ,  3.59,  3.68,  4.3 ,  4.64])
>>> print(outliers_gesd(data, outliers=5, report=True))
H0: no outliers in the data
Ha: up to 5 outliers in the data
Significance level:  α = 0.05
Reject H0 if Ri > Critical Value (λi)
Summary Table for Two-Tailed Test
---------------------------------------
      Exact           Test     Critical
  Number of      Statistic    Value, λi
Outliers, i      Value, Ri          5 %
---------------------------------------
          1          3.119        3.159
          2          2.943        3.151
          3          3.179        3.144 *
          4           2.81        3.136
          5          2.816        3.128
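
The summary table can be read as follows: the estimated number of outliers is the largest i for which Ri exceeds λi, which is why three values are removed in the first example even though R1 and R2 fall below their critical values. A minimal sketch of that decision rule, using the numbers copied from the table (the variable names are illustrative only):

```python
# Values copied from the summary table above.
r_values = [3.119, 2.943, 3.179, 2.810, 2.816]
critical_values = [3.159, 3.151, 3.144, 3.136, 3.128]

# The estimated number of outliers is the largest i with R_i > lambda_i,
# even if the statistic falls below the critical value for a smaller i.
exceeding = [
    i
    for i, (r, lam) in enumerate(zip(r_values, critical_values), start=1)
    if r > lam
]
n_outliers = max(exceeding, default=0)
print(n_outliers)  # prints 3
```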