scikit_posthocs.posthoc_dscf

scikit_posthocs.posthoc_dscf(a, val_col=None, group_col=None, sort=False)

Dwass, Steel, Critchlow and Fligner (DSCF) all-pairs comparison test for a one-factorial layout with non-normally distributed residuals. Unlike all-pairs comparison procedures based on Kruskal ranks, which rank all observations jointly, the DSCF test is an extension of the U-test: the observations are re-ranked for each pairwise comparison [1], [2], [3].
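The per-pair re-ranking can be illustrated with a minimal, standard-library sketch. It is not the library's implementation: the helper names midranks and pairwise_statistic are hypothetical, and the sign convention (rank sum of the first group minus its expectation) is an assumption made for illustration. Each pair of groups is pooled and ranked on its own, and the rank sum is standardized with a tie-corrected variance and scaled by sqrt(2).

```python
import math
from itertools import combinations


def midranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks


def pairwise_statistic(x, y):
    """Standardized rank-sum statistic for one pair (hypothetical helper)."""
    n_x, n_y = len(x), len(y)
    n = n_x + n_y
    # Re-rank using only the observations of these two groups.
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n_x])
    # Tie correction: sum of (t^3 - t) / 12 over runs of tied values.
    counts = {}
    for r in ranks:
        counts[r] = counts.get(r, 0) + 1
    ties = sum((t ** 3 - t) / 12 for t in counts.values())
    var = n_x * n_y / (n * (n - 1)) * ((n ** 3 - n) / 12 - ties)
    return math.sqrt(2) * (rank_sum_x - n_x * (n + 1) / 2) / math.sqrt(var)


groups = {"a": [1, 2, 3, 5, 1], "b": [12, 31, 54, 62, 12], "c": [10, 12, 6, 74, 11]}
stats = {
    (g, h): pairwise_statistic(groups[g], groups[h])
    for g, h in combinations(groups, 2)
}
```

Because every pair is ranked separately, adding or removing a third group does not change the statistic for the remaining pair, which is not the case for procedures built on joint ranks.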

Parameters:
  • a (array_like or pandas DataFrame object) – An array, any object exposing the array interface or a pandas DataFrame.
  • val_col (str, optional) – Name of a DataFrame column that contains dependent variable values (test or response variable). Values should have a non-nominal scale. Must be specified if a is a pandas DataFrame object.
  • group_col (str, optional) – Name of a DataFrame column that contains independent variable values (grouping or predictor variable). Values should have a nominal scale (categorical). Must be specified if a is a pandas DataFrame object.
  • sort (bool, optional) – If True, sort data by the group column.
Returns:
  pandas DataFrame containing p values.

Notes

The p values are computed from the studentized range (Tukey) distribution.
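As a rough sketch of that step, a standardized pairwise statistic can be converted to a p value with the upper tail of the studentized range distribution. This uses scipy.stats.studentized_range (available in SciPy 1.7 and later), which is an assumption and not necessarily what the library calls internally. The helper name tukey_p_value, the illustrative statistic -3.715, and the very large df standing in for infinite degrees of freedom are also assumptions.

```python
from scipy.stats import studentized_range


def tukey_p_value(w, k, df=1e6):
    """Upper-tail studentized range probability for |w| with k groups.

    Hypothetical helper; df=1e6 approximates infinite degrees of freedom.
    """
    return float(studentized_range.sf(abs(w), k, df))


# Illustrative statistic for one pair out of k = 3 groups.
p = tukey_p_value(-3.715, k=3)
```

The absolute value is taken because the comparison is two-sided: only the magnitude of the statistic matters, not which group ranks higher.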

References

[1] Douglas, C. E., Fligner, M. A. (1991) On distribution-free multiple comparisons in the one-way analysis of variance, Communications in Statistics - Theory and Methods, 20, 127-139.
[2] Dwass, M. (1960) Some k-sample rank-order tests. In Contributions to Probability and Statistics, Edited by: I. Olkin, Stanford: Stanford University Press.
[3] Steel, R. G. D. (1960) A rank sum test for comparing all pairs of treatments, Technometrics, 2, 197-207.

Examples

>>> import scikit_posthocs as sp
>>> import pandas as pd
>>> x = pd.DataFrame({"a": [1,2,3,5,1], "b": [12,31,54,62,12], "c": [10,12,6,74,11]})
>>> x = x.melt(var_name='groups', value_name='values')
>>> sp.posthoc_dscf(x, val_col='values', group_col='groups')