netflow.methods.stats#
Functions
|
Compute Pearson correlation with each row in Y for a given row in X. |
|
Compute Pearson correlation between each row of X and all rows of Y in parallel. |
|
Compute Spearman correlation with each row in Y for a given row in X. |
|
Compute Spearman correlation between each row of X and all rows of Y in parallel. |
|
Perform the Mann-Whitney U rank test on two independent samples. |
|
|
|
Perform statistical test between datasets with FWER correction. |
|
Calculate the T-test for the means of two independent samples of scores. |
|
The Wilcoxon signed-rank test. |
- netflow.methods.stats.compute_pearson(row_x, row_y)[source]#
Compute Pearson correlation with each row in Y for a given row in X.
- Parameters:
row_x (array_like) – 1-D arrays representing multiple observations of a single variable. The correlation is computed between
row_xandrow_y.row_y (array_like) – 1-D arrays representing multiple observations of a single variable. The correlation is computed between
row_xandrow_y.
- Returns:
correlation (float) – The correlation.
p_value (float) – The p-value.
- netflow.methods.stats.compute_pearson_parallel(X, Y, num_processors=None, chunksize=None)[source]#
Compute Pearson correlation between each row of X and all rows of Y in parallel.
- Parameters:
X (pandas.DataFrame) – Dataframes containing multiple variables and observations. Each row represents a variable and each column is an observation of each variable. X and Y must have the same number of columns (i.e., the same observations) but they need not have the same number of variables.
Y (pandas.DataFrame) – Dataframes containing multiple variables and observations. Each row represents a variable and each column is an observation of each variable. X and Y must have the same number of columns (i.e., the same observations) but they need not have the same number of variables.
num_processors (int) – Number of processors to use. Defaults to None (uses all available).
- Returns:
correlations (dict) – The resulting correlations in the form
{index_row_X: {index_row_Y: corr}}p_values (dict) – The resulting p_values in the form
{index_row_X: {index_row_Y: p_value}}
- netflow.methods.stats.compute_spearman(row_x, row_y)[source]#
Compute Spearman correlation with each row in Y for a given row in X.
- Parameters:
row_x (array_like) – 1-D arrays representing multiple observations of a single variable. The correlation is computed between
row_xandrow_y.row_y (array_like) – 1-D arrays representing multiple observations of a single variable. The correlation is computed between
row_xandrow_y.
- Returns:
correlation (float) – The correlation.
p_value (float) – The p-value.
- netflow.methods.stats.compute_spearman_parallel(X, Y, num_processors=None, chunksize=None)[source]#
Compute Spearman correlation between each row of X and all rows of Y in parallel.
- Parameters:
X (pandas.DataFrame) – Dataframes containing multiple variables and observations. Each row represents a variable and each column is an observation of each variable. X and Y must have the same number of columns (i.e., the same observations) but they need not have the same number of variables.
Y (pandas.DataFrame) – Dataframes containing multiple variables and observations. Each row represents a variable and each column is an observation of each variable. X and Y must have the same number of columns (i.e., the same observations) but they need not have the same number of variables.
num_processors (int) – Number of processors to use. Defaults to None (uses all available).
- Returns:
correlations (dict) – The resulting correlations in the form
{index_row_X: {index_row_Y: corr}}p_values (dict) – The resulting p_values in the form
{index_row_X: {index_row_Y: p_value}}
- netflow.methods.stats.mann_whitney_u_test(values1, values2, alternative='two-sided', **kwargs)[source]#
Perform the Mann-Whitney U rank test on two independent samples.
The Mann-Whitney U test is a nonparametric test of the null hypothesis that the distribution underlying sample x is the same as the distribution underlying sample y. It is often used as a test of difference in location between distributions.
Computed via
scipy.stats.mannwhitneyu.- Parameters:
values1 (array-like) – The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default), which can be specified in
kwargs.values2 (array-like) – The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default), which can be specified in
kwargs.alternative ({'two-sided', 'less', 'greater'}, optional) –
Defines the alternative hypothesis. The following options are available (default is ‘two-sided’):
’two-sided’: the means of the distributions underlying the samples are unequal.
’less’: the mean of the distribution underlying the first sample is less than the mean of the distribution underlying the second sample.
’greater’: the mean of the distribution underlying the first sample is greater than the mean of the distribution underlying the second sample.
kwarags (dict) – Key-word arguments passed to
scipy.stats.mannwhitneyu.
- Returns:
p_value – The p-value.
- Return type:
float
- netflow.methods.stats.stat_test(df1, df2, test='MWU', alpha=0.05, method='fdr_bh', **kwargs)[source]#
Perform statistical test between datasets with FWER correction.
The statistical tests are Computed via
scipy.stats.- Parameters:
df1 (pandas.DataFrame) – The measurements, where rows are features and columns are observations. The dataframes must have the same number of features (rows). If
test='wilcoxon', they must also have the same number of observationas (columns).df2 (pandas.DataFrame) – The measurements, where rows are features and columns are observations. The dataframes must have the same number of features (rows). If
test='wilcoxon', they must also have the same number of observationas (columns).test (str) –
The statistical test that should be performed. Options are:
’MWU’ : Mann Whitney-U Test (default).
’t-test’ : T-test
’wilcoxon’ : Wilcoxon Signed Rank Test
alpha (float) – The family-wise error rate (FWER), should be between 0 and 1.
method (str) –
Method for multiple test correction, default=’fdr_bh’.
Options:
bonferroni : one-step correction
sidak : one-step correction
holm-sidak : step down method using Sidak adjustments
holm : step-down method using Bonferroni adjustments
simes-hochberg : step-up method (independent)
hommel : closed method based on Simes tests (non-negative)
fdr_bh : Benjamini/Hochberg (non-negative)
fdr_by : Benjamini/Yekutieli (negative)
fdr_tsbh : two stage fdr correction (non-negative)
fdr_tsbky : two stage fdr correction (non-negative)
kwargs (dict) – Key-word arguments passed to
scipy.statsfor performing the statistical test.
- Returns:
record – Record of each feature, p-value, and corrected p-value.
- Return type:
pandas.DataFrame
- netflow.methods.stats.t_test(values1, values2, alternative='two-sided', **kwargs)[source]#
Calculate the T-test for the means of two independent samples of scores.
This is a test for the null hypothesis that 2 independent samples have identical average (expected) values. This test assumes that the populations have identical variances by default.
Computed via
scipy.stats.ttest_ind.- Parameters:
values1 (array-like) – The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default), which can be specified in
kwargs.values2 (array-like) – The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default), which can be specified in
kwargs.alternative ({'two-sided', 'less', 'greater'}, optional) –
Defines the alternative hypothesis. The following options are available (default is ‘two-sided’):
’two-sided’: the means of the distributions underlying the samples are unequal.
’less’: the mean of the distribution underlying the first sample is less than the mean of the distribution underlying the second sample.
’greater’: the mean of the distribution underlying the first sample is greater than the mean of the distribution underlying the second sample.
kwarags (dict) – Key-word arguments passed to
scipy.stats.ttest_ind.
- Returns:
p_value – The p-value.
- Return type:
float
- netflow.methods.stats.wilcoxon_signed_rank_test(values1, values2=None, alternative='two-sided', **kwargs)[source]#
The Wilcoxon signed-rank test.
The Wilcoxon signed-rank test tests the null hypothesis that two related paired samples come from the same distribution. In particular, it tests whether the distribution of the differences x - y is symmetric about zero. It is a non-parametric version of the paired T-test.
Computed via
scipy.stats.wilcoxon.- Parameters:
values1 (array-like) – Either the first set of measurements (in which case
yis the second set of measurements), or the differences between two sets of measurements (in which caseyis not to be specified.) Must be one-dimensional.values2 (array-like) – Optional. Either the second set of measurements (if
xis the first set of measurements), or not specified (ifxis the differences between two sets of measurements.) Must be one-dimensional.alternative ({'two-sided', 'less', 'greater'}, optional) –
Defines the alternative hypothesis. The following options are available (default is ‘two-sided’):
’two-sided’: the means of the distributions underlying the samples are unequal.
’less’: the mean of the distribution underlying the first sample is less than the mean of the distribution underlying the second sample.
’greater’: the mean of the distribution underlying the first sample is greater than the mean of the distribution underlying the second sample.
kwarags (dict) – Key-word arguments passed to
scipy.stats.wilcoxon.
- Returns:
p_value – The p-value.
- Return type:
float