netflow.methods.stats
Functions
| Function | Description |
|---|---|
| compute_pearson | Compute the Pearson correlation between a row of X (row_x) and a row of Y (row_y). |
| compute_pearson_parallel | Compute the Pearson correlation between each row of X and every row of Y in parallel. |
| compute_spearman | Compute the Spearman correlation between a row of X (row_x) and a row of Y (row_y). |
| compute_spearman_parallel | Compute the Spearman correlation between each row of X and every row of Y in parallel. |
| mann_whitney_u_test | Perform the Mann-Whitney U rank test on two independent samples. |
| stat_test | Perform a statistical test on each feature between two datasets, with multiple-testing correction. |
| t_test | Calculate the T-test for the means of two independent samples of scores. |
| wilcoxon_signed_rank_test | Perform the Wilcoxon signed-rank test. |
- netflow.methods.stats.compute_pearson(row_x, row_y)
Compute the Pearson correlation between a row of X (row_x) and a row of Y (row_y).
- Parameters:
row_x (array_like) – 1-D array of observations of a single variable.
row_y (array_like) – 1-D array of observations of a second variable, paired with row_x (same length). The correlation is computed between row_x and row_y.
- Returns:
correlation (float) – The Pearson correlation coefficient.
p_value (float) – The p-value.
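The implementation of compute_pearson is not shown here. A minimal sketch of what it plausibly does, assuming it is a thin wrapper around scipy.stats.pearsonr; the name pearson_sketch is hypothetical and not part of netflow.

```python
import numpy as np
from scipy import stats


def pearson_sketch(row_x, row_y):
    """Hypothetical stand-in for compute_pearson, assuming it wraps scipy.stats.pearsonr."""
    x = np.asarray(row_x, dtype=float)
    y = np.asarray(row_y, dtype=float)
    # Pearson correlation needs paired 1-D observations of equal length
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("row_x and row_y must be 1-D and of equal length")
    correlation, p_value = stats.pearsonr(x, y)
    return float(correlation), float(p_value)


# Perfectly linear rows give a correlation of 1
r, p = pearson_sketch([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```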
- netflow.methods.stats.compute_pearson_parallel(X, Y, num_processors=None, chunksize=None)
Compute the Pearson correlation between each row of X and every row of Y in parallel.
- Parameters:
X (pandas.DataFrame) – Data with one variable per row and one observation per column.
Y (pandas.DataFrame) – Data with one variable per row and one observation per column. X and Y must have the same number of columns (i.e., the same observations) but may have different numbers of rows (variables).
num_processors (int, optional) – Number of processors to use. Defaults to None, which uses all available processors.
- Returns:
correlations (dict) – The resulting correlations in the form {index_row_X: {index_row_Y: corr}}.
p_values (dict) – The resulting p-values in the form {index_row_X: {index_row_Y: p_value}}.
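To make the nested-dictionary return format concrete, here is a serial sketch that produces the same structure. It is not the parallel implementation: pearson_all_pairs is a hypothetical helper that loops over row pairs in a single process, assumes columns of X and Y are paired by position, and assumes scipy.stats.pearsonr is used for each pair.

```python
import pandas as pd
from scipy import stats


def pearson_all_pairs(X, Y):
    """Serial sketch of the {index_row_X: {index_row_Y: value}} output format.

    Hypothetical helper: single process, no distribution of work across processors.
    """
    if X.shape[1] != Y.shape[1]:
        raise ValueError("X and Y must have the same number of columns (observations)")
    correlations, p_values = {}, {}
    for ix, row_x in X.iterrows():
        correlations[ix], p_values[ix] = {}, {}
        for iy, row_y in Y.iterrows():
            # Column j of X and column j of Y are treated as the same observation
            corr, p = stats.pearsonr(row_x.to_numpy(dtype=float), row_y.to_numpy(dtype=float))
            correlations[ix][iy] = float(corr)
            p_values[ix][iy] = float(p)
    return correlations, p_values


X = pd.DataFrame([[1, 2, 3, 4], [4, 3, 2, 1]], index=["a", "b"])
Y = pd.DataFrame([[2, 4, 6, 8], [1, 3, 2, 5]], index=["u", "v"])
correlations, p_values = pearson_all_pairs(X, Y)
```

Looking up a single pair is then correlations["a"]["u"], with the X row label first and the Y row label second.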
- netflow.methods.stats.compute_spearman(row_x, row_y)
Compute the Spearman correlation between a row of X (row_x) and a row of Y (row_y).
- Parameters:
row_x (array_like) – 1-D array of observations of a single variable.
row_y (array_like) – 1-D array of observations of a second variable, paired with row_x (same length). The correlation is computed between row_x and row_y.
- Returns:
correlation (float) – The Spearman rank correlation coefficient.
p_value (float) – The p-value.
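Spearman's rho is the Pearson correlation computed on the ranks of the data, so it captures any monotonic relationship, not only linear ones. A minimal sketch, assuming compute_spearman wraps scipy.stats.spearmanr; spearman_sketch is a hypothetical name.

```python
import numpy as np
from scipy import stats


def spearman_sketch(row_x, row_y):
    """Hypothetical stand-in for compute_spearman, assuming it wraps scipy.stats.spearmanr."""
    rho, p_value = stats.spearmanr(np.asarray(row_x, dtype=float), np.asarray(row_y, dtype=float))
    return float(rho), float(p_value)


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = x ** 3  # monotonic but not linear
rho, _ = spearman_sketch(x, y)
r, _ = stats.pearsonr(x, y)
# Spearman's rho equals Pearson's r computed on the ranks
rho_from_ranks, _ = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))
```

For this monotonic but non-linear relationship, Spearman's rho is 1 while Pearson's r is below 1.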
- netflow.methods.stats.compute_spearman_parallel(X, Y, num_processors=None, chunksize=None)
Compute the Spearman correlation between each row of X and every row of Y in parallel.
- Parameters:
X (pandas.DataFrame) – Data with one variable per row and one observation per column.
Y (pandas.DataFrame) – Data with one variable per row and one observation per column. X and Y must have the same number of columns (i.e., the same observations) but may have different numbers of rows (variables).
num_processors (int, optional) – Number of processors to use. Defaults to None, which uses all available processors.
- Returns:
correlations (dict) – The resulting correlations in the form {index_row_X: {index_row_Y: corr}}.
p_values (dict) – The resulting p-values in the form {index_row_X: {index_row_Y: p_value}}.
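A dictionary of the documented form {index_row_X: {index_row_Y: corr}} can be reshaped into an X-by-Y matrix with pandas.DataFrame.from_dict(..., orient="index"). In this sketch the dictionary is built by a hypothetical serial helper, spearman_dict, rather than by the parallel function itself, and the row labels are made up for illustration.

```python
import pandas as pd
from scipy import stats


def spearman_dict(X, Y):
    """Hypothetical serial helper returning {index_row_X: {index_row_Y: corr}}."""
    out = {}
    for ix, row_x in X.iterrows():
        out[ix] = {}
        for iy, row_y in Y.iterrows():
            rho, _ = stats.spearmanr(row_x.to_numpy(dtype=float), row_y.to_numpy(dtype=float))
            out[ix][iy] = float(rho)
    return out


X = pd.DataFrame([[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]], index=["x1", "x2"])
Y = pd.DataFrame([[1, 4, 9, 16, 25]], index=["y1"])
correlations = spearman_dict(X, Y)

# Rows are indexed by X's row labels, columns by Y's row labels
corr_matrix = pd.DataFrame.from_dict(correlations, orient="index")
```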
- netflow.methods.stats.mann_whitney_u_test(values1, values2, alternative='two-sided', **kwargs)
Perform the Mann-Whitney U rank test on two independent samples.
The Mann-Whitney U test is a nonparametric test of the null hypothesis that the distribution underlying values1 is the same as the distribution underlying values2. It is often used as a test of a difference in location between distributions.
Computed via scipy.stats.mannwhitneyu.
- Parameters:
values1 (array-like) – The first sample.
values2 (array-like) – The second sample. The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default), which can be specified in kwargs.
alternative ({'two-sided', 'less', 'greater'}, optional) – Defines the alternative hypothesis (default is 'two-sided'):
'two-sided': the distributions underlying the two samples are not equal.
'less': the distribution underlying values1 is stochastically less than the distribution underlying values2.
'greater': the distribution underlying values1 is stochastically greater than the distribution underlying values2.
kwargs (dict) – Keyword arguments passed to scipy.stats.mannwhitneyu.
- Returns:
p_value – The p-value.
- Return type:
float
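A minimal sketch of the wrapper, assuming mann_whitney_u_test forwards its arguments to scipy.stats.mannwhitneyu and returns only the p-value; mwu_sketch is a hypothetical name. It shows how the one-sided alternatives depend on argument order.

```python
import numpy as np
from scipy import stats


def mwu_sketch(values1, values2, alternative="two-sided", **kwargs):
    """Hypothetical stand-in for mann_whitney_u_test: return only the p-value."""
    _, p_value = stats.mannwhitneyu(values1, values2, alternative=alternative, **kwargs)
    return float(p_value)


low = np.array([1.1, 2.3, 2.9, 4.2, 5.0])
high = np.array([6.4, 7.1, 8.8, 9.3, 10.5])
p_less = mwu_sketch(low, high, alternative="less")
p_greater = mwu_sketch(low, high, alternative="greater")
```

Because every value in low is below every value in high, the 'less' alternative gives a small p-value while 'greater' gives a p-value near 1.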
- netflow.methods.stats.stat_test(df1, df2, test='MWU', alpha=0.05, method='fdr_bh', **kwargs)
Perform a statistical test on each feature between two datasets, with multiple-testing correction.
The statistical tests are computed via scipy.stats.
- Parameters:
df1 (pandas.DataFrame) – The first set of measurements, where rows are features and columns are observations.
df2 (pandas.DataFrame) – The second set of measurements, with the same layout. The dataframes must have the same number of features (rows). If test='wilcoxon', they must also have the same number of observations (columns).
test (str) – The statistical test to perform. Options are:
'MWU' : Mann-Whitney U test (default).
't-test' : T-test.
'wilcoxon' : Wilcoxon signed-rank test.
alpha (float) – The error rate to control, between 0 and 1: the family-wise error rate (FWER) for FWER-controlling methods such as bonferroni or holm, or the false discovery rate (FDR) for the fdr_* methods.
method (str) – Method for multiple-test correction (default 'fdr_bh'). Options:
bonferroni : one-step correction
sidak : one-step correction
holm-sidak : step-down method using Sidak adjustments
holm : step-down method using Bonferroni adjustments
simes-hochberg : step-up method (independent)
hommel : closed method based on Simes tests (non-negative)
fdr_bh : Benjamini/Hochberg (non-negative)
fdr_by : Benjamini/Yekutieli (negative)
fdr_tsbh : two-stage FDR correction (non-negative)
fdr_tsbky : two-stage FDR correction (non-negative)
kwargs (dict) – Keyword arguments passed to the scipy.stats function that performs the statistical test.
- Returns:
record – Record of each feature, p-value, and corrected p-value.
- Return type:
pandas.DataFrame
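The procedure above amounts to running one test per feature (row) and then adjusting the resulting p-values for multiple testing. A minimal sketch under stated assumptions: features are paired by row position, only the default fdr_bh (Benjamini-Hochberg) correction is implemented, written out by hand with NumPy instead of a multiple-testing library, and the names stat_test_sketch, bh_adjust and TESTS, as well as the output column names, are hypothetical rather than netflow's own.

```python
import numpy as np
import pandas as pd
from scipy import stats


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (the 'fdr_bh' correction)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity with a running minimum taken from the largest p-value down
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


# Hypothetical mapping from the documented `test` names to SciPy calls (p-value only)
TESTS = {
    "MWU": lambda a, b: stats.mannwhitneyu(a, b, alternative="two-sided")[1],
    "t-test": lambda a, b: stats.ttest_ind(a, b)[1],
    "wilcoxon": lambda a, b: stats.wilcoxon(a, b)[1],
}


def stat_test_sketch(df1, df2, test="MWU", alpha=0.05):
    """Per-feature test followed by BH correction; only method='fdr_bh' is sketched."""
    if df1.shape[0] != df2.shape[0]:
        raise ValueError("df1 and df2 must have the same number of features (rows)")
    if test == "wilcoxon" and df1.shape[1] != df2.shape[1]:
        raise ValueError("the paired test needs the same number of observations (columns)")
    run = TESTS[test]
    # Features are paired by row position
    raw = np.array([
        float(run(df1.iloc[i].to_numpy(dtype=float), df2.iloc[i].to_numpy(dtype=float)))
        for i in range(df1.shape[0])
    ])
    adjusted = bh_adjust(raw)
    return pd.DataFrame({
        "feature": df1.index,
        "p_value": raw,
        "corrected_p_value": adjusted,
        "reject": adjusted <= alpha,
    })


df1 = pd.DataFrame([[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 3.5, 1.5, 4.0, 2.5]], index=["f1", "f2"])
df2 = pd.DataFrame([[6.0, 7.0, 8.0, 9.0, 10.0], [3.0, 1.0, 2.2, 3.8, 2.9]], index=["f1", "f2"])
record = stat_test_sketch(df1, df2)
```

Feature f1 separates the two datasets completely and survives the correction, while f2 does not.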
- netflow.methods.stats.t_test(values1, values2, alternative='two-sided', **kwargs)
Calculate the T-test for the means of two independent samples of scores.
This is a test for the null hypothesis that two independent samples have identical average (expected) values. By default, the test assumes that the populations have identical variances; passing equal_var=False in kwargs performs Welch's t-test instead.
Computed via scipy.stats.ttest_ind.
- Parameters:
values1 (array-like) – The first sample.
values2 (array-like) – The second sample. The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default), which can be specified in kwargs.
alternative ({'two-sided', 'less', 'greater'}, optional) – Defines the alternative hypothesis (default is 'two-sided'):
'two-sided': the means of the distributions underlying the samples are unequal.
'less': the mean of the distribution underlying the first sample is less than the mean of the distribution underlying the second sample.
'greater': the mean of the distribution underlying the first sample is greater than the mean of the distribution underlying the second sample.
kwargs (dict) – Keyword arguments passed to scipy.stats.ttest_ind.
- Returns:
p_value – The p-value.
- Return type:
float
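A minimal sketch, assuming t_test forwards its arguments to scipy.stats.ttest_ind and returns only the p-value; t_test_sketch is a hypothetical name. It shows the equal-variance default next to Welch's t-test requested through kwargs.

```python
import numpy as np
from scipy import stats


def t_test_sketch(values1, values2, alternative="two-sided", **kwargs):
    """Hypothetical stand-in for t_test: return only the p-value of scipy.stats.ttest_ind."""
    _, p_value = stats.ttest_ind(values1, values2, alternative=alternative, **kwargs)
    return float(p_value)


a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0])
p_student = t_test_sketch(a, b)  # equal variances assumed (SciPy's default)
p_welch = t_test_sketch(a, b, equal_var=False)  # Welch's t-test via kwargs
p_less = t_test_sketch(a, b, alternative="less")  # one-sided: mean of a below mean of b
```

With unequal sample sizes and variances, the two variants give different p-values; the one-sided 'less' p-value is half the two-sided one because the mean of a is below the mean of b.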
- netflow.methods.stats.wilcoxon_signed_rank_test(values1, values2=None, alternative='two-sided', **kwargs)
The Wilcoxon signed-rank test.
The Wilcoxon signed-rank test tests the null hypothesis that two related paired samples come from the same distribution. In particular, it tests whether the distribution of the differences values1 - values2 is symmetric about zero. It is a nonparametric version of the paired T-test.
Computed via scipy.stats.wilcoxon.
- Parameters:
values1 (array-like) – Either the first set of measurements (in which case values2 is the second set of measurements), or the differences between two sets of measurements (in which case values2 is not specified). Must be one-dimensional.
values2 (array-like, optional) – Either the second set of measurements (if values1 is the first set of measurements), or not specified (if values1 holds the differences between two sets of measurements). Must be one-dimensional.
alternative ({'two-sided', 'less', 'greater'}, optional) – Defines the alternative hypothesis (default is 'two-sided'). Here d denotes the differences, i.e. values1 - values2, or values1 itself when values2 is not given:
'two-sided': the distribution underlying d is not symmetric about zero.
'less': the distribution underlying d is stochastically less than a distribution symmetric about zero.
'greater': the distribution underlying d is stochastically greater than a distribution symmetric about zero.
kwargs (dict) – Keyword arguments passed to scipy.stats.wilcoxon.
- Returns:
p_value – The p-value.
- Return type:
float
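A minimal sketch, assuming wilcoxon_signed_rank_test forwards its arguments to scipy.stats.wilcoxon and returns only the p-value; wilcoxon_sketch is a hypothetical name. It illustrates that passing two paired samples is equivalent to passing their differences alone.

```python
import numpy as np
from scipy import stats


def wilcoxon_sketch(values1, values2=None, alternative="two-sided", **kwargs):
    """Hypothetical stand-in for wilcoxon_signed_rank_test: return only the p-value."""
    _, p_value = stats.wilcoxon(values1, values2, alternative=alternative, **kwargs)
    return float(p_value)


first = np.array([1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30])
second = np.array([0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29])

# Passing both paired samples ...
p_pairs = wilcoxon_sketch(first, second, alternative="greater")
# ... gives the same result as passing their differences alone
p_diffs = wilcoxon_sketch(first - second, alternative="greater")
```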