netflow.utils#
Functions
clustermap | |
compute_edge_weights | Compute edge weights from given nodal weights. |
compute_eigen | Compute eigen decomposition of (transition) matrix. |
compute_graph_distances | Returns the weighted hop distance matrix for graph with nodes indexed from \(0, 1, ..., n-1\) where \(n\) is the number of nodes. |
construct_anisotropic_laplacian_matrix | Returns transpose of random walk (rw) Laplacian matrix from Transition matrix constructed from node attribute 'weight' for anisotropic diffusion. |
dispersion_ | Data dispersion computed as the absolute value of the variance-to-mean ratio where the variance and mean are computed on the values over the requested axis. |
find_knee_point | Returns the x-location of a (single) knee of curve y=f(x). |
gauss_conv | Smooth an array using Gaussian kernel. |
gauss_window | Gaussian window of width window_size. |
get_times | Get array of simulation time-points. |
heat_kernel | Compute the action of the matrix exponential of \(-timestep * laplacian\) on the profile. |
interaction_iterator | |
invariant_measure | Compute the invariant measure of the profile (node weights) on the network. |
kendall_tau_ | Calculate a Kendall's tau correlation coefficient with associated p-value using scipy.stats.kendalltau. |
pij | Compute the 1-step Markov transition probability of going from source to target node in G. |
spearmanr_ | Calculate a Spearman correlation coefficient with associated p-value using scipy.stats.spearmanr. |
stack_triu_ | Stack the upper triangular entries of the dataframe above the diagonal. |
stack_triu_where_ | Stack the upper triangular entries of the dataframe above the diagonal where the condition is True. |
unstack_triu_ | Unstack pandas Series with upper triangular entries to symmetric matrix. |
von_neumann_entropy | Compute Von Neumann Entropy (VNE) of the data at increasing scales. |
- netflow.utils.clustermap(data, observations=None, features=None, transform=True, is_symmetric=True, optimal_nodes_ordering=True, linkage_kwargs={'method': 'ward', 'metric': 'euclidean'}, title='', vis_kwargs={'cbar_pos': (-0.02, 0.84, 0.02, 0.16), 'center': 0, 'cmap': 'RdBu', 'dendrogram_ratio': 0.07, 'figsize': (7, 7)})[source]#
- Parameters:
data (pandas.DataFrame, (m, n)) – A dataframe with \(m\) observations that are \(n\) dimensional.
observations (list) – Rows in data to use. If None, all rows are used.
features (list) – Columns in data to use. If None, all columns are used.
transform (bool) – If True, standardize the selected observations and features of data: subtract the feature mean and divide by the feature standard deviation.
is_symmetric (bool) – If True, treat data as symmetric and use the same clustering on rows and columns. If False, perform independent clustering and ordering on rows and columns.
optimal_nodes_ordering (bool) – If True, optimize node ordering.
linkage_kwargs (dict) – Keyword arguments passed to fastcluster.linkage.
title (str) – Figure title.
vis_kwargs (dict) – Keyword arguments passed to seaborn.clustermap.
- Returns:
cm – The plotted clustermap.
- Return type:
seaborn.matrix.ClusterGrid
- netflow.utils.compute_edge_weights(G: Graph, n_weight='weight', e_weight='weight', e_normalized=True, e_sqrt=True, e_wprob=False)[source]#
Compute edge weights from given nodal weights.
e_normalized = True and e_sqrt = True: w_ij = 1/sqrt((p_ij + p_ji)/2)
e_normalized = True and e_sqrt = False: w_ij = 1/((p_ij + p_ji)/2)
e_normalized = False and e_sqrt = True: w_ij = 1/sqrt(w_i * w_j)
e_normalized = False and e_sqrt = False: w_ij = (1/w_i)*(1/w_j)
Note: w_ij = INF if w_i = 0 or w_j = 0.
e_wprob = True (e_normalized and e_sqrt are then ignored): w_ij = 1/(p_ij + p_ji - p_ij * p_ji)
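The cases above can be sketched on a toy graph. This is an illustration only, not netflow's implementation: it assumes that p_ij is the node-weighted transition probability w_j / sum of w_k over the neighbors k of i (the definition given on this page for construct_anisotropic_laplacian_matrix), and the helper names transition_prob and edge_weight are hypothetical.

```python
import math
import networkx as nx

def transition_prob(G, i, j, n_weight="weight"):
    """Assumed p_ij: weight of node j over the total weight of i's neighbors."""
    total = sum(G.nodes[k][n_weight] for k in G.neighbors(i))
    return G.nodes[j][n_weight] / total

def edge_weight(G, i, j, n_weight="weight",
                e_normalized=True, e_sqrt=True, e_wprob=False):
    """Hypothetical helper evaluating the documented formula for one edge."""
    if e_wprob or e_normalized:
        p_ij = transition_prob(G, i, j, n_weight)
        p_ji = transition_prob(G, j, i, n_weight)
        if e_wprob:  # e_normalized and e_sqrt are ignored in this case
            return 1.0 / (p_ij + p_ji - p_ij * p_ji)
        mean_p = (p_ij + p_ji) / 2.0
        return 1.0 / math.sqrt(mean_p) if e_sqrt else 1.0 / mean_p
    w_i, w_j = G.nodes[i][n_weight], G.nodes[j][n_weight]
    if w_i == 0 or w_j == 0:
        return math.inf
    return 1.0 / math.sqrt(w_i * w_j) if e_sqrt else (1.0 / w_i) * (1.0 / w_j)

# Path graph 0 - 1 - 2 with node weights 1, 2 and 4.
G = nx.path_graph(3)
nx.set_node_attributes(G, {0: 1.0, 1: 2.0, 2: 4.0}, "weight")

# For edge (0, 1): p_01 = 2/2 = 1 and p_10 = 1/(1 + 4) = 0.2.
w_norm_sqrt = edge_weight(G, 0, 1)
w_norm = edge_weight(G, 0, 1, e_sqrt=False)
w_raw_sqrt = edge_weight(G, 0, 1, e_normalized=False)
w_raw = edge_weight(G, 0, 1, e_normalized=False, e_sqrt=False)
w_prob = edge_weight(G, 0, 1, e_wprob=True)
```

In every case a larger transition probability or node weight gives a smaller edge weight, so the result behaves like a distance rather than an affinity.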
- netflow.utils.compute_eigen(T, n_comps: int = 0, sort: Literal['decrease', 'increase'] = 'decrease', random_state: None | int | RandomState = 0)[source]#
Compute eigen decomposition of (transition) matrix.
Code taken from scanpy.neighbors.__init__.py
- Parameters:
T (numpy.ndarray, (n, n)) – Matrix that eigen decomposition will be computed on (likely the transition matrix).
n_comps – Number of eigenvalues/vectors to be computed. Set n_comps = 0 to compute the whole spectrum. The whole spectrum is also computed if n_comps >= n.
sort ({"decrease", "increase"}) – Order to sort eigenvalues.
random_state – A numpy random seed.
- Returns:
eigen_values (numpy.ndarray) – Eigenvalues of transition matrix.
eigen_basis (numpy.ndarray) – Matrix of eigenvectors (stored in columns). eigen_basis is the projection of the data matrix on the right eigenvectors, that is, the projection on the diffusion components. These are simply the components of the right eigenvectors and can be used directly for plotting.
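As a rough illustration of the return values, the sketch below computes the whole spectrum of a small symmetric transition matrix with numpy.linalg.eigh and sorts it in decreasing order. It is not the scanpy or netflow code; the helper name sorted_eigh is hypothetical, and the symmetry of T is an assumption made so that eigh applies.

```python
import numpy as np

def sorted_eigh(T, sort="decrease"):
    """Full eigen decomposition of a symmetric matrix with sorted eigenvalues."""
    evals, evecs = np.linalg.eigh(T)  # eigh returns eigenvalues in ascending order
    if sort == "decrease":
        evals, evecs = evals[::-1], evecs[:, ::-1]
    return evals, evecs

# Symmetric, row-stochastic toy matrix.
T = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5]])

eigen_values, eigen_basis = sorted_eigh(T)
# Column k of eigen_basis is the eigenvector paired with eigen_values[k].
```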
- netflow.utils.compute_graph_distances(G, weight='weight')[source]#
Returns the weighted hop distance matrix for graph with nodes indexed from \(0, 1, ..., n-1\) where \(n\) is the number of nodes.
- Parameters:
G (networkx.Graph) – The graph with nodes assumed to be labeled consecutively from \(0, 1, ..., n-1\) where \(n\) is the number of nodes.
weight (str, optional) – Edge attribute of weights used for computing the weighted hop distance. If None, compute the unweighted distance. That is, rather than minimizing the sum of weights over all paths between every two nodes, minimize the number of edges.
- Returns:
dist – An n x n matrix of node-pairwise graph distances between the n nodes.
- Return type:
numpy.ndarray
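The difference between the weighted and unweighted distance can be seen with networkx's own all-pairs shortest path routine. This is a sketch of the quantity described above, assuming it is an ordinary all-pairs shortest-path distance; it does not show how netflow computes it.

```python
import networkx as nx
import numpy as np

# Path graph 0 - 1 - 2 with edge weights 2 and 3.
G = nx.path_graph(3)
G[0][1]["weight"] = 2.0
G[1][2]["weight"] = 3.0

nodes = list(range(G.number_of_nodes()))

# Minimize the sum of edge weights along each path.
weighted = nx.floyd_warshall_numpy(G, nodelist=nodes, weight="weight")

# weight=None treats every edge as length 1, i.e. counts hops.
hops = nx.floyd_warshall_numpy(G, nodelist=nodes, weight=None)
```

Here weighted[0, 2] is 2 + 3 = 5, while hops[0, 2] is 2.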
- netflow.utils.construct_anisotropic_laplacian_matrix(G, weight, use_spectral_gap=True)[source]#
Returns transpose of random walk (rw) Laplacian matrix from Transition matrix constructed from node attribute ‘weight’ for anisotropic diffusion.
Note
\(A\) is the binary (symmetric) adjacency matrix,
\(w\) is the array of node weights,
\(D\) is the diagonal matrix of the node-weighted degrees where \(D_{ii} = \sum_{j \sim i} w_j\), and
\(P = [p_{ij}]\) is the transition matrix where \(p_{ij}\) is the probability associated with transitioning from node \(i\) to node \(j\) defined as
\[ p_{ij} = \begin{cases} \frac{w_j}{D_{ii}}, & \text{if } i \sim j \\ 0, & \text{otherwise.} \end{cases} \]
Note
The graph Laplacian (\(L\)) and graph random-walk Laplacian (\(L_{rw}\)) are then defined as:
\[ \begin{aligned} L &= D - A \\ P &= D^{-1}A \\ L_{rw} &= D^{-1}L = I - D^{-1}A = I - P \end{aligned} \]
Note
The transpose of the random-walk Laplacian is returned, \(L_{rw}^T\).
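The construction in the notes can be sketched with plain NumPy. The block builds \(P\) entrywise from \(p_{ij} = w_j / D_{ii}\) and returns \(I - P\) transposed. It is an illustration under those definitions only: the use_spectral_gap option is not modeled, and the variable names are not taken from netflow.

```python
import numpy as np

# Binary symmetric adjacency matrix of a triangle and its node weights.
A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
w = np.array([1.0, 2.0, 3.0])

# Node-weighted degrees: D_ii = sum of w_j over the neighbors j of i.
D = A @ w

# p_ij = w_j / D_ii if i ~ j, else 0 (A * w scales column j by w_j).
P = (A * w) / D[:, None]

# Random-walk Laplacian and its transpose.
L_rw = np.eye(len(w)) - P
L_rw_T = L_rw.T
```

Each row of P sums to 1, so each column of L_rw_T sums to 0, which is why diffusion driven by L_rw_T conserves total mass.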
- netflow.utils.dispersion_(data, axis=0)[source]#
Data dispersion computed as the absolute value of the variance-to-mean ratio where the variance and mean are computed on the values over the requested axis.
- Parameters:
data (pandas.DataFrame) – Data used to compute dispersion.
axis ({0, 1}) – Axis over which the variance and mean are computed. Options:
0 : for each column, apply function to the values over the index
1 : for each index, apply function to the values over the columns
- Returns:
vmr – Variance-to-mean ratio (vmr) quantifying the dispersion.
- Return type:
pandas.Series
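A minimal pandas sketch of the variance-to-mean ratio follows. The helper name vmr_sketch is hypothetical, and the use of pandas' default sample variance (ddof=1) is an assumption; the documentation above does not say which variance estimator netflow uses.

```python
import pandas as pd

def vmr_sketch(data, axis=0):
    """Absolute variance-to-mean ratio along the requested axis."""
    return (data.var(axis=axis) / data.mean(axis=axis)).abs()

data = pd.DataFrame({"a": [1.0, 2.0, 3.0],
                     "b": [-2.0, -4.0, -6.0]})

# Column "a": variance 1, mean 2. Column "b": variance 4, mean -4.
vmr = vmr_sketch(data, axis=0)
```

The absolute value keeps the ratio non-negative for column "b", whose mean is negative.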
- netflow.utils.find_knee_point(y, x=None)[source]#
Returns the x-location of a (single) knee of curve y=f(x).
Taken from KrishnaswamyLab/spARC.
- Parameters:
y (array, shape=[n]) – data for which to find the knee point
x (array, optional, shape=[n], default=np.arange(len(y))) – indices of the data points of y, if these are not in order and evenly spaced
- Returns:
knee_point – The index (or x value) of the knee point on y
- Return type:
int
Examples
>>> import numpy as np
>>> import phate
>>> x = np.arange(20)
>>> y = np.exp(-x/10)
>>> phate.vne.find_knee_point(y,x)
8
- netflow.utils.gauss_conv(array, window_size=5, smoothness=2.5)[source]#
Smooth an array using Gaussian kernel.
- Parameters:
array (numpy.ndarray) – Array that will be smoothed.
window_size (int, default = 5) – Window size.
smoothness (float, default = 2.5) – Smoothness of curve.
- Returns:
smoothed_array – The smoothed array.
- Return type:
numpy.ndarray
- netflow.utils.gauss_window(window_size=5, smoothness=2.5)[source]#
Gaussian window of width window_size.
- Parameters:
window_size (int, default = 5) – Window size.
smoothness (float, default = 2.5) – Smoothness of curve.
- Returns:
gauss – Gaussian window.
- Return type:
numpy.ndarray
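How a Gaussian window and the smoothing built on it fit together can be sketched as below. This is only one plausible reading of the two functions: treating smoothness as the standard deviation of the window, normalizing the window to sum to 1, and using zero-padded convolution at the edges are all assumptions, and both helper names are hypothetical.

```python
import numpy as np

def gauss_window_sketch(window_size=5, smoothness=2.5):
    """Normalized Gaussian weights centered on the middle of the window."""
    offsets = np.arange(window_size) - (window_size - 1) / 2.0
    window = np.exp(-0.5 * (offsets / smoothness) ** 2)
    return window / window.sum()

def gauss_conv_sketch(array, window_size=5, smoothness=2.5):
    """Smooth a 1-D array by convolving it with the Gaussian window."""
    window = gauss_window_sketch(window_size, smoothness)
    # mode="same" keeps the input length; values near the edges are damped
    # because the array is implicitly padded with zeros.
    return np.convolve(array, window, mode="same")

window = gauss_window_sketch()
smoothed = gauss_conv_sketch(np.ones(10))
```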
- netflow.utils.get_times(times=None, t_min=-1.5, t_max=1.0, n_t=20, log_time=True)[source]#
Get array of simulation time-points.
- Parameters:
times ({None, numpy.ndarray}) – Array of times to evaluate the diffusion simulation. Note, if given, t_min, t_max and n_t are ignored.
t_min (float) – First time point to evaluate the diffusion simulation. Note, t_min is ignored if times is not None.
t_max (float) – Last time point to evaluate the diffusion simulation. Note, t_max must be greater than t_min, i.e., \(t_{max} > t_{min}\), and t_max is ignored if times is not None.
n_t (int) – Number of time points to generate. Note, n_t is ignored if times is not None.
log_time (bool) – If True, return n_t numbers spaced evenly on a log scale, where the time sequence starts at 10 ** t_min, ends with 10 ** t_max, and the sequence of times is of the form 10 ** t where t is the n_t evenly spaced points between (and including) t_min and t_max. For example, get_times(t_min=1, t_max=3, n_t=3, log_time=True) = array([10 ** 1, 10 ** 2, 10 ** 3]) = array([10., 100., 1000.]). If False, return n_t numbers evenly spaced on a linear scale, where the sequence starts at t_min and ends with t_max. For example, get_times(t_min=1, t_max=3, n_t=3, log_time=False) = array([1., 2., 3.]).
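The documented behavior maps directly onto numpy.logspace and numpy.linspace. The sketch below reproduces the two examples from the parameter description; times_sketch is a hypothetical stand-in, not the netflow function.

```python
import numpy as np

def times_sketch(times=None, t_min=-1.5, t_max=1.0, n_t=20, log_time=True):
    """Return the given times, or n_t points on a log or linear scale."""
    if times is not None:
        return np.asarray(times)  # t_min, t_max and n_t are ignored
    if log_time:
        return np.logspace(t_min, t_max, num=n_t)  # 10 ** t for evenly spaced t
    return np.linspace(t_min, t_max, num=n_t)

log_times = times_sketch(t_min=1, t_max=3, n_t=3, log_time=True)
lin_times = times_sketch(t_min=1, t_max=3, n_t=3, log_time=False)
```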
- netflow.utils.heat_kernel(profile, laplacian, timestep)[source]#
Compute the action of the matrix exponential of \(-timestep * laplacian\) on the profile, that is, \(\exp(-tL)\) applied to profile, where \(t\) is timestep and \(L\) is laplacian.
- Parameters:
profile (numpy.ndarray, (n,)) – Feature profile.
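For a small dense Laplacian the quantity can be checked by forming the matrix exponential explicitly with scipy.linalg.expm, as in the sketch below. This is an illustration of the formula, not netflow's implementation; for large sparse Laplacians, scipy.sparse.linalg.expm_multiply computes the same action without building the full exponential.

```python
import numpy as np
from scipy.linalg import expm

# Laplacian of a two-node graph joined by a single edge.
laplacian = np.array([[1.0, -1.0],
                      [-1.0, 1.0]])
profile = np.array([1.0, 0.0])  # all mass starts on node 0
timestep = 0.5

# exp(-t L) applied to the profile.
diffused = expm(-timestep * laplacian) @ profile
```

For this Laplacian the result has the closed form ((1 + e^{-2t})/2, (1 - e^{-2t})/2), so the total mass stays at 1 while it spreads from node 0 to node 1.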
- netflow.utils.interaction_iterator(G: Graph, interactions: object | None = None)[source]#
- Parameters:
G (networkx graph) – The graph.
interactions ({None, iterable, 'all'}) –
- netflow.utils.invariant_measure(profiles, G=None, adj=None)[source]#
Compute the invariant measure of the profile (node weights) on the network.
Note
Either the graph G or the adjacency matrix adj must be provided.
- Parameters:
profiles ({pandas.DataFrame (n_features, n_observations), numpy.ndarray (n_features, n_observations)}) – The network profile used as node weights for the initial condition of the random walk.
G (networkx.Graph) – The graph used to compute the adjacency matrix. Note, this is ignored if adj is provided. If profiles is a pandas.DataFrame, the node ids in G are expected to match the profiles index. Otherwise, list(G) is expected to match the profiles order.
adj ({numpy.ndarray, scipy.sparse.csr_matrix} (n_features, n_features)) – The (binary) adjacency matrix. Rows and columns are expected to be in the same order as in profiles. Note, if provided, G is ignored.
- Returns:
IM – The invariant measures whose features and observations are in the same order as profiles.
- Return type:
numpy.ndarray (n_features, n_observations)
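One way to see what an invariant measure looks like is to work it out for the node-weighted walk \(p_{ij} = w_j / \sum_{k \sim i} w_k\) defined on this page for construct_anisotropic_laplacian_matrix. Since \(w_i D_{ii} p_{ij} = w_i w_j\) is symmetric in \(i\) and \(j\), the vector \(\pi_i \propto w_i D_{ii}\) satisfies detailed balance and is therefore stationary. The sketch below checks this numerically for one profile. Whether netflow uses this transition matrix or this normalization is an assumption.

```python
import numpy as np

# Star graph: node 0 is linked to nodes 1 and 2.
adj = np.array([[0.0, 1.0, 1.0],
                [1.0, 0.0, 0.0],
                [1.0, 0.0, 0.0]])
w = np.array([1.0, 2.0, 3.0])  # one profile used as node weights

D = adj @ w                   # D_ii = sum of neighbor weights
P = (adj * w) / D[:, None]    # assumed transition matrix p_ij = w_j / D_ii

# Candidate invariant measure, normalized to sum to 1.
pi = w * D
pi = pi / pi.sum()

# One step of the walk leaves pi unchanged.
pi_next = pi @ P
```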
- netflow.utils.kendall_tau_(data, **kwargs)[source]#
Calculate a Kendall’s tau correlation coefficient with associated p-value using scipy.stats.kendalltau.
- Parameters:
data (numpy.ndarray, (n_observations, n_features)) – 2-D array containing multiple variables and observations, where each column represents a variable, with observations in the rows.
**kwargs (dict) – Optional keyword arguments passed to scipy.stats.kendalltau.
- Returns:
R (pandas.DataFrame) – Kendall’s tau correlation matrix. The correlation matrix is square with length equal to total number of variables (columns or rows).
pvalue (float) – The p-value for a hypothesis test whose null hypothesis is that two sets of data are uncorrelated. See documentation for scipy.stats.kendalltau for alternative hypotheses. pvalue has the same shape as R.
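scipy.stats.kendalltau compares two variables at a time, so a full correlation matrix needs a loop over column pairs. The sketch below shows one way to assemble it. The helper name kendall_matrix_sketch and the choice to put 1 and 0 on the diagonals of R and pvalue are assumptions, not netflow's code.

```python
import numpy as np
import pandas as pd
from scipy.stats import kendalltau

def kendall_matrix_sketch(data, **kwargs):
    """Pairwise Kendall's tau and p-values between the columns of data."""
    n = data.shape[1]
    R = np.eye(n)              # a variable is perfectly concordant with itself
    pvalue = np.zeros((n, n))  # diagonal p-values set to 0 by convention here
    for i in range(n):
        for j in range(i + 1, n):
            tau, p = kendalltau(data[:, i], data[:, j], **kwargs)
            R[i, j] = R[j, i] = tau
            pvalue[i, j] = pvalue[j, i] = p
    return pd.DataFrame(R), pd.DataFrame(pvalue)

# Columns 0 and 1 are in exactly opposite order.
data = np.array([[1.0, 4.0, 9.0],
                 [2.0, 3.0, 7.0],
                 [3.0, 2.0, 8.0],
                 [4.0, 1.0, 6.0]])

R, pvalue = kendall_matrix_sketch(data)
```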
- netflow.utils.pij(G, source, target, n_weight='weight', EPS=1e-07)[source]#
Compute the 1-step Markov transition probability of going from source to target node in G.
Note: not a lazy walk (i.e., alpha=0).
- netflow.utils.spearmanr_(data, **kwargs)[source]#
Calculate a Spearman correlation coefficient with associated p-value using scipy.stats.spearmanr.
- Parameters:
data (numpy.ndarray, (n_observations, n_features)) – 2-D array containing multiple variables and observations, where each column represents a variable, with observations in the rows.
**kwargs (dict) – Optional keyword arguments passed to scipy.stats.spearmanr.
- Returns:
R (pandas.DataFrame) – Spearman correlation matrix. The correlation matrix is square with length equal to total number of variables (columns or rows).
pvalue (float) – The p-value for a hypothesis test whose null hypothesis is that two sets of data are uncorrelated. See documentation for scipy.stats.spearmanr for alternative hypotheses. pvalue has the same shape as R.
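When given a 2-D array with more than two columns, scipy.stats.spearmanr already returns a full correlation matrix and a matching matrix of p-values. The sketch below wraps both in DataFrames to mirror the documented return values; it illustrates the SciPy call and is not netflow's implementation.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Four observations (rows) of three variables (columns).
data = np.array([[1.0, 4.0, 9.0],
                 [2.0, 3.0, 7.0],
                 [3.0, 2.0, 8.0],
                 [4.0, 1.0, 6.0]])

# With three or more columns, both outputs are (n_features, n_features) arrays.
rho, p = spearmanr(data)

R = pd.DataFrame(rho)
pvalue = pd.DataFrame(p)
```

With exactly two columns spearmanr returns scalars instead of matrices, so a wrapper has to handle that case separately.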
- netflow.utils.stack_triu_(df, name=None)[source]#
Stack the upper triangular entries of the dataframe above the diagonal.
Note
Useful for symmetric dataframes like correlations or distances.
- Parameters:
df (pandas.DataFrame) – Dataframe to stack. Note, upper triangular entries are taken from df as provided, with no check that the rows and columns are symmetric.
name (str) – Optional name of pandas Series output df_stacked.
- Returns:
df_stacked – The stacked upper triangular entries above the diagonal of the dataframe.
- Return type:
pandas.Series
- netflow.utils.stack_triu_where_(df, condition, name=None)[source]#
Stack the upper triangular entries of the dataframe above the diagonal where the condition is True.
Note
Useful for symmetric dataframes like correlations or distances.
- Parameters:
df (pandas.DataFrame) – Dataframe to stack. Note, upper triangular entries are taken from df as provided, with no check that the rows and columns are symmetric.
condition (pandas.DataFrame) – Boolean dataframe of the same size and order of rows and columns as df indicating values, where True, to include in the stacked dataframe.
name (str) – Optional name of pandas Series output df_stacked.
- Returns:
df_stacked – The stacked upper triangular entries above the diagonal of the dataframe, where condition is True.
- Return type:
pandas.Series
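Both stacking variants can be sketched with numpy.triu_indices, which lists the positions strictly above the diagonal. The helper stack_upper below is hypothetical and only illustrates the idea: it returns a Series indexed by (row label, column label) pairs and optionally keeps only the entries where a boolean dataframe is True.

```python
import numpy as np
import pandas as pd

def stack_upper(df, condition=None, name=None):
    """Stack entries above the diagonal, optionally filtered by condition."""
    rows, cols = np.triu_indices(len(df), k=1)  # k=1 skips the diagonal
    if condition is not None:
        keep = condition.to_numpy()[rows, cols]
        rows, cols = rows[keep], cols[keep]
    index = pd.MultiIndex.from_arrays([df.index[rows], df.columns[cols]])
    return pd.Series(df.to_numpy()[rows, cols], index=index, name=name)

labels = ["a", "b", "c"]
df = pd.DataFrame([[1.0, 0.2, 0.9],
                   [0.2, 1.0, 0.4],
                   [0.9, 0.4, 1.0]], index=labels, columns=labels)

stacked = stack_upper(df, name="corr")      # all three pairs
strong = stack_upper(df, condition=df > 0.3)  # pairs with value above 0.3
```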
- netflow.utils.unstack_triu_(series, diag=0.0, index=None)[source]#
Unstack pandas Series with upper triangular entries to symmetric matrix.
- Parameters:
series (pandas.Series) – The stacked upper triangular entries to be unstacked.
diag (float) – The value to be used on the diagonal.
index (list-like, optional) – If provided, return unstacked matrix with rows and columns sorted by index. If None, rows and columns are alphabetically sorted.
- Returns:
M – Symmetric unstacked matrix with diag on the diagonal.
- Return type:
pandas.DataFrame
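The reverse operation can be sketched as follows: every (row, column) entry of the Series is written to both mirrored positions of a square matrix and the diagonal is filled with diag. The helper unstack_upper is hypothetical, and it assumes the Series has a two-level index of label pairs.

```python
import numpy as np
import pandas as pd

def unstack_upper(series, diag=0.0, index=None):
    """Rebuild a symmetric DataFrame from stacked upper triangular entries."""
    if index is None:
        first = set(series.index.get_level_values(0))
        second = set(series.index.get_level_values(1))
        index = sorted(first | second)  # sorted labels when no order is given
    index = list(index)
    position = {label: k for k, label in enumerate(index)}
    M = np.full((len(index), len(index)), np.nan)  # pairs not in series stay NaN
    np.fill_diagonal(M, diag)
    for (row, col), value in series.items():
        M[position[row], position[col]] = value
        M[position[col], position[row]] = value  # mirror across the diagonal
    return pd.DataFrame(M, index=index, columns=index)

series = pd.Series(
    [0.2, 0.9, 0.4],
    index=pd.MultiIndex.from_tuples([("a", "b"), ("a", "c"), ("b", "c")]),
)
M = unstack_upper(series, diag=0.0)
```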
- netflow.utils.von_neumann_entropy(X, tau_max=None)[source]#
Compute Von Neumann Entropy (VNE) of the data at increasing scales.
As described in GSPA and PHATE KrishnaswamyLab/spARC, https://pdfs.semanticscholar.org/16ab/e92b7630d5b84b904bde97dad9b9fbce406c.pdf.
Taken from KrishnaswamyLab/spARC.
- Parameters:
X (array-like) – Matrix to compute VNE on. Expected to be an n x n transition matrix.
tau_max (int) – Max scale tau (default is 100).
- Returns:
vne – The VNE at each scale up to tau_max.
- Return type:
np.array [tau_max]
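The idea of an entropy "at increasing scales" can be sketched as follows. The block assumes that the entropy at scale tau is the Shannon entropy of the singular values of X raised to the power tau and normalized to sum to 1, which is the approach taken in the PHATE code this function is credited to. The helper name vne_sketch is hypothetical and netflow's actual code may differ in detail.

```python
import numpy as np

def vne_sketch(X, tau_max=100):
    """Entropy of the normalized spectrum of X at scales 1, ..., tau_max."""
    singular_values = np.linalg.svd(X, compute_uv=False)
    powered = singular_values.copy()
    entropy = []
    for _ in range(tau_max):
        prob = powered / powered.sum()
        prob = prob + np.finfo(float).eps  # avoid log(0)
        entropy.append(-np.sum(prob * np.log(prob)))
        powered = powered * singular_values  # move to the next scale
    return np.array(entropy)

# Small row-stochastic transition matrix.
X = np.array([[0.6, 0.4, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.4, 0.6]])

vne = vne_sketch(X, tau_max=10)
vne_identity = vne_sketch(np.eye(4), tau_max=5)
```

As the scale grows, the spectrum concentrates on the largest singular values and the entropy decreases. The resulting curve is the kind of input that find_knee_point is meant to analyze.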