netflow.utils

Functions

clustermap(data[, observations, features, ...])

data – A dataframe with \(m\) observations that are \(n\) dimensional.

compute_edge_weights(G[, n_weight, ...])

Compute edge weights from given nodal weights, using the formula selected by e_normalized, e_sqrt and e_wprob.

compute_eigen(T[, n_comps, sort, random_state])

Compute eigen decomposition of (transition) matrix.

compute_graph_distances(G[, weight])

Returns the weighted hop distance matrix for graph with nodes indexed from \(0, 1, ..., n-1\) where \(n\) is the number of nodes.

construct_anisotropic_laplacian_matrix(G, weight)

Returns transpose of random walk (rw) Laplacian matrix from Transition matrix constructed from node attribute 'weight' for anisotropic diffusion.

dispersion_(data[, axis])

Data dispersion computed as the absolute value of the variance-to-mean ratio, where the variance and mean are computed on the values over the requested axis.

find_knee_point(y[, x])

Returns the x-location of a (single) knee of curve y=f(x).

gauss_conv(array[, window_size, smoothness])

Smooth an array using Gaussian kernel.

gauss_window([window_size, smoothness])

Gaussian window of width window_size.

get_times([times, t_min, t_max, n_t, log_time])

Get array of simulation time points.

heat_kernel(profile, laplacian, timestep)

Compute the action of the matrix exponential of \(-timestep * laplacian\) on the profile, i.e., the result of \(\exp(-tL) \cdot profile\), where \(t\) is timestep and \(L\) is laplacian.

interaction_iterator(G[, interactions])

G – The graph.

invariant_measure(profiles[, G, adj])

Compute the invariant measure of the profile (node weights) on the network.

kendall_tau_(data, **kwargs)

Calculate a Kendall's tau correlation coefficient with associated p-value using scipy.stats.kendalltau.

pij(G, source, target[, n_weight, EPS])

Compute the 1-step Markov transition probability of going from source to target node in G. Note: not a lazy walk (i.e., alpha=0).

spearmanr_(data, **kwargs)

Calculate a Spearman correlation coefficient with associated p-value using scipy.stats.spearmanr.

stack_triu_(df[, name])

Stack the upper triangular entries of the dataframe above the diagonal.

stack_triu_where_(df, condition[, name])

Stack the upper triangular entries of the dataframe above the diagonal where the condition is True.

unstack_triu_(series[, diag, index])

Unstack pandas Series with upper triangular entries to symmetric matrix.

von_neumann_entropy(X[, tau_max])

Compute Von Neumann Entropy (VNE) of the data at increasing scales.

netflow.utils.clustermap(data, observations=None, features=None, transform=True, is_symmetric=True, optimal_nodes_ordering=True, linkage_kwargs={'method': 'ward', 'metric': 'euclidean'}, title='', vis_kwargs={'cbar_pos': (-0.02, 0.84, 0.02, 0.16), 'center': 0, 'cmap': 'RdBu', 'dendrogram_ratio': 0.07, 'figsize': (7, 7)})
Parameters:
  • data (pandas.DataFrame, (m, n)) – A dataframe with \(m\) observations that are \(n\) dimensional.

  • observations (list) – Rows in data to use. If None, all rows are used.

  • features (list) – Columns in data to use. If None, all columns are used.

  • transform (bool) – If True, translate data with specified observations and features by feature mean and divide by feature standard deviation.

  • is_symmetric (bool) – If True, treat data as symmetric and use the same clustering on rows and columns. Otherwise, if False, perform independent clustering and ordering on rows and columns.

  • optimal_nodes_ordering (bool) – If True, optimize node ordering.

  • linkage_kwargs (dict) – Key-word arguments passed to fastcluster.linkage.

  • title (str) – Figure title.

  • vis_kwargs (dict) – Key-word arguments passed to seaborn.clustermap.

Returns:

cm – The plotted clustermap.

Return type:

seaborn.matrix.ClusterGrid
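
The transform option standardizes each feature before clustering. A minimal sketch of that step is given here, assuming the population standard deviation (ddof=0), which the description above does not specify. The helper name standardize_features is hypothetical and is not part of netflow.

```python
import numpy as np

def standardize_features(values):
    """Center each column on its mean and scale by its standard deviation."""
    values = np.asarray(values, dtype=float)
    mean = values.mean(axis=0)
    std = values.std(axis=0)  # assumption: ddof=0; netflow may use another convention
    return (values - mean) / std

# Rows are observations, columns are features.
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 60.0]])
z = standardize_features(data)
```

After this step every feature has mean 0 and unit standard deviation, so features measured on different scales contribute comparably to the linkage.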

netflow.utils.compute_edge_weights(G: Graph, n_weight='weight', e_weight='weight', e_normalized=True, e_sqrt=True, e_wprob=False)

Compute edge weights from given nodal weights:

  • e_normalized = True and e_sqrt = True : w_ij = 1/sqrt([(p_ij + p_ji)/2])

  • e_normalized = True and e_sqrt = False : w_ij = 1/[(p_ij + p_ji)/2]

  • e_normalized = False and e_sqrt = True : w_ij = 1/sqrt(w_i * w_j)

  • e_normalized = False and e_sqrt = False : w_ij = (1/w_i)*(1/w_j)

  • e_wprob = True (then e_normalized and e_sqrt are ignored) : w_ij = 1 / (p_ij + p_ji - (p_ij * p_ji))

NOTE: w_ij = INF if w_i=0 or w_j=0.
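
The case analysis above can be sketched for a single edge as follows. The function edge_weight and its argument layout are hypothetical, chosen only to mirror the documented formulas; the real function operates on a graph G and reads the node weights from the n_weight attribute. The sketch does not guard against p_ij + p_ji = 0.

```python
import math

def edge_weight(w_i, w_j, p_ij, p_ji,
                e_normalized=True, e_sqrt=True, e_wprob=False):
    """Hypothetical helper mirroring the documented formulas for one edge (i, j)."""
    if e_wprob:
        # e_normalized and e_sqrt are ignored in this mode
        return 1.0 / (p_ij + p_ji - p_ij * p_ji)
    if e_normalized:
        mean_p = (p_ij + p_ji) / 2.0
        return 1.0 / math.sqrt(mean_p) if e_sqrt else 1.0 / mean_p
    if w_i == 0 or w_j == 0:
        return math.inf  # documented convention for zero node weights
    if e_sqrt:
        return 1.0 / math.sqrt(w_i * w_j)
    return (1.0 / w_i) * (1.0 / w_j)

# Default mode: inverse square root of the mean transition probability.
w_default = edge_weight(2.0, 8.0, p_ij=0.5, p_ji=0.5)
```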

netflow.utils.compute_eigen(T, n_comps: int = 0, sort: Literal['decrease', 'increase'] = 'decrease', random_state: None | int | RandomState = 0)

Compute eigen decomposition of (transition) matrix.

Code taken from scanpy.neighbors.__init__.py

Parameters:
  • T (numpy.ndarray, (n, n)) – Matrix that eigen decomposition will be computed on (likely the transition matrix).

  • n_comps (int) – Number of eigenvalues/eigenvectors to compute. Set n_comps = 0 to compute the whole spectrum; the whole spectrum is also computed if n_comps >= n.

  • sort ({"decrease", "increase"}) – Order to sort eigenvalues.

  • random_state – A numpy random seed.

Returns:

  • eigen_values (numpy.ndarray) – Eigenvalues of transition matrix.

  • eigen_basis (numpy.ndarray) – Matrix of eigenvectors (stored in columns). eigen_basis is the projection of the data matrix on the right eigenvectors, that is, the projection on the diffusion components. These are simply the components of the right eigenvectors and can be used directly for plotting.
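
As an illustration of the returned quantities, the full spectrum of a small matrix can be computed and sorted in decreasing order with NumPy alone. This is a sketch under the assumption that T is symmetric (so that np.linalg.eigh applies); it is not the routine netflow uses internally.

```python
import numpy as np

# Symmetric toy transition matrix (rows sum to 1); symmetry is an assumption
# made here so that np.linalg.eigh can be used.
T = np.array([[0.9, 0.1],
              [0.1, 0.9]])

eigen_values, eigen_basis = np.linalg.eigh(T)   # eigh returns ascending order
order = np.argsort(eigen_values)[::-1]          # equivalent of sort="decrease"
eigen_values = eigen_values[order]
eigen_basis = eigen_basis[:, order]             # eigenvectors stay in columns
```

Column k of eigen_basis pairs with eigen_values[k], so reordering must be applied to both arrays together.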

netflow.utils.compute_graph_distances(G, weight='weight')

Returns the weighted hop distance matrix for graph with nodes indexed from \(0, 1, ..., n-1\) where \(n\) is the number of nodes.

Parameters:
  • G (networkx.Graph) – The graph with nodes assumed to be labeled consecutively from \(0, 1, ..., n-1\) where \(n\) is the number of nodes.

  • weight (str, optional) – Edge attribute of weights used for computing the weighted hop distance. If None, compute the unweighted distance. That is, rather than minimizing the sum of weights over all paths between every two nodes, minimize the number of edges.

Returns:

dist – An n x n matrix of node-pairwise graph distances between the n nodes.

Return type:

numpy.ndarray
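
The weighted hop distance is the all-pairs shortest-path length. The sketch below computes it with the Floyd-Warshall algorithm on nodes labeled \(0, 1, ..., n-1\); the function name and the edge-list input format are hypothetical, and netflow may use a different shortest-path routine.

```python
import numpy as np

def all_pairs_distances(n, weighted_edges):
    """Floyd-Warshall on an undirected graph with nodes 0..n-1.

    weighted_edges: iterable of (u, v, weight) tuples (hypothetical input format).
    """
    dist = np.full((n, n), np.inf)
    np.fill_diagonal(dist, 0.0)
    for u, v, w in weighted_edges:
        dist[u, v] = dist[v, u] = min(dist[u, v], w)
    for k in range(n):
        # Relax every pair (i, j) through the intermediate node k.
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return dist

# The direct edge 0-2 costs 5, but the path 0-1-2 costs 1 + 2 = 3.
dist = all_pairs_distances(3, [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 5.0)])
```

Passing a weight of 1 for every edge gives the unweighted case, where the distance is the minimum number of edges.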

netflow.utils.construct_anisotropic_laplacian_matrix(G, weight, use_spectral_gap=True)

Returns transpose of random walk (rw) Laplacian matrix from Transition matrix constructed from node attribute ‘weight’ for anisotropic diffusion.

Note

\(A\) is the binary (symmetric) adjacency matrix,

\(w\) is the array of node weights,

\(D\) is the diagonal matrix of the node-weighted degrees where \(D_{ii} = \sum_{j \sim i} w_j\), and

\(P = [p_{ij}]\) is the transition matrix where \(p_{ij}\) is the probability associated with transitioning from node \(i\) to node \(j\) defined as

\[ p_{ij} = \begin{cases} \dfrac{w_j}{D_{ii}}, & \text{if } i \sim j, \\ 0, & \text{otherwise.} \end{cases} \]

Note

The graph Laplacian (\(L\)) and graph random-walk Laplacian (\(L_{rw}\)) are then defined as:

\[ \begin{align}\begin{aligned}L = D-A\\P = D^{-1}A\\L_{rw} = D^{-1}L = I - D^{-1}A = I - P\end{aligned}\end{align} \]

Note

The transpose of the random-walk Laplacian is returned, \(L_{rw}^T\).
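
The construction can be sketched in NumPy for a three-node path graph. The sketch follows the node-weighted definition of \(p_{ij}\) given in the first note and does not model the use_spectral_gap option, whose effect is not described here.

```python
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # binary adjacency of the path 0-1-2
w = np.array([1.0, 2.0, 3.0])             # node weights

D = A @ w                                  # D_ii = sum of the neighbors' weights
P = (A * w[None, :]) / D[:, None]          # p_ij = w_j / D_ii if i ~ j, else 0
L_rw = np.eye(len(w)) - P                  # random-walk Laplacian, I - P
L_rw_T = L_rw.T                            # the transpose, as returned
```

Each row of P sums to 1, so each column of the transposed Laplacian sums to 0.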

netflow.utils.dispersion_(data, axis=0)

Data dispersion computed as the absolute value of the variance-to-mean ratio, where the variance and mean are computed on the values over the requested axis.

Parameters:
  • data (pandas.DataFrame) – Data used to compute dispersion.

  • axis ({0, 1}) –

    Axis along which the variance and mean are computed.

    Options :

    • 0 : for each column, apply function to the values over the index

    • 1 : for each index, apply function to the values over the columns

Returns:

vmr – Variance-to-mean ratio (vmr) quantifying the dispersion.

Return type:

pandas.Series
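
A minimal NumPy sketch of the variance-to-mean ratio follows. It assumes the sample variance (ddof=1), which is the pandas default but is not stated above; the function name is hypothetical.

```python
import numpy as np

def variance_to_mean_ratio(values, axis=0):
    """Absolute variance-to-mean ratio along the requested axis."""
    values = np.asarray(values, dtype=float)
    var = values.var(axis=axis, ddof=1)   # assumption: sample variance
    mean = values.mean(axis=axis)
    return np.abs(var / mean)

data = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0]])
vmr_cols = variance_to_mean_ratio(data, axis=0)   # one value per column
vmr_rows = variance_to_mean_ratio(data, axis=1)   # one value per row
```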

netflow.utils.find_knee_point(y, x=None)

Returns the x-location of a (single) knee of curve y=f(x).

Taken from KrishnaswamyLab/spARC.

Parameters:
  • y (array, shape=[n]) – data for which to find the knee point

  • x (array, optional, shape=[n], default=np.arange(len(y))) – indices of the data points of y, if these are not in order and evenly spaced

Returns:

knee_point – The index (or x value) of the knee point on y

Return type:

int

Examples

>>> import numpy as np
>>> import phate
>>> x = np.arange(20)
>>> y = np.exp(-x/10)
>>> phate.vne.find_knee_point(y,x)
8

netflow.utils.gauss_conv(array, window_size=5, smoothness=2.5)

Smooth an array using Gaussian kernel.

Parameters:
  • array (numpy.ndarray) – Array that will be smoothed.

  • window_size (int, default = 5) – Window size.

  • smoothness (float, default = 2.5) – Smoothness of curve.

Returns:

smoothed_array – The smoothed array.

Return type:

numpy.ndarray

netflow.utils.gauss_window(window_size=5, smoothness=2.5)

Gaussian window of width window_size.

Parameters:
  • window_size (int, default = 5) – Window size.

  • smoothness (float, default = 2.5) – Smoothness of curve.

Returns:

gauss – Gaussian window.

Return type:

numpy.ndarray
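
Gaussian smoothing amounts to convolving the array with a normalized Gaussian window. The sketch below assumes that smoothness plays the role of the standard deviation and that the window is normalized to sum to 1; neither detail is confirmed by the descriptions above, and the function names are hypothetical.

```python
import numpy as np

def gaussian_window(window_size=5, smoothness=2.5):
    # Assumption: smoothness acts as the standard deviation of the Gaussian.
    offsets = np.arange(window_size) - (window_size - 1) / 2.0
    window = np.exp(-0.5 * (offsets / smoothness) ** 2)
    return window / window.sum()   # normalize so the weights sum to 1

def gaussian_smooth(array, window_size=5, smoothness=2.5):
    window = gaussian_window(window_size, smoothness)
    # mode="same" keeps the output the same length as the input.
    return np.convolve(array, window, mode="same")

signal = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
smoothed = gaussian_smooth(signal)
```

With mode="same" the input is implicitly zero-padded, so values near the ends of a non-zero signal are damped.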

netflow.utils.get_times(times=None, t_min=-1.5, t_max=1.0, n_t=20, log_time=True)

Get array of simulation time points.

Parameters:
  • times ({None, numpy.ndarray}) – Array of times to evaluate the diffusion simulation. Note, if given, t_min, t_max and n_t are ignored.

  • t_min (float) – First time point to evaluate the diffusion simulation. Note, t_min is ignored if times is not None.

  • t_max (float) – Last time point to evaluate the diffusion simulation. Note, t_max must be greater than t_min, i.e., \(t_{max} > t_{min}\), and t_max is ignored if times is not None.

  • n_t (int) – Number of time points to generate. Note, n_t is ignored if times is not None.

  • log_time (bool) – If True, return n_t numbers spaced evenly on a log scale: the sequence starts at 10 ** t_min, ends at 10 ** t_max, and has the form 10 ** t, where t ranges over the n_t evenly spaced points between (and including) t_min and t_max. For example, get_times(t_min=1, t_max=3, n_t=3, log_time=True) = array([10 ** 1, 10 ** 2, 10 ** 3]) = array([10., 100., 1000.]). If False, return n_t numbers evenly spaced on a linear scale, starting at t_min and ending at t_max. For example, get_times(t_min=1, t_max=3, n_t=3, log_time=False) = array([1., 2., 3.]).
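
The documented behavior maps directly onto np.logspace and np.linspace. The function make_times below is a hypothetical re-implementation written from the parameter descriptions, not the library source.

```python
import numpy as np

def make_times(times=None, t_min=-1.5, t_max=1.0, n_t=20, log_time=True):
    """Hypothetical re-implementation of the documented behavior."""
    if times is not None:
        return np.asarray(times, dtype=float)   # t_min, t_max and n_t are ignored
    if log_time:
        return np.logspace(t_min, t_max, n_t)   # 10 ** t for evenly spaced t
    return np.linspace(t_min, t_max, n_t)       # evenly spaced on a linear scale

log_times = make_times(t_min=1, t_max=3, n_t=3, log_time=True)
lin_times = make_times(t_min=1, t_max=3, n_t=3, log_time=False)
```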

netflow.utils.heat_kernel(profile, laplacian, timestep)

Compute the action of the matrix exponential of \(-timestep * laplacian\) on the profile, i.e., the result of \(\exp(-tL) \cdot profile\), where \(t\) is timestep and \(L\) is laplacian.

Parameters:

profile (numpy.ndarray, (n,)) – Feature profile.
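
For a small symmetric Laplacian, the action of the matrix exponential can be evaluated through an eigendecomposition. This is a sketch under that symmetry assumption; it does not cover non-symmetric inputs such as a transposed random-walk Laplacian, and it is not necessarily how netflow evaluates the exponential.

```python
import numpy as np

def heat_kernel_action(profile, laplacian, timestep):
    """exp(-timestep * laplacian) @ profile for a symmetric laplacian (assumption)."""
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    coeffs = eigvecs.T @ profile                     # expand profile in the eigenbasis
    return eigvecs @ (np.exp(-timestep * eigvals) * coeffs)

L = np.array([[1.0, -1.0],
              [-1.0, 1.0]])        # Laplacian of a single edge
profile = np.array([1.0, 0.0])     # all mass starts on node 0
diffused = heat_kernel_action(profile, L, timestep=0.5)
```

Mass flows from node 0 toward node 1 while the total stays equal to 1; as timestep grows, both entries approach 0.5.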

netflow.utils.interaction_iterator(G: Graph, interactions: object | None = None)
Parameters:
  • G (networkx graph) – The graph.

  • interactions ({None, iterable, 'all'}) –

netflow.utils.invariant_measure(profiles, G=None, adj=None)

Compute the invariant measure of the profile (node weights) on the network.

Note

Either the graph G or the adjacency matrix adj must be provided.

Parameters:
  • profiles ({pandas.DataFrame (n_features, n_observations), numpy.ndarray (n_features, n_observations)}) – The network profile used as node weights for the initial condition of the random walk.

  • G (networkx.Graph) – The graph used to compute the adjacency matrix. Note, this is ignored if adj is provided. If profiles is a pandas.DataFrame, the node ids in G are expected to match the profiles index. Otherwise, list(G) is expected to match the order of profiles.

  • adj ({numpy.ndarray, scipy.sparse.csr_matrix} (n_features, n_features)) – The (binary) adjacency matrix. Rows and columns are expected to be in the same order as in profiles. Note, if provided, G is ignored.

Returns:

IM – The invariant measures whose features and observations are in the same order as profiles.

Return type:

numpy.ndarray (n_features, n_observations)
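
One way to read "invariant measure" is as the stationary distribution \(\pi\) satisfying \(\pi P = \pi\) for the walk with \(p_{ij} = w_j / \sum_{k \sim i} w_k\), the transition rule documented for construct_anisotropic_laplacian_matrix. Whether invariant_measure uses exactly this transition matrix and normalization is an assumption. Under it, detailed balance gives \(\pi_i \propto w_i \sum_{j \sim i} w_j\), which the sketch checks numerically for one profile.

```python
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # binary adjacency of the path 0-1-2
w = np.array([1.0, 2.0, 3.0])             # one profile (node weights)

D = A @ w                                  # sum of the neighbors' weights per node
P = (A * w[None, :]) / D[:, None]          # assumed transition matrix
pi = w * D / np.sum(w * D)                 # candidate stationary distribution
```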

netflow.utils.kendall_tau_(data, **kwargs)

Calculate a Kendall’s tau correlation coefficient with associated p-value using scipy.stats.kendalltau.

Parameters:
  • data (numpy.ndarray, (n_observations, n_features)) – 2-D array containing multiple variables and observations, where each column represents a variable, with observations in the rows.

  • **kwargs (dict) – Optional key-word arguments passed to scipy.stats.kendalltau.

Returns:

  • R (pandas.DataFrame) – Kendall's tau correlation matrix. The correlation matrix is square with length equal to the total number of variables (columns or rows).

  • pvalue (float) – The p-value for a hypothesis test whose null hypothesis is that two sets of data are uncorrelated. See the documentation for scipy.stats.kendalltau for alternative hypotheses. pvalue has the same shape as R.

netflow.utils.pij(G, source, target, n_weight='weight', EPS=1e-07)

Compute the 1-step Markov transition probability of going from source to target node in G. Note: not a lazy walk (i.e., alpha=0).

netflow.utils.spearmanr_(data, **kwargs)

Calculate a Spearman correlation coefficient with associated p-value using scipy.stats.spearmanr.

Parameters:
  • data (numpy.ndarray, (n_observations, n_features)) – 2-D array containing multiple variables and observations, where each column represents a variable, with observations in the rows.

  • **kwargs (dict) – Optional key-word arguments passed to scipy.stats.spearmanr.

Returns:

  • R (pandas.DataFrame) – Spearman correlation matrix. The correlation matrix is square with length equal to total number of variables (columns or rows).

  • pvalue (float) – The p-value for a hypothesis test whose null hypothesis is that two sets of data are uncorrelated. See the documentation for scipy.stats.spearmanr for alternative hypotheses. pvalue has the same shape as R.
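
Spearman's coefficient is the Pearson correlation of the column ranks. The NumPy-only sketch below illustrates the shape and meaning of R on data without tied values; it computes no p-values and has no tie handling, both of which scipy.stats.spearmanr provides. The helper rank_columns is hypothetical.

```python
import numpy as np

def rank_columns(data):
    """Rank each column from 0 to n-1 (assumes no tied values)."""
    return np.argsort(np.argsort(data, axis=0), axis=0).astype(float)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
# Rows are observations, columns are variables.
data = np.column_stack([x, x ** 3, -x])
R = np.corrcoef(rank_columns(data), rowvar=False)
```

The second column is a monotone increasing function of the first and the third is monotone decreasing, so the rank correlations are exactly +1 and -1 even though the relationship with x ** 3 is non-linear.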

netflow.utils.stack_triu_(df, name=None)

Stack the upper triangular entries of the dataframe above the diagonal.

Note

Useful for symmetric dataframes like correlations or distances.

Parameters:
  • df (pandas.DataFrame) – Dataframe to stack. Note, upper triangular entries are taken from df as provided, with no check that the rows and columns are symmetric.

  • name (str) – Optional name of pandas Series output df_stacked.

Returns:

df_stacked – The stacked upper triangular entries above the diagonal of the dataframe.

Return type:

pandas.Series

netflow.utils.stack_triu_where_(df, condition, name=None)

Stack the upper triangular entries of the dataframe above the diagonal where the condition is True.

Note

Useful for symmetric dataframes like correlations or distances.

Parameters:
  • df (pandas.DataFrame) – Dataframe to stack. Note, upper triangular entries are taken from df as provided, with no check that the rows and columns are symmetric.

  • condition (pandas.DataFrame) – Boolean dataframe of the same size and order of rows and columns as df indicating values, where True, to include in the stacked dataframe.

  • name (str) – Optional name of pandas Series output df_stacked.

Returns:

df_stacked – The stacked upper triangular entries above the diagonal of the dataframe, where condition is True.

Return type:

pandas.Series

netflow.utils.unstack_triu_(series, diag=0.0, index=None)

Unstack pandas Series with upper triangular entries to symmetric matrix.

Parameters:
  • series (pandas.Series) – The stacked upper triangular entries to be unstacked.

  • diag (float) – The value to be used on the diagonal.

  • index (list-like, optional) – If provided, return unstacked matrix with rows and columns sorted by index. If None, rows and columns are alphabetically sorted.

Returns:

M – Symmetric unstacked matrix with diag on the diagonal.

Return type:

pandas.DataFrame
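
The stack and unstack operations are inverses of each other up to the diagonal. The round trip can be sketched with pandas and NumPy as follows; this is an illustration of the idea, not the library's implementation, and the Series name "corr" is arbitrary.

```python
import numpy as np
import pandas as pd

labels = ["a", "b", "c"]
df = pd.DataFrame([[1.0, 0.2, 0.5],
                   [0.2, 1.0, 0.7],
                   [0.5, 0.7, 1.0]], index=labels, columns=labels)

# Stack: keep only the entries strictly above the diagonal.
rows, cols = np.triu_indices(len(df), k=1)
stacked = pd.Series(
    df.to_numpy()[rows, cols],
    index=pd.MultiIndex.from_arrays([df.index[rows], df.columns[cols]]),
    name="corr",
)

# Unstack: rebuild the symmetric matrix, filling the diagonal with a constant.
M = pd.DataFrame(0.0, index=labels, columns=labels)
for (i, j), value in stacked.items():
    M.loc[i, j] = value
    M.loc[j, i] = value
```

The stacked form holds n(n-1)/2 values for an n x n matrix, which avoids storing each symmetric entry twice.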

netflow.utils.von_neumann_entropy(X, tau_max=None)

Compute Von Neumann Entropy (VNE) of the data at increasing scales.

As described in GSPA and PHATE KrishnaswamyLab/spARC, https://pdfs.semanticscholar.org/16ab/e92b7630d5b84b904bde97dad9b9fbce406c.pdf.

Taken from KrishnaswamyLab/spARC.

Parameters:
  • X (array-like) – Matrix to compute VNE on. Expected to be n x n transition matrix.

  • tau_max (int) – Max scale tau (default is 100).

Returns:

vne – The VNE at each scale up to tau_max.

Return type:

np.array, shape [tau_max]
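
A sketch of the computation, following the approach used in PHATE, is shown below: the singular values of X are raised to successive powers, normalized into a probability vector, and their Shannon entropy is recorded at each scale. That netflow follows these exact steps is an assumption, and the function name vne_sketch is hypothetical.

```python
import numpy as np

def vne_sketch(X, tau_max=100):
    """Entropy of the normalized spectrum of X**tau for tau = 1..tau_max."""
    singular_values = np.linalg.svd(X, compute_uv=False)
    powered = singular_values.copy()
    entropy = []
    for _ in range(tau_max):
        prob = powered / powered.sum()
        prob = prob + np.finfo(float).eps        # avoid log(0)
        entropy.append(-np.sum(prob * np.log(prob)))
        powered = powered * singular_values      # spectrum at the next scale
    return np.array(entropy)

T = np.array([[0.9, 0.1],
              [0.1, 0.9]])
vne = vne_sketch(T, tau_max=10)
```

The entropy decreases with the scale as the smaller singular values decay, and the knee of this curve is what find_knee_point is designed to locate.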