netflow.methods.network_propagation#

Functions

diffuse_network_mutation_profiles(G, data[, ...])

Compute diffused (network-propagated) profiles for each sample.

netflow.methods.network_propagation._build_operator_W(G: ~networkx.classes.graph.Graph, feats_order: list[str], transition_norm: ~typing.Literal['random_walk', 'symmetric'] = 'random_walk', weight: str | None = None, dtype: str | ~numpy.dtype = <class 'numpy.float64'>) → csr_matrix[source]#

Build the sparse graph diffusion operator W over a specified feature ordering.

This constructs adjacency A in the exact order featss_order, then normalizes it into W.

Parameters:

G (nx.Graph) – Graph containing the features in feats_order.
feats_order – List of features (e.g., genes) IDs defining the matrix row/column order.
transition_norm ({"random_walk", "symmetric"}, default="random_walk") –
How to normalize adjacency into the transition matrix / diffusion operator W.

Let A be adjacency and D the diagonal degree matrix (D_ii = sum_j A_ij). We construct a sparse operator W used via left-multiplication: W @ X.

options:
- ”random_walk”: Computes the random-walk transition P := D^{-1}A.
  The random walk with restart (RWR) update rule is implemented using the operator W := P^T: X_{t+1} = (1-α) X0 + α W X_t. Recommended default choice for network propagation of sparse binary mutations.
- ”symmetric”: Computes the symmetric normalized adjacency: W = D^{-1/2} A D^{-1/2}.
  Recommended if a symmetric operator is wanted (often nicer mathematically / numerically). Sometimes preferable because it can reduce some degree-driven bias compared with pure random-walk operators.
weight ({str, None}, default=None) – Edge attribute name in G to use as weights. If None, edges are treated as weight=1.
dtype ({numpy dtype, str}, default=np.float64) – Numeric dtype used for computations.

Returns:

W – W with shape (n_feats, n_feats), CSR format.

Return type:

scipy.sparse.csr_matrix

Notes

Self-loops are removed from the adjacency by default (diagonal set to 0).
Isolated nodes (degree 0) will not propagate; restart-based diffusions keep them anchored.

netflow.methods.network_propagation._closed_form_solve( M: csr_matrix, B: ndarray, chunksize: int, ) → ndarray[source]#

Solve the sparse linear system M X = B for multiple RHS columns via one LU factorization.

Used for closed-form stationary solutions:

RWR: (I - αW) X* = (1-α)X0
Laplacian: (I + λL) X* = X0

Implementation#

Factorize M once using sparse LU (splu).
Solve in blocks of RHS columns to limit peak memory.

param M:: Sparse square matrix (n x n).
type M:: csr_matrix
param B:: Dense RHS matrix (n x m).
type B:: np.ndarray
param chunksize:: Number of RHS columns to solve at a time.
type chunksize:: int
returns:: X – Dense solution X with shape (n x m).
rtype:: np.ndarray

Notes

For extremely large n, might need iterative linear solvers (cg/gmres) + preconditioning. This LU approach is simple and usually robust for medium-to-large graphs.

netflow.methods.network_propagation._fixed_point_rwr( W: csr_matrix, X0: ndarray, alpha: float, max_iter: int, tol: float, ) → ndarray[source]#

Fixed-point iteration solver for Random Walk with Restart (RWR) diffusion.

Model#

Update rule:: X_{t+1} = (1-α) X0 + α W X_t
Stationary solution (if it converges):: X* = (1-α) (I - αW)^{-1} X0

param W:

Sparse diffusion operator (n x n).

type W:

csr_matrix

param X0:

Seed matrix (n x m) with m samples.

type X0:

np.ndarray

param alpha:

0 <= α < 1. Larger α => more diffusion.

type alpha:

float

param max_iter:

Maximum iterations.

type max_iter:

int

param tol:

Relative convergence criterion on Frobenius norm:: ||X_new - X|| / (||X|| + eps) < tol

type tol:

float

returns:

X – Diffused profiles X* (n x m).

rtype:

np.ndarray

netflow.methods.network_propagation.diffuse_network_mutation_profiles( G: ~networkx.classes.graph.Graph, data: str | ~pandas.core.frame.DataFrame, transition_norm: ~typing.Literal['random_walk', 'symmetric'] = 'random_walk', weight: str | None = None, propagation: ~typing.Literal['rwr', 'heat_kernel', 'laplacian_solve'] = 'rwr', alpha: float = 0.7, t: float = 1.0, lam: float = 1.0, solve_mode: ~typing.Literal['iter', 'closed_form'] = 'iter', max_iter: int = 200, tol: float = 1e-08, chunksize: int = 64, dtype: str | ~numpy.dtype = <class 'numpy.float64'>, ) → DataFrame[source]#

Compute diffused (network-propagated) profiles for each sample.

Given a feature network G (e.g., an undirected PPI) and an input feature matrix X (features x samples, often binary 0/1, e.g., mutations) given by data, compute the diffused / network-smoothed profiles matrix X_out (same shape/order) for each sample by propagating signal along network edges.

Parameters:

G (nx.Graph) – Feature network (e.g., nodes are genes). Node IDs must match the IDs in the input data matrix row index.
data ({str, pd.DataFrame}) – Feature-by-sample data matrix (rows=features, columns=samples)
transition_norm ({"random_walk", "symmetric"}, default="random_walk") –
How to normalize adjacency into the transition matrix / diffusion operator W.

Let A be adjacency and D the diagonal degree matrix (D_ii = sum_j A_ij). We construct a sparse operator W used via left-multiplication: W @ X.

transition_norm options:
- ”random_walk”: Computes the random-walk transition P = D^{-1}A.
  The random walk with restart (RWR) update rule is implemented using the operator W := P^T: X_{t+1} = (1-α) X0 + α P^T X_t. Recommended default choice for network propagation of sparse binary mutations.
- ”symmetric”: Computes the symmetric normalized adjacency: W = D^{-1/2} A D^{-1/2}.
  Recommended if a symmetric operator is wanted (often nicer mathematically / numerically). Sometimes preferable because it can reduce some degree-driven bias compared with pure random-walk operators.
weight ({str, None}, default=None) – Edge attribute name in G to use as weights. If None, edges are treated as weight=1.
propagation ({"rwr", "heat_kernel", "laplacian_solve"}, default="rwr") –
Which diffusion model to use.

propagation options
- ”rwr”Random Walk with Restart
  Update rule:
  X_{k+1} = (1-α) X0 + α W X_k
  
  Closed form (stationary solution):
  X* = (1-α) (I - αW)^{-1} X0
  
  Interpretation: - Each step mixes an “anchor to original mutation profile” term (X0) with a
  
  “spread across network” term (W X_t).
  - α controls locality: smaller α -> more local; larger α -> more global smoothing.
  Examples in literature: - Network-based stratification / network propagation pipelines (e.g., NBS-style smoothing) - Classic “network propagation” for biological signal diffusion
  
  Recommendation: - Default for smoothing sparse binary mutation profiles prior to similarity/clustering. - Start α ≈ 0.7; sweep 0.5–0.85 for sensitivity. - Set transition_norm = “random_walk”
  
  Uses parameters: alpha, solve_mode, (max_iter,tol if solve_mode=”iter”),
  (chunksize if solve_mode=”closed_form”)
  
  Ignores parameters: beta, t, lam
- ”heat_kernel”Heat-kernel diffusion
  
  Continuous-time diffusion:
  X*(t) = exp(-t L) X0, with L = I - W
  
  Recommendation: - Use if you want a “diffusion time” knob t with a multi-scale interpretation. - Often explored by scanning t across scales. - Set transition_norm = “symmetric”
  
  Uses parameters: t Ignores parameters: alpha, beta, lam, solve_mode, max_iter, tol, chunksize
- ”laplacian_solve” Laplacian-regularized smoothing
  
  Linear solve:
  X* = (I + λL)^{-1} X0, with L = I - W
  
  Equivalent optimization view:
  argmin_X ||X - X0||_F^2 + λ * Tr(X^T L X)
  
  Recommendation: - Use when you prefer a regularization knob λ and a single sparse solve. - Often stable and efficient for large graphs and many samples. - Set transition_norm = “symmetric”
  
  Uses parameters: lam, chunksize Ignores parameters: alpha, beta, t, solve_mode, max_iter, tol
alpha (float, default=0.7) –
RWR diffusion strength (restart complement). Must satisfy 0 <= alpha < 1. Larger alpha => more diffusion; smaller alpha => more anchored to X0 (less diffusion). Used only if propagation=”rwr”. Ignored otherwise.

Practical guidance based on literature (binary mutations on undirected PPI): - alpha ~ 0.7 is a strong default - sweep 0.5–0.85 for sensitivity
t (float, default=1.0) – Heat-kernel diffusion time (t >= 0). Used only if propagation=”heat_kernel”. Ignored otherwise.
lam (float, default=1.0) – Laplacian smoothing strength (lambda >= 0). Used only if propagation=”laplacian_solve”. Ignored otherwise.
solve_mode ({"iter", "closed_form"}, default="iter") –
How to compute stationary solutions for restart diffusion. Used only if propagation=”rwr”. Ignored otherwise. Note: propagation=”heat_kernel” uses expm_multiply and propagation=”laplacian_solve” uses a sparse LU solver

options:
- ”iter”: fixed-point iteration to convergence
- ”closed_form”: sparse LU solve of the corresponding linear system
  (I - αW) X* = (1-α) X0
max_iter (int, default=200) – Max iterations for solve_mode=”iter”. Used only when propagation=”rwr” AND solve_mode=”iter”. Ignored otherwise.
tol (float, default=1e-8) – Convergence tolerance for solve_mode=”iter”. Used only when propagation=”rwr” AND solve_mode=”iter”. Ignored otherwise.
chunksize (int, default=64) –
RHS block size (number of samples per block) for LU-based multi-RHS solves. Used when:
- propagation=”rwr” AND solve_mode=”closed_form”
- propagation=”laplacian_solve”
Ignored otherwise.
dtype ({numpy dtype, str}, default=np.float64) – Numeric dtype used for computations.

Returns:

X_out – Diffused feature-by-sample matrix with features (e.g., genes) and samples ordered exactly as in input.

Return type:

pd.DataFrame

Notes

Recommended defaults for undirected PPI + binary mutation matrix:: propagation=”rwr”, transition_norm=”random_walk”, alpha=0.7, solve_mode=”iter”

Examples

RWR default:: df_out = diffuse_network_mutation_profiles(G, df, propagation=”rwr”, alpha=0.7)
Symmetric operator + RWR:: df_out = diffuse_network_mutation_profiles(G, df, transition_norm=”symmetric”, propagation=”rwr”, alpha=0.7)
Heat kernel:: df_out = diffuse_network_mutation_profiles(G, df, propagation=”heat_kernel”, t=1.0)