netflow.methods.network_propagation#

Functions

diffuse_network_mutation_profiles(G, data[, ...])

Compute diffused (network-propagated) profiles for each sample.

netflow.methods.network_propagation._build_operator_W(G: ~networkx.classes.graph.Graph, feats_order: list[str], transition_norm: ~typing.Literal['random_walk', 'symmetric'] = 'random_walk', weight: str | None = None, dtype: str | ~numpy.dtype = <class 'numpy.float64'>) csr_matrix[source]#

Build the sparse graph diffusion operator W over a specified feature ordering.

This constructs adjacency A in the exact order featss_order, then normalizes it into W.

Parameters:
  • G (nx.Graph) – Graph containing the features in feats_order.

  • feats_order – List of features (e.g., genes) IDs defining the matrix row/column order.

  • transition_norm ({"random_walk", "symmetric"}, default="random_walk") –

    How to normalize adjacency into the transition matrix / diffusion operator W.

    Let A be adjacency and D the diagonal degree matrix (D_ii = sum_j A_ij). We construct a sparse operator W used via left-multiplication: W @ X.

    options:

    • ”random_walk”: Computes the random-walk transition P := D^{-1}A.

      The random walk with restart (RWR) update rule is implemented using the operator W := P^T: X_{t+1} = (1-α) X0 + α W X_t. Recommended default choice for network propagation of sparse binary mutations.

    • ”symmetric”: Computes the symmetric normalized adjacency: W = D^{-1/2} A D^{-1/2}.

      Recommended if a symmetric operator is wanted (often nicer mathematically / numerically). Sometimes preferable because it can reduce some degree-driven bias compared with pure random-walk operators.

  • weight ({str, None}, default=None) – Edge attribute name in G to use as weights. If None, edges are treated as weight=1.

  • dtype ({numpy dtype, str}, default=np.float64) – Numeric dtype used for computations.

Returns:

W – W with shape (n_feats, n_feats), CSR format.

Return type:

scipy.sparse.csr_matrix

Notes

  • Self-loops are removed from the adjacency by default (diagonal set to 0).

  • Isolated nodes (degree 0) will not propagate; restart-based diffusions keep them anchored.

netflow.methods.network_propagation._closed_form_solve(
M: csr_matrix,
B: ndarray,
chunksize: int,
) ndarray[source]#

Solve the sparse linear system M X = B for multiple RHS columns via one LU factorization.

Used for closed-form stationary solutions:
  • RWR: (I - αW) X* = (1-α)X0

  • Laplacian: (I + λL) X* = X0

Implementation#

  • Factorize M once using sparse LU (splu).

  • Solve in blocks of RHS columns to limit peak memory.

param M:

Sparse square matrix (n x n).

type M:

csr_matrix

param B:

Dense RHS matrix (n x m).

type B:

np.ndarray

param chunksize:

Number of RHS columns to solve at a time.

type chunksize:

int

returns:

X – Dense solution X with shape (n x m).

rtype:

np.ndarray

Notes

  • For extremely large n, might need iterative linear solvers (cg/gmres) + preconditioning. This LU approach is simple and usually robust for medium-to-large graphs.

netflow.methods.network_propagation._fixed_point_rwr(
W: csr_matrix,
X0: ndarray,
alpha: float,
max_iter: int,
tol: float,
) ndarray[source]#

Fixed-point iteration solver for Random Walk with Restart (RWR) diffusion.

Model#

Update rule:

X_{t+1} = (1-α) X0 + α W X_t

Stationary solution (if it converges):

X* = (1-α) (I - αW)^{-1} X0

param W:

Sparse diffusion operator (n x n).

type W:

csr_matrix

param X0:

Seed matrix (n x m) with m samples.

type X0:

np.ndarray

param alpha:

0 <= α < 1. Larger α => more diffusion.

type alpha:

float

param max_iter:

Maximum iterations.

type max_iter:

int

param tol:
Relative convergence criterion on Frobenius norm:

||X_new - X|| / (||X|| + eps) < tol

type tol:

float

returns:

X – Diffused profiles X* (n x m).

rtype:

np.ndarray

netflow.methods.network_propagation.diffuse_network_mutation_profiles(
G: ~networkx.classes.graph.Graph,
data: str | ~pandas.core.frame.DataFrame,
transition_norm: ~typing.Literal['random_walk',
'symmetric'] = 'random_walk',
weight: str | None = None,
propagation: ~typing.Literal['rwr',
'heat_kernel',
'laplacian_solve'] = 'rwr',
alpha: float = 0.7,
t: float = 1.0,
lam: float = 1.0,
solve_mode: ~typing.Literal['iter',
'closed_form'] = 'iter',
max_iter: int = 200,
tol: float = 1e-08,
chunksize: int = 64,
dtype: str | ~numpy.dtype = <class 'numpy.float64'>,
) DataFrame[source]#

Compute diffused (network-propagated) profiles for each sample.

Given a feature network G (e.g., an undirected PPI) and an input feature matrix X (features x samples, often binary 0/1, e.g., mutations) given by data, compute the diffused / network-smoothed profiles matrix X_out (same shape/order) for each sample by propagating signal along network edges.

Parameters:
  • G (nx.Graph) – Feature network (e.g., nodes are genes). Node IDs must match the IDs in the input data matrix row index.

  • data ({str, pd.DataFrame}) – Feature-by-sample data matrix (rows=features, columns=samples)

  • transition_norm ({"random_walk", "symmetric"}, default="random_walk") –

    How to normalize adjacency into the transition matrix / diffusion operator W.

    Let A be adjacency and D the diagonal degree matrix (D_ii = sum_j A_ij). We construct a sparse operator W used via left-multiplication: W @ X.

    transition_norm options:

    • ”random_walk”: Computes the random-walk transition P = D^{-1}A.

      The random walk with restart (RWR) update rule is implemented using the operator W := P^T: X_{t+1} = (1-α) X0 + α P^T X_t. Recommended default choice for network propagation of sparse binary mutations.

    • ”symmetric”: Computes the symmetric normalized adjacency: W = D^{-1/2} A D^{-1/2}.

      Recommended if a symmetric operator is wanted (often nicer mathematically / numerically). Sometimes preferable because it can reduce some degree-driven bias compared with pure random-walk operators.

  • weight ({str, None}, default=None) – Edge attribute name in G to use as weights. If None, edges are treated as weight=1.

  • propagation ({"rwr", "heat_kernel", "laplacian_solve"}, default="rwr") –

    Which diffusion model to use.

    propagation options

    • ”rwr”Random Walk with Restart
      Update rule:

      X_{k+1} = (1-α) X0 + α W X_k

      Closed form (stationary solution):

      X* = (1-α) (I - αW)^{-1} X0

      Interpretation: - Each step mixes an “anchor to original mutation profile” term (X0) with a

      “spread across network” term (W X_t).

      • α controls locality: smaller α -> more local; larger α -> more global smoothing.

      Examples in literature: - Network-based stratification / network propagation pipelines (e.g., NBS-style smoothing) - Classic “network propagation” for biological signal diffusion

      Recommendation: - Default for smoothing sparse binary mutation profiles prior to similarity/clustering. - Start α ≈ 0.7; sweep 0.5–0.85 for sensitivity. - Set transition_norm = “random_walk”

      Uses parameters: alpha, solve_mode, (max_iter,tol if solve_mode=”iter”),

      (chunksize if solve_mode=”closed_form”)

      Ignores parameters: beta, t, lam

    • ”heat_kernel”Heat-kernel diffusion
      Continuous-time diffusion:

      X*(t) = exp(-t L) X0, with L = I - W

      Recommendation: - Use if you want a “diffusion time” knob t with a multi-scale interpretation. - Often explored by scanning t across scales. - Set transition_norm = “symmetric”

      Uses parameters: t Ignores parameters: alpha, beta, lam, solve_mode, max_iter, tol, chunksize

    • ”laplacian_solve” Laplacian-regularized smoothing
      Linear solve:

      X* = (I + λL)^{-1} X0, with L = I - W

      Equivalent optimization view:

      argmin_X ||X - X0||_F^2 + λ * Tr(X^T L X)

      Recommendation: - Use when you prefer a regularization knob λ and a single sparse solve. - Often stable and efficient for large graphs and many samples. - Set transition_norm = “symmetric”

      Uses parameters: lam, chunksize Ignores parameters: alpha, beta, t, solve_mode, max_iter, tol

  • alpha (float, default=0.7) –

    RWR diffusion strength (restart complement). Must satisfy 0 <= alpha < 1. Larger alpha => more diffusion; smaller alpha => more anchored to X0 (less diffusion). Used only if propagation=”rwr”. Ignored otherwise.

    Practical guidance based on literature (binary mutations on undirected PPI): - alpha ~ 0.7 is a strong default - sweep 0.5–0.85 for sensitivity

  • t (float, default=1.0) – Heat-kernel diffusion time (t >= 0). Used only if propagation=”heat_kernel”. Ignored otherwise.

  • lam (float, default=1.0) – Laplacian smoothing strength (lambda >= 0). Used only if propagation=”laplacian_solve”. Ignored otherwise.

  • solve_mode ({"iter", "closed_form"}, default="iter") –

    How to compute stationary solutions for restart diffusion. Used only if propagation=”rwr”. Ignored otherwise. Note: propagation=”heat_kernel” uses expm_multiply and propagation=”laplacian_solve” uses a sparse LU solver

    options:

    • ”iter”: fixed-point iteration to convergence

    • ”closed_form”: sparse LU solve of the corresponding linear system

      (I - αW) X* = (1-α) X0

  • max_iter (int, default=200) – Max iterations for solve_mode=”iter”. Used only when propagation=”rwr” AND solve_mode=”iter”. Ignored otherwise.

  • tol (float, default=1e-8) – Convergence tolerance for solve_mode=”iter”. Used only when propagation=”rwr” AND solve_mode=”iter”. Ignored otherwise.

  • chunksize (int, default=64) –

    RHS block size (number of samples per block) for LU-based multi-RHS solves. Used when:

    • propagation=”rwr” AND solve_mode=”closed_form”

    • propagation=”laplacian_solve”

    Ignored otherwise.

  • dtype ({numpy dtype, str}, default=np.float64) – Numeric dtype used for computations.

Returns:

X_out – Diffused feature-by-sample matrix with features (e.g., genes) and samples ordered exactly as in input.

Return type:

pd.DataFrame

Notes

Recommended defaults for undirected PPI + binary mutation matrix:

propagation=”rwr”, transition_norm=”random_walk”, alpha=0.7, solve_mode=”iter”

Examples

RWR default:

df_out = diffuse_network_mutation_profiles(G, df, propagation=”rwr”, alpha=0.7)

Symmetric operator + RWR:

df_out = diffuse_network_mutation_profiles(G, df, transition_norm=”symmetric”, propagation=”rwr”, alpha=0.7)

Heat kernel:

df_out = diffuse_network_mutation_profiles(G, df, propagation=”heat_kernel”, t=1.0)