netflow.methods.network_propagation#
Functions
|
Compute diffused (network-propagated) profiles for each sample. |
- netflow.methods.network_propagation._build_operator_W(G: ~networkx.classes.graph.Graph, feats_order: list[str], transition_norm: ~typing.Literal['random_walk', 'symmetric'] = 'random_walk', weight: str | None = None, dtype: str | ~numpy.dtype = <class 'numpy.float64'>) csr_matrix[source]#
Build the sparse graph diffusion operator W over a specified feature ordering.
This constructs adjacency A in the exact order featss_order, then normalizes it into W.
- Parameters:
G (nx.Graph) – Graph containing the features in feats_order.
feats_order – List of features (e.g., genes) IDs defining the matrix row/column order.
transition_norm ({"random_walk", "symmetric"}, default="random_walk") –
How to normalize adjacency into the transition matrix / diffusion operator W.
Let A be adjacency and D the diagonal degree matrix (D_ii = sum_j A_ij). We construct a sparse operator W used via left-multiplication: W @ X.
options:
- ”random_walk”: Computes the random-walk transition P := D^{-1}A.
The random walk with restart (RWR) update rule is implemented using the operator W := P^T: X_{t+1} = (1-α) X0 + α W X_t. Recommended default choice for network propagation of sparse binary mutations.
- ”symmetric”: Computes the symmetric normalized adjacency: W = D^{-1/2} A D^{-1/2}.
Recommended if a symmetric operator is wanted (often nicer mathematically / numerically). Sometimes preferable because it can reduce some degree-driven bias compared with pure random-walk operators.
weight ({str, None}, default=None) – Edge attribute name in G to use as weights. If None, edges are treated as weight=1.
dtype ({numpy dtype, str}, default=np.float64) – Numeric dtype used for computations.
- Returns:
W – W with shape (n_feats, n_feats), CSR format.
- Return type:
scipy.sparse.csr_matrix
Notes
Self-loops are removed from the adjacency by default (diagonal set to 0).
Isolated nodes (degree 0) will not propagate; restart-based diffusions keep them anchored.
- netflow.methods.network_propagation._closed_form_solve(
- M: csr_matrix,
- B: ndarray,
- chunksize: int,
Solve the sparse linear system M X = B for multiple RHS columns via one LU factorization.
- Used for closed-form stationary solutions:
RWR: (I - αW) X* = (1-α)X0
Laplacian: (I + λL) X* = X0
Implementation#
Factorize M once using sparse LU (splu).
Solve in blocks of RHS columns to limit peak memory.
- param M:
Sparse square matrix (n x n).
- type M:
csr_matrix
- param B:
Dense RHS matrix (n x m).
- type B:
np.ndarray
- param chunksize:
Number of RHS columns to solve at a time.
- type chunksize:
int
- returns:
X – Dense solution X with shape (n x m).
- rtype:
np.ndarray
Notes
For extremely large n, might need iterative linear solvers (cg/gmres) + preconditioning. This LU approach is simple and usually robust for medium-to-large graphs.
- netflow.methods.network_propagation._fixed_point_rwr( ) ndarray[source]#
Fixed-point iteration solver for Random Walk with Restart (RWR) diffusion.
Model#
- Update rule:
X_{t+1} = (1-α) X0 + α W X_t
- Stationary solution (if it converges):
X* = (1-α) (I - αW)^{-1} X0
- param W:
Sparse diffusion operator (n x n).
- type W:
csr_matrix
- param X0:
Seed matrix (n x m) with m samples.
- type X0:
np.ndarray
- param alpha:
0 <= α < 1. Larger α => more diffusion.
- type alpha:
float
- param max_iter:
Maximum iterations.
- type max_iter:
int
- param tol:
- Relative convergence criterion on Frobenius norm:
||X_new - X|| / (||X|| + eps) < tol
- type tol:
float
- returns:
X – Diffused profiles X* (n x m).
- rtype:
np.ndarray
- netflow.methods.network_propagation.diffuse_network_mutation_profiles(
- G: ~networkx.classes.graph.Graph,
- data: str | ~pandas.core.frame.DataFrame,
- transition_norm: ~typing.Literal['random_walk',
- 'symmetric'] = 'random_walk',
- weight: str | None = None,
- propagation: ~typing.Literal['rwr',
- 'heat_kernel',
- 'laplacian_solve'] = 'rwr',
- alpha: float = 0.7,
- t: float = 1.0,
- lam: float = 1.0,
- solve_mode: ~typing.Literal['iter',
- 'closed_form'] = 'iter',
- max_iter: int = 200,
- tol: float = 1e-08,
- chunksize: int = 64,
- dtype: str | ~numpy.dtype = <class 'numpy.float64'>,
Compute diffused (network-propagated) profiles for each sample.
Given a feature network
G(e.g., an undirected PPI) and an input feature matrix X (features x samples, often binary 0/1, e.g., mutations) given bydata, compute the diffused / network-smoothed profiles matrixX_out(same shape/order) for each sample by propagating signal along network edges.- Parameters:
G (nx.Graph) – Feature network (e.g., nodes are genes). Node IDs must match the IDs in the input
datamatrix row index.data ({str, pd.DataFrame}) – Feature-by-sample data matrix (rows=features, columns=samples)
transition_norm ({"random_walk", "symmetric"}, default="random_walk") –
How to normalize adjacency into the transition matrix / diffusion operator W.
Let A be adjacency and D the diagonal degree matrix (D_ii = sum_j A_ij). We construct a sparse operator W used via left-multiplication: W @ X.
transition_norm options:
- ”random_walk”: Computes the random-walk transition P = D^{-1}A.
The random walk with restart (RWR) update rule is implemented using the operator W := P^T: X_{t+1} = (1-α) X0 + α P^T X_t. Recommended default choice for network propagation of sparse binary mutations.
- ”symmetric”: Computes the symmetric normalized adjacency: W = D^{-1/2} A D^{-1/2}.
Recommended if a symmetric operator is wanted (often nicer mathematically / numerically). Sometimes preferable because it can reduce some degree-driven bias compared with pure random-walk operators.
weight ({str, None}, default=None) – Edge attribute name in G to use as weights. If None, edges are treated as weight=1.
propagation ({"rwr", "heat_kernel", "laplacian_solve"}, default="rwr") –
Which diffusion model to use.
propagation options
- ”rwr”Random Walk with Restart
- Update rule:
X_{k+1} = (1-α) X0 + α W X_k
- Closed form (stationary solution):
X* = (1-α) (I - αW)^{-1} X0
Interpretation: - Each step mixes an “anchor to original mutation profile” term (X0) with a
“spread across network” term (W X_t).
α controls locality: smaller α -> more local; larger α -> more global smoothing.
Examples in literature: - Network-based stratification / network propagation pipelines (e.g., NBS-style smoothing) - Classic “network propagation” for biological signal diffusion
Recommendation: - Default for smoothing sparse binary mutation profiles prior to similarity/clustering. - Start α ≈ 0.7; sweep 0.5–0.85 for sensitivity. - Set transition_norm = “random_walk”
- Uses parameters: alpha, solve_mode, (max_iter,tol if solve_mode=”iter”),
(chunksize if solve_mode=”closed_form”)
Ignores parameters: beta, t, lam
- ”heat_kernel”Heat-kernel diffusion
- Continuous-time diffusion:
X*(t) = exp(-t L) X0, with L = I - W
Recommendation: - Use if you want a “diffusion time” knob t with a multi-scale interpretation. - Often explored by scanning t across scales. - Set transition_norm = “symmetric”
Uses parameters: t Ignores parameters: alpha, beta, lam, solve_mode, max_iter, tol, chunksize
- ”laplacian_solve” Laplacian-regularized smoothing
- Linear solve:
X* = (I + λL)^{-1} X0, with L = I - W
- Equivalent optimization view:
argmin_X ||X - X0||_F^2 + λ * Tr(X^T L X)
Recommendation: - Use when you prefer a regularization knob λ and a single sparse solve. - Often stable and efficient for large graphs and many samples. - Set transition_norm = “symmetric”
Uses parameters: lam, chunksize Ignores parameters: alpha, beta, t, solve_mode, max_iter, tol
alpha (float, default=0.7) –
RWR diffusion strength (restart complement). Must satisfy 0 <= alpha < 1. Larger alpha => more diffusion; smaller alpha => more anchored to X0 (less diffusion). Used only if propagation=”rwr”. Ignored otherwise.
Practical guidance based on literature (binary mutations on undirected PPI): - alpha ~ 0.7 is a strong default - sweep 0.5–0.85 for sensitivity
t (float, default=1.0) – Heat-kernel diffusion time (t >= 0). Used only if propagation=”heat_kernel”. Ignored otherwise.
lam (float, default=1.0) – Laplacian smoothing strength (lambda >= 0). Used only if propagation=”laplacian_solve”. Ignored otherwise.
solve_mode ({"iter", "closed_form"}, default="iter") –
How to compute stationary solutions for restart diffusion. Used only if propagation=”rwr”. Ignored otherwise. Note: propagation=”heat_kernel” uses expm_multiply and propagation=”laplacian_solve” uses a sparse LU solver
options:
”iter”: fixed-point iteration to convergence
- ”closed_form”: sparse LU solve of the corresponding linear system
(I - αW) X* = (1-α) X0
max_iter (int, default=200) – Max iterations for solve_mode=”iter”. Used only when propagation=”rwr” AND solve_mode=”iter”. Ignored otherwise.
tol (float, default=1e-8) – Convergence tolerance for solve_mode=”iter”. Used only when propagation=”rwr” AND solve_mode=”iter”. Ignored otherwise.
chunksize (int, default=64) –
RHS block size (number of samples per block) for LU-based multi-RHS solves. Used when:
propagation=”rwr” AND solve_mode=”closed_form”
propagation=”laplacian_solve”
Ignored otherwise.
dtype ({numpy dtype, str}, default=np.float64) – Numeric dtype used for computations.
- Returns:
X_out – Diffused feature-by-sample matrix with features (e.g., genes) and samples ordered exactly as in input.
- Return type:
pd.DataFrame
Notes
- Recommended defaults for undirected PPI + binary mutation matrix:
propagation=”rwr”, transition_norm=”random_walk”, alpha=0.7, solve_mode=”iter”
Examples
- RWR default:
df_out = diffuse_network_mutation_profiles(G, df, propagation=”rwr”, alpha=0.7)
- Symmetric operator + RWR:
df_out = diffuse_network_mutation_profiles(G, df, transition_norm=”symmetric”, propagation=”rwr”, alpha=0.7)
- Heat kernel:
df_out = diffuse_network_mutation_profiles(G, df, propagation=”heat_kernel”, t=1.0)