netflow.pose.organization#
organization#
Description
This module constructs the organization of the schema from a distance matrix and one or more data points designated as the source.
This is done by using the branch detection algorithm from the diffusion pseudo-time (DPT) algorithm for reconstructing developmental progression and differentiation of cells proposed in [Haghverdi16] as implemented in scanpy.
Acknowledgement
A large portion of the code was taken from scanpy.tools._dpt.py and code related to the method scanpy.tools._dpt.dpt.
Notable differences from the scanpy implementation:
Add smoothing when computing maximal correlation cutoff
Include points not identified with any branch after split in the trunk (nonunique).
To do:
Set branchable aspect of TreeNode.
Functions
compute_multiscale_VNE_transitions_from_similarity | Compute the multi-scale transition matrix based on the elbow of the Von Neumann Entropy (VNE) as described in GSPA and PHATE KrishnaswamyLab/spARC, https://pdfs.semanticscholar.org/16ab/e92b7630d5b84b904bde97dad9b9fbce406c.pdf. |
compute_rw_transitions | Compute the row-stochastic transition matrix. |
compute_sym_diffusion_affinity_transitions | Compute the symmetric diffusion affinity transition matrix from KrishnaswamyLab/graphtools. |
compute_transitions | Compute symmetric and asymmetric transition matrices and store in keeper. |
dpt_from_augmented_sym_transitions | Compute the diffusion pseudotime metric between observations, computed from the symmetric transitions. |
get_pose | Compute the POSE and save it to the keeper. |
root_max_ratio | Return the root index of the observation that leads to the largest triangle. |
Classes
POSER |
Tree | Tree implementation as a collection of TreeNode objects. |
TreeNode | Node of a general tree data structure. |
- class netflow.pose.organization.POSER(keeper, key, root=None, root_as_tip=False, min_branch_size=5, choose_largest_segment=False, flavor='haghverdi16', allow_kendall_tau_shift=False, smooth_corr=True, brute=True, split=True, connect_closest=False, connect_trunk='classic', verbose=None)[source]#
- Parameters:
keeper (netflow.Keeper) – The keeper object that stores the distance matrix of size (n_observations, n_observations).
key (str) – The label used to reference the distance matrix stored in keeper.distances, of size (n_observations, n_observations).
root ({None, int, 'density', 'density_inv', 'ratio'}) – The root. If None, 'density' is used.
Options:
int : index of observation
'density' : select observation with minimal distance-density
'density_inv' : select observation with maximal distance-density
'ratio' : select observation which leads to maximal triangular ratio distance
root_as_tip (bool) – If True, force first tip as the root. Defaults to False following scanpy implementation.
min_branch_size ({int, float}) – During recursive splitting of branches, only consider splitting a branch with at least min_branch_size > 2 data points. If a float, min_branch_size refers to the fraction of the total number of data points (0 < min_branch_size < 1).
choose_largest_segment (bool) – If True, select largest segment for branching.
flavor ({'haghverdi16', 'wolf17_tri', 'wolf17_bi', 'wolf17_bi_un'}) – Branching algorithm (based on scanpy implementation).
allow_kendall_tau_shift (bool) – If a very small branch is detected upon splitting, shift away from maximum correlation in Kendall tau criterion of [Haghverdi16] to stabilize the splitting.
smooth_corr (bool, default = True) – If True, smooth correlations before identifying cut points for branch splitting.
brute (bool) – If True, data points not associated with any branch upon split are combined with undecided (trunk) points. Otherwise, if False, they are treated as individual islands, not associated with any branch (and assigned branch index -1).
split (bool (default = True)) – If True, split segment into multiple branches. Otherwise, determine a single branching off of the main segment. This is ignored if flavor is not 'haghverdi16'. If True, brute is ignored.
connect_closest (bool (default = False)) – If True, connect branches by points with smallest distance between the branches. Otherwise, connect by continuum of ordering.
connect_trunk ({'classic', 'endpoint', 'dual'}, default = 'classic') – Specify how to connect segments to the unresolved/unidentified trunk. Note, this only applies when a split results in a trunk consisting of unresolved/unidentified points. Additionally, this is ignored if flavor != 'haghverdi16'. It is also ignored if flavor = 'haghverdi16' and split = False.
Options:
classic : point identified in trunk is connected to the point in the segment closest to it
endpoint : point identified in trunk is connected to the segment's second tip
dual : point identified in trunk is connected to both points determined by classic and endpoint
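The two readings of min_branch_size (an absolute count or a fraction of all data points) can be illustrated with a small helper. This is only a sketch: resolve_min_branch_size is a hypothetical name that is not part of netflow, and rounding the fractional case to the nearest integer is an assumption.

```python
def resolve_min_branch_size(min_branch_size, n_observations):
    """Hypothetical helper (not part of netflow): return an absolute point count."""
    if isinstance(min_branch_size, float):
        if not 0 < min_branch_size < 1:
            raise ValueError("a float min_branch_size must satisfy 0 < value < 1")
        # Assumption: the fraction is rounded to the nearest whole number of points.
        return int(round(min_branch_size * n_observations))
    if min_branch_size <= 2:
        raise ValueError("an int min_branch_size must be greater than 2")
    return int(min_branch_size)


absolute = resolve_min_branch_size(5, n_observations=200)
fractional = resolve_min_branch_size(0.05, n_observations=200)
```

With 200 observations, an integer value is used as given, while a fraction of 0.05 corresponds to 10 data points under the rounding assumption.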
- __detect_branching_haghverdi16(Dseg: ndarray, tips: ndarray) → ndarray#
Detect branching on given segment.
Compute the point that maximizes the Kendall tau correlation of the sequences of distances to the second and the third tip, respectively, when 'moving away' from the first tip, tips[0]. 'Moving away' means moving in the direction of increasing distance from the first tip.
- Parameters:
Dseg – The distance matrix restricted to segment.
tips – The three tip points in local coordinates to the segment. They form a ‘triangle’ that contains the data.
- Returns:
branch – Segment obtained from “splitting away the first tip data point”, where k is the number of data points in the branch.
- Return type:
numpy.ndarray (k,)
- __init__(keeper, key, root=None, root_as_tip=False, min_branch_size=5, choose_largest_segment=False, flavor='haghverdi16', allow_kendall_tau_shift=False, smooth_corr=True, brute=True, split=True, connect_closest=False, connect_trunk='classic', verbose=None)[source]#
- __module__ = 'netflow.pose.organization'#
- _construct_mst_topology(annotate=True)[source]#
Construct MST backbone topology graph.
- Parameters:
annotate (bool) – If True, add observation label as node attribute, referenced by ‘name’.
- Returns:
Gmst – Graph where each node is a data point and edges reflect MST connections between them.
- Return type:
networkx.Graph
- _construct_topology(segs, annotate=True)[source]#
Construct POSE connections between data points.
- Parameters:
segs (dict) – The branched segments indexed by the node's unique identifier.
annotate (bool) – If True, annotate edges with edge origin and distance.
- Returns:
G – Graph where each node is a data point and edges reflect connections between them. If annotate is True, the following annotations are added:
Edges have attributes:
'connection' : (str) 'intra-branch' or 'inter-branch'
Nodes have attributes:
'branch' : (int) -1, 0, 1, … where -1 indicates the data point was not identified with a branch
'undecided' : (bool) True if the data point is part of a trunk and False otherwise
'name' : (str) Original label if given data was a dataframe, otherwise the same as the node id
'unidentified' : (0 or 1) 1 if data point was ever not associated with any branch upon split, 0 otherwise.
- Return type:
networkx.Graph
- _detect_branch(Dseg: ndarray, tips: ndarray)[source]#
Detect branching on given segment.
If self.split, call function __detect_branching three times for all three orderings of tips. Points that do not belong to the same segment in all three orderings are assigned to a fourth segment. The latter is referred to by Haghverdi et al. (2016) as 'undecided points' (which make up the so-called 'trunk'). Otherwise, only the branch off the main segment is detected from the third tip.
If split and flavor == 'haghverdi16': if any of the branches from the three orderings consists of zero unique observations, resulting in an empty branch, the process is terminated and no branching is performed on the current segment.
Note
In practice, this has only occurred in small segments. If finer resolution partitioning is desired, this may be changed in a future release to account for an offshoot resulting in two branches (and possibly a trunk with undecided points).
- Parameters:
Dseg – The distance matrix restricted to segment.
tips (numpy.ndarray) – Tips in local coordinates relative to the segment.
- Returns:
ssegs (list[list]) – Stores branched segments in local coordinates.
ssegs_tips (list[list]) – Stores all tip points in local coordinates for the segments in ssegs.
ssegs_connects (list[list]) – A list of k lists, where k is the number of inter-segment connections between the segments in ssegs. Each entry is a 2-list of the form [[index of first seg in ssegs, index of second seg in ssegs], [source observation, target observation]].
trunk (int) – Index of the segment in ssegs that all other segments in ssegs stem from.
trunk_undecided (bool) – If True, the trunk is made up of undecided points.
unidentified_points (set) – Points in local coordinates relative to the segment before branching that are not associated with any branch after splitting.
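The bookkeeping described above (unique branches, undecided trunk points, and points left without a branch) can be sketched with plain sets. This is a simplified illustration, not the method's actual code: partition_candidates is a hypothetical name, and treating a point as 'undecided' when more than one candidate segment claims it is an assumption based on the description.

```python
def partition_candidates(candidates, n_points, brute=True):
    """Split candidate segments into unique branches, a trunk, and leftovers.

    candidates: list of iterables of local point indices, one per tip ordering.
    """
    counts = [0] * n_points
    for seg in candidates:
        for p in set(seg):
            counts[p] += 1
    # Assumption: undecided points are those claimed by more than one candidate.
    undecided = {p for p, c in enumerate(counts) if c > 1}
    # Points claimed by no candidate at all.
    unidentified = {p for p, c in enumerate(counts) if c == 0}
    branches = [sorted(set(seg) - undecided) for seg in candidates]
    trunk = set(undecided)
    if brute:
        # With brute=True, leftover points are merged into the trunk.
        trunk |= unidentified
    return branches, sorted(trunk), unidentified


candidates = [[0, 1, 2, 3], [3, 4, 5], [5, 6, 0]]
branches, trunk, unidentified = partition_candidates(candidates, n_points=9)
```

Here points 0, 3 and 5 are shared between candidates and form the trunk, while points 7 and 8 belong to no candidate and join the trunk only because brute is True.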
- _detect_branching_single_haghverdi16(Dseg: ndarray, tips: ndarray)[source]#
Detect branching on given segment.
- Parameters:
Dseg – The distance matrix restricted to segment.
tips – Tips in local coordinates relative to the segment.
- Returns:
ssegs – The branched segments.
- Return type:
list[numpy.ndarray]
- _kendall_tau_add(len_old: int, diff_pos: int, tau_old: float)[source]#
Compute Kendall tau delta.
The new sequence has length len_old + 1.
- Parameters:
len_old – The length of the old sequence, used to compute tau_old.
diff_pos – Difference between concordant and non-concordant pairs.
tau_old – Kendall rank correlation of the old sequence.
- _kendall_tau_diff(a: ndarray, b: ndarray, i) → Tuple[int, int][source]#
Compute difference in concordance of pairs in split sequences.
Consider splitting a and b at index i.
- Parameters:
a (numpy.ndarray) – One-dimensional sequence.
b (numpy.ndarray) – One-dimensional sequence.
i (int) – Index for splitting a and b.
- Returns:
diff_pos – Difference between concordant pairs for both subsequences.
diff_neg – Difference between non-concordant pairs for both subsequences.
- _kendall_tau_subtract(len_old: int, diff_neg: int, tau_old: float)[source]#
Compute Kendall tau delta.
The new sequence has length len_old - 1.
- Parameters:
len_old – The length of the old sequence, used to compute tau_old.
diff_neg – Difference between concordant and non-concordant pairs.
tau_old – Kendall rank correlation of the old sequence.
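The incremental Kendall tau updates can be checked against a direct computation. The sketch below derives both deltas from the tau-a definition (concordant minus discordant pairs, divided by the number of pairs); whether the private methods use exactly these expressions is an assumption, and all function names here are hypothetical.

```python
def sign(x):
    return (x > 0) - (x < 0)


def kendall_tau_a(a, b):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(a)
    s = sum(sign(a[i] - a[j]) * sign(b[i] - b[j])
            for i in range(n) for j in range(i + 1, n))
    return s / (n * (n - 1) / 2)


def pair_balance(a, b, k):
    """Concordant minus discordant count over the pairs that involve index k."""
    return sum(sign(a[k] - a[j]) * sign(b[k] - b[j])
               for j in range(len(a)) if j != k)


def tau_delta_add(len_old, diff_pos, tau_old):
    # Change in tau when one element is appended (new length len_old + 1).
    return 2.0 / (len_old + 1) * (diff_pos / len_old - tau_old)


def tau_delta_subtract(len_old, diff_neg, tau_old):
    # Change in tau when one element is removed (new length len_old - 1).
    return 2.0 / (len_old - 2) * (tau_old - diff_neg / (len_old - 1))


a = [1, 3, 2, 5, 4, 6]
b = [2, 1, 4, 3, 6, 5]

# Append the last element to the length-5 prefix.
tau_prefix = kendall_tau_a(a[:5], b[:5])
grown = tau_prefix + tau_delta_add(5, pair_balance(a, b, 5), tau_prefix)

# Remove the first element from the full length-6 sequence.
tau_full = kendall_tau_a(a, b)
shrunk = tau_full + tau_delta_subtract(6, pair_balance(a, b, 0), tau_full)
```

Updating tau this way costs one pass over the pairs that involve the moved element, instead of recomputing all pairs at every candidate split index.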
- branchings_segments(n_branches, until_branched=False, annotate=True)[source]#
Detect up to n_branches branches and partition the data into corresponding segments.
- Parameters:
n_branches (int) – Number of branches to look for (n_branches > 0).
until_branched (bool) – If True, iteratively find a segment to branch and perform branching until a segment is successfully branched or no branchable segments remain. Otherwise, if False, attempt to perform branching only once on the next potentially branchable segment.
Note
This is only applicable when branching is being performed. If previous iterations of branching have already been performed, it is not possible to identify the number of iterations where no branching was performed.
annotate (bool) – If True, annotate nodes with root and tips.
- Returns:
G – The graph of the resulting POSE.
- Return type:
nx.Graph
- construct_pose_mst_nn_topology(G, mutual=False, k_mnn=1, annotate=True)[source]#
Add nearest neighbor (nn) edges to MST POSE topology.
Note
Mutual nns tend to be sparser than nns, so more than just the first nn may be considered when restricting to mutual neighbors.
- Parameters:
G (networkx.Graph) – Nearest-neighbor edges are added to a copy of the MST POSE graph.
mutual (bool (default = False)) – If True, add k_mnn mutual nn edges. Otherwise, add single nn edge. When False, k_mnn is ignored.
k_mnn (int (0 < k_mnn < len(G))) – The number of nns to consider when extracting mutual nns. Note, this is ignored when mutual is False.
annotate (bool) – If True, annotate edges.
- Returns:
Gnn – The updated graph with nearest neighbor edges. If annotate is True, edge attribute "edge_origin" is added with the possible values:
"POSE" : for edges in the original MST graph that are not nearest neighbor edges
"NN" : for nearest neighbor edges that were not in the original MST graph
"POSE + NN" : for edges in the original MST graph that are also nearest neighbor edges
- Return type:
networkx.Graph
- construct_pose_mst_topology(G)[source]#
Construct pose topology with minimum spanning tree (MST) edges.
- Parameters:
G (networkx.Graph) – The POSE graph. MST edges are added to a copy of the graph.
- Returns:
Gmst – The updated graph with MST edges and edge attribute “edge_origin” with the possible values :
"POSE" : for edges in the original graph that are not MST edges
"MST" : for MST edges that were not in the original graph
"POSE + MST" : for edges in the original graph that are also MST edges
- Return type:
networkx.Graph
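The "edge_origin" labeling can be illustrated without networkx. The sketch below computes a minimum spanning tree with Prim's algorithm on a dense distance matrix and labels each edge by where it came from. It assumes the MST is taken over the pairwise distances between all data points; prim_mst_edges and label_edge_origin are hypothetical names, not netflow functions.

```python
def prim_mst_edges(D):
    """Return MST edges (i, j) with i < j for a dense symmetric distance matrix."""
    n = len(D)
    in_tree = {0}
    edges = set()
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree.
        i, j = min(((u, v) for u in in_tree for v in range(n) if v not in in_tree),
                   key=lambda e: D[e[0]][e[1]])
        edges.add((min(i, j), max(i, j)))
        in_tree.add(j)
    return edges


def label_edge_origin(pose_edges, mst_edges):
    """Label every edge of the union by its origin."""
    labels = {}
    for e in pose_edges | mst_edges:
        if e in pose_edges and e in mst_edges:
            labels[e] = "POSE + MST"
        elif e in pose_edges:
            labels[e] = "POSE"
        else:
            labels[e] = "MST"
    return labels


points = [0.0, 1.0, 3.0, 7.0]
D = [[abs(p - q) for q in points] for p in points]
pose_edges = {(0, 1), (0, 2)}  # toy POSE edges, chosen for illustration
labels = label_edge_origin(pose_edges, prim_mst_edges(D))
```

For these four points on a line the MST is the chain 0-1-2-3, so edge (0, 1) is shared, (0, 2) comes only from the toy POSE graph, and (1, 2) and (2, 3) come only from the MST.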
- construct_pose_nn_topology(G, mutual=False, k_mnn=3, annotate=True)[source]#
Add nearest neighbor (nn) edges to POSE topology.
Note
Mutual nns tend to be sparser than nns, so more than just the first nn may be considered when restricting to mutual neighbors.
- Parameters:
G (networkx.Graph) – Nearest-neighbor edges are added to a copy of the POSE graph.
mutual (bool (default = False)) – If True, add k_mnn mutual nn edges. Otherwise, add single nn edge. When False, k_mnn is ignored.
k_mnn (int (0 < k_mnn < len(G))) – The number of nns to consider when extracting mutual nns. Note, this is ignored when mutual is False.
annotate (bool) – If True, annotate edges.
- Returns:
Gnn – The updated graph with nearest neighbor edges. If annotate is True, edge attribute "edge_origin" is added with the possible values:
"POSE" : for edges in the original graph that are not nearest neighbor edges
"NN" : for nearest neighbor edges that were not in the original graph
"POSE + NN" : for edges in the original graph that are also nearest neighbor edges
- Return type:
networkx.Graph
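The difference between single nearest-neighbor edges and mutual nearest-neighbor edges can be sketched on a plain distance matrix. This is an illustration under assumptions: nearest_neighbor_edges is a hypothetical name, and "mutual" is taken to mean that each endpoint lies among the other's k_mnn nearest neighbors.

```python
def nearest_neighbor_edges(D, mutual=False, k_mnn=3):
    """Return undirected nn edges (i, j) with i < j from a dense distance matrix."""
    n = len(D)
    k = k_mnn if mutual else 1  # k_mnn is ignored when mutual is False
    neighbors = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: D[i][j])
        neighbors.append(set(others[:k]))
    edges = set()
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            # Assumption: a mutual edge requires i to be among j's neighbors too.
            if not mutual or i in neighbors[j]:
                edges.add((min(i, j), max(i, j)))
    return edges


points = [0.0, 1.0, 3.0, 7.0]
D = [[abs(p - q) for q in points] for p in points]
nn_edges = nearest_neighbor_edges(D)
mnn_edges = nearest_neighbor_edges(D, mutual=True, k_mnn=1)
```

With one neighbor per point, every point contributes an edge in the non-mutual case, but only the pair (0, 1) is mutual, which shows why a larger k_mnn is useful when mutual is True.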
- detect_branches(n_branches, until_branched=False)[source]#
Detect up to n_branches branchings and update tree in place.
- Parameters:
n_branches (int) – Number of branch splits to perform (n_branches > 0).
until_branched (bool) – If True, iteratively find a segment to branch and perform branching until a segment is successfully branched or no branchable segments remain. Otherwise, if False, attempt to perform branching only once on the next potentially branchable segment.
Note
This is only applicable when branching is being performed. If previous iterations of branching have already been performed, it is not possible to identify the number of iterations where no branching was performed.
- detect_branching(node)[source]#
Detect branching on a given segment and update TreeNode parameters in place.
- Parameters:
node (TreeNode) – The node of the segment to be branched.
- Returns:
updated – True if segment is successfully branched, False otherwise.
- Return type:
bool
- extract_branchings(n_branches)[source]#
Extract POSE from up to n_branches branchings.
- Parameters:
n_branches (int) – Number of branches to look for (n_branches > 0).
- Returns:
tree – The tree with up to n_branches branchings. If n_branches is greater than or equal to the number of branchings in the tree, the original tree is returned. Otherwise, a reduced tree is returned.
- Return type:
Tree
- identify_local_tips(Dseg, newseg, tip)[source]#
Identify new tips within the new segments.
- Parameters:
newseg (list) – New segment (local with respect to original segment).
tip (int) – Local index of the first tip, with respect to the original segment that determined Dseg before the split.
- Returns:
tips – First and second tip indices in local coordinates relative to the original segment, before it was branched.
- Return type:
np.ndarray (2,)
- kendall_tau_split(a, b, min_length=5) → int[source]#
Return splitting index that maximizes correlation in the sequences.
Compute difference in Kendall tau for all split sequences.
For each splitting index i, compute the difference of the two correlation measures kendalltau(a[:i], b[:i]) and kendalltau(a[i:], b[i:]).
Returns the splitting index that maximizes kendalltau(a[:i], b[:i]) - kendalltau(a[i:], b[i:]).
- Parameters:
a (numpy.ndarray) – One-dimensional sequence.
b (numpy.ndarray) – One-dimensional sequence.
min_length (int, (min_length > 0)) – Minimum number of data points automatically included in branch.
- Returns:
imax – Splitting index according to above description.
- Return type:
int
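The split criterion can be written as a brute-force search that recomputes Kendall tau on both sides of every candidate index. This is a reference sketch rather than the incremental method used in practice: it uses tau-a (no tie correction), assumes min_length is at least 2 and that both sides of the split keep at least min_length points, and the function names are hypothetical.

```python
def sign(x):
    return (x > 0) - (x < 0)


def kendall_tau_a(a, b):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(a)
    s = sum(sign(a[i] - a[j]) * sign(b[i] - b[j])
            for i in range(n) for j in range(i + 1, n))
    return s / (n * (n - 1) / 2)


def kendall_tau_split_brute(a, b, min_length=5):
    """Index i maximizing tau(a[:i], b[:i]) - tau(a[i:], b[i:])."""
    n = len(a)
    # Assumption: both sides of the split keep at least min_length points.
    candidates = range(min_length, n - min_length + 1)
    return max(candidates,
               key=lambda i: kendall_tau_a(a[:i], b[:i]) - kendall_tau_a(a[i:], b[i:]))


# b rises together with a up to index 5, then falls while a keeps rising.
a = list(range(12))
b = [0, 1, 2, 3, 4, 5, 4.5, 3.5, 2.5, 1.5, 0.5, -0.5]
imax = kendall_tau_split_brute(a, b, min_length=3)
```

The selected index sits at the turning point of b, where the first part is perfectly concordant and the second part perfectly discordant.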
- select_segment()[source]#
Select segment with most distant triangulated data point.
- Returns:
node – The node corresponding to the selected segment. If no nodes are branchable, returns None.
- Return type:
TreeNode
- single_branch(until_branched=False)[source]#
Perform single branching in place.
- Parameters:
until_branched (bool) – If True, iteratively find a segment to branch and perform branching until a segment is successfully branched or no branchable segments remain. Otherwise, if False, attempt to perform branching only once on the next potentially branchable segment.
- Returns:
branched_flag – Indicates if branching was successfully completed.
- Return type:
bool
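The effect of until_branched can be sketched as a small loop. Everything here is a stand-in: segments are plain dicts rather than TreeNode objects, the next candidate is simply the first branchable one rather than the result of select_segment, and marking a failed segment as no longer branchable is an assumption.

```python
def single_branch(segments, try_branch, until_branched=False):
    """Return True if some segment was branched, False otherwise.

    segments: list of dicts with a 'branchable' flag (hypothetical stand-in).
    try_branch: callable returning True if the given segment was split.
    """
    while True:
        node = next((s for s in segments if s["branchable"]), None)
        if node is None:
            return False  # no branchable segments remain
        if try_branch(node):
            return True
        node["branchable"] = False  # assumption: a failed segment is not retried
        if not until_branched:
            return False


def make_segments():
    return [{"name": "s0", "branchable": True, "size": 3},
            {"name": "s1", "branchable": True, "size": 40}]


def big_enough(seg):
    # Toy branching criterion used only for this illustration.
    return seg["size"] >= 5


once = single_branch(make_segments(), big_enough, until_branched=False)
persistent = single_branch(make_segments(), big_enough, until_branched=True)
```

With until_branched=False the attempt stops after the first (too small) segment fails, whereas until_branched=True moves on to the next candidate and succeeds.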
- class netflow.pose.organization.Tree[source]#
Tree implementation as a collection of TreeNode objects.
Intended to represent the hierarchical branching.
- __module__ = 'netflow.pose.organization'#
- _get_node_from_counter(counter)[source]#
Search and return node in Tree by its counter ID.
Assumes no nodes have the same counter ID.
If no such node is found with the specified counter, None is returned.
- Parameters:
counter (int) – Counter ID of node to search for.
- Returns:
node – Node in the tree. If node is not found, returns None.
- Return type:
TreeNode
- _search_counter(counter)[source]#
Search and return index of node in Tree by its counter ID.
Assumes no nodes have the same counter ID.
If no such node is found with the specified counter, the value -1 is returned.
- Parameters:
counter (int) – Counter ID of node to search for.
- Returns:
index – Index of node in the tree. If node is not found, returns -1.
- Return type:
int
- co_branch_indicator()[source]#
Return a binary symmetric pandas.DataFrame of size (num_data_points, num_data_points) where the (i,j)-th entry is 1 if the i-th and j-th data points are found in the same node (i.e., branch) and i is not the same data point as j. Otherwise, if i = j, or if the i-th and j-th data points are not found in the same node, the (i,j)-th entry is 0.
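The indicator matrix can be sketched from a list of branch memberships. This simplified version returns nested lists rather than a pandas.DataFrame and assumes each branch is given as a list of data point indices; it is an illustration, not the method's implementation.

```python
def co_branch_indicator(branches, n_points):
    """Binary matrix with 1 where two distinct points share a branch."""
    M = [[0] * n_points for _ in range(n_points)]
    for members in branches:
        for i in members:
            for j in members:
                if i != j:  # the diagonal stays 0
                    M[i][j] = 1
    return M


branches = [[0, 1, 2], [3, 4]]
M = co_branch_indicator(branches, n_points=5)
```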
- get_node_from_name(name, bottom_up=True)[source]#
Search and return node in Tree by its name.
Assumes no nodes at the same depth have the same name. If more than one node has the same name, return the deepest node (farthest from root) when bottom_up = True; otherwise, return the shallowest (closest to root) node.
If no such node is found with the specified name, None is returned.
- Parameters:
name – Name of node to search for.
bottom_up (bool) – Indicate whether the deepest (True) or shallowest (False) node should be returned when more than one node has the same name. It is assumed that no two nodes at the same depth have the same name.
- Returns:
node – Node in the tree. If node is not found, returns None.
- Return type:
TreeNode
- insert(node, index=None, parent=None)[source]#
Insert a node into the Tree.
- Parameters:
node (TreeNode) – Node to insert.
index ({None, int}) –
Index in list of nodes where the node should be inserted. (Intended to match current structure for updating segments until tree structure is fully leveraged, e.g., using tree leaf nodes when searching for which segment to select.)
If None, the node is appended to the end of the list.
parent ({None, TreeNode}) – Parent node. If None, node is set as the root node.
- search(name, bottom_up=True)[source]#
Search and return index of node in Tree by its name.
Assumes no nodes at the same depth have the same name. If more than one node has the same name, return the index of the deepest node (farthest from root) when bottom_up = True; otherwise, return the index of the shallowest (closest to root) node.
If no such node is found with the specified name, the value -1 is returned.
- Parameters:
name – Name of node to search for.
bottom_up (bool) – Indicate if the index of the shallowest or deepest node should be returned when more than one node has the same name. It is assumed that no two nodes at the same depth have the same name.
- Returns:
index – Index of node in the tree. If node is not found, returns -1.
- Return type:
int
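The bottom_up rule can be illustrated with a minimal stand-in for the tree. MiniNode and this search function are hypothetical and only mirror the documented behavior (deepest match when bottom_up is True, shallowest otherwise, -1 when nothing matches).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MiniNode:
    """Hypothetical stand-in for TreeNode with only a name and a parent."""
    name: str
    parent: Optional["MiniNode"] = None

    @property
    def depth(self):
        d, node = 0, self
        while node.parent is not None:
            d, node = d + 1, node.parent
        return d


def search(nodes, name, bottom_up=True):
    """Return the index of the matching node, or -1 if there is none."""
    matches = [i for i, node in enumerate(nodes) if node.name == name]
    if not matches:
        return -1
    pick = max if bottom_up else min
    return pick(matches, key=lambda i: nodes[i].depth)


root = MiniNode("A")
child = MiniNode("B", parent=root)
grandchild = MiniNode("A", parent=child)
nodes = [root, child, grandchild]

deepest = search(nodes, "A")
shallowest = search(nodes, "A", bottom_up=False)
missing = search(nodes, "Z")
```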
- search_data(value, bottom_up=True)[source]#
Search and return index of node in Tree with value in node data.
If the value is in the data of more than one node, return the index of the deepest node (farthest from root) when bottom_up = True; otherwise, return the index of the shallowest (closest to root) node.
If no such node is found with the specified value in its data, the value -1 is returned.
- Parameters:
value – Value in node data to search for.
bottom_up (bool) – Indicate if the index of the shallowest or deepest node should be returned when the value is in the data of more than one node. It is assumed that no two nodes at the same depth have the same name.
- Returns:
index – Index of node in the tree. If node is not found, returns -1.
- Return type:
int
- class netflow.pose.organization.TreeNode(name='root', data=None, children=None, parent=None, nonunique=None, unidentified=None, branchable=True, is_trunk=None)[source]#
Node of a general tree data structure.
Each node is intended to refer to a branch.
- Parameters:
name – Reference name of node (branch).
data – Data associated with the node. Intended to be a list of indices corresponding to the branch members.
children (list [TreeNode]) – List of children TreeNode objects.
parent (TreeNode) – Parent TreeNode object.
nonunique (bool) – Indicate if node (branch) is the trunk.
unidentified (bool) – Indicate if node (branch) is a set of points that were not identified with a particular branch after splitting.
branchable (bool) – Indicate if node can potentially be further branched.
is_trunk (bool) – Indicate if node refers to undecided trunk branch.
- __init__(name='root', data=None, children=None, parent=None, nonunique=None, unidentified=None, branchable=True, is_trunk=None)[source]#
- __module__ = 'netflow.pose.organization'#
- netflow.pose.organization._compute_transitions(similarity=None, density_normalize: bool = True)[source]#
Compute transition matrix.
- Parameters:
similarity (numpy.ndarray, (n_observations, n_observations)) – Symmetric similarity measure (with 1s on the diagonal).
density_normalize (bool) – If True, apply the density rescaling of Coifman and Lafon (2006), so that only the geometry of the data matters, not the sampled density.
- Returns:
transitions_asym (numpy.ndarray, (n_observations, n_observations)) – Asymmetric transition matrix.
transitions_sym (numpy.ndarray, (n_observations, n_observations)) – Symmetric transition matrix.
Notes
Code copied from scanpy.neighbors.
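The density rescaling and the two normalizations can be sketched with NumPy. This follows the standard diffusion-maps construction and is not claimed to reproduce _compute_transitions line by line; transitions_from_similarity is a hypothetical name, and any special handling of the diagonal is left out.

```python
import numpy as np


def transitions_from_similarity(W, density_normalize=True):
    """Return (asymmetric, symmetric) transition matrices from a similarity W."""
    W = np.asarray(W, dtype=float)
    if density_normalize:
        q = W.sum(axis=0)
        K = W / np.outer(q, q)  # Coifman-Lafon style density rescaling
    else:
        K = W
    d = K.sum(axis=1)
    T_asym = K / d[:, None]              # row-normalized: rows sum to 1
    T_sym = K / np.sqrt(np.outer(d, d))  # symmetric counterpart
    return T_asym, T_sym


x = np.array([0.0, 0.5, 2.0, 2.2])
W = np.exp(-(x[:, None] - x[None, :]) ** 2)  # toy similarity with 1s on the diagonal
T_asym, T_sym = transitions_from_similarity(W)
```

The asymmetric matrix is a proper random-walk operator, while the symmetric one is convenient for eigendecomposition because its spectrum is real.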
- netflow.pose.organization._dpt_from_augmented_sym_transitions(T, n_comps: int = 0, return_eigs=False)[source]#
Return the diffusion pseudotime metric between observations, computed from the symmetric transitions.
Note
\(T\) is the symmetric transition matrix
\(M(x,z) = \sum_{i=1}^{n-1} (\lambda_i * (1 - \lambda_i))\psi_i(x)\psi_i^T(z)\)
\(dpt(x,z) = ||M(x, .) - M(z, .)||^2\)
- Parameters:
T (numpy.ndarray, (n_observations, n_observations)) – Symmetric transitions.
n_comps – Number of eigenvalues/vectors to be computed; set n_comps = 0 to compute the whole spectrum. Alternatively, if n_comps >= n_observations, the whole spectrum will be computed.
- Returns:
dpt – Pairwise-observation Diffusion pseudotime distances.
- Return type:
numpy.ndarray, (n_observations, n_observations)
- netflow.pose.organization.compute_multiscale_VNE_transitions_from_similarity(keeper, similarity_key, tau_max=None, do_save=True)[source]#
Compute the multi-scale transition matrix based on the elbow of the Von Neumann Entropy (VNE) as described in GSPA and PHATE KrishnaswamyLab/spARC, https://pdfs.semanticscholar.org/16ab/e92b7630d5b84b904bde97dad9b9fbce406c.pdf.
- Parameters:
keeper (netflow.Keeper) – The keeper object.
similarity_key (str) – Reference key to the numpy.ndarray, (n_observations, n_observations) symmetric similarity measure (with 1s on the diagonal) stored in the similarities in the keeper.
tau_max (int) – Max scale tau tested for VNE (default is 100).
do_save (bool) – If True, save to keeper.
- Returns:
P (numpy.ndarray (n_observations, n_observations)) – The symmetric VNE multi-scale transition matrix (with 0s on the diagonals). If do_save is True, P is added to keeper.misc with the key 'transitions_sym_multiscaleVNE_{similarity_key}'.
P_asym (numpy.ndarray (n_observations, n_observations)) – The random-walk VNE multi-scale transition matrix (with 0s on the diagonals). If do_save is True, P_asym is added to keeper.misc with the key 'transitions_multiscaleVNE_{similarity_key}'.
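The idea of choosing a scale from the elbow of the Von Neumann Entropy can be sketched as follows. This is a loose illustration only: the entropy is computed from the normalized powers of the eigenvalues of a symmetric transition matrix, and the elbow is taken as the point farthest from the chord joining the ends of the curve, which is an assumed heuristic. How the multi-scale matrix is then assembled from the selected scale is not shown, and both function names are hypothetical.

```python
import numpy as np


def von_neumann_entropy_curve(P_sym, tau_max=100):
    """Entropy of the normalized eigenvalue spectrum of P_sym**t for t = 1..tau_max."""
    lam = np.abs(np.linalg.eigvalsh(P_sym))
    entropies = []
    for t in range(1, tau_max + 1):
        p = lam ** t
        p = p / p.sum()
        p = p[p > 0]  # treat 0 * log(0) as 0
        entropies.append(float(-(p * np.log(p)).sum()))
    return np.array(entropies)


def knee_index(y):
    """Index of the point farthest from the chord joining the curve's endpoints."""
    x = np.arange(len(y), dtype=float)
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    dist = np.abs(dy * (x - x[0]) - dx * (y - y[0])) / np.hypot(dx, dy)
    return int(np.argmax(dist))


x = np.linspace(0.0, 3.0, 8)
K = np.exp(-(x[:, None] - x[None, :]) ** 2)  # toy similarity
d = K.sum(axis=1)
P_sym = K / np.sqrt(np.outer(d, d))          # symmetric transition matrix
vne = von_neumann_entropy_curve(P_sym, tau_max=50)
tau = knee_index(vne) + 1                    # scales start at t = 1
```

The entropy decreases as t grows because higher powers concentrate the spectrum on the leading eigenvalues; the elbow marks where further diffusion removes little additional information.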
- netflow.pose.organization.compute_rw_transitions(keeper, similarity_key, do_save=True)[source]#
Compute the row-stochastic transition matrix.
- Parameters:
keeper (netflow.Keeper) – The keeper object.
similarity_key (str) – Reference key to the numpy.ndarray, (n_observations, n_observations) symmetric similarity measure (with 1s on the diagonal) stored in the similarities in the keeper.
do_save (bool) – If True, save to keeper.
- Returns:
P – The row-stochastic transition matrix (with 0s on the diagonals). If do_save is True, P is added to keeper.misc with the key 'transitions_rw_{similarity_key}'.
- Return type:
numpy.ndarray (n_observations, n_observations)
- netflow.pose.organization.compute_sym_diffusion_affinity_transitions(keeper, similarity_key, do_save=True)[source]#
Compute the symmetric diffusion affinity transition matrix from KrishnaswamyLab/graphtools.
\[P_{ij} = K_{ij} * (d_i * d_j)^{-1/2}\]
where \(d_i = \sum_r K_{ir}\) is the degree (row sum) of observation \(i\).
- Parameters:
keeper (netflow.Keeper) – The keeper object.
similarity_key (str) – Reference key to the numpy.ndarray, (n_observations, n_observations) symmetric similarity measure (with 1s on the diagonal) stored in the similarities in the keeper.
do_save (bool) – If True, save to keeper.
- Returns:
P – The symmetric diffusion affinity transition matrix (with 0s on the diagonals). If do_save is True, P is added to keeper.misc with the key 'transitions_sym_diff_aff_{similarity_key}'.
- Return type:
numpy.ndarray (n_observations, n_observations)
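The formula for the symmetric diffusion affinity can be written directly in NumPy. The sketch below applies only the stated normalization; sym_diffusion_affinity is a hypothetical name, and the zeroing of the diagonal mentioned in the return description is not included because its exact order relative to the normalization is not stated.

```python
import numpy as np


def sym_diffusion_affinity(K):
    """P_ij = K_ij / sqrt(d_i * d_j), with d_i the row sum of K."""
    K = np.asarray(K, dtype=float)
    d = K.sum(axis=1)
    return K / np.sqrt(np.outer(d, d))


K = np.array([[1.0, 0.5, 0.1],
              [0.5, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
P = sym_diffusion_affinity(K)

# The row-stochastic matrix D^{-1} K shares its eigenvalues with P.
rw = K / K.sum(axis=1, keepdims=True)
```

Because P is a symmetric conjugate of the row-stochastic matrix, it has the same eigenvalues while allowing a symmetric eigensolver to be used.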
- netflow.pose.organization.compute_transitions(keeper, similarity_key, density_normalize: bool = True)[source]#
Compute symmetric and asymmetric transition matrices and store in keeper.
- Parameters:
keeper (netflow.Keeper) – The keeper object.
similarity_key (str) – Reference key to the numpy.ndarray, (n_observations, n_observations) symmetric similarity measure (with 1s on the diagonal) stored in the similarities in the keeper.
density_normalize (bool) – If True, apply the density rescaling of Coifman and Lafon (2006), so that only the geometry of the data matters, not the sampled density.
- Returns:
Adds the following to keeper.misc (with 0s on the diagonals):
transitions_asym_{similarity_key} (numpy.ndarray, (n_observations, n_observations)) – Asymmetric transition matrix.
transitions_sym_{similarity_key} (numpy.ndarray, (n_observations, n_observations)) – Symmetric transition matrix.
Notes
Code primarily copied from scanpy.neighbors.
- netflow.pose.organization.dpt_from_augmented_sym_transitions(keeper, key, n_comps: int = 0, save_eig=False)[source]#
Compute the diffusion pseudotime metric between observations, computed from the symmetric transitions.
Note
\(T\) is the symmetric transition matrix
\(M(x,z) = \sum_{i=1}^{n-1} (\lambda_i * (1 - \lambda_i))\psi_i(x)\psi_i^T(z)\)
\(dpt(x,z) = ||M(x, .) - M(z, .)||^2\)
- Parameters:
key (str) – Reference ID for the symmetric transitions numpy.ndarray, (n_observations, n_observations) stored in keeper.misc.
n_comps – Number of eigenvalues/vectors to be computed; set n_comps = 0 to compute the whole spectrum. Alternatively, if n_comps >= n_observations, the whole spectrum will be computed.
- Returns:
dpt – Pairwise-observation diffusion pseudotime distances are stored in keeper.distances[dpt_key], where dpt_key = "dpt_from_{key}". If the full spectrum is not used (i.e., 0 < n_comps < n_observations), then dpt_key = "dpt_from_{key}_{n_comps}comps".
- Return type:
numpy.ndarray, (n_observations, n_observations)
- netflow.pose.organization.get_pose(keeper, key, label, n_branches, until_branched=False, root=None, min_branch_size=5, choose_largest_segment=False, flavor='haghverdi16', allow_kendall_tau_shift=False, smooth_corr=False, brute=True, split=True, connect_closest=False, connect_trunk='classic', mutual=False, k_mnn=3, verbose=None)[source]#
Compute the POSE and save it to the keeper.
- Parameters:
keeper (netflow.Keeper) – The keeper object that stores the distance matrix of size (n_observations, n_observations).
key (str) – The label used to reference the distance matrix stored in keeper.distances, of size (n_observations, n_observations).
label (str) – Label used to store resulting schema in keeper.misc[label] and POSE topology in keeper.graphs[label].
n_branches (int) – Number of branch splits to perform (n_branches > 0).
until_branched (bool) – If True, iteratively find a segment to branch and perform branching until a segment is successfully branched or no branchable segments remain. Otherwise, if False, attempt to perform branching only once on the next potentially branchable segment.
root ({None, int, 'density', 'density_inv', 'ratio'}) – The root. If None, 'density' is used.
Options:
int : index of observation
'density' : select observation with minimal distance-density
'density_inv' : select observation with maximal distance-density
'ratio' : select observation which leads to maximal triangular ratio distance
min_branch_size ({int, float}) – During recursive splitting of branches, only consider splitting a branch with at least min_branch_size > 2 data points. If a float, min_branch_size refers to the fraction of the total number of data points (0 < min_branch_size < 1).
choose_largest_segment (bool) – If True, select largest segment for branching.
flavor ({'haghverdi16', 'wolf17_tri', 'wolf17_bi', 'wolf17_bi_un'}) – Branching algorithm (based on scanpy implementation).
allow_kendall_tau_shift (bool) – If a very small branch is detected upon splitting, shift away from maximum correlation in Kendall tau criterion of [Haghverdi16] to stabilize the splitting.
smooth_corr (bool, default = False) – If True, smooth correlations before identifying cut points for branch splitting.
brute (bool) – If True, data points not associated with any branch upon split are combined with undecided (trunk) points. Otherwise, if False, they are treated as individual islands, not associated with any branch (and assigned branch index -1).
split (bool (default = True)) – If True, split segment into multiple branches. Otherwise, determine a single branching off of the main segment. This is ignored if flavor is not 'haghverdi16'. If True, brute is ignored.
mutual (bool (default = False)) – If True, add k_mnn mutual nn edges. Otherwise, add single nn edge. When False, k_mnn is ignored.
k_mnn (int (0 < k_mnn < len(G))) – The number of nns to consider when extracting mutual nns. Note, this is ignored when mutual is False.
connect_closest (bool (default = False)) – If True, connect branches by points with smallest distance between the branches. Otherwise, connect by continuum of ordering.
connect_trunk ({'classic', 'endpoint', 'dual'}, default = 'classic') – Specify how to connect segments to the unresolved/unidentified trunk. Note, this only applies when a split results in a trunk consisting of unresolved/unidentified points. Additionally, this is ignored if flavor != 'haghverdi16'. It is also ignored if flavor = 'haghverdi16' and split = False.
Options:
classic : point identified in trunk is connected to the point in the segment closest to it
endpoint : point identified in trunk is connected to the segment's second tip
dual : point identified in trunk is connected to both points determined by classic and endpoint
- Returns:
Writes the following to the keeper:
poser (POSER) – The poser object with the pseudo-organizational branching structure, stored in keeper.misc['poser_{label}'].
G_pose (networkx.Graph) – The resulting pose topology, stored in keeper.graphs['pose_{label}'].
G_pose_nn (networkx.Graph) – The resulting pose + nearest-neighbor (nn) topology, stored in keeper.graphs['pose_nn_{label}'].
- netflow.pose.organization.root_max_ratio(keeper, key)[source]#
Return the root index of the observation that leads to the largest triangle.
- Parameters:
keeper (netflow.Keeper) – The keeper object.
key (str) – Reference key of distance in keeper used to determine the root.
- Returns:
root – The root index.
- Return type:
int