netflow.pose.organization#

organization#

Description

The purpose of this module is to construct the organization of the schema from a distance matrix and one or more data points designated as the source.

This is done by using the branch detection algorithm from the diffusion pseudo-time (DPT) algorithm for reconstructing developmental progression and differentiation of cells proposed in [Haghverdi16] as implemented in scanpy.

Acknowledgement

A large portion of the code was taken from scanpy.tools._dpt.py and code related to the method scanpy.tools._dpt.dpt.

Notable changes made to the scanpy implementation:

Add smoothing when computing the maximal correlation cutoff.

Include points not identified with any branch after a split in the trunk (nonunique).

To do:

Set branchable aspect of TreeNode.

Functions

`compute_multiscale_VNE_transitions_from_similarity`(...)	Compute the multi-scale transition matrix based on the elbow of the Von Neumann Entropy (VNE) as described in GSPA and PHATE KrishnaswamyLab/spARC, https://pdfs.semanticscholar.org/16ab/e92b7630d5b84b904bde97dad9b9fbce406c.pdf.
`compute_rw_transitions`(keeper, similarity_key)	Compute the row-stochastic transition matrix.
`compute_sym_diffusion_affinity_transitions`(...)	Compute the symmetric diffusion affinity transition matrix from KrishnaswamyLab/graphtools.
`compute_transitions`(keeper, similarity_key)	Compute symmetric and asymmetric transition matrices and store in keeper.
`dpt_from_augmented_sym_transitions`(keeper, key)	Compute the diffusion pseudotime metric between observations, computed from the symmetric transitions.
`get_pose`(keeper, key, label, n_branches[, ...])	Compute the pose and save it to the keeper.
`root_max_ratio`(keeper, key)	Return the root index of the observation that leads to the largest triangle.

Classes

POSER(keeper, key[, root, root_as_tip, ...])

Parameters:: keeper (netflow.Keeper) – The keeper object that stores the distance matrix of size (n_observations, n_observations).

Tree()

Tree implementation as a collection of TreeNode objects.

TreeNode([name, data, children, parent, ...])

Node of a general tree data structure.

class netflow.pose.organization.POSER(keeper, key, root=None, root_as_tip=False, min_branch_size=5, choose_largest_segment=False, flavor='haghverdi16', allow_kendall_tau_shift=False, smooth_corr=True, brute=True, split=True, connect_closest=False, connect_trunk='classic', verbose=None)[source]#

Parameters:

keeper (netflow.Keeper) – The keeper object that stores the distance matrix of size (n_observations, n_observations).
key (str) – The label used to reference the distance matrix stored in keeper.distances, of size (n_observations, n_observations).
root ({None, int, ‘density’, ‘density_inv’, ‘ratio’}) –
The root. If None, ‘density’ is used.

Options:
- int : index of observation
- ’density’ : select observation with minimal distance-density
- ’density_inv’ : select observation with maximal distance-density
- ’ratio’ : select observation which leads to maximal triangular ratio distance
root_as_tip (bool) – If True, force first tip as the root. Defaults to False following scanpy implementation.
min_branch_size ({int, float}) – During recursive splitting of branches, only consider splitting a branch with at least min_branch_size > 2 data points. If a float, min_branch_size refers to the fraction of the total number of data points (0 < min_branch_size < 1).
choose_largest_segment (bool) – If True, select largest segment for branching.
flavor ({'haghverdi16', 'wolf17_tri', 'wolf17_bi', 'wolf17_bi_un'}) – Branching algorithm (based on scanpy implementation).
allow_kendall_tau_shift (bool) – If a very small branch is detected upon splitting, shift away from maximum correlation in Kendall tau criterion of [Haghverdi16] to stabilize the splitting.
smooth_corr (bool, default = True) – If True, smooth correlations before identifying cut points for branch splitting.
brute (bool) – If True, data points not associated with any branch upon split are combined with undecided (trunk) points. Otherwise, if False, they are treated as individual islands, not associated with any branch (and assigned branch index -1).
split (bool (default = True)) – If True, split the segment into multiple branches. Otherwise, determine a single branching off of the main segment. This is ignored if flavor is not ‘haghverdi16’. If True, brute is ignored.
connect_closest (bool (default = False)) – If True, connect branches by points with smallest distance between the branches. Otherwise, connect by continuum of ordering.
connect_trunk ({'classic', 'endpoint', 'dual'}, default = 'classic') –
Specify how to connect segments to the unresolved/unidentified trunk. Note, this only applies when a split results in a trunk consisting of unresolved/unidentified points. Additionally, this is ignored if flavor != 'haghverdi16'. It is also ignored if flavor = 'haghverdi16' and split = False.

Options:
- classic : point identified in trunk is connected to the point in the segment closest to it
- endpoint : point identified in trunk is connected to the segment’s second tip
- dual : point identified in trunk is connected to both points determined by classic and endpoint

__detect_branching_haghverdi16(Dseg: ndarray, tips: ndarray) → ndarray#

Detect branching on given segment.

Compute the point that maximizes the Kendall tau correlation of the sequences of distances to the second and the third tip, respectively, when ‘moving away’ from the first tip: tips[0]. ‘Moving away’ means moving in the direction of increasing distance from the first tip.

Parameters:

Dseg – The distance matrix restricted to segment.
tips – The three tip points in local coordinates to the segment. They form a ‘triangle’ that contains the data.

Returns:

branch – Segment obtained from “splitting away the first tip data point”, where k is the number of data points in the branch.

Return type:

numpy.ndarray (k,)

__init__(keeper, key, root=None, root_as_tip=False, min_branch_size=5, choose_largest_segment=False, flavor='haghverdi16', allow_kendall_tau_shift=False, smooth_corr=True, brute=True, split=True, connect_closest=False, connect_trunk='classic', verbose=None)[source]#

__module__ = 'netflow.pose.organization'#

_construct_mst_topology(annotate=True)[source]#

Construct MST backbone topology graph.

Parameters:: annotate (bool) – If True, add observation label as node attribute, referenced by ‘name’.
Returns:: Gmst – Graph where each node is a data point and edges reflect MST connections between them.
Return type:: networkx.Graph

_construct_topology(segs, annotate=True)[source]#

Construct POSE connections between data points.

Parameters:

segs (dict) – The branched segments indexed by the node’s unique identifier.
annotate (bool) – If True, annotate edges with edge origin and distance.

Returns:

G – Graph where each node is a data point and edges reflect connections between them. If annotate is True, the following annotations are added:

Edges have attributes:
- ’connection’ : (str) ‘intra-branch’ or ‘inter-branch’
Nodes have attributes:
- ’branch’ : (int) -1, 0, 1, … where -1 indicates the data point was not identified with a branch
- ’undecided’ : (bool) True if the data point is part of a trunk and False otherwise
- ’name’ : (str) Original label if given data was a dataframe, otherwise the same as the node id
- ’unidentified’ : (0 or 1) 1 if data point was ever not associated with any branch upon split, 0 otherwise.

Return type:

networkx.Graph

_detect_branch(Dseg: ndarray, tips: ndarray)[source]#

Detect branching on given segment.

If self.split, call function __detect_branching three times, once for each of the three orderings of tips. Points that do not belong to the same segment in all three orderings are assigned to a fourth segment, which Haghverdi et al. (2016) refer to as ‘undecided points’ (these make up the so-called ‘trunk’). Otherwise, only the branch off the main segment is detected from the third tip.

If split and flavor == 'haghverdi16' : If any of the branches from the three consist of zero unique observations, resulting in an empty branch, the process is terminated and no branching is performed on the current segment.

Note

In practice, this has only occurred in small segments. If finer resolution partitioning
is desired, this may be changed in a future release to account for an offshoot resulting
in two branches (and possibly a trunk with undecided points).

Parameters:

Dseg – The distance matrix restricted to segment.
tips (numpy.ndarray) – Tips in local coordinates relative to the segment.

Returns:

ssegs (list[list]) – Stores branched segments in local coordinates.
ssegs_tips (list[list]) – Stores all tip points in local coordinates for the segments in ssegs.
ssegs_connects (list[list]) – A list of k lists, where k is the number of inter-segment connections between the segments in ssegs. Each entry is a 2-list of the form [[index of first seg in ssegs, index of second seg in ssegs], [source observation, target observation]].
trunk (int) – Index of segment in ssegs that all other segments in ssegs stem from.
trunk_undecided (bool) – If True, the trunk is made up of undecided points.
unidentified_points (set) – Points in local coordinates relative to the segment before branching that are not associated with any branch after splitting.

_detect_branching_single_haghverdi16(Dseg: ndarray, tips: ndarray)[source]#

Detect branching on given segment.

Parameters:

Dseg –
tips –

Returns:

ssegs – The branched segments.

Return type:

list[numpy.ndarray]

_detect_branching_single_wolf17_bi(Dseg, tips)[source]#

_detect_branching_single_wolf17_tri(Dseg, tips)[source]#

_kendall_tau_add(len_old: int, diff_pos: int, tau_old: float)[source]#

Compute Kendall tau delta.

The new sequence has length len_old + 1.

Parameters:

len_old – The length of the old sequence, used to compute tau_old.
diff_pos – Difference between concordant and non-concordant pairs.
tau_old – Kendall rank correlation of the old sequence.
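
The incremental update can be checked against a direct computation. The sketch below is not the module's code: `kendall_tau_delta_add` and `concordance_diff` are hypothetical helper names, and the delta formula is derived under the assumption of tie-free sequences, where tau equals the concordant-minus-discordant pair count divided by the total number of pairs. `scipy.stats.kendalltau` serves as the reference.

```python
import numpy as np
from scipy.stats import kendalltau


def kendall_tau_delta_add(len_old, diff_pos, tau_old):
    """Change in Kendall tau when one point is appended (assumes no ties)."""
    return 2.0 / (len_old + 1) * (diff_pos / len_old - tau_old)


def concordance_diff(a, b, n):
    """Concordant minus discordant pairs formed by point n with points 0..n-1."""
    signs = np.sign(a[:n] - a[n]) * np.sign(b[:n] - b[n])
    return int(signs.sum())


rng = np.random.default_rng(0)
a = rng.permutation(12).astype(float)  # permutations guarantee no ties
b = rng.permutation(12).astype(float)
n = 8  # length of the old sequence

tau_old, _ = kendalltau(a[:n], b[:n])
delta = kendall_tau_delta_add(n, concordance_diff(a, b, n), tau_old)
tau_new_incremental = tau_old + delta

# Reference value: recompute tau from scratch on the extended sequence.
tau_new_direct, _ = kendalltau(a[: n + 1], b[: n + 1])
```

Updating tau this way costs O(n) per added point instead of recomputing all pairs, which is what makes scanning every candidate split index affordable.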

_kendall_tau_diff(a: ndarray, b: ndarray, i) → Tuple[int, int][source]#

Compute difference in concordance of pairs in split sequences.

Consider splitting a and b at index i.

Parameters:

a (numpy.ndarray) – One dimensional sequences.
b (numpy.ndarray) – One dimensional sequences.
i (int) – Index for splitting a and b.

Returns:

diff_pos – Difference between concordant pairs for both subsequences.
diff_neg – Difference between non-concordant pairs for both subsequences.

_kendall_tau_subtract(len_old: int, diff_neg: int, tau_old: float)[source]#

Compute Kendall tau delta.

The new sequence has length len_old - 1.

Parameters:

len_old – The length of the old sequence, used to compute tau_old.
diff_neg – Difference between concordant and non-concordant pairs.
tau_old – Kendall rank correlation of the old sequence.

_set_pseudo_dist()[source]#: Return pseudo-distance with respect to root point.

branchings_segments(n_branches, until_branched=False, annotate=True)[source]#

Detect up to n_branches branches and partition the data into corresponding segments.

Parameters:

n_branches (int) – Number of branches to look for (n_branches > 0).
until_branched (bool) –
If True, iteratively find a segment to branch and perform branching until a segment is successfully branched or no branchable segments remain. Otherwise, if False, attempt to perform branching only once on the next potentially branchable segment.

Note

This is only applicable when branching is being performed. If previous iterations of branching have already been performed, it is not possible to identify the number of iterations where no branching was performed.
annotate (bool) – If True, annotate nodes with root and tips.

Returns:

G – The graph of the resulting POSE.

Return type:

nx.Graph

construct_pose_mst_nn_topology(G, mutual=False, k_mnn=1, annotate=True)[source]#

Add nearest neighbor (nn) edges to MST POSE topology.

Note

Mutual nns tend to be sparser than nns, so more than just the first nn may be selected when restricting to mutual neighbors.

Parameters:

G (networkx.Graph) – Nearest-neighbor edges are added to a copy of the MST POSE graph.
mutual (bool (default = False)) – If True, add k_mnn mutual nn edges. Otherwise, add single nn edge. When False, k_mnn is ignored.
k_mnn (int (0 < k_mnn < len(G))) – The number of nns to consider when extracting mutual nns. Note, this is ignored when mutual is False.
annotate (bool) – If True, annotate edges.

Returns:

Gnn – The updated graph with nearest neighbor edges. If annotate is True, edge attribute “edge_origin” is added with the possible values :

”POSE” : for edges in the original MST graph that are not nearest neighbor edges
”NN” : for nearest neighbor edges that were not in the original MST graph
”POSE + NN” : for edges in the original MST graph that are also nearest neighbor edges

Return type:

networkx.Graph

construct_pose_mst_topology(G)[source]#

Construct pose topology with minimum spanning tree (MST) edges.

Parameters:

G (networkx.Graph) – The POSE graph. MST edges are added to a copy of the graph.

Returns:

Gmst – The updated graph with MST edges and edge attribute “edge_origin” with the possible values :

”POSE” : for edges in the original graph that are not MST edges
”MST” : for MST edges that were not in the original graph
”POSE + MST” : for edges in the original graph that are also MST edges

Return type:

networkx.Graph
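
The "edge_origin" labelling described above can be sketched with networkx. This is an illustration under assumptions, not the method's implementation: `add_mst_edges` is a hypothetical name, nodes are assumed to be the integers 0..n-1 indexing a distance matrix `D`, and the MST is assumed to be taken over the complete graph weighted by `D`.

```python
import networkx as nx
import numpy as np


def add_mst_edges(G, D):
    """Return a copy of G with MST edges of D added and edge origins labelled."""
    Gmst = G.copy()
    nx.set_edge_attributes(Gmst, "POSE", "edge_origin")

    # Complete weighted graph on all observations (assumption).
    n = D.shape[0]
    complete = nx.Graph()
    complete.add_weighted_edges_from(
        (i, j, float(D[i, j])) for i in range(n) for j in range(i + 1, n)
    )
    mst = nx.minimum_spanning_tree(complete, weight="weight")

    for i, j in mst.edges:
        if Gmst.has_edge(i, j):
            Gmst[i][j]["edge_origin"] = "POSE + MST"
        else:
            Gmst.add_edge(i, j, edge_origin="MST")
    return Gmst


# Four points on a line; the input graph has edges (0, 1) and (0, 3).
x = np.array([0.0, 1.0, 3.0, 3.4])
D = np.abs(x[:, None] - x[None, :])
G = nx.Graph()
G.add_nodes_from(range(4))
G.add_edges_from([(0, 1), (0, 3)])
Gmst = add_mst_edges(G, D)
```

In this toy case the edge (0, 1) is shared by both graphs, (0, 3) exists only in the input graph, and (1, 2) and (2, 3) come only from the MST.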

construct_pose_nn_topology(G, mutual=False, k_mnn=3, annotate=True)[source]#

Add nearest neighbor (nn) edges to POSE topology.

Note

Mutual nns tend to be sparser than nns, so more than just the first nn may be selected when restricting to mutual neighbors.

Parameters:

G (networkx.Graph) – Nearest-neighbor edges are added to a copy of the POSE graph.
mutual (bool (default = False)) – If True, add k_mnn mutual nn edges. Otherwise, add single nn edge. When False, k_mnn is ignored.
k_mnn (int (0 < k_mnn < len(G))) – The number of nns to consider when extracting mutual nns. Note, this is ignored when mutual is False.
annotate (bool) – If True, annotate edges.

Returns:

Gnn – The updated graph with nearest neighbor edges. If annotate is True, edge attribute “edge_origin” is added with the possible values :

”POSE” : for edges in the original graph that are not nearest neighbor edges
”NN” : for nearest neighbor edges that were not in the original graph
”POSE + NN” : for edges in the original graph that are also nearest neighbor edges

Return type:

networkx.Graph
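
For the non-mutual case (mutual=False), the labelling can be sketched as follows. This is a hedged illustration rather than the method's code: `add_nearest_neighbor_edges` is a hypothetical name, and nodes are assumed to be the integers 0..n-1 indexing a distance matrix `D`.

```python
import networkx as nx
import numpy as np


def add_nearest_neighbor_edges(G, D):
    """Return a copy of G where every node is linked to its nearest neighbor."""
    Gnn = G.copy()
    nx.set_edge_attributes(Gnn, "POSE", "edge_origin")

    D = np.asarray(D, dtype=float).copy()
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbor

    for i in list(Gnn.nodes):
        j = int(np.argmin(D[i]))
        if Gnn.has_edge(i, j):
            # Only upgrade original edges; edges added here stay "NN".
            if Gnn[i][j]["edge_origin"] == "POSE":
                Gnn[i][j]["edge_origin"] = "POSE + NN"
        else:
            Gnn.add_edge(i, j, edge_origin="NN")
    return Gnn


# Four points on a line; the input graph has edges (0, 1), (1, 2), (0, 3).
x = np.array([0.0, 1.0, 3.0, 3.4])
D = np.abs(x[:, None] - x[None, :])
G = nx.Graph([(0, 1), (1, 2), (0, 3)])
Gnn = add_nearest_neighbor_edges(G, D)
```

Here points 2 and 3 are each other's nearest neighbor but are not joined in the input graph, so the only new edge is (2, 3).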

detect_branches(n_branches, until_branched=False)[source]#

Detect up to n_branches branchings and update tree in place.

Parameters:

n_branches (int) – Number of branch splits to perform (n_branches > 0).
until_branched (bool) –
If True, iteratively find a segment to branch and perform branching until a segment is successfully branched or no branchable segments remain. Otherwise, if False, attempt to perform branching only once on the next potentially branchable segment.

Note

This is only applicable when branching is being performed. If previous iterations of branching have already been performed, it is not possible to identify the number of iterations where no branching was performed.

detect_branching(node)[source]#

Detect branching on a given segment and update TreeNode parameters in place.

Parameters:: node (TreeNode) – The node of the segment to be branched.
Returns:: updated – True if segment is successfully branched, False otherwise.
Return type:: bool

extract_branchings(n_branches)[source]#

Extract POSE from up to n_branches branchings.

Parameters:: n_branches (int) – Number of branches to look for (n_branches > 0).
Returns:: tree – The tree with up to n_branches branchings. If n_branches is more than or equal to the number of branchings in the tree, the original tree is returned. Otherwise, a reduced tree is returned.
Return type:: Tree

identify_local_tips(Dseg, newseg, tip)[source]#

Identify new tips within the new segments.

Parameters:

newseg (list) – New segment (local with respect to original segment).
tip (int) – Local index of the first tip, with respect to the original segment that determined Dseg before the split.

Returns:

tips – First and second tip indices in local coordinates relative to the original segment, before it was branched.

Return type:

np.ndarray (2,)

kendall_tau_split(a, b, min_length=5) → int[source]#

Return splitting index that maximizes correlation in the sequences.

Compute the difference in Kendall tau for all split sequences.

For each splitting index i, compute the difference of the two correlation measures kendalltau(a[:i], b[:i]) and kendalltau(a[i:], b[i:]).

Returns the splitting index that maximizes: kendalltau(a[:i], b[:i]) - kendalltau(a[i:], b[i:])

Parameters:

a (numpy.ndarray) – One dimensional sequences.
b (numpy.ndarray) – One dimensional sequences.
min_length (int, (min_length > 0)) – Minimum number of data points automatically included in branch.

Returns:

imax – Splitting index according to above description.

Return type:

int
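
The criterion can be written as a direct, brute-force scan. The sketch below is for illustration only: `kendall_tau_split_bruteforce` is a hypothetical name, it recomputes both correlations at every index with `scipy.stats.kendalltau` rather than using incremental updates, and it assumes that `min_length` points are kept on each side of the split, which may differ from how the method treats `min_length`.

```python
import numpy as np
from scipy.stats import kendalltau


def kendall_tau_split_bruteforce(a, b, min_length=5):
    """Return the index i maximizing tau(a[:i], b[:i]) - tau(a[i:], b[i:])."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.size
    best_i, best_diff = -1, -np.inf
    # Assumption: keep at least min_length points on both sides of the split.
    for i in range(min_length, n - min_length + 1):
        tau_head, _ = kendalltau(a[:i], b[:i])
        tau_tail, _ = kendalltau(a[i:], b[i:])
        diff = tau_head - tau_tail
        # Comparisons with nan are False, so undefined correlations are skipped.
        if diff > best_diff:
            best_i, best_diff = i, diff
    return best_i


# b rises together with a up to index 9 and falls afterwards.
a = np.arange(20, dtype=float)
b = np.concatenate([np.arange(10.0), 9.5 - np.arange(1.0, 11.0)])
imax = kendall_tau_split_bruteforce(a, b)
```

The selected index lands at the peak of `b`, where the head is perfectly concordant and the tail perfectly discordant. The function returns -1 when the sequences are too short to leave `min_length` points on both sides.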

select_segment()[source]#

Select segment with most distant triangulated data point.

Returns:: node – The node corresponding to the selected segment. If no nodes are branchable, returns None.
Return type:: TreeNode

single_branch(until_branched=False)[source]#

Perform single branching in place.

Parameters:: until_branched (bool) – If True, iteratively find a segment to branch and perform branching until a segment is successfully branched or no branchable segments remain. Otherwise, if False, attempt to perform branching only once on the next potentially branchable segment.
Returns:: branched_flag – Indicates if branching was successfully completed.
Return type:: bool

class netflow.pose.organization.Tree[source]#

Tree implementation as a collection of TreeNode objects.

Intended to represent the hierarchical branching.

__init__()[source]#

__module__ = 'netflow.pose.organization'#

_get_node_from_counter(counter)[source]#

Search and return node in Tree by its counter ID.

Assumes no nodes have the same counter ID.

If no such node is found with the specified counter, None is returned.

Parameters:: counter (int) – Counter ID of node to search for.
Returns:: node – Node in the tree. If node is not found, returns None.
Return type:: TreeNode

_search_counter(counter)[source]#

Search and return index of node in Tree by its counter ID.

Assumes no nodes have the same counter ID.

If no such node is found with the specified counter, the value -1 is returned.

Parameters:: counter (int) – Counter ID of node to search for.
Returns:: index – Index of node in the tree. If node is not found, returns -1.
Return type:: int

all_data()[source]#: Return sorted set of all data points in all nodes in the tree.

co_branch_indicator()[source]#: Return binary symmetric pandas.DataFrame of size (num_data_points, num_data_points) where the (i,j)-th entry is 1 if the i-th and j-th data points are found in the same node (i.e., branch) and i is not the same data point as j. Otherwise, if i = j, or if the i-th and j-th data points are not found in the same node, the (i,j)-th entry is 0.
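
As a minimal sketch of such an indicator matrix, the function below builds it from a plain list of branches. The standalone function and its `branches` argument are assumptions made for illustration; the method itself reads the data from the tree's nodes.

```python
import numpy as np
import pandas as pd


def co_branch_indicator(branches):
    """Binary matrix with 1 where two distinct points share a branch."""
    points = sorted(set().union(*branches))
    position = {p: k for k, p in enumerate(points)}
    mat = np.zeros((len(points), len(points)), dtype=int)
    for members in branches:
        idx = [position[p] for p in members]
        mat[np.ix_(idx, idx)] = 1  # mark every pair within the branch
    np.fill_diagonal(mat, 0)  # a point is not paired with itself
    return pd.DataFrame(mat, index=points, columns=points)


indicator = co_branch_indicator([[0, 1, 2], [3, 4]])
```

Points 0, 1 and 2 are marked as pairwise co-branched, as are 3 and 4, while every entry linking the two groups stays 0.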

disp()[source]#

get_leaves()[source]#: Return leaf nodes in the tree.

get_leaves_indices()[source]#: Return indices of leaf nodes in the tree.

get_node(index)[source]#: Return node by its index.

get_node_from_name(name, bottom_up=True)[source]#

Search and return node in Tree by its name.

Assumes no nodes at the same depth have the same name. If more than one node has the same name, return the deepest node (farthest from root) when bottom_up = True; otherwise, return the shallowest node (closest to root).

If no such node is found with the specified name, None is returned.

Parameters:

name – Name of node to search for.
bottom_up (bool) – Indicate if the index of the shallowest or deepest node should be returned when more than one node has the same name. It is assumed that no two nodes at the same depth have the same name.

Returns:

node – Node in the tree. If node is not found, returns None.

Return type:

TreeNode

insert(node, index=None, parent=None)[source]#

Insert a node into the Tree.

Parameters:

node (TreeNode) – Node to insert.
index ({None, int}) –
Index in list of nodes where the node should be inserted. (Intended to match current structure for updating segments until tree structure is fully leveraged, e.g., using tree leaf nodes when searching for which segment to select.)

If None, the node is appended to the end of the list.
parent ({None, TreeNode}) – Parent node. If None, node is set as the root node.

max_depth()[source]#: Return max depth of the tree.

root()[source]#

search(name, bottom_up=True)[source]#

Search and return index of node in Tree by its name.

Assumes no nodes at the same depth have the same name. If more than one node has the same name, return the index of the deepest node (farthest from root), when bottom_up = True, otherwise, return the index of the shallowest (closest to root) node.

If no such node is found with the specified name, the value -1 is returned.

Parameters:

name – Name of node to search for.
bottom_up (bool) – Indicate if the index of the shallowest or deepest node should be returned when more than one node has the same name. It is assumed that no two nodes at the same depth have the same name.

Returns:

index – Index of node in the tree. If node is not found, returns -1.

Return type:

int
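
The bottom_up tie-breaking rule can be illustrated on a flat list. This sketch does not use the Tree class: `search_by_name` is a hypothetical function, and each node is represented by an assumed (name, depth) pair.

```python
def search_by_name(nodes, name, bottom_up=True):
    """Return the index of the matching (name, depth) pair, or -1 if absent."""
    matches = [k for k, (node_name, _) in enumerate(nodes) if node_name == name]
    if not matches:
        return -1
    # Deepest match when bottom_up is True, shallowest otherwise.
    pick = max if bottom_up else min
    return pick(matches, key=lambda k: nodes[k][1])


nodes = [("root", 0), ("a", 1), ("b", 1), ("a", 2)]
deepest = search_by_name(nodes, "a", bottom_up=True)
shallowest = search_by_name(nodes, "a", bottom_up=False)
missing = search_by_name(nodes, "z")
```

Because the documented assumption is that names are unique within a depth, the deepest or shallowest match is unambiguous.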

search_data(value, bottom_up=True)[source]#

Search and return index of node in Tree with value in node data.

If the value is in the data of more than one node, return the index of the deepest node (farthest from root), when bottom_up = True, otherwise, return the index of the shallowest (closest to root) node.

If no such node is found with the specified value in its data, the value -1 is returned.

Parameters:

value – Value in node data to search for.
bottom_up (bool) – Indicate if the index of the shallowest or deepest node should be returned when the value is in the data of more than one node.

Returns:

index – Index of node in the tree. If node is not found, returns -1.

Return type:

int

class netflow.pose.organization.TreeNode(name='root', data=None, children=None, parent=None, nonunique=None, unidentified=None, branchable=True, is_trunk=None)[source]#

Node of a general tree data structure.

Each node is intended to refer to a branch.

Parameters:

name – Reference name of node (branch).
data – Data associated with the node. Intended to be a list of indices corresponding to the branch members.
children (list [TreeNode]) – List of children TreeNode objects.
parent (TreeNode) – Parent TreeNode object.
nonunique (bool) – Indicate if node (branch) is the trunk.
unidentified (bool) – Indicate if node (branch) is a set of points that were not identified with a particular branch after splitting.
branchable (bool) – Indicate if node can potentially be further branched.
is_trunk (bool) – Indicate if node refers to undecided trunk branch.

__init__(name='root', data=None, children=None, parent=None, nonunique=None, unidentified=None, branchable=True, is_trunk=None)[source]#

__module__ = 'netflow.pose.organization'#

__repr__()[source]#: Return repr(self).

add_child(node)[source]#

Add child to node.

Parameters:: node (TreeNode) – The child node.

contains(value)[source]#: Check if value is in data.

depth()[source]#: Depth of current node.

disp()[source]#

is_leaf()[source]#

is_root()[source]#

netflow.pose.organization._compute_transitions(similarity=None, density_normalize: bool = True)[source]#

Compute transition matrix.

Parameters:

similarity (numpy.ndarray, (n_observations, n_observations)) – Symmetric similarity measure (with 1s on the diagonal).
density_normalize (bool) – If True, apply the density rescaling of Coifman and Lafon (2006), so that only the geometry of the data matters, not the sampled density.

Returns:

transitions_asym (numpy.ndarray, (n_observations, n_observations)) – Asymmetric Transition matrix.
transitions_sym (numpy.ndarray, (n_observations, n_observations)) – Symmetric Transition matrix.

Notes

Code copied from scanpy.neighbors.
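
A rough NumPy sketch of this kind of normalization is given below. It follows the standard diffusion-map construction and is an assumption about the general approach, not a copy of the module's code: `transitions_from_similarity` is a hypothetical name, dense arrays are assumed, and any special handling of the diagonal is omitted.

```python
import numpy as np


def transitions_from_similarity(W, density_normalize=True):
    """Return (asymmetric, symmetric) transition matrices from similarity W."""
    W = np.asarray(W, dtype=float)
    if density_normalize:
        # Coifman-Lafon density rescaling: divide by the product of row sums.
        q = W.sum(axis=1)
        K = W / np.outer(q, q)
    else:
        K = W
    d = K.sum(axis=1)
    transitions_asym = K / d[:, None]  # row-stochastic
    transitions_sym = K / np.sqrt(np.outer(d, d))  # symmetric
    return transitions_asym, transitions_sym


# Small symmetric similarity matrix with 1s on the diagonal.
rng = np.random.default_rng(1)
A = rng.uniform(0.1, 0.9, size=(5, 5))
W = (A + A.T) / 2
np.fill_diagonal(W, 1.0)
T_asym, T_sym = transitions_from_similarity(W)
```

The two matrices are related by a diagonal similarity transform, so they share eigenvalues; the symmetric form is convenient for eigendecomposition.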

netflow.pose.organization._dpt_from_augmented_sym_transitions(T, n_comps: int = 0, return_eigs=False)[source]#

Return the diffusion pseudotime metric between observations, computed from the symmetric transitions.

Note

\(T\) is the symmetric transition matrix
\(M(x,z) = \sum_{i=1}^{n-1} (\lambda_i * (1 - \lambda_i))\psi_i(x)\psi_i^T(z)\)
\(dpt(x,z) = ||M(x, .) - M(z, .)||^2\)

Parameters:

T (numpy.ndarray, (n_observations, n_observations)) – Symmetric transitions.
n_comps – Number of eigenvalues/vectors to be computed, set n_comps = 0 to compute the whole spectrum. Alternatively, if set n_comps >= n_observations, the whole spectrum will be computed.

Returns:

dpt – Pairwise-observation Diffusion pseudotime distances.

Return type:

numpy.ndarray, (n_observations, n_observations)

netflow.pose.organization.compute_multiscale_VNE_transitions_from_similarity(keeper, similarity_key, tau_max=None, do_save=True)[source]#

Compute the multi-scale transition matrix based on the elbow of the Von Neumann Entropy (VNE) as described in GSPA and PHATE KrishnaswamyLab/spARC, https://pdfs.semanticscholar.org/16ab/e92b7630d5b84b904bde97dad9b9fbce406c.pdf.

Parameters:

keeper (netflow.Keeper) – The keeper object.
similarity_key (str) – Reference key to the numpy.ndarray, (n_observations, n_observations) symmetric similarity measure (with 1s on the diagonal) stored in the similarities in the keeper.
tau_max (int) – Max scale tau tested for VNE (default is 100).
do_save (bool) – If True, save to keeper.

Returns:

P (numpy.ndarray (n_observations, n_observations)) – The symmetric VNE multi-scale transition matrix (with 0s on the diagonals). If do_save is True, P is added to the keeper.misc with the key 'transitions_sym_multiscaleVNE_{similarity_key}'
P_asym (numpy.ndarray (n_observations, n_observations)) – The random-walk VNE multi-scale transition matrix (with 0s on the diagonals). If do_save is True, P_asym is added to the keeper.misc with the key 'transitions_multiscaleVNE_{similarity_key}'

netflow.pose.organization.compute_rw_transitions(keeper, similarity_key, do_save=True)[source]#

Compute the row-stochastic transition matrix.

Parameters:

keeper (netflow.Keeper) – The keeper object.
similarity_key (str) – Reference key to the numpy.ndarray, (n_observations, n_observations) symmetric similarity measure (with 1s on the diagonal) stored in the similarities in the keeper.
do_save (bool) – If True, save to keeper.

Returns:

P – The row-stochastic transition matrix (with 0s on the diagonals). If do_save is True, P is added to the keeper.misc with the key 'transitions_rw_{similarity_key}'

Return type:

numpy.ndarray (n_observations, n_observations)
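
A row-stochastic matrix with a zero diagonal can be obtained as sketched below. The function name `rw_transitions` is hypothetical, and zeroing the diagonal before normalizing is an assumption made so that both documented properties (rows summing to 1, 0s on the diagonal) hold; the actual order of operations in the module is not stated here.

```python
import numpy as np


def rw_transitions(K):
    """Row-normalize a similarity matrix after removing self-similarity."""
    K = np.asarray(K, dtype=float).copy()
    np.fill_diagonal(K, 0.0)  # assumption: drop self-transitions first
    return K / K.sum(axis=1, keepdims=True)


K = np.array(
    [
        [1.0, 0.8, 0.2],
        [0.8, 1.0, 0.5],
        [0.2, 0.5, 1.0],
    ]
)
P_rw = rw_transitions(K)
```

Each row of `P_rw` is a probability distribution over the other observations, so the matrix is generally not symmetric even though `K` is.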

netflow.pose.organization.compute_sym_diffusion_affinity_transitions(keeper, similarity_key, do_save=True)[source]#

Compute the symmetric diffusion affinity transition matrix from KrishnaswamyLab/graphtools.

\[P_{ij} = K_{ij} * (d_i * d_j)^{-1/2}\]

where \(d_i = \sum_r K_{ir}\) is the degree (row sum) of observation \(i\).

Parameters:

keeper (netflow.Keeper) – The keeper object.
similarity_key (str) – Reference key to the numpy.ndarray, (n_observations, n_observations) symmetric similarity measure (with 1s on the diagonal) stored in the similarities in the keeper.
do_save (bool) – If True, save to keeper.

Returns:

P – The symmetric diffusion affinity transition matrix (with 0s on the diagonals). If do_save is True, P is added to the keeper.misc with the key 'transitions_sym_diff_aff_{similarity_key}'

Return type:

numpy.ndarray (n_observations, n_observations)
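
The formula above translates directly to NumPy. The sketch below covers only the normalization itself: `sym_diffusion_affinity` is a hypothetical name, a dense symmetric kernel is assumed, and the zeroing of the diagonal mentioned in the return description is not reproduced.

```python
import numpy as np


def sym_diffusion_affinity(K):
    """Compute P_ij = K_ij / sqrt(d_i * d_j) with d the row sums of K."""
    K = np.asarray(K, dtype=float)
    d = K.sum(axis=1)
    return K / np.sqrt(np.outer(d, d))


K = np.array(
    [
        [1.0, 0.8, 0.2],
        [0.8, 1.0, 0.5],
        [0.2, 0.5, 1.0],
    ]
)
d = K.sum(axis=1)
P_sym = sym_diffusion_affinity(K)
```

With this normalization the vector of square-root degrees is an eigenvector of `P_sym` with eigenvalue 1, which follows from summing each row of the formula.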

netflow.pose.organization.compute_transitions(keeper, similarity_key, density_normalize: bool = True)[source]#

Compute symmetric and asymmetric transition matrices and store in keeper.

Parameters:

keeper (netflow.Keeper) – The keeper object.
similarity_key (str) – Reference key to the numpy.ndarray, (n_observations, n_observations) symmetric similarity measure (with 1s on the diagonal) stored in the similarities in the keeper.
density_normalize (bool) – If True, apply the density rescaling of Coifman and Lafon (2006), so that only the geometry of the data matters, not the sampled density.

Returns:

Adds the following to keeper.misc (with 0s on the diagonals):

transitions_asym_{similarity_key} (numpy.ndarray, (n_observations, n_observations)) – Asymmetric transition matrix.
transitions_sym_{similarity_key} (numpy.ndarray, (n_observations, n_observations)) – Symmetric transition matrix.

Notes

Code primarily copied from scanpy.neighbors.

netflow.pose.organization.dpt_from_augmented_sym_transitions(keeper, key, n_comps: int = 0, save_eig=False)[source]#

Compute the diffusion pseudotime metric between observations, computed from the symmetric transitions.

Note

\(T\) is the symmetric transition matrix
\(M(x,z) = \sum_{i=1}^{n-1} (\lambda_i * (1 - \lambda_i))\psi_i(x)\psi_i^T(z)\)
\(dpt(x,z) = ||M(x, .) - M(z, .)||^2\)

Parameters:

key (str) – Reference ID for the symmetric transitions numpy.ndarray, (n_observations, n_observations) stored in keeper.misc.
n_comps – Number of eigenvalues/vectors to be computed, set n_comps = 0 to compute the whole spectrum. Alternatively, if set n_comps >= n_observations, the whole spectrum will be computed.

Returns:

dpt – Pairwise-observation diffusion pseudotime distances are stored in keeper.distances[dpt_key] where dpt_key="dpt_from_{key}". If the full spectrum is not used (i.e., 0 < n_comps < n_observations), then dpt_key="dpt_from_{key}_{n_comps}comps".

Return type:

numpy.ndarray, (n_observations, n_observations)

netflow.pose.organization.get_pose(keeper, key, label, n_branches, until_branched=False, root=None, min_branch_size=5, choose_largest_segment=False, flavor='haghverdi16', allow_kendall_tau_shift=False, smooth_corr=False, brute=True, split=True, connect_closest=False, connect_trunk='classic', mutual=False, k_mnn=3, verbose=None)[source]#

Compute the pose and save it to the keeper.

Parameters:

keeper (netflow.Keeper) – The keeper object that stores the distance matrix of size (n_observations, n_observations).
key (str) – The label used to reference the distance matrix stored in keeper.distances, of size (n_observations, n_observations).
label (str) – Label used to store resulting schema in keeper.misc[label] and POSE topology in keeper.graphs[label].
n_branches (int) – Number of branch splits to perform (n_branches > 0).
until_branched (bool) – If True, iteratively find a segment to branch and perform branching until a segment is successfully branched or no branchable segments remain. Otherwise, if False, attempt to perform branching only once on the next potentially branchable segment.
root ({None, int, ‘density’, ‘density_inv’, ‘ratio’}) –
The root. If None, ‘density’ is used.

Options:
- int : index of observation
- ’density’ : select observation with minimal distance-density
- ’density_inv’ : select observation with maximal distance-density
- ’ratio’ : select observation which leads to maximal triangular ratio distance
min_branch_size ({int, float}) – During recursive splitting of branches, only consider splitting a branch with at least min_branch_size > 2 data points. If a float, min_branch_size refers to the fraction of the total number of data points (0 < min_branch_size < 1).
choose_largest_segment (bool) – If True, select largest segment for branching.
flavor ({'haghverdi16', 'wolf17_tri', 'wolf17_bi', 'wolf17_bi_un'}) – Branching algorithm (based on scanpy implementation).
allow_kendall_tau_shift (bool) – If a very small branch is detected upon splitting, shift away from maximum correlation in Kendall tau criterion of [Haghverdi16] to stabilize the splitting.
smooth_corr (bool, default = False) – If True, smooth correlations before identifying cut points for branch splitting.
brute (bool) – If True, data points not associated with any branch upon split are combined with undecided (trunk) points. Otherwise, if False, they are treated as individual islands, not associated with any branch (and assigned branch index -1).
split (bool (default = True)) – If True, split the segment into multiple branches. Otherwise, determine a single branching off of the main segment. This is ignored if flavor is not ‘haghverdi16’. If True, brute is ignored.
mutual (bool (default = False)) – If True, add k_mnn mutual nn edges. Otherwise, add single nn edge. When False, k_mnn is ignored.
k_mnn (int (0 < k_mnn < len(G))) – The number of nns to consider when extracting mutual nns. Note, this is ignored when mutual is False.
connect_closest (bool (default = False)) – If True, connect branches by points with smallest distance between the branches. Otherwise, connect by continuum of ordering.
connect_trunk ({'classic', 'endpoint', 'dual'}, default = 'classic') –
Specify how to connect segments to the unresolved/unidentified trunk. Note, this only applies when a split results in a trunk consisting of unresolved/unidentified points. Additionally, this is ignored if flavor != 'haghverdi16'. It is also ignored if flavor = 'haghverdi16' and split = False.

Options:
- classic : point identified in trunk is connected to the point in the segment closest to it
- endpoint : point identified in trunk is connected to the segment’s second tip
- dual : point identified in trunk is connected to both points determined by classic and endpoint

Returns:

Writes the following to the keeper:

poser (POSER) – The poser object with the pseudo-organizational branching structure, stored in keeper.misc['poser_{label}'].
G_pose (networkx.Graph) – The resulting pose topology, stored in keeper.graphs['pose_{label}'].
G_pose_nn (networkx.Graph) – The resulting pose + nearest-neighbor (nn) topology, stored in keeper.graphs['pose_nn_{label}'].

netflow.pose.organization.root_max_ratio(keeper, key)[source]#

Return the root index of the observation that leads to the largest triangle.

Parameters:

keeper (netflow.Keeper) – The keeper object.
key (str) – Reference key of distance in keeper used to determine the root.

Returns:

root – The root index.

Return type:

int
