netflow.pose.similarity

Functions

`distance_to_similarity`(keeper, key, ...[, ...])	Convert distance matrix to symmetric similarity measure.
`edges_from_mutual_knn_indices`(kmnn)	Convert mutual k-nns to a list of edges.
`get_knn_indices_distances`(d[, n_neighbors])	Get indices of and distances to k-nearest neighbors.
`mutual_knn_edges`(d[, n_neighbors])	Get edges between indices of mutual k-nearest neighbors (nn) from distance matrix.
`mutual_knn_indices`(d[, n_neighbors])	Get indices of mutual k-nearest neighbors (nn).
`sigma_knn`(keeper, key[, label, n_neighbors, ...])	Set sigma for each obs as the distance to its k-th neighbor from keeper.
`sigma_knn_`(d[, n_neighbors, method, return_nn])	Determine sigma for each obs as the distance to its k-th neighbor.

netflow.pose.similarity._distance_to_similarity(d, n_neighbors, method, sigmas=None, knn=False, indices=None)

Convert distance matrix to symmetric similarity measure.

\[K_{ij} = \sqrt{\frac{2\sigma_i\sigma_j}{\sigma_i^2 + \sigma_j^2}}\,\exp\left(-\frac{(x_i - x_j)^2}{\sigma_i^2 + \sigma_j^2}\right).\]

Parameters:

d (numpy.ndarray, (n_observations, n_observations)) – Symmetric distance matrix.
n_neighbors ({int, None}) – K-th nearest neighbor (or number of nearest neighbors) to use for computing sigmas, n_neighbors > 0. (Uses n_neighbors + 1, since each obs is its closest neighbor). If None, all neighbors are used.
method ({float, int, ‘mean’, ‘median’, ‘max’, ‘precomputed’}) –
Indicate how to compute sigma.

Options:
- float : constant float to use as sigma
- int : constant int to use as sigma
- ‘mean’ : mean of the distances to the n_neighbors nearest neighbors
- ‘median’ : median of the distances to the n_neighbors nearest neighbors
- ‘max’ : distance to the n_neighbors-th nearest neighbor
- ‘precomputed’ : precomputed values passed to sigmas
sigmas (numpy.ndarray, (n_observations, )) – Option to provide precomputed sigmas, ignored unless method='precomputed'.
knn (bool) – If True, restrict similarity measure to be non-zero only between n_neighbors nearest neighbors.
indices (numpy.ndarray, (n_observations, n_neighbors)) – Option to provide precomputed indices of n_neighbors nearest neighbors for each obs when method = ‘precomputed’ and knn = True.

Returns:

K – Symmetric similarity measure.

Return type:

numpy.ndarray, (n_observations, n_observations)
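The kernel formula above can be sketched directly in NumPy. This is a minimal illustration under stated assumptions, not netflow's implementation: the helper name `similarity_sketch` is hypothetical, the sigmas are taken as already computed, the squared pairwise distance in the exponent is taken to be the squared entry of d, and the knn restriction is omitted.

```python
import numpy as np

def similarity_sketch(d, sigmas):
    """Illustrative variable-bandwidth Gaussian kernel (hypothetical helper)."""
    d = np.asarray(d, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    # sigma_i^2 + sigma_j^2 for every pair (i, j)
    s2 = sigmas[:, None] ** 2 + sigmas[None, :] ** 2
    # prefactor sqrt(2 * sigma_i * sigma_j / (sigma_i^2 + sigma_j^2))
    prefactor = np.sqrt(2.0 * np.outer(sigmas, sigmas) / s2)
    # assumption: the squared distance between obs i and j is d[i, j] ** 2
    return prefactor * np.exp(-(d ** 2) / s2)

# toy symmetric distance matrix and per-observation sigmas (made-up values)
d = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.5],
              [2.0, 1.5, 0.0]])
sigmas = np.array([1.0, 1.25, 1.5])
K = similarity_sketch(d, sigmas)
```

Because both the prefactor and the exponent are symmetric in i and j, K is symmetric whenever d is, and the diagonal equals 1 when d has a zero diagonal.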

netflow.pose.similarity.distance_to_similarity(keeper, key, n_neighbors, method, label=None, sigmas=None, knn=False, indices=None)

Convert distance matrix to symmetric similarity measure.

Parameters:

keeper (netflow.Keeper) – The keeper object that stores the symmetric distance matrix of size (n_observations, n_observations).
key (str) – The label used to reference the distance matrix stored in keeper.distances, of size (n_observations, n_observations).
n_neighbors ({int, None}) – K-th nearest neighbor (or number of nearest neighbors) to use for computing sigmas, n_neighbors > 0. (Uses n_neighbors + 1, since each obs is its closest neighbor). If None, all neighbors are used.
method ({float, int, ‘mean’, ‘median’, ‘max’, ‘precomputed’}) –
Indicate how to compute sigma.

Options:
- float : constant float to use as sigma
- int : constant int to use as sigma
- ‘mean’ : mean of the distances to the n_neighbors nearest neighbors
- ‘median’ : median of the distances to the n_neighbors nearest neighbors
- ‘max’ : distance to the n_neighbors-th nearest neighbor
- ‘precomputed’ : precomputed values passed to sigmas
label (str) – Label used to store resulting similarity matrix of size (n_observations, n_observations) in keeper.similarities.
sigmas (str) – Option to provide precomputed sigmas, ignored unless method='precomputed'. If provided, the precomputed sigmas are extracted from keeper.misc[sigmas] as a numpy.ndarray of size (n_observations, ).
knn (bool) – If True, restrict similarity measure to be non-zero only between n_neighbors nearest neighbors.
indices ({None, str}) – Option to provide precomputed indices of n_neighbors nearest neighbors for each obs when method = ‘precomputed’ and knn = True. If provided, the indices are extracted from keeper.misc[indices] as a numpy.ndarray of size (n_observations, n_neighbors).

Returns:

K – Symmetric similarity measure. If label is not None, this is stored in keeper.similarities[label] instead of being returned.

Return type:

numpy.ndarray, (n_observations, n_observations)

netflow.pose.similarity.edges_from_mutual_knn_indices(kmnn)

Convert mutual k-nns to a list of edges.

Parameters:

kmnn (defaultdict) – The mutual k-nn indices as returned from mutual_knn_indices.

Returns:

edges – The list of edges corresponding to the mutual k-nns.

Return type:

list
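As a rough illustration of this conversion, the sketch below flattens a mutual k-nn mapping into an edge list. The helper name `edges_sketch`, the representation of an edge as a sorted `(i, j)` tuple, and the deduplication of the two directions are all assumptions; the exact form of the list returned by netflow is not specified here.

```python
from collections import defaultdict

def edges_sketch(kmnn):
    """Illustrative edge list from a mutual k-nn mapping (hypothetical helper)."""
    # each mutual pair shows up twice (i -> j and j -> i);
    # storing (min, max) in a set keeps a single undirected copy
    edges = {(min(i, j), max(i, j)) for i, nbrs in kmnn.items() for j in nbrs}
    return sorted(edges)

# toy mapping in the form {row index: [mutual nearest neighbors]}
kmnn = defaultdict(list, {0: [1, 2], 1: [0, 2], 2: [1, 0]})
edges = edges_sketch(kmnn)
```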

netflow.pose.similarity.get_knn_indices_distances(d, n_neighbors=None)

Get indices of and distances to k-nearest neighbors.

Parameters:

d (numpy.ndarray, (m, m)) – Symmetric distance matrix.
n_neighbors ({int, None}) – K-th nearest neighbor (or number of nearest neighbors) to use for computing sigmas, n_neighbors > 0. (Uses n_neighbors + 1, since each obs is its closest neighbor). If None, all neighbors are used.

Returns:

indices (numpy.ndarray, (m, n_neighbors)) – Matrix with the indices of the k-nearest neighbors in each row. Note that an observation is not included among its own neighbors.
distances (numpy.ndarray, (m, n_neighbors)) – Matrix with the distances to the k-nearest neighbors in each row. Note that an observation is not included among its own neighbors.
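A k-nearest-neighbor lookup of this kind can be sketched with `numpy.argsort`. The helper name `knn_sketch` is hypothetical, and masking the diagonal to exclude self (rather than taking n_neighbors + 1 entries and dropping the first) is a choice made for this illustration, not necessarily what netflow does.

```python
import numpy as np

def knn_sketch(d, n_neighbors=None):
    """Illustrative k-nn indices and distances, excluding self (hypothetical helper)."""
    d = np.asarray(d, dtype=float)
    m = d.shape[0]
    # None means every other observation counts as a neighbor
    k = m - 1 if n_neighbors is None else n_neighbors
    # mask the diagonal so an observation is never its own neighbor
    masked = d.copy()
    np.fill_diagonal(masked, np.inf)
    # column order of each row sorted by increasing distance
    indices = np.argsort(masked, axis=1, kind="stable")[:, :k]
    distances = np.take_along_axis(d, indices, axis=1)
    return indices, distances

# toy distances between the points 0, 1, 3, 7 on a line
d = np.array([[0.0, 1.0, 3.0, 7.0],
              [1.0, 0.0, 2.0, 6.0],
              [3.0, 2.0, 0.0, 4.0],
              [7.0, 6.0, 4.0, 0.0]])
indices, distances = knn_sketch(d, n_neighbors=2)
```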

netflow.pose.similarity.mutual_knn_edges(d, n_neighbors=None)

Get edges between indices of mutual k-nearest neighbors (nn) from distance matrix.

Note

Self is not included as one of the k-nns.

Parameters:

d (numpy.ndarray, (m, m)) – Symmetric distance matrix.
n_neighbors ({int, None}) – Number of mutual nns to include (does not include self), n_neighbors > 0. (Uses n_neighbors + 1, since each obs is its closest neighbor). If None, all neighbors are used (same as k-nns since all neighbors are mutually included).

Returns:

edges – The list of edges corresponding to the mutual k-nns.

Return type:

list

netflow.pose.similarity.mutual_knn_indices(d, n_neighbors=None)

Get indices of mutual k-nearest neighbors (nn).

Parameters:

d (numpy.ndarray, (m, m)) – Symmetric distance matrix.
n_neighbors ({int, None}) – Number of mutual nns to include (does not include self), n_neighbors > 0. (Uses n_neighbors + 1, since each obs is its closest neighbor). If None, all neighbors are used (same as k-nns since all neighbors are mutually included).

Returns:

kmnn_indices – Defaultdict keyed by row index, referencing the row indices of its mutual nns out of its n_neighbors nns. Note that an observation is not included among its own neighbors.

Return type:

defaultdict[list] of the form {m : [up to n_neighbors mutual nns]}
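The mutual condition can be illustrated as follows: j is kept as a mutual neighbor of i only if j is among the k nearest neighbors of i and i is among the k nearest neighbors of j. This is a sketch under assumptions, not netflow's code; the helper name `mutual_knn_sketch` is hypothetical, and rows with no mutual neighbors are simply left out of the mapping here.

```python
from collections import defaultdict
import numpy as np

def mutual_knn_sketch(d, n_neighbors):
    """Illustrative mutual k-nn mapping (hypothetical helper)."""
    d = np.asarray(d, dtype=float)
    # exclude self by masking the diagonal before sorting
    masked = d.copy()
    np.fill_diagonal(masked, np.inf)
    nn = np.argsort(masked, axis=1, kind="stable")[:, :n_neighbors]
    nn_sets = [set(row.tolist()) for row in nn]
    kmnn = defaultdict(list)
    for i, row in enumerate(nn):
        for j in row.tolist():
            # keep j only if i is also one of j's nearest neighbors
            if i in nn_sets[j]:
                kmnn[i].append(j)
    return kmnn

# toy distances between the points 0, 1, 3, 7 on a line
d = np.array([[0.0, 1.0, 3.0, 7.0],
              [1.0, 0.0, 2.0, 6.0],
              [3.0, 2.0, 0.0, 4.0],
              [7.0, 6.0, 4.0, 0.0]])
kmnn = mutual_knn_sketch(d, n_neighbors=2)
```

In this toy example the last point has nearest neighbors of its own, but neither of them lists it back, so it ends up with no mutual neighbors.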

netflow.pose.similarity.sigma_knn(keeper, key, label=None, n_neighbors=None, method='mean', return_nn=False)

Set sigma for each obs as the distance to its k-th neighbor from keeper.

Parameters:

keeper (netflow.Keeper) – The keeper object that stores the symmetric distance matrix of size (n_observations, n_observations).
key (str) – The label used to reference the distance matrix stored in keeper.distances, of size (n_observations, n_observations).
label (str) – Label used to store resulting sigmas in keeper.misc['sigmas_'+label]. If return_nn is True, nearest neighbor indices are stored in keeper.misc['nn_indices_'+label] and nearest neighbor distances are stored in keeper.misc['nn_distances_'+label].
n_neighbors ({int, None}) – K-th nearest neighbor (or number of nearest neighbors) to use for computing sigmas, n_neighbors > 0. (Uses n_neighbors + 1, since each obs is its closest neighbor). If None, all neighbors are used.
method ({'mean', 'median', 'max'}) –
Indicate how to compute sigma.

Options:
- 'mean' : mean of the distances to the n_neighbors nearest neighbors
- 'median' : median of the distances to the n_neighbors nearest neighbors
- 'max' : distance to the n_neighbors-th nearest neighbor
return_nn (bool) – If True, also return/store indices and distances of n_neighbors nearest neighbors.

Returns:

sigmas (numpy.ndarray, (n_observations, )) – Sigma for each row in d, computed from the distances to its n_neighbors nearest neighbors according to method. Each sigma is the kernel width that sets the extent of an observation's accessible neighborhood.
indices (numpy.ndarray, (n_observations, )) – Indices of nearest neighbors where each row corresponds to an observation. Returned if return_nn is True.
distances (numpy.ndarray, (n_observations, n_neighbors + 1)) – Distances to nearest neighbors where each row corresponds to an obs. Returned if return_nn is True.

netflow.pose.similarity.sigma_knn_(d, n_neighbors=None, method='mean', return_nn=False)

Determine sigma for each obs as the distance to its k-th neighbor.

Parameters:

d (numpy.ndarray, (n_observations, n_observations)) – Symmetric distance matrix.
n_neighbors ({int, None}) – K-th nearest neighbor (or number of nearest neighbors) to use for computing sigmas, n_neighbors > 0. (Uses n_neighbors + 1, since each obs is its closest neighbor). If None, all neighbors are used.
method ({'mean', 'median', 'max'}) –
Indicate how to compute sigma.

Options:
- 'mean' : mean of the distances to the n_neighbors nearest neighbors
- 'median' : median of the distances to the n_neighbors nearest neighbors
- 'max' : distance to the n_neighbors-th nearest neighbor
return_nn (bool) – If True, also return indices and distances of n_neighbors nearest neighbors.

Returns:

sigmas (numpy.ndarray, (n_observations, )) – Sigma for each row in d, computed from the distances to its n_neighbors nearest neighbors according to method. Each sigma is the kernel width that sets the extent of an observation's accessible neighborhood.
indices (numpy.ndarray, (n_observations, )) – Indices of nearest neighbors where each row corresponds to an observation. Returned if return_nn is True.
distances (numpy.ndarray, (n_observations, n_neighbors + 1)) – Distances to nearest neighbors where each row corresponds to an obs. Returned if return_nn is True.
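The three method options can be sketched as row-wise reductions over sorted distances. This is an illustration only: the helper name `sigma_sketch` is hypothetical, d is assumed to have a zero diagonal, and the self-distance is excluded from the statistic, which may differ from how netflow treats it.

```python
import numpy as np

def sigma_sketch(d, n_neighbors=None, method="mean"):
    """Illustrative per-observation sigma from k-nn distances (hypothetical helper)."""
    d = np.asarray(d, dtype=float)
    m = d.shape[0]
    # None means every other observation counts as a neighbor
    k = m - 1 if n_neighbors is None else n_neighbors
    # sort each row; column 0 is the zero self-distance, so skip it
    nn_dist = np.sort(d, axis=1)[:, 1:k + 1]
    if method == "mean":
        return nn_dist.mean(axis=1)
    if method == "median":
        return np.median(nn_dist, axis=1)
    if method == "max":
        # the largest of the k smallest distances is the k-th neighbor distance
        return nn_dist[:, -1]
    raise ValueError(f"unknown method: {method!r}")

# toy distances between the points 0, 1, 3, 7 on a line
d = np.array([[0.0, 1.0, 3.0, 7.0],
              [1.0, 0.0, 2.0, 6.0],
              [3.0, 2.0, 0.0, 4.0],
              [7.0, 6.0, 4.0, 0.0]])
sigma_mean = sigma_sketch(d, n_neighbors=2, method="mean")
sigma_max = sigma_sketch(d, n_neighbors=2, method="max")
```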
