netflow.pose.similarity

Functions

distance_to_similarity(keeper, key, ...[, ...])

Convert distance matrix to symmetric similarity measure.

edges_from_mutual_knn_indices(kmnn)

Convert mutual k-nns to a list of edges.

get_knn_indices_distances(d[, n_neighbors])

Get indices of and distances to k-nearest neighbors.

mutual_knn_edges(d[, n_neighbors])

Get edges between indices of mutual k-nearest neighbors (nn) from distance matrix.

mutual_knn_indices(d[, n_neighbors])

Get indices of mutual k-nearest neighbors (nn).

sigma_knn(keeper, key[, label, n_neighbors, ...])

Set sigma for each obs as the distance to its k-th neighbor from keeper.

sigma_knn_(d[, n_neighbors, method, return_nn])

Determine sigma for each obs as the distance to its k-th neighbor.

netflow.pose.similarity._distance_to_similarity(d, n_neighbors, method, sigmas=None, knn=False, indices=None)

Convert distance matrix to symmetric similarity measure.

\[K(x, y) = \sqrt{\frac{2\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}}\,\exp\left(-\frac{(x-y)^2}{\sigma_x^2 + \sigma_y^2}\right).\]
Parameters:
  • d (numpy.ndarray, (n_observations, n_observations)) – Symmetric distance matrix.

  • n_neighbors ({int, None}) – K-th nearest neighbor (or number of nearest neighbors) to use for computing sigmas, n_neighbors > 0. (Uses n_neighbors + 1, since each obs is its own closest neighbor.) If None, all neighbors are used.

  • method ({float, int, ‘mean’, ‘median’, ‘max’, ‘precomputed’}) –

    Indicate how to compute sigma.

    Options:

    • float : constant float to use as sigma

    • int : constant int to use as sigma

    • 'mean' : mean of the distances to the n_neighbors nearest neighbors

    • 'median' : median of the distances to the n_neighbors nearest neighbors

    • 'max' : distance to the n_neighbors-th nearest neighbor

    • 'precomputed' : precomputed values passed to sigmas

  • sigmas (numpy.ndarray, (n_observations, )) – Option to provide precomputed sigmas, ignored unless method='precomputed'.

  • knn (bool) – If True, restrict similarity measure to be non-zero only between n_neighbors nearest neighbors.

  • indices (numpy.ndarray, (n_observations, n_neighbors)) – Option to provide precomputed indices of the n_neighbors nearest neighbors for each obs when method='precomputed' and knn=True.

Returns:

K – Symmetric similarity measure.

Return type:

numpy.ndarray, (n_observations, n_observations)
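The kernel above can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the helper name gaussian_similarity is hypothetical, and it assumes that the squared difference in the formula is the squared entry of the distance matrix d and that one sigma per observation is already available.

```python
import numpy as np

def gaussian_similarity(d, sigmas):
    """Variable-bandwidth Gaussian kernel from a distance matrix (illustrative only)."""
    d = np.asarray(d, dtype=float)
    s = np.asarray(sigmas, dtype=float)
    # Pairwise sum of squared bandwidths: sigma_x^2 + sigma_y^2.
    denom = s[:, None] ** 2 + s[None, :] ** 2
    # Prefactor sqrt(2 * sigma_x * sigma_y / (sigma_x^2 + sigma_y^2)).
    prefactor = np.sqrt(2.0 * np.outer(s, s) / denom)
    # Assumption: the squared difference in the formula is d[x, y] ** 2.
    return prefactor * np.exp(-(d ** 2) / denom)

# Toy symmetric distance matrix and hand-picked sigmas (made-up values).
d = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.5],
              [2.0, 1.5, 0.0]])
sigmas = np.array([1.0, 1.0, 2.0])
K = gaussian_similarity(d, sigmas)
```

Because both the prefactor and the exponent are symmetric in x and y, K is symmetric, and its diagonal equals 1 whenever the self-distances are 0.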

netflow.pose.similarity.distance_to_similarity(keeper, key, n_neighbors, method, label=None, sigmas=None, knn=False, indices=None)

Convert distance matrix to symmetric similarity measure.

Parameters:
  • keeper (netflow.Keeper) – The keeper object that stores the symmetric distance matrix of size (n_observations, n_observations).

  • key (str) – The label used to reference the distance matrix stored in keeper.distances, of size (n_observations, n_observations).

  • n_neighbors ({int, None}) – K-th nearest neighbor (or number of nearest neighbors) to use for computing sigmas, n_neighbors > 0. (Uses n_neighbors + 1, since each obs is its own closest neighbor.) If None, all neighbors are used.

  • method ({float, int, ‘mean’, ‘median’, ‘max’, ‘precomputed’}) –

    Indicate how to compute sigma.

    Options:

    • float : constant float to use as sigma

    • int : constant int to use as sigma

    • 'mean' : mean of the distances to the n_neighbors nearest neighbors

    • 'median' : median of the distances to the n_neighbors nearest neighbors

    • 'max' : distance to the n_neighbors-th nearest neighbor

    • 'precomputed' : precomputed values passed to sigmas

  • label (str) – Label used to store resulting similarity matrix of size (n_observations, n_observations) in keeper.similarities.

  • sigmas (str) – Option to provide precomputed sigmas, ignored unless method='precomputed'. If provided, the precomputed sigmas are extracted from keeper.misc[sigmas] as a numpy.ndarray of size (n_observations, ).

  • knn (bool) – If True, restrict similarity measure to be non-zero only between n_neighbors nearest neighbors.

  • indices ({None, str}) – Option to provide precomputed indices of n_neighbors nearest neighbors for each obs when method = ‘precomputed’ and knn = True. If provided, the indices are extracted from keeper.misc[indices] as a numpy.ndarray of size (n_observations, n_neighbors).

Returns:

K – Symmetric similarity measure. If label is not None, this is stored in keeper.similarities[label] instead of being returned.

Return type:

numpy.ndarray, (n_observations, n_observations)

netflow.pose.similarity.edges_from_mutual_knn_indices(kmnn)

Convert mutual k-nns to a list of edges.

Parameters:

kmnn (defaultdict) – The mutual k-nn indices as returned from mutual_knn_indices.

Returns:

edges – The list of edges corresponding to the mutual k-nns.

Return type:

list
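As a rough illustration of this conversion, the sketch below flattens a mutual k-nn mapping into undirected edges. The helper name edges_from_kmnn and the edge representation (sorted (i, j) tuples with i < j, each pair listed once) are assumptions made for the example; the format the library actually returns is not specified here.

```python
from collections import defaultdict

def edges_from_kmnn(kmnn):
    """Flatten {node: [mutual neighbors]} into a sorted list of undirected edges."""
    # Each mutual pair appears twice (i -> j and j -> i); the set keeps one copy.
    return sorted({(min(i, j), max(i, j)) for i, nbrs in kmnn.items() for j in nbrs})

# Hand-written mutual k-nn mapping: node 0 is mutual with nodes 1 and 2.
kmnn = defaultdict(list, {0: [1, 2], 1: [0], 2: [0]})
edges = edges_from_kmnn(kmnn)
```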

netflow.pose.similarity.get_knn_indices_distances(d, n_neighbors=None)

Get indices of and distances to k-nearest neighbors.

Parameters:
  • d (numpy.ndarray, (m, m)) – Symmetric distance matrix.

  • n_neighbors ({int, None}) – K-th nearest neighbor (or number of nearest neighbors) to use for computing sigmas, n_neighbors > 0. (Uses n_neighbors + 1, since each obs is its closest neighbor). If None, all neighbors are used.

Returns:

  • indices (numpy.ndarray, (m, n_neighbors)) – Matrix with the indices of the k-nearest neighbors in each row. Note that the observation itself is not included in the output.

  • distances (numpy.ndarray, (m, n_neighbors)) – Matrix with the distances to the k-nearest neighbors in each row. Note that the observation itself is not included in the output.
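One way to obtain such index and distance matrices is to sort each row of the distance matrix and drop the self-match, as in the sketch below. The function name knn_indices_distances is hypothetical, and the code assumes a zero diagonal with strictly positive distances between distinct observations, so that each observation sorts first in its own row. The library may handle ties and self-matches differently.

```python
import numpy as np

def knn_indices_distances(d, n_neighbors=None):
    """Indices of and distances to the k nearest neighbors, excluding self (sketch)."""
    d = np.asarray(d, dtype=float)
    m = d.shape[0]
    # With n_neighbors=None, use all other observations.
    k = m - 1 if n_neighbors is None else n_neighbors
    # Sort each row by distance; a stable sort keeps ties in index order.
    order = np.argsort(d, axis=1, kind="stable")
    # Column 0 is assumed to be the observation itself (distance 0), so skip it.
    indices = order[:, 1:k + 1]
    distances = np.take_along_axis(d, indices, axis=1)
    return indices, distances

# Toy symmetric distance matrix (made-up values).
d = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.5],
              [2.0, 1.5, 0.0]])
indices, distances = knn_indices_distances(d, n_neighbors=1)
```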

netflow.pose.similarity.mutual_knn_edges(d, n_neighbors=None)

Get edges between indices of mutual k-nearest neighbors (nn) from distance matrix.

Note

Self is not included as one of the k-nns.

Parameters:
  • d (numpy.ndarray, (m, m)) – Symmetric distance matrix.

  • n_neighbors ({int, None}) – Number of mutual nns to include (does not include self), n_neighbors > 0. (Uses n_neighbors + 1, since each obs is its closest neighbor). If None, all neighbors are used (same as k-nns since all neighbors are mutually included).

Returns:

edges – The list of edges corresponding to the mutual k-nns.

Return type:

list

netflow.pose.similarity.mutual_knn_indices(d, n_neighbors=None)

Get indices of mutual k-nearest neighbors (nn).

Parameters:
  • d (numpy.ndarray, (m, m)) – Symmetric distance matrix.

  • n_neighbors ({int, None}) – Number of mutual nns to include (does not include self), n_neighbors > 0. (Uses n_neighbors + 1, since each obs is its closest neighbor). If None, all neighbors are used (same as k-nns since all neighbors are mutually included).

Returns:

kmnn_indices – Defaultdict keyed by row index, referencing the row indices of its mutual nns among its n_neighbors nns. Note that the observation itself is not included in the output.

Return type:

defaultdict[list] of the form {m : [up to n_neighbors mutual nns]}
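The notion of mutual nearest neighbors can be sketched as follows: j is kept for i only if j is among i's k nearest neighbors and i is among j's. The helper mutual_knn is a hypothetical stand-in, not the library function. It requires an explicit n_neighbors and assumes a zero diagonal with positive off-diagonal distances.

```python
from collections import defaultdict
import numpy as np

def mutual_knn(d, n_neighbors):
    """Map each row index to the neighbors that also list it as a neighbor (sketch)."""
    d = np.asarray(d, dtype=float)
    # k nearest neighbors per row, skipping column 0 (assumed to be self).
    order = np.argsort(d, axis=1, kind="stable")[:, 1:n_neighbors + 1]
    neighbor_sets = [set(row.tolist()) for row in order]
    kmnn = defaultdict(list)
    for i, row in enumerate(order):
        for j in row.tolist():
            # Keep j only when the relation holds in both directions.
            if i in neighbor_sets[j]:
                kmnn[i].append(j)
    return kmnn

# Toy symmetric distance matrix (made-up values).
d = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.5],
              [2.0, 1.5, 0.0]])
kmnn = mutual_knn(d, n_neighbors=1)
```

In this toy example, observation 2 names observation 1 as its nearest neighbor, but observation 1 names observation 0, so observation 2 ends up with no mutual neighbors and has no entry in the mapping.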

netflow.pose.similarity.sigma_knn(keeper, key, label=None, n_neighbors=None, method='mean', return_nn=False)

Set sigma for each obs as the distance to its k-th neighbor from keeper.

Parameters:
  • keeper (netflow.Keeper) – The keeper object that stores the symmetric distance matrix of size (n_observations, n_observations).

  • key (str) – The label used to reference the distance matrix stored in keeper.distances, of size (n_observations, n_observations).

  • label (str) – Label used to store resulting sigmas in keeper.misc['sigmas_'+label]. If return_nn is True, nearest neighbor indices are stored in keeper.misc['nn_indices_'+label] and nearest neighbor distances are stored in keeper.misc['nn_distances_'+label].

  • n_neighbors ({int, None}) – K-th nearest neighbor (or number of nearest neighbors) to use for computing sigmas, n_neighbors > 0. (Uses n_neighbors + 1, since each obs is its own closest neighbor.) If None, all neighbors are used.

  • method ({'mean', 'median', 'max'}) –

    Indicate how to compute sigma.

    Options:

    • 'mean' : mean of the distances to the n_neighbors nearest neighbors

    • 'median' : median of the distances to the n_neighbors nearest neighbors

    • 'max' : distance to the n_neighbors-th nearest neighbor

  • return_nn (bool) – If True, also return/store indices and distances of n_neighbors nearest neighbors.

Returns:

  • sigmas (numpy.ndarray, (n_observations, )) – The distance to the k-th nearest neighbor for all rows in d. Sigmas act as per-observation kernel widths that reflect each data point’s accessible neighbors.

  • indices (numpy.ndarray, (n_observations, n_neighbors + 1)) – Indices of nearest neighbors where each row corresponds to an observation. Returned if return_nn is True.

  • distances (numpy.ndarray, (n_observations, n_neighbors + 1)) – Distances to nearest neighbors where each row corresponds to an obs. Returned if return_nn is True.

netflow.pose.similarity.sigma_knn_(d, n_neighbors=None, method='mean', return_nn=False)

Determine sigma for each obs as the distance to its k-th neighbor.

Parameters:
  • d (numpy.ndarray, (n_observations, n_observations)) – Symmetric distance matrix.

  • n_neighbors ({int, None}) – K-th nearest neighbor (or number of nearest neighbors) to use for computing sigmas, n_neighbors > 0. (Uses n_neighbors + 1, since each obs is its own closest neighbor.) If None, all neighbors are used.

  • method ({'mean', 'median', 'max'}) –

    Indicate how to compute sigma.

    Options:

    • 'mean' : mean of the distances to the n_neighbors nearest neighbors

    • 'median' : median of the distances to the n_neighbors nearest neighbors

    • 'max' : distance to the n_neighbors-th nearest neighbor

  • return_nn (bool) – If True, also return indices and distances of n_neighbors nearest neighbors.

Returns:

  • sigmas (numpy.ndarray, (n_observations, )) – The distance to the k-th nearest neighbor for all rows in d. Sigmas act as per-observation kernel widths that reflect each data point’s accessible neighbors.

  • indices (numpy.ndarray, (n_observations, n_neighbors + 1)) – Indices of nearest neighbors where each row corresponds to an observation. Returned if return_nn is True.

  • distances (numpy.ndarray, (n_observations, n_neighbors + 1)) – Distances to nearest neighbors where each row corresponds to an obs. Returned if return_nn is True.
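The three method options can be illustrated with a small NumPy sketch that reduces each observation's nearest-neighbor distances to a single sigma. The function sigma_from_knn is a hypothetical helper, not the library's code. It assumes the zero self-distance is excluded from the reduction, which may differ from how the library treats the extra self column.

```python
import numpy as np

def sigma_from_knn(d, n_neighbors, method="mean"):
    """Per-observation sigma from distances to the k nearest neighbors (sketch)."""
    d = np.asarray(d, dtype=float)
    # Sorted distances per row; column 0 is the self-distance (assumed 0) and is dropped.
    nn_dist = np.sort(d, axis=1)[:, 1:n_neighbors + 1]
    reducers = {"mean": np.mean, "median": np.median, "max": np.max}
    # 'max' over the k nearest distances equals the distance to the k-th neighbor.
    return reducers[method](nn_dist, axis=1)

# Toy symmetric distance matrix (made-up values).
d = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.5],
              [2.0, 1.5, 0.0]])
sig_mean = sigma_from_knn(d, n_neighbors=2, method="mean")
sig_max = sigma_from_knn(d, n_neighbors=2, method="max")
```

Under these assumptions, 'max' gives the widest kernel per observation, while 'mean' and 'median' shrink it toward the closer neighbors.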