netflow.pose.similarity
Functions
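| Function | Description |
|---|---|
| _distance_to_similarity | Convert distance matrix to symmetric similarity measure. |
| distance_to_similarity | Convert distance matrix to symmetric similarity measure. |
| edges_from_mutual_knn_indices | Convert mutual k-nns to a list of edges. |
| get_knn_indices_distances | Get indices of and distances to k-nearest neighbors. |
| mutual_knn_edges | Get edges between indices of mutual k-nearest neighbors (nn) from distance matrix. |
| mutual_knn_indices | Get indices of mutual k-nearest neighbors (nn). |
| sigma_knn | Set sigma for each obs as the distance to its k-th neighbor from keeper. |
| sigma_knn_ | Determine sigma for each obs as the distance to its k-th neighbor. |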
- netflow.pose.similarity._distance_to_similarity(d, n_neighbors, method, sigmas=None, knn=False, indices=None)
Convert distance matrix to symmetric similarity measure.
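\[K_{ij} = \sqrt{\frac{2\sigma_i\sigma_j}{\sigma_i^2 + \sigma_j^2}}\,\exp\left(-\frac{d_{ij}^2}{\sigma_i^2 + \sigma_j^2}\right),\]
where \(d_{ij}\) is the distance between observations \(i\) and \(j\), and \(\sigma_i\) is the kernel width (sigma) of observation \(i\).
- Parameters: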
d (numpy.ndarray, (n_observations, n_observations)) – Symmetric distance matrix.
n_neighbors ({int, None}) – K-th nearest neighbor (or number of nearest neighbors) to use for computing sigmas, n_neighbors > 0. (Uses n_neighbors + 1, since each obs is its own closest neighbor.) If None, all neighbors are used.
method ({float, int, 'mean', 'median', 'max', 'precomputed'}) – Indicate how to compute sigma.
Options:
float : constant float to use as sigma
int : constant int to use as sigma
'mean' : mean of the distances to the n_neighbors nearest neighbors
'median' : median of the distances to the n_neighbors nearest neighbors
'max' : distance to the n_neighbors-th nearest neighbor
'precomputed' : precomputed values passed to sigmas
sigmas (numpy.ndarray, (n_observations, )) – Option to provide precomputed sigmas, ignored unless method='precomputed'.
knn (bool) – If True, restrict the similarity measure to be non-zero only between n_neighbors nearest neighbors.
indices (numpy.ndarray, (n_observations, n_neighbors)) – Option to provide precomputed indices of the n_neighbors nearest neighbors for each obs when method='precomputed' and knn=True.
- Returns:
K – Symmetric similarity measure.
- Return type:
numpy.ndarray, (n_observations, n_observations)
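The kernel above can be evaluated directly once a sigma per observation is known. The following is a minimal NumPy sketch, not the netflow implementation: the function name variable_bandwidth_kernel is hypothetical, sigmas are assumed to be strictly positive, and a constant method (float or int) is assumed to be equivalent to passing an array filled with that constant.

```python
import numpy as np


def variable_bandwidth_kernel(d: np.ndarray, sigmas: np.ndarray) -> np.ndarray:
    """Hypothetical sketch of
    K_ij = sqrt(2 s_i s_j / (s_i^2 + s_j^2)) * exp(-d_ij^2 / (s_i^2 + s_j^2)).
    """
    d = np.asarray(d, dtype=float)
    s = np.asarray(sigmas, dtype=float)  # assumed strictly positive
    # Pairwise sums of squared kernel widths, shape (n, n).
    s2_sum = s[:, None] ** 2 + s[None, :] ** 2
    # Normalizing prefactor: equals 1 on the diagonal and whenever s_i == s_j.
    prefactor = np.sqrt(2.0 * np.outer(s, s) / s2_sum)
    return prefactor * np.exp(-(d ** 2) / s2_sum)


d = np.array([[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]])
K = variable_bandwidth_kernel(d, np.ones(3))  # K[0, 1] == exp(-0.5)
```

Because both the prefactor and the exponent are symmetric in i and j, the result is symmetric for any choice of sigmas.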
- netflow.pose.similarity.distance_to_similarity(keeper, key, n_neighbors, method, label=None, sigmas=None, knn=False, indices=None)
Convert distance matrix to symmetric similarity measure.
- Parameters:
keeper (netflow.Keeper) – The keeper object that stores the symmetric distance matrix of size (n_observations, n_observations).
key (str) – The label used to reference the distance matrix stored in keeper.distances, of size (n_observations, n_observations).
n_neighbors ({int, None}) – K-th nearest neighbor (or number of nearest neighbors) to use for computing sigmas, n_neighbors > 0. (Uses n_neighbors + 1, since each obs is its own closest neighbor.) If None, all neighbors are used.
method ({float, int, 'mean', 'median', 'max', 'precomputed'}) – Indicate how to compute sigma.
Options:
float : constant float to use as sigma
int : constant int to use as sigma
'mean' : mean of the distances to the n_neighbors nearest neighbors
'median' : median of the distances to the n_neighbors nearest neighbors
'max' : distance to the n_neighbors-th nearest neighbor
'precomputed' : precomputed values passed to sigmas
label (str) – Label used to store the resulting similarity matrix of size (n_observations, n_observations) in keeper.similarities.
sigmas (str) – Option to provide precomputed sigmas, ignored unless method='precomputed'. If provided, the precomputed sigmas are extracted from keeper.misc[sigmas] as a numpy.ndarray of size (n_observations, ).
knn (bool) – If True, restrict the similarity measure to be non-zero only between n_neighbors nearest neighbors.
indices ({None, str}) – Option to provide precomputed indices of the n_neighbors nearest neighbors for each obs when method='precomputed' and knn=True. If provided, the indices are extracted from keeper.misc[indices] as a numpy.ndarray of size (n_observations, n_neighbors).
- Returns:
K – Symmetric similarity measure. If label is not None, this is stored in keeper.similarities[label] instead of being returned.
- Return type:
numpy.ndarray, (n_observations, n_observations)
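With knn=True, the similarity is kept non-zero only between nearest neighbors while the output remains symmetric. The sketch below shows one way to apply such a restriction to a plain array; it does not use netflow.Keeper. The helper name restrict_to_knn is hypothetical, and two choices are assumptions not confirmed by the source: a pair (i, j) is kept when either observation is among the other's neighbors (a union rule), and each observation keeps its self-similarity.

```python
import numpy as np


def restrict_to_knn(K: np.ndarray, indices: np.ndarray) -> np.ndarray:
    """Hypothetical helper: zero out K outside the k-nearest-neighbor graph.

    Assumption: (i, j) is kept if j is among i's neighbors OR i is among
    j's neighbors, which keeps the result symmetric.
    """
    n = K.shape[0]
    mask = np.zeros((n, n), dtype=bool)
    # Row i of `indices` lists the neighbors of observation i.
    rows = np.repeat(np.arange(n), indices.shape[1])
    mask[rows, indices.ravel()] = True
    # Symmetrize: keep the union of the directed k-NN relations.
    mask = mask | mask.T
    # Assumption: the diagonal (self-similarity) is always kept.
    np.fill_diagonal(mask, True)
    return np.where(mask, K, 0.0)


K = np.ones((4, 4))
R = restrict_to_knn(K, np.array([[1], [2], [1], [2]]))
```

An intersection rule (keep a pair only if the neighbor relation is mutual) would also give a symmetric result but a sparser one; which rule netflow uses is not stated here.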
- netflow.pose.similarity.edges_from_mutual_knn_indices(kmnn)
Convert mutual k-nns to a list of edges.
- Parameters:
kmnn (defaultdict) – The mutual k-nn indices as returned from mutual_knn_indices.
- Returns:
edges – The list of edges corresponding to the mutual k-nns.
- Return type:
list
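Because every mutual pair appears in both observations' lists, converting the defaultdict to edges amounts to collecting each pair once. A minimal sketch, assuming (not confirmed by the source) that edges are undirected tuples (i, j) with i < j; the helper name edges_from_kmnn is hypothetical.

```python
from collections import defaultdict


def edges_from_kmnn(kmnn):
    """Hypothetical helper: turn {i: [mutual nns of i]} into undirected edges.

    Assumes each edge is reported once as a tuple (i, j) with i < j.
    """
    edges = set()
    for i, neighbors in kmnn.items():
        for j in neighbors:
            # Mutual nns appear in both lists; normalizing the order deduplicates them.
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


kmnn = defaultdict(list, {0: [1], 1: [0, 2], 2: [1]})
edges = edges_from_kmnn(kmnn)  # [(0, 1), (1, 2)]
```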
- netflow.pose.similarity.get_knn_indices_distances(d, n_neighbors=None)
Get indices of and distances to the k-nearest neighbors.
- Parameters:
d (numpy.ndarray, (m, m)) – Symmetric distance matrix.
n_neighbors ({int, None}) – Number of nearest neighbors to return (not counting self), n_neighbors > 0. (Uses n_neighbors + 1, since each obs is its own closest neighbor.) If None, all neighbors are used.
- Returns:
indices (numpy.ndarray, (m, n_neighbors)) – Matrix with the indices of the k-nearest neighbors in each row. Note: this does not include the observation itself.
distances (numpy.ndarray, (m, n_neighbors)) – Matrix with the distances to the k-nearest neighbors in each row. Note: this does not include the observation itself.
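The same outputs can be sketched with NumPy alone. Instead of querying n_neighbors + 1 neighbors and dropping the first, this hypothetical knn_indices_distances masks the diagonal, so an observation is never reported as its own neighbor even if another observation lies at distance 0. It is an illustration, not the netflow code.

```python
import numpy as np


def knn_indices_distances(d, n_neighbors=None):
    """Sketch: indices of and distances to the k nearest neighbors, excluding self."""
    d = np.asarray(d, dtype=float)
    m = d.shape[0]
    k = m - 1 if n_neighbors is None else n_neighbors
    if not 0 < k < m:
        raise ValueError("n_neighbors must satisfy 0 < n_neighbors < m")
    # Mask the diagonal so self is excluded regardless of ties at distance 0.
    masked = d.copy()
    np.fill_diagonal(masked, np.inf)
    # Stable sort keeps the column order among tied distances.
    indices = np.argsort(masked, axis=1, kind="stable")[:, :k]
    distances = np.take_along_axis(d, indices, axis=1)
    return indices, distances


d = np.array([[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]])
idx, dist = knn_indices_distances(d, 1)  # idx == [[1], [0], [1]]
```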
- netflow.pose.similarity.mutual_knn_edges(d, n_neighbors=None)
Get edges between indices of mutual k-nearest neighbors (nn) from distance matrix.
Note
Self is not included as one of the k-nns.
- Parameters:
d (numpy.ndarray, (m, m)) – Symmetric distance matrix.
n_neighbors ({int, None}) – Number of mutual nns to include (does not include self), n_neighbors > 0. (Uses n_neighbors + 1, since each obs is its own closest neighbor.) If None, all neighbors are used (same as k-nns, since all neighbors are mutually included).
- Returns:
edges – The list of edges corresponding to the mutual k-nns.
- Return type:
list
- netflow.pose.similarity.mutual_knn_indices(d, n_neighbors=None)
Get indices of mutual k-nearest neighbors (nn).
- Parameters:
d (numpy.ndarray, (m, m)) – Symmetric distance matrix.
n_neighbors ({int, None}) – Number of mutual nns to include (does not include self), n_neighbors > 0. (Uses n_neighbors + 1, since each obs is its own closest neighbor.) If None, all neighbors are used (same as k-nns, since all neighbors are mutually included).
- Returns:
kmnn_indices – Defaultdict keyed by row index, referencing the row indices of its mutual nns out of its n_neighbors nns. Note: this does not include the observation itself.
- Return type:
defaultdict[list] of the form {m : [up to n_neighbors mutual nns]}
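A pair is a mutual nn when each observation is among the other's n_neighbors nearest neighbors. The sketch below builds the defaultdict described above; the name mutual_knn is hypothetical, and masking the diagonal (rather than requesting n_neighbors + 1 neighbors) is a choice of this sketch. Observations without any mutual nn simply do not appear as keys, which is why each list holds "up to" n_neighbors entries.

```python
from collections import defaultdict

import numpy as np


def mutual_knn(d, n_neighbors=None):
    """Sketch: {i: [j, ...]} where j is among i's k-nns and i is among j's k-nns."""
    d = np.asarray(d, dtype=float)
    m = d.shape[0]
    k = m - 1 if n_neighbors is None else n_neighbors
    masked = d.copy()
    np.fill_diagonal(masked, np.inf)  # never count self as a neighbor
    knn = np.argsort(masked, axis=1, kind="stable")[:, :k]
    neighbor_sets = [set(row.tolist()) for row in knn]
    kmnn = defaultdict(list)
    for i in range(m):
        for j in knn[i].tolist():
            if i in neighbor_sets[j]:  # keep j only if the relation is mutual
                kmnn[i].append(j)
    return kmnn


d = np.array([[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]])
kmnn = mutual_knn(d, 1)  # observation 2's nearest neighbor (1) prefers 0, so 2 has no mutual nn
```

With n_neighbors=None every other observation is a neighbor of every observation, so all pairs are mutual, matching the note in the parameter description.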
- netflow.pose.similarity.sigma_knn(keeper, key, label=None, n_neighbors=None, method='mean', return_nn=False)
Set sigma for each obs as the distance to its k-th neighbor from keeper.
- Parameters:
keeper (netflow.Keeper) – The keeper object that stores the symmetric distance matrix of size (n_observations, n_observations).
key (str) – The label used to reference the distance matrix stored in keeper.distances, of size (n_observations, n_observations).
label (str) – Label used to store the resulting sigmas in keeper.misc['sigmas_'+label]. If return_nn is True, nearest neighbor indices are stored in keeper.misc['nn_indices_'+label] and nearest neighbor distances are stored in keeper.misc['nn_distances_'+label].
n_neighbors ({int, None}) – K-th nearest neighbor (or number of nearest neighbors) to use for computing sigmas, n_neighbors > 0. (Uses n_neighbors + 1, since each obs is its own closest neighbor.) If None, all neighbors are used.
method ({'mean', 'median', 'max'}) – Indicate how to compute sigma.
Options:
'mean' : mean of the distances to the n_neighbors nearest neighbors
'median' : median of the distances to the n_neighbors nearest neighbors
'max' : distance to the n_neighbors-th nearest neighbor
return_nn (bool) – If True, also return/store the indices and distances of the n_neighbors nearest neighbors.
- Returns:
sigmas (numpy.ndarray, (n_observations, )) – The sigma for each row of the distance matrix keeper.distances[key], computed from the distances to its n_neighbors nearest neighbors as specified by method. Each sigma is a kernel width reflecting the extent of that data point's accessible neighborhood.
indices (numpy.ndarray, (n_observations, n_neighbors + 1)) – Indices of nearest neighbors, where each row corresponds to an observation. Returned if return_nn is True.
distances (numpy.ndarray, (n_observations, n_neighbors + 1)) – Distances to nearest neighbors, where each row corresponds to an observation. Returned if return_nn is True.
- netflow.pose.similarity.sigma_knn_(d, n_neighbors=None, method='mean', return_nn=False)
Determine sigma for each obs as the distance to its k-th neighbor.
- Parameters:
d (numpy.ndarray, (n_observations, n_observations)) – Symmetric distance matrix.
n_neighbors ({int, None}) – K-th nearest neighbor (or number of nearest neighbors) to use for computing sigmas, n_neighbors > 0. (Uses n_neighbors + 1, since each obs is its own closest neighbor.) If None, all neighbors are used.
method ({'mean', 'median', 'max'}) – Indicate how to compute sigma.
Options:
'mean' : mean of the distances to the n_neighbors nearest neighbors
'median' : median of the distances to the n_neighbors nearest neighbors
'max' : distance to the n_neighbors-th nearest neighbor
return_nn (bool) – If True, also return the indices and distances of the n_neighbors nearest neighbors.
- Returns:
sigmas (numpy.ndarray, (n_observations, )) – The sigma for each row in d, computed from the distances to its n_neighbors nearest neighbors as specified by method. Each sigma is a kernel width reflecting the extent of that data point's accessible neighborhood.
indices (numpy.ndarray, (n_observations, n_neighbors + 1)) – Indices of nearest neighbors, where each row corresponds to an observation. Returned if return_nn is True.
distances (numpy.ndarray, (n_observations, n_neighbors + 1)) – Distances to nearest neighbors, where each row corresponds to an observation. Returned if return_nn is True.
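The three method options reduce each row's nearest-neighbor distances to a single width. A minimal sketch under stated assumptions: the name sigma_from_knn is hypothetical, each row is sorted and n_neighbors + 1 columns are kept as the description above says, and the first column (the self-distance, 0) is assumed to be dropped before summarizing. Whether netflow includes that zero in 'mean' and 'median' is not stated in the source.

```python
import numpy as np


def sigma_from_knn(d, n_neighbors=None, method="mean"):
    """Sketch: per-observation kernel width from nearest-neighbor distances."""
    d = np.asarray(d, dtype=float)
    m = d.shape[0]
    k = m - 1 if n_neighbors is None else n_neighbors
    # Sort each row and keep n_neighbors + 1 columns; column 0 is the self-distance (0),
    # which this sketch drops before summarizing (an assumption).
    nn_dist = np.sort(d, axis=1)[:, 1 : k + 1]
    if method == "mean":
        return nn_dist.mean(axis=1)
    if method == "median":
        return np.median(nn_dist, axis=1)
    if method == "max":
        # Distance to the n_neighbors-th nearest neighbor.
        return nn_dist[:, -1]
    raise ValueError(f"unknown method: {method!r}")


d = np.array([[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]])
sigmas = sigma_from_knn(d, n_neighbors=2, method="mean")  # [2.0, 1.5, 2.5]
```

Sorting the distance values (rather than tracking neighbor identities) is enough here, because only the summary statistic of each row is needed.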