netflow.probe.clustering
Functions
avg_cluster_edges | Return average value of X within and between clusters.
branch_clustering | Perform clustering on each branch.
dbscan_clustering | Perform DBSCAN clustering.
high_res_branch_graph | Extract higher resolution graph from POSE.
high_res_clustered_branch_graph | Extract higher resolution graph from clustering on POSE branches.
louvain | Compute Louvain communities on graph, intended for POSE.
louvain_paritioned | Compute Louvain communities on graph and further partition restricted to existing classifier (e.g., branches intended for POSE).
- netflow.probe.clustering.avg_cluster_edges(X, clustering, G=None)
Return average value of X within and between clusters.
- Parameters:
X (pandas.DataFrame) – The observation-pairwise data.
clustering (pandas.Series) – The clustering IDs indexed by the observation labels.
G (networkx.Graph) – (Optional) If provided, restrict the average to pairs of observations joined by an edge in the graph. Otherwise, the average is taken between all observations in the cluster. Expected to have node attribute ‘name’ corresponding to the name of each observation (represented as nodes).
- Returns:
R – The average values within and between clusters, where the index and columns are the cluster references.
- Return type:
pandas.DataFrame
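As an illustration of the quantity described above, the following is a minimal sketch and not the library's implementation. The helper name avg_between_clusters is hypothetical, the optional graph restriction is not shown, and excluding self-pairs within a cluster is an assumption.

```python
import numpy as np
import pandas as pd


def avg_between_clusters(X, clustering):
    """Average X over observation pairs, grouped by (cluster, cluster).

    Hypothetical helper for illustration only; it does not restrict
    the average to graph edges.
    """
    labels = sorted(clustering.unique().tolist())
    R = pd.DataFrame(np.nan, index=labels, columns=labels)
    for a in labels:
        rows = clustering.index[clustering == a]
        for b in labels:
            cols = clustering.index[clustering == b]
            block = X.loc[rows, cols].to_numpy(dtype=float)
            if a == b:
                # Assumption: self-pairs (the diagonal) are excluded.
                block = block[~np.eye(len(rows), dtype=bool)]
            R.loc[a, b] = block.mean() if block.size else np.nan
    return R


obs = ["a", "b", "c", "d"]
X = pd.DataFrame(
    [[0, 1, 4, 4],
     [1, 0, 4, 4],
     [4, 4, 0, 2],
     [4, 4, 2, 0]],
    index=obs, columns=obs, dtype=float,
)
clustering = pd.Series([0, 0, 1, 1], index=obs)
R = avg_between_clusters(X, clustering)
```

Here the index and columns of R are the cluster IDs, with within-cluster averages on the diagonal and between-cluster averages off it.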
- netflow.probe.clustering.branch_clustering(d, branch_record, clustering, min_branch_size=6, **kwargs)
Perform clustering on each branch.
Note: the clustering function should label outliers with the value -1. Currently, such outliers are treated as a cluster.
- Parameters:
d (pandas.DataFrame) – The symmetric, observation-pairwise distances.
branch_record (pandas.Series) – The branch assignment for each observation in d.
clustering (func) – A function that performs clustering on the given data. Must accept a data matrix and keyword arguments.
min_branch_size (int) – Branches with <= min_branch_size observations are not considered for further clustering.
kwargs (dict) – Keyword arguments passed to clustering.
- Returns:
records (pandas.Series) – The new cluster index for each observation.
branch_map (dict) – Maps the original branch index to indices of new child sub clusters.
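The per-branch procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: cluster_each_branch and split_in_half are hypothetical names, new cluster indices are assumed to be consecutive integers, and small branches are assumed to be kept whole as a single child cluster.

```python
import pandas as pd


def cluster_each_branch(d, branch_record, clustering, min_branch_size=6, **kwargs):
    """Hypothetical sketch: re-cluster every sufficiently large branch."""
    records = pd.Series(-1, index=branch_record.index, dtype=int)
    branch_map = {}
    next_id = 0
    for branch in sorted(branch_record.unique().tolist()):
        members = branch_record.index[branch_record == branch]
        if len(members) <= min_branch_size:
            # Small branch: keep it as one cluster (assumption).
            labels = pd.Series(0, index=members)
        else:
            sub = d.loc[members, members]
            labels = pd.Series(list(clustering(sub, **kwargs)), index=members)
        # Give every label (including -1 outliers) a fresh global index.
        children = {}
        for label in sorted(labels.unique().tolist()):
            children[label] = next_id
            next_id += 1
        records.loc[members] = labels.map(children).to_numpy()
        branch_map[branch] = list(children.values())
    return records, branch_map


def split_in_half(sub):
    """Toy clustering function: first half -> 0, second half -> 1."""
    n = len(sub)
    return [0] * (n // 2) + [1] * (n - n // 2)


obs = list("abcdefgh")
d = pd.DataFrame(0.0, index=obs, columns=obs)
branch_record = pd.Series([0, 0, 1, 1, 1, 1, 1, 1], index=obs)
records, branch_map = cluster_each_branch(
    d, branch_record, split_in_half, min_branch_size=3
)
```

In this toy run, branch 0 is too small to split and branch 1 is divided into two child clusters, so branch_map records one child for branch 0 and two for branch 1.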
- netflow.probe.clustering.dbscan_clustering(d, eps=0.5, min_samples=3, metric='precomputed', **kwargs)
Perform DBSCAN clustering.
- Parameters:
d (pandas.DataFrame) – The symmetric, observation-pairwise distances.
{eps, min_samples, metric, kwargs} – Keyword arguments passed to sklearn.cluster.DBSCAN.
- Returns:
clusters – The cluster assignment for each observation in d.
- Return type:
pandas.Series
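For orientation, the sketch below calls sklearn.cluster.DBSCAN directly on a precomputed distance matrix and wraps the labels in a pandas.Series. It presumably mirrors what this function does, but that is an assumption; the toy data are invented for illustration.

```python
import pandas as pd
from sklearn.cluster import DBSCAN

# Toy 1-D positions: two tight groups and one isolated observation.
obs = ["a", "b", "c", "d", "e", "f", "g"]
coords = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 20.0]
d = pd.DataFrame(
    [[abs(x - y) for y in coords] for x in coords],
    index=obs, columns=obs,
)

# metric="precomputed" tells DBSCAN that d already holds distances.
model = DBSCAN(eps=0.5, min_samples=3, metric="precomputed")
clusters = pd.Series(model.fit_predict(d.to_numpy()), index=d.index, name="cluster")
```

DBSCAN marks observations that belong to no dense region with the label -1, which is the outlier convention that branch_clustering expects.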
- netflow.probe.clustering.high_res_branch_graph(G)
Extract higher resolution graph from POSE.
Each node represents a branch from the POSE, and edges are placed between nodes containing observations incident to edges between branches in the original POSE.
- Parameters:
G (nx.Graph) – The graph returned from netflow.TDA.construct_topology.
- Returns:
Ghr – The high-res graph.
- Return type:
nx.Graph
- netflow.probe.clustering.high_res_clustered_branch_graph(G, clustering)
Extract higher resolution graph from clustering on POSE branches.
Each node represents a cluster from a POSE branch, and edges are placed between nodes containing observations incident to edges between branches in the original POSE.
- Parameters:
G (nx.Graph) – The graph returned from netflow.TDA.construct_topology.
clustering (pandas.Series) – The cluster IDs, indexed by the observation labels.
- Returns:
Ghr – The high-res graph.
- Return type:
nx.Graph
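The general idea of collapsing observations into one node per cluster can be sketched as below. This is a loose illustration only: cluster_quotient_graph is a hypothetical helper, graph nodes are assumed to be the observation labels themselves, and any edge between observations in different clusters produces an edge, whereas the function above only considers edges between branches of the original POSE.

```python
import networkx as nx
import pandas as pd


def cluster_quotient_graph(G, clustering):
    """Hypothetical sketch: one node per cluster, linked when any
    edge of G joins observations from two different clusters."""
    Ghr = nx.Graph()
    for label in sorted(clustering.unique().tolist()):
        members = list(clustering.index[clustering == label])
        Ghr.add_node(label, members=members)
    for u, v in G.edges():
        cu, cv = int(clustering[u]), int(clustering[v])
        if cu != cv:
            Ghr.add_edge(cu, cv)
    return Ghr


# Toy graph: a path a - b - c - d over four observations.
G = nx.path_graph(["a", "b", "c", "d"])
clustering = pd.Series({"a": 0, "b": 0, "c": 1, "d": 2})
Ghr = cluster_quotient_graph(G, clustering)
```

Each node of Ghr keeps the list of observations it contains in a members attribute, which is a choice made for this sketch rather than a documented feature.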
- netflow.probe.clustering.louvain(G, weight='inverted_distance', resolution=1.0, seed=0, **kwargs)
Compute Louvain communities on graph, intended for POSE.
Louvain communities are computed via networkx.community.louvain_communities.
- Parameters:
G (networkx.Graph) – The graph.
weight ({None, str}) – The edge attribute of the value used as the weight. If None, set to 1 for all edges (default value = ‘inverted_distance’).
resolution (float) – Influences algorithm preference for larger (resolution value smaller than 1) or smaller (resolution value greater than 1) communities.
seed (int) – Random generator state.
kwargs (dict) – Keyword arguments passed to networkx.community.louvain_communities.
- Returns:
lvp – Index of community each node is partitioned into, keyed by the nodes.
- Return type:
dict
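The returned mapping can be reproduced with networkx alone, as in the sketch below. The toy graph is invented for illustration, and numbering communities by their position in the list returned by louvain_communities is an assumption about how the indices are assigned.

```python
import networkx as nx

# Toy graph: two triangles joined by one weak edge.
G = nx.Graph()
G.add_weighted_edges_from(
    [(1, 2, 1.0), (2, 3, 1.0), (1, 3, 1.0),
     (4, 5, 1.0), (5, 6, 1.0), (4, 6, 1.0),
     (3, 4, 0.1)],
    weight="inverted_distance",
)

# louvain_communities returns a list of node sets.
communities = nx.community.louvain_communities(
    G, weight="inverted_distance", resolution=1.0, seed=0
)

# Flatten to {node: community index}, the shape described for lvp.
lvp = {node: idx for idx, members in enumerate(communities) for node in members}
```

Because the bridge between the triangles is weak relative to the edges inside them, each triangle should end up in its own community.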
- netflow.probe.clustering.louvain_paritioned(G, class_attr, louvain_attr=None, weight='inverted_distance', resolution=1.0, seed=0, **kwargs)
Compute Louvain communities on graph and further partition restricted to existing classifier (e.g., branches intended for POSE).
Louvain communities are computed via networkx.community.louvain_communities.
- Parameters:
G (networkx.Graph) – The graph.
class_attr (str) – The node attribute of the value of the pre-assigned class attribute against which the Louvain communities should be partitioned.
louvain_attr ({None, str}) – (Optional) Node attribute where Louvain community indices are stored. If provided, first check whether the attribute already exists in G; if it does, the pre-computed Louvain community indices are used and the remaining argument values are ignored. Otherwise, the computed Louvain community indices are stored in this node attribute. If not provided, Louvain communities are computed and not stored.
weight ({None, str}) – The edge attribute of the value used as the weight. If None, set to 1 for all edges (default value = ‘inverted_distance’). (Ignored if pre-existing Louvain communities were saved in louvain_attr.)
resolution (float) – Influences algorithm preference for larger (resolution value smaller than 1) or smaller (resolution value greater than 1) communities. (Ignored if pre-existing Louvain communities were saved in louvain_attr.)
seed (int) – Random generator state. (Ignored if pre-existing Louvain communities were saved in louvain_attr.)
kwargs (dict) – Keyword arguments passed to networkx.community.louvain_communities. (Ignored if pre-existing Louvain communities were saved in louvain_attr.)
- Returns:
The following node attributes are added to the graph G:
f"{class_attr}_{louvain_attr}" – The class-partitioned Louvain community reference index, in the form "{class-index}-{Louvain-index}" (louvain_attr defaults to "lvp" if not provided).
louvain_attr – The Louvain community reference index, if provided.
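The combined label format can be illustrated with plain networkx node attributes. This sketch only shows how a "{class-index}-{Louvain-index}" string is assembled; the attribute name "branch" and the hand-written community indices are assumptions made for the example, not output of the function.

```python
import networkx as nx

G = nx.path_graph(4)

# Hypothetical pre-assigned class attribute (e.g., a branch index).
nx.set_node_attributes(G, {0: 0, 1: 0, 2: 1, 3: 1}, name="branch")

# Hand-written Louvain indices standing in for computed ones.
nx.set_node_attributes(G, {0: 0, 1: 1, 2: 1, 3: 1}, name="lvp")

# Combine both indices into "{class-index}-{Louvain-index}".
combined = {
    n: f"{G.nodes[n]['branch']}-{G.nodes[n]['lvp']}" for n in G.nodes
}
nx.set_node_attributes(G, combined, name="branch_lvp")
```

Nodes 1 and 2 share a Louvain community here but fall in different classes, so they receive different combined labels.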