Density-Based Clustering: DBSCAN

DBSCAN stands for density-based spatial clustering of applications with noise. Its main benefits are that it does not require the user to set the number of clusters a priori and that it can capture clusters of complex shapes. It can also identify points that are not part of any cluster (noise).

DBSCAN is somewhat slower than agglomerative clustering and k-Means, but still scales to relatively large datasets. The way DBSCAN works is by identifying points that are in “crowded” regions of the feature space, where many data points are close together. These regions are referred to as dense regions in feature space. The idea behind DBSCAN is that clusters form dense regions of data, separated by regions that are relatively empty.
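As a rough illustration of these properties, here is a minimal sketch using scikit-learn's DBSCAN class. The two-moons toy dataset, the scaling step, and the parameter values (eps=0.5, min_samples=5, which are scikit-learn's defaults) are assumptions chosen for the example, not something prescribed above.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

# Toy data with two interleaved half-moon shapes (an assumed example dataset).
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

# Rescale so that a single eps value is meaningful for both features.
X_scaled = StandardScaler().fit_transform(X)

# Note that the number of clusters is never passed in.
dbscan = DBSCAN(eps=0.5, min_samples=5)
labels = dbscan.fit_predict(X_scaled)

# scikit-learn marks noise points with the label -1.
n_clusters = len(set(labels) - {-1})
n_noise = int((labels == -1).sum())
print(f"clusters found: {n_clusters}, noise points: {n_noise}")
```

The number of clusters is read off the result rather than supplied up front. Changing eps or min_samples changes what counts as a dense region, so both usually need some tuning per dataset.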

Points that are within a dense region are called core samples, and they are defined as follows. There are two parameters in DBSCAN, min_samples and eps. If there are at least min_samples data points within a distance of eps of a given data point, that point is called a core sample. Core samples that are closer to each other than the distance eps are put into the same cluster by DBSCAN.

The algorithm works by picking a point to start with. It then finds all points within distance eps of that point. If there are fewer than min_samples points within distance eps, the point is labeled as noise, meaning that it doesn’t belong to any cluster (it may still be relabeled later if it turns out to lie within eps of a core sample). If there are at least min_samples points within distance eps, the point is labeled a core sample and assigned a new cluster label. Then, all neighbors (within eps) of the point are visited. If they have not been assigned a cluster yet, they are assigned the new cluster label that was just created. If they are core samples, their neighbors are visited in turn, and so on. The cluster grows until there are no more core samples within distance eps of the cluster. Then another point that hasn’t been visited yet is picked, and the same procedure is repeated.
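The procedure described above can be sketched in plain Python. This is a simplified illustration, not how any particular library implements it: the function names dbscan and region_query are made up for this example, the neighbor search is a brute-force scan, and it assumes that a point counts as one of its own neighbors when comparing against min_samples.

```python
from math import dist

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # marker for points not processed yet

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_samples):
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # provisional: may become a boundary point later
            continue
        # points[i] is a core sample: start a new cluster and grow it.
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # non-core point reached from a core sample
                continue
            if labels[j] is not UNVISITED:
                continue             # already assigned to some cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core sample too: keep growing
    return labels

# Two small squares of points plus one isolated point (made-up data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)
```

Each square becomes its own cluster and the isolated point keeps the noise label. The brute-force neighbor search makes this sketch quadratic in the number of points, which is fine for illustration but not for large datasets.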

In the end, there are three kinds of points: core points, points that are within distance eps of core points (called boundary points), and noise. When running the DBSCAN algorithm on a particular dataset multiple times, the clustering of the core points is always the same, and the same points will always be labeled as noise. However, a boundary point might be a neighbor of core samples of more than one cluster. Therefore, the cluster membership of boundary points depends on the order in which points are visited. Usually, there are only a few boundary points, and this slight dependence on the order of points is not important.
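To make the three kinds of points concrete, the following sketch classifies points on a line. The helper point_kinds and the data are hypothetical, chosen so that exactly one boundary point sits between two clusters. As an assumption, a point counts as one of its own neighbors.

```python
def point_kinds(xs, eps, min_samples):
    """Classify 1-D points as 'core', 'boundary', or 'noise'."""
    def neighbors(i):
        # All points within eps of xs[i], including xs[i] itself.
        return [j for j, x in enumerate(xs) if abs(x - xs[i]) <= eps]

    is_core = [len(neighbors(i)) >= min_samples for i in range(len(xs))]
    kinds = []
    for i in range(len(xs)):
        if is_core[i]:
            kinds.append("core")
        elif any(is_core[j] for j in neighbors(i)):
            kinds.append("boundary")  # not dense itself, but next to a core point
        else:
            kinds.append("noise")
    return kinds

# Two dense groups, one point in between, and one far-away point (made-up data).
xs = [0, 4, 8, 10, 20, 30, 32, 36, 40, 100]
kinds = point_kinds(xs, eps=10, min_samples=4)
for x, kind in zip(xs, kinds):
    print(x, kind)
```

Here the point at 20 is a boundary point that lies within eps of a core point on each side (10 and 30), so which cluster it joins depends on which side is visited first. The core and noise labels do not depend on the visiting order.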
