Clustering:
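Clustering is an unsupervised learning technique. It groups data points by their similarity to each other, so that points in the same cluster are more alike than points in different clusters. It summarizes a large data set as a small number of groups and assigns each point a categorical cluster label, without needing labeled training data. It is used in retail marketing, banking, insurance, publishing, medicine, biology and many other fields.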
We need clustering for the following reasons:
(i) Exploratory data analysis
(ii) Summary generation
(iii) Outlier detection
(iv) Finding duplicates
(v) Pre-processing step
The algorithms generally used for clustering fall into three families: partition-based, hierarchical and density-based clustering.
1. Partition-based clustering: It divides the data into a chosen number of clusters and is relatively efficient. Some partition-based clustering algorithms are k-means, k-medians, and fuzzy c-means.
2. Hierarchical clustering: It produces a tree of clusters (a dendrogram). Agglomerative (bottom-up) and divisive (top-down) are the two main hierarchical approaches.
3. Density-based clustering: It produces arbitrarily shaped clusters. DBSCAN is a popular density-based clustering algorithm.
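To make the partition-based idea concrete, here is a minimal sketch of k-means in plain Python. It is only an illustration, not a production implementation: the function name kmeans, the sample points, and the choice to use the first k points as starting centroids are all assumptions made for this example (real libraries usually pick starting centroids more carefully).

```python
import math


def kmeans(points, k, iterations=100):
    """Basic k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    # Assumption for this sketch: the first k points are the starting
    # centroids, which keeps the run deterministic.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        # Stop once neither the labels nor the centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return labels, centroids


# Hypothetical sample data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

On this sample data the first three points and the last three points end up in different clusters. Note that the number of clusters k has to be chosen in advance, which is typical of partition-based methods.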