RSREU Internet portal: https://rsreu.ru

UDC 004.855.5

SELECTION OF THE NUMBER OF CLUSTERS IN THE K-MEANS ALGORITHM USING CLUSTER SOLUTION ENTROPY

V. I. Oreshkov, Ph.D. (Tech.), associate professor, CAD department, RSREU, Ryazan, Russia; orcid.org/0000-0003-0316-4927

The article discusses the problem of choosing the number of clusters in the popular k-means clustering algorithm. An unsuccessful choice of this hyperparameter can produce a cluster structure whose meaningful interpretation during data mining leads to false conclusions and to incorrect management decisions based on them. The aim of the work is to develop a method for automatically selecting the number of clusters for the k-means algorithm. The article provides an analytical review of known methods for determining the number of clusters and notes their advantages and disadvantages. The proposed approach is based on the elbow method but uses the entropy of cluster solutions instead of the mean squared clustering error. A practical example shows that cluster solution entropy makes it possible to choose the number of clusters even when the approach based on clustering error turns out to be untenable.
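For reference, the baseline that the proposed approach modifies can be sketched as follows: run k-means for several values of k, record the clustering error (the within-cluster sum of squared distances, WCSS) for each, and pick the k where the error curve bends. This is a minimal sketch under stated assumptions, not the author's method. The abstract does not define cluster solution entropy, so only the error-based curve is computed here. The plain Lloyd's k-means with random restarts, the "farthest point below the chord" elbow rule, the synthetic three-blob data, and all function names (`kmeans`, `best_wcss`, `elbow_k`, `make_blobs`) are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, rng: random.Random,
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Plain Lloyd's k-means; initial centroids are k distinct data points."""
    centroids: List[Point] = rng.sample(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


def wcss(points: List[Point], centroids: List[Point], labels: List[int]) -> float:
    """Within-cluster sum of squared distances (the clustering error)."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def best_wcss(points: List[Point], k: int, restarts: int = 20, seed: int = 0) -> float:
    """Lowest clustering error over several random restarts (hypothetical helper)."""
    rng = random.Random(seed)
    return min(wcss(points, *kmeans(points, k, rng)) for _ in range(restarts))


def elbow_k(ks: List[int], curve: List[float]) -> int:
    """Pick the k whose normalized point lies farthest below the chord
    joining the first and last points of the error curve (assumed elbow rule)."""
    x0, x1 = ks[0], ks[-1]
    y_first, y_last = curve[0], curve[-1]

    def gap(i: int) -> float:
        x = (ks[i] - x0) / (x1 - x0)                  # 0 at the first k, 1 at the last
        y = (curve[i] - y_last) / (y_first - y_last)  # 1 at the first k, 0 at the last
        return (1.0 - x - y) / math.sqrt(2.0)         # signed distance below x + y = 1

    return ks[max(range(len(ks)), key=gap)]


def make_blobs(centers: List[Point], n_per_center: int, spread: float,
               seed: int) -> List[Point]:
    """Synthetic test data: Gaussian blobs around the given centers."""
    rng = random.Random(seed)
    return [tuple(c + rng.gauss(0.0, spread) for c in center)
            for center in centers for _ in range(n_per_center)]


points = make_blobs([(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)], 30, 0.5, seed=42)
ks = list(range(1, 9))
curve = [best_wcss(points, k) for k in ks]
chosen_k = elbow_k(ks, curve)
for k, err in zip(ks, curve):
    print(f"k={k}: WCSS={err:.1f}")
print("elbow at k =", chosen_k)
```

Because the elbow rule normalizes the curve, dividing WCSS by the number of points (the mean squared error mentioned in the abstract) selects the same k. On well-separated data like these three blobs the curve should bend clearly at k = 3; the abstract's claim concerns data where this error-based choice is untenable, which is where the entropy of cluster solutions is meant to replace the error as the quantity plotted against k.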

Keywords: data mining, machine learning, unsupervised learning, training example, clustering, cluster, centroid, entropy.
