Classification of Observations through a Combination of Dimension Reduction and Cluster Analysis

Hyeuk Kim


Clustering, a form of unsupervised learning in machine learning, divides data into several groups so that observations in the same group have similar characteristics and observations in different groups have different characteristics. In this paper, we classify data by partitioning around medoids (PAM), which has some advantages over k-means clustering, and apply it to baseball players in the Korea Baseball League. We also apply principal component analysis (PCA) to the data and plot the observations using two principal components as the axes, which lets us interpret the clusters graphically. The combination of PAM and PCA can be applied to other data as well, and the approach makes it easy to identify the characteristics of each group.
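
The abstract describes a two-step pipeline: cluster the observations with partitioning around medoids, then project them onto two principal components so the clusters can be read off a scatter plot. The following is a minimal sketch of that pipeline in Python with NumPy, not the author's implementation. It assumes standardized variables, Euclidean dissimilarities, and k = 3; the helper names `standardize`, `pam`, and `pca_scores` and the synthetic data are hypothetical, since the paper's variables and settings are not given here. The PAM routine follows the usual BUILD and SWAP phases.

```python
import numpy as np


def standardize(X: np.ndarray) -> np.ndarray:
    """Center each variable and scale it to unit sample standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pam(dist: np.ndarray, k: int, max_iter: int = 100) -> tuple[np.ndarray, np.ndarray]:
    """Partitioning around medoids on an n x n dissimilarity matrix.

    Returns (indices of the k medoids, cluster label of every observation).
    """
    n = dist.shape[0]

    def total_cost(meds) -> float:
        # Sum of each observation's dissimilarity to its nearest medoid.
        return float(dist[:, meds].min(axis=1).sum())

    # BUILD: start from the most central observation, then greedily add the
    # observation that lowers the total cost the most.
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(medoids + [c])))

    # SWAP: apply the best medoid/non-medoid exchange while one lowers the cost.
    current = total_cost(medoids)
    for _ in range(max_iter):
        best_trial, best_cost = None, current
        for i in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = h
                cost = total_cost(trial)
                if cost < best_cost - 1e-12:
                    best_trial, best_cost = trial, cost
        if best_trial is None:  # no improving swap: local optimum reached
            break
        medoids, current = best_trial, best_cost

    medoids_arr = np.array(medoids)
    # Each observation joins the cluster of its nearest medoid.
    labels = np.argmin(dist[:, medoids_arr], axis=1)
    return medoids_arr, labels


def pca_scores(Z: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Scores of centered data Z on its leading principal components."""
    # Rows of vt are the principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[:n_components].T


if __name__ == "__main__":
    # Synthetic stand-in for player statistics (hypothetical, not the paper's
    # data): three groups of 20 "players" measured on 4 variables.
    rng = np.random.default_rng(0)
    centers = np.array([[0, 0, 0, 0], [5, 5, 0, 0], [0, 5, 5, 5]], dtype=float)
    X = np.vstack([c + rng.normal(size=(20, 4)) for c in centers])

    Z = standardize(X)
    # Euclidean dissimilarities between all pairs of standardized observations.
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    medoids, labels = pam(dist, k=3)
    scores = pca_scores(Z, n_components=2)

    # Each cluster's average position in the plane of the two components;
    # plotting `scores` colored by `labels` gives the kind of graph described above.
    for g in range(3):
        mean_pc = scores[labels == g].mean(axis=0).round(2)
        print(f"cluster {g}: medoid {medoids[g]}, mean PC scores {mean_pc}")
```

PAM is used here instead of k-means because its centers are actual observations and it needs only a dissimilarity matrix, which makes it less sensitive to outliers. The exhaustive SWAP search is practical only for small data sets. The choice k = 3 is arbitrary in this sketch; the average silhouette width or the gap statistic are common ways to choose it.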


© International Journals of Advanced Research in Computer Science and Software Engineering (IJARCSSE) | All Rights Reserved | Powered by Advance Academic Publisher.