Top 10 algorithms in data mining

Abstract

The IEEE International Conference on Data Mining (ICDM) is the world’s premier research conference in data mining. To promote and disseminate research to a wider audience, we undertook the effort to identify some of the most influential algorithms that have been widely used in the data mining community. We present the top 10 data mining algorithms identified by ICDM in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These algorithms are among the most influential data mining algorithms in the research community. For each algorithm, we describe the algorithm, discuss its impact, and review current and further research on it.

Description

✨ Summary

The paper “Top 10 algorithms in data mining”, published in December 2007 in the journal Knowledge and Information Systems, identifies and describes ten algorithms recognized as among the most influential in data mining as of 2006: C4.5, k-Means, Support Vector Machines (SVM), Apriori, Expectation-Maximization (EM), PageRank, AdaBoost, k-Nearest Neighbors (kNN), Naive Bayes, and CART. For each algorithm, the paper describes the method, discusses its impact, and reviews current and further research on it.

This publication helped establish a core set of algorithms that are now frequently taught in data science curricula and used in academia and industry for a range of data mining tasks. It has been cited by numerous subsequent research articles that apply these foundational algorithms in new domains or improve their performance and efficiency on high-dimensional data.

References in which this paper has been cited include:

- “Frequent Pattern Mining: Current Status and Future Directions” (Data Mining and Knowledge Discovery, 2014)
- “An Extensive Experimental Survey of Regression Methods” (Neural Networks, 2009)
- “A survey of outlier detection methodologies” (Data Mining and Knowledge Discovery, 2009)

The paper's influence is also visible in its use on educational platforms and its recurring mention in foundational data mining and machine learning texts.