FEATURE SELECTION AND k-NN PARAMETER OPTIMIZATION BASED ON A GENETIC ALGORITHM FOR MEDICAL DATASETS


Rizki Tri Prasetio

Abstract

Medical dataset classification is a major data mining problem that has been studied for a decade and has attracted researchers from many fields. Because expert knowledge for setting classifier parameters is hard to obtain, many classification algorithms are designed to learn from the data itself through a training process. This study proposes a methodology based on a data mining paradigm that integrates a heuristic search inspired by natural evolution, the genetic algorithm, with one of the simplest and most widely used learning algorithms, k-nearest neighbor (k-NN). The genetic algorithm performs feature selection and parameter optimization, while k-NN serves as the classifier. The proposed method was evaluated on five medical datasets from the UCI Machine Learning Repository. Experimental results show that the proposed method performs well compared with other classifiers, with a statistically significant improvement (t-test p-value = 0.0011).
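The abstract describes a wrapper approach: the genetic algorithm searches jointly over feature subsets and the k-NN parameter, and k-NN accuracy scores each candidate. The following is a minimal pure-Python sketch of that idea under stated assumptions, not the author's implementation. Assumed (not from the source): the optimized parameter is k; a chromosome is a bit string with one bit per feature plus three bits encoding an odd k (k = 2v + 1); fitness is leave-one-out accuracy with Euclidean distance; and the GA uses binary tournament selection, single-point crossover, bit-flip mutation and elitism. All function names and default settings are hypothetical.

```python
import random


def knn_predict(train_X, train_y, x, k):
    """Classify x by majority vote among its k nearest training points (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = {}
    for _, label in dists[:k]:
        votes[label] = votes.get(label, 0) + 1
    # On a vote tie, the smallest label wins (max returns the first maximum in sorted order).
    return max(sorted(votes), key=lambda lab: votes[lab])


def loo_accuracy(X, y, mask, k):
    """Leave-one-out accuracy of k-NN restricted to the features where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset is treated as the worst possible fitness
    Xs = [[row[j] for j in cols] for row in X]
    correct = 0
    for i in range(len(Xs)):
        train_X = Xs[:i] + Xs[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, Xs[i], k) == y[i]:
            correct += 1
    return correct / len(Xs)


def decode(chrom, n_features):
    """Hypothetical encoding: feature-mask bits first, then binary bits v giving k = 2v + 1."""
    mask = chrom[:n_features]
    v = int("".join(map(str, chrom[n_features:])), 2)
    return mask, 2 * v + 1


def genetic_search(X, y, pop_size=20, generations=15, k_bits=3, p_mut=0.05, seed=0):
    """Search feature subsets and k jointly; returns (mask, k, leave-one-out accuracy)."""
    rng = random.Random(seed)
    n = len(X[0])
    length = n + k_bits
    cache = {}

    def fitness(chrom):
        key = tuple(chrom)
        if key not in cache:
            mask, k = decode(chrom, n)
            cache[key] = loo_accuracy(X, y, mask, k)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        children = [ranked[0][:]]  # elitism: the best chromosome survives unchanged
        while len(children) < pop_size:
            p1 = max(rng.sample(ranked, 2), key=fitness)  # binary tournament selection
            p2 = max(rng.sample(ranked, 2), key=fitness)
            cut = rng.randrange(1, length)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    mask, k = decode(best, n)
    return mask, k, fitness(best)


if __name__ == "__main__":
    # Synthetic toy data (not a UCI dataset): feature 0 carries the class, features 1-2 are noise.
    data_rng = random.Random(42)
    X, y = [], []
    for i in range(30):
        label = i % 2
        X.append([label + data_rng.gauss(0, 0.3), data_rng.gauss(0, 1.0), data_rng.gauss(0, 1.0)])
        y.append(label)
    mask, k, acc = genetic_search(X, y)
    print("selected features:", mask, "k =", k, "LOO accuracy =", round(acc, 3))
```

Caching fitness by chromosome avoids re-running the leave-one-out loop for duplicate individuals, which matters because the wrapper evaluation dominates the cost. On real medical data, features would normally be normalized before computing distances, and the final subset and k should be assessed on data not used during the search.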

