Machine Learning–Based Prediction of Oil Palm Plantation Yield Using Random Forest Regression
DOI: https://doi.org/10.55227/ijhet.v4i5.572
Abstract
The rapid development of digital technology has sharply increased the volume and diversity of customer transaction data, making big data a crucial asset for organizations designing business strategies. Abundant data provides little value, however, unless it is analyzed appropriately. This study implements data science techniques in the Python programming language to extract insights from large-scale customer transaction data. The research adopts a descriptive-exploratory quantitative approach and uses customer transaction datasets as secondary data. The analysis comprises data preprocessing, exploratory data analysis (EDA), and the application of data science algorithms such as clustering and predictive analysis, using the Python libraries pandas, numpy, matplotlib, and scikit-learn. The results show that this approach can identify customer behavior patterns based on spending value, transaction frequency, and purchasing habits over a specific period. The clustering model groups customers into several segments with distinct characteristics, which can serve as a basis for more effective and personalized marketing decisions. The study therefore confirms that data science implemented in Python can help companies transform customer transaction big data into high-value information that supports improved business strategies and customer retention.
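The pipeline described in the abstract (preprocessing, per-customer aggregation, clustering) can be illustrated with a minimal sketch. It is not the authors' implementation. The synthetic data, the column names (customer_id, amount, date), the recency feature, and the choice of k-means with three segments are all assumptions made for illustration; the abstract does not state which clustering algorithm or how many segments the study used.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic transactions (hypothetical data, NOT the study's dataset).
rng = np.random.default_rng(0)
n = 600
transactions = pd.DataFrame({
    "customer_id": rng.integers(1, 61, size=n),
    "amount": rng.gamma(shape=2.0, scale=50.0, size=n).round(2),
    "date": pd.Timestamp("2024-01-01")
            + pd.to_timedelta(rng.integers(0, 180, size=n), unit="D"),
})

# Preprocessing: drop incomplete rows and non-positive amounts.
clean = transactions.dropna()
clean = clean[clean["amount"] > 0]

# Aggregate to one row per customer: spending value, transaction
# frequency, and date of the most recent purchase.
features = clean.groupby("customer_id").agg(
    total_spend=("amount", "sum"),
    frequency=("amount", "count"),
    last_purchase=("date", "max"),
)

# Recency in days, measured from the latest date in the data (assumed feature).
snapshot = clean["date"].max()
features["recency_days"] = (snapshot - features["last_purchase"]).dt.days

# Standardize so no single feature dominates the distance metric.
cols = ["total_spend", "frequency", "recency_days"]
X = StandardScaler().fit_transform(features[cols])

# k-means with 3 segments (assumed; the study does not state k).
model = KMeans(n_clusters=3, n_init=10, random_state=0)
features["segment"] = model.fit_predict(X)

# Mean profile of each segment, the kind of summary a marketing team might read.
segment_profile = features.groupby("segment")[cols].mean()
print(segment_profile)
```

In practice the number of segments would be chosen with a diagnostic such as the elbow method or silhouette score rather than fixed in advance.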
License
Copyright (c) 2026 Mayang Modelina Cynthia, Sigit Prabowo, Jheki Pranta Singarimbun, Muhammad Akbar Firdaus, Hafizh Al-Ghifari Rangkuti Rangkuti, Rido Favorit Saronitehe Waruwu, Muhammad Amin

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
