Machine Learning–Based Prediction of Oil Palm Plantation Yield Using Random Forest Regression

Authors

  • Mayang Modelina Cynthia, Teknologi Informasi, Universitas Panca Budi
  • Sigit Prabowo, Teknologi Informasi, Universitas Panca Budi
  • Jheki Pranta Singarimbun, Teknologi Informasi, Universitas Panca Budi
  • Muhammad Akbar Firdaus, Teknologi Informasi, Universitas Panca Budi
  • Hafizh Al-Ghifari Rangkuti Rangkuti, Teknologi Informasi, Universitas Panca Budi
  • Rido Favorit Saronitehe Waruwu, Teknologi Informasi, Universitas Panca Budi
  • Muhammad Amin, Teknologi Informasi, Universitas Panca Budi

DOI:

https://doi.org/10.55227/ijhet.v4i5.572

Abstract

The rapid development of digital technology has sharply increased the volume and diversity of customer transaction data, making big data a crucial asset for organizations designing business strategies. Abundant data provides little value, however, unless it is analyzed appropriately. This study implements data science techniques in Python to extract insights from large customer transaction datasets. The research adopts a descriptive-exploratory quantitative approach and uses customer transaction datasets as secondary data. The analysis proceeds through data preprocessing, exploratory data analysis (EDA), and the application of data science algorithms such as clustering and predictive analysis, using the Python libraries pandas, numpy, matplotlib, and scikit-learn. The results show that this approach can identify customer behavior patterns based on spending value, transaction frequency, and purchasing habits over a specific period. The clustering model also groups customers into several segments with distinct characteristics, providing a basis for more effective and personalized marketing decisions. The study concludes that data science with Python can help companies turn large volumes of customer transaction data into high-value information that supports improved business strategies and customer retention.
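
The stages named in the abstract (preprocessing, per-customer aggregation, clustering) can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the column names `customer_id` and `amount` are hypothetical, and both the choice of k-means and the number of segments (3) are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic transactions; the column names are hypothetical, not from the study.
rng = np.random.default_rng(0)
transactions = pd.DataFrame({
    "customer_id": rng.integers(1, 41, size=400),
    "amount": rng.gamma(shape=2.0, scale=50.0, size=400).round(2),
})

# Preprocessing: drop missing rows and non-positive amounts.
clean = transactions.dropna()
clean = clean[clean["amount"] > 0]

# Per-customer features: spending value and transaction frequency.
features = clean.groupby("customer_id").agg(
    total_spend=("amount", "sum"),
    n_transactions=("amount", "count"),
)

# Standardize so that neither feature dominates the distance metric.
scaled = StandardScaler().fit_transform(features)

# Assumed: k-means with 3 segments; the abstract does not name the algorithm.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
features["segment"] = model.fit_predict(scaled)

# Mean spending and frequency per segment, for a quick comparison.
segment_summary = features.groupby("segment")[["total_spend", "n_transactions"]].mean()
print(segment_summary)
```

On real data, the number of segments would normally be chosen with a diagnostic such as the silhouette score rather than fixed in advance.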

Published

2026-01-29

How to Cite

Mayang Modelina Cynthia, Sigit Prabowo, Jheki Pranta Singarimbun, Muhammad Akbar Firdaus, Hafizh Al-Ghifari Rangkuti Rangkuti, Rido Favorit Saronitehe Waruwu, & Muhammad Amin. (2026). Machine Learning–Based Prediction of Oil Palm Plantation Yield Using Random Forest Regression. International Journal of Health Engineering and Technology, 4(5). https://doi.org/10.55227/ijhet.v4i5.572