Enhancing Data Integration in Oracle Databases: Leveraging Machine Learning for Automated Data Cleansing, Transformation, and Enrichment

Authors

  • Padmaja Pulivarthy

Abstract

Integrating data from multiple sources into Oracle databases requires cleansing, transformation, and enrichment, each of which presents significant challenges. Traditional methods often rely on manual processes that are time-consuming, error-prone, and inefficient. This research explores the application of machine learning (ML) algorithms to automate and enhance these processes, improving the efficiency and accuracy of data integration. We develop and evaluate a comprehensive ML-based framework that uses supervised and unsupervised learning techniques to identify and correct inconsistencies, transform data into compatible formats, and enrich datasets with additional relevant information. Its key components are data preprocessing modules, anomaly detection algorithms, and intelligent transformation pipelines. We conducted extensive experiments on diverse datasets from different domains to assess the framework's performance. The results show significant improvements in data quality and integration speed over traditional methods: the automated processes reduced the time required for data preparation by up to 70% and increased the accuracy of integrated data by 25%. The research also highlights the adaptability of ML algorithms to various data types and formats, indicating their potential in real-world applications. Implementation details, including algorithm selection, model training, and system architecture, are discussed to give practitioners and researchers a clear roadmap for replicating or extending this work. The paper contributes to the field of data integration by presenting a novel approach that combines the strengths of ML algorithms with the robustness of Oracle databases. The findings underscore the transformative impact of ML in automating and optimizing data integration tasks, paving the way for more efficient and reliable data management in complex, multi-source environments.
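
The three stages named in the abstract (cleansing, transformation, enrichment) can be illustrated with a minimal Python sketch. This is not the paper's framework: a simple robust-statistics rule (the modified z-score) stands in for a learned anomaly detector, and the field names, date formats, and the REGION_BY_COUNTRY lookup table are all hypothetical. Loading the result into Oracle is omitted.

```python
from datetime import datetime
from statistics import median

# Hypothetical reference table used for enrichment (an assumption, not from the paper).
REGION_BY_COUNTRY = {"US": "Americas", "DE": "EMEA", "IN": "APAC"}

# Hypothetical set of date formats seen across source systems.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")


def normalize_date(raw):
    """Transformation: parse a date in any known format into ISO 8601, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def flag_outliers(values, threshold=3.5):
    """Cleansing: flag values whose modified z-score (median/MAD based) exceeds threshold.

    A statistical stand-in for an unsupervised anomaly detector.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median, so nothing can be flagged reliably.
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


def integrate(rows):
    """Run the cleanse, transform, and enrich steps over a list of source rows."""
    flags = flag_outliers([row["amount"] for row in rows])
    integrated = []
    for row, is_outlier in zip(rows, flags):
        country = row["country"].strip().upper()
        integrated.append({
            "order_date": normalize_date(row["order_date"]),
            "amount": row["amount"],
            "amount_suspect": is_outlier,
            # Enrichment: attach a region from the reference table.
            "region": REGION_BY_COUNTRY.get(country, "UNKNOWN"),
        })
    return integrated
```

In a sketch like this, suspect values are flagged rather than silently corrected, so a reviewer or a downstream model can decide how to handle them before the rows are loaded.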

Published

2023-06-07

Section

Articles

How to Cite

Enhancing Data Integration in Oracle Databases: Leveraging Machine Learning for Automated Data Cleansing, Transformation, and Enrichment. (2023). International Journal of Holistic Management Perspectives, 4(4), 1-18. https://injmr.com/index.php/IJHMP/article/view/81