The impact of data preprocessing on data mining outcomes
Systems Analysts, HRIS, Atrius Health, USA.
Review Article
World Journal of Advanced Research and Reviews, 2022, 15(03), 540-544
Publication history:
Received on 16 August 2022; revised on 18 September 2022; accepted on 20 September 2022
Abstract:
Data preprocessing is a vital first step in knowledge discovery because it largely determines whether a data mining project succeeds. The quality and representation of a dataset come first: redundant, irrelevant, noisy, or unreliable information can severely disrupt the knowledge discovery process. Preprocessing converts unstructured data into a form suitable for analysis and resolves inconsistencies, errors, and missing values, which preserves the integrity of data mining results. By correcting data quality problems and structuring the data properly, it improves the accuracy, efficiency, and interpretability of data mining models. Preprocessing is therefore the foundation of the data mining pipeline, offering a range of techniques for turning raw data into an effective analytical format. Reliable analysis depends on these operations: correcting errors, structuring data appropriately, and handling missing data points.
Keywords:
Data Mining; Big Data; Data Preprocessing; Data Analytics; Model-based Imputation
Copyright information:
Copyright © 2022 Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.