Enhancing predictive model performance for data-driven software systems

Sai Krishna Reddy Mudhiganti *

Independent researcher.
 
Review Article
World Journal of Advanced Research and Reviews, 2019, 04(02), 196-206
Article DOI: 10.30574/wjarr.2019.4.2.0149
Publication history: 
Received on 02 December 2019; revised on 24 December 2019; accepted on 30 December 2019
 
Abstract: 
Predictive models enable data-driven software systems to make useful forecasts from imperfect inputs in areas such as healthcare, finance and e-commerce. However, missing or corrupted data, high-dimensional feature sets and shifts in the underlying data patterns introduce uncertainty into their predictions. To address these difficulties, this study proposes a comprehensive pipeline with five major parts:
· Cleaning and standardizing the data with k-nearest-neighbours imputation, interquartile-range outlier detection and z-score normalization;
· Enriching the inputs with temporal and interaction features;
· Reducing and pruning features with mutual information and recursive feature elimination;
· Training several machine learners (decision trees, support-vector machines, gradient-boosted trees and shallow neural networks) with hyperparameters optimized by Bayesian search; and
· Combining the trained learners through stacking.
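As an illustration of the first stage listed above, the following is a minimal sketch of interquartile-range outlier filtering followed by z-score normalization on a single numeric feature. It is not taken from the article: the function names, the fence multiplier `k=1.5` and the sample values are assumptions chosen for the example, and k-nearest-neighbours imputation is left out for brevity.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values that lie inside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


def z_scores(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical feature column with one obviously corrupted reading (95).
raw = [10, 12, 11, 13, 12, 11, 95]
cleaned = drop_outliers(raw)
normalized = z_scores(cleaned)
```

Filtering before normalizing keeps the extreme reading from inflating the mean and standard deviation that the z-scores depend on.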
A regret-minimizing algorithm for online updates lets the ensemble adapt quickly to new data while remaining accurate, with a drop of just 2% per hour. According to empirical assessments, the stacked model achieves an accuracy of 92.4%, an F1-score of 0.91 and an AUC of 0.96, consistently surpassing each individual learner in both batch and streaming settings. Because each stage is a separate, reusable module, the framework integrates with orchestration tools and microservices and scales across different platforms. Future work includes applying the method to other types of data and using meta-learning to automate pipeline construction and hyperparameter tuning.
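The abstract does not name the regret-minimizing algorithm used for online updates. As one possible reading, the sketch below uses the Hedge (multiplicative weights) rule, a standard regret-minimizing scheme, to reweight base learners as labelled examples arrive. The learning rate `eta`, the absolute-error loss and the toy stream are assumptions made for illustration only.

```python
import math


def hedge_update(weights, losses, eta=0.5):
    """One Hedge step: scale each weight by exp(-eta * loss), then renormalize."""
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def combine(weights, predictions):
    """Weighted average of the base learners' predicted probabilities."""
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical stream: each item holds three learners' predicted
# probabilities for the positive class and the true label.
stream = [
    ([0.9, 0.4, 0.2], 1),
    ([0.1, 0.6, 0.7], 0),
    ([0.8, 0.5, 0.3], 1),
]

weights = [1 / 3, 1 / 3, 1 / 3]  # start with equal trust in every learner
for predictions, label in stream:
    ensemble_prediction = combine(weights, predictions)
    # Absolute error keeps each loss in [0, 1], as Hedge assumes.
    losses = [abs(p - label) for p in predictions]
    weights = hedge_update(weights, losses)
```

After the three updates the first learner, which made the smallest errors on this stream, holds the largest weight, so the combined prediction leans toward whichever learner has recently been most reliable.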
 
Keywords: 
Predictive Modelling; Feature Selection; Ensemble Learning; Bayesian Optimization; Online Learning; Data-Driven Software Systems
 