World Journal of Advanced Research and Reviews


eISSN: 2582-8185 || CODEN: WJARAI || Impact Factor 8.2


GenAI Data Engineering: Synthetic Data and Feature Engineering framework for Cloud Analytics


Sandeep Kamadi *

Independent Researcher, Wilmington University, Delaware, USA.
 
Research Article
World Journal of Advanced Research and Reviews, 2024, 24(01), 2867-2877
Article DOI: 10.30574/wjarr.2024.24.1.3165
DOI url: https://doi.org/10.30574/wjarr.2024.24.1.3165
 
Received on 08 September 2024; revised on 23 October 2024; accepted on 28 October 2024
 
Integrating generative artificial intelligence into data engineering pipelines addresses three persistent challenges: data scarcity, privacy preservation, and the automation of feature engineering. Traditional data engineering approaches struggle with rare event representation, imbalanced datasets, privacy-constrained environments, and labor-intensive feature creation, all of which limit machine learning model effectiveness and organizational agility. This research presents a cloud-native data engineering framework that uses Variational Autoencoders, Generative Adversarial Networks, and diffusion models for synthetic data generation, combined with transformer-based architectures for automated feature engineering and embedding creation. The proposed architecture integrates synthetic data generation throughout the data lifecycle (ingestion, storage, feature engineering, model training, and inference) while maintaining governance through data quality validation, model drift detection, and regulatory compliance monitoring. Experimental validation across multiple use cases demonstrates that synthetic data augmentation improves model performance by 23.7% for rare event detection, reduces feature engineering effort by 64%, achieves 97.3% statistical fidelity to production data distributions while preserving privacy guarantees, and accelerates model development cycles by 58% through automated feature generation. The framework unifies generative AI capabilities with traditional extract-transform-load pipelines, feature stores, and governance frameworks in a single architecture, validated through a production deployment processing petabyte-scale datasets.
This work contributes both theoretical foundations for generative AI integration in data engineering and practical implementation patterns for organizations seeking to modernize analytics infrastructure while addressing data privacy, quality, and scalability requirements.
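
The abstract reports a statistical fidelity figure for synthetic data but does not define the metric on this page. As an illustration only, the sketch below assumes one possible definition: fidelity for a single numeric column is taken as one minus the two-sample Kolmogorov-Smirnov statistic between real and synthetic values. The generator here is a plain Gaussian fit standing in for the VAE, GAN, or diffusion models named in the abstract; the column values and all function names are hypothetical and are not taken from the paper.

```python
import bisect
import random
import statistics


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The maximum gap between two step-function CDFs occurs at a sample point.
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def fit_gaussian_generator(real, rng):
    """Stand-in generator (assumption): sample from a normal fitted to the real column."""
    mu = statistics.fmean(real)
    sigma = statistics.stdev(real)
    return lambda n: [rng.gauss(mu, sigma) for _ in range(n)]


rng = random.Random(0)
# Hypothetical "production" column; real data would come from the pipeline.
real = [rng.gauss(50.0, 10.0) for _ in range(2000)]
generate = fit_gaussian_generator(real, rng)
synthetic = generate(2000)

# Assumed fidelity definition: 1.0 means identical empirical distributions.
fidelity = 1.0 - ks_statistic(real, synthetic)
print(f"fidelity score: {fidelity:.3f}")
```

A per-column score like this says nothing about correlations between columns or about privacy leakage, so a real validation step would need additional checks alongside it.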
 
Generative AI; Synthetic Data Generation; Feature Engineering; Data Governance; Cloud Analytics; Machine Learning Operations; Privacy-Preserving Analytics
 
https://wjarr.com/sites/default/files/fulltext_pdf/WJARR-2024-3165.pdf


Sandeep Kamadi. GenAI Data Engineering: Synthetic Data and Feature Engineering framework for Cloud Analytics. World Journal of Advanced Research and Reviews, 2024, 24(1), 2867-2877. Article DOI: https://doi.org/10.30574/wjarr.2024.24.1.3165

Copyright © Author(s). All rights reserved. This article is published under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as appropriate credit is given to the original author(s) and source, a link to the license is provided, and any changes made are indicated.


All statements, opinions, and data contained in this publication are solely those of the individual author(s) and contributor(s). The journal, editors, reviewers, and publisher disclaim any responsibility or liability for the content, including accuracy, completeness, or any consequences arising from its use.
