World Journal of Advanced Research and Reviews
International Journal with High Impact Factor for fast publication of Research and Review articles


eISSN: 2581-9615 || CODEN: WJARAI || Impact Factor 8.2


Optimizing generative AI models for edge deployment: Techniques and best practices


Sai Kalyan Reddy Pentaparthi *

ST Engineering iDirect, Inc., USA.

Review Article

World Journal of Advanced Research and Reviews, 2025, 26(01), 1485-1492

Article DOI: 10.30574/wjarr.2025.26.1.1161

DOI url: https://doi.org/10.30574/wjarr.2025.26.1.1161

Received on 28 February 2025; revised on 07 April 2025; accepted on 10 April 2025

Generative AI models represent a significant advance in content creation, but deploying them at the network edge is difficult because edge devices are inherently resource-constrained. This article examines optimization strategies that enable generative AI on edge devices without cloud connectivity. Rapid growth in model size has widened the gap between the computational requirements of these models and the limited resources available in edge environments. Through systematic model compression, architectural redesign, and hardware-software co-optimization, generative models can become substantially more efficient while maintaining acceptable output quality. The compression techniques examined include pruning, which eliminates redundant parameters; quantization, which reduces numerical precision; and knowledge distillation, which transfers capabilities from larger models to compact ones. Architectural innovations such as modified attention mechanisms, conditional computation, and neural architecture search improve efficiency further by rethinking model design for resource-constrained environments. Combining these techniques with hardware-specific optimizations and specialized software frameworks enables practical deployment across diverse application domains. Real-world implementations in speech processing, computer vision, and industrial IoT demonstrate that properly optimized generative models can operate within edge constraints while delivering near-real-time performance and high-quality outputs. These advances allow industries to use generative AI where privacy concerns, connectivity limitations, or latency requirements make cloud-based processing impractical.
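
As a rough illustration of two of the compression techniques named above, the following minimal sketch applies magnitude-based pruning and then symmetric 8-bit quantization to a small list of weights. It is not taken from the article: the function names, the example weights, and the choice of per-tensor symmetric quantization are all assumptions made for illustration, and production frameworks implement these steps quite differently.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the smallest-magnitude fraction `sparsity` of the weights.

    Hypothetical helper: a simple unstructured pruning rule, assumed here
    purely for illustration.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_drop = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:n_drop])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to the signed range [-127, 127].

    Returns the integer codes and the scale needed to map them back.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


# Example weights (made up for this sketch).
weights = [0.02, -1.27, 0.40, -0.01, 0.85, 0.03]
pruned = magnitude_prune(weights, sparsity=0.5)
codes, scale = quantize_int8(pruned)
restored = dequantize(codes, scale)
```

In this sketch each surviving weight is stored as one signed byte plus a shared scale, and the reconstruction error per weight is bounded by half the scale. Knowledge distillation is not shown, since it requires a training loop rather than a post-hoc transform.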

Keywords: Generative AI; Edge Computing; Model Compression; Quantization; Neural Architecture Search; Hardware Acceleration

https://wjarr.com/sites/default/files/fulltext_pdf/WJARR-2025-1161.pdf


Sai Kalyan Reddy Pentaparthi. Optimizing generative AI models for edge deployment: Techniques and best practices. World Journal of Advanced Research and Reviews, 2025, 26(1), 1485-1492. Article DOI: https://doi.org/10.30574/wjarr.2025.26.1.1161

Copyright © Author(s). All rights reserved. This article is published under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as appropriate credit is given to the original author(s) and source, a link to the license is provided, and any changes made are indicated.


All statements, opinions, and data contained in this publication are solely those of the individual author(s) and contributor(s). The journal, editors, reviewers, and publisher disclaim any responsibility or liability for the content, including accuracy, completeness, or any consequences arising from its use.

