Vision-guided automation: A generic approach to web form filling using GPT and computer vision

Leela Gowtham Yanamaddi 1, * and Balaji Kummari 2

1 CEO and VP of Engineering, scale.jobs, 537 Payne Rd, Woodstock, GA, USA 30188.
2 CTO, scale.jobs, 1-84, Beside Venugopala Swamy Temple, Rayanapadu, Vijayawada, AP 521241, India.
 
Research Article
World Journal of Advanced Research and Reviews, 2023, 20(03), 2096-2107
Article DOI: 10.30574/wjarr.2023.20.3.2524
 
Publication history: 
Received on 03 November 2023; revised on 16 December 2023; accepted on 18 December 2023
 
Abstract: 
This research presents a novel approach to automating web-based form filling by combining computer vision techniques with GPT (Generative Pre-trained Transformer) models. The proposed system visually detects and labels the interactive elements on a web page, which yields a generic approach that adapts to different forms without prior knowledge of their structure and overcomes the limitations of hardcoded DOM element interactions. The main contributions are the use of computer vision to recognise and label form elements, and the integration of GPT models to interpret form fields semantically and produce context-appropriate responses (for instance, drawn from resume data). In addition, a flexible action system carries out AI-guided decisions through human-like interactions such as typing, clicking, and scrolling. A case study on automated job application form filling demonstrates the system's efficacy and highlights its potential for a wide range of web automation tasks.
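
The loop summarised in the abstract (label elements visually, let a model pick an action, execute it) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the article: the names `LabeledElement`, `Action`, `choose_action`, and `fill_form` are hypothetical, the labeled elements are supplied by hand where a vision step would produce them, and `choose_action` is a rule-based stand-in for the GPT call.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class LabeledElement:
    """Hypothetical output of the vision step: one tagged interactive element."""
    label: int     # numeric tag overlaid on the screenshot
    kind: str      # e.g. "text_input" or "button"
    caption: str   # visible text associated with the element


@dataclass(frozen=True)
class Action:
    """Hypothetical human-like action: "type", "click", or "scroll"."""
    kind: str
    target: Optional[int] = None  # label of the element to act on, if any
    text: str = ""                # text to enter for "type" actions


def choose_action(elements: List[LabeledElement],
                  resume: Dict[str, str],
                  filled: Set[int]) -> Action:
    """Rule-based stand-in for the GPT decision (an assumption, not the
    authors' prompt or model): fill the next known field, else submit."""
    for el in elements:
        if el.kind == "text_input" and el.label not in filled:
            key = el.caption.strip().lower()
            if key in resume:
                return Action("type", el.label, resume[key])
    for el in elements:
        if el.kind == "button" and el.caption.strip().lower() == "submit":
            return Action("click", el.label)
    # Nothing actionable is visible, so ask for more of the page.
    return Action("scroll")


def fill_form(elements: List[LabeledElement],
              resume: Dict[str, str],
              max_steps: int = 20) -> List[Action]:
    """Repeat decide-then-act until a non-typing action ends the sketch."""
    filled: Set[int] = set()
    log: List[Action] = []
    for _ in range(max_steps):
        action = choose_action(elements, resume, filled)
        log.append(action)
        if action.kind == "type":
            filled.add(action.target)
        else:
            break  # a click on submit, or a scroll, ends this simplified loop
    return log
```

In a real system the returned actions would be passed to a browser driver and the page would be re-captured after each step; here they are only recorded, so the decision logic can be read in isolation.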
 
Keywords: 
Vision-Guided Automation; Web Form Filling; GPT; Computer Vision
 