A Vision-Based Deep Learning Framework for Real-Time Sign Language Recognition and Translation
Authors:
Manav Sharma (B. M. Institute of Engineering and Technology)
Rahul (B. M. Institute of Engineering and Technology)
Arihant Jain (B. M. Institute of Engineering and Technology)
Gurminder Kaur (B. M. Institute of Engineering and Technology)
Rubi (B. M. Institute of Engineering and Technology)
Poonam Shah (B. M. Institute of Engineering and Technology)
Abstract

Communication barriers between the deaf and hearing communities often limit inclusivity in education, healthcare, and daily interactions. This research presents VOCAL CODE, a vision-based deep learning framework designed to translate American Sign Language (ASL) gestures into text and speech in real time. Utilizing a standard webcam for gesture capture and a Convolutional Neural Network (CNN) for classification, the system eliminates the need for costly sensors or wearable devices. An integrated auto-correction module enhances text accuracy, while a text-to-speech engine provides auditory feedback for seamless interaction. Experimental evaluation achieved an accuracy of approximately 98%, demonstrating the model's robustness and adaptability across varied conditions. The proposed solution provides a cost-effective, scalable, and user-friendly approach to assistive communication, promoting accessibility and inclusivity for individuals with hearing impairments. Future work aims to extend the system to recognize full-word gestures, incorporate multilingual capabilities, and optimize real-time performance on mobile and web platforms.
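As a rough illustration of the pipeline the abstract describes (webcam capture, CNN classification, and spoken output), the minimal sketch below wires together OpenCV, a Keras model, and pyttsx3. The model file `asl_cnn.h5`, the 64x64 input size, the A-Z label set, and the key bindings are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
# Minimal sketch of a webcam -> CNN -> text-to-speech loop.
# Model path, input shape, and label set below are assumptions.
import cv2                                   # webcam capture and display
import numpy as np
import pyttsx3                               # offline text-to-speech engine
from tensorflow.keras.models import load_model

model = load_model("asl_cnn.h5")             # hypothetical trained ASL CNN
labels = [chr(c) for c in range(65, 91)]     # assumed A-Z gesture classes
tts = pyttsx3.init()

cap = cv2.VideoCapture(0)                    # default webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Resize the frame to the CNN's assumed input shape and normalize.
    roi = cv2.resize(frame, (64, 64))
    x = roi.astype("float32")[np.newaxis] / 255.0
    probs = model.predict(x, verbose=0)[0]
    pred = labels[int(np.argmax(probs))]
    # Overlay the predicted letter on the live feed.
    cv2.putText(frame, pred, (10, 40),
                cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)
    cv2.imshow("VOCAL CODE (sketch)", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("s"):                      # 's' speaks the current letter
        tts.say(pred)
        tts.runAndWait()
    elif key == ord("q"):                    # 'q' quits the loop
        break
cap.release()
cv2.destroyAllWindows()
```

A full system along the lines the paper describes would also segment the hand region before classification and buffer recognized letters through an auto-correction step before speaking whole words.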

Published in: NCAIDT 2025 Proceedings
DOI: 10.63169/NCAIDT2025.p16
Paper ID: NCAIDT2025-0427