AI-Powered Inclusive Communication Challenge
How might we develop an AI-powered accessibility solution for language translation and communication for sign language and persons with communication difficulties?
Individual participation
Challenge Statement
Communication barriers between the Deaf community and hearing individuals remain a significant issue despite advances in technology. While some machine translation systems for spoken languages have achieved remarkable success, similar progress for sign languages has been limited. Existing sign language translation tools often rely on static, rule-based systems or limited datasets, leading to inaccuracies and a lack of contextual understanding.
Moreover, the current tools lack integration with broader accessibility features such as real-time captioning, text-to-speech, and voice modulation, leaving many gaps in accessibility for the broader differently-abled population. These inadequacies result in unreliable translations, minimal adaptability to the complexities of sign language, and limited practical applications in real-world scenarios.
MADA Center is therefore looking for solution providers that can help to develop an AI-powered accessibility solution for language translation and communication for sign language and persons with communication difficulties.
The ideal solution should therefore aim to deliver a cutting-edge Machine Translation system for Sign Language and Persons with Communication Difficulties, powered by Large Language Models (LLMs) and complemented by additional AI-driven tools to support communication for individuals with disabilities. The innovation lies not only in the translation capabilities but also in the integration of complementary features such as text-to-speech, speech-to-text, real-time captioning, and voice modulation to enhance accessibility for a broader audience.
For reference, we have shown in a scientific publication that it is possible to build such a model; the paper is published here: https://link.springer.com/article/10.1007/s44282-024-00113-0
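Purely as an illustration of the general pattern (and not the specific method from the publication above), one common approach is to extract a gloss sequence from the sign-language video and prompt an off-the-shelf LLM to render it as fluent text. In the sketch below, the extract_glosses helper, the prompt wording, and the model choice are hypothetical placeholders.

```python
# Illustrative sketch only: gloss-to-text translation via an off-the-shelf LLM.
# extract_glosses(), the prompt wording, and the model name are placeholders,
# not components of any existing MADA system or of the referenced publication.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def extract_glosses(video_path: str) -> list[str]:
    """Placeholder for a sign-recognition step that returns gloss tokens."""
    raise NotImplementedError("Provided by the sign-recognition component")


def glosses_to_text(glosses: list[str], target_language: str = "English") -> str:
    """Ask the LLM to render a gloss sequence as fluent text."""
    prompt = (
        f"Translate the following sequence of sign-language glosses into "
        f"natural {target_language}, preserving meaning and register:\n"
        + " ".join(glosses)
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```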
Central to the project is the creation of an advanced expert user interface (UI) designed to evaluate and refine LLM translation outcomes. This interface will enable experts to assess translation quality, adjust prompts, and make corrections, thereby improving accuracy, contextual relevance, and usability. Moreover, the project will provide an API to ensure compatibility with various applications, fostering seamless integration across platforms.
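As a rough sketch of the kind of API integration described above, a client application might submit a video for translation and return an expert correction as shown below. The base URL, endpoint paths, and field names are assumptions made for illustration, not a published MADA interface.

```python
# Hypothetical REST client: endpoint paths, field names, and the base URL are
# illustrative assumptions, not a published MADA API.
import requests

BASE_URL = "https://api.example.org/v1"  # placeholder host


def translate_sign_video(video_path: str, target_language: str = "en") -> dict:
    """Upload a sign-language video and return the machine translation."""
    with open(video_path, "rb") as video:
        response = requests.post(
            f"{BASE_URL}/translations",
            files={"video": video},
            data={"target_language": target_language},
            timeout=30,
        )
    response.raise_for_status()
    return response.json()  # e.g. {"id": "...", "text": "...", "confidence": 0.82}


def submit_expert_correction(translation_id: str, corrected_text: str) -> None:
    """Send an expert's corrected translation back so the system can learn from it."""
    response = requests.post(
        f"{BASE_URL}/translations/{translation_id}/corrections",
        json={"corrected_text": corrected_text},
        timeout=10,
    )
    response.raise_for_status()
```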
Several solutions have been attempted in the realm of sign language translation, but they often fall short for the following reasons:
- Rule-Based Systems: Traditional systems based on predefined grammatical rules and vocabulary lack the flexibility to adapt to the nuances of sign language, such as regional variations or contextual meanings. These systems often fail to capture the full depth and complexity of sign language communication.
- Data-Driven Approaches: Efforts to use machine learning models trained on small, static datasets have proven inadequate. Limited annotated datasets restrict the models’ ability to generalize, resulting in poor accuracy and limited scalability.
- Wearable Device-Based Solutions: Some approaches use gloves or other wearable devices to capture signs, but these solutions are intrusive, often uncomfortable for users, and fail to accommodate the natural expressiveness of sign language.
- Lack of Expert Involvement: Existing systems often lack a mechanism for domain experts to actively refine outputs, resulting in stagnant performance and limited contextual relevance.
In addition, the following types of solutions are out of scope for this challenge:
- Hardware-Dependent Solutions: Solutions requiring specialized hardware, such as wearable devices or camera setups with complex requirements, are not aligned with the project's goal of creating a scalable, software-driven approach.
- Non-Adaptive Models: Approaches that do not incorporate mechanisms for iterative improvement through expert feedback, such as rigid rule-based systems, are excluded as they fail to adapt to the complexities and nuances of sign language.
- Generalized Accessibility Tools: While broad accessibility solutions are valuable, this project specifically focuses on enhancing sign language translation. Generic tools without specific sign language components or integration will not be pursued.
The proposed solution should meet the following requirements:
- Hardware Independence: The solution must be software-driven and not rely on specialized hardware, making it accessible on standard computing devices (laptops, tablets, mobile phones).
- Compatibility: The system should support seamless integration via an API with other applications and platforms, ensuring broad adaptability.
- Scalability: The solution should handle real-time translation and interactions for datasets of varying sizes and complexities.
- Accessibility: The interface must be designed to be accessible to experts with different abilities, following WCAG (Web Content Accessibility Guidelines).
- The system should be able to connect with our API to enable deployment across multiple sites and platforms.
- We’d like the system to make use of existing LLMs to translate signs into language, with an expert review (human-in-the-loop) step in between to accurately label the translations and enable the system to learn and improve.
- Expert Feedback Interface: A user-friendly interface allowing experts to evaluate translation quality, adjust prompts, and make corrections efficiently. Expert edits should feed back into system learning to improve future outputs.
- Real-Time Processing: The system must process video inputs and provide translations within a latency of less than 3 seconds for real-time usability.
- Language and Regional Adaptability: Support for multiple sign languages with customizable options to incorporate regional and cultural nuances.
- Accessibility Standards: The product must meet WCAG 2.1 AA requirements to ensure accessibility for differently-abled users.
- Data Security and Privacy: Compliance with GDPR and other relevant data protection laws to ensure the privacy of user inputs and outputs.
- Interoperability Standards: Adherence to API design standards such as RESTful architecture to enable integration with third-party platforms.
- Ethical AI Guidelines: The solution should follow ethical AI principles, ensuring inclusivity, fairness, and transparency in machine learning outputs.
- Multi-Modal AI Integration: Incorporate additional accessibility tools such as speech-to-text and real-time captioning to enhance versatility.
- A module needs to be built that provides dashboard insights, including but not limited to the following (a minimal illustrative sketch appears further below):
- Number of translations done
- Initial accuracy
- Accuracy post intervention by an expert
- Classifiers
- Dictionary
- Prompt and database management
- Sign Language Machine Translation: High accuracy in translating sign language videos into text or other sign language representations, with a minimum target accuracy rate of 50% for MVP and 70% for the final product.
- The ROI of the system will be evaluated on a case-by-case basis depending on the output achieved during the MVP phase.
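As a minimal, hypothetical sketch of how the dashboard counters listed above (translations done, initial accuracy, accuracy after expert intervention) could sit on top of the human-in-the-loop review step, consider the data model below; all class and field names are illustrative assumptions rather than a prescribed design.

```python
# Hypothetical data model for the expert-review loop and dashboard counters.
# Class and field names are illustrative assumptions, not a prescribed design.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TranslationRecord:
    video_id: str
    llm_output: str                       # draft translation produced by the LLM
    expert_output: Optional[str] = None   # correction entered via the expert UI
    accepted_as_is: bool = False          # expert judged the draft correct unchanged


@dataclass
class Dashboard:
    records: List[TranslationRecord] = field(default_factory=list)

    def log(self, record: TranslationRecord) -> None:
        self.records.append(record)

    @property
    def translations_done(self) -> int:
        return len(self.records)

    @property
    def initial_accuracy(self) -> float:
        """Share of LLM drafts the expert accepted without changes."""
        if not self.records:
            return 0.0
        return sum(r.accepted_as_is for r in self.records) / len(self.records)

    @property
    def post_review_accuracy(self) -> float:
        """Share of records that are correct after expert intervention."""
        if not self.records:
            return 0.0
        reviewed = [r for r in self.records if r.accepted_as_is or r.expert_output]
        return len(reviewed) / len(self.records)
```

Corrections captured this way could also feed the classifier, dictionary, and prompt management functions listed above, though the exact mechanism is left to the solution provider.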
Cost targets will be determined on a case-by-case basis. Ideally, the solution or final product should not cost more than S$100,000. The ultimate business model could be based on API calls or utilization and/or license fees.
Phase 1: POC development: Q3-Q4 2025
Phase 2: Commercial roll-out: to be determined on a case-by-case basis, target implementation by Q1 2026
If the solution is successful, MADA Center is willing to support further deployment of up to 10k API calls per month. We expect the solution to scale to multiple licenses and API accesses within three years, targeting organizations serving the Deaf community, including educational institutions, healthcare providers, and government services. It also holds potential for adoption in industries such as media, corporate training, and customer service, where accessible communication is crucial.
In-kind contributions:
- Mentorship and support for solution development,
- Access to relevant datasets,
- Support for development in a cloud environment (MS Azure),
- Access to our API for crawling government websites so that content is ready for translation,
- Access to the Avatar API for final testing,
- Expert advice (solution providers will have access to sign language experts who can further advise on the solution).
We are looking for SMEs and startups with solutions that can be implemented in a relatively short time frame (TRL 5 or higher).
For Background IP (BIP), both parties will retain their respective IP brought into the project. In the event of new Foreground IP (FIP) creation, ownership will be discussed on a case-by-case basis.