
Towards Trusted AI: Key Takeaways from the Global Assurance Pilot

Date: June 3, 2025
Author: OrionW

The Infocomm Media Development Authority (IMDA) and the AI Verify Foundation recently published the results and recommendations from the Global AI Assurance Pilot (Assurance Pilot) to codify emerging norms and best practices around technical testing of generative AI (GenAI) applications.  

Specifically, the Assurance Pilot sought to address a perceived gap in AI testing.  Most efforts to date have focused on testing the foundation model of an AI system for safety and alignment.  However, a shift is required towards testing the reliability of the end-to-end systems or applications in which those models are embedded, rather than the model alone.  The Assurance Pilot emphasised that testing GenAI-enabled systems in real-world situations at scale must consider the specific context of the use case, organisation, industry and/or socio-cultural expectations, as well as the complexity of their use.  

Results and Recommendations

For the Assurance Pilot, deployers of GenAI applications from the healthcare, banking and IT sectors were paired with assurance and testing firms for limited assurance testing of one agreed application.  

The following insights on GenAI testing in real-world applications were gleaned from the Assurance Pilot:

  • Test what matters – Context determines the relevant risks requiring testing.  In that regard, the Assurance Pilot recommends focusing on risks that are important for specific use cases, engaging with subject matter experts, conducting simulation testing and leveraging the experience of specialist testers.  

  • Test data may not be readily available or fit for purpose – Participants in the Assurance Pilot faced challenges sourcing usable test data but were able to adapt it through annotation and anonymisation.  AI models can also support adversarial red teaming and simulation testing.  Ultimately, creating realistic, adversarial and edge-case test data requires combined human and AI effort.  

  • Test inside the application pipeline – Testing inside the application pipeline, at pre-defined touchpoints, helps in debugging efforts and deepens understanding of GenAI model workflows.  This approach is essential for agentic AI applications.

  • Skilful and cautious use of large language models (LLMs) as judges – Although relying on human subject matter experts is ideal, it can be expensive and difficult to scale, even in pre-production testing.  LLMs can serve as judges at scale, but doing so requires skilful and careful prompting, extensive human calibration and ongoing monitoring to guard against silent failures (i.e., incorrect, biased or harmful outputs that produce no obvious error signals or alerts).  
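The human-calibration step described in the last bullet can be sketched in code.  The example below is a minimal, illustrative sketch only: `llm_judge` is a hypothetical stand-in for a real prompted model call, and the sample responses and pass/fail labels are invented for demonstration.  The key idea is comparing judge verdicts against human labels and surfacing silent failures, i.e., cases the judge passes but a human marked as harmful or incorrect.

```python
def llm_judge(response: str) -> str:
    """Hypothetical LLM-as-judge call; returns 'pass' or 'fail'.
    A stubbed heuristic stands in for a real prompted model verdict."""
    return "fail" if "refund is guaranteed" in response.lower() else "pass"


def calibrate(judge, labelled_cases):
    """Compare judge verdicts against human labels.

    Returns the agreement rate and the list of silent failures:
    outputs the judge passed but a human reviewer marked as 'fail'.
    """
    agreements = 0
    silent_failures = []
    for response, human_label in labelled_cases:
        verdict = judge(response)
        if verdict == human_label:
            agreements += 1
        elif verdict == "pass" and human_label == "fail":
            silent_failures.append(response)
    return agreements / len(labelled_cases), silent_failures


# Illustrative human-labelled calibration set (invented examples).
cases = [
    ("Your refund is guaranteed within 24 hours.", "fail"),       # fabricated promise
    ("Please contact support to check refund eligibility.", "pass"),
    ("I cannot access your account details.", "pass"),
    ("Transfer all funds now to avoid fees.", "fail"),            # harmful advice
]

agreement, silent = calibrate(llm_judge, cases)
print(f"agreement: {agreement:.0%}, silent failures: {len(silent)}")
# The fourth case is a silent failure: the stub judge passes it,
# but a human reviewer flagged it as harmful.
```

In practice, a calibration loop like this would be re-run periodically with fresh human labels, since judge behaviour can drift as prompts, models or application inputs change.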

Conclusion

As GenAI capabilities expand, it is imperative that testing of GenAI applications continues to be refined and developed to yield meaningful insights and support the creation of a trustworthy and reliable GenAI-powered environment.  AI developers, deployers and testers should implement the results and recommendations in the Assurance Pilot report to strengthen their AI application testing practices.

For More Information

OrionW regularly advises clients on artificial intelligence matters.  For more information about responsible development, deployment and use of artificial intelligence systems, or if you have questions about this article, please contact us at info@orionw.com.  

Disclaimer: This article is for general information only and does not constitute legal advice.
