
The Infocomm Media Development Authority (IMDA) and the AI Verify Foundation recently published the results and recommendations from the Global AI Assurance Pilot (Assurance Pilot) to codify emerging norms and best practices around technical testing of generative AI (GenAI) applications.
Specifically, the Assurance Pilot sought to address a perceived gap in AI testing. Most efforts to date have focused on testing the foundation model of an AI system for safety and alignment. However, a shift is required toward testing the reliability of the end-to-end systems or applications in which those models are embedded, rather than just the AI model itself. The Assurance Pilot emphasised that testing GenAI-enabled systems in real-world situations at scale must account for the specific context of the use case, organisation, industry and/or socio-cultural expectations, as well as the complexity of their use.
For the Assurance Pilot, deployers of GenAI applications from the healthcare, banking and IT sectors were paired with assurance and testing firms to conduct limited assurance testing of one agreed application.
The following insights on GenAI testing in real-world applications were gleaned from the Assurance Pilot:
As GenAI capabilities expand, it is imperative that testing of GenAI applications continues to be refined and developed to yield meaningful insights and to support the creation of a trustworthy and reliable GenAI-powered environment. AI developers, deployers and testers should apply the results and recommendations in the Assurance Pilot report to strengthen their AI application testing practices.
OrionW regularly advises clients on artificial intelligence matters. For more information about responsible development, deployment and use of artificial intelligence systems, or if you have questions about this article, please contact us at info@orionw.com.
Disclaimer: This article is for general information only and does not constitute legal advice.