
The Infocomm Media Development Authority (IMDA) has released its Starter Kit for Safety Testing of LLM-Based Applications (Starter Kit) for public consultation. The Starter Kit provides a set of voluntary guidelines for testing large language model (LLM)-based applications for common risks.
The Starter Kit derives content from the Global AI Assurance Pilot (please see our article on it here) and workshops with industry stakeholders and government experts. Both the Global AI Assurance Pilot and the Starter Kit emphasise that while testing has been largely focused on LLMs themselves, there has been insufficient attention on testing the applications in which they are embedded.
Interested parties may submit their comments to IMDA until 25 June 2025.
The Starter Kit provides guidance on the four common risks found in AI-powered applications:
It also differentiates risk based on an application’s design and context. Baseline risks are common risk manifestations that arise in general scenarios which, when addressed, provide a basic safety level across a wide range of applications. On the other hand, specific risks arise due to unique contexts (e.g., cultures), sectoral domains (e.g., finance), local laws and use cases of an app.
The Starter Kit proposes testing both the output and components of an AI-powered application. Output testing evaluates and validates the overall safety characteristics of the application before deployment, while component testing – testing individual components of the underlying LLM – identifies failure points for further mitigation, if output testing results are not up to expectations.
Benchmarking and red teaming are the suggested ways of testing AI-powered applications. Benchmarking is likened to an exam where the application is presented with a standardised set of task prompts and comparing the results against pre-defined criteria. On the other hand, red teaming is a more in-depth testing methodology and a second-layer of testing where the AI-powered application is probed for system failures such as generation of potentially harmful content or leakage of sensitive information.
To optimise resources, the Starter Kit suggests prioritising certain risks that are most relevant in the planned use case of the AI-powered application. This is done by identifying material risks dictated by the use cases, selecting the appropriate output tests considering the identified risks and calibrating the extent of testing. As a rule of thumb, the Starter Kit emphasises that the goal of testing is to achieve a level of confidence that each risk has been adequately identified and addressed.
The Starter Kit recognises that publicly available benchmarks and standards of testing are generally sufficient ways to test AI-powered applications. However, they may not be readily applicable in some specialised settings where these benchmarks and standards may not work as intended. As such, developers could develop their own customised benchmarks or standards fit for the intended use case of their AI-powered application.
However, the Starter Kit reminds developers that in developing their own tests, it is important to have representative test datasets. Tests should also have the right metrics aligned with test objectives and dataset structure.
After the tests, the results need to be interpreted and analysed. This requires clear thresholds that define the minimum acceptable levels of behaviour for each identified risk and appropriately trained and oriented evaluators. Generally, higher-priority risks warrant more stringent thresholds.
AI application testing is becoming increasingly important as AI is adopted across a wide range of industries. However, widely available tests may not be suitable for all AI use cases. The Starter Kit provides a baseline for AI developers and AI testing firms to review their AI testing approaches, ensuring that AI-powered applications are appropriately tested in line with their intended use cases.
OrionW regularly advises clients on artificial intelligence matters. For more information about responsible development, deployment and use of artificial intelligence systems, or if you have questions about this article, please contact us at info@orionw.com.
Disclaimer: This article is for general information only and does not constitute legal advice.