The Open Source AI Definition 1.0 clarifies what should be considered “open source AI”. Among other things, it requires that information on training data be made available under open source terms.

Open Source Initiative Defines Open Source AI

Date
July 21, 2025
Author
OrionW

The Open Source Initiative (OSI), a non-profit organisation that defines open source software standards, has released the Open Source Artificial Intelligence (AI) Definition 1.0 (OSAID) to clarify what qualifies as open source AI.  Before this definition, developers relied on unclear and diverging criteria to claim that their AI systems were open source.

Why Open Source AI Needs to Be Defined

While the traditional definition of open source has worked well for software and source code, it is not directly applicable to an AI model.  The classic Open Source Definition states that “the source code must be the preferred form in which a programmer would modify the program”.  However, this may not apply to AI models, because there is no consensus on the preferred form for modifying an AI system.  In addition, AI and machine learning systems are more than just software programs: they also include data, configuration options, documentation and artifacts such as weights and biases.

Open Source AI Defined

OSAID defines an AI model as consisting of the model architecture, inference code and model parameters (such as weights, the learned parameters that produce outputs from given inputs).  For an AI system to qualify as Open Source under OSAID, all of its components – including the model architecture, inference code, model parameters, and artifacts – must be open source.  Such an AI system must allow users to:

  • Use the system freely for any purpose. 
  • Study the system’s workings and inspect its components. 
  • Modify the system for any purpose, including altering its output. 
  • Share the system with others with or without modifications, for any purpose. 

Moreover, users must have access to the “preferred form to make modifications” to the system – that is, the AI system must include each of the following components, provided under OSI-approved terms:

  • Data information: Sufficient details on the system’s training data to enable a skilled person to build a substantially equivalent system.  This includes (a) a full description of all training data and their provenance, scope, characteristics, selection process, labelling, processing and filtering methods and (b) a list of all training data (whether from public sources or third parties) and where to obtain them (including for a fee).

  • Code: The complete source code used to train and run the system including: 
    • code for data processing and filtering, validation and testing; 
    • inference code; 
    • model architecture; and 
    • supporting libraries like tokenisers and hyperparameter search code. 
  • Parameters: The model parameters, such as weights or other configuration settings. 
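
As a purely illustrative aid, the three required components listed above can be treated as a release checklist. The sketch below is a minimal example under stated assumptions: the component keys, the `ComponentRelease` class and the `missing_components` function are hypothetical names invented for this illustration and are not part of OSAID or of any OSI tooling. It is not a substitute for a legal assessment of whether a system meets OSAID.

```python
from dataclasses import dataclass

# Hypothetical keys paraphrasing the three components described above.
# This is not an official OSAID schema.
REQUIRED_COMPONENTS = {
    "data_information": "details on training data, its provenance and where to obtain it",
    "code": "source code used to train and run the system",
    "parameters": "model parameters such as weights",
}


@dataclass
class ComponentRelease:
    """Release status of one component (illustrative only)."""
    available: bool = False           # the component has been published
    osi_approved_terms: bool = False  # it is offered under OSI-approved terms


def missing_components(release: dict[str, ComponentRelease]) -> list[str]:
    """Return the required components that are absent or not under OSI-approved terms."""
    missing = []
    for name in REQUIRED_COMPONENTS:
        component = release.get(name)
        if component is None or not (component.available and component.osi_approved_terms):
            missing.append(name)
    return missing


# Example: a release that publishes only the weights.
weights_only = {"parameters": ComponentRelease(available=True, osi_approved_terms=True)}
print(missing_components(weights_only))
```

In this sketch, a weights-only release is flagged as missing both the data information and the code, which mirrors the point that disclosing parameters alone does not provide the “preferred form to make modifications”.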

Open Source AI Definition Benefits and Challenges

OSAID is expected to give AI developers, deployers and end users greater autonomy and transparency, and to enable frictionless reuse and collaborative improvement of AI systems.  It also brings additional documented benefits of open source, such as improved safety and security, accelerated innovation, more flexible customisation and lower costs.

On the other hand, OSAID can pose a significant challenge for AI companies, as it requires detailed disclosure of information about their training data, including where the data came from and how to obtain it.  Given that the right to use copyrighted data for AI model training and AI system outputs remains hotly contested in many jurisdictions, most AI companies currently keep their training data tightly under wraps, disclosing only the “weights” or “parameters”.

Conclusion

The release of OSAID marks a crucial step in defining what truly constitutes open source AI.  By setting clear standards for transparency, accessibility and reuse, OSAID aims to prevent mislabelling and ensure AI systems align with open source principles.  While its requirements pose challenges – especially regarding data disclosure – they also pave the way for greater collaboration, innovation and trust in AI development.  As the debate over open source AI continues, AI companies should monitor developments in this area closely.

For More Information

OrionW regularly advises clients on technology and artificial intelligence (AI) related matters.  For more information about Singapore or regional AI regulations, or if you have questions about this article, please contact us at info@orionw.com.

Disclaimer: This article is for general information only and does not constitute legal advice.
