The possibilities and challenges of defining and developing open source AI

September 2024  |  SPECIAL REPORT: DIGITAL TRANSFORMATION

Financier Worldwide Magazine


As open artificial intelligence (AI) models have proliferated in recent years, two camps have started to emerge – those who support AI models that are ‘closed’ and proprietary, with no access to the model’s underlying source code, data or weights, and those who support ‘open’ models where at least the source code is publicly available.

What makes an AI model truly ‘open’, and whether ‘closed’ or ‘open’ models are more beneficial to society at large, is fiercely debated. This article discusses what it means for an AI model to be ‘open’, and the implications of adopting a ‘closed’ or ‘open’ model approach.

‘Open’ AI has become a buzzword used by Big Tech, start-ups, pundits and regulators alike to describe certain AI models whose source code has been made publicly available, but there is still no consistent definition of what exactly that means. As a general matter, and irrespective of AI considerations, the definition of ‘open source’ promulgated by the Open Source Initiative (OSI) is recognised as the standard definition for the term.

Under the OSI definition, software licences must meet certain criteria to be deemed open source licences. These include the fundamental requirement that the underlying source code is made publicly available so that it can be inspected, used or modified under a licence with limited restrictions. While the OSI publishes a list of approved licences that meet this standard, a licence not included on the OSI’s list could still be deemed ‘open source’ to the extent it complies with the parameters of the OSI definition.

The complication with labelling AI as ‘open source’ is that critical components of these AI models extend beyond source code, and include the underlying training data and model weights, and the trained model itself. Some argue that if the goal of open source is for software to be publicly available and accessible, then in the case of AI models, all aspects of the model need to be publicly available beyond just the source code. These critics maintain that an AI model cannot call itself ‘open’ unless all aspects of the model are open.

For example, the OSI has criticised Meta’s ‘open source’ designation for its Llama 2 model licence, saying that it does not meet the OSI definition because the licence restricts commercial use by certain users and also restricts the use of the model and software for certain purposes. Meta has countered that its approach allows the AI community to access the model while balancing responsibility concerns.

On 23 July 2024, in a blog post accompanying the release of Llama 3.1, Mark Zuckerberg emphasised that open source would enable Llama to grow into a full ecosystem by encouraging the community to use and develop the model, and argued that open sourcing does not concede advantages to Meta’s competitors.

Those who support releasing the entire AI model as ‘open source’ run into the practical challenge that most open source licences are drafted to deal with software as opposed to the numerical model weights or training datasets that comprise an AI model. Therefore, how to apply the ‘open source’ concept to components other than software may present challenges and result in inconsistent interpretations among industry players.

Those advocating for fully ‘open’ AI models point to other technological breakthroughs that resulted from an ‘open approach’ (e.g., Mozilla Firefox and Android). With respect to AI, they argue that access to all aspects of an AI model will foster innovation, allowing developers to build on the works of others. They also assert that, like traditional open source projects, AI models will only improve with a broader community working on them, identifying flaws and vulnerabilities and advancing such models.

For example, in February 2024, Ben Brooks, head of public policy at Stability AI, lauded the ability of developers and regulators to audit and examine the inner workings of open AI models, claiming that such transparency can lead to safer and more democratic AI models. From a competition perspective, advocates argue that open AI models will reduce barriers to entry for start-ups, and reduce friction in integrating their own tools and services into existing AI models.

Those who oppose open AI models caution that exposing the inner workings of such powerful models to the general public may invite malicious actors to take these models, strip out any safety features and customise them for nefarious purposes. While those who release open AI models could include licence restrictions against illegal or improper use, such restrictions would fly in the face of the open source ethos, which prohibits restrictions on how open source code can be used. Opponents of open models also argue that such legal restrictions would be meaningless since a malicious actor would simply ignore them.

An additional risk of open AI models that is often cited concerns the handling of vulnerabilities. A developer of a proprietary, closed model can correct vulnerabilities without the public ever knowing they were there to exploit. This has the added benefit of ensuring that all users of such a model are accessing the remedied and secure version.

In contrast, the developer of an open AI model cannot necessarily ensure that other users will implement the same fixes to preserve the integrity and security of the model. This could result in damaging consequences where powerful AI technology is broadly accessible without appropriate safeguards. As AI technology is still in its early days, it is not yet apparent whether open source AI will generate the positive externalities that advocates anticipate, or whether the risks will outweigh the potential benefits.

The OSI acknowledges that the traditional view of open source code and licences may be insufficient for AI and is driving a global, multi-stakeholder process to define ‘open source AI’. A current draft of the open source AI definition is available and the official stable version is expected to be published in late October 2024.

The draft definition requires an AI system to be made available under terms that grant the freedoms to: (i) use the system for any purpose and without having to ask for permission; (ii) study how the system works and inspect its components; (iii) modify the system for any purpose, including to change its output; and (iv) share the system for others to use with or without modifications, for any purpose.

To exercise these rights, access to certain default components of the AI system is required. Under the latest draft definition, these include training methodologies and techniques, training data labelling procedures, data pre-processing code, and supporting libraries and tools. Notably, optional components include all training datasets and model elements such as model outputs and model metadata. This allows AI developers to keep training datasets proprietary, which may be an appropriate approach given that training datasets can include confidential or copyrighted information.

Regulators are also wading into the fray to consider how open source AI should or should not be regulated. President Biden’s executive order addressing AI called for the secretary of commerce to conduct further study into open AI models. Consequently, in February 2024, the Department of Commerce’s National Telecommunications and Information Administration requested public comment on the risks and benefits of open source AI to inform a report for the president.

How open source AI will ultimately be defined, whether it will be widely adopted by the AI industry, and whether it will prove a boon to AI or introduce unforeseen risks all remain to be seen.

 

Mana Ghaemmaghami is an associate at Skadden, Arps, Slate, Meagher & Flom LLP and Affiliates. She can be contacted on +1 (212) 735 2594 or by email: mana.ghaemmaghami@skadden.com.

© Financier Worldwide

