Why truly open-source AI remains out of reach

Photo Credit: Image generated using AI

For decades, open source has been synonymous with transparency, collaboration, and community-driven innovation. But in the era of artificial intelligence (AI)—particularly large language models (LLMs) and generative AI—truly open-source AI seems elusive. 

Even the recent R1 model from China-based DeepSeek, marketed largely as an open-source language model, does not fully adhere to the widely accepted Open Source Initiative (OSI) definition of open-source AI. To begin with, the training dataset for the model is not fully disclosed, making it difficult for the community to reproduce the model independently.

Truly open source AI

To be sure, for an AI model to be truly open-source under OSI-like standards, it must allow free redistribution without restrictions, provide public access to its architecture, training code, and pre-trained weights, and permit fine-tuning and redistribution without requiring special permissions. Additionally, it should not impose limitations on commercial use or specific applications, and the training data should be disclosed for full transparency, enabling reproducibility.
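The criteria above can be read as a simple checklist. The Python sketch below is only an illustration of that reading: the field names are paraphrases of the paragraph, not an official OSI schema, and the example release is hypothetical rather than a description of any real model.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ReleaseChecklist:
    """One flag per criterion; names are illustrative paraphrases, not OSI terms."""
    free_redistribution: bool
    architecture_public: bool
    training_code_public: bool
    weights_public: bool
    finetuning_without_permission: bool
    unrestricted_commercial_use: bool
    training_data_disclosed: bool


def unmet_criteria(release: ReleaseChecklist) -> list[str]:
    """Return the names of the criteria a release fails, in declaration order."""
    return [f.name for f in fields(release) if not getattr(release, f.name)]


def is_truly_open(release: ReleaseChecklist) -> bool:
    """A release counts as truly open only if every criterion is met."""
    return not unmet_criteria(release)


# Hypothetical "open weights" release: weights and architecture are public,
# but training code, training data and unrestricted commercial use are not.
open_weights_only = ReleaseChecklist(
    free_redistribution=True,
    architecture_public=True,
    training_code_public=False,
    weights_public=True,
    finetuning_without_permission=True,
    unrestricted_commercial_use=False,
    training_data_disclosed=False,
)

if __name__ == "__main__":
    print(is_truly_open(open_weights_only))
    print(unmet_criteria(open_weights_only))
```

Treating every criterion as mandatory is a deliberate simplification: under this reading, a release that publishes only its weights fails the test no matter how permissive the rest of its licence is.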

Facebook-parent Meta claims that its LLaMA models are open models and available to researchers and businesses under specific licensing terms. While Meta does release LLaMA’s pretrained weights and model architecture, the licence restricts commercial use without approval, and the training data is not disclosed—both of which conflict with OSI's definition of open source. LLaMA models are currently being used by companies such as Goldman Sachs, AT&T, Nomura, DoorDash, and Accenture.

The same is true of several popular AI models that claim to be open but are not fully open source in the traditional OSI sense. This practice is popularly referred to as open-source washing, or open-washing. The lack of transparency, especially about training data, has led to several lawsuits against big AI companies.

“Most companies claiming to be open-source merely provide open weights, not full transparency. Open-washing is deceptive: allowing only fine-tuning while restricting system-level changes prevents true openness. Real open-source AI is anti-competition and pro-innovation,” Avijit Ghosh, Applied Policy Researcher (ML & Society) at AI development platform Hugging Face, told TechCircle.

Notably, in January, Hugging Face launched the Open-R1 project, an initiative to build an open version of the DeepSeek-R1 model by reconstructing its data and training pipeline.

Resource intensive

AI development is resource-intensive. Training state-of-the-art LLMs requires massive computational power, vast amounts of proprietary data, and specialised engineering expertise. Companies invest hundreds of millions, if not billions, of dollars in model development. For instance, OpenAI, which is valued at $157 billion, invested more than $100 million in building its GPT-4 model.

Experts like Kashyap Kompella, chief executive officer of RPA2AI Research, believe that companies have the right to choose licensing models that align with their business interests. However, they face criticism when they present their models as open source while keeping key components proprietary. Kompella calls such models OSINO, or "open source in name only."

“It’s unrealistic to expect organisations that invest hundreds of millions in AI development to release their models purely out of goodwill. Instead, they open-source selectively, balancing strategic advantage with public perception,” he said.

“That said, government-funded AI projects present a different scenario. Many nations are now developing their own GPT-style models, built with public funds. These efforts are more likely to be truly open source, as they belong to the digital commons of their respective countries.”

Notably, inspired by DeepSeek's achievements, India is also exploring the development of its own LLMs. This initiative seeks to enhance the country's AI capabilities, especially in light of restrictions on high-performance GPU exports from the US. 

Problem with vendor-backed open source projects

The software ecosystem has long grappled with vendor-controlled open source, even before the complexities of AI. In several cases, the vendors behind such projects have unilaterally changed licences, often restricting freedoms over time.

“We've seen this pattern with Red Hat, MongoDB, and ElasticSearch—companies shifting licenses to prevent hyperscalers from offering their software as a service while bypassing the original developers. This often leaves customers unaware of who actually maintains the project,” said Sai Rahul Poruri, CEO of FOSS United India, a non-profit promoting open-source software.

Such moves have sparked backlash, as these new licences impose restrictions that undermine the open source ethos. The same applies to vendor-controlled AI models—companies can change terms or even disappear entirely, leaving users stranded.

“History shows that many open-source projects fade when their primary backers collapse. This is why decentralised efforts offer a more resilient alternative,” he added.

