
EU wants firms to disclose copyright material used for training generative AI models

Photo Credit: Pixabay

Firms deploying chatbots or image generators based on generative artificial intelligence (AI) models will soon have to disclose if they have used any copyrighted material for training purposes. Lawmakers in the EU have reportedly added a new provision in the proposed rules to regulate AI in light of the massive interest in StabilityAI’s Stable Diffusion and OpenAI’s ChatGPT, which surpassed 100 million users within two months of its release last November. 

This was first reported by Reuters, which was told by an unnamed source that lawmakers had earlier planned to ban the use of copyrighted material to train generative AI models outright, but later agreed to allow it with a provision requiring companies to be transparent about it.

The EU’s plans to regulate AI have been in the works since 2021. The EU’s AI Act, which aims to create a legal framework to regulate AI applications, products, and services, is currently under discussion in the European Parliament.


The need to regulate the use of copyrighted material for training large language models (LLMs) stems from recent cases where artists and creators of original data have questioned the use of their work without their permission. This is the premise of two major class action lawsuits in the US against firms such as OpenAI and MidJourney. 

The first lawsuit was filed in November 2022 against GitHub, Microsoft, and OpenAI for using public GitHub repositories to train OpenAI’s Codex, the underlying model behind GitHub Copilot, which can write code based on text prompts. The lawsuit claims that the three firms have violated the legal rights of a vast number of creators who posted their code under certain open-source licenses on GitHub.

Similarly, in January, a group of artists filed a class action suit in California against Stability AI, DeviantArt, and MidJourney for using artists’ copyrighted images without their consent to train Stable Diffusion.


LLMs require large volumes of data for training purposes, which is what makes them so powerful. Most of the data used for training them has been scraped from the Internet, public databases, and social media platforms.

CEOs of social media platforms are also planning to charge firms that have so far used their data for free. Earlier this month, Twitter CEO Elon Musk threatened to sue Microsoft for allegedly using Twitter data to illegally train its AI models.

Reddit CEO and founder Steve Huffman also said that the social media platform is planning to charge firms for using its data and accessing its application programming interface (API). 


Musk and several other tech founders had earlier urged AI firms to pause the development of more advanced generative AI models so they can assess the risks and create a safety protocol for AI design and development. 

Several tech CEOs in India, including Zoho’s Sridhar Vembu, have urged the government to regulate the use of AI.

Though the Indian government is working on creating standards for responsible AI and encouraging the adoption of best practices, it has no immediate plans to regulate the growth of AI or set any laws for it in the country, Union IT and Telecom Minister Ashwini Vaishnaw told Parliament earlier this month.
