
Reddit to charge firms for using its data

Photo Credit: www.123rf.com

Reddit is planning to charge companies for using its data, including those that use it to train large language models (LLMs). The social media platform said on Tuesday that it will soon start charging companies for access to its application programming interface (API).

In an interview with The New York Times, Reddit co-founder and CEO Steve Huffman said that the data on Reddit is “valuable” and that he does not want to give it away for free to some of the largest companies in the world.

An API is a commonly used mechanism that allows firms and developers to retrieve data from another app and use it to offer new features or services in their own apps. Reddit won't be the first social media platform to restrict free access to its API. In February, Twitter also ended free access to its API and, a month later, launched paid plans for API access.
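
To make the idea of "retrieving data from another app" concrete, here is a minimal sketch of the developer's side of an API call: the app receives structured data (here JSON) and keeps only the fields it needs. The payload, its field names and the `extract_titles` helper are hypothetical and are not taken from Reddit's actual API; no network request is made.

```python
import json

# Hypothetical JSON body an API might return for a request such as
# "give me the newest posts in a forum". Field names are illustrative only.
response_body = """
{
  "posts": [
    {"id": "a1", "title": "What is your favourite editor?", "comments": 42},
    {"id": "b2", "title": "Show us your weekend project", "comments": 17}
  ]
}
"""

def extract_titles(body: str) -> list:
    """Parse a (hypothetical) API response and keep only the post titles."""
    data = json.loads(body)
    return [post["title"] for post in data["posts"]]

titles = extract_titles(response_body)
print(titles)
```

In a real integration, the body would typically come from an HTTP request authenticated with a key issued by the platform, and it is that key-based access which lets a platform meter usage and charge for it.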


Social media posts, including Reddit forums that host millions of conversations, are regarded as powerful data sets because many LLMs, such as OpenAI’s GPT-4 and Google’s LaMDA, are trained at least in part on dialogue.

Most of these models are trained on enormous volumes of text, drawn either from large public databases or scraped from the Internet and social media platforms.

Though OpenAI started as a non-profit, it is now partly owned by Microsoft, which is using OpenAI’s products to gain an edge over rivals in search and other applications. In March, Microsoft also threatened to cut off rival search companies’ access to its Bing search index if they used it to offer AI capabilities such as chatbots.


What sets generative AI apart from previous generations of AI models is its ability to generate human-like responses. These models are based on neural networks called transformers, which use a mechanism called attention to learn how the elements of a sequence, such as the words in a sentence, relate to one another.
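
The attention idea can be illustrated with a toy example. The sketch below implements scaled dot-product self-attention in plain Python: each word's vector is compared with every word's vector, the scores are turned into weights with a softmax, and the output for each word is a weighted average of all the vectors. It is a simplification under stated assumptions: the two-dimensional word vectors are made up, and real transformers also apply learned query, key and value projections, use many attention heads and stack many layers.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = vectors.

    Simplification: real transformers first multiply each vector by learned
    query, key and value matrices, which are omitted here.
    """
    d = len(vectors[0])
    outputs, all_weights = [], []
    for q in vectors:
        # Score this word against every word (including itself), scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        all_weights.append(weights)
        # The output is the attention-weighted average of all word vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)])
    return outputs, all_weights

# Hypothetical 2-dimensional embeddings for a three-word sentence.
sentence = ["the", "cat", "sat"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
outputs, weights = self_attention(embeddings)
for word, row in zip(sentence, weights):
    print(word, [round(x, 2) for x in row])
```

Each printed row shows how much one word "attends" to every word in the sentence; this is how a transformer links related words regardless of how far apart they are.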

The growing popularity of generative AI and of products built on it, such as ChatGPT and DALL-E, has also sparked privacy and copyright concerns over the methods used to collect online data to train these models. For instance, in January a group of artists in the US filed a lawsuit against Stability AI, DeviantArt, and Midjourney, alleging that artists’ copyrighted images were used without consent to train the generative AI model Stable Diffusion.

Reddit is one of the leading social media platforms and, according to industry estimates, has over 400 million monthly active users (MAUs). The company reportedly filed confidentially for an initial public offering (IPO) with the US Securities and Exchange Commission in 2021 and is looking to go public later this year.
