Explainer: What are transformers, the neural networks underneath ChatGPT and LaMDA?
Ever since OpenAI made its artificial intelligence (AI) chatbot, ChatGPT, public last November, millions of Internet users have used it to write essays, theses, poems, and code. Its ability to pull together relevant information and summarise it for users in simple language is seen by many as a precursor to a new era of chatbot-driven search. Microsoft, last week, released a new version of Bing with built-in ChatGPT-like functionality, while Google has done a limited release of a bot called Bard.
Both these chatbots and OpenAI’s other AI models, like DALL-E and Codex, have one thing in common: they are all based on multi-layer neural networks known as transformers.
What is a transformer?
Developed in 2017 by researchers at Google and the University of Toronto, a transformer is a neural network used for natural language processing, natural language generation, and even genome sequencing.
A neural network is an interconnected group of artificial neurons loosely modeled on the human brain. It is the underlying technology used by deep learning models to process complex data. Transformers are meant to process a piece of information and generate the most relevant response. What makes them more effective than earlier models is a mechanism called attention, which lets them work out how pieces of information in a sequence relate to one another, such as words in a sentence, even when they are far apart.
For example, if you tell the AI that you want to buy a Toyota Corolla, and then a little later you say you want to know how much ‘that car’ will cost, transformers are able to understand that you’re talking about that same Corolla you mentioned earlier.
This allows such models to keep track of context even in long sentences and large paragraphs, something which many earlier models struggle to do.
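The mechanism behind this linking, usually called attention, can be sketched in a few lines of Python. The example below is a toy illustration only: the two-number vectors for each word are made up for this sketch (real models learn vectors with hundreds or thousands of dimensions), the function names are hypothetical, and the same vectors are reused as keys and values, whereas real transformers derive separate queries, keys and values through learned projections. It shows how the word "car" ends up weighted most strongly towards "Corolla" among the earlier words.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    # Weighted average of the value vectors, one dimension at a time.
    mixed = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, mixed


# Assumption: hand-picked toy vectors, not taken from any real model.
earlier_words = ["buy", "Corolla", "cost"]
earlier_vectors = [[1.0, 0.0], [0.0, 2.0], [1.0, 0.5]]
car_vector = [0.1, 2.0]  # the later mention, "that car"

weights, mixed = attend(car_vector, earlier_vectors, earlier_vectors)
best = earlier_words[weights.index(max(weights))]
print(best)  # Corolla
```

Because every earlier word is scored against the query in one pass, the distance between "car" and "Corolla" in the text does not weaken the link between them.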
What was used before transformers?
Before transformers, most language models were built on something called a recurrent neural network (RNN), which processes data in a sequential manner. It is used for language translation, natural language processing (NLP), speech recognition, and image captioning.
It is also used by Apple’s voice assistant Siri and by Google Translate. The limitation of an RNN is that it processes the words in a sentence one at a time, in order, so information from earlier words tends to fade as it goes, which makes it difficult for it to handle long sentences or large paragraphs.
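To make the sequential processing concrete, here is a toy sketch of a recurrent network reduced to a single number of memory. The weights `w_h` and `w_x` and the function names are assumptions chosen for illustration; a real RNN uses learned weight matrices rather than two fixed scalars. Only the first input is non-zero, so the printed values show how much of that first "word" is still remembered at each later step.

```python
import math


def rnn_step(hidden, x, w_h=0.5, w_x=1.0):
    """One recurrent step: blend the previous memory with the new input."""
    return math.tanh(w_h * hidden + w_x * x)


def run_rnn(inputs):
    """Feed inputs through the network strictly one at a time, in order."""
    hidden = 0.0
    history = []
    for x in inputs:
        hidden = rnn_step(hidden, x)
        history.append(hidden)
    return history


# Assumption: a signal at the first position followed by empty inputs.
states = run_rnn([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
for step, value in enumerate(states, start=1):
    print(step, round(value, 3))
```

Each step depends on the one before it, so the work cannot be done in parallel, and the trace of the first input shrinks at every step. This is a simplified picture of why long passages are hard for this kind of model.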
Why are we seeing so many transformer-based products now?
The release of ChatGPT last November has triggered a scramble to tap into its underlying technology, known as the generative pre-trained transformer (GPT). Its third-generation model, GPT-3, was released in May 2020 and was described at the time as the largest language model ever trained. It was trained on 45TB of text data and has 175 billion parameters, more than 100 times the 1.5 billion parameters of its predecessor, GPT-2, which was trained on 40GB of text data.
Though GPT-3 uses the same architecture as its predecessor, it has more layers and it has been trained on a much larger dataset. Data is the most critical element in any machine learning (ML) model, especially those based on complex neural networks. Larger datasets enable better classification and identification, which in turn makes the model better at performing its tasks.
OpenAI is expected to release GPT-4 next month, while Google has a model called Language Model for Dialogue Applications (LaMDA) which runs underneath Bard. OpenAI claims that more than 300 applications are now using GPT-3 through the OpenAI API.
In India, Flipkart is working on ChatGPT-based products, while Indian Express reported today that the Indian government is working on a ChatGPT-powered WhatsApp chatbot for farmers.