Explained: OpenAI’s new and free everything-to-everything AI model – GPT-4o
Just a day ahead of Google’s flagship Google I/O 2024 event, ChatGPT maker OpenAI announced GPT-4o, which the company calls a step closer to natural human-machine interaction. The ‘o’ in the name stands for omni. The model accepts multimodal inputs – text, audio, and images, or any combination of the three – and generates multimodal outputs.
In terms of performance, GPT-4o matches GPT-4 Turbo on English text and coding, while being faster and 50% cheaper in the API. To be sure, GPT-4 Turbo is an advanced large language model launched in November last year, touted as more capable and with knowledge of world events up to April 2023. GPT-4o achieves GPT-4 Turbo-level performance on text, reasoning, and coding intelligence, while surpassing it on multilingual, audio, and vision capabilities.
OpenAI demonstrated the capabilities of GPT-4o through a series of live demos, which included real-time language translation, describing the surrounding environment, two GPT-4o systems interacting with each other, playing rock-paper-scissors, cracking jokes, and customer service (as a proof of concept), among others.
Before GPT-4o, existing models used a combination of neural networks to perform such tasks. For instance, in the Voice Mode used to speak with ChatGPT, the earlier pipeline had three layers – a first model transcribed audio to text, GPT-3.5 or GPT-4 took in that text and output text, and a third, simpler model converted the text back to audio. GPT-4o, by contrast, is a single model trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network.
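To illustrate the structural difference described above, here is a minimal Python sketch contrasting the old three-model Voice Mode pipeline with a single end-to-end model. Every function in it is a hypothetical stand-in written for this explainer, not OpenAI’s actual API.

```python
# Illustrative sketch only: the model functions below are placeholders (not OpenAI's
# real API) meant to show the structural difference the article describes.

def speech_to_text(audio: bytes) -> str:
    """Stand-in for a separate transcription model (audio -> text)."""
    return "transcribed user speech"

def chat_model(prompt: str) -> str:
    """Stand-in for GPT-3.5 / GPT-4 (text in, text out)."""
    return f"reply to: {prompt}"

def text_to_speech(text: str) -> bytes:
    """Stand-in for a separate speech-synthesis model (text -> audio)."""
    return text.encode()

def omni_model(audio: bytes) -> bytes:
    """Stand-in for GPT-4o: one network that takes audio in and produces audio out."""
    return b"spoken reply"

def voice_mode_legacy(audio_in: bytes) -> bytes:
    # Pre-GPT-4o Voice Mode: three separate models chained together.
    # Tone, multiple speakers and background sound are lost at the transcription step.
    text_in = speech_to_text(audio_in)
    text_out = chat_model(text_in)
    return text_to_speech(text_out)

def voice_mode_gpt4o(audio_in: bytes) -> bytes:
    # GPT-4o: a single end-to-end model processes the audio directly.
    return omni_model(audio_in)
```

Because the legacy pipeline hands only a text transcript to the language model, cues such as tone of voice never reach it; a single end-to-end network avoids that loss.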
"From an enterprise perspective, this release with advanced voice capabilities means getting closer to achieving fully autonomous contact centers. One standout feature bringing us closer to the reality of fully autonomous customer service, a vision we at Yellow.ai have been working towards, is GPT-4o’s ability to understand and act on user emotion by providing real time responsiveness,” said Raghu Ravinutala, CEO & co-founder, of Yellow.ai.
The text and image capabilities of GPT-4o are being rolled out in ChatGPT. They are available to all users – free tier and Plus alike – though Plus users get five times higher message limits. The company will roll out a new version of Voice Mode with GPT-4o in alpha within ChatGPT Plus in the coming weeks. OpenAI also announced other updates, including a ChatGPT desktop app for macOS and advances in its voice assistant capabilities.
“Note that [besides the] simultaneous translation, the deciphering of emotion, the tutoring of maths, the big [takeaway] is that it is available to all users, not only the paid ones. There is a desktop app, which makes work easier. They are also launching the API; expect to see some wild GPTs coming out,” commented Jaspreet Bindra, founder of UK-based consultancy firm Tech Whisperer.
To be sure, the company hasn’t been too transparent about the content the new model is trained on. Chief technology officer Mira Murati, however, said that the model learns from licensed content from the company’s partners as well as other publicly available sources collected using web crawlers. It may be noted that OpenAI already faces lawsuits from several individuals and organisations, including The New York Times, for the alleged unauthorised use of their content to train its models.