Startups, Qualcomm outperform Nvidia in efficiency of chips training ChatGPT-like AI models
US chipmaker Qualcomm, along with US-based machine learning (ML)-focused chipmaking startup SiMa and Taiwan-based Neuchips, outperformed market leader Nvidia on several power-efficiency scores for high-performance enterprise chips, which are used by data centers, cloud platforms and artificial intelligence (AI)-training tech firms globally. SiMa's chips achieved lower processing latency in edge computing, while Neuchips led in the number of images processed per watt of power consumed in neural-network-based image classification.
The power efficiency scores were published Wednesday in engineering consortium MLCommons' latest benchmark figures for the March quarter, which simulated the performance of enterprise chips on a language model similar to the one that powers OpenAI's popular ChatGPT.
While Nvidia's ML-targeted enterprise graphics processing unit (GPU), the Nvidia H100, still ranks highest in outright performance, Qualcomm's proprietary Cloud AI 100 (QAIC100) AI processing chip delivered more efficient object detection in neural network tasks, resolving 3.2 queries per watt of electricity consumed versus 2.7 queries per watt for the Nvidia H100.
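For clarity, the efficiency figure cited here is simply throughput divided by power draw. The short Python sketch below illustrates that arithmetic; the throughput and power numbers are hypothetical, chosen only to reproduce the reported 3.2 and 2.7 queries-per-watt scores, and are not MLCommons data.

```python
# Illustrative sketch of the power-efficiency metric: throughput divided by
# average system power. The inputs below are hypothetical and chosen only to
# reproduce the efficiency figures cited in the benchmark coverage.

def queries_per_watt(throughput_qps: float, avg_power_watts: float) -> float:
    """Inference efficiency: queries served per second, per watt of power drawn."""
    return throughput_qps / avg_power_watts

# Hypothetical throughput/power splits (not published MLCommons figures).
qualcomm_cloud_ai_100 = queries_per_watt(throughput_qps=1600.0, avg_power_watts=500.0)
nvidia_h100 = queries_per_watt(throughput_qps=1890.0, avg_power_watts=700.0)

print(f"Qualcomm Cloud AI 100: {qualcomm_cloud_ai_100:.1f} queries/watt")  # 3.2
print(f"Nvidia H100:           {nvidia_h100:.1f} queries/watt")            # 2.7
```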
The figures come after a team of researchers at Google on Tuesday published a paper on the company's custom Tensor Processing Unit (TPU)-powered supercomputers, which it uses to train its large language models (LLMs). In the paper, the researchers said the company's TPU v4 ML-training supercomputer, which is 10x more powerful than its predecessor, is also nearly 2x more power efficient than Nvidia's A100.
To test the latest AI chips from companies around the world, MLCommons used the language model BERT-Large as the reference point. BERT-Large is not an LLM, since it has far fewer parameters (340 million) than an industry-standard LLM such as OpenAI's Generative Pre-trained Transformer (GPT)-3.5, the model that originally powered the popular ChatGPT chatbot. To be sure, GPT-3.5 has 175 billion parameters, while its successor, GPT-4, released last month, is rumoured to have 3 trillion.
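The 340-million-parameter figure for BERT-Large can be checked directly. Below is a minimal sketch assuming the open-source Hugging Face transformers library and its public bert-large-uncased checkpoint, neither of which the MLCommons benchmark prescribes; it loads the model and counts its parameters.

```python
# Minimal sketch: load BERT-Large from the Hugging Face hub and count its
# parameters. Requires `pip install transformers torch`; downloads the model
# weights on first run.
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-large-uncased")
n_params = sum(p.numel() for p in model.parameters())
print(f"BERT-Large parameters: {n_params / 1e6:.0f} million")  # roughly 335 million, commonly rounded to 340 million
```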
In an interview with EE Times, David Kanter, executive director of MLCommons, said the consortium will start taking LLMs into account when benchmarking AI-training processors from the upcoming quarter, which will be reflected in chip performance scores later this year.
Nvidia being outperformed on chip efficiency could mark a significant inflection point: as of December last year, an enterprise GPU industry report by Jon Peddie Research pegged Nvidia's market share at 88%. Companies, meanwhile, have highlighted the massive cost and energy consumption required to train LLMs akin to those behind OpenAI's ChatGPT and Google's Bard, as well as the hundreds of other generative models and services, such as the image-generating tool Midjourney.
It is this problem that many are trying to solve. In an interview with EE Times, Krishna Rangasayee, chief executive of SiMa, whose chips outperformed Nvidia's H100 in edge-processing efficiency, said the company is still working on the 16nm manufacturing node and thus has plenty of room to cut power consumption further as it scales deployment of its chips.
The manufacturing node loosely refers to the size of the transistors used inside a semiconductor chip; in general, the smaller the node, the more power efficient the chip.