What's the Impact of Compute Power on AI Innovations?
It is a modern tech saying: “Data is the new oil.” Oil revolutionized the world through its widespread use as an energy source; the energy hidden in it is released only when it combines with oxygen. Similarly, the “intelligence” hidden in big data is released only when it is combined with computing. Computing, then, is the new oxygen for AI.
A dominant fraction of this oxygen comes from one source: Nvidia’s GPUs. Nvidia is the toast of the tech world and of the stock market today; the announcement of its quarterly results is among the most anticipated financial events. On February 22, 2024, Nvidia’s shares surged to create the largest-ever single-day increase in market capitalization: $277 billion! How did the company come to occupy such a dominant position?
The AI boom’s insatiable hunger for computing is the short answer. The fuller answer involves Nvidia’s mission of making high-performance computing (HPC) accessible even on modest budgets. It is a story of foresight, persistence, and luck that favored the prepared!
Nvidia began in 1993 by making add-on cards, later called GPUs, for 3D graphics and computer games. The demand for greater speed, resolution, and quality of imagery called for special hardware. “Moore’s Law” was in full force then, with the transistor count of a chip doubling roughly every two years. The CPUs in PCs got faster and cheaper until about 2000, when they hit a wall: their complex designs couldn’t effectively exploit the additional transistors. GPUs (Graphics Processing Units), however, had simpler architectures and could productively use more transistors to perform simple, near-identical calculations on large numbers of elements, such as the pixels of an image. Nvidia was not the only company making GPUs; it competed with 3dfx, 3Dlabs, ATI (now part of AMD), S3, and others. Today, AMD and Nvidia are neck and neck in GPUs for gaming, while the others have disappeared.
As the computing power of GPUs increased, parts of them were made programmable, primarily for interesting visual effects. Clever researchers realized that GPUs resembled the specialized array processors built in the 1970s, which used the SIMD (Single Instruction, Multiple Data) model. That prior experience was effectively recycled to implement foundational operations like matrix multiplication, FFT, and sorting on the GPU by the mid-2000s.
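To make the SIMD idea concrete, here is a minimal sketch in modern CUDA terms (the earliest GPGPU work actually went through graphics shaders; the kernel name and values below are purely illustrative): every thread executes the same instructions on a different data element, such as one pixel of an image.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// SIMD in essence: thousands of threads run the same instruction
// stream, each on its own element (here, one "pixel").
__global__ void scalePixels(float *pixels, float gain, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's element
    if (i < n)
        pixels[i] *= gain;  // identical operation on every pixel
}

int main() {
    const int n = 1 << 20;  // a million "pixels"
    float *pixels;
    cudaMallocManaged(&pixels, n * sizeof(float));  // visible to CPU and GPU
    for (int i = 0; i < n; ++i) pixels[i] = 1.0f;

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
    scalePixels<<<blocks, threads>>>(pixels, 2.0f, n);
    cudaDeviceSynchronize();  // wait for the GPU to finish

    printf("pixels[0] = %.1f\n", pixels[0]);  // prints 2.0
    cudaFree(pixels);
    return 0;
}
```

The same per-element pattern underlies the matrix multiplication, FFT, and sorting routines that researchers mapped onto GPUs.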
Jen-Hsun (Jensen) Huang, the founder-CEO of Nvidia, saw an opportunity to “democratize high-performance computing”. He envisioned providing specialized computing for specific applications, with 3D graphics as the first. The GPUs released in 2006 had multiple identical processing units, instead of specialized units to process vertices and pixels. Nvidia positioned these GPUs as economical, accessible parallel processors, delivering about 350 GFLOPS for $400!
At the core of the strategy was the CUDA parallel computing platform, which exposes the GPU’s power through high-level APIs. Also included were state-of-the-art compilers, runtimes, debuggers, drivers, etc., to ease the GPUs’ adoption as parallel processors. This end-to-end, full-stack approach is the major factor behind Nvidia’s phenomenal success in HPC. Success was far from guaranteed, but Jensen persisted. GPUs using CUDA started to power protein folding, oil and gas exploration, and more, in addition to graphics and media processing. Many in academia developed algorithms and techniques to use GPUs for diverse problems: computer vision, ray tracing, graph algorithms, sorting, etc. By 2012, GPUs were in wide use as compute accelerators; 13 of the top-100 supercomputers used Nvidia GPUs, including 2 of the top 10. (Today, 53 of the top 100 and 6 of the top 10 use them. GPUs also use less energy per computation and appear in 70 of the top-100 and 7 of the top-10 Green500 supercomputers today.) Nvidia’s focus on HPC looked set to pay off over the coming years.
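The flavor of that full-stack approach is easy to convey with a hedged sketch: the cuBLAS library, part of the CUDA platform, reduces a GPU matrix multiplication, which once required hand-crafted shader tricks, to a single library call. Sizes and values below are illustrative.

```cuda
// Build (illustrative): nvcc gemm.cu -lcublas
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 512;                    // n x n matrices
    size_t bytes = (size_t)n * n * sizeof(float);
    float *A, *B, *C;
    cudaMallocManaged(&A, bytes);
    cudaMallocManaged(&B, bytes);
    cudaMallocManaged(&C, bytes);
    for (int i = 0; i < n * n; ++i) { A[i] = 1.0f; B[i] = 2.0f; C[i] = 0.0f; }

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    // C = alpha * A * B + beta * C, in column-major order as BLAS expects;
    // the library chooses the parallel implementation for the GPU at hand.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n, &alpha, A, n, B, n, &beta, C, n);
    cudaDeviceSynchronize();

    printf("C[0] = %.1f\n", C[0]);  // 512 * (1 * 2) = 1024.0
    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

No kernel is written here at all; the application taps the GPU entirely through the library, which is exactly the accessibility the platform was built for.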
Life got more interesting, with luck meeting the prepared. Deep learning burst onto the scene to transform AI and, along with it, the compute landscape and Nvidia. Artificial neural networks had been around for decades as shallow multi-layer perceptrons that didn’t scale to larger problems. A few researchers experimented with deep neural networks (DNNs) and convolutional neural networks (CNNs) with many layers, which need huge amounts of data and compute power to train. Huge amounts of text, speech, and image data under varied conditions became available with the explosion of the internet and of inexpensive sensors such as cameras, microphones, and smartphones. Compute, however, was still a problem.
In 2012, AlexNet revolutionized the AI landscape by sweeping the ImageNet recognition challenge by a huge margin! Images are bulky and need large networks to process. Alex Krizhevsky trained the 60-million-plus parameter CNN on two Nvidia GTX 580 GPUs in about 7 days; this couldn’t have been attempted without GPUs. Deep networks became the only game in town for most AI tasks thereafter, with GPUs supplying the oxygen. Deeper and bigger networks, as well as newer architectures, emerged later; they need even more data and compute.
Nvidia was quick to spot the potential and pivoted to an “AI-first” company, concentrating strongly on compute for AI. Architectural features that suit AI computations, such as 16-bit and 8-bit floating-point numbers, were added to the hardware. Equally notably, software tools and libraries were developed to exploit the GPU for deep learning. CUDA libraries like cuBLAS and cuDNN integrated seamlessly with high-level frameworks such as TensorFlow and PyTorch developed by others. As heavier networks like Transformers came along, the demand for compute went through the roof. The big ripples created by the introduction of LLMs like ChatGPT made AI a global buzzword, and with it went up the ravenous demand for data and compute.
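Reduced precision is directly visible at the programming level. Here is a minimal, illustrative CUDA sketch using the 16-bit __half type (the kernel name and values are my own, and FP16 device arithmetic assumes a GPU of compute capability 5.3 or later): halving the width of every number halves memory and bandwidth needs, which is why it suits deep learning so well.

```cuda
// Build (illustrative): nvcc fp16.cu -arch=sm_53
#include <cstdio>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// y = a*x + y, computed entirely in 16-bit floating point.
__global__ void axpyHalf(const __half *x, __half *y, __half a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = __hadd(__hmul(a, x[i]), y[i]);  // FP16 multiply-add
}

int main() {
    const int n = 1024;
    __half *x, *y;
    cudaMallocManaged(&x, n * sizeof(__half));
    cudaMallocManaged(&y, n * sizeof(__half));
    for (int i = 0; i < n; ++i) {
        x[i] = __float2half(1.0f);   // convert on the host
        y[i] = __float2half(0.5f);
    }

    axpyHalf<<<(n + 255) / 256, 256>>>(x, y, __float2half(2.0f), n);
    cudaDeviceSynchronize();

    printf("y[0] = %.1f\n", __half2float(y[0]));  // 2*1 + 0.5 = 2.5
    cudaFree(x); cudaFree(y);
    return 0;
}
```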
Advances in chip technology make GPUs faster each year. Nvidia combines them with enhancements in architecture, number representation, memory, etc. Their latest Hopper GPUs have a Transformer Engine to help train foundation models. The overall performance of Nvidia GPUs has doubled yearly over the past decade; this is informally referred to as “Huang’s Law”. Doubling every year compounds quickly: a decade of it amounts to roughly a 2^10, or thousand-fold, improvement.
Compute power is a critical resource in today’s world, and Nvidia sits comfortably on top with its dominant GPU offerings. Its datacenter, or hyperscaler, market is ten times larger than gaming today, with no credible competition. Other companies are trying to catch up. Google builds its own Tensor Processing Units (TPUs) to accelerate AI. Other big companies and several startups are building alternative AI processors. Cerebras follows a radical approach, building Wafer-Scale Engines with nearly a million compute cores. None of these, however, is yet available in as easy-to-use a manner.
P J Narayanan
Prof. P J Narayanan is the Director of IIIT Hyderabad.