Nvidia and Microsoft’s new model may trump GPT-3 in race to NLP supremacy
Chipmaker Nvidia and Microsoft claim they have built the world’s largest artificial intelligence (AI) powered language model to date. The model, called the Megatron-Turing Natural Language Generation (MT-NLG) model, is a successor to the two companies’ earlier work, which gave rise to the Turing NLG 17B and Megatron-LM models. It contains 530 billion parameters, which the companies claim will bring “unmatched” accuracy when the AI is put to work on natural language tasks, including reading comprehension, commonsense reasoning, word sense disambiguation and natural language inference.
In comparison to MT-NLG, OpenAI’s GPT-3 has only 175 billion parameters. GPT-3 is widely considered a benchmark for NLP models and has already been put to work in many use cases. In machine learning (ML), a model’s parameters are the trainable weights it learns from data, and having more of them allows the system to capture more patterns and make better predictions. As a general rule of thumb, the more parameters a model has, the better its predictions tend to be.
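To make the term concrete, here is a minimal PyTorch sketch of what “counting parameters” means; the toy transformer’s dimensions are our own illustration and bear no relation to MT-NLG’s actual architecture:

```python
# Every trainable weight in a network is one parameter, and counting
# them is a one-liner in PyTorch. Models like GPT-3 (175B) or MT-NLG
# (530B) are the same idea scaled up by several orders of magnitude.
import torch.nn as nn

# A toy transformer encoder: 4 layers, hidden size 256, 8 attention heads.
layer = nn.TransformerEncoderLayer(d_model=256, nhead=8)
model = nn.TransformerEncoder(layer, num_layers=4)

num_params = sum(p.numel() for p in model.parameters())
print(f"Toy model parameters: {num_params:,}")  # a few million
print("GPT-3: 175,000,000,000 | MT-NLG: 530,000,000,000")
```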
“Language models with large numbers of parameters, more data, and more training time acquire a richer, more nuanced understanding of language,” the companies said in a blog post. “The 105-layer, transformer-based MT-NLG improved upon the prior state-of-the-art models in zero-, one-, and few-shot settings and set the new standard for large-scale language models in both model scale and quality.”
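For readers unfamiliar with the jargon, the zero-, one-, and few-shot “settings” refer to how many worked examples are packed into the prompt before the model is asked to answer; the model’s weights never change. The sketch below, built around an invented sentiment task, illustrates the idea:

```python
# A rough sketch of zero-, one-, and few-shot prompting: only the
# prompt changes between settings, never the model itself. The task
# and example reviews are invented for illustration.

task = "Classify the sentiment of the review as Positive or Negative."
query = "Review: The battery died within a week.\nSentiment:"

zero_shot = f"{task}\n\n{query}"

one_shot = (
    f"{task}\n\n"
    "Review: I loved every minute of it.\nSentiment: Positive\n\n"
    f"{query}"
)

few_shot = (
    f"{task}\n\n"
    "Review: I loved every minute of it.\nSentiment: Positive\n\n"
    "Review: Total waste of money.\nSentiment: Negative\n\n"
    f"{query}"
)

# Each string would be sent to the model as-is; larger models such as
# MT-NLG are claimed to perform better across all three settings.
print(few_shot)
```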
To be sure, an NLP system of this scale cannot be run on just any computer. Training Microsoft and Nvidia’s model required supercomputer-class hardware such as Selene, the sixth-fastest supercomputer in the world, which is built from Nvidia A100 graphics processing units (GPUs) and high-speed Mellanox HDR networking.
Over the past few years, NLP models have become a matter of competition among the biggest tech giants, especially when it comes to outperforming GPT-3. Besides Microsoft, Nvidia and OpenAI, search giant Google has also unveiled a language model called LaMDA, or Language Model for Dialogue Applications, which the company claims “can engage in a free-flowing way about a seemingly endless number of topics, an ability we think could unlock more natural ways of interacting with technology and entirely new categories of helpful applications.”
Can’t rid AI of bias
Interestingly, the companies admitted in the blog post that giant language models of this kind are prone to bias and toxicity. “Understanding and removing these problems in language models is under active research by the AI community, including at Microsoft and NVIDIA,” they said.
They reported that the model picks up stereotypes and biases from the data it is trained on, and that continued research is necessary to quantify this bias. They also warned that any practical deployment of such systems should include measures to mitigate and minimize potential harm to users.
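For a rough sense of how researchers probe a model for learned stereotypes, the sketch below uses Hugging Face’s fill-mask pipeline on a small public model (BERT, since MT-NLG itself is not publicly available); the templates and the approach are our own illustration, not the companies’ methodology:

```python
# A heavily simplified bias probe: fill a masked slot in templated
# sentences and compare what the model prefers for different groups.
# Model choice and templates are illustrative assumptions.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

for subject in ("The man", "The woman"):
    out = fill(f"{subject} worked as a [MASK].", top_k=3)
    print(subject, "->", [o["token_str"] for o in out])

# Systematic differences between the completions are one crude signal
# of learned bias; quantifying it rigorously is the open research the
# companies refer to.
```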
“We live in a time where AI advancements are far outpacing Moore’s law. We continue to see more computation power being made available with newer generations of GPUs, interconnected at lightning speeds,” Microsoft said. The technology conglomerate added that the results are a big step towards unlocking the full promise of AI in natural language.