Last week, two major breakthrough AI models were announced. The first is Google’s PaLM. The second is DALL.E 2, the latest AI system from OpenAI that can create realistic images and art from a description in natural language.
The two AI models
Both models are Transformer models that take the written word as input, develop some understanding of said input, and convert that into something else.
Note: The Transformer is a state-of-the-art neural network architecture that has become the go-to choice for AI language applications. It is an improvement over Recurrent Neural Networks (RNNs), which tend to lose accuracy as the length of the text increases.
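For readers who want a concrete picture of the mechanism, here is a minimal sketch of the scaled dot-product attention at the core of the Transformer, written in plain NumPy. It is an illustration only, with made-up dimensions and random weights, not anything from Google's or OpenAI's actual models:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token embeddings X.

    Unlike an RNN, every token attends to every other token in parallel,
    so long-range context is not squeezed through a single recurrent state.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv      # project into query/key/value spaces
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # pairwise token-to-token relevance
    weights = softmax(scores, axis=-1)    # attention weights, each row sums to 1
    return weights @ V                    # context-aware representation per token

# Toy example: 4 tokens, embedding size 8 (all numbers are made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # -> (4, 8)
```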
Google’s PaLM
PaLM converts text into understanding that can be employed to answer questions and write software code. Here’s a sample of what Google’s PaLM can do:
The model exhibits an uncanny ability to derive meaning from text, make inferences from contextual information, and reason!
It can explain a joke.
It can also write software code, specifically translating natural language into code. Even though PaLM’s pre-training data set only contained 5% software code (the rest consisted of natural language), PaLM was able to achieve surprisingly accurate results with ~50x less software training data than the previous state-of-the-art model.
In the example below, the model translated one coding language to another (from C to Python), and from natural language to code.
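To make the idea concrete, here is a hypothetical example of the kind of C-to-Python translation being described; the snippet is ours, not the actual example from Google's announcement:

```python
# Hypothetical illustration only -- not PaLM's actual input/output.
#
# Original C function given as the prompt:
#
#   int sum_of_squares(int *xs, int n) {
#       int total = 0;
#       for (int i = 0; i < n; i++) {
#           total += xs[i] * xs[i];
#       }
#       return total;
#   }
#
# A faithful Python translation the model would be asked to produce:

def sum_of_squares(xs):
    """Return the sum of the squares of the numbers in xs."""
    return sum(x * x for x in xs)

print(sum_of_squares([1, 2, 3]))  # -> 14
```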
OpenAI’s DALL.E 2
OpenAI’s DALL.E 2 converts text input into artificially generated 2D images. The really cool thing about this model is that it can maintain semantic consistency in the images it creates. Here’s an example taken from their announcement website: given an input of "Teddy bears" performing certain actions, and optionally a specified image style, DALL.E 2 outputs a synthetic image that is true to the description.
DALL.E 2 can also take images as input and make alterations, while maintaining the original style. Here’s an example:
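DALL.E 2 had no public API at the time of the announcement, but to make the "description plus style" input concrete, here is a purely hypothetical sketch of what programmatic access could look like; generate_image is an imaginary placeholder, not an OpenAI interface:

```python
from dataclasses import dataclass

@dataclass
class ImageRequest:
    subject: str   # what should appear in the image
    action: str    # what the subject is doing
    style: str     # e.g. "digital art", "1990s photograph"

    def to_prompt(self) -> str:
        # Compose a single natural-language description, as in the DALL.E 2 demos.
        return f"{self.subject} {self.action}, {self.style}"

def generate_image(prompt: str) -> bytes:
    # Hypothetical placeholder: DALL.E 2 had no public API at announcement time.
    raise NotImplementedError("stand-in for a text-to-image model")

request = ImageRequest(
    subject="Teddy bears",
    action="mixing sparkling chemicals as mad scientists",
    style="digital art",
)
# image_bytes = generate_image(request.to_prompt())
print(request.to_prompt())
```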
So what?
Impressive as these two models are (they seem to exhibit common-sense traits that resemble humans'), we are still far from general human intelligence, as outlined in this tweet thread. Nevertheless, these models will be immediately useful. In last week’s article, we outlined the different physical-labor jobs that robots are increasingly doing. These new AI models may have a similar impact on knowledge workers.
Writing software without coders
The war for high-skilled workers has been raging for years. The global shortage of tech workers is expected to exceed 85 million by 2030, and it is especially acute in the US. The ability to write programs with no-code solutions will become increasingly important. Most companies offering no-code products (Squarespace, Webflow, Wix) tackle this problem by first creating a no-code platform for website building (where the functions are well constrained). However, adoption of no-code tools for enterprise use or for building other consumer applications is still limited.
The example Google showcased with PaLM paints a compelling future in which more general software can be written from natural human language. Interestingly, only two months ago, DeepMind (another Alphabet company) published its research on AlphaCode, an AI system that competes in programming competitions (which test both coding skill and problem-solving creativity).
The system achieved roughly average performance when ranked against the other participants, which likely means it codes better than the average programmer, since competitive programmers tend to be stronger than the general population of developers. Google is not the only company trying to develop AI that writes software. Last year, Microsoft, the proud owner of GitHub (the single largest source of training data for these AI models), released a tool called Copilot, powered by its partnership with OpenAI, which helps programmers synthesize code. It promises to accelerate software development, but it is not without its problems: Copilot can accidentally output snippets of code copyrighted by others.
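To illustrate the Copilot-style workflow (this is our own hand-written example, not actual Copilot output): the developer types a function name and a docstring, and the assistant proposes a body along these lines:

```python
import re

# Hypothetical illustration of comment-driven code synthesis. In practice the
# developer writes the signature and docstring; the assistant suggests the rest.

def parse_iso_dates(lines):
    """Return the list of ISO-8601 dates (YYYY-MM-DD) found in the given lines."""
    pattern = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
    dates = []
    for line in lines:
        dates.extend(pattern.findall(line))
    return dates

print(parse_iso_dates(["released 2022-04-05", "no date here"]))  # -> ['2022-04-05']
```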
Building digital assets
While the first AI models were built to perform image-recognition tasks, the models above represent a leap, using written text to generate content. DALL.E 2 in particular points to the incredible possibility of pushing content-creation costs toward zero.
If we can create static 2D images with words, in time we will be able to create moving 2D images and photorealistic 3D worlds without programming. Potential future applications include:
- More anime for Netflix and other streaming services. About half of Netflix’s 200 million global subscribers have watched at least one anime show. While producing anime is cheaper than live-action shows (and therefore more profitable considering the global appetite), creating hand-drawn animation is still a slow and painstaking process.
- AI training AI. Collecting and labeling real-world imagery to train AI models is increasingly mainstream (especially in the development of robots and self-driving cars – see our previous discussion of NVIDIA’s Omniverse). If we can conjure images and video from text, we lower the technical hurdle to creating the kind of synthetic training systems that accelerate the training of other AI models (a minimal sketch follows this list).
- Games and the MetaVerse. Similarly, systems such as OpenAI’s DALL.E 2 could potentially auto-generate 3D worlds for gaming and the MetaVerse.
- Better image editing for Microsoft Office. Considering OpenAI has been partnering with Microsoft, it wouldn’t be surprising if these capabilities were integrated into Office.
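On the "AI training AI" point, here is a minimal sketch of how text-to-image generation could feed a labeled synthetic dataset; text_to_image is a hypothetical stand-in for a model like DALL.E 2, and the classes and prompts are made up:

```python
import itertools

def text_to_image(prompt: str) -> bytes:
    # Hypothetical stand-in for a text-to-image model such as DALL.E 2.
    raise NotImplementedError

# Made-up object classes and scene variations for, say, a driving-perception model.
classes = ["pedestrian", "cyclist", "delivery truck"]
conditions = ["at dusk", "in heavy rain", "under harsh noon light"]

def build_synthetic_dataset():
    """Yield (label, prompt, image) triples; the label is known by construction,
    so no human annotation step is needed."""
    for label, condition in itertools.product(classes, conditions):
        prompt = f"a photorealistic street scene with a {label} {condition}"
        yield label, prompt, None  # replace None with text_to_image(prompt)

for label, prompt, _ in build_synthetic_dataset():
    print(f"{label:15s} <- {prompt}")
```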
We continue to gain significant improvements through scale
Another interesting trend is that the field continues to make significant progress simply by increasing scale. Google’s PaLM has 540 billion parameters, roughly 3x the parameter count of the previous state-of-the-art language model, GPT-3.
See Figure 6 below for the rapid increase in model size through the years (a useful proxy for model complexity and memory requirements). Note that the Y-axis is on a log scale, so every tick mark is a 10x increase. For most domains, model size seems to double every 18–24 months (about the same pace as Moore’s law). After 2018, the language domain (orange squares) accelerated sharply, growing roughly 750x in two years, because the field switched its focus to Transformer-based language models.
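As a quick back-of-the-envelope check on those scaling numbers (adding GPT-3's published 175-billion-parameter count, which is not quoted above), here is the arithmetic in Python:

```python
import math

# Back-of-the-envelope arithmetic for the scaling claims above.
palm_params = 540e9          # PaLM, per Google's announcement
gpt3_params = 175e9          # GPT-3's published parameter count
print(f"PaLM vs GPT-3: {palm_params / gpt3_params:.1f}x")      # ~3.1x

# "Doubling every 18-24 months" compounded over two years:
print(f"2 years at 18-month doubling: {2 ** (24 / 18):.1f}x")  # ~2.5x
print(f"2 years at 24-month doubling: {2 ** (24 / 24):.1f}x")  # 2.0x

# ...versus the ~750x jump in language-model size over the same period,
# i.e. almost three full tick marks on the log-scale axis.
print(f"750x as a power of ten: 10^{math.log10(750):.1f}")
```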
OK, so it seems that large data sets and large parameter counts are extremely important, especially for Transformer-type models. What does this mean for the hardware makers?
NVIDIA’s products solving these scale challenges
It’s impossible to talk about AI advancement without discussing NVIDIA. Although the PaLM model was trained using Google’s proprietary AI chips, most AI development is still done with NVIDIA’s hardware.
The consistent exponential return to scale suggests that the key performance bottleneck is hardware. Moving a large amount of training data through a very large AI model requires hardware parallelism capable of very high data throughput (on the order of terabytes).
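To put a rough number on "very large": just storing PaLM-scale weights in 16-bit precision (a common but assumed choice here) already takes about a terabyte, more than any single accelerator holds, so the model and its data have to be spread across many chips. A sketch of the arithmetic:

```python
# Rough memory estimate for a 540B-parameter model (assuming 16-bit weights;
# optimizer state during training typically multiplies this several times over).
params = 540e9
bytes_per_param = 2                       # fp16/bf16 assumption
weights_tb = params * bytes_per_param / 1e12
print(f"weights alone: ~{weights_tb:.2f} TB")   # ~1.08 TB

single_gpu_memory_gb = 80                 # e.g. an 80 GB data-center GPU
min_chips = params * bytes_per_param / (single_gpu_memory_gb * 1e9)
print(f"chips needed just to hold the weights: ~{min_chips:.0f}")   # ~14
```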
The importance of managing large volumes of data is evident in NVIDIA’s latest data-center AI chip. The new H100 doubles down on hardware features that move data faster and builds dedicated circuitry into the chip to speed up calculations involving Transformer-type models (NVIDIA calls it the Transformer Engine). Compared to the previous generation, the A100, the H100 is multiple times faster (Figure 7).
This is why NVIDIA continues to be the market leader in the AI chip space. Its revenue in the most recent quarter was $7.4 billion, a 47% increase over the same quarter last year, driven primarily by the data center (AI) segment, which grew 71% (see Figure 8 below).