Lao Huang drops a late-night bombshell as AIGC hits its iPhone moment! Hugging Face plugs into top-tier supercomputing, and a mysterious new graphics card outperforms the A100
Last night, Lao Huang returned to the SIGGRAPH stage and once again stunned the audience.
The era of generative AI is here, and so is its iPhone moment!
On August 8, Nvidia CEO Jensen Huang once again took the stage at SIGGRAPH, the world's top computer-graphics conference.
A series of major updates followed: the next-generation GH200 superchip platform, AI Workbench, OpenUSD, and more.
With them, Nvidia tied together decades of innovation, from artificial intelligence and virtual worlds to acceleration, simulation, and collaboration, in one fell swoop.
Amid the LLM explosion, Lao Huang was as bold as ever: "The more you buy, the more you save!"
Nvidia's most powerful AI supercomputer is upgraded again
At SIGGRAPH 5 years ago, NVIDIA redefined computer graphics by bringing artificial intelligence and real-time ray tracing to the GPU.
Lao Huang said: "When we redefine computer graphics through AI, we are also redefining GPU for AI."
With that comes increasingly powerful computing systems. For example, the NVIDIA HGX H100 integrates 8 GPUs and has 1 trillion transistors.
Today, Lao Huang once again took AI computing to the next level.
In addition to equipping the NVIDIA GH200 Grace Hopper superchip with faster HBM3e memory, the next-generation GH200 platform can connect multiple GPUs for higher performance and easily scalable server designs.
This new platform, available in multiple configurations, will be able to handle the world's most complex generative AI workloads, including large language models, recommender systems, vector databases, and more.
For example, the dual-GH200 configuration packs a single server with 144 Arm Neoverse cores, 282GB of HBM3e memory, and 8 petaflops of AI compute.
The new HBM3e memory is 50% faster than the current HBM3, and its combined bandwidth of 10TB/s lets the platform run models 3.5x larger than the previous version while boosting performance with 3x more memory bandwidth.
It is reported that the product is expected to be launched in the second quarter of 2024.
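As a back-of-the-envelope illustration (our own arithmetic, not an NVIDIA figure), the extra memory translates almost directly into larger runnable models:

```python
# Rough sketch: how many billions of FP16 parameters fit in a given amount
# of GPU memory. The 20% overhead factor for activations, KV cache, and
# framework bookkeeping is our own illustrative assumption.
def max_params_billions(mem_gb, bytes_per_param=2, overhead=1.2):
    return mem_gb / (bytes_per_param * overhead)

# Dual-configuration GH200: 282GB of HBM3e
print(max_params_billions(282))  # on the order of a ~117B-parameter model
```

Doubling memory roughly doubles the largest model that fits, which is why the jump to 282GB of HBM3e matters more than the raw flops for LLM serving.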
RTX Workstation: Masterful Segmentation, Four New Graphics Cards in One Go
This time, Lao Huang also gave the desktop AI workstation GPU line a comprehensive refresh, launching four new products in one go: the RTX 6000, RTX 5000, RTX 4500, and RTX 4000.
If the H100 and its surrounding product line mark the skyline of Nvidia's GPU performance, these desktop and data-center parts are a masterful bit of spec "knife work" aimed at cost-sensitive customers.
The new GPUs' unveiling came with an unexpected bit of comedy.
When Lao Huang carried the first GPU out from backstage, he apparently left fingerprints on its mirror-finish panel.
Realizing it, he sheepishly apologized to the audience, joking that this might be the worst product launch ever.
It seems that even a presenter as seasoned as Lao Huang has his off moments.
And this endearing side of him kept the audience laughing.
Back to the products: as the flagship professional card, the RTX 6000 unsurprisingly has the strongest specs of the four.
With 48GB of video memory, 18,176 CUDA cores, 568 Tensor cores, 142 RT cores, and up to 960GB/s of bandwidth, it is the cream of the crop.
The RTX 5000 is equipped with 32GB of video memory, 12,800 CUDA cores, 400 Tensor cores, and 100 RT cores.
The RTX 4500 is equipped with 24GB of video memory, 7680 CUDA cores, 240 Tensor cores, and 60 RT cores.
The RTX 4000 is equipped with 20GB of video memory, 6144 CUDA cores, 192 Tensor cores, and 48 RT cores.
On top of the four new GPUs, Lao Huang also prepared a one-stop solution for enterprise customers: the RTX Workstation.
It supports up to four RTX 6000 GPUs and can fine-tune GPT3-40B on 860 million tokens within 15 hours.
It also lets Stable Diffusion XL generate 40 images per minute, 5x faster than an RTX 4090.
OVX Server: Equipped with the L40S, Edging Past the A100
The NVIDIA L40S GPU, designed specifically for data centers, delivers even more explosive performance.
Based on the Ada Lovelace architecture, the L40S carries 48GB of GDDR6 memory with 864GB/s of bandwidth.
Thanks to fourth-generation Tensor Cores and the FP8 Transformer Engine, it delivers more than 1.45 petaflops of tensor processing performance.
For computationally demanding tasks, the L40S' 18,176 CUDA cores can deliver nearly five times the single-precision floating-point (FP32) performance of the A100, accelerating complex calculations and data-intensive analysis.
In addition, in order to support professional visual processing such as real-time rendering, product design and 3D content creation, Nvidia also equipped the L40S with 142 third-generation RT cores, which can provide 212 teraflops of ray tracing performance.
For generative AI workloads with billions of parameters and multiple modalities, the L40S achieves up to 1.2x the inference performance and up to 1.7x the training performance of the A100.
Built around the L40S, Lao Huang also launched an OVX server for the data-center market that carries up to 8 L40S GPUs.
For a GPT3-40B model with 860 million tokens, the OVX server completes fine-tuning in just 7 hours.
For Stable Diffusion XL, it generates 80 images per minute.
AI Workbench: Accelerating Custom Generative AI Applications
Beyond the powerful hardware, Lao Huang also released NVIDIA AI Workbench, a new toolkit for developing and deploying generative AI models.
In short, AI Workbench gives developers a unified, easy-to-use toolkit to quickly create, test, and fine-tune models on a PC or workstation, then scale them seamlessly to virtually any data center, public cloud, or NVIDIA DGX Cloud.
Specifically, the advantages of AI Workbench are as follows:
- Easy to use
AI Workbench simplifies the development process by providing a single platform to manage data, models, and computing resources, enabling collaboration across machines and environments.
- Integrate AI development tools and repositories
AI Workbench integrates with services such as GitHub, NVIDIA NGC, and Hugging Face; developers can use tools like JupyterLab and VS Code and work across different platforms and infrastructures.
- Enhanced collaboration
AI Workbench uses a project-centric architecture that automates complex tasks such as version control, container management, and handling of secrets, while also supporting collaboration across teams.
- Access to accelerated computing resources
AI Workbench is deployed in a client-server model. Teams can start developing on local computing resources and switch to data-center or cloud resources as training jobs grow.
Stable Diffusion XL Custom Image Generation
First, open AI Workbench and clone a repository.
Next, in a Jupyter Notebook, load the pre-trained Stable Diffusion XL model from Hugging Face and prompt it to generate "Toy Jensen in space".
As the output image shows, however, the model has no idea who Toy Jensen is.
The next step is to fine-tune the model with DreamBooth, using 8 images of Toy Jensen.
Finally, re-run inference from the UI.
Now that the model knows what Toy Jensen looks like, it can generate images that fit our needs.
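In code, the "before fine-tuning" step of this workflow maps onto the Hugging Face diffusers API. Below is a minimal sketch assuming the `diffusers` library and a CUDA GPU; the model id is the public SDXL base checkpoint, not NVIDIA's demo repository, and the LoRA path in the comment is hypothetical:

```python
import torch

BASE_MODEL = "stabilityai/stable-diffusion-xl-base-1.0"  # public SDXL base weights
PROMPT = "Toy Jensen in space"

# SDXL is impractical on CPU, so only run generation when a GPU is present.
if torch.cuda.is_available():
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        BASE_MODEL, torch_dtype=torch.float16
    ).to("cuda")

    # Before fine-tuning, the base model has no concept of "Toy Jensen".
    pipe(prompt=PROMPT).images[0].save("before_finetune.png")

    # After DreamBooth fine-tuning on the 8 Toy Jensen photos, the resulting
    # LoRA weights could be loaded back into the same pipeline, e.g.:
    # pipe.load_lora_weights("toy-jensen-dreambooth-lora")  # hypothetical path
```

DreamBooth itself teaches the model a new subject by associating a rare token with the reference photos, which is why only 8 images of Toy Jensen are enough.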
Hugging Face: One-Click Access to Top-Tier Computing Power
As one of the most popular platforms among AI developers, Hugging Face, with 2 million users, over 250,000 models, and 50,000 datasets, has now struck a partnership with Nvidia.
Developers can now tap NVIDIA DGX Cloud AI supercomputing directly from the Hugging Face platform, making the training and fine-tuning of AI models far more efficient.
Each DGX Cloud instance comes with 8 H100 or A100 80GB GPUs, for a total of 640GB of GPU memory per node, enough to meet the performance demands of top AI workloads.
In addition, Nvidia will also launch a new "Training Cluster as a Service" service in conjunction with Hugging Face to simplify the process of creating and customizing generative AI models for enterprises.
On this, Lao Huang said excitedly: "This time, Hugging Face and NVIDIA have truly connected the world's largest AI community with the world's leading cloud AI computing platform. Hugging Face users can now access NVIDIA's most powerful AI computing."
AI Enterprise 4.0: Custom Enterprise-Grade Generative AI
In order to further accelerate the application of generative AI, NVIDIA has also upgraded its enterprise platform NVIDIA AI Enterprise to version 4.0.
Today, AI Enterprise 4.0 not only provides enterprises with the tools needed for generative AI, but also provides the security and API stability required for production deployment.
- NVIDIA NeMo
A cloud-native framework for building, customizing, and deploying large language models. With NeMo, NVIDIA AI Enterprise provides end-to-end support for creating and customizing large language model applications.
- NVIDIA Triton Management Services
Helps enterprises automate and optimize production deployments, automatically spinning up multiple inference-server instances on Kubernetes and running scalable AI efficiently through model orchestration.
- NVIDIA Base Command Manager Essentials cluster management software
Helps enterprises maximize the performance and utilization of AI servers in data centers, multi-cloud and hybrid cloud environments.
Beyond Nvidia's own offering, AI Enterprise 4.0 will also be integrated into partner platforms such as Google Cloud and Microsoft Azure.
Additionally, MLOps providers, including Azure Machine Learning, ClearML, Domino Data Lab, Run:AI, and Weights & Biases, will seamlessly integrate with the NVIDIA AI Platform to simplify the development of generative AI models.
Omniverse: Adding Large Language Models to the Metaverse
Finally, an update to the NVIDIA Omniverse platform.
With access to OpenUSD and AIGC tools, developers can more easily generate 3D scenes and graphics that mirror the real world.
True to its name, Omniverse is positioned as a collaborative 3D-graphics production platform that brings all kinds of tools together.
3D developers can co-create graphics and scenes in Omniverse much as colleagues co-edit documents in Feishu or DingTalk.
Moreover, output from different 3D production tools can be pulled straight into Omniverse, fully connecting the 3D graphics and scene production workflow and stripping away its complexity.
So what is the OpenUSD that this update plugs in?
OpenUSD (Universal Scene Description) provides an open-source, universal scene description format, enabling barrier-free collaboration between different brands and types of 3D design software.
Omniverse itself is built on the USD system; this OpenUSD upgrade lets Omniverse offer developers and enterprises more frameworks and resource services.
Building on OpenUSD, five companies (Apple, Pixar, Adobe, Autodesk, and Nvidia) founded the AOUSD alliance to further drive OpenUSD adoption across the 3D-graphics industry.
With AOUSD in place, Omniverse developers can easily create materials and content compatible with Apple's ARKit or RealityKit. The update also adds support for the OpenXR standard, extending Omniverse to VR headsets from HTC VIVE, Magic Leap, Varjo, and others.
API, ChatUSD and other updates
In addition, Nvidia has released a new Omniverse Cloud API that allows developers to more seamlessly deploy OpenUSD pipelines and applications.
The most striking addition is ChatUSD, powered by a large language model.
Like GitHub Copilot, ChatUSD can answer developers' questions about the Omniverse platform or automatically generate Python-USD code, greatly boosting developer productivity.
All in all, with overwhelming products, stunning technology, and far-sighted insight, Nvidia has once again shown the world how it intends to lead the coming wave of AI and graphics computing.
With his classic line "the more you buy, the more you save!", Lao Huang slowly walked off the stage, leaving the atmosphere at its peak.