NVIDIA has revealed yet another deal, this time with Amazon Web Services, to expand its product offerings for generative AI applications.
Under the new partnership, AWS will be the first cloud service provider to offer Team Green’s GH200 Grace Hopper Superchips with multi-node NVLink. These instances take advantage of the AWS 3rd-Gen Elastic Fabric Adapter (EFA) interconnect, delivering up to 400Gbps of low-latency, high-bandwidth networking throughput per Superchip, along with superior scalability in EC2 UltraClusters.
These specific EC2 instances will offer 4.5TB of HBM3e memory, 7.2x more than H100-powered EC2 P5d instances, while the CPU-to-GPU memory interconnect provides up to 7x higher bandwidth than PCIe.
All of these will be deployed with liquid cooling for maximum efficiency and rack-space utilization. Additionally, the AWS Nitro System supports the EC2 instances by offloading I/O functions to dedicated hardware, ensuring consistent performance and a protected execution environment.
Meanwhile, NVIDIA DGX Cloud, with NVIDIA AI Enterprise integrated, will also be coming to AWS for easy and fast access to LLM and generative AI model training.
Here is a summary of other AWS instance improvements arriving through the latest collaboration:
- AWS P5e
  - H200 GPU with 141GB of HBM3e memory (1.8x more capacity and 1.4x faster, up to 3200Gbps of EFA networking)
- AWS EC2 G6e
  - L40S GPU (video- and graphics-related workloads, cost-effective, energy-efficient)
- AWS EC2 G6
  - L40 GPU (video- and graphics-related workloads, cost-effective, energy-efficient)
Meanwhile, the new NVIDIA NeMo Retriever microservice offers tools to create highly accurate chatbots and summarization tools using accelerated semantic retrieval. In addition, BioNeMo, a specialized version for drug discovery used by big pharma, will be coming to AWS on NVIDIA DGX Cloud, on top of its existing Amazon SageMaker availability.
Back to NeMo Retriever: the microservice uses NVIDIA-optimized algorithms to enable generative AI apps to produce more accurate responses based on business data residing in the cloud or in data centers.
NVIDIA is currently working on this retrieval-augmented generation (RAG) capability with Cadence, Dropbox, SAP, and ServiceNow to build production-ready models that businesses can use as references to craft custom generative AI applications and services more quickly.
While there are certainly open-source RAG toolkits, NVIDIA’s NeMo Retriever “one-ups” them with commercially viable models, API stability, security patches, and enterprise support.
Examples include optimized embedding models that capture relationships between words for the most accurate results possible and, where needed, even support other data types such as images, videos, and PDFs.
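To make the RAG idea above concrete, here is a minimal, generic sketch of the pattern: embed the documents, retrieve the closest matches to a query, and stuff them into the prompt as context. This is purely illustrative and is not the NeMo Retriever API; the bag-of-words "embedding" is a toy stand-in for the learned embedding models the article describes.

```python
# Generic retrieval-augmented generation (RAG) sketch.
# Toy bag-of-words vectors stand in for real learned embeddings.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase token counts (illustration only)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Augment the user question with retrieved business data as context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical "business data" a RAG system might index.
docs = [
    "Q3 revenue grew 12% year over year.",
    "The cafeteria menu changes weekly.",
    "Operating margin improved to 18% during Q3.",
]
print(build_prompt("How did revenue change in Q3?", docs))
```

A production system would replace `embed` with a trained embedding model and a vector database, and hand the built prompt to an LLM; the retrieve-then-augment flow is the same.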