Nvidia unveils generative AI microservice for accurate answers with enterprise data


Nvidia Corp. today announced a new generative artificial intelligence microservice designed to let enterprises connect custom chatbots, copilots and AI summarization tools to real-time proprietary company data in order to deliver more accurate results.

The new service, called NeMo Retriever, is part of the Nvidia NeMo cloud-native family of frameworks and tools for building, customizing and deploying generative AI models. It’s designed to provide enterprise organizations the ability to build retrieval-augmented generation capabilities into their generative AI applications.

Retrieval-augmented generation, or RAG, is a method for increasing the accuracy and safety of generative AI models by filling gaps in the “knowledge” of large language models with facts and data retrieved from external sources. An LLM receives up-front training that gives it broad general knowledge and capabilities such as understanding conversational prompts, summarizing text and answering questions. Training is expensive and time-consuming, so it is often done only once, or rarely, to prepare a model for deployment.

However, once deployed, the model lacks real-time information and up-to-date domain-specific expertise, which can lead to inaccuracies and so-called “hallucinations,” when an LLM answers a question confidently but incorrectly.

Using NeMo Retriever, up-to-date data can be fed into an LLM from a multitude of sources including databases, HTML, PDFs, images, videos and other modalities. This means that the model will have a much more rounded set of facts provided by the enterprise customer’s own proprietary sources that can be updated as data becomes available. The data can reside anywhere including clouds, data centers or on-premises and it can be securely accessed.
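The RAG pattern described above can be illustrated with a minimal sketch. This is not the NeMo Retriever API; it is a toy keyword-overlap retriever showing the general idea of fetching relevant proprietary documents and injecting them into an LLM prompt, with hypothetical example data:

```python
# Toy illustration of retrieval-augmented generation (RAG).
# NOT the NeMo Retriever API: a hypothetical keyword-overlap retriever
# showing how up-to-date proprietary documents can be fed into a prompt.

def score(query: str, doc: str) -> int:
    """Count query words that also appear in the document (toy relevance metric)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most relevant to the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Prepend retrieved context so the LLM answers from current company data."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Hypothetical proprietary documents, updated as new data becomes available.
knowledge_base = [
    "Q3 revenue grew 12% driven by the new accelerator line.",
    "The cafeteria menu changes every Monday.",
    "Support tickets about login failures dropped 40% after the patch.",
]

prompt = build_prompt("How much did Q3 revenue grow?", knowledge_base)
print(prompt)
```

A production retriever would use vector embeddings and a vector database rather than keyword overlap, but the flow is the same: retrieve the most relevant internal documents, then ground the model's answer in them.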

“This is the holy grail for chatbots across the enterprise because the vast majority of useful data is the proprietary data that is not the publicly available data embedded inside of these models but what is available inside companies,” said Ian Buck, vice president of hyperscale and high-performance computing at Nvidia. “So, combining AI with a customer’s database makes it more productive, more accurate, more useful and lets customers optimize models’ capabilities.”

Adding proprietary data reduces inaccurate answers because the LLM has better contextual information to draw on when producing results. Much as research papers cite their sources, Retriever’s RAG capability supplies the LLM with domain-specific information from inside a company, better informing it so it can deliver more accurate answers to the questions posed to it.

Nvidia said that, unlike community-led open-source RAG toolkits, Retriever is designed to support commercial, production-ready generative AI models that are already available and optimized for RAG capabilities, enterprise support and managed security patches.

Enterprise customers such as electronics systems designer Cadence Design Systems Inc., Dropbox Inc., SAP SE and ServiceNow Inc. are already working with Nvidia to use the new capability to bring RAG into their custom generative AI tools, applications and services.

Anirudh Devgan, president and chief executive of Cadence, said researchers at the company are working with Nvidia to use Retriever to help produce higher-quality electronics by increasing accuracy. “Generative AI introduces innovative approaches to address customer needs, such as tools to uncover potential flaws early in the design process,” Devgan said.

Buck said that by using Retriever, customers could get more accurate results while spending less time training generative AI models. That means enterprises could take more off-the-shelf models, deploy them with their own internal data, and avoid much of the time, expense and energy of continually retraining models to keep them up to date.

NeMo Retriever will add the RAG capabilities described above as part of Nvidia AI Enterprise, an end-to-end cloud-native software platform for streamlining the development of AI applications. Developers can sign up for early access to NeMo Retriever today.

Image: Nvidia
