Why Snowflake is backing embedding startup Voyage AI to improve enterprise RAG 




In the world of Retrieval Augmented Generation (RAG) for enterprise AI, embedding models are critical.

It is the embedding model that essentially translates different types of content into vectors, which AI systems and RAG pipelines can then understand and use. OpenAI at one point dominated the embeddings space with its ada embeddings model, but some enterprises have come to realize over time that it’s not specific enough for their particular use cases. That’s where Voyage AI fits into the market.

The startup today announced that it has raised a $20 million Series A round of funding to advance the development of its embedding and retrieval models for enterprise RAG use cases. Among the company’s backers is cloud data vendor Snowflake, which is now also set to integrate the Voyage AI models into its Cortex AI service. Specifically, the Voyage AI models will land in the Cortex AI search service, which is based on technology from Snowflake’s acquisition of AI search vendor Neeva.

Voyage AI’s mission is all about making enterprise RAG better. The company has a multilingual embedding model that supports 27 languages, with a high degree of accuracy.

“Basically, we make RAG better by improving the retrieval quality,” Tengyu Ma, founder and CEO of Voyage AI, told VentureBeat. “When you have more relevant documents, the response becomes better, because if you don’t have relevant documents, then the large language model will hallucinate.”

How Voyage AI improves enterprise RAG with better embeddings

Embedding models are nothing new and are a foundational element of large language model (LLM) training and RAG deployments.

Ma explained that Voyage AI is about building embedding and reranker models for improving retrieval quality. Ma said that when it comes to RAG where specific domain or enterprise information is needed, existing approaches, particularly OpenAI’s approach, aren’t enough.
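To make the embedding-plus-reranker idea concrete, here is a minimal sketch of a two-stage retrieval step. It is not Voyage AI's implementation or API: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and the term-overlap scorer in `rerank` is a hypothetical stand-in for a real reranker model.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=2):
    """Stage 1: rank documents by embedding similarity and keep the top k."""
    q = toy_embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, toy_embed(d)), reverse=True)
    return ranked[:k]


def rerank(query, candidates):
    """Stage 2: reorder candidates with a toy scorer (share of query terms present)."""
    terms = set(query.lower().split())

    def score(doc):
        return len(terms & set(doc.lower().split())) / len(terms)

    return sorted(candidates, key=score, reverse=True)


documents = [
    "quarterly revenue report for finance",
    "python code for sorting a list",
    "legal contract termination clause",
]
query = "finance revenue report"
candidates = retrieve(query, documents, k=2)
best = rerank(query, candidates)[0]
print(best)
```

In a real deployment both stages would be learned models. The point of the sketch is only the shape of the pipeline: a cheap vector search narrows the candidates, and a second scorer orders them before they are passed to the LLM.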

“I think people realize that OpenAI’s ada is not good enough now, because when you have higher and higher accuracy requirements, it is not accurate enough,” Ma said. “So we do embeddings with better accuracy and more understanding of complex concepts.”

He explained that Voyage AI improves accuracy by optimizing every part of the training pipeline, including how the data is collected and filtered. Ma also noted that his company trains its models for specific domains such as coding, finance and legal use cases.

“This allows us to get even better performance for a particular domain,” he said.

How a contrastive learning approach improves training

Training is often a particularly thorny issue, as most data is unlabeled.

In order to get value from unlabeled data for an enterprise, Voyage AI uses a technique called contrastive learning to train its models. Ma explained that contrastive learning differs from the typical ‘next word prediction’ approach that is used for some training operations. In the next-word approach, the model predicts what word or words should follow another word or phrase based on patterns. Contrastive learning takes a different path.

“You create this kind of so-called contrastive pairs from unlabeled data, and use that to train the model,” Ma said.
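The article does not describe how Voyage AI builds its pairs or which loss it uses, so the following is only a generic sketch of contrastive training under stated assumptions. It assumes that adjacent sentences in the same document form a positive pair, and it scores a batch with an InfoNCE-style loss in which the other positives in the batch serve as negatives. The function names and the temperature value are illustrative.

```python
import math


def make_pairs(documents):
    """Build positive pairs from unlabeled text.

    Assumption for illustration: adjacent sentences in one document are related.
    """
    pairs = []
    for doc in documents:
        sentences = [s.strip() for s in doc.split(".") if s.strip()]
        pairs.extend(zip(sentences, sentences[1:]))
    return pairs


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE-style loss over a batch of embedding pairs.

    Each anchor should be more similar to its own positive (same index)
    than to the other positives in the batch, which act as negatives.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


pairs = make_pairs(["Rates rose. Bond prices fell. Banks adjusted.", "One sentence only."])

# Toy 2-d embeddings: the aligned batch matches each anchor with its own positive.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
misaligned = info_nce_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < misaligned)
```

Training would adjust the embedding model to push this loss down, pulling paired texts together and pushing unpaired ones apart, without needing human-written labels.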

Why Snowflake is embracing Voyage AI to improve enterprise RAG

For Snowflake, supporting Voyage AI and integrating it into its Cortex AI services is all about making AI more useful to enterprise users.

“Every provider is trying to build some kind of a RAG system and very much the angle we take is you point us at the data, you can talk to your data, and whether it’s structured or unstructured, it will just work,” Vivek Raghunathan, SVP of Engineering at Snowflake, told VentureBeat.

Raghunathan added that Snowflake is excited about Voyage AI’s models because of the advanced capabilities they will bring to Snowflake’s customers, including multilingual support. He also noted that Voyage AI provides longer context windows, which will further help enterprise use cases.

Snowflake already has its own Arctic embedding model, which is currently often the default. The Voyage AI models will provide an optional alternative for users.

“Think of the Pareto frontier of efficiency versus quality, our models tend to be focused for a certain size,” Raghunathan said. “Voyage AI’s models are far higher quality for the really hard use cases.”


