PuppyGraph speeds up LLMs’ access to graph data insights




As enterprises continue to invest heavily in advanced analytics and large language models (LLMs), graph technology has become one of the most favored approaches for setting up the data stack. It allows users to understand complex relationships in their datasets, which are often not apparent in traditional relational databases.

However, maintaining and querying graph databases alongside traditional relational databases is a hassle, and an expensive one. Today, PuppyGraph, a San Francisco-based startup founded by former Google and LinkedIn employees, raised $5 million to close this gap with what it describes as the world’s first and only zero-ETL graph query engine. The engine allows users to query their existing relational data as a unified graph without needing a separate graph database or lengthy extract-transform-load (ETL) processes.

The engine launched in March 2024 and is already being used by several enterprises to simplify data analytics. Downloads of its forever-free developer edition alone are growing 70% month over month.

The need for PuppyGraph

A graph database architecture mirrors sketching on a whiteboard: it stores information as nodes (representing entities such as people and concepts) along with their relevant context and the connections between them. Using this graph structure, users can identify complex patterns and relationships that may not be easily apparent in traditional relational databases (queried via SQL) and deploy algorithms to quickly enable use cases such as AI/ML, fraud detection, customer journey mapping and risk management for networks.

Today, the only way to adopt graph technologies is to set up a separate native graph database and keep it in sync with the source database. The task sounds easy but quickly becomes complicated, with teams having to build complex, resource-intensive ETL pipelines to migrate their datasets to graph storage. This can easily cost millions and take months, keeping users from running critical business queries in the meantime.

Once the database is set up, teams also have to manage it continuously, which further adds to the cost and creates scalability problems in the long run.

To address these gaps, former Google and LinkedIn employees Weimo Liu, Lei Huang and Danfeng Xu came together and started PuppyGraph. The idea was to provide teams with a way to query their existing relational databases and data lakes as graphs, without data migrations.

This way, the same data that is analyzed with SQL queries could be analyzed as a graph, leading to faster access to insights. This can be particularly useful for cases where the data is deeply connected with multi-level relationships, like in supply chain or cybersecurity. 

“The deeper the level, the more complex the query becomes in a traditional SQL query. This is because each additional level requires an additional table join operation, compounding the complexity and potentially slowing down the query performance dramatically… In contrast, graph query handles these multi-level relationships much more efficiently. They are designed to quickly traverse these connections using paths through the graph, regardless of the depth of the connection,” Zhenni Wu, who joined PuppyGraph’s founding team, told VentureBeat. 
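The contrast Wu describes can be illustrated with a small, self-contained sketch. It is not PuppyGraph's implementation; the edge list, table name and function names below are hypothetical, and an in-memory SQLite table stands in for a relational source. The SQL version needs one extra self-join for every additional hop, while the graph version expands a frontier of nodes one hop at a time.

```python
import sqlite3

# Hypothetical edge list: (source account, destination account).
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("A", "F")]


def reachable_sql(start, hops):
    """Nodes exactly `hops` edges from `start`, using one self-join per extra hop.

    Returns the set of nodes and the number of JOINs the query needed.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edges VALUES (?, ?)", EDGES)
    # Chain the edge table to itself: e1 JOIN e2 ON e1.dst = e2.src JOIN e3 ...
    joins = "".join(
        f" JOIN edges e{i} ON e{i - 1}.dst = e{i}.src" for i in range(2, hops + 1)
    )
    sql = f"SELECT DISTINCT e{hops}.dst FROM edges e1{joins} WHERE e1.src = ?"
    rows = conn.execute(sql, (start,)).fetchall()
    conn.close()
    return {row[0] for row in rows}, sql.count("JOIN")


def reachable_graph(start, hops):
    """Same result via frontier expansion over an adjacency list."""
    adjacency = {}
    for src, dst in EDGES:
        adjacency.setdefault(src, []).append(dst)
    frontier = {start}
    for _ in range(hops):
        frontier = {nxt for node in frontier for nxt in adjacency.get(node, [])}
    return frontier


if __name__ == "__main__":
    for depth in (1, 3, 4):
        nodes, join_count = reachable_sql("A", depth)
        print(depth, sorted(nodes), "joins:", join_count, sorted(reachable_graph("A", depth)))
```

The query text grows with the depth in the SQL version, whereas the traversal code stays the same at any depth. Actual performance depends on the engine, indexes and data, which this toy example does not measure.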

Wu said PuppyGraph eliminates the need for extensive ETL setups entirely, enabling ‘deployment to query’ in just about 10 minutes. All the user has to do is connect the tool with their data source of choice. Once done, it automatically creates a graph schema and queries the tables in graph models. Also, the engine’s distributed design allows it to handle extremely large datasets and complex multi-hop queries.
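The article does not say how PuppyGraph derives a graph schema from tables. One plausible approach, shown here purely as an assumption, is to treat each table as a vertex type and each foreign key as an edge type, leaving the data where it is. The sketch uses SQLite's catalog and its `PRAGMA foreign_key_list` statement; the `supplier` and `part` tables and the function names are hypothetical.

```python
import sqlite3


def infer_graph_schema(conn):
    """Treat each table as a vertex label and each foreign key as an edge type.

    Only metadata is read; no rows are copied out of the source tables.
    """
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    edges = []
    for table in tables:
        # PRAGMA foreign_key_list columns: id, seq, table, from, to, ...
        for fk in conn.execute(f"PRAGMA foreign_key_list({table})").fetchall():
            edges.append({"from": table, "to": fk[2], "via": fk[3]})
    return {"vertices": tables, "edges": edges}


def build_demo():
    """Create a tiny, hypothetical supply-chain schema in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE part (
            id INTEGER PRIMARY KEY,
            name TEXT,
            supplier_id INTEGER REFERENCES supplier(id)
        );
        """
    )
    return conn


if __name__ == "__main__":
    print(infer_graph_schema(build_demo()))
```

Because the schema is only a mapping over existing tables, a query engine built this way would read from the source at query time instead of maintaining a second copy of the data.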

The company says the engine can connect to all mainstream data lakes and warehouses, including Google BigQuery and Databricks, to run accelerated graph analytics while keeping costs low.

“The separation of storage and compute architecture means that low cost is PuppyGraph’s one of the biggest advantages. There is zero storage cost because the engine directly queries data from users’ existing data lake/warehouse. It provides the flexibility to scale compute resources as needed, allowing adjustments to handle fluctuating workloads efficiently, without risking resource contention or performance degradation,” Wu added.

Significant impact in early days

While the company is less than a year old, it is already seeing success with several enterprises, including Coinbase, Clarivate, Dawn Capital and Prevalent AI.

In one case, an enterprise transitioned to PuppyGraph from a legacy graph database system and managed to cut its total cost of ownership by over 80%. A leading financial trading platform was able to achieve a 5-hop path query between account A and account B across around 1 billion edges in less than 3 seconds. 

Before PuppyGraph, the platform’s self-built SQL-based solution could not query beyond three hops and ran into batch time-out issues.

With this funding, the company plans to accelerate its product development, expand its team and increase its market presence by taking the zero-ETL graph query engine to more organizations worldwide.

According to Gartner, the market for graph technologies will grow to $3.2 billion by 2025 with a CAGR of 28.1%. Other players in the category are Neo4j, Amazon Neptune, Aerospike and ArangoDB.


