Meet e6data: The Kubernetes-native data compute engine promising massive cost savings




Even when relying on cutting-edge tools from data warehouse providers such as Snowflake and Databricks, enterprises may still find themselves struggling to deal with certain mission-critical workloads. 

But San Francisco-based startup e6data claims to have a solution.

The startup, which has just raised $10 million from Accel and others, has developed a “reimagined” Kubernetes-native compute engine that can slot into any mainstream data intelligence platform. The company says this lets customers handle compute-intensive workloads with 5x better performance at half the total cost of ownership (TCO) of other mainstream compute engines.

The offering is still new compared to mainstream vendor-backed and open-source compute engines such as Spark and Trino/Presto (including Starburst), but major industry players, including Freshworks, are already beginning to adopt it for potential price-performance benefits.

How exactly does e6data solve performance bottlenecks?

Today, nearly every modern data platform — from Snowflake and Databricks to Google BigQuery and Amazon Redshift — has a compute engine at its heart to handle data workloads.

It essentially acts as a workhorse that processes large volumes of data in response to queries, executing operations like data transformation, analysis and modeling. 

While most engines are pretty good at handling traditional workloads like analytical dashboarding and reporting, things begin to get complicated with next-gen use cases like real-time analytics (such as fraud detection or personalization) and generative AI.

These workloads revolve around high query volumes, large-scale data processing or queries on near real-time data, which demands faster computing from the central engine and increases the associated costs.

“These workloads are non-discretionary and growing very, very fast for our customers… It’s not uncommon for the spending on these heavy workloads to be increasing 100-200% per annum… The larger and more mature the enterprise is, the more this pain is being felt today. But this pain is coming for every enterprise data leader,” Vishnu Vasanth, founder and CEO at e6data, tells VentureBeat.
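To put the quoted growth range in perspective, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that the 100-200% annual increase holds steady for three years, which is not something Vasanth claimed. The function name `projected_spend` and the normalized starting budget of 1.0 are hypothetical.

```python
def projected_spend(initial, annual_growth_pct, years):
    """Compound a starting spend by a fixed annual growth percentage.

    Assumption for illustration only: the growth rate stays constant
    every year. The source quotes a range, not a multi-year forecast.
    """
    return initial * (1 + annual_growth_pct / 100) ** years


if __name__ == "__main__":
    # Starting spend is normalized to 1.0 (hypothetical baseline).
    for pct in (100, 200):
        trajectory = [projected_spend(1.0, pct, year) for year in range(4)]
        print(f"{pct}% per year:", trajectory)
```

Under that assumption, 100% annual growth means spending doubles each year and 200% means it triples, so the gap between the low and high end of the range widens quickly.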

The main reason behind these performance bottlenecks, Vasanth says, is the architecture behind most commercial and open source compute engines.

Most engines are built on designs that are 10 to 12 years old and are dominated by a central coordinator or driver responsible for several critical activities across the lifecycle of a query or job. The approach works, but under the high load, concurrency or complexity of heavy workloads, these centralized, monolithic components become a source of resource inefficiency or even a single point of failure.

“The traditional notion of the compute engine is that it has a central ‘brain’ that is highly monolithic and top-down in its command and control structure. Think of it being architected with a central puppet master who allocates work to workers and then pulls all the strings to keep them coordinated. Under heavy workload, this architecture is prone to get stuck and deliver inefficiency,” Vasanth explained.

Addressing the gap

To address this gap and give enterprises a better way to handle heavy workloads, Vasanth and the e6data team, whose members have worked on several commercial and open-source data projects, reimagined the compute engine architecture. They disaggregated it into decentralized components that can scale independently and granularly in response to different forms of load.

For these components, the company then implemented a Kubernetes-native distributed processing approach, meaning they can run on any node in a Kubernetes cluster rather than on specific physical nodes. The approach does away with centrally driven task scheduling and coordination.

“What we have done differently is break down the central command and control structure into independent decentralized functions that can run at their own pace and coordinate with each other in a bottom-up way. Think of it as a flock of starlings: there is no central puppet master who gets stuck under a heavy load. This architecture is new, and this is our fundamental technical innovation,” Vasanth added.
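The contrast Vasanth draws can be illustrated with a toy model. This is not e6data's implementation, whose internals are not described here. It is a minimal sketch under stated assumptions: every name (`central_makespan`, `decentralized_makespan`, `dispatch_cost`, `claim_cost`) is hypothetical, time is measured in abstract units, and the only effect modeled is that a single coordinator dispatches tasks one at a time while decentralized workers pay the same per-task overhead in parallel.

```python
def central_makespan(task_costs, n_workers, dispatch_cost):
    """Toy 'puppet master': one coordinator hands out tasks one by one.

    Each dispatch occupies the coordinator for `dispatch_cost` time units,
    so dispatching is serialized no matter how many workers are idle.
    """
    coordinator_clock = 0
    free_at = [0] * n_workers  # time at which each worker becomes idle
    for cost in task_costs:
        coordinator_clock += dispatch_cost
        worker = min(range(n_workers), key=free_at.__getitem__)
        # The task cannot start before it has been dispatched.
        start = max(coordinator_clock, free_at[worker])
        free_at[worker] = start + cost
    return max(free_at)


def decentralized_makespan(task_costs, n_workers, claim_cost):
    """Toy 'flock': each idle worker claims its next task by itself.

    The per-task overhead (`claim_cost`) is paid by the worker, so the
    overhead of different workers overlaps instead of queuing up.
    """
    free_at = [0] * n_workers
    for cost in task_costs:
        worker = min(range(n_workers), key=free_at.__getitem__)
        free_at[worker] += claim_cost + cost
    return max(free_at)


if __name__ == "__main__":
    tasks = [1] * 8  # eight equal tasks, one time unit each (made-up load)
    print("central:", central_makespan(tasks, n_workers=4, dispatch_cost=1))
    print("decentralized:", decentralized_makespan(tasks, n_workers=4, claim_cost=1))
```

In this model the coordinator becomes the bottleneck once per-task overhead is comparable to the task itself, because adding workers does not speed up dispatch. Real engines involve many factors the sketch ignores, such as data locality, shuffles and failure handling.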

Significant cost and performance benefits

With this purpose-built compute engine, e6data claims to be delivering 5x better query performance on the heaviest and most pressing workloads and as much as 50% lower TCO than most compute engines on the market. 

Figure: e6data vs. mainstream compute engines

However, it’s important to note that these metrics have been gathered from early customers, including Freshworks and Chargebee, doing an “apples-to-apples” comparison of the e6 engine vs others. Industry-standard benchmarks from verified institutions will be released in due time, Vasanth said.

Beyond this, the CEO also emphasized that the compute engine stands out in the market by avoiding the hassle of lock-in. 

“With monolithic architectures, they tend to push customers more and more in terms of handing over control of their data stack. They may say ‘Yes you can store your data in that other popular format, but our engine won’t work so well there because it’s specialized for our format.’ Or they may say ‘To use our engine you also have to write all your queries in this specific dialect of SQL (from over 20) that we support.’ These are all ways of locking in the customer to your ecosystem, and it ends up becoming expensive over time,” he said.

e6data, on the other hand, slots into the platform an enterprise already uses, with support for the most common open table formats (Hive, Delta, Iceberg, Hudi), data catalogs and SQL dialects.

“The proof of that is we will not ask you to move the data, change your application or have any downtime. You can get going with us in 2 days flat. And it will work just as well no matter what format you started with,” Vasanth said. 

With these capabilities, it will be interesting to see how quickly e6data can draw the attention of enterprises. Globally, the total addressable market (TAM) for data and AI solutions is slated to touch $230 billion in 2025, with 60% of CXOs planning to increase their spending over the next year alone.


