InfluxData avoids ‘AI magic beans’ in InfluxDB time series database update for enterprises



InfluxData today released a series of updates for its namesake InfluxDB time series database, bringing new deployment options and improved observability to users.

A time series database optimizes the storage and querying of time-stamped (also referred to as time series) data. Time series databases have a variety of enterprise and operational use cases, including powering operational monitoring and real-time dashboards. Organizations widely use them to help optimize server, system, and sensor performance.

To date, InfluxDB 2.0 has been available as an open-source technology, as well as a fully managed service known as Amazon Timestream for InfluxDB. InfluxDB 3.0, which provides more performance and other real-time database capabilities, is available in a service called InfluxDB Cloud Dedicated. Today, InfluxData is adding a new InfluxDB 3.0 option with the debut of InfluxDB Clustered, which gives organizations the option to run on-premises and in private cloud deployments.

Alongside the new InfluxDB Clustered service, InfluxData is improving its InfluxDB offerings with better observability, dashboards and performance. The updated capabilities and deployment options are part of the company’s ongoing effort to meet enterprise requirements for time series data use cases.

“There’s been a whole lot of work around basically just maturing the database, optimizing performance, working with early customers to make sure they’re getting what they need out of the product,” Paul Dix, co-founder and CTO of InfluxData, told VentureBeat. “InfluxDB 3.0 was basically a ground-up rewrite of the entire database, there’s a lot of work you have to do after an initial product release to just basically tune things and get everything going.”

Why serverless is not an ideal option for time series data

A prevailing trend with multiple database vendors in recent years has been to offer some form of so-called serverless database. All the major cloud vendors have serverless database offerings, as do some of the leading independent vendors including vector database pioneer Pinecone.

The basic promise of serverless is that the database only runs when needed, saving users money because they do not have to pay for long-running services. InfluxData does have a serverless offering that is available on AWS, but Dix argued that it’s not the primary way that most time series database users want or need to deploy.

Dix said that serverless tends to appeal only to InfluxDB customers who basically just want to try out the product and pay for usage in a limited deployment.

“For almost every customer that we’ve seen in larger tiers where it’s more performance critical, they actually don’t want serverless environments, they want dedicated environments and they want more predictable pricing,” Dix said. “A lot of the larger customers are kind of allergic to this idea of usage-based pricing.”

With serverless, there is no fixed component for cost. In contrast, with a dedicated database approach, InfluxData charges a fixed rate based on the number of virtual machines used for compute and the amount of data stored.

The case for dedicated services, which InfluxDB Cloud Dedicated and InfluxDB Clustered both provide, is directly related to the use cases for time series data. Dix explained that organizations typically do not use time series data for ad hoc data analysis. Rather, they run common long-running processes that need to always be available.

Dix said organizations commonly use InfluxDB for monitoring and learning systems, which execute queries all the time at a fairly consistent rate. Another common use is real-time dashboards, which also require a persistent time series database.

Why AI for time-series databases is ‘magic beans’

While it seems like nearly every database vendor is talking about adding AI support in some way, InfluxData is not one of them.

Dix emphasized that data is obviously very important for AI and you can’t train a model without data. To that end, InfluxDB could potentially be used to help train a model, but that’s not a core focus for the company.

“We’re not trying to bring AI into our product and do things like make predictions of time series data,” Dix said. “AI-based predictions on time series are magic beans, it’s total BS.”

That’s not to say that time series data doesn’t have forecasting and prediction needs; it’s just that those needs have been met for years by non-AI-based algorithms and data science techniques.

“All those tools, depending on the thing, can be accurate and very useful, particularly in an industrial setting,” Dix said. “But trying to apply AI to magically get better results, usually doesn’t pan out very well.”
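For readers unfamiliar with what a non-AI forecasting technique looks like, the following is a minimal sketch of simple exponential smoothing, one long-established statistical method. It is an illustration only: Dix did not name specific algorithms, and the function name, the sample readings and the smoothing factor are all assumptions made for this example, not anything from InfluxDB.

```python
def exponential_smoothing(values, alpha):
    """Return the exponentially smoothed series.

    The last element can serve as a one-step-ahead forecast.
    Hypothetical helper for illustration; not part of InfluxDB.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the range (0, 1]")

    smoothed = [values[0]]  # seed with the first observation
    for x in values[1:]:
        # Blend the new observation with the previous smoothed value.
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Made-up sensor readings, e.g. temperatures sampled at a fixed interval.
readings = [20.0, 22.0, 21.0, 23.0]
forecast = exponential_smoothing(readings, alpha=0.5)[-1]
print(forecast)
```

A higher `alpha` makes the forecast react faster to recent readings, while a lower one smooths out noise. No model training is involved.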

What’s next for time series database technology at InfluxData

Looking forward, InfluxData plans to add a few key technology capabilities to its time series database services in the coming months.

Dix noted that later this year InfluxDB will be adding more granular access control features, allowing filtering of queries based on key-value pairs and more fine-grained write permissions.

InfluxData is also working on adding support for the Apache Iceberg open-source data lake table specification. Iceberg is increasingly becoming a de facto standard for data lakes, and large vendors including Snowflake, Microsoft, and Databricks, among others, already support it.

“What we’re building out right now is integration with Iceberg so that, essentially you can ingest all your data inside of InfluxDB, and then it also gets exposed as an Iceberg catalog, so that you can then query that data using tools like Snowflake, Databricks or whatever other tool you want,” Dix said.


