Microsoft-backed startup debuts task-optimized enterprise AI models that run on CPUs

A new enterprise-AI-focused startup is emerging from stealth today with the promise of what it calls ‘task-optimized’ models that deliver better performance at lower cost.

Fastino, based in San Francisco, is also revealing that it has raised $7 million in a pre-seed funding round from Insight Partners and M12, Microsoft’s venture fund, with participation from GitHub CEO Thomas Dohmke. Fastino is building its own family of enterprise AI models as well as developer tooling. The models are new and are not based on any existing large language models (LLMs). Like most generative AI vendors, Fastino uses a transformer architecture, though with some innovative techniques designed to improve accuracy and enterprise utility. Unlike most other LLM providers, Fastino’s models run well on general-purpose CPUs and do not require high-cost GPUs.

The idea for Fastino was born out of the founders’ own experiences in the industry and real-world challenges in deploying AI at scale. 

Ash Lewis, CEO and co-founder of the company, had been building a developer agent technology known as DevGPT. His co-founder, George Hurn-Maloney, was previously the founder of Waterway DevOps, which was acquired by JFrog in 2023. Lewis explained that his prior company’s developer agent used OpenAI in the background, which led to some issues.

“We were spending close to a million dollars a year on the API,” Lewis said. “We didn’t feel like we had any real control over that.” 

Fastino’s approach represents a departure from traditional large language models. Rather than creating general-purpose AI models, the company has developed task-optimized models that excel at specific enterprise functions. 

“The whole idea is that if you narrow the scope of these models, make them less generalist so that they’re more optimized for your task, they can only respond within scope,” Lewis explained.

How the task-optimized model approach could bring more efficiency to enterprise AI

The concept of using a smaller model optimized for a specific use case isn’t an entirely new idea. Small language models (SLMs) such as Microsoft’s Phi-2 already exist, and vendors like Arcee AI have been advocating the approach for a while.

Hurn-Maloney said that Fastino is calling its models task-optimized rather than SLMs for a number of reasons. For one, in his view, the term “small” has often carried the connotation of being less accurate, which is not the case for Fastino. Lewis said the goal is to create a new model category altogether, rather than a generalist model that is merely large or small by parameter count.

By deliberately narrowing scope and specializing in particular enterprise tasks, Fastino claims, its models achieve higher accuracy and reliability than generalist language models.

These models particularly excel at:

  • Structuring textual data
  • Supporting RAG (retrieval-augmented generation) pipelines
  • Task planning and reasoning
  • Generating JSON responses for function calling (see the sketch after this list)
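
In general terms, a function-calling JSON response pairs a function name with structured arguments that an application validates before dispatching the call. Fastino’s actual output format is not public, so the Python sketch below is a hypothetical illustration: the function name, argument schema and raw output string are all assumptions, not Fastino’s interface.

```python
# Hypothetical sketch: Fastino's output format is not public, so the raw
# string and schema below are illustrative assumptions only.
import json

# Raw text a JSON-generation model might emit for a function call
raw_output = (
    '{"function": "create_ticket",'
    ' "arguments": {"title": "VPN down", "priority": "high"}}'
)

# Allow-list mapping each callable function to its expected argument names
ALLOWED_FUNCTIONS = {"create_ticket": {"title", "priority"}}

call = json.loads(raw_output)  # raises json.JSONDecodeError on malformed output
expected = ALLOWED_FUNCTIONS[call["function"]]  # KeyError on an unknown function
assert set(call["arguments"]) == expected, "unexpected argument names"
print(f"dispatching {call['function']} with {call['arguments']}")
```

A narrowly scoped model is attractive here precisely because, as Lewis notes, it “can only respond within scope,” which keeps the validation step simple.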

Optimized models mean no GPU is required, lowering enterprise AI costs

A key differentiator for Fastino’s models is that they run on CPUs and do not require GPU AI accelerators.

Fastino enables fast inference on CPUs through several techniques.

“If we’re just talking absolutely simple terms, you just need to do less multiplication,” Lewis said. “A lot of our techniques in the architecture just focus on doing less tasks that require matrix multiplication.”

He added that the models deliver responses in milliseconds rather than seconds. This efficiency extends to edge devices, with successful deployments demonstrated on hardware as modest as a Raspberry Pi.
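
Fastino has not detailed its architecture, but the underlying principle, that shrinking the matrix-multiplication workload (smaller hidden dimensions, fewer layers) pushes latency into the millisecond range on ordinary CPUs, can be illustrated with a generic PyTorch sketch. The model and all sizes below are hypothetical stand-ins, not Fastino’s design.

```python
# Generic illustration, not Fastino's architecture: a small, task-scoped
# transformer does far less matrix multiplication than a frontier LLM,
# which is what makes CPU-only, millisecond-scale inference plausible.
import time
import torch

torch.set_num_threads(4)  # commodity-CPU setting; tune per machine

# Deliberately small encoder (all sizes hypothetical)
model = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(
        d_model=256, nhead=4, dim_feedforward=512, batch_first=True
    ),
    num_layers=4,
).eval()

tokens = torch.randn(1, 64, 256)  # one 64-token sequence of embeddings

with torch.inference_mode():
    model(tokens)  # warm-up pass so timing excludes one-time setup
    start = time.perf_counter()
    model(tokens)
    elapsed_ms = (time.perf_counter() - start) * 1000

print(f"CPU forward pass: {elapsed_ms:.1f} ms")
```

On a typical modern CPU, a forward pass this small completes in a few milliseconds; scaling up the hidden dimensions or layer count grows the matrix-multiplication cost, and the latency, quickly.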

“I think a lot of enterprises are looking at TCO [total cost of ownership] for embedding AI in their application,” Hurn-Maloney added. “So the ability to remove expensive GPUs from the equation, I think, is obviously helpful, too.”

Fastino’s models are not yet generally available. That said, the company is already working with industry leaders in consumer devices, financial services and e-commerce, including a major North American device manufacturer for home and automotive applications. 

“Our ability to run on-prem is really good for industries that are pretty sensitive about their data,” Hurn-Maloney explained. “The ability to run these models on-prem and on existing CPUs is quite enticing to financial services, healthcare and more data sensitive industries.”