Cohere’s smallest, fastest R-series model excels at RAG, reasoning in 23 languages


Proving its intention to support a wide range of enterprise use cases — including those that don’t require expensive, resource-intensive large language models (LLMs) — AI startup Cohere has released Command R7B, the smallest and fastest in its R model series. 

Command R7B is built to support fast prototyping and iteration and uses retrieval-augmented generation (RAG) to improve its accuracy. The model features a context length of 128K and supports 23 languages. It outperforms others in its class of open-weights models — Google’s Gemma, Meta’s Llama, Mistral’s Ministral — in tasks including math and coding, Cohere says.

“The model is designed for developers and businesses that need to optimize for the speed, cost-performance and compute resources of their use cases,” Cohere co-founder and CEO Aidan Gomez writes in a blog post announcing the new model.

Outperforming competitors in math, coding, RAG

Cohere has been strategically focused on enterprises and their unique use cases. The company introduced Command R in March and the more powerful Command R+ in April, and has made upgrades throughout the year to improve speed and efficiency. It teased Command R7B as the “final” model in its R series, and says it will release model weights to the AI research community.

Cohere noted that a critical area of focus when developing Command R7B was to improve performance on math, reasoning, code and translation. The company appears to have succeeded in those areas, with the new smaller model topping the Hugging Face Open LLM Leaderboard against similarly sized open-weight models including Gemma 2 9B, Ministral 8B and Llama 3.1 8B.

Further, the smallest model in the R series outperforms competing models in areas including AI agents, tool use and RAG, which helps improve accuracy by grounding model outputs in external data. Cohere says Command R7B excels at conversational tasks including tech workplace and enterprise risk management (ERM) assistance; technical facts; media workplace and customer service support; HR FAQs; and summarization. Cohere also notes that the model is “exceptionally good” at retrieving and manipulating numerical information in financial settings.
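To make the idea of grounding concrete, here is a minimal sketch of the retrieval step in RAG. It is not Cohere's implementation: the word-overlap scoring, the function names and the sample HR documents are all assumptions chosen for illustration, and a production system would typically use embeddings and a vector database instead.

```python
def score(query: str, document: str) -> int:
    """Count the words shared by the query and the document (toy relevance score)."""
    query_words = set(query.lower().split())
    document_words = set(document.lower().split())
    return len(query_words & document_words)


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k documents with the highest overlap score."""
    ranked = sorted(documents, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:top_k]


def build_grounded_prompt(query: str, documents: list[str], top_k: int = 2) -> str:
    """Assemble a prompt that asks the model to answer from retrieved context only."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, top_k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


# Hypothetical internal documents, standing in for an enterprise knowledge base.
hr_documents = [
    "Vacation requests must be submitted two weeks in advance.",
    "Expense reports are due on the last Friday of each month.",
    "The office is closed on public holidays.",
]

question = "When are expense reports due?"
print(build_grounded_prompt(question, hr_documents, top_k=1))
```

The prompt produced here would then be sent to the model, so that its answer draws on the retrieved text rather than on what it memorized during training.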

All told, Command R7B ranked first, on average, in important benchmarks including instruction-following evaluation (IFEval); BIG-Bench Hard (BBH); graduate-level Google-proof Q&A (GPQA); multistep soft reasoning (MuSR); and massive multitask language understanding (MMLU).

Removing unnecessary function calls

Command R7B can use tools including search engines, APIs and vector databases to expand its functionality. Cohere reports that the model’s tool use performs strongly against competitors in the Berkeley Function-Calling Leaderboard, which evaluates a model’s accuracy in function calling (connecting to external data and systems). 
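For readers unfamiliar with function calling, the general pattern can be sketched as follows. This is a generic illustration, not Cohere's API: it assumes the model emits a JSON object naming a tool and its arguments, and the tool names, the JSON shape and the `dispatch` helper are hypothetical.

```python
import json


def search_documents(query: str) -> list[str]:
    """Hypothetical tool: a stand-in for a search engine or vector database."""
    corpus = {
        "pricing": "Command R7B pricing is listed per 1 million tokens.",
        "languages": "Command R7B supports 23 languages.",
    }
    return [text for key, text in corpus.items() if key in query.lower()]


def add_numbers(a: float, b: float) -> float:
    """Hypothetical tool: a stand-in for a calculation API."""
    return a + b


# Registry mapping the tool names the model may request to real functions.
TOOLS = {"search_documents": search_documents, "add_numbers": add_numbers}


def dispatch(tool_call_json: str):
    """Parse a model-emitted tool call and run the matching function."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# Assumed shape of a tool call as the model might emit it.
model_output = '{"name": "add_numbers", "arguments": {"a": 2, "b": 3}}'
result = dispatch(model_output)
print(result)
```

Function-calling benchmarks broadly check whether a model picks the right tool and supplies valid arguments, which is the part of this loop that the model itself controls.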

Gomez points out that this demonstrates its effectiveness in “real-world, diverse and dynamic environments” and avoids unnecessary function calls. This can make it a good choice for building “fast and capable” AI agents. For instance, Cohere points out, when functioning as an internet-augmented search agent, Command R7B can break complex questions down into subgoals, while also performing well with advanced reasoning and information retrieval.

Because it is small, Command R7B can be deployed on lower-end and consumer CPUs, GPUs and MacBooks, allowing for on-device inference. The model is available now on the Cohere platform and Hugging Face. Pricing is $0.0375 per 1 million input tokens and $0.15 per 1 million output tokens.
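To put those per-token prices in perspective, the short calculation below estimates the bill for a sample workload. The two prices come from the figures quoted above; the workload itself (10,000 requests of 2,000 input tokens and 300 output tokens each) is a hypothetical chosen only for illustration.

```python
# Prices quoted in the article, in US dollars per 1 million tokens.
INPUT_PRICE_PER_MILLION = 0.0375
OUTPUT_PRICE_PER_MILLION = 0.15


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in US dollars for the given token counts."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Hypothetical workload: 10,000 requests, each with 2,000 input tokens
# and 300 output tokens.
requests = 10_000
total = estimate_cost(requests * 2_000, requests * 300)
print(f"Estimated cost: ${total:.2f}")  # prints "Estimated cost: $1.20"
```

In this example, input tokens account for $0.75 and output tokens for $0.45, so output is the more expensive side per token even though far fewer output tokens are generated.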

“It is an ideal choice for enterprises looking for a cost-efficient model grounded in their internal documents and data,” writes Gomez. 


