Mistral AI is finally venturing into the multimodal arena. Today, the French AI startup taking on the likes of OpenAI and Anthropic released Pixtral 12B, its first-ever multimodal model with both language and vision processing capabilities baked in.
While the model is not available on the public web at present, its source code can be downloaded from Hugging Face or GitHub to test on individual instances. The startup, once again, bucked the typical release trend for AI models by first dropping a torrent link to download the files for the new model.
However, Sophia Yang, the head of developer relations at the company, did note in an X post that the company will soon make the model available through its web chatbot, allowing potential developers to take it for a spin. It will also come to Mistral’s La Plateforme, which provides API endpoints for the company’s models.
What does Pixtral 12B bring to the table?
While the official details of the new model, including the data it was trained on, remain under wraps, the core idea appears to be that Pixtral 12B will let users analyze images alongside text prompts. So, ideally, one would be able to upload an image or provide a link to one and ask questions about the subjects in the file.
The move is a first for Mistral, but it is important to note that multiple other models, including those from competitors like OpenAI and Anthropic, already have image-processing capabilities.
When an X user asked Yang what makes the Pixtral 12-billion parameter model unique, she said it will natively support an arbitrary number of images of arbitrary sizes.
As shared by initial testers on X, the 24GB model’s architecture appears to have 40 layers, a hidden dimension size of 14,336 and 32 attention heads for extensive computational processing.
On the vision front, it has a dedicated vision encoder with 1024×1024 image resolution support and 24 hidden layers for advanced image processing.
These specifications, however, could change when the company makes the model available via API.
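The 24GB download size reported by testers is consistent with a 12-billion-parameter model stored at 16 bits per weight. The sketch below works through that arithmetic. The 16-bit precision is an assumption, since Mistral has not confirmed the storage format, the parameter count is the nominal 12 billion from the model's name, and the function name is hypothetical.

```python
def checkpoint_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Rough weight-file size in gigabytes (1 GB = 1e9 bytes).

    Ignores tokenizer files, metadata and any non-weight overhead.
    """
    return num_params * bytes_per_param / 1e9


# Assumption: nominal 12 billion parameters, 2 bytes each (16-bit weights).
size_16bit = checkpoint_size_gb(12e9, 2)

# For comparison, the same weights at 4 bytes each (32-bit) would be twice as large.
size_32bit = checkpoint_size_gb(12e9, 4)

print(f"16-bit weights: {size_16bit:.0f} GB")
print(f"32-bit weights: {size_32bit:.0f} GB")
```

Under those assumptions, the 16-bit case matches the figure testers shared, which suggests the files are distributed in a half-precision format rather than full 32-bit precision.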
Mistral is going all in to take on leading AI labs
With the launch of Pixtral 12B, Mistral will further democratize access to visual applications such as content and data analysis. Yes, the exact performance of the open model remains to be seen, but the work certainly builds on the aggressive approach the company has been taking in the AI domain.
Since its launch last year, Mistral has not only built a strong pipeline of models taking on leading AI labs like OpenAI but also partnered with industry giants such as Microsoft, AWS and Snowflake to expand the reach of its technology.
Just a few months ago, it raised $640 million at a valuation of $6 billion and followed it up with the launch of Mistral Large 2, a GPT-4-class model with advanced multilingual capabilities and improved performance across reasoning, code generation and mathematics.
It has also released a mixture-of-experts model, Mixtral 8x22B; a 22B-parameter open-weight coding model called Codestral; and a dedicated model for math-related reasoning and scientific discovery.