- Source: Mistral AI
Mistral AI is a French company specializing in artificial intelligence (AI) products, headquartered in Paris. Founded in April 2023 by former employees of Meta Platforms and Google DeepMind, it has quickly risen to prominence in the AI sector. The company is named after the mistral, a strong, cold, northwesterly wind that blows in southern France.
Mistral AI focuses on producing open-source large language models, emphasizing the foundational importance of free and open-source software, and positioning itself as an alternative to proprietary models.
In December 2023, Mistral AI raised €385 million. By then, it was valued at over $2 billion.
In June 2024, Mistral AI announced a new funding round of €600 million ($645 million), significantly boosting its valuation to €5.8 billion ($6.2 billion). This round was led by the venture capital firm General Catalyst, with participation from existing investors.
Mistral AI has published several models as open weights, including Mistral 7B, Mixtral 8x7B and Mixtral 8x22B. Other models, such as Small, Medium and the original Large, are available via API only.
Based on valuation, the company is in fourth place in the global AI race and in first place outside the San Francisco Bay Area, ahead of several of its peers, such as Cohere, Hugging Face, Inflection, Perplexity and Together. Mistral AI aims to "democratize" AI by focusing on open-source innovation.
History
Mistral AI was founded in April 2023 by three French AI researchers: Arthur Mensch, Guillaume Lample and Timothée Lacroix. Prior to founding Mistral AI, Mensch worked at Google DeepMind, Google's artificial intelligence laboratory, while Lample and Lacroix worked at Meta Platforms. The three co-founders met while students at École polytechnique.
In June 2023, the start-up raised €105 million ($117 million) in its first funding round, with investors including the American fund Lightspeed Venture Partners, Eric Schmidt, Xavier Niel and JCDecaux. The Financial Times estimated the company's valuation at the time at €240 million ($267 million).
On 27 September 2023, the company made its language processing model “Mistral 7B” available under the free Apache 2.0 license. This model has 7 billion parameters, a small size compared to its competitors.
On 10 December 2023, Mistral AI announced that it had raised €385 million ($428 million) as part of its second fundraising. This round notably involved the Californian fund Andreessen Horowitz, BNP Paribas and the software publisher Salesforce.
On 11 December 2023, the company released the Mixtral 8x7B model, which has 46.7 billion parameters but uses only 12.9 billion per token thanks to its mixture of experts architecture. The model handles five languages (French, Spanish, Italian, English and German) and, according to its developers' tests, outperforms the "LLaMA 2 70B" model from Meta. A version trained to follow instructions, called “Mixtral 8x7B Instruct”, is also offered.
On 26 February 2024, Microsoft announced a new partnership with the company to expand its presence in the rapidly evolving artificial intelligence industry. Under the agreement, Mistral's large language models will be available on Microsoft's Azure cloud, and the multilingual conversational assistant "Le Chat" will be launched as a ChatGPT-style product.
On 10 April 2024, the company released the mixture-of-experts model Mixtral 8x22B, which performs well on various benchmarks compared to other open models.
On 16 April 2024, reporting revealed that Mistral was in talks to raise €500 million, a deal that would more than double its then-current valuation to at least €5 billion.
On 19 November 2024, the company announced significant updates to Le Chat. It added image generation, in partnership with Black Forest Labs, using the Flux Pro models. It also added web search, so that answers can draw on up-to-date information, and launched the Canvas system, a collaborative interface where the AI generates code and the user can modify it. The company also introduced a new model, Pixtral Large, an improvement over Pixtral 12B that couples a 1-billion-parameter visual encoder with Mistral Large 2. Mistral Large 2 was itself updated, particularly for long contexts and function calls.
The company had over 100 employees by late fall 2024.
Models
Open Weight Models
Mistral 7B
Mistral 7B is a 7.3B parameter language model using the transformer architecture. It was officially released on September 27, 2023, via a BitTorrent magnet link and Hugging Face, under the Apache 2.0 license. The release blog post claimed the model outperforms LLaMA 2 13B on all benchmarks tested and is on par with LLaMA 1 34B on many benchmarks tested.
Mistral 7B uses grouped-query attention (GQA), a variant of standard multi-head attention. Instead of giving every query head its own key and value heads, GQA divides the query heads into groups, and each group shares a single key head and value head. This shrinks the key-value cache and speeds up inference.
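Grouped-query attention can be illustrated with a minimal sketch in plain Python. This is not Mistral's implementation: the function names, the nested-list tensor layout and the toy sizes (4 query heads sharing 2 key/value heads) are assumptions chosen for readability, and causal masking is included because it is typical of decoder-only language models.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def grouped_query_attention(q, k, v):
    """Causal attention where several query heads share one K/V head.

    q: [n_q_heads][seq][dim]; k and v: [n_kv_heads][seq][dim].
    Returns a list shaped like q.
    """
    n_q_heads, n_kv_heads = len(q), len(k)
    assert n_q_heads % n_kv_heads == 0, "query heads must split evenly into groups"
    group_size = n_q_heads // n_kv_heads
    dim = len(q[0][0])
    out = []
    for h in range(n_q_heads):
        kv = h // group_size  # all query heads in one group read the same K/V head
        head_out = []
        for t, q_t in enumerate(q[h]):
            # Causal mask: position t only attends to positions 0..t.
            scores = [
                sum(a * b for a, b in zip(q_t, k[kv][s])) / math.sqrt(dim)
                for s in range(t + 1)
            ]
            weights = softmax(scores)
            head_out.append([
                sum(w * v[kv][s][i] for s, w in enumerate(weights))
                for i in range(dim)
            ])
        out.append(head_out)
    return out


# Toy example (hypothetical sizes, not Mistral 7B's configuration).
rng = random.Random(0)


def rand_tensor(heads, seq, dim):
    return [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(seq)]
            for _ in range(heads)]


q = rand_tensor(4, 3, 2)  # 4 query heads
k = rand_tensor(2, 3, 2)  # only 2 key heads need to be cached
v = rand_tensor(2, 3, 2)  # only 2 value heads need to be cached
out = grouped_query_attention(q, k, v)
print(len(out), len(out[0]), len(out[0][0]))  # heads, positions, head dimension
```

In this sketch the output still has one result per query head, but only half as many key and value heads are stored, which is where the memory saving comes from.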
Both a base model and an "instruct" model were released, with the latter receiving additional tuning to follow chat-style prompts. The fine-tuned model is intended only for demonstration purposes and does not have guardrails or moderation built in.
Mixtral 8x7B
Much like Mistral's first model, Mixtral 8x7B was released via a BitTorrent link posted on Twitter on December 9, 2023. A Hugging Face release and a blog post followed two days later.
Unlike the previous Mistral model, Mixtral 8x7B uses a sparse mixture of experts architecture. The model has 8 distinct groups of "experts", giving it a total of 46.7B parameters. For each token, a router selects two of the eight experts, so only 12.9B parameters are used per token. The model therefore runs at roughly the speed and cost of a 12.9B parameter model.
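The routing idea behind a sparse mixture of experts can be sketched as follows. This is a simplified illustration rather than Mixtral's actual code: the function names are hypothetical, the experts are toy functions that merely scale their input, and the router scores are passed in directly, whereas a real model computes them from the token with a learned layer. Selecting two experts per token (`k=2`) is an assumption of the sketch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and normalise their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(x, router_logits, experts, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)  # experts that were not selected are never evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Eight toy experts: expert i multiplies its input by (i + 1).
experts = [lambda x, s=i + 1: [s * xi for xi in x] for i in range(8)]

# Hypothetical router scores for one token; experts 1 and 4 score highest.
logits = [0.0, 3.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0]
token = [1.0, 2.0]

print(top_k_route(logits))
print(moe_layer(token, logits, experts))
```

All eight experts must be held in memory, but each token pays the compute cost of only the selected ones, which is why the total and per-token parameter counts differ.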
Mistral AI's testing shows the model beats both LLaMA 2 70B and GPT-3.5 in most benchmarks.
In March 2024, research conducted by Patronus AI compared the performance of LLMs on a 100-question test with prompts to generate text from books protected under U.S. copyright law. It found that OpenAI's GPT-4, Mixtral, Meta AI's LLaMA-2, and Anthropic's Claude 2 generated copyrighted text verbatim in 44%, 22%, 10%, and 8% of responses respectively.
Mixtral 8x22B
Similar to Mistral's previous open models, Mixtral 8x22B was released via a BitTorrent link on Twitter on April 10, 2024, with a release on Hugging Face soon after. The model uses an architecture similar to that of Mixtral 8x7B, but with each expert having 22 billion parameters instead of 7. In total, the model contains 141 billion parameters rather than 8 × 22 = 176 billion, as some parameters are shared among the experts.
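The effect of shared parameters on these totals can be checked with a short calculation. The sketch below is a back-of-the-envelope model, not an official breakdown: it assumes every parameter is either shared by all tokens or belongs to exactly one expert, and that two of eight experts are active per token. It uses the Mixtral 8x7B figures quoted in this article (46.7B total, 12.9B per token) and, for Mixtral 8x22B, only compares the naive product with the reported total.

```python
def split_parameters(total_b, active_b, n_experts, experts_per_token):
    """Solve  shared + n_experts * expert = total
              shared + experts_per_token * expert = active
    for the shared and per-expert parameter counts (in billions)."""
    expert_b = (total_b - active_b) / (n_experts - experts_per_token)
    shared_b = total_b - n_experts * expert_b
    return shared_b, expert_b


# Mixtral 8x7B figures from the article; 2 active experts is an assumption.
shared, expert = split_parameters(46.7, 12.9, n_experts=8, experts_per_token=2)
print(f"shared: {shared:.2f}B, per expert: {expert:.2f}B")

# Mixtral 8x22B: the naive product overshoots the reported total.
naive_total = 8 * 22
reported_total = 141
print(f"naive {naive_total}B vs reported {reported_total}B, "
      f"gap {naive_total - reported_total}B")
```

Under these assumptions the Mixtral 8x7B numbers imply about 1.6B shared parameters and about 5.6B per expert, so the "7B" in the name describes the size of a comparable dense model rather than the size of one expert.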
Mistral Large 2
Mistral Large 2 was announced on July 24, 2024, and released on Hugging Face. Unlike the previous Mistral Large, this version was released with open weights. It is available for free with a Mistral Research Licence, and with a commercial licence for commercial purposes. Mistral AI claims that it is fluent in dozens of languages, including many programming languages. The model has 123 billion parameters and a context length of 128,000 tokens. Its performance in benchmarks is competitive with Llama 3.1 405B, particularly in programming-related tasks.
Codestral 22B
Codestral is Mistral's first code-focused open-weight model. It was launched on 29 May 2024 as a lightweight model built specifically for code generation tasks. As of its release date, it surpasses Meta's Llama 3 70B and DeepSeek Coder 33B (78.2% - 91.6%), another code-focused model, on the HumanEval FIM benchmark. Mistral claims Codestral is fluent in more than 80 programming languages. Codestral has its own license, which forbids its use for commercial purposes.
Mathstral 7B
Mathstral 7B is a model with 7 billion parameters released by Mistral AI on July 16, 2024. It focuses on STEM subjects, achieving a score of 56.6% on the MATH benchmark and 63.47% on the MMLU benchmark. The model was produced in collaboration with Project Numina, and was released under the Apache 2.0 License. It has a context length of 32k tokens.
Codestral Mamba 7B
Codestral Mamba is based on the Mamba 2 architecture, which allows it to generate responses even with longer input. Unlike Codestral, it was released under the Apache 2.0 license. While previous releases often included both the base model and the instruct version, only the instruct version of Codestral Mamba was released.
API-Only Models
Unlike Mistral 7B, Mixtral 8x7B and Mixtral 8x22B, the following models are closed-source and available only through the Mistral API.
Mistral Large
Mistral Large was launched on February 26, 2024, and Mistral claims it is second in the world only to OpenAI's GPT-4.
It is fluent in English, French, Spanish, German, and Italian, with Mistral claiming understanding of both grammar and cultural context, and provides coding capabilities. As of early 2024, it is Mistral's flagship AI. It is also available on Microsoft Azure.
In July 2024, Mistral Large 2 was released, replacing the original Mistral Large. Unlike the original model, it was released with open weights.
Mistral Medium
Mistral Medium is trained on various languages, including English, French, Italian, German and Spanish, as well as code, and scores 8.6 on MT-Bench. It is ranked above Claude and below GPT-4 on the LMSys Chatbot Arena Elo ranking.
The number of parameters and the architecture of Mistral Medium are not known, as Mistral has not published information about them.
Mistral Small
Like the Large model, Small was launched on February 26, 2024. It is intended to be a lightweight, low-latency model with better performance than Mixtral 8x7B.