A magnet link sweeps the AI community: an 87 GB torrent open-sources an 8x7B MoE model

Hayo News
December 11th, 2023

The most "high-end" open-source releases often use the simplest distribution method.

Yesterday, Mistral AI posted a magnet link on X, announcing its latest open-source release.

There was no lengthy official blog post and no deliberately sped-up demo. The company stands out as a breath of fresh air in today's large-model field.

The link turned out to point to a torrent of nearly 87 GB.

What are the model's parameters? Many people gave up their weekend to download and run it as soon as possible.

Mistral 8x7B appears to use an architecture very similar to the one rumored for GPT-4, but in a scaled-down form:

  • 8 total experts instead of 16 (halved)

  • 7B parameters per expert instead of 166B (24x reduction)

  • 42B total parameters (estimated) instead of 1.8T (42x reduction)

  • Same 32K context as original GPT-4
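
The reduction factors in the list above can be checked with a few lines of arithmetic. This is only a sketch: the GPT-4 figures are the unconfirmed numbers quoted in the list, the 42B total is itself an estimate, and the dictionary names are illustrative.

```python
# Figures quoted in the list above. The GPT-4 numbers are unconfirmed
# rumors and the 42B total is an estimate, so treat the ratios as rough.
mistral_8x7b = {"experts": 8, "params_per_expert": 7e9, "total_params": 42e9}
gpt4_rumored = {"experts": 16, "params_per_expert": 166e9, "total_params": 1.8e12}

# Ratio of the rumored GPT-4 value to the Mistral 8x7B value for each entry.
ratios = {key: gpt4_rumored[key] / mistral_8x7b[key] for key in mistral_8x7b}

for key, ratio in ratios.items():
    print(f"{key}: about {ratio:.1f}x smaller")
```

The per-expert ratio comes out a little under 24 and the total ratio a little under 43, which matches the rounded "24x" and "42x" figures.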

Within 24 hours of the release, a developer had already put up an online demo: https://replicate.com/nateraw/mixtral-8x7b-32kseqlen

Some researchers said: "Closed-source large models have reached the end of the road."

Google, already the target of online ridicule this week, was called out once again.

Mixture of Experts (MoE) is a technique commonly used in LLMs to improve efficiency and accuracy. It works by dividing a complex task into smaller, more manageable subtasks, each handled by a specialized mini-model, or "expert."

Specifically, an "expert layer" is a smaller neural network trained to be highly skilled in a specific area. Each expert processes the same input, but in a manner consistent with its own specialty. The "gating network" is the decision maker of the MoE architecture: it evaluates which expert is best suited to a given input. The network computes a compatibility score between the input and each expert, then uses these scores to determine how much each expert contributes to the task.
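
The gating mechanism described above can be sketched in a few lines of plain Python. This is a minimal illustration, not Mistral AI's implementation: the toy experts, the gate weights, and the names `moe_layer`, `softmax`, and `dot` are all hypothetical, and a softmax over dot-product scores is assumed as the scoring rule.

```python
import math

def softmax(scores):
    """Turn raw compatibility scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def moe_layer(x, gate_weights, experts):
    """Score every expert against x, then mix the expert outputs by weight."""
    scores = [dot(w, x) for w in gate_weights]    # one score per expert
    weights = softmax(scores)                     # each expert's involvement
    outputs = [expert(x) for expert in experts]   # every expert sees the same input
    mixed = [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(outputs[0]))
    ]
    return mixed, weights

# Toy experts standing in for small neural networks (hypothetical).
experts = [
    lambda x: [2.0 * v for v in x],
    lambda x: [v + 1.0 for v in x],
    lambda x: [-v for v in x],
]
# One gate vector per expert (hypothetical values).
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

y, weights = moe_layer([3.0, 0.5], gate_weights, experts)
print(weights)  # the first expert gets the largest share for this input
print(y)
```

Here every expert runs on every input and the gate only decides the mixing weights; sparse variants instead run just the highest-scoring experts.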

The OpenAI team has famously been tight-lipped about GPT-4's parameter count and training details. Earlier, a leak claimed that GPT-4 is an ensemble system made up of 8 expert models. Later, there were rumors that ChatGPT is only a model with tens of billions of parameters (probably around 20 billion).

These rumors cannot be verified, but Mistral 8x7B may offer an open-source option that is "very close to GPT-4." According to the model metadata, Mistral 8x7B uses only 2 experts per token at inference time.
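
Using only 2 of 8 experts per token can be sketched as top-k routing. The code below is an assumption-laden illustration rather than the released model's code: the function names, the toy experts, the gate scores, and the choice to renormalize the two selected scores with a softmax are all hypothetical.

```python
import math

def top_k_route(scores, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def sparse_moe(x, scores, experts, k=2):
    """Run only the selected experts and sum their weighted outputs."""
    routing = top_k_route(scores, k)
    out = [0.0] * len(x)
    for i, w in routing.items():
        expert_out = experts[i](x)  # the other experts are never called
        for j in range(len(out)):
            out[j] += w * expert_out[j]
    return out, routing

calls = []  # records which experts actually ran

def make_expert(idx):
    # Toy expert: scales the input by (idx + 1). Hypothetical stand-in.
    def expert(x):
        calls.append(idx)
        return [(idx + 1) * v for v in x]
    return expert

experts = [make_expert(i) for i in range(8)]
# Hypothetical gate scores for one token, one score per expert.
scores = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 1.0]

y, routing = sparse_moe([1.0, -2.0], scores, experts)
print(routing)
print(sorted(calls))
```

Because only the selected experts are evaluated, the compute per token scales with 2 experts rather than all 8, even though all 8 must be stored.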

Even more interesting, this is only the third post from the company's official account since it was created. Neither of its two major releases came with any descriptive text or images.

The link posted at the end of September was Mistral 7B, which is still known as the "best 7B model": it outperforms Llama-2 13B on every benchmark and surpasses LLaMA-1 34B in code, mathematics, and reasoning.

Mistral AI, founded in May 2023, is a French artificial intelligence startup and one of the few star teams from Europe working on open-source large models.

In June, Mistral AI raised a record $118 million seed round with nothing more than a 7-page slide deck. It is said to be the largest seed round in European history.

Mistral AI team members

Arthur Mensch, one of the company's founders, told the Financial Times in October that Mistral AI's technology was more efficient and cheaper than that developed by some of its powerful competitors in the United States.

This technical strength has also earned the company continued attention from investors.

Recently, the Financial Times reported on Mistral AI's new funding round: it totals about 400 million euros, consists mainly of equity, and may be officially announced next week. The company's latest valuation is around 2 billion euros.

According to people familiar with the matter, the round was led by the well-known Silicon Valley venture capital firm Andreessen Horowitz, with other participants including Nvidia, Salesforce, General Catalyst, and BNP Paribas. Other investors in Mistral AI include former Google CEO Eric Schmidt, French telecom billionaire Xavier Niel, and French state-backed investment bank Bpifrance.

The report also quoted Arthur Mensch as saying that while the company has not yet made any money, it expects that to change by the end of the year as it prepares a new platform for customers to access its artificial intelligence models.

Reprinted from 机器之心