
MiniMax M2.5 explained in 5 min


MiniMax M2.5 is the next iteration after the previous M2.1 model, and the broad market implication of this model is quite large. The model is a contemporary of state-of-the-art models like Anthropic's Opus 4.6, OpenAI's GPT 5.2, and Gemini 3 Pro. But what's really impressive is that each token activates only 10 billion parameters out of 230 billion in total. Now, this kind of sparsity isn't anything new per se, where the model only activates around 4% of its brain power to generate output. But when you compare MiniMax M2.5 against other models like GLM-5, Kimi K2.5, and DeepSeek V3.2, you start to see why this is such a big achievement.

Now, you might wonder why this even matters, and here's why. In order to fit trillions of tokens of training data into the model, you need to pick the right model size, and MiniMax chose 230 billion parameters for their M2 series. Once this knowledge is packed into the model's weights, you also need to retrieve that information efficiently during inference. In the case of the MiniMax M2 series, they chose to activate only 4% of the parameters, 10 billion of them, meaning every token the model outputs uses only 4% of the model's entire weights through mixture of experts.

Now, even at 10 billion active parameters, they were able to achieve 80% on the SWE-bench Verified benchmark, which is neck and neck with Anthropic's most recent model, Opus 4.6. And because the memory footprint is so low at 10 billion parameters, MiniMax is not only able to serve the model at 3% of the cost of Opus 4.6 in output tokens, but they also host the model at nearly twice the speed, at 100 tokens per second. Looking at the progression from M1 to M2 to M2.1 to M2.5, they were able to raise performance at each step on a release cycle of roughly 50 days, all while holding the cost at $0.30 per million input tokens and $1.20 per million output tokens.

Here's another way to look at why this is such a big deal. Ever since Moltbook and OpenClaw went viral, one impression they left on people's minds was the idea of the always-on agent: an agent that's always listening, always doing background tasks, and always ready to work on things. But one of the problems with this was cost. As cool as an always-on agent was, the total cost of ownership was just too high for a lot of people. So, assuming you're running an agent 24/7 for an entire year that's constantly outputting tokens, just like leaving the tap on in the bathroom, the cost will look very different depending on which model you choose.
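The sparsity figure in the transcript reduces to simple arithmetic. A minimal sketch, using only the numbers quoted above (230 billion total parameters, 10 billion active per token):

```python
# Back-of-the-envelope check of the sparsity figure from the transcript:
# 10 billion active parameters out of 230 billion total.
total_params = 230e9   # MiniMax M2.5 total parameter count (per transcript)
active_params = 10e9   # parameters activated per token via mixture of experts

sparsity = active_params / total_params
print(f"Active fraction per token: {sparsity:.1%}")  # ~4.3%, i.e. "around 4%"
```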
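The always-on agent scenario can be sketched the same way. The 100 tokens-per-second serving speed and the $1.20-per-million output price come from the transcript; the Opus comparison price is only inferred here from the "3% of the cost" claim, so treat it as an assumption:

```python
# Hypothetical yearly cost of an agent streaming output tokens 24/7,
# "leaving the tap on" as the transcript puts it.
SECONDS_PER_YEAR = 365 * 24 * 3600
tokens_per_second = 100                  # serving speed quoted in the transcript
tokens_per_year = tokens_per_second * SECONDS_PER_YEAR

m2_5_price = 1.20 / 1e6                  # $ per output token (transcript)
opus_price = m2_5_price / 0.03           # assumption: implied by "3% of the cost"

print(f"Tokens per year: {tokens_per_year:,}")                     # ~3.15 billion
print(f"MiniMax M2.5:    ${tokens_per_year * m2_5_price:,.0f}")    # ~$3,800
print(f"Opus 4.6:        ${tokens_per_year * opus_price:,.0f}")    # ~$126,000
```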
