Sends an email whenever Artificial Analysis posts a new model to its evals page.
Join the group, then check out the other DAY50 stuff.
The script art-analysis.sh can also be run stand-alone.
Should be running as of 2026-06-17.
Example output (truncated):
...
Score Days    Size   Name
----- ------- ------ --------------------------------------
38.9  314     -      GPT-5 (medium)
39    126     large  GLM-5 (Non-reasoning)
39.4  211     -      Gemini 3 Pro Preview (low)
39.6  141     large  Kimi K2.5 (Reasoning)
39.8  54      large  DeepSeek V4 Flash (Reasoning, High Effort)
40.5  342     -      Grok 4
40.5  71      -      Grok 4.20 0309 v2 (Reasoning)
41    104     -      GPT-5.4 (Non-reasoning)
41    48      -      Grok 4.3 (high)
41.3  121     large  Qwen3.5 397B A17B (Reasoning)
41.4  91      -      MiMo-V2-Pro
41.9  91      large  MiniMax-M2.7
42.1  56      large  MiMo-V2.5
42.2  99      -      Grok 4.20 0309 (Reasoning)
42.6  182     -      Gemini 3 Flash Preview (Reasoning)
42.9  205     -      Claude Opus 4.5 (Non-reasoning)
42.9  76      -      Qwen3.6 Plus
43    120     -      Claude Sonnet 4.6 (Non-reasoning, Low Effort)
43    188     -      GPT-5.2 Codex (xhigh)
43.2  54      large  DeepSeek V4 Pro (Reasoning, High Effort)
43.4  16      large  MiniMax-M3
43.4  71      large  GLM-5.1 (Reasoning)
43.9  29      -      Gemini 3.5 Flash (medium)
43.9  92      -      GPT-5.4 nano (xhigh)
44.2  126     large  GLM-5 (Reasoning)
44.2  188     -      GPT-5.2 (medium)
44.7  216     -      GPT-5.1 (high)
44.9  58      -      Qwen3.6 Max Preview
45    29      -      Gemini 3.5 Flash (high)
45.1  43      -      GPT-5.5 Instant (May 2026)
45.5  56      large  MiMo-V2.5-Pro
45.6  104     -      GPT-5.4 (low)
45.6  5       large  Kimi K2.7 Code
46.4  120     -      Claude Sonnet 4.6 (Non-reasoning, High Effort)
46.5  16      -      Qwen3.7 Plus
46.5  211     -      Gemini 3 Pro Preview (high)
46.7  449     -      Gemini 2.5 Pro Preview (Mar 25)
47.1  29      -      Gemini 3.5 Flash (minimal)
47.1  58      large  Kimi K2.6
47.5  54      large  DeepSeek V4 Pro (Reasoning, Max Effort)
47.5  70      -      Muse Spark
47.6  132     -      Claude Opus 4.6 (Non-reasoning, High Effort)
47.8  205     -      Claude Opus 4.5 (Reasoning)
48.1  132     -      Claude Opus 4.6 (Adaptive Reasoning, Max Effort)
48.6  55      -      GPT-5.5 (Non-reasoning)
48.7  188     -      GPT-5.2 (xhigh)
50.1  29      -      Qwen3.7 Max
50.7  1       large  GLM-5.2 (max)
50.9  120     -      Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort)
51.5  92      -      GPT-5.4 mini (xhigh)
52.1  55      -      GPT-5.5 (low)
...
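
For anyone who wants to post-process output like the above, here is a rough Python sketch of the "new model" check. It is not how art-analysis.sh works internally (that script is not shown here); it only assumes each row is a score, a day count, a size ("-" when none is listed), and then the model name. The names ROW, parse_rows, and new_models are made up for this example.

```python
import re

# Assumed row layout: score, days, size ("-" when none is listed), then the name.
ROW = re.compile(r"^(\d+(?:\.\d+)?)\s+(\d+)\s+(\S+)\s+(.+)$")

def parse_rows(text):
    """Return a list of (score, days, size, name) tuples from table text."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:  # header, separator, and "..." lines do not match and are skipped
            score, days, size, name = m.groups()
            rows.append((float(score), int(days), size, name.strip()))
    return rows

def new_models(previous_text, current_text):
    """Names present in the current table but absent from the previous one."""
    seen = {name for _, _, _, name in parse_rows(previous_text)}
    return [name for _, _, _, name in parse_rows(current_text) if name not in seen]
```

Saving the previous run's output and passing it in as previous_text, with the latest output as current_text, would give the list of names worth emailing about. Matching on whitespace runs means it should cope with either single-space or column-aligned output.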