
day50-dev/aa-eval-email


Sends out an email whenever Artificial Analysis posts a new model to their evals page.
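The core idea can be sketched as "diff, then notify": compare the model names currently listed against the names seen on the previous run, and email only the difference. The Python below is a minimal illustration, not the repository's code. Fetching and parsing the Artificial Analysis page is not shown, and the function names, the snapshot-as-a-set assumption, and the example addresses are all hypothetical.

```python
from email.message import EmailMessage


def find_new_models(previous: set[str], current: list[str]) -> list[str]:
    """Return names in `current` not seen on the previous run, in listing order.

    Hypothetical helper: assumes the previous run's names were saved somewhere
    (e.g. a local file) and loaded back into a set.
    """
    return [name for name in current if name not in previous]


def build_alert(new_models: list[str], sender: str, recipient: str) -> EmailMessage:
    """Compose (but do not send) a plain-text alert listing the new models."""
    msg = EmailMessage()
    msg["Subject"] = f"{len(new_models)} new model(s) on Artificial Analysis"
    msg["From"] = sender
    msg["To"] = recipient
    # One model name per line in the body.
    msg.set_content("\n".join(new_models))
    return msg
```

If the returned list is non-empty, the message could be sent with the standard library's smtplib.SMTP(host).send_message(msg); the SMTP host and credentials would be deployment-specific and are not covered here.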

Join the group, then check out other DAY50 stuff.

Also, the script art-analysis.sh can be run on its own.

The service should be running as of 2026-06-17.

Example output (truncated):

...
Score Days    Size   Name
----- ------- ------ --------------------------------------
38.9  314     -      GPT-5 (medium)
39    126     large  GLM-5 (Non-reasoning)
39.4  211     -      Gemini 3 Pro Preview (low)
39.6  141     large  Kimi K2.5 (Reasoning)
39.8  54      large  DeepSeek V4 Flash (Reasoning, High Effort)
40.5  342     -      Grok 4
40.5  71      -      Grok 4.20 0309 v2 (Reasoning)
41    104     -      GPT-5.4 (Non-reasoning)
41    48      -      Grok 4.3 (high)
41.3  121     large  Qwen3.5 397B A17B (Reasoning)
41.4  91      -      MiMo-V2-Pro
41.9  91      large  MiniMax-M2.7
42.1  56      large  MiMo-V2.5
42.2  99      -      Grok 4.20 0309 (Reasoning)
42.6  182     -      Gemini 3 Flash Preview (Reasoning)
42.9  205     -      Claude Opus 4.5 (Non-reasoning)
42.9  76      -      Qwen3.6 Plus
43    120     -      Claude Sonnet 4.6 (Non-reasoning, Low Effort)
43    188     -      GPT-5.2 Codex (xhigh)
43.2  54      large  DeepSeek V4 Pro (Reasoning, High Effort)
43.4  16      large  MiniMax-M3
43.4  71      large  GLM-5.1 (Reasoning)
43.9  29      -      Gemini 3.5 Flash (medium)
43.9  92      -      GPT-5.4 nano (xhigh)
44.2  126     large  GLM-5 (Reasoning)
44.2  188     -      GPT-5.2 (medium)
44.7  216     -      GPT-5.1 (high)
44.9  58      -      Qwen3.6 Max Preview
45    29      -      Gemini 3.5 Flash (high)
45.1  43      -      GPT-5.5 Instant (May 2026)
45.5  56      large  MiMo-V2.5-Pro
45.6  104     -      GPT-5.4 (low)
45.6  5       large  Kimi K2.7 Code
46.4  120     -      Claude Sonnet 4.6 (Non-reasoning, High Effort)
46.5  16      -      Qwen3.7 Plus
46.5  211     -      Gemini 3 Pro Preview (high)
46.7  449     -      Gemini 2.5 Pro Preview (Mar 25)
47.1  29      -      Gemini 3.5 Flash (minimal)
47.1  58      large  Kimi K2.6
47.5  54      large  DeepSeek V4 Pro (Reasoning, Max Effort)
47.5  70      -      Muse Spark
47.6  132     -      Claude Opus 4.6 (Non-reasoning, High Effort)
47.8  205     -      Claude Opus 4.5 (Reasoning)
48.1  132     -      Claude Opus 4.6 (Adaptive Reasoning, Max Effort)
48.6  55      -      GPT-5.5 (Non-reasoning)
48.7  188     -      GPT-5.2 (xhigh)
50.1  29      -      Qwen3.7 Max
50.7  1       large  GLM-5.2 (max)
50.9  120     -      Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort)
51.5  92      -      GPT-5.4 mini (xhigh)
52.1  55      -      GPT-5.5 (low)
...
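The example output above is a fixed-width table sorted by score, lowest first, with "-" where no size is given. A minimal Python sketch that reproduces this layout is shown below; the ModelRow record and its field names are hypothetical, and this is not how art-analysis.sh itself produces the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelRow:
    # Hypothetical record shape; field names are illustrative only.
    score: float
    days: int
    size: Optional[str]  # e.g. "large", or None when no size is listed
    name: str


def render_table(rows: list[ModelRow]) -> str:
    """Render rows as a fixed-width table, lowest score first."""
    lines = [
        f"{'Score':<6}{'Days':<8}{'Size':<7}Name",
        f"{'-' * 5:<6}{'-' * 7:<8}{'-' * 6:<7}{'-' * 38}",
    ]
    # sorted() is stable, so rows with equal scores keep their input order.
    for row in sorted(rows, key=lambda r: r.score):
        size = row.size if row.size else "-"
        # The "g" format drops a trailing ".0", so 39.0 prints as "39".
        lines.append(f"{row.score:<6g}{row.days:<8}{size:<7}{row.name}")
    return "\n".join(lines)
```

For example, rendering GLM-5 (Non-reasoning) and GPT-5 (medium) with the scores, days and sizes from the table above yields the header plus the first two data lines in the same order and spacing.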

About

Get an email whenever a new model eval hits Artificial Analysis
