6 Comments
Liberty's avatar

No Opus 4.7?

Rohit Krishnan's avatar

Costs growing quadratically :-)

Liberty's avatar

😬

Shawn बलि K's avatar

very interesting, and great ideas

H. Floyd's avatar

BenchBench tests models against each other. The grounded test is models against instruments professionals actually use.

Anthropic matched Opus 4.7 against ChemDraw on NMR. It matched ChemDraw on prediction and beat it on splitting patterns. That measures whether the daily tool becomes the model.