Skip to content
GPT 5.4 is a big step for Codex On evaluating and understanding the frontier of agents, and why I still turn to Claude. I’m a little late to this model review, but that has given me more time to think about the axes that matter for agents. Traditional benchmarks reduce model performance to a single score of correctness – they always have because that was simple, easy to quickly use to gauge perfor...