Researchers led by Suketu Patel tested several frontier transformer language models on the classic **Stroop task** and report that accuracy collapsed sharply as stimulus lists grew longer. According to the paper, published in **PNAS Nexus**, models including `GPT-4o`, `GPT-5`, `Claude 3.5 Sonnet`, `Claude Opus 4.1`, and `Gemini 2.5` scored well on short, five-item incongruent lists but fell dramatically on longer ones. In one example trajectory, `GPT-4o` dropped from **91% accuracy** at five words to **57%** at ten words and **15%** at 40 words. On lists that mixed matching and mismatched items, accuracy on the conflicting items was near **0%**, the paper reports. Several outlets (ScienceDaily, TechXplore, NeuroscienceNews, Heise) relayed these findings and framed them as evidence of a structural divergence between human executive control and transformer attention.
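
To make the task setup concrete, here is a minimal sketch of how an incongruent Stroop list of a given length could be generated and how a set of responses could be scored. The color vocabulary, the `(word, ink)` representation, and the function names `make_incongruent_list` and `score` are illustrative assumptions, not the paper's actual stimuli or scoring procedure.

```python
import random

# Assumed color vocabulary; the paper's actual stimulus set is not specified here.
COLORS = ["red", "green", "blue", "yellow"]


def make_incongruent_list(n, rng):
    """Return n (word, ink) pairs in which the word never names its ink color."""
    items = []
    for _ in range(n):
        ink = rng.choice(COLORS)
        word = rng.choice([c for c in COLORS if c != ink])
        items.append((word, ink))
    return items


def score(items, responses):
    """Fraction of items whose response names the ink color rather than the word."""
    if not items or len(items) != len(responses):
        raise ValueError("need a non-empty list with exactly one response per item")
    correct = sum(resp == ink for (_, ink), resp in zip(items, responses))
    return correct / len(items)


rng = random.Random(0)
stimuli = make_incongruent_list(5, rng)

# A responder that reads the word gets every incongruent item wrong;
# one that names the ink gets every item right.
word_reading = [word for word, _ in stimuli]
ink_naming = [ink for _, ink in stimuli]
print(score(stimuli, word_reading))  # 0.0
print(score(stimuli, ink_naming))    # 1.0
```

Varying `n` (for example 5, 10, and 40) and scoring a model's answers with `score` would give an accuracy-by-length curve of the kind the article describes, under these assumptions.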