There are “needle in the haystack” benchmarks for long context performance. It w...

throwuxiytayq · 2025-11-09T13:45:08

These aren’t really indicative of real world performance. Retrieving a single fact is pretty much the simplest possible task for a long context model. Real world use cases require considering many facts at the same time while ignoring others, all the while avoiding the overall performance degradation that current models seem susceptible to when the context is sufficiently full.
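
A rough sketch of the gap described above, under an assumed synthetic setup. Several "needle" facts are scattered through filler text, along with "distractor" facts that use the same template but different keys. The model has to report every needle and ignore every distractor. The classic single-needle test is just the special case of one needle and no distractors. All names here (`build_haystack`, `build_question`, `score`, and the "The code for X is Y." template) are hypothetical and not taken from any published benchmark. No model is called, and the substring-based scoring is deliberately crude.

```python
import random


def build_haystack(n_filler: int, needles: dict[str, str],
                   distractors: dict[str, str], seed: int = 0) -> str:
    """Scatter needle and distractor facts at random positions among filler text.

    Hypothetical format: every fact reads "The code for <key> is <value>.".
    Needle and distractor keys are assumed to be distinct.
    """
    rng = random.Random(seed)  # seeded so the haystack is reproducible
    sentences = [f"Filler sentence {i} carries no relevant information."
                 for i in range(n_filler)]
    for key, value in {**needles, **distractors}.items():
        # randint is inclusive on both ends, so a fact can land at the very end
        sentences.insert(rng.randint(0, len(sentences)),
                         f"The code for {key} is {value}.")
    return " ".join(sentences)


def build_question(needles: dict[str, str]) -> str:
    """Ask only about the needle keys; distractors must be ignored."""
    return f"List the codes for exactly these items: {', '.join(sorted(needles))}."


def score(answer: str, needles: dict[str, str],
          distractors: dict[str, str]) -> tuple[float, bool]:
    """Return (fraction of needle values recalled, whether any distractor leaked)."""
    recalled = sum(value in answer for value in needles.values())
    leaked = any(value in answer for value in distractors.values())
    return recalled / len(needles), leaked


if __name__ == "__main__":
    needles = {"alpha": "K7Q2", "gamma": "M4X9"}
    distractors = {"beta": "Z1P8"}
    prompt = build_haystack(1000, needles, distractors, seed=42)
    prompt += "\n\n" + build_question(needles)
    # Send `prompt` to the model under test (not shown), then score its reply.
    print(score("alpha: K7Q2, gamma: M4X9", needles, distractors))  # (1.0, False)
```

Scaling up the number of needles and distractors, and pushing `n_filler` toward the context limit, is one way to probe the "many facts at once while ignoring others" behaviour rather than single-fact lookup.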

d4rkp4ttern · 2025-11-10T11:47:07 1762775227

I agree, retrieving a single fact is necessary but not sufficient.