
>The sliding window technique really does effectively cripple its recall to approximately just 8k tokens which is just plain insufficient for a lot of tasks.

What are you basing this observation on: personal experience, or is there a benchmark somewhere that confirms it?



I found this discussion on the local llama subreddit that digs a little bit more into what effects the sliding window might have, in case you or anyone else reading this comment thread finds it interesting: https://old.reddit.com/r/LocalLLaMA/comments/17k2mwq/i_dont_....

It refers to the original Mistral 7B though, not the new Mixtral, fwiw.


Real-world testing and experience. If you only need the LLM to retain the "gist" of the input tokens in order to return a "related" answer, the sliding-window design is fine. But if you need actual technical analysis, or tasks that involve verbatim referencing, quoting, recomposing, etc. based on parts of the input documents, it doesn't work.

I tried using it for "business document" use cases but have run into this with code as well; the latter might be a better illustration given where we're having this discussion. If you only need the LLM to retain the general shape of your inputs so it can reuse them to influence the output, the sliding context is fine. But if you need it to actually reuse code verbatim from the input you fed it (or to remember API calls and their surrounding context verbatim, so that from a single sample it can recall that this API must be called before that one, when the prompt includes instructions to that effect), the "decomposition" of the input tokens under the sliding-window model is insufficient and the LLM completely fails at the assigned task.
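To make the failure mode concrete, here is a minimal sketch of a sliding-window attention mask, not Mistral's actual implementation; the function name, NumPy usage, and the tiny window/sequence sizes are all illustrative (Mistral 7B's paper describes a 4096-token window). Each query position can attend directly only to the most recent `window` tokens, so anything older reaches the current position only indirectly, through intermediate token representations across layers.

    # Minimal sketch, not Mistral's code: sliding-window attention mask.
    import numpy as np

    def sliding_window_mask(seq_len, window):
        """True where query position i may attend directly to key position j."""
        i = np.arange(seq_len)[:, None]  # query positions
        j = np.arange(seq_len)[None, :]  # key positions
        return (j <= i) & ((i - j) < window)  # causal AND within the window

    mask = sliding_window_mask(seq_len=12, window=4)
    print(np.where(mask[11])[0])  # -> [ 8  9 10 11]
    # Position 11 attends directly only to 8..11; older tokens reach it
    # only via intermediate positions in deeper layers.

That indirect path is why the "gist" of earlier input tends to survive (information can propagate roughly window-by-window across layers as compressed summaries) while verbatim detail from far outside the window tends not to.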



