
Everyone, and I mean everyone, I know doing AI/ML work values VRAM above all. The absolute best bang for the buck is buying used P40s, and if you actually want those cards to be usable for other stuff, used 3090s are the best deal around; they should be around $700 right now.


What they really value is bandwidth. More VRAM is just more bandwidth.


Well, to give an example, 32GB of VRAM would be vastly preferable to 24GB of higher-bandwidth VRAM. You really need to be able to fit the entire LLM in GPU memory for best results, because otherwise you're bottlenecked on the speed of transfer between regular old system RAM and the GPU.
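
As a rough back-of-envelope sketch of why capacity matters (the bytes-per-weight and overhead numbers below are illustrative assumptions, not exact figures):

    # Rough sketch: does a quantized LLM fit entirely in VRAM?
    # Assumed numbers (bytes per weight, overhead factor) are illustrative only.

    def fits_in_vram(params_billions, bytes_per_weight, vram_gb, overhead=1.2):
        """Estimate whether a model's weights (plus rough overhead for the
        KV cache and activations) fit in a given amount of VRAM."""
        weight_gb = params_billions * bytes_per_weight  # 1e9 params * bytes/weight ~= GB
        needed_gb = weight_gb * overhead
        return needed_gb, needed_gb <= vram_gb

    # Example: a 70B model at ~4-bit (~0.5 bytes/weight) vs. a 24 GB 3090
    needed, ok = fits_in_vram(70, 0.5, 24)
    print(f"~{needed:.0f} GB needed, fits in 24 GB: {ok}")

Once the model spills out of VRAM, every token has to wait on weights being shuffled over PCIe, which is an order of magnitude slower than on-card memory.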

You'll also note that M1/M2 Macs with large amounts of system memory are good at inference because the GPU has a very high-speed interconnect between the soldered-on RAM and the on-die GPU. It's all about avoiding bottlenecks wherever possible.
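
To make the bandwidth point concrete: when decoding is memory-bound, each generated token has to stream roughly the whole set of weights through the GPU once, so bandwidth divided by model size gives an upper bound on tokens per second. A minimal sketch, using approximate published bandwidth figures (the model size and bandwidths here are assumptions for illustration):

    # Rough upper bound on decode speed when inference is memory-bandwidth-bound.
    def max_tokens_per_sec(model_gb, bandwidth_gb_s):
        # Each token requires reading ~all weights once from memory.
        return bandwidth_gb_s / model_gb

    # Illustrative numbers: a ~35 GB (4-bit 70B) model on an RTX 3090 (~936 GB/s)
    # vs. an Apple M2 Max (~400 GB/s unified memory).
    for name, bw in [("RTX 3090", 936), ("M2 Max", 400)]:
        print(f"{name}: ~{max_tokens_per_sec(35, bw):.0f} tokens/sec upper bound")

The Mac's raw bandwidth is lower, but because the unified memory pool is large enough to hold the whole model, it never hits the far slower system-RAM-to-GPU transfer path.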



