
Are people running llama 3.1 405B on them?


I'm running 70B models (usually at q4 to q5_K_M quantization, though q6 is possible) on my 96 GB MacBook Pro with an M2 Max (12 CPU cores, 38 GPU cores). This still leaves plenty of RAM for other purposes.

I'm currently using reflection:70b_q4, which does a very good job in my opinion. It generates the response at 5.5 tokens/s, which is just about my reading speed (roughly 250 words per minute).

edit: I usually don't run the larger quantizations (q6) because of the speed. I'd guess a 405B model would just be awfully slow.
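
For a back-of-the-envelope sense of why a quantized 70B fits in 96 GB but a 405B doesn't, here's a rough Python sketch. The bits-per-weight figures for the llama.cpp quant formats are approximate (they vary slightly by architecture), so treat the outputs as estimates, not exact footprints:

    # Estimated in-memory size of model weights at common
    # llama.cpp quantization levels (weights only; excludes
    # KV cache and runtime overhead).

    BITS_PER_WEIGHT = {
        "q4_K_M": 4.85,   # approximate
        "q5_K_M": 5.69,   # approximate
        "q6_K":   6.59,   # approximate
        "fp16":   16.0,
    }

    def weight_gb(params_billion: float, quant: str) -> float:
        """Gigabytes needed just to hold the weights."""
        bits = params_billion * 1e9 * BITS_PER_WEIGHT[quant]
        return bits / 8 / 1e9

    for params in (70, 405):
        for quant in ("q4_K_M", "q6_K", "fp16"):
            print(f"{params}B @ {quant}: ~{weight_gb(params, quant):.0f} GB")

A 70B model at q4_K_M comes out around ~42 GB, which fits comfortably in 96 GB with room to spare; a 405B model even at q4_K_M is ~245 GB, far beyond a 96 GB machine.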


Not going to work for training from scratch, which is what the author is doing.


192 GB of RAM is not enough to train 405B models. Reflection 70B requires 140 GB of RAM just to hold the weights in fp16; a 405B model would need ~810 GB.
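
And training needs far more than inference. A common rule of thumb (e.g., from the ZeRO paper) is ~16 bytes per parameter for mixed-precision Adam training (fp16 weights plus fp32 master weights, gradients, and two optimizer moments), before activations. A minimal sketch, assuming that rule:

    def inference_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
        """fp16 inference: 2 bytes per parameter, weights only."""
        return params_billion * bytes_per_param

    def training_gb(params_billion: float, bytes_per_param: float = 16.0) -> float:
        """Mixed-precision Adam training state, excluding activations."""
        return params_billion * bytes_per_param

    for p in (70, 405):
        print(f"{p}B: fp16 inference ~{inference_gb(p):.0f} GB, "
              f"training state ~{training_gb(p):.0f} GB")

That gives ~140 GB vs ~1,120 GB for 70B, and ~810 GB vs ~6,480 GB for 405B, so 192 GB is nowhere near enough to train at these scales.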


Pretty sure he said he's inferencing Llama 3.1 405B and training his own custom model from scratch. He didn't say how big his custom model will be.



