
Are people running llama 3.1 405B on them?


I'm running 70B models (usually at q4 to q5_K_M quantization, though q6 is possible) on my 96 GB MacBook Pro with an M2 Max (12 CPU cores, 38 GPU cores). This still leaves plenty of RAM for other purposes.

I'm currently using reflection:70b_q4, which does a very good job in my opinion. It generates the response at 5.5 tokens/s, which is just about my reading speed (roughly 250 words per minute).

edit: I usually don't run the larger quantizations (q6) because of the speed. I'd guess a 405B model would just be awfully slow.
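
For a back-of-the-envelope sense of why a quantized 70B fits in 96 GB but a 405B doesn't, here's a rough Python sketch. The bits-per-weight figures for the llama.cpp quant formats are approximate (they vary slightly by architecture), so treat the outputs as estimates, not exact footprints:

    # Estimated in-memory size of model weights at common
    # llama.cpp quantization levels (weights only; excludes
    # KV cache and runtime overhead).

    BITS_PER_WEIGHT = {
        "q4_K_M": 4.85,   # approximate
        "q5_K_M": 5.69,   # approximate
        "q6_K":   6.59,   # approximate
        "fp16":   16.0,
    }

    def weight_gb(params_billion: float, quant: str) -> float:
        """Gigabytes needed just to hold the weights."""
        bits = params_billion * 1e9 * BITS_PER_WEIGHT[quant]
        return bits / 8 / 1e9

    for params in (70, 405):
        for quant in ("q4_K_M", "q6_K", "fp16"):
            print(f"{params}B @ {quant}: ~{weight_gb(params, quant):.0f} GB")

A 70B model at q4_K_M comes out around ~42 GB, which fits comfortably in 96 GB with room to spare; a 405B model even at q4_K_M is ~245 GB, far beyond a 96 GB machine.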


Not going to work for training from scratch, which is what the author is doing.


192 GB of RAM is not enough to train 405B models. Reflection 70B requires 140 GB of RAM just to hold the weights in fp16; a 405B model would need ~810 GB.
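
And training needs far more than inference. A common rule of thumb (e.g., from the ZeRO paper) is ~16 bytes per parameter for mixed-precision Adam training (fp16 weights plus fp32 master weights, gradients, and two optimizer moments), before activations. A minimal sketch, assuming that rule:

    def inference_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
        """fp16 inference: 2 bytes per parameter, weights only."""
        return params_billion * bytes_per_param

    def training_gb(params_billion: float, bytes_per_param: float = 16.0) -> float:
        """Mixed-precision Adam training state, excluding activations."""
        return params_billion * bytes_per_param

    for p in (70, 405):
        print(f"{p}B: fp16 inference ~{inference_gb(p):.0f} GB, "
              f"training state ~{training_gb(p):.0f} GB")

That gives ~140 GB vs ~1,120 GB for 70B, and ~810 GB vs ~6,480 GB for 405B, so 192 GB is nowhere near enough to train at these scales.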


Pretty sure he said he's inferencing Llama 3.1 405B and training his own custom model from scratch. He didn't say how big his custom model will be.



