Hacker News

Any chance you could point me in the right direction on how to set something like this up?

Right now I'm running Llama purely on CPU, but only the 17B version, based (I believe) on llama.cpp. How do I mix CPU and GPU together for more performance?



The easy way: download koboldcpp. Otherwise you have to compile llama.cpp (or koboldcpp) with OpenCL or CUDA support; there are instructions for this on the GitHub page.
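For the compile route, the build boils down to something like this (a sketch assuming an NVIDIA card and a working CUDA toolchain; the exact make flags have changed over time, so check the repo README for the current ones):

```shell
# Clone and build llama.cpp with GPU support.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make LLAMA_CUBLAS=1      # CUDA (NVIDIA)
# make LLAMA_CLBLAST=1   # or OpenCL (AMD/Intel) instead
```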

Then offload as many layers as you can to the GPU with the GPU-layers flag (`-ngl` / `--n-gpu-layers` in llama.cpp, `--gpulayers` in koboldcpp). You'll have to experiment with the number while watching your GPU's VRAM usage.
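A typical invocation looks something like the following (the model path is a placeholder for whatever quantized model file you have; 32 layers is just a starting guess to tune against your VRAM):

```shell
# Offload 32 transformer layers to the GPU; lower the number if the
# model fails to load or VRAM fills up (watch nvidia-smi while loading).
./main -m models/7B/ggml-model-q4_0.bin \
       --n-gpu-layers 32 \
       -p "Hello, world"
```

Layers that don't fit on the GPU stay on the CPU, so any value between 0 and the model's layer count works; more offloaded layers generally means faster generation.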



