
Does anyone know the feasibility of converting the ONNX model to CoreML for accelerated inference on Apple devices?


If you’re working with LLMs, just use this - https://github.com/ggerganov/llama.cpp

It has Metal support.


That's sort of a non sequitur; so does ONNX. Conversely, $X.cpp is great for local hobbyist stuff but not at all for deployment to iOS.


Nobody is deploying 3+GB models to iOS beyond some enthusiast “because you can” apps. Amazing tech but not feasible for any mainstream use yet.

E.g.: https://apps.apple.com/app/id6444050820


If you've played any large mobile games, you won't be surprised to see apps downloading massive files on first open.


A small download + an in-app weights download (and a space requirement warning) is probably sane, right?
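A minimal sketch of that idea, stdlib only. The function name, URL, and the 2x free-space headroom factor are illustrative assumptions, not taken from any shipping app:

```python
import shutil
import urllib.request
from pathlib import Path

def fetch_weights(url: str, dest: Path, expected_bytes: int) -> bool:
    """Download model weights on first launch, but only if enough
    free disk space remains; otherwise signal the caller to warn."""
    free = shutil.disk_usage(dest.parent).free
    if free < expected_bytes * 2:  # leave headroom for the OS (assumed factor)
        return False               # caller shows the space-requirement warning
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out, length=1 << 20)  # stream in 1 MiB chunks
    return dest.stat().st_size >= expected_bytes
```

Streaming with `copyfileobj` avoids holding a multi-GB file in memory during the download.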


I agree, we're too far down a chain of hypotheticals motivated by "but ONNX must be bad compared to $MODELX.cpp?"

Wouldn't make sense to deploy 4-bit quantization as a product either.
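To make the trade-off concrete, here is a toy symmetric 4-bit quantizer (deliberately not llama.cpp's actual q4 scheme): each weight maps to one of 16 signed levels, and the rounding error is the quality cost being debated above.

```python
def quantize4(weights):
    # Symmetric int4: representable levels are -7..7 (toy scheme, assumption).
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize4(q, scale):
    return [v * scale for v in q]

w = [0.10, -0.52, 0.33, 0.70]
q, s = quantize4(w)
restored = dequantize4(q, s)  # close to w, but with visible rounding error
```

Two weights pack into one byte, so storage drops roughly 8x versus fp32, at the cost of the rounding seen in `restored`.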


The size makes it tough for App Store deployment, but I could imagine using a local LLM on-device for an enterprise app.


They used to have this: https://github.com/onnx/onnx-coreml


They still do. HN is way behind on ONNX and I'd go so far as to say it's the "Plastics"[1] of 2023.

[1] https://www.youtube.com/watch?v=PSxihhBzCjk


This is a "not even wrong" question: ONNX Runtime is a runtime that can already use CoreML as an execution provider.
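Roughly, that looks like the sketch below. The provider names follow ONNX Runtime's convention; actually building a session requires the `onnxruntime` package on an Apple device, which isn't assumed here:

```python
import platform

def preferred_providers():
    # On macOS/iOS, ask ONNX Runtime to try CoreML first, then fall
    # back to plain CPU execution.
    if platform.system() == "Darwin":
        return ["CoreMLExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]

# import onnxruntime as ort
# session = ort.InferenceSession("model.onnx", providers=preferred_providers())
```

ONNX Runtime tries providers in list order at session creation, so CPU stays as the universal fallback.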


I didn't realize that! I wonder how performant a small Llama would be on iOS.


MLC's Apache TVM implementation can also compile to Metal.

Not sure if they made an autotuning profile for it yet.



