
Does anyone know the feasibility of converting the ONNX model to CoreML for accelerated inference on Apple devices?


If you’re working with LLMs, just use this - https://github.com/ggerganov/llama.cpp

It has Metal support.


That's sort of a non sequitur; so does ONNX. Conversely, $X.cpp is great for local hobbyist stuff but not at all for deployment to iOS.


Nobody is deploying 3+GB models to iOS beyond some enthusiast “because you can” apps. Amazing tech but not feasible for any mainstream use yet.

E.g.: https://apps.apple.com/app/id6444050820


If you've played any large mobile games, you won't be surprised to see apps downloading massive files on first open.


A small download + an in-app weights download (and a space requirement warning) is probably sane, right?
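A minimal sketch of that idea, stdlib only. The function name, URL, and the 2x free-space headroom factor are illustrative assumptions, not taken from any shipping app:

```python
import shutil
import urllib.request
from pathlib import Path

def fetch_weights(url: str, dest: Path, expected_bytes: int) -> bool:
    """Download model weights on first launch, but only if enough
    free disk space remains; otherwise signal the caller to warn."""
    free = shutil.disk_usage(dest.parent).free
    if free < expected_bytes * 2:  # leave headroom for the OS (assumed factor)
        return False               # caller shows the space-requirement warning
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out, length=1 << 20)  # stream in 1 MiB chunks
    return dest.stat().st_size >= expected_bytes
```

Streaming with `copyfileobj` avoids holding a multi-GB file in memory during the download.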


I agree, we're too far down a chain of hypotheticals motivated by "but ONNX must be bad compared to $MODELX.cpp?"

Wouldn't make sense to deploy 4-bit quantization as a product either.
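To make the trade-off concrete, here is a toy symmetric 4-bit quantizer (deliberately not llama.cpp's actual q4 scheme): each weight maps to one of 16 signed levels, and the rounding error is the quality cost being debated above.

```python
def quantize4(weights):
    # Symmetric int4: representable levels are -7..7 (toy scheme, assumption).
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize4(q, scale):
    return [v * scale for v in q]

w = [0.10, -0.52, 0.33, 0.70]
q, s = quantize4(w)
restored = dequantize4(q, s)  # close to w, but with visible rounding error
```

Two weights pack into one byte, so storage drops roughly 8x versus fp32, at the cost of the rounding seen in `restored`.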


The size makes it tough for App Store deployment, but I could imagine using a local LLM on-device for an enterprise app.


They used to have this: https://github.com/onnx/onnx-coreml


They still do. HN is way behind on ONNX and I'd go so far as to say it's the "Plastics"[1] of 2023.

[1] https://www.youtube.com/watch?v=PSxihhBzCjk


This is a "not even wrong" question: ONNX Runtime is a runtime that can already use CoreML as an execution provider.
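Roughly, that looks like the sketch below. The provider names follow ONNX Runtime's convention; actually building a session requires the `onnxruntime` package on an Apple device, which isn't assumed here:

```python
import platform

def preferred_providers():
    # On macOS/iOS, ask ONNX Runtime to try CoreML first, then fall
    # back to plain CPU execution.
    if platform.system() == "Darwin":
        return ["CoreMLExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]

# import onnxruntime as ort
# session = ort.InferenceSession("model.onnx", providers=preferred_providers())
```

ONNX Runtime tries providers in list order at session creation, so CPU stays as the universal fallback.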


I didn't realize that! I wonder how performant a small Llama would be on iOS.


MLC's Apache TVM implementation can also compile to Metal.

Not sure if they made an autotuning profile for it yet.



