AMD's support is terrible, if you believe their documentation. ROCm/hip works ju...

NekkoDroid · on Jan 11, 2024

To add a bit more context: I remember reading somewhere (may have been in the Phoronix forum) from an official AMD engineer that "supported" means validated and actively tested in their pipeline, while "unsupported" generally just means "we do little or no testing on these; they should work and we don't explicitly prevent them from working, but we don't guarantee anything" (at least when it comes to same die/gen cards).

In the same post they also wrote that they are gonna look at integrating more of those "unsupported" cards into their suite.

Honestly, I hope they change their wording for this to something like "validated", "supported" and "unsupported", with actual explanations of what each of these means (fully tested, works theoretically, does not work even theoretically).

Edit: I actually found the post I was talking about https://www.phoronix.com/forums/forum/linux-graphics-x-org-d...

makomk · on Jan 11, 2024

ROCm is also incredibly fragile and buggy at the best of times, so anything not actively tested by them stands a good chance of not working. Hell, I remember that a while back people were having problems with machine learning code giving garbage results on one of the few consumer GPUs that was officially listed as supported, and AMD eventually replied to the bug report and declared that actually, no, they weren't going to support it and they'd remove it from the list rather than try to fix the issue. I think this was back when newer consumer GPUs were genuinely unsupported, as in the code simply wasn't there, too. Integrated GPUs have also always had a lot of problems.

There's also questionable OS compatibility. ROCm is Linux-based and has extremely limited and rather experimental Windows support. Their fancy new neural processing unit is Windows-only, tied in with a Microsoft framework, and they don't seem to have any kind of definite plan for supporting it elsewhere. So there's quite possibly no single OS where all the hardware that theoretically could be used for machine learning and AI in these chips actually has even vaguely functioning code that works on that OS to make it work that way.

jacoblambda · on Jan 12, 2024

> Their fancy new neural processing unit is Windows-only, tied in with a Microsoft framework, and they don't seem to have any kind of definite plan for supporting it elsewhere. So there's quite possibly no single OS where all the hardware that theoretically could be used for machine learning and AI in these chips actually has even vaguely functioning code that works on that OS to make it work that way.

What they did was embed their Vitis IP in their CPUs. Those IP modules have been available in ASIC accelerator cards from AMD or their Xilinx FPGA offerings for a while now and have Linux support.

So they are releasing these APUs (which are primarily aimed at a Windows audience) with Windows support via ONNX. Then, once there are chips in circulation, AMD can get Linux Vitis support for these chips set up, with members of the Linux community able to review and validate those patchsets.

jiggawatts · on Jan 11, 2024

TensorFlow is open source on GitHub. What’s stopping AMD engineers from contributing pull requests? A casual search through the open PRs shows nothing of interest being submitted by team red.

snvzz · on Jan 11, 2024

AMD is not at the edge of bankruptcy anymore.

They should be able to afford testers, doing both manual and automated testing on a wide range of cards.

ROCm's "supported hardware" list is small to the point of being concerning.

More cards also means more testers outside of AMD itself.

wmf · on Jan 11, 2024

Does ROCm even work on XDNA? I don't think it does?

antx · on Jan 11, 2024

ah yes, the good old HSA_OVERRIDE_GFX_VERSION=10.3.0 switcheroo
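
For anyone who hasn't seen the trick: the variable asks the ROCm runtime to treat the GPU as a different gfx target (10.3.0 corresponds to gfx1030), which is how people get cards missing from the official list to run. A rough sketch of setting it from Python for a child process follows. The helper names `rocm_override_env` and `run_with_override` are made up for illustration, and whether the override actually works for a given card is not guaranteed.

```python
import os
import subprocess


def rocm_override_env(gfx_version="10.3.0", base_env=None):
    """Return a copy of the environment with HSA_OVERRIDE_GFX_VERSION set.

    Hypothetical helper: it only prepares the environment and does not
    check that the override is valid for the installed GPU.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HSA_OVERRIDE_GFX_VERSION"] = gfx_version
    return env


def run_with_override(cmd, gfx_version="10.3.0"):
    """Run a command (given as a list of arguments) with the override applied.

    The variable is passed to the child process only, so the parent's
    environment is left untouched.
    """
    return subprocess.run(cmd, env=rocm_override_env(gfx_version), check=False)
```

Usage would be something like `run_with_override(["python3", "train.py"])`, where `train.py` stands in for whatever ROCm workload you are launching. Passing the variable to the child process instead of setting it in your own process means it is already in place when the runtime initializes there.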