wasm2c (part of WABT) does this transpilation in a spec-conforming way; it passe...

kaba0 · on Dec 11, 2023

So how is it any different than just adding bound checkings for normal C code?

keithwinstein · on Dec 11, 2023

Well, a bunch of ways.

It's much faster to execute than adding a software bounds-check on every load. (Because the module declares its memories explicitly, it's very easy for a runtime to use a zero-cost strategy to enforce that memory loads/stores are all in-bounds.)

But Wasm's safety is more than bounds-checking memory loads/stores. E.g., Wasm indirect function calls are safe, including cross-module function calls for modules compiled separately, because there's a runtime type check (which wasm2c does very efficiently, but not zero-cost).

And, Wasm modules are provably isolated (their only access outside the module is via explicit imports). Whereas if you wanted that from "normal C code," it's a lot harder -- at some point you'll have to scan something (the source? the object file?) to enforce isolation and make sure it's not, e.g., jumping to an arbitrary address or making a random syscall. There's obviously a huge amount of good work on SFI but it's not easy to do either on "normal C code" or on arbitrary x86-64 machine code.

stefanha · on Dec 12, 2023

> it's very easy for a runtime to use a zero-cost strategy to enforce that memory loads/stores are all in-bounds

I believe your statement is only true for wasm32 on a 64-bit host where guard pages can be placed around the memory.

Has anyone come up with a zero-cost strategy for wasm64?

This is something that CPU vendors could help with. x86 used to have segment registers but the limit checks were removed in x86_64 so FS/GS cannot be used for this purpose anymore.

kaba0 · on Dec 11, 2023

If a c code can automatically be compiled to wasm that is compiled to a safer c and machine code, then the same c code can also be transformed the same way without that extra step to that output. It’s either that the original c code that can be compiled to wasm is a subset of all c codes (partially true), or that compilers trade off safety for performance (wasm has more rigid control flow, for example).

keithwinstein · on Dec 11, 2023

The goal usually isn't for one party to take a C program and transform it into "safe" machine code. The goal is for a possible adversary to take a C program and produce an IR, and then for somebody else (maybe you) to validate that IR and produce safe machine code. Wasm is a vastly better interchange format between distrustful parties than C would be!

(There are probably even better interchange formats coming on the horizon; Zachary Yedidia has some cutting-edge work on "lightweight fault isolation" that will be presented at the upcoming ASPLOS. Earlier talk here: https://youtu.be/AM5fdd6ULF0 . But outside of the research world, it's hard to beat Wasm for this.)

Less important: I don't think going through Wasm has to be viewed as an "extra step" -- every compiler uses an IR, and if you want that IR to easily admit a "safe" lowering (especially one that enforces safety across independently compiled translation units), it will probably look at least a little like Wasm, which is quite minimal in its design. Remember that Wasm evolved from things like PNaCl which is basically LLVM IR, and RLBox/Firefox considered a bunch of other SFI techniques before wasm2c.

zyedidia · on Dec 12, 2023

Thanks for the shout-out -- in case someone wants to check it out, the code for Lightweight Fault Isolation is available here: https://github.com/zyedidia/lfi.

ncruces · on Dec 11, 2023

30% is in the ballpark of what I'm expecting, actually.

SpaghettiCthulu · on Dec 12, 2023

is it possible to `mprotect` less than a page? If not, how does this bounds checking work?