
wasm2c (part of WABT) does this transpilation in a spec-conforming way; it passes all* the WebAssembly tests and enforces the memory-safety and determinism requirements and the rest of the spec. The memory bounds-checking itself doesn't have a runtime performance impact because it's all done with mprotect() and a segfault handler. (There are some other differences between w2c2 and wasm2c that also have to do with spec-conformance and safety; e.g., enforcing type-safety of indirect function calls. This costs <4 cycles but it's not zero.)

Re: bounds checks, the thing that consumes cycles isn't the bounds check itself, it's Wasm's requirement that OOB accesses produce a deterministic trap, even if the result of an OOB load is never observed and could be optimized out. wasm2c has to prevent the compiler from optimizing out an unobserved OOB load, and that forced liveness defeats some compiler optimizations (probably more than it needs to). But even with all that, we're talking like a <30% slowdown compared with native compilation across the SPECcpu benchmarks.
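To make the "forced liveness" point concrete, here is a minimal sketch of one way to keep an unobserved load from being optimized out. The `KEEP_LIVE` macro and the function names are hypothetical, not taken from wasm2c's source, and the empty-asm trick assumes GCC or Clang.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical macro: an empty asm statement that claims to read `v`.
 * The compiler must materialize `v` in a register, so the load that
 * produced it cannot be deleted. GCC/Clang syntax. */
#define KEEP_LIVE(v) __asm__ volatile("" : : "r"(v))

/* Load a 32-bit value from linear memory at base + addr. */
uint32_t load_u32(const uint8_t *base, uint64_t addr) {
  uint32_t v;
  memcpy(&v, base + addr, sizeof v);
  KEEP_LIVE(v); /* the load stays even if the caller ignores the result */
  return v;
}

/* Like `i32.load` followed by `drop`: the value is never observed.
 * Without KEEP_LIVE an optimizer could remove the load entirely, and an
 * out-of-bounds address would then fail to fault. */
void load_and_drop(const uint8_t *base, uint64_t addr) {
  (void)load_u32(base, addr);
}
```

The cost is the one described above: the compiler can no longer treat the load as dead code, which also blocks some optimizations it would otherwise be free to do.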

If you want to transpile arbitrary Wasm to native code in a spec-conforming way, you're probably better off using wasm2c (which, disclosure, I work on). If you trust the Wasm module, or you're good with the isolation you get from your operating system and don't need Wasm's determinism, w2c2 seems great. Both of these are far less battle-hardened than V8 or wasmtime, especially when you include the fact that now you need an optimizing C compiler in the TCB.

---

* The Wasm testsuite repo has recently merged in the "v4" version of the exception-handling proposal, and WABT is still on "v3". But it does pass all the core tests (including tail calls) at least until GC is merged.



So how is it any different from just adding bounds checks to normal C code?


Well, a bunch of ways.

It's much faster to execute than adding a software bounds-check on every load. (Because the module declares its memories explicitly, it's very easy for a runtime to use a zero-cost strategy to enforce that memory loads/stores are all in-bounds.)

But Wasm's safety is more than bounds-checking memory loads/stores. E.g., Wasm indirect function calls are safe, including cross-module function calls for modules compiled separately, because there's a runtime type check (which wasm2c does very efficiently, but not zero-cost).
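As an illustration of what that runtime type check amounts to, here is a minimal sketch. The table layout, the `type_id` scheme, and the function names are all hypothetical; wasm2c's generated code is organized differently, and a real runtime would trap instead of returning an error code.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*generic_fn)(void);

/* Hypothetical table entry: the function plus an id for its signature. */
typedef struct {
  uint32_t type_id;
  generic_fn fn;
} table_entry;

typedef struct {
  table_entry *entries;
  uint32_t size;
} func_table;

enum { TYPE_I32_TO_I32 = 1, TYPE_VOID_TO_VOID = 2 };

/* call_indirect for the signature (i32) -> i32.
 * Returns 0 on success, -1 where a real runtime would trap. */
int call_indirect_i32_i32(const func_table *t, uint32_t index, int32_t arg,
                          int32_t *result) {
  if (index >= t->size) return -1;                /* table bounds check */
  const table_entry *e = &t->entries[index];
  if (e->fn == NULL) return -1;                   /* uninitialized element */
  if (e->type_id != TYPE_I32_TO_I32) return -1;   /* signature mismatch */
  *result = ((int32_t (*)(int32_t))e->fn)(arg);
  return 0;
}

/* Two example functions with different signatures. */
int32_t add_one(int32_t x) { return x + 1; }
void do_nothing(void) {}
```

The check is a couple of compares and branches per indirect call, which is why it is cheap but not free.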

And, Wasm modules are provably isolated (their only access outside the module is via explicit imports). Whereas if you wanted that from "normal C code," it's a lot harder -- at some point you'll have to scan something (the source? the object file?) to enforce isolation and make sure it's not, e.g., jumping to an arbitrary address or making a random syscall. There's obviously a huge amount of good work on SFI but it's not easy to do either on "normal C code" or on arbitrary x86-64 machine code.


> it's very easy for a runtime to use a zero-cost strategy to enforce that memory loads/stores are all in-bounds

I believe your statement is only true for wasm32 on a 64-bit host where guard pages can be placed around the memory.
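For comparison, the straightforward fallback with a 64-bit index is an explicit software check on every access. This is a minimal sketch with a hypothetical function name, written so that none of the arithmetic can wrap around:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the `len` bytes starting at addr + offset lie entirely within a
 * memory of `mem_size` bytes. Uses subtraction rather than addition so
 * that addr + offset + len is never computed and cannot overflow. */
bool in_bounds64(uint64_t addr, uint64_t offset, uint64_t len,
                 uint64_t mem_size) {
  if (len > mem_size) return false;
  uint64_t limit = mem_size - len; /* highest valid start address */
  if (offset > limit) return false;
  return addr <= limit - offset;
}
```

That is a few compares and branches on every load and store, which is exactly the cost the guard-page approach avoids for wasm32.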

Has anyone come up with a zero-cost strategy for wasm64?

This is something that CPU vendors could help with. x86 used to have segment registers but the limit checks were removed in x86_64 so FS/GS cannot be used for this purpose anymore.


If C code can be automatically compiled to Wasm, and that Wasm is then compiled to safer C and machine code, then the same C code could also be transformed directly into that output without the extra step. So either the C code that can be compiled to Wasm is a subset of all C code (partially true), or compilers trade off safety for performance (Wasm has more rigid control flow, for example).


The goal usually isn't for one party to take a C program and transform it into "safe" machine code. The goal is for a possible adversary to take a C program and produce an IR, and then for somebody else (maybe you) to validate that IR and produce safe machine code. Wasm is a vastly better interchange format between distrustful parties than C would be!

(There are probably even better interchange formats coming on the horizon; Zachary Yedidia has some cutting-edge work on "lightweight fault isolation" that will be presented at the upcoming ASPLOS. Earlier talk here: https://youtu.be/AM5fdd6ULF0 . But outside of the research world, it's hard to beat Wasm for this.)

Less important: I don't think going through Wasm has to be viewed as an "extra step" -- every compiler uses an IR, and if you want that IR to easily admit a "safe" lowering (especially one that enforces safety across independently compiled translation units), it will probably look at least a little like Wasm, which is quite minimal in its design. Remember that Wasm evolved from things like PNaCl which is basically LLVM IR, and RLBox/Firefox considered a bunch of other SFI techniques before wasm2c.


Thanks for the shout-out -- in case someone wants to check it out, the code for Lightweight Fault Isolation is available here: https://github.com/zyedidia/lfi.


30% is in the ballpark of what I'm expecting, actually.


Is it possible to `mprotect` less than a page? If not, how does this bounds checking work?



