
OK, for starters, ECC has to become standard.

Then the rate of ECC errors has to be monitored. If something is trying a rowhammer attack, it's going to cause unwanted bit flips which the ECC will correct. Normally, the ECC error rate is very low. Under attack, it should go up. So an attack should be noticeable. You might get some false alarms, but that just means it's time to replace memory.
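As a rough sketch of what that monitoring could look like: on Linux, the EDAC subsystem already exposes corrected-error counters in sysfs. The path and the threshold below are illustrative assumptions, not a tuned detector — a real deployment would baseline the machine's normal correction rate first.

```python
import time
from pathlib import Path

# Linux EDAC exposes per-memory-controller corrected-error counters.
# The controller path and threshold here are illustrative only.
CE_COUNT = Path("/sys/devices/system/edac/mc/mc0/ce_count")

def anomalous(delta: int, threshold: int = 10) -> bool:
    """Flag a suspicious jump in corrected errors per polling interval."""
    return delta > threshold

def watch(poll_seconds: int = 60) -> None:
    last = int(CE_COUNT.read_text())
    while True:
        time.sleep(poll_seconds)
        now = int(CE_COUNT.read_text())
        if anomalous(now - last):
            print(f"ALERT: {now - last} corrected ECC errors in "
                  f"{poll_seconds}s -- possible Rowhammer activity")
        last = now
```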



While ECC memory is probably important and probably better than nothing, if there's one thing we've learned about Rowhammer it's that the obvious mitigations that "should" stop or detect it often fail to a clever attacker.

Just the first thing that popped into my head, but: say you watch the ECC correctable error rate over time, and somehow (not so easy!) determine which process is causing those errors. You forcibly kill the process and log a message about it, and also terminate/notify processes potentially affected (say, send them a SIGBUS or something and unmap the pages containing the affected data).

I, a "clever" attacker, use this to leak out your secrets: I do my hammering juuust right so that, if some secret bit is 0, my hammering flips bits (triggering ECC corrections), while if it's 1 my hammering doesn't affect anything. Lovely little side-channel.

Universal memory encryption and authentication seems to be the only sure way out of the cycle of "attack, mitigation, attack the mitigation", and it's already starting to roll out.


> I, a "clever" attacker, use this to leak out your secrets: I do my hammering juuust right so that, if some secret bit is 0, my hammering flips bits (triggering ECC corrections), while if it's 1 my hammering doesn't affect anything. Lovely little side-channel.

Would ASLR make this harder? I assume it would because it'd be a lot harder to get to the correct memory location (?).


ASLR adds nothing against attacks that rely on the physical location of pages, since physical pages are already allocated independently of the virtual address space.


Noob question: With DDR5's option for in-chip ECC, if Animats' suggestion of monitoring ECC anomalies is implemented on a control unit on the DIMM module, then will that make the attack impossible?


When I was a CS student, a little more than a decade ago, we learnt what ECC was and how it worked. Interestingly, I remember that we were taught that ECC was and should be everywhere.

From physical storage to data transfer protocols, we were told you really shouldn’t bypass ECC anywhere if you wanted things to work reliably against the hard physical world. It was like a basic CS rule.

And still, here we are, still arguing that some reliability needs to be built into components as critical as DRAM chips.


Maybe this is common knowledge, but can ECC be done in software? I don’t necessarily see why not, despite the common-sense wisdom that it has to be done in hardware. With DRAM so cheap these days, why not just go full RAID 1 (where ECC is analogous to a RAID-5 implementation), and store each critical-path memory byte twice to detect bit flips, or three times to fix them? Let’s see row hammer attacks flip the same bit in two or three places at the same time. Sure, it would impact performance, but maybe for something like encryption or banking it’s an acceptable trade-off.
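As a toy sketch of the triple-copy idea (the class and layout are my own illustration, not an existing library): keep three copies of each value and majority-vote bitwise on read, so a flip in any single copy is outvoted.

```python
# Toy software triple-redundancy: store each value three times and
# majority-vote bitwise on read. Illustrative only; a real scheme would
# also want the copies placed in physically distant rows/pages.
class TripleStored:
    def __init__(self, value: int):
        self._copies = [value, value, value]

    def read(self) -> int:
        a, b, c = self._copies
        # Bitwise majority: a bit is 1 iff it is set in at least two copies.
        return (a & b) | (a & c) | (b & c)
```

Flipping a bit in one copy leaves `read()` unchanged, which is exactly the "three times to fix" behavior described above.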


In some safety-related contexts you do have bit-flip protection at the software level. E.g. you write every variable together with its complement, to see if a random bit flip occurred.

It's quite a lot of work, but if you have a long running embedded system, or a lot of the same, e.g. in cars, the incidence of bit flips increases.

This is of course in addition to other protection mechanisms.
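A minimal sketch of that complement scheme (my own illustrative code, assuming 32-bit values): store each word next to its bitwise complement, and verify the invariant on every read.

```python
MASK32 = 0xFFFFFFFF  # assuming 32-bit values for illustration

def store(value: int) -> tuple[int, int]:
    """Store a value alongside its bitwise complement."""
    return value & MASK32, ~value & MASK32

def load(pair: tuple[int, int]) -> int:
    """Any single bit flip in either word breaks the invariant."""
    value, complement = pair
    if value ^ complement != MASK32:
        raise RuntimeError("bit flip detected")
    return value
```

Note this only detects corruption; unlike the triple-copy approach, it cannot say which of the two words flipped, so recovery needs another mechanism.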


In enterprise server memory controller firmware we use soft and hard row repair algorithms.


I'm a CS student as well, curious what course taught you about ECC? Would be interested in such a course as well, the only course I can imagine teaching this is an information theory course (which my university doesn't offer).


Luckily all DDR5 DIMMs will have on-chip ECC. My understanding is it's not a complete mitigation but does make exploitation harder.


> My understanding is it's not a complete mitigation but does make exploitation harder.

It won't. It's designed to counter silicon limitations to increased density, i.e. it's made to correct the errors that result from packing cells beyond the limit of error-free operation.

The extra redundancy from on-chip ECC is intended to be "consumed" by the chip itself, and since this allows optimizing chip manufacturing to be denser and cheaper, there's no question at all that it will get pushed to the very limit.

There's still "classic" ECC for DDR5. 8 bits mapped to 9, terminated at the CPU which can look at things. That's what I want, need, and will buy.

P.S.: Shame on Intel for still walling off desktop CPUs from ECC. https://ark.intel.com/content/www/us/en/ark/search/featurefi...


> It won't. It's designed to counter silicon limitations to increased density, i.e. it's made to correct the errors that result from packing cells beyond the limit of error-free operation.

I'd love to see actual parameters for the error correction codes, but DDR5 could pretty easily be a lot more robust than DDR4.

When you have no error correction at all, you need ridiculously high reliability. Even if these new memory cells have a much higher error rate, if they're designed to seamlessly handle a few bits in the same row flipping, then the overall reliability could skyrocket.

Edit: Oh, there's a paper from micron talking about DDR5 only having single bit correction internally. That's not as useful as it could be against attacks...

> There's still "classic" ECC for DDR5. 8 bits mapped to 9

But Single Error Correction, Double Error Detection is not enough to prevent attacks.

Also because DDR5 uses a smaller width you actually need to map 8 bits to 10.


> I'd love to see actual parameters for the error correction codes

Well, the spec is $369, but a PDF with some early discussion (rev 0.1…) says:

> DDR5 devices will implement internal Single Error Correction (SEC) ECC to improve the data integrity within the DRAM. The DRAM will use 128 data bits to compute the ECC code of 8 ECC Check Bits.

However, I have no idea whether this is what got specified in the end. 8 ECC bits on 128 data bits would be half the amount of added redundancy compared to "classical" ECC. Note also it's not SECDED, just SEC, confirming the Micron paper you mentioned.
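For intuition on how 8 check bits can cover 128 data bits, here's a toy positional Hamming SEC construction (my own sketch, not the actual DDR5 internal code, whose construction isn't public): check bits sit at the power-of-two positions of a 136-bit codeword, and the syndrome of a corrupted word spells out the flipped bit's 1-based position.

```python
N, K = 136, 128                          # 128 data bits + 8 check bits
PARITY_POS = {1 << i for i in range(8)}  # positions 1, 2, 4, ..., 128

def encode(data):
    """data: list of 128 bits -> 136-bit Hamming SEC codeword."""
    code = [0] * (N + 1)                 # 1-indexed for the positional trick
    it = iter(data)
    for pos in range(1, N + 1):
        if pos not in PARITY_POS:
            code[pos] = next(it)
    syn = 0
    for pos in range(1, N + 1):
        if code[pos]:
            syn ^= pos
    for i in range(8):                   # set check bits so the syndrome is 0
        code[1 << i] = (syn >> i) & 1
    return code[1:]

def syndrome(code):
    """0 = no error; otherwise the 1-based position of a single flipped bit."""
    syn = 0
    for pos, bit in enumerate(code, 1):
        if bit:
            syn ^= pos
    return syn
```

A single flip is located and can be corrected in place; two flips decode to a wrong position, which is why plain SEC (without the extra DED parity bit) can't even reliably detect double errors.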

> But Single Error Correction, Double Error Detection is not enough to prevent attacks.

The OP suggested monitoring for a sudden increase in ECC events, which hopefully this would work for. It's not a perfect countermeasure, just a statistical one, but I'll take what I can get...

> Also because DDR5 uses a smaller width you actually need to map 8 bits to 10.

Now that kinda makes it better, doesn't it? :)


> Note also it's not SECDED, just SEC, confirming the Micron paper you mentioned.

Which is a shame because it would only cost 1 more bit to do SECDED.

> Now that kinda makes it better, doesn't it? :)

Very slightly, but 32->40 isn't much better than 64->72.

SECDED requires 39. I don't know if the 40th bit is used for anything? It would certainly be possible to use the 16 extra bits per burst to add a second layer of SECDED. Or an extra layer of triple error detection, if I'm remembering right. That would be really good against rowhammer.
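The arithmetic behind those widths, as a quick sketch: a Hamming SEC code needs r check bits with 2^r ≥ k + r + 1, and SECDED adds one overall parity bit on top.

```python
def secded_width(k: int) -> int:
    """Total SECDED codeword width for k data bits:
    smallest r with 2**r >= k + r + 1, plus one extra parity bit."""
    r = 1
    while (1 << r) < k + r + 1:
        r += 1
    return k + r + 1
```

This gives 39 bits for 32 data bits, 72 for 64 (the classic 8-bits-to-9 mapping), and 137 for 128.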


> SECDED requires 39. I don't know if the 40th bit is used for anything?

FWIW this is completely up to the CPU. "Classical" ECC is completely transparent to the DIMM, it just remembers and returns what the CPU gave to it.

32 → 40 has already been around for quite some time (particularly when the DDR controller only has 64 data lines but you still want ECC you can sometimes do 32+8 at half performance & max capacity) but I don't see anything beyond SECDED in the datasheets... (looked through some Freescale/NXP PowerPC bits)


> FWIW this is completely up to the CPU. "Classical" ECC is completely transparent to the DIMM, it just remembers and returns what the CPU gave to it.

True, but I guess I assumed normal ECC was standardized. Is it?

> 32 → 40 has already been around for quite some time (particularly when the DDR controller only has 64 data lines but you still want ECC

Is that keeping a burst length of 8 on DDR3/DDR4? One thing to consider is that a full memory transfer plus 16 extra bits is much more useful than a smaller transfer with 8 extra bits. That could easily be a tipping point from "not worth bothering" to "they really should do something here".


> True, but I guess I assumed normal ECC was standardized. Is it?

This is outside my area of expertise, but I believe everyone just does the same thing without it being a standard. (Especially since even CPU vendors frequently just buy a done-and-tested DDR controller design.)

> Is that keeping a burst length of 8 on DDR3/DDR4?

All the "classical" ECC just uses additional data lines, and I assumed DDR5 would do the same thing. I hadn't checked until a minute ago, but yes, DDR5 (like previous versions) just extends the width of the data bus by the ECC bits.


> All the "classical" ECC just uses additional data lines

I know that. But DDR5, at least under normal use, increases the number of bits sent across each data line in a single memory access. So now you have 16 spare bits across the entire thing, instead of just 8. This means that even if you had 32->40 ECC before, the value you could extract from the extra bits goes up a lot with DDR5.


Oh, sorry, I misunderstood. I don't think any implementation does that, since it would require holding back the entire burst in a buffer in the DDR controller. Which would significantly impact performance through the added latency.

But honestly I don't know — and also this seems like if some vendor wanted to take (or not take) the performance hit for extra safety, they could well deviate from everyone else.


To be 100% safe, it would add latency to part of the reads. Though not much compared to the normal speed of a memory access.

To be 99% safe you could send a signal a couple nanoseconds late that says "wait, this data is bad, you should abort".

It also might make writes a bit more awkward.


In theory, 8 bits per 128 is way better than 2 per 16, as it allows using a better error-correction code. This is all due to Shannon's theorem on noisy channels.

The problem is that this assumes the probability of flipping an individual bit is independent of the others. But this may not be the case, and if so, rowhammer is still possible.


That’s not exactly correct. Whilst it’s there mainly to allow for higher densities and frequencies, it’s designed to prevent bit flips from happening on chip.

It’s not end-to-end ECC, as in it doesn’t prevent flips that happen on the bus or in the CPU cache, but it does prevent single-bit errors in DRAM.


> it’s designed to prevent bit flips from happening on chip.

Bit flips that are getting more and more common as the RAM cells are getting tinier and tinier, the stored charges ever smaller and smaller, and thus susceptible to flipping…


I think on-chip ECC would mitigate this problem just as well as off-chip ECC. Off-chip ECC is meant to catch errors during transmission (i.e. 72 bits transmitted for 64 bit words), not necessarily just the ones that occur internal to the package.

I agree it's meant to counter limitations due to increased density, but it should catch this to an extent also as this error is induced on-package right, not during transmission. Or am I mistaken?


I think the point the parent is making is that this ECC is already fixing errors. There's no redundancy because the redundancy is already consumed by the defective cells in the chip. Any additional flips such as through rowhammer have no extra redundancy to fall back on in the general case with DDR5's built-in ECC.


Ah, I see. I didn’t view the errors as additive. Thanks for the clarification!


Yes, the article mentions it:

> What if I have ECC-capable DIMMs?

> Previous work showed that due to the large number of bit flips in current DDR4 devices, ECC cannot provide complete protection against Rowhammer but makes exploitation harder.


It sounds to me like ECC isn't being included in the DDR5 spec due to magnanimity so much as because it doesn't function without it. That ECC has become 'load-bearing'.

Does that mean we need an extended ECC to deal with critical systems that require additional robustness?


Who error checks the error checkers?


It's just a matter of time before someone finds a way to exploit the ECC part, calls it Hammerrow and brings us back to square one...


Rowhamming would be a better pun, as DDR5 uses a Hamming code for error correction.


> Luckily all DDR5 DIMMs will have on-chip ECC

Until someone starts building chips with fake ECC to boost profit.


From the article:

> What if I have ECC-capable DIMMs?

> Previous work[1] showed that due to the large number of bit flips in current DDR4 devices, ECC cannot provide complete protection against Rowhammer but makes exploitation harder.

1. https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8835222


This attack is great news for ECC. Now someone like Apple could launch a consumer product with “High Security Memory” and the rest of the industry can follow.


They already tried with the G5. Maybe they can try again?



