
OK, for starters, ECC has to become standard.

Then the rate of ECC errors has to be monitored. If something is trying a rowhammer attack, it's going to cause unwanted bit flips which the ECC will correct. Normally, the ECC error rate is very low. Under attack, it should go up. So an attack should be noticeable. You might get some false alarms, but that just means it's time to replace memory.
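As a rough sketch of what that monitoring could look like: on Linux, the EDAC subsystem already exposes corrected-error counters in sysfs. The path and the threshold below are illustrative assumptions, not a tuned detector — a real deployment would baseline the machine's normal correction rate first.

```python
import time
from pathlib import Path

# Linux EDAC exposes per-memory-controller corrected-error counters.
# The controller path and threshold here are illustrative only.
CE_COUNT = Path("/sys/devices/system/edac/mc/mc0/ce_count")

def anomalous(delta: int, threshold: int = 10) -> bool:
    """Flag a suspicious jump in corrected errors per polling interval."""
    return delta > threshold

def watch(poll_seconds: int = 60) -> None:
    last = int(CE_COUNT.read_text())
    while True:
        time.sleep(poll_seconds)
        now = int(CE_COUNT.read_text())
        if anomalous(now - last):
            print(f"ALERT: {now - last} corrected ECC errors in "
                  f"{poll_seconds}s -- possible Rowhammer activity")
        last = now
```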



While ECC memory is probably important and probably better than nothing, if there's one thing we've learned about Rowhammer it's that the obvious mitigations that "should" stop or detect it often fail to a clever attacker.

Just the first thing that popped into my head, but: say you watch the ECC correctable error rate over time, and somehow (not so easy!) determine which process is causing those errors. You forcibly kill the process and log a message about it, and also terminate/notify processes potentially affected (say, send them a SIGBUS or something and unmap the pages containing the affected data).

I, a "clever" attacker, use this to leak out your secrets: I do my hammering juuust right so that, if some secret bit is 0, my hammering flips bits (triggering ECC corrections), while if it's 1 my hammering doesn't affect anything. Lovely little side-channel.

Universal memory encryption and authentication seems to be the only sure way out of the cycle of "attack, mitigation, attack the mitigation", and it's already starting to roll out.


> I, a "clever" attacker, use this to leak out your secrets: I do my hammering juuust right so that, if some secret bit is 0, my hammering flips bits (triggering ECC corrections), while if it's 1 my hammering doesn't affect anything. Lovely little side-channel.

Would ASLR make this harder? I assume it would because it'd be a lot harder to get to the correct memory location (?).


ASLR adds nothing against attacks that rely on the physical location of pages, since physical pages are already allocated independently of the virtual address space.


Noob question: With DDR5's option for in-chip ECC, if Animats' suggestion of monitoring ECC anomalies is implemented on a control unit on the DIMM module, then will that make the attack impossible?


When I was a CS student, a little more than a decade ago, we learnt what ECC was and how it worked. Interestingly, I remember that we were taught that ECC was and should be everywhere.

From physical storage to data transfer protocols, we were told you really shouldn’t bypass ECC anywhere if you wanted things to work reliably against the hard physical world. It was like a basic CS rule.

And still, here we are, still arguing that some reliability needs to be built into components as critical as DRAM chips.


Maybe this is common knowledge, but can ECC be done in software? I don’t necessarily see why not, despite the common-sense wisdom that it has to be done in hardware. With DRAM so cheap these days, why not just go full RAID 1 (where ECC is analogous to a RAID-5 implementation), and store each critical-path memory byte twice to detect bit flips, or three times to fix them? Let’s see row hammer attacks flip the same bit in two or three places at the same time. Sure, it would impact performance, but maybe for something like encryption or banking it’s an acceptable trade-off.
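As a toy sketch of the triple-copy idea (the class and layout are my own illustration, not an existing library): keep three copies of each value and majority-vote bitwise on read, so a flip in any single copy is outvoted.

```python
# Toy software triple-redundancy: store each value three times and
# majority-vote bitwise on read. Illustrative only; a real scheme would
# also want the copies placed in physically distant rows/pages.
class TripleStored:
    def __init__(self, value: int):
        self._copies = [value, value, value]

    def read(self) -> int:
        a, b, c = self._copies
        # Bitwise majority: a bit is 1 iff it is set in at least two copies.
        return (a & b) | (a & c) | (b & c)
```

Flipping a bit in one copy leaves `read()` unchanged, which is exactly the "three times to fix" behavior described above.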


In some safety-related contexts you do have bit-flip protection at the software level. E.g. you write every variable together with its complement, to see if a random bit flip occurred.

It's quite a lot of work, but if you have a long running embedded system, or a lot of the same, e.g. in cars, the incidence of bit flips increases.

This is of course in addition to other protection mechanisms.
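A minimal sketch of that complement scheme (my own illustrative code, assuming 32-bit values): store each word next to its bitwise complement, and verify the invariant on every read.

```python
MASK32 = 0xFFFFFFFF  # assuming 32-bit values for illustration

def store(value: int) -> tuple[int, int]:
    """Store a value alongside its bitwise complement."""
    return value & MASK32, ~value & MASK32

def load(pair: tuple[int, int]) -> int:
    """Any single bit flip in either word breaks the invariant."""
    value, complement = pair
    if value ^ complement != MASK32:
        raise RuntimeError("bit flip detected")
    return value
```

Note this only detects corruption; unlike the triple-copy approach, it cannot say which of the two words flipped, so recovery needs another mechanism.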


In enterprise server memory controller firmware we use soft and hard row repair algorithms.


I'm a CS student as well, curious what course taught you about ECC? Would be interested in such a course as well, the only course I can imagine teaching this is an information theory course (which my university doesn't offer).


Luckily all DDR5 DIMMs will have on-chip ECC. My understanding is it's not a complete mitigation but does make exploitation harder.


> My understanding is it's not a complete mitigation but does make exploitation harder.

It won't. It's designed to counter silicon limitations to increased density, i.e. it's made to correct the errors that result from packing cells beyond the limit of error-free operation.

The extra redundancy from on-chip ECC is intended to be "consumed" by the chip itself, and since this allows optimizing chip manufacturing to be denser and cheaper, there's no question at all that it will get pushed to the very limit.

There's still "classic" ECC for DDR5. 8 bits mapped to 9, terminated at the CPU which can look at things. That's what I want, need, and will buy.

P.S.: Shame on Intel for still walling off desktop CPUs from ECC. https://ark.intel.com/content/www/us/en/ark/search/featurefi...


> It won't. It's designed to counter silicon limitations to increased density, i.e. it's made to correct the errors that result from packing cells beyond the limit of error-free operation.

I'd love to see actual parameters for the error correction codes, but DDR5 could pretty easily be a lot more robust than DDR4.

When you have no error correction at all, you need ridiculously high reliability. Even if these new memory cells have a much higher error rate, if they're designed to seamlessly handle a few bits in the same row flipping, then the overall reliability could skyrocket.

Edit: Oh, there's a paper from micron talking about DDR5 only having single bit correction internally. That's not as useful as it could be against attacks...

> There's still "classic" ECC for DDR5. 8 bits mapped to 9

But Single Error Correction, Double Error Detection is not enough to prevent attacks.

Also because DDR5 uses a smaller width you actually need to map 8 bits to 10.


> I'd love to see actual parameters for the error correction codes

Well, the spec is $369, but a PDF with some early discussion (rev 0.1…) says:

> DDR5 devices will implement internal Single Error Correction (SEC) ECC to improve the data integrity within the DRAM. The DRAM will use 128 data bits to compute the ECC code of 8 ECC Check Bits.

However, I have no idea whether this is what got specified in the end. 8 ECC bits on 128 data bits would be half the amount of added redundancy compared to "classical" ECC. Note also it's not SECDED, just SEC, confirming the Micron paper you mentioned.
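For intuition on how 8 check bits can cover 128 data bits, here's a toy positional Hamming SEC construction (my own sketch, not the actual DDR5 internal code, whose construction isn't public): check bits sit at the power-of-two positions of a 136-bit codeword, and the syndrome of a corrupted word spells out the flipped bit's 1-based position.

```python
N, K = 136, 128                          # 128 data bits + 8 check bits
PARITY_POS = {1 << i for i in range(8)}  # positions 1, 2, 4, ..., 128

def encode(data):
    """data: list of 128 bits -> 136-bit Hamming SEC codeword."""
    code = [0] * (N + 1)                 # 1-indexed for the positional trick
    it = iter(data)
    for pos in range(1, N + 1):
        if pos not in PARITY_POS:
            code[pos] = next(it)
    syn = 0
    for pos in range(1, N + 1):
        if code[pos]:
            syn ^= pos
    for i in range(8):                   # set check bits so the syndrome is 0
        code[1 << i] = (syn >> i) & 1
    return code[1:]

def syndrome(code):
    """0 = no error; otherwise the 1-based position of a single flipped bit."""
    syn = 0
    for pos, bit in enumerate(code, 1):
        if bit:
            syn ^= pos
    return syn
```

A single flip is located and can be corrected in place; two flips decode to a wrong position, which is why plain SEC (without the extra DED parity bit) can't even reliably detect double errors.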

> But Single Error Correction, Double Error Detection is not enough to prevent attacks.

The OP suggested monitoring for a sudden increase in ECC events, which hopefully this would work for. It's not a perfect countermeasure, just a statistical one, but I'll take what I can get...

> Also because DDR5 uses a smaller width you actually need to map 8 bits to 10.

Now that kinda makes it better, doesn't it? :)


> Note also it's not SECDED, just SEC, confirming the Micron paper you mentioned.

Which is a shame because it would only cost 1 more bit to do SECDED.

> Now that kinda makes it better, doesn't it? :)

Very slightly, but 32->40 isn't much better than 64->72.

SECDED requires 39. I don't know if the 40th bit is used for anything? It would certainly be possible to use the 16 extra bits per burst to add a second layer of SECDED. Or an extra layer of triple error detection, if I'm remembering right. That would be really good against rowhammer.
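The arithmetic behind those widths, as a quick sketch: a Hamming SEC code needs r check bits with 2^r ≥ k + r + 1, and SECDED adds one overall parity bit on top.

```python
def secded_width(k: int) -> int:
    """Total SECDED codeword width for k data bits:
    smallest r with 2**r >= k + r + 1, plus one extra parity bit."""
    r = 1
    while (1 << r) < k + r + 1:
        r += 1
    return k + r + 1
```

This gives 39 bits for 32 data bits, 72 for 64 (the classic 8-bits-to-9 mapping), and 137 for 128.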


> SECDED requires 39. I don't know if the 40th bit is used for anything?

FWIW this is completely up to the CPU. "Classical" ECC is completely transparent to the DIMM, it just remembers and returns what the CPU gave to it.

32 → 40 has already been around for quite some time (particularly when the DDR controller only has 64 data lines but you still want ECC you can sometimes do 32+8 at half performance & max capacity) but I don't see anything beyond SECDED in the datasheets... (looked through some Freescale/NXP PowerPC bits)


> FWIW this is completely up to the CPU. "Classical" ECC is completely transparent to the DIMM, it just remembers and returns what the CPU gave to it.

True, but I guess I assumed normal ECC was standardized. Is it?

> 32 → 40 has already been around for quite some time (particularly when the DDR controller only has 64 data lines but you still want ECC

Is that keeping a burst length of 8 on DDR3/DDR4? One thing to consider is that a full memory transfer plus 16 extra bits is much more useful than a smaller transfer with 8 extra bits. That could easily be a tipping point from "not worth bothering" to "they really should do something here".


> True, but I guess I assumed normal ECC was standardized. Is it?

This is outside my area of expertise, but I believe everyone just does the same thing without it being a standard. (Especially since even CPU vendors frequently just buy a done-and-tested DDR controller design.)

> Is that keeping a burst length of 8 on DDR3/DDR4?

All the "classical" ECC just uses additional data lines, and I assumed DDR5 would do the same thing. I hadn't checked until a minute ago, but yes, DDR5 (like previous versions) just extends the width of the data bus by the ECC bits.


> All the "classical" ECC just uses additional data lines

I know that. But DDR5, at least under normal use, increases the number of bits sent across each data line in a single memory access. So now you have 16 spare bits across the entire thing, instead of just 8. This means that even if you had 32->40 ECC before, the value you could extract from the extra bits goes up a lot with DDR5.


Oh, sorry, I misunderstood. I don't think any implementation does that, since it would require holding back the entire burst in a buffer in the DDR controller. Which would significantly impact performance through the added latency.

But honestly I don't know — and also this seems like if some vendor wanted to take (or not take) the performance hit for extra safety, they could well deviate from everyone else.


To be 100% safe, it would add latency to part of the reads. Though not much compared to the normal speed of a memory access.

To be 99% safe you could send a signal a couple nanoseconds late that says "wait, this data is bad, you should abort".

It also might make writes a bit more awkward.


In theory, 8 bits per 128 is way better than 2 per 16, as it allows using a better error-correction code. This is all due to Shannon's theorem on noisy channels.

The problem is that this assumes the probability of flipping an individual bit is independent of the others. But this may not be the case, and if so, rowhammer is still possible.


That’s not exactly correct. Whilst it’s there mainly to allow for higher densities and frequencies, it’s designed to prevent bit flips from happening on chip.

It’s not end-to-end ECC, as in it doesn’t prevent flips that happen on the bus or in the CPU cache, but it does prevent single-bit errors in DRAM.


> it’s designed to prevent bit flips from happening on chip.

Bit flips that are getting more and more common as the RAM cells are getting tinier and tinier, the stored charges ever smaller and smaller, and thus susceptible to flipping…


I think on-chip ECC would mitigate this problem just as well as off-chip ECC. Off-chip ECC is meant to catch errors during transmission (i.e. 72 bits transmitted for 64 bit words), not necessarily just the ones that occur internal to the package.

I agree it's meant to counter limitations due to increased density, but it should catch this to an extent also as this error is induced on-package right, not during transmission. Or am I mistaken?


I think the point the parent is making is that this ECC is already fixing errors. There's no redundancy because the redundancy is already consumed by the defective cells in the chip. Any additional flips such as through rowhammer have no extra redundancy to fall back on in the general case with DDR5's built-in ECC.


Ah, I see. I didn’t view the errors as additive. Thanks for the clarification!


Yes, the article mentions it:

> What if I have ECC-capable DIMMs?

> Previous work showed that due to the large number of bit flips in current DDR4 devices, ECC cannot provide complete protection against Rowhammer but makes exploitation harder.


It sounds to me like ECC isn't being included in the DDR5 spec due to magnanimity so much as because it doesn't function without it. That ECC has become 'load-bearing'.

Does that mean we need an extended ECC to deal with critical systems that require additional robustness?


Who error checks the error checkers?


It's just a matter of time before someone finds a way to exploit the ECC part, calls it Hammerrow and brings us back to square one...


Rowhamming would be a better pun, as DDR5 uses a Hamming code for error correction.


> Luckily all DDR5 DIMMs will have on-chip ECC

Until someone starts building chips with fake ECC to boost profit.


From the article:

> What if I have ECC-capable DIMMs?

> Previous work[1] showed that due to the large number of bit flips in current DDR4 devices, ECC cannot provide complete protection against Rowhammer but makes exploitation harder.

1. https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8835222


This attack is great news for ECC. Now someone like Apple could launch a consumer product with “High Security Memory” and the rest of the industry can follow.


They already tried with the G5. Maybe they can try again?



