LDM/STM are awful for modern cores, and were even microcoded on at least the early ARM cores despite those being, you know, RISC.
As for why they're awful, it's the number of loads and stores that can be in flight for a single instruction when an exception is taken, and how the instruction is restarted afterwards. ARM-M cores make this a little better by exposing "I've executed this many loads so far in the LDM" in architectural state, so the instruction can be restarted without going back and reissuing loads (which lets it be used in MMIO ranges), but the instruction really only shines in simple, in-order designs. They're almost as bad to implement on modern cores as the load-indirect instructions you see on some older CISC chips.
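To make that concrete, here's a sketch (arbitrary register choices, ARM-M style syntax) of what a single LDM is hiding:

```
    ; One instruction, eight loads in flight:
    LDMIA   r0!, {r4-r11}

    ; Morally equivalent to eight separate loads:
    ;   LDR r4, [r0], #4
    ;   LDR r5, [r0], #4
    ;   ... and so on through r11.
    ;
    ; If an interrupt lands partway through, the core either has to
    ; replay the whole LDM (bad news if [r0] points at MMIO with
    ; read side effects) or, on ARMv7-M, stash its progress in the
    ; EPSR ICI bits so it can resume mid-instruction.
```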
Additionally, LDM/STM really shone in cache-less or cache-poor designs where instruction fetch competes for memory bandwidth with the transfer itself. That doesn't really apply to modern cores with their fairly Harvard-looking memory access patterns, so getting rid of these instructions isn't the biggest deal in the world.
So to answer your question, they absolutely could have done that, but chose to use the transition to AArch64 to remove albatrosses like LDM/STM from around their necks, because they're more trouble than they're worth from a hardware perspective on modern OoO cores. The LDP/STP instructions are the bone they throw you to improve instruction density for transfers to/from the registers, but they really don't want any one instruction being responsible for more than a single memory transfer, for core-internal bookkeeping reasons.
LDM/STM were also the source of some really wacky hardware bugs on some STM32 microcontrollers, such as:
> If an interrupt occurs during an CPU AHB burst read access to an end of SDRAM row, it may result in wrong data read from the next row if all the conditions below are met:
> • The SDRAM data bus is 16-bit or 8-bit wide. 32-bit SDRAM mode is not affected.
> • RBURST bit is reset in the FMC_SDCR1 register (read FIFO disabled).
> • An interrupt occurs while CPU is performing an AHB incrementing bursts read access of unspecified length (using LDM = Load Multiple instruction).
> • The address of the burst operation includes the end of an SDRAM row.
FWIW, disabling the read FIFO like that would be a really goofy choice, but yeah, very good point. These are very special-cased instructions that don't even behave like the DMA transfers you might expect them to resemble.
There's a reason for that. Another erratum for the same part explains that:
> If an interrupt occurs during an CPU AHB burst read access to one SDRAM internal bank followed by a second read to another SDRAM internal bank, it may result in wrong data read if all the conditions below are met:
> • SDRAM read FIFO enabled. RBURST bit is set in the FMC_SDCR1 register
> • An interrupt occurs while CPU is performing an AHB incrementing bursts read access of unspecified length (using LDM = Load Multiple instruction) to one SDRAM internal bank and followed by another CPU read access to another SDRAM internal bank.
Could it also be the case that this is rendered partially obsolete by vector instructions? Obviously vector loads/stores don't cover all these cases, but I have to imagine they cover quite a few, and without all the bookkeeping (who knew loading one big thing would be so much easier to keep track of than loading a handful of tiny things).
No, because you still want fairly dense dumps of registers out to cache for function prologues. So blits from the integer register file still show up in your profile traces, hence LDP/STP.
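For example, a typical AArch64 prologue spills callee-saved registers two at a time (a hand-written sketch, not any particular compiler's output):

```
fn:
    stp     x29, x30, [sp, #-48]!   ; frame pointer + link register, pre-indexed
    stp     x19, x20, [sp, #16]     ; callee-saved pairs
    stp     x21, x22, [sp, #32]
    mov     x29, sp
```

Each STP is a single 16-byte transfer, so the core never has more than one memory operation's worth of restart state to track per instruction.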