A lot of time can be wasted by poor debounce implementations. Cherry claims <8 ms of bounce for its switches IIRC, but you only need to debounce one of the edges. As soon as you see the key start bouncing, you know it has been activated, so you can register the press immediately and do the debounce afterwards. That way you only delay the registration of the release. By using a shift register you can also cut that delay down to the actual bounce time instead of a predetermined delay. I (personally) don't know of any commercial keyboards that implement it this way, but given that their software is closed source, it is very possible that some do use this algorithm.
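To make the idea concrete, here is a minimal sketch of "register the press on the first edge, debounce only the release" with an 8-bit shift register per key. The names (`key_debounce_t`, `key_debounce_update`) and the 1 ms sample interval are my own assumptions for illustration, not taken from TMK, QMK or any other firmware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-key state: the last 8 raw samples (1 = contact closed) and the
 * debounced state that would be reported to the host. */
typedef struct {
    uint8_t history;
    bool pressed;
} key_debounce_t;

/* Feed one raw sample. Assumed to be called at a fixed interval
 * (for example every 1 ms). Returns true when the debounced state changed. */
static bool key_debounce_update(key_debounce_t *k, bool raw_closed)
{
    /* Shift the newest sample into the history register. */
    k->history = (uint8_t)((k->history << 1) | (raw_closed ? 1u : 0u));

    if (!k->pressed && raw_closed) {
        /* First closed sample: register the press with no added delay. */
        k->pressed = true;
        return true;
    }
    if (k->pressed && k->history == 0) {
        /* 8 consecutive open samples: the contact has settled open. */
        k->pressed = false;
        return true;
    }
    return false;
}
```

With 1 ms sampling, the release is reported 8 samples after the last closed sample, so the delay follows wherever the bounce actually ends rather than a timer started at the first edge. The trade-off is that the press path has no noise filtering at all: a single spurious closed sample registers a key press.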
Not only that, but the matrix scanning could be done via interrupts instead of via polling as is typically[0] done. An interrupt could fire on a level change on any of the column GPIOs; the MCU then needs to scan the rows to find the pressed key. After the interrupt, the MCU will need to poll until all the keys are released. So that's another source of latency that could be improved.
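A rough sketch of that flow, with the hardware simulated so it runs on a host. Everything here is hypothetical: `rows_drive`, `cols_read` and the `sim_*` variables stand in for real GPIO access, `on_column_change` stands in for the pin-change ISR, and diodes, ghosting and debounce are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define ROWS 4
#define COLS 4

/* Hypothetical hardware layer, simulated here. On a real MCU these
 * would be GPIO register reads and writes. */
static bool sim_switch[ROWS][COLS];  /* true = switch closed */
static uint8_t sim_row_mask;         /* rows currently driven active */

static void rows_drive(uint8_t mask) { sim_row_mask = mask; }

static uint8_t cols_read(void)
{
    uint8_t cols = 0;
    for (int r = 0; r < ROWS; r++) {
        if (!(sim_row_mask & (1u << r))) continue;
        for (int c = 0; c < COLS; c++) {
            if (sim_switch[r][c]) cols |= (uint8_t)(1u << c);
        }
    }
    return cols;
}

/* Scanning logic. */
static bool irq_armed;               /* column-change interrupt enabled? */
static uint8_t matrix[ROWS];         /* per-row bitmask of closed columns */

/* Idle state: drive every row so any key press changes a column. */
static void matrix_arm(void)
{
    rows_drive((uint8_t)((1u << ROWS) - 1u));
    irq_armed = true;
}

/* Drive one row at a time. Returns true if any key is closed. */
static bool matrix_scan(void)
{
    bool any = false;
    for (int r = 0; r < ROWS; r++) {
        rows_drive((uint8_t)(1u << r));
        matrix[r] = cols_read();
        if (matrix[r] != 0) any = true;
    }
    return any;
}

/* Would be called from the column pin-change ISR. */
static void on_column_change(void)
{
    irq_armed = false;               /* fall back to polling while keys are held */
    matrix_scan();
}

/* Called periodically while the interrupt is disarmed. */
static void poll_tick(void)
{
    if (!irq_armed && !matrix_scan()) {
        matrix_arm();                /* everything released: back to interrupt mode */
    }
}
```

The interrupt only tells you that some column changed, so the row scan is still needed to locate the key, and polling takes over until the matrix is empty again.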
I think the only source of polling left is in USB, but I think that's inherent to the USB protocol (someone correct me if I'm wrong here). Without the USB polling, I think it would be possible to have key-press-to-USB-packet completely interrupt-driven, which should make the latency in the keyboard itself negligible.
[0]: I say typically but, like you said, a lot of implementations are closed source, so who knows. All of the discussions I've seen on matrix scanning use the polling method, and the open source implementations use polling as well (e.g. Keyberon[1]).
It’s counter-intuitive (and you probably already know this but for the benefit of others) but depending on frequency and CPU availability, polling can have a lower latency than interrupts. You just have a much smaller window of time to execute your handling code (unless you don’t care).
It’s actually partially why (some?) hard real-time systems eschew interrupts altogether. They introduce a source of non-determinism into the mix as an interrupt handler can stall out non-interrupt performance-sensitive code (or starve it).
The few open source implementations I have looked at scan the matrix as fast as the processor allows. So using interrupts can reduce power consumption a lot but, depending on the actual matrix layout, it probably won't give any worthwhile speedup in the nominal case, as you still have to scan the rows after the interrupt.
AFAIR, the USB poll rate is a function of the bus speed and what the endpoint reports/requests. Low-speed, full-speed and high-speed all have different poll rates.
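For reference, here is how I read the USB 2.0 rules for interrupt endpoints, as a small sketch. The function name and enum are made up for illustration. The period comes from the endpoint descriptor's `bInterval` field, and the host is allowed to poll more often than requested, so treat the result as an upper bound on the interval rather than what a given host will actually do.

```c
#include <stdint.h>

typedef enum { USB_LOW_SPEED, USB_FULL_SPEED, USB_HIGH_SPEED } usb_speed_t;

/* Requested polling period in microseconds for an interrupt endpoint,
 * derived from bInterval. Returns 0 for an out-of-range bInterval. */
static uint32_t usb_interrupt_period_us(usb_speed_t speed, uint8_t b_interval)
{
    if (b_interval == 0) return 0;

    switch (speed) {
    case USB_LOW_SPEED:
        /* Units of 1 ms frames; low-speed is specified as 10..255 ms,
         * so smaller requests are clamped here (my assumption about
         * how to treat them, real hosts may differ). */
        return (uint32_t)(b_interval < 10 ? 10 : b_interval) * 1000u;
    case USB_FULL_SPEED:
        /* Units of 1 ms frames, 1..255. */
        return (uint32_t)b_interval * 1000u;
    case USB_HIGH_SPEED:
        /* 2^(bInterval-1) microframes of 125 us each, bInterval 1..16. */
        if (b_interval > 16) return 0;
        return 125u << (b_interval - 1);
    }
    return 0;
}
```

So a full-speed keyboard asking for `bInterval = 1` gets polled every 1 ms at best, while a high-speed device can ask for 125 us.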
That's what I thought when I set up the firmware for my handwired keyboard using QMK: just trigger on the edge. But then flipping the exhaust fan switch in the room next door would occasionally produce errant keystrokes. Not cool, so I decided I could live with a few ms more latency. Maybe a proper PCB, like in a bought keyboard, would be more noise-immune though.
It was probably the best you could do in your position, but debounce is a hack to work around EMI and not the actual solution. Ideally, the circuit should be electrically resilient to EMI and the debounce should only be handling mechanical noise from the switch itself and not concern itself with any outside interference.
I made an implementation of what I describe for TMK years ago. There is possibly a problem with the hardware design of your keyboard, probably pull-ups that are too weak coupled with the EMI susceptibility of the hand-wiring. I am having no issues with false triggers on my end.