
The technical report says (page 7):

> Our architectural choices are closely aligned with principles observed in biological brains.

How? They point out three design choices: linear attention, MoE layers, and spike coding.

Apparently linear attention is brain-inspired because it can be viewed as a "simplified abstraction of dendritic dynamics with multi-branch morphology." Who knows what that means exactly [1]. They don't discuss it further. MoE layers apparently reflect "a principle of modular specialization." Fine, whatever.

Now, using a dozen attention variants + MoE is bog standard. The real novelty would be spike coding. Page 11 is dedicated to the different ways they could turn signals into spike trains, including such biologically inspired mechanisms as two's complement. However, they don't actually do spike coding in the time domain: in their implementation, "spike coding" apparently means turning activations into integers. Section 3.3.3 claims this lets you simulate an underlying spiking neural network, so the spiking approach can be validated without special hardware. But if your SNN can be simulated faithfully on a GPU by turning things into integers, isn't that a bit of a depressing SNN?
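
For what it's worth, here's a minimal NumPy sketch of what I take "integer activations as spike counts" to mean. The function names, quantization scale, and rate-coding scheme are all my own guesses, not anything from the report; the point is only that an integer activation k can be read as "k spikes over a window of T timesteps," and summing the train over time gives the integer back.

    import numpy as np

    # My own illustration of "integer activations as spike counts" --
    # the scale, clipping range, and firing scheme are hypothetical,
    # not taken from the report.

    def quantize(acts, scale=8.0, max_int=15):
        """Quantize real-valued activations to non-negative integer 'spike counts'."""
        return np.clip(np.round(acts * scale), 0, max_int).astype(np.int64)

    def to_spike_train(counts, T=16):
        """Unroll each count k into a binary train that fires in the first k of T timesteps."""
        t = np.arange(T)
        return (t[None, :] < counts[:, None]).astype(np.int64)

    acts = np.array([0.05, 0.4, 1.2, 0.9])
    counts = quantize(acts)          # -> [0, 3, 10, 7]
    spikes = to_spike_train(counts)  # shape (4, 16), zeros and ones

    # Summing over time recovers the integers exactly, so nothing about the
    # "spiking" view requires neuromorphic hardware or time-domain simulation.
    assert np.array_equal(spikes.sum(axis=1), counts)

If the equivalence is that exact, the only free choice is how you spread the k spikes across the window, and since only the sum matters, any scheme gives the same answer. At that point the integer tensor is the model and the spikes are just a way of narrating it.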

Either I'm missing something, or this is just just dressing standard techniques with loads of meaningless jargon. Of course that’s a very popular way to operate in deep learning nowadays.

[1] Like, attention can draw from multiple tokens, sort of like how different spines of a dendrite can draw from multiple axons? Can’t make this stuff up.




