How does ChaCha20 compare to the established AES standard? Is it stronger? weake...

tptacek · on Jan 10, 2017

Chacha/Salsa is:

* Intrinsically simpler than AES

* Easier to implement

* As an ARX design, doesn't need S-boxes, and so doesn't leave a cache footprint

* Has free key setup
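
To make the ARX point concrete, here is a minimal Python sketch of the ChaCha quarter round, the core operation the cipher repeats. It uses only 32-bit addition, rotation, and XOR, so there is no table lookup indexed by secret data. The helper names (`rotl32`, `quarter_round`) are illustrative and not taken from any particular library.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(a, b, c, d):
    """ChaCha quarter round: Add, Rotate, XOR on four 32-bit words."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d
```

With the quarter-round test vector from RFC 8439 section 2.1.1 as input, `quarter_round(0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567)` returns `(0xea2a92f4, 0xcb1cf8ce, 0x4581472e, 0x5881c4bb)`.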

AES is:

* A global standard

* Available in hardware on most platforms (extremely important)

* A conventional block cipher for which a bunch of modes (in particular: wide-block and AEAD) are already defined

But unlike Salsa, AES:

* Has a relatively complicated key schedule (you have to expand its key input to a series of per-round keys, which imposes a cost when you switch keys)

* Relies on S-boxes for security and so must carefully avoid microarchitectural side channels

* Is much harder to implement

* Is not a native stream cipher, so requires an adapter (usually: GCM mode) to use safely.
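
As a rough illustration of what "free key setup" means on the ChaCha side: the key words are copied straight into the 16-word initial state, with no expansion into round keys. The sketch below assumes the IETF layout (32-bit counter, 96-bit nonce); the function name `chacha_initial_state` is made up for this example.

```python
import struct

SIGMA = b"expand 32-byte k"  # fixed constant used with 256-bit keys


def chacha_initial_state(key, counter, nonce):
    """Build the 16-word ChaCha20 state (assumed IETF layout).

    Words 0-3: constant, 4-11: key, 12: block counter, 13-15: nonce.
    """
    if len(key) != 32 or len(nonce) != 12:
        raise ValueError("expected a 32-byte key and a 12-byte nonce")
    return (
        list(struct.unpack("<4I", SIGMA))
        + list(struct.unpack("<8I", key))   # key is used as-is, no schedule
        + [counter & 0xFFFFFFFF]
        + list(struct.unpack("<3I", nonce))
    )
```

Switching keys only means writing eight different words into the state, whereas AES has to rerun its key expansion first.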

AES is usually faster on modern systems because it's implemented directly in silicon. Salsa is usually the fastest pure-software option. Both are so fast that the speed difference is not particularly important, but most systems will prefer AES when hardware support is present.

Salsa is almost certainly the better choice for new designs just because of its simplicity. It's harder to screw up Salsa20 or its derivatives than it is to screw up AES (it is very easy to screw up AES), and its performance is more than satisfactory.

nullc · on Jan 11, 2017

> Both are so fast that the speed difference is not particularly important,

Without hardware support, timing-attack-resistant AES is not so fast.

(and then there is the adventure of many motherboards shipping with hardware AES disabled in the bios...)

tptacek · on Jan 11, 2017

Without hardware support AES isn't competitive with Salsa20 anyways.

the8472 · on Jan 10, 2017

> * A global standard

ChaCha is standard enough to make it into TLS and IPSec

stouset · on Jan 10, 2017

Even then, they actually used a tweaked version of ChaCha20 that uses a 96-bit nonce (just barely large enough to be suitable for randomly-generated nonces) and a 32-bit counter (limiting its use to 256 GiB for a given nonce, since 2^32 blocks of 64 bytes is 2^38 bytes). Also, an extension XChaCha20 was recently published which performs an extra 20 rounds to initialize the cipher state, allowing for 192-bit nonces with no corresponding reduction in counter size.

So now there are three variants of ChaCha20:

  * ChaCha20 (256-bit key, 64-bit nonce, 64-bit counter)
  * IETF ChaCha20 (256-bit key, 96-bit nonce, 32-bit counter)
  * XChaCha20 (256-bit key, 192-bit nonce, 64-bit counter)
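
For what it's worth, the per-nonce limit follows directly from the counter width, since each block is 64 bytes of keystream. A quick sketch of the arithmetic, using the nonce and counter sizes from the list above (the dictionary and function names are just for illustration):

```python
BLOCK_BYTES = 64  # each ChaCha20 block yields 64 bytes of keystream


def max_bytes_per_nonce(counter_bits):
    """Keystream bytes available before the block counter wraps."""
    return (1 << counter_bits) * BLOCK_BYTES


VARIANTS = {
    "ChaCha20 (original)": {"nonce_bits": 64, "counter_bits": 64},
    "IETF ChaCha20": {"nonce_bits": 96, "counter_bits": 32},
    "XChaCha20": {"nonce_bits": 192, "counter_bits": 64},
}

for name, v in VARIANTS.items():
    gib = max_bytes_per_nonce(v["counter_bits"]) // 2**30
    print(f"{name}: {gib} GiB of keystream per nonce")
```

For the 32-bit counter that works out to 2^32 * 64 = 2^38 bytes, i.e. 256 GiB.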

loup-vaillant · on Jan 11, 2017

> an extension XChaCha20 was recently published

It has? With test vectors and all? I want that, do you have a link?

stouset · on Jan 11, 2017

I could have sworn I saw a paper on this recently. I may have hallucinated it.

Edit: Shit, considering it further, what I was remembering was the recent paper on BLAKE2X, not XChaCha20.

tptacek · on Jan 10, 2017

Yes, but that's true of all sorts of things that aren't really global standards. Don't get me wrong: you should use Salsa ciphers. I'm just trying to provide the most honest possible accounting.

wolf550e · on Jan 10, 2017

AES is hard to implement on a general purpose computer in a way that is both fast and doesn't leak through cache timing attacks.

The safe way to use AES is by using a hardware implementation, like modern x86 and some ARM CPUs.

The best software implementations use bitslicing and SSE, but are still slow. The best I've seen is a 2009 paper by Emilia Käsper and Peter Schwabe[1] on bitsliced AES-GCM, which reports 21.99 cycles/byte for a constant-time implementation of authenticated AES-GCM.

For comparison, Intel shows[2] 0.77 cycles/byte for same with a hardware implementation, albeit on a newer CPU.

Chacha is fast on modern general purpose CPUs without the need for a hardware implementation of chacha. One reason it's fast is that it was designed so that a normal compiler can turn regular-looking C code into machine code that uses vector (wide) registers and keeps as many of the CPU's execution units as possible busy with independent operations in the same clock cycle, without requiring an assembly wizard to do that. Intel can afford assembly wizards (i.e. Shay Gueron), other people can't.

Modern TLS stacks prefer AES when running on a CPU that has AES hardware and fall back to chacha otherwise. They of course fall back to either a slow or an insecure implementation of AES if the other side doesn't support chacha.

1 - https://eprint.iacr.org/2009/129

2 - https://software.intel.com/en-us/articles/improving-openssl-...

floodyberry- · on Jan 10, 2017

Basic Chacha C implementations do not get auto-vectorized down to ultra-efficient code. The most efficient implementations are intrinsic/assembler that process 4 (SSE2/AVX/NEON) or 8 (AVX2) Chacha20 blocks at once. This is due to the layout of variables and operations being designed for efficient SIMD use and the blocks being independent of each other. (Shay's Chacha20 implementation is also not the fastest!)
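
A toy model of the "several blocks at once" layout, in Python rather than intrinsics: each variable holds the same state word from several independent blocks, one per lane, and every step is applied across all lanes. Plain lists stand in for SIMD registers here, so this only shows the data layout, not the speed; the function names are invented for the sketch.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def add_lanes(x, y):
    """Lane-wise 32-bit addition (one lane per block)."""
    return [(p + q) & MASK32 for p, q in zip(x, y)]


def xor_rotl_lanes(x, y, n):
    """Lane-wise XOR followed by a left rotation by n bits."""
    return [rotl32(p ^ q, n) for p, q in zip(x, y)]


def quarter_round_lanes(a, b, c, d):
    """Quarter round where a, b, c, d each hold one word per block."""
    a = add_lanes(a, b)
    d = xor_rotl_lanes(d, a, 16)
    c = add_lanes(c, d)
    b = xor_rotl_lanes(b, c, 12)
    a = add_lanes(a, b)
    d = xor_rotl_lanes(d, a, 8)
    c = add_lanes(c, d)
    b = xor_rotl_lanes(b, c, 7)
    return a, b, c, d
```

Because no lane ever reads another lane's data, a 4-lane or 8-lane vector unit can run the same instruction stream on 4 or 8 blocks with no shuffling between them.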

rstuart4133 · on Jan 12, 2017

Basic GNU C implementations don't get auto-vectorised full stop. But with a little bit of effort Chacha20 can be made to vectorise. The implementation in here is vectorised by GNU C:

  https://sourceforge.net/p/pbkdf2/code/ci/default/tree/pbkdf2.c

If "ultra-efficient code" means what could be produced by a programmer highly skilled in some amd64 implementation (intel core2, amd bulldozer, ...) for that implementation then yes I doubt GNU C produces it. But the odds are GNU C's output runs faster than that guru's code on other amd64 implementations.

floodyberry- · on Jan 12, 2017

That Salsa implementation is not being vectorized? Salsa also requires some values to be shuffled around to actually work in SSE registers, djb made a bit of a boo-boo when designing it. Chacha fixes that, so its SIMD implementations are a bit more straightforward.

koverstreet · on Jan 11, 2017

But ChaCha is simple enough that even implementing it in assembly with AVX or whatever isn't all that hard.

JoachimS · on Jan 10, 2017

If you don't have HW-acceleration, for example AES-NI instructions in x86-64, ChaCha will normally be quite a lot faster, especially on 32-bit and 64-bit architectures.

Since it's a stream cipher, you can also precompute the keystream. This reduces encrypt/decrypt to a simple XOR when handling the message - depending on message length of course. And yes, AES-CTR can also be used like this.
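
A small sketch of the precompute-then-XOR idea. To keep it short, the keystream here comes from SHAKE-256 as a stand-in, which is NOT ChaCha20; any stream cipher or CTR-mode keystream would slot in the same way. The function names are hypothetical.

```python
import hashlib


def standin_keystream(key, nonce, length):
    """Stand-in keystream generator (SHAKE-256). NOT ChaCha20."""
    return hashlib.shake_256(key + nonce).digest(length)


def xor_bytes(data, keystream):
    """XOR data with the leading bytes of a precomputed keystream."""
    if len(keystream) < len(data):
        raise ValueError("keystream shorter than data")
    return bytes(x ^ k for x, k in zip(data, keystream))


# Done ahead of time, before the message exists:
pad = standin_keystream(b"k" * 32, b"n" * 12, 1024)

# Done when the message arrives: a single XOR pass each way.
ciphertext = xor_bytes(b"attack at dawn", pad)
plaintext = xor_bytes(ciphertext, pad)
```

The precomputed bytes must only ever be used for one message; reusing a keystream across messages leaks the XOR of the plaintexts.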

cwmma · on Jan 10, 2017

Its main claimed benefits are that it is resistant to side channel timing attacks, simpler (easier to implement correctly) and faster.

All things being equal, in practice AES is often implemented in hardware, which gives it an advantage.

TillE · on Jan 10, 2017

AES is a block cipher, Salsa/ChaCha are streams.

This makes them very useful for, say, file encryption with random access.

loup-vaillant · on Jan 10, 2017

Chacha20 can do random access. See the end of my article, when I talk about counter mode. To get the part of the stream you want, you just generate the block you need (they're all the same, only the counter changes), then encrypt it. No need to generate all previous blocks.

Indeed, one reason for using AES in counter mode is this random access, which among other things enables parallel encryption. The same strategy works with Chacha20.
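
Here is a minimal, unoptimized Python sketch of that random access. It assumes the IETF layout (256-bit key, 96-bit nonce, 32-bit counter) and is for illustration only, not for production use; `xor_at` and its `first_counter` parameter are names invented for this example.

```python
import struct

MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(s, a, b, c, d):
    s[a] = (s[a] + s[b]) & MASK32; s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32; s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = rotl32(s[b] ^ s[c], 7)


def chacha20_block(key, counter, nonce):
    """One 64-byte keystream block; depends only on key, counter, nonce."""
    init = (list(struct.unpack("<4I", b"expand 32-byte k"))
            + list(struct.unpack("<8I", key))
            + [counter & MASK32]
            + list(struct.unpack("<3I", nonce)))
    s = init[:]
    for _ in range(10):  # 10 double rounds = 20 rounds
        quarter_round(s, 0, 4, 8, 12); quarter_round(s, 1, 5, 9, 13)
        quarter_round(s, 2, 6, 10, 14); quarter_round(s, 3, 7, 11, 15)
        quarter_round(s, 0, 5, 10, 15); quarter_round(s, 1, 6, 11, 12)
        quarter_round(s, 2, 7, 8, 13); quarter_round(s, 3, 4, 9, 14)
    return struct.pack("<16I", *[(x + y) & MASK32 for x, y in zip(s, init)])


def xor_at(key, nonce, data, offset, first_counter=0):
    """Encrypt or decrypt `data` that sits at byte `offset` of the stream,
    generating only the keystream blocks that cover it."""
    out = bytearray()
    pos, end = offset, offset + len(data)
    while pos < end:
        idx, skip = divmod(pos, 64)          # which block, where inside it
        block = chacha20_block(key, first_counter + idx, nonce)
        take = min(64 - skip, end - pos)
        chunk = data[pos - offset:pos - offset + take]
        out += bytes(p ^ k for p, k in zip(chunk, block[skip:skip + take]))
        pos += take
    return bytes(out)
```

Decrypting bytes 100 to 199 of a ciphertext is then `xor_at(key, nonce, ct[100:200], 100)`, which touches blocks 1 to 3 only. Since the blocks are independent, they could equally be computed in parallel.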

rakoo · on Jan 10, 2017

I'm probably stating the obvious here, but whatever your strategy for decrypting is you still must verify the ciphertext integrity, which unfortunately for you is calculated on the whole ciphertext. You may win some time by not reading the stuff before the block you're interested in, but you will have to read the whole stuff anyway if you want to be safe.

I'm no expert of course so I don't even know if there's an AEAD that can bring you integrity on parts of the input; at least I know that minilock (https://github.com/kaepora/miniLock/blob/master/README.md#-m...) builds some kind of counter mode where each chunk is properly encrypted and has everything needed to check its integrity.
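
A rough sketch of the general per-chunk idea, using HMAC-SHA256 from the Python standard library. This is not miniLock's actual format, just one way it could look: each chunk's tag covers its index and a last-chunk flag as well as its ciphertext, so chunks cannot be reordered or the file silently truncated. It assumes the chunks are already encrypted, that the MAC key is separate from the encryption key, and that the reader learns from the container which chunk is the last one. All names are hypothetical.

```python
import hashlib
import hmac
import struct


def chunk_tag(mac_key, index, is_last, chunk_ct):
    """HMAC-SHA256 over (chunk index, last-chunk flag, chunk ciphertext)."""
    header = struct.pack("<QB", index, 1 if is_last else 0)
    return hmac.new(mac_key, header + chunk_ct, hashlib.sha256).digest()


def seal_chunks(mac_key, chunks):
    """Attach a tag to each already-encrypted chunk."""
    last = len(chunks) - 1
    return [(ct, chunk_tag(mac_key, i, i == last, ct))
            for i, ct in enumerate(chunks)]


def verify_chunk(mac_key, index, is_last, chunk_ct, tag):
    """Check one chunk without reading any of the others."""
    expected = chunk_tag(mac_key, index, is_last, chunk_ct)
    return hmac.compare_digest(expected, tag)
```

With this layout, reading chunk 7 means verifying and decrypting chunk 7 alone, at the cost of one tag per chunk.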

tptacek · on Jan 10, 2017

The most widespread way of using Salsa/ChaCha is in the "Chapoly" construction, which combines ChaCha20 with DJB's Poly1305 polynomial MAC; this is an authenticated construction. Pretty much every mainstream application of Salsa20 is in fact a Salsa/Poly1305 construction.

You can also just combine Salsa and HMAC.

It's true that you need to authenticate your data, but this is true for any cipher that you use.

It's a bad idea to implement your own cipher code, no matter what you're doing. If you're looking to include Salsa/ChaCha in an application, use NaCl, which refuses to give you unauthenticated ciphertext.

wolf550e · on Jan 10, 2017

These days people use AES-CTR (or authenticated encryption modes based on AES-CTR, like AES-GCM), which can be treated as stream ciphers.

tptacek · on Jan 10, 2017

There's virtually no difference in utility, since pretty much the only thing we ever do with a block cipher is adapt it to encrypt streams --- this is true conceptually even when we're not literally turning the block cipher into a PRF with something like CTR mode.

the8472 · on Jan 10, 2017

Under the hood chacha is a block cipher too. It just happens to have counter mode baked in, which turns it into a stream cipher.

orlp · on Jan 10, 2017

Under the hood ChaCha is a 128 bit -> 512 bit hash function with a 128 bit key, running in CTR mode to get a stream cipher.

It is most assuredly NOT a block cipher under the hood.

loup-vaillant · on Jan 10, 2017

Err, the key can be 256-bits. This is the preferred key size these days.

orlp · on Jan 10, 2017

Oops, that should indeed be 256 bits (8 * 32 = 256). The rest of the comment stands though.