Building a Stateless API Proxy (thea.codes)
152 points by panarky on May 30, 2019 | 53 comments


Firstly, great writing. Secondly, magic proxies are awesome. And pyca/cryptography instead of something terrible like pycryptodome! All great stuff.

Some critique of the crypto bits, since I'm a crypto person.

1. Do you actually need asymmetric cryptography for this? It seems like at some point the proxy has full authority, and it could just encrypt the token for you symmetrically? (This is valuable because symmetric crypto is a lot less precarious than asymmetric crypto, see next point)

2. Please don't use PKCS1v15 padding for encryption in new systems. It's been known to be busted for about 20 years now. We have workarounds, and they may well be deployed in the exact context you're using it. But they keep breaking, because we just have them to keep the infinite amount of already-deployed software running, not because we think it's fundamentally fixable. This is also the textbook example of a vulnerable service: one that takes ciphertexts and decrypts them on demand. With PKCS1v15, I can modify the ciphertext so that _the way you treat the modified ciphertext_ tells me something about how to make private key operations. And in this setup, that means I get the real token, so that sounds bad. The good news is that you've successfully designed around it by adding a signature, so I don't think it's mountable... but. Please, no more PKCS1v15 :-(

3. It feels a little awkward to use JWT for the outside signature but not the inside encryption. But less JWT is a good thing :)

Concrete suggestion: use Fernet (you're already using pyca/cryptography) or libsodium's secretbox and then all of the crypto problems go away. You keep the security engineering dilemma of stateful v stateless proxy (do I want the real token in the proxy at all?) -- but that's another argument.


This is great feedback. I'm definitely not a crypto expert. I'm happy to update the post with a different padding algorithm if you want to suggest one. I'm a little hesitant on swapping out RSA because I intentionally picked something relatively simple and familiar to folks, but yeah, will still do some research. Others had suggested ECC which I think is totally worth noting in the post one way or another.

Thank you!


The answer to your question is OAEP but I feel like I'd still be doing you a disservice there because I am convinced the answer ought to be box/secretbox or Fernet or AESGCM maybe -- but that hinges on my question about asymmetric cryptography which is elsewhere in the thread :)


Okay, it's updated to use OAEP and PSS padding. Thank you!


Oh, and we use asym because it's useful for us to be able to inspect the token (and see stuff like its permissions and expiration client side). I only kinda mention this a bit later, but I might make that a bit clearer.


Huh, yeah, sorry I definitely missed that. Could you ping me if you make that change? I'd love to read that bit.


Yeah! My blog is actually on GitHub if you want to follow or whatever. It's theacodes/blog.thea.codes.


>> And pyca/cryptography instead of something terrible like pycryptodome!

pycrypto is terrible. The pycryptodome fork fixes most problems in it.

Also, maybe worth sharing you are listed as fourth contributor [1] to the cryptography library, and your web book is prominent on the project homepage, so this piece of opinion may be biased.

>> Concrete suggestion: use Fernet

Please don't. Don't use boutique protocols with informal specs and without test vectors generated by a sufficient number of independent implementations. Stick to RFC-backed protocols. Use JWT with rigid parameters. Even another cryptography author states that just supporting JWT and not Fernet would have been better [2].

[1] https://github.com/pyca/cryptography/blob/master/AUTHORS.rst

[2] https://github.com/pyca/cryptography/issues/2900


I'm not super interested in debating 'zimmerfrei but for everyone else: no, I don't think you should use a library that randomly slaps copyright headers of the fork author on source files [0] and introduces C implementations of MD5 in 2018 [1]. I do think it's ironic that they suggest sticking to RFC'd specs with many competing implementations while defending a project with no mandatory code review, mostly 1 author, and currently failing CI :)

[0]: https://github.com/Legrandin/pycryptodome/commit/8675e6f03fc... [1]: https://github.com/Legrandin/pycryptodome/commit/87c2d6aedb3...

The number of cryptographers willing to do hours and hours of free, often thankless, open source work is pretty small, so no, I'm also not going to write up a disclaimer every time I tell someone to use a library. Of course I'm going to work on the projects that I think are doing the right thing.


Thank you for a superb response!

If the length of the token was important, and you wanted to issue the shortest possible (yet secure) token: what would you recommend? I looked at Fernet but the ciphered text is... massive.


You mean massive in general not relative to the encrypt+sign-JWT combo detailed in the blog post?

Any kind of encryption is going to make the ctext be effectively random bits, and is going to add a MAC tag that's some number of bits wide, and introduce some randomization (IV or nonce). You can tweak the size of some of those, but I feel the biggest cost you're paying is probably the b64 encoding.
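
To put rough numbers on that, here is a small sketch that adds up the fields of a Fernet token as laid out in the Fernet spec (version byte, 64-bit timestamp, 128-bit IV, AES-CBC ciphertext with PKCS7 padding, 256-bit HMAC) and then applies the base64 expansion. The function name and the 40-character example size are my own assumptions for illustration, not something from the post.

```python
import math

def fernet_token_length(plaintext_len: int) -> dict:
    """Estimate the size of a Fernet token for a plaintext of the given length."""
    version = 1      # single version byte
    timestamp = 8    # 64-bit creation time
    iv = 16          # 128-bit AES-CBC IV
    # PKCS7 always adds between 1 and 16 bytes, so the ciphertext is the
    # next multiple of 16 strictly greater than the plaintext length.
    ciphertext = (plaintext_len // 16 + 1) * 16
    mac = 32         # HMAC-SHA256 tag
    raw = version + timestamp + iv + ciphertext + mac
    # base64 turns every 3 bytes into 4 characters (including padding).
    encoded = math.ceil(raw / 3) * 4
    return {"raw_bytes": raw, "base64_chars": encoded}

# A 40-character secret is an assumed example size.
print(fernet_token_length(40))
```

The fixed fields cost 57 bytes before any plaintext is added, and base64 then inflates the total by about a third, which is why even a short secret ends up as a long token.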

What are you encapsulating this thing in? An HTTP response?


Yes in general, not the JWT version. I was thinking in terms of issuing API tokens like this, and how the longer they are, the more room for copy and paste errors. SHA256 has been a good length to provide to people, and encrypting real text is going to make it longer than that; but do some ciphers require less padding? Changing the encoding to say b85 will help.
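
A quick way to see how much the encoding alone matters, using only the Python standard library. This is a sketch: `encoded_lengths` is a made-up helper, and the 32-byte input is just chosen to match a SHA-256 digest.

```python
import base64
import secrets

def encoded_lengths(raw: bytes) -> dict:
    """Return the text length of the same bytes under a few encodings."""
    return {
        "hex": len(raw.hex()),
        # URL-safe base64 with the trailing '=' padding stripped.
        "base64url": len(base64.urlsafe_b64encode(raw).rstrip(b"=")),
        "base85": len(base64.b85encode(raw)),
    }

digest_sized = secrets.token_bytes(32)  # same size as a SHA-256 digest
print(encoded_lengths(digest_sized))
```

One caveat on b85: Python's base85 alphabet includes punctuation such as `{`, `|`, `~` and `$`, so the shorter string may be more awkward to paste into a shell or a URL than base64url.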


Just a quick supportive comment on this - it's a really nice piece of writing, introducing a topic which tends to cause people to glaze over, and doing it in a very approachable and easy to pick up way - and that takes some skill and effort!


Thank you so much, I really appreciate the kind words!


I am having the same thought too after reading through it. Good job on the step by step explanation together with the illustrations.


+1. Really a well structured and written article, with a good mix of clear prose & graphics to keep it interesting.

I'm in awe of folks who make the time for the kind of effort this requires.


Agreed! I really like how they introduce the topic incrementally, eventually getting to JWT, but not throwing that at you from the get go.


Using Elliptic Curve cryptography would've resulted in much smaller signatures, libsodium is considered secure and has bindings to most sane/modern environments: https://download.libsodium.org/doc/

JWT also has its fair share of security issues in itself: https://paragonie.com/blog/2017/03/jwt-json-web-tokens-is-ba...


That JWT link sounds like hyperbole.

Can anybody chime in on whether JWT is absolutely broken as stated in the article, or, while it has some issues, the author likes being a bit too dramatic?


The root of most of the points in the article is "when you use JWT for the wrong use case it is bad" which is self-evident. Other than that it is hyperbole imo.


Unnecessarily and overly dramatic, essentially arguing that JWT/JOSE is insecure because it can be used in an insecure manner.


JWT is multiple layers of bad. My favorite summary is that it has poor implementations of a harebrained scheme designed to solve a problem you don't have.

The idea that I need to read the header, which is unauthenticated, to parse the token violates the Cryptographic Doom Principle. Has that led to vulnerabilities? Of course it has: I just said it violates the Cryptographic Doom Principle.

The idea that it has everything plus the kitchen sink -- even for drastically different behavior and opinions on how the world works, from symmetric encryption to asymmetric signing and multiple implementations of each at that, is anathema to modern cryptographic design. Wireguard has one scheme and it does a lot more complicated stuff than "encrypt a session token".

JWT's saving grace here is that few people implement all of it. And ... that's ... cool? Until they do, of course.

You can argue that something is an implementation problem and not a spec problem. Some issues definitely are, but if every major implementation has the same damn bug, then I think it's a spec problem. Unauthenticated headers are a spec problem. PKCS1V15 enc is a spec problem. The fact that an implementation can patch around it doesn't make it not a spec problem. I'm sitting on several more vulns in ~every JWT library that are, to cryptographers, literally too boring to publish even though one of them is _key recovery_.

Other posters have said that it's silly to say that merely the ability to use it unsafely is a problem. But good crypto looks exactly like bad crypto while you're doing it, and there's good crypto that doesn't have that set of problems, so why would you ever choose the poor design?

(Don't use JWT.)


Most people use the JWT compact serialization, which cannot carry the unprotected header at all. If you're exchanging JWT compact tokens, the header is protected by the signature or the encryption.


What? You mean protected by the _MAC_? The header is never encrypted: the header is how you even figure out what to do with the rest of the message at all. That is why it definitionally can not be protected by it (that's the definition of the cryptographic doom principle!) and is how the bugs I am referencing are exploited to begin with. The only sense that a JWT header is "protected" is that the spec calls it that.

Have you ever exploited a JWT vuln? Which one? Because odds are there's a way it boils back down to the JWT header design choice being silly.

I mean there's an easier way to have this conversation: if the header is "protected", how did the alg=none bug ever work?
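
For anyone who hasn't seen the alg=none bug, here is a minimal stdlib-only sketch of the pattern. `naive_verify` is a hypothetical verifier written to be broken on purpose, not the code of any real JWT library, and the key and claims are made up. It reads the algorithm from the header before anything has been authenticated, while `pinned_verify` ignores the header's choice and always checks HMAC-SHA256.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def sign_hs256(payload: dict, key: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    tag = hmac.new(key, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{b64url(tag)}"

def naive_verify(token: str, key: bytes) -> dict:
    """BROKEN on purpose: the attacker-controlled header picks the algorithm."""
    header, body, sig = token.split(".")
    alg = json.loads(b64url_decode(header))["alg"]
    if alg == "none":
        return json.loads(b64url_decode(body))  # nothing is checked at all
    expected = hmac.new(key, f"{header}.{body}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(body))

def pinned_verify(token: str, key: bytes) -> dict:
    """Ignores the header's alg and always checks HMAC-SHA256."""
    header, body, sig = token.split(".")
    expected = hmac.new(key, f"{header}.{body}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(body))

key = b"demo-key-not-for-production"
# Forged token: header says alg=none, signature segment is empty.
forged = ".".join([
    b64url(json.dumps({"alg": "none"}).encode()),
    b64url(json.dumps({"scope": "admin"}).encode()),
    "",
])

print(naive_verify(forged, key))   # accepts the forged claims
try:
    pinned_verify(forged, key)
except ValueError as exc:
    print("pinned verifier rejected it:", exc)
```

Whether you call that a spec problem or a library problem, the safe pattern is the same: the verifier decides the algorithm, not the token.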


In a JWS the header is integrity-protected by the signature if the alg isn't none. This is prominently noted in the specs and alg=none artifacts are referred to as "unsecured JWS". In a JWE the protected header is integrity protected by the AEAD cipher, because all encs must be AEAD.

The alg=none substitution issues happened because of bad usage of mediocre libraries. Other algorithm substitution can arise for the same reason. The invalid curve attacks were the ones that the spec didn't call out as a security consideration.

I support the arguments that say algorithmic agility is a bad idea and a new protocol with algorithmic agility shouldn't have come out at a time when other protocols (like TLS) were finally starting to catch on to this fact. But the JWT cat is out of the bag, and won't go back in: it's widely deployed and people are using it thinking it's solving problems they actually have. Education is the proper remedy.

The PASETO effort attempts to provide better answers and better design to an audience familiar with JWT, but there's also been an uptick in the kind of advice that heavily condemns JWT without supplying some migration paths. That latter brand of advice is harmful.


Same points I made before: if more than one library has a flaw, it’s a design flaw and not a one-off implementation flaw, and if you’re trusting the header before you validate (which is necessary!), then it is not meaningfully protecting anything, which is why those bugs work.

And, finally: we’ve put together an extensive list of recommendations, repeatedly, both in general and in the articles on this thread.


So it seems that the Cryptographic Right Answers is lacking a section on "stateless tokens carrying a small payload". What should one do in this case?


I mean, part of the answer is "don't do that" but if you have to, secretbox or PASETO. Part of the problem is that "stateless token" can mean a lot of things depending on context; for internal use you generally want symmetric MAC possibly w/ symmetric encryption, for external use you probably want signing -- all of which have answers in Cryptographic Right Answers :)


I was wondering more about how to format a payload that may be shared between agents in a standard, secure format, but that is probably not even a Cryptographic Question :)


Still the same answer unfortunately: depends on the use case. Sometimes you just want signing, sometimes it's OK to share a key, sometimes...


EC isn’t widely supported on older systems, especially in enterprises: https://support.globalsign.com/customer/en/portal/articles/1...

Yes, it’d be a smaller payload and less CPU to use EC over RSA, but EC still isn’t the least common denominator. I speculate the author is optimizing for compatibility over performance which is a perfectly valid trade off to make in a blog post :)


I actually just aimed for simplicity and stuff folks might be semi-familiar with. It might be worth adding a little note that EC would be more efficient all around.


Not saying the standard is good, but in this case, the author is specifying a fixed signature algorithm, so that problem doesn't apply, as far as I can tell.


In that case, I don't see why they didn't simply use symmetric cryptography.


I asked that upthread; apparently the answer is "inspectability", they're updating the blog post to highlight that, and I have a hunch my suggestion is going to be the AD in AEAD :)


Slightly OT: Github has API rate limits that often slaughter us (we use Jenkins to scan our Github org), and will get worse in the future as we move/create more repos. Could this be used to alleviate that? I believe that would need some caching, but ... I don't know how caches work exactly, and how I would go about implementing that here...


Take a look at Maintner, a tool made by the Go team to load and store GitHub metadata, particularly issues, comments, reviews, and events: https://godoc.org/golang.org/x/build/maintner. It handles backoff, holds all data in memory, and supports backing up data to GCS.

My team is successfully using it to track ~200k issues/PRs across ~300 repos. We wrapped our deployment in a minimal API to give our infra blazing fast access to that data without needing to worry about GitHub auth or rate limits (since Maintner handles that).

And, we use the proxy described in the article.


Yes, you could potentially do something like that. A cache is only in essence "I keep a local store, if someone asks for something which is already in the local store, I give them that - if not, I go and get it, put it in the local store for the next person that wants it, and then give them that". Of course, having things expire (knowing when to remove them from the cache) is not a trivial problem, but there's a lot of prior art out there (look for "cache expiry") and there's also quite a lot built in to HTTP to ensure that HTTP content can be cached well (proxies are a very common thing - look in to "HTTP cache headers").
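
As a rough illustration of that "local store" idea, here is a minimal in-memory cache with a fixed time-to-live. It is a sketch under assumptions: `TTLCache`, `fake_fetch` and the example path are invented names, and a real proxy would more likely honour the HTTP cache headers than use a flat TTL.

```python
import time
from typing import Any, Callable, Dict, Tuple

class TTLCache:
    """Serve from a local store while fresh, otherwise fetch and remember."""

    def __init__(self, fetch: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expiry, value)

    def get(self, key: str) -> Any:
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]             # still fresh: answer from the local store
        value = self._fetch(key)      # miss or expired: go and get it
        self._store[key] = (now + self._ttl, value)
        return value

calls = []

def fake_fetch(path: str) -> str:
    # Stand-in for the real upstream request (hypothetical).
    calls.append(path)
    return f"response for {path}"

cache = TTLCache(fake_fetch, ttl_seconds=60)
cache.get("/repos/example/issues")
cache.get("/repos/example/issues")   # served from the cache, no second fetch
print(len(calls))
```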

Of course, in your case, it might be debatable in terms of utility - if you're trying to replace a scan, you probably are trying to get results as "up to date" as you can - which would be prevented if the cache avoided doing that! The cache would have the same rate limits, and so you would be just as well off adjusting the frequency you currently scan. Of course, if you can't control that frequency (perhaps multiple uncoordinated scanners) a proxy is one way to give you that control - so maybe useful!


shameless plug: I initially created webhookrelay for just this case - Jenkins & Github :) Instead of polling, you can start builds on webhook request. It can create one-way tunnels so your Jenkins is not exposed to the internet (for PRs https://webhookrelay.com/blog/2019/04/17/automated-github-pu...). Open source client for webhook forwarding is also available! Now the project evolved into way more, but still, started from Jenkins, polling and waiting for the build to start after pushing changes.


This is great! A couple of years ago I needed to write a quick-and-dirty proxy in Golang to get around some draconian security policies placed on us by the team running our GitHub Enterprise server. We released it as an open-source project here: https://github.com/electric-it/hubbard


Is there a reason why they don't encrypt both the token and the permissions?

This would remove the needs of a signature altogether.


Because JWT is well specified and supported and it's useful in our case for proxy users to be able to inspect the token.


You still need an integrity check or signature on the encrypted data, otherwise it’s potentially possible for someone to tamper with the ciphertext to change specific parts, such as the permissions.

If you are encrypting using an AEAD cipher like AES-GCM or ChaCha20-Poly1305 then it is already built in. But AES-CBC and others need an explicit verification on top.
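
To make the tampering concrete, here is a toy demonstration using only the standard library. Big assumption: the "cipher" below is a throwaway XOR keystream built from SHA-256, used purely because it is malleable in the same way unauthenticated stream-style encryption is. It is not AES-CBC, not anything from the article, and not something to deploy. The plaintext layout and keys are invented.

```python
import hashlib
import hmac

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256 in counter mode. Illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    return xor(plaintext, keystream(key, len(plaintext)))

toy_decrypt = toy_encrypt  # XOR is its own inverse

enc_key = b"toy-encryption-key"
mac_key = b"toy-mac-key"
plaintext = b"scope=read ;token=abc123"
ciphertext = toy_encrypt(enc_key, plaintext)

# The attacker has no key, but knows (or guesses) the plaintext layout and
# flips exactly the bits that turn "read " into "admin".
delta = xor(b"scope=read ", b"scope=admin") + bytes(len(plaintext) - 11)
tampered = xor(ciphertext, delta)
print(toy_decrypt(enc_key, tampered))

# Encrypt-then-MAC: tag the ciphertext and check the tag before decrypting.
tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
tampered_tag = hmac.new(mac_key, tampered, hashlib.sha256).digest()
tampered_ok = hmac.compare_digest(tag, tampered_tag)
print(tampered_ok)
```

The permissions change while the token part decrypts untouched, and only the MAC check notices. An AEAD mode does that check for you as part of decryption.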


Ok I'm a bit of a cryptography noob, but are you saying it's possible to alter the encrypted token such that when decrypted by the private key the permissions are changed, in this example, but the real GitHub token is not?

Edit: nevermind, I think I've just misunderstood the process outlined in the article. I confused the tokens.


I built something similar for the Amazon SES (Simple Email Service) for my cron jobs and other private applications to use.

https://github.com/ricardbejarano/postino


As the article mentioned, revocation is a problem with the stateless approach. I've never seen a way to revoke individual tokens statelessly -- you either need to maintain state about the valid tokens, or maintain state about a revocation list.


I don't think it's possible to do revocation in a stateless manner. The token that you are revoking was once valid, and when you decide to invalidate it, this change of state needs to be persisted somewhere. But yeah, if revocation is rare and only few tokens are invalid at any given time (which can be easy by adding an expiry field to tokens), keeping a revocation list is the way to go.
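
A sketch of what that small piece of state could look like, assuming each token carries an ID and an expiry time (both assumptions, as are the class and method names). Because expired tokens are rejected anyway, their entries can be dropped, so the list only ever holds revoked tokens that have not yet expired.

```python
import time
from typing import Callable, Dict

class RevocationList:
    """Remember revoked token IDs only until the token would expire anyway."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._revoked: Dict[str, float] = {}  # token id -> token expiry time

    def revoke(self, token_id: str, expires_at: float) -> None:
        self._revoked[token_id] = expires_at

    def prune(self) -> None:
        # Entries for tokens past their expiry no longer need to be kept.
        now = self._clock()
        self._revoked = {tid: exp for tid, exp in self._revoked.items() if exp > now}

    def is_valid(self, token_id: str, expires_at: float) -> bool:
        self.prune()
        if expires_at <= self._clock():
            return False                      # expired tokens are never valid
        return token_id not in self._revoked

revocations = RevocationList()
expiry = time.time() + 3600
revocations.revoke("token-42", expiry)
print(revocations.is_valid("token-42", expiry))  # revoked
print(revocations.is_valid("token-43", expiry))  # never revoked
```

In practice `expires_at` would come from the verified token itself, and the dictionary would live in whatever shared store the proxy instances can all reach.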


Cool stuff! I use a similar thing in my own servers to generate short-lived or one-time access to individual requests. Nice to be able to share an url or `curl` command with somebody to see what I am seeing or for ad-hoc permission grants.


How does the proxy get the initial token? Do you hand it a GH token and get back your magic token?


Yes, it has an API you can call that will encrypt the token with the proxy's key. Implementation here: https://github.com/theacodes/magic-github-proxy/blob/master/...


Does anyone know what tool was used to generate those images in the post?


I hand-drew the illustrations using an iPad pro & photoshop. The code highlighting is using Sphinx + witchhazel.thea.codes.


This begs the question why not be github? I bet some permutation of this idea occurs to the next developer.



