JWT scope claim compression using a bitmap

sonofgod · on May 23, 2020

That's a great garden path sentence (or at least beautifully ambiguous).

I initially parsed it as "[The] James Webb Telescope ([which is a tele]scope) [team] claim [they can] compress [pictures somehow] using a bitmap".

cpcallen · on May 23, 2020

I've skimmed the article and I'm still not sure what JWT stands for.

rmedaer · on May 23, 2020

I edited my article with a link to the RFC of JSON Web Tokens, which is for me the first result on Google or DuckDuckGo when I search "JWT".

eyelidlessness · on May 23, 2020

At least the first several google results look pretty explanatory to me?

ernesth · on May 23, 2020

> the first several google results look pretty explanatory to me?

Is the article really about Java Web Toolkit?

eyelidlessness · on May 25, 2020

Maybe my Google results are especially tailored to my search/browsing history :( Every result for me was about JSON Web Tokens.

notadog · on May 23, 2020

JSON Web Token

dylan604 · on May 23, 2020

You're not alone, except the James Webb Space Telescope is abbreviated JWST.

shoo · on May 23, 2020

As a casual reader not familiar with problem of large JWT scopes, I suggest the strength of the argument for this proposal could be improved by explaining exactly why larger JWT tokens cause problems in practice, defining some metrics or benchmarks that can be used to measure the impact of the problem, then comparing the proposed solution to the baseline approach using those metrics.

E.g. are large JWT tokens bad because they take longer to send? Then compare durations between both implementations. Or are large JWT tokens bad because they exceed some maximum supported token size? Then compare how many scopes are supported by the standard approach and new approach before hitting limit.

Another thought: the proposed compression scheme requires that each token producer & consumer needs to depend upon a centralised scope list that defines the single source of truth of which scope is associated with which index for all scopes in the ecosystem.

If we assume we've set that up, why not generalise and say the centralised shared scope list is actually a centralised JWT compression standard that defines how to compress and decompress the entire JWT.

This could be implemented as something like https://facebook.github.io/zstd/ running in dictionary compression mode, where a custom compression dictionary is created that is good at compressing the expected distribution of possible JWT values. As long as each JWT producer and consumer is using the same version of the centrally managed compression dictionary the compression & decompression could still occur locally.

In practice, in any given JWT ecosystem, perhaps there's a subset of scopes that commonly appear together. If that subset is common enough, it could be encoded in 1 or 0 bits using a custom compression scheme tuned to the frequency of observed JWT scopes, instead of compressing each scope independently.
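For readers unfamiliar with the proposal being discussed, here is a minimal sketch of the index-based bitmap idea, assuming a shared ordered scope list. The registry contents, the function names, and the choice of little-endian bytes with unpadded base64url are all illustrative assumptions, not the article's exact encoding.

```python
import base64

# Hypothetical shared registry: a scope's position in this list is its bit index.
SCOPE_REGISTRY = ["openid", "profile", "email", "repo:read", "repo:write"]


def encode_scopes(scopes, registry=SCOPE_REGISTRY):
    """Pack scope names into an unpadded base64url bitmap string."""
    bits = 0
    for name in scopes:
        # list.index raises ValueError for a scope missing from the registry.
        bits |= 1 << registry.index(name)
    n_bytes = (len(registry) + 7) // 8
    raw = bits.to_bytes(n_bytes, "little")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_scopes(bitmap, registry=SCOPE_REGISTRY):
    """Recover the scope names whose bits are set in the bitmap string."""
    padded = bitmap + "=" * (-len(bitmap) % 4)
    bits = int.from_bytes(base64.urlsafe_b64decode(padded), "little")
    return [name for i, name in enumerate(registry) if (bits >> i) & 1]


encoded = encode_scopes(["openid", "email", "repo:write"])
print(encoded, decode_scopes(encoded))
```

Both sides must hold the same registry in the same order, which is exactly the coordination cost raised in this comment.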

to11mtm · on May 23, 2020

Practically speaking, you don't want JWTs over around 6 or 7KB unless there's a good reason.

The justification for that number is that JWTs may or may not need to be included in headers. More headers may or may not be added if Proxies/LBs/etc are involved between communicating machines. Some webservers have default header limits around 8kb, and that's how we get to this number.
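A rough way to apply this rule of thumb is to check a token against a header budget before sending it. This is only a sketch: the 8 KB limit and the 2 KB of headroom are assumed values taken from the reasoning above, and real servers differ in whether the limit applies per header or to all headers combined.

```python
HEADER_LIMIT_BYTES = 8 * 1024  # assumed server default, as mentioned above
RESERVED_BYTES = 2 * 1024      # assumed headroom for headers added by proxies/LBs


def fits_header_budget(token, limit=HEADER_LIMIT_BYTES, reserved=RESERVED_BYTES):
    """Return True if the Authorization header line stays within the budget."""
    header_line = f"Authorization: Bearer {token}\r\n"
    return len(header_line.encode("ascii")) <= limit - reserved


print(fits_header_budget("a" * 1000))
print(fits_header_budget("a" * 7000))
```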

rmedaer · on May 23, 2020

Hi shoo, thanks for your feedback. At this point in time this is only a proposal. A proposal which needs to be challenged. So thanks again for your useful feedback.

Indeed the goal is to limit the size of JWT tokens. I can't tell you yet whether it really improves performance. I already started a spreadsheet to compare the bitmap scope list against the space-separated list, although I need some real examples and metrics to do a relevant analysis of the impact.

One of the questions you raised is about the scopes commonly used. Should we define a shared dictionary/registry? Maybe. I would propose to open an issue on GitHub to discuss that. Here it is: https://github.com/rmedaer/rmedaer.github.io/issues/2

If there is more interest in this proposal, I would propose to create a dedicated repository where ~~I~~ we could discuss, compare and challenge it.

Kind regards,

magicalhippo · on May 23, 2020

> Indeed the goal is to limit the size of JWT tokens.

At work we just implemented some M2M auth using JWT[1]. The other party requires a full certificate chain as our identification and RS256 as algorithm, so our "compact" tokens end up around 8k in size.

At least the auth token we get back lasts a couple of minutes.

[1]: https://difi.github.io/felleslosninger/maskinporten_protocol...

rmedaer · on May 23, 2020

I see that you have a lot of scopes (https://github.com/difi/felleslosninger/blob/ad9ef79b4fef61f...). Especially from 3rd parties (https://integrasjon.difi.no/scopes/all https://integrasjon-ver2.difi.no/scopes/all)

Do you have some statistics about that ? For instance, do you know how many scopes are usually requested, on average ?

magicalhippo · on May 23, 2020

Unfortunately not. We're just outsiders, using Maskinporten to get an auth token to be used against the REST API of some other gov't agency. For that we use one of two scopes, prod scope or test scope, as they (the agency we talk to) haven't narrowed it down further yet.

But if you're interested maybe try to contact the Difi folks running Maskinporten, from my impression there's a high chance there's someone willing to share there.

Maskinporten is being phased in as the primary M2M auth solution for any Norwegian gov't agency, so they're bound to get a lot more "users" (agencies), and hence scopes, going forward.

tmzt · on May 23, 2020

What about using a .well-known path and standardizing via an internet draft heading towards an RFC? (Assuming there's even a registry for these paths.)

The format could be a simple json file listing supported dictionary items in order grouped by the key used in the JWT json itself.

Maybe a special claim could point to the url or use "wk" as a special value to direct it to the .well-known path on the issuer?

cbanek · on May 23, 2020

Where I work we run JWT, and I've been bitten by the giant JWT tokens all the time. While HTTP doesn't specify a max header length, many implementations insert one along the way. So I've had to find places where nginx is truncating tokens, etc. I think node also has another place you have to set to make sure you can expand the default header length as well.

eyelidlessness · on May 23, 2020

Wait, nginx just truncates when it reaches its max length, rather than error? Wow that is surprising and dangerous.

jiofih · on May 23, 2020

The moment you have a centralized xxx, you need a distributed DB, your use of JWT has become pointless and you can just go back to sessions without all the complexity.

This is why you can’t have a blacklist of manually expired tokens, one of the most commonly raised issues in JWT.

shoo · on May 23, 2020

I agree that adding new dependencies on external central services, or things that need to be centrally coordinated, is something we'd generally want to avoid, unless adding the dependencies gives us a lot of value in excess of the costs.

But isn't there a difference between needing to centrally coordinate a common protocol vs central management of state of individual tokens? JWT protocol itself can be regarded as some centralised definition of how different services agree to interoperate with JWT tokens, that all token producers and consumers must implement. It doesn't logically follow that we need a distributed DB that must be queried at runtime when processing tokens to implement JWT support. Similarly for nonstandard variations on JWT protocol that are independent of the state stored in any given token -- all services would need to embed some library that can understand the new (centrally defined) protocol, but there would not need to be any dependency on an external database at runtime.

Even with standard usage of JWT tokens, does there not need to be some degree of agreement and coordination between token producers about what a particular scope string means & agreement between different services in the same ecosystem not to use the same scope string to mean two completely different things?

rmedaer · on May 23, 2020

> need to be some degree of agreement and coordination between token producers about what a particular scope string means & agreement between different services (...)

I agree with you on this point. Actually there are already some "common" scopes, for instance OpenID Connect defines some scopes ("openid", "profile", "email", ...): https://openid.net/specs/openid-connect-core-1_0.html#ScopeC...

However I don't think (to be confirmed) there is an IANA registry for scopes (source: https://www.iana.org/protocols)

But there is one for claims in JWT:

https://www.iana.org/assignments/jwt/jwt.xhtml#claims

Peeda · on May 23, 2020

Yeah, the centralized protocol should be easy to manage because it's largely static. I'd add a b_scope_ver type field. As scopes are added it gets incremented. A static doc describing the version can be fetched and cached permanently on demand when a new version is seen, maybe. You can even serve the static docs out of S3.
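A minimal sketch of how a consumer might use such a version field. Only `b_scope_ver` comes from the comment above; the `b_scope` claim name, carrying the bitmap as an integer, and the in-memory `REGISTRIES` table (standing in for fetched and cached static docs) are assumptions for illustration.

```python
# Hypothetical versioned registries; a real deployment would fetch each static
# document once and cache it. Lists are append-only so old indices stay valid.
REGISTRIES = {
    1: ["openid", "profile", "email"],
    2: ["openid", "profile", "email", "repo:read"],
}


def scopes_from_claims(claims, registries=REGISTRIES):
    """Resolve a bitmap claim to scope names using the registry version it names."""
    version = claims["b_scope_ver"]
    registry = registries.get(version)
    if registry is None:
        raise LookupError(f"unknown scope registry version: {version}")
    bits = claims["b_scope"]  # assumed: bitmap carried as a non-negative integer
    if bits >> len(registry):
        raise ValueError("bitmap sets bits beyond the end of the registry")
    return [name for i, name in enumerate(registry) if (bits >> i) & 1]


print(scopes_from_claims({"b_scope_ver": 2, "b_scope": 0b1001}))
```

Rejecting bits that fall outside the named version avoids silently dropping scopes when a token was minted against a newer list than the one it claims.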

aasasd · on May 23, 2020

A common protocol is more like code being deployed, not data.

Even though there are gradations in how much you do of one versus the other, and you can ship code as data in the db—you don't have to do that.

tyingq · on May 23, 2020

I wonder if this is much smaller than using one character claims and regular http transport deflate/gzip compression.
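One way to explore this question is to measure it on sample data. The sketch below compares a space-separated `scope` claim, the same claim after deflate, and the raw bitmap size. The scope names are made up, and the numbers will vary with real data; note also that tokens usually travel in headers and are base64url-encoded, so body-level gzip would not apply to them directly.

```python
import json
import zlib


def compare_sizes(scopes):
    """Return byte sizes of the plain claim, its deflated form, and a raw bitmap."""
    claim = json.dumps({"scope": " ".join(scopes)}, separators=(",", ":")).encode("utf-8")
    return {
        "plain": len(claim),
        "deflate": len(zlib.compress(claim, 9)),
        "bitmap": (len(scopes) + 7) // 8,  # one bit per scope, before base64url
    }


# Made-up scope names, assuming every registered scope is granted.
sample = [f"service{i}:read" for i in range(40)]
sizes = compare_sizes(sample)
print(sizes)
```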

rmedaer · on May 23, 2020

Here I'm talking about the value of one particular claim: `scope`. If you identify each scope by only one character it would be limited to the size of your alphabet.

If you talk about claim names, they basically aim to be short. For instance, claims defined in RFC7519 Section 4.1 (https://tools.ietf.org/html/rfc7519#section-4.1) are only 3 characters long. As explained in the same section:

  "All the names are short because a core goal of JWTs is for the representation to be compact."

shoo · on May 23, 2020

that'd be a great baseline to benchmark against

tehbeard · on May 23, 2020

> Your resources (aka content) ACL should not be in the scope itself

I have to disagree on this one, being able to specify which resources an OAuth client can tinker with is useful (e.g. only read access to x,y and z repos).

I'm also curious on how often these use cases are of needing many scopes / a god JWT, vs. production usage and keeping a narrow scope for the task at hand. There's also the other option (if in charge of authoring the resource server) to have broader scopes that encompass several others.

user5994461 · on May 23, 2020

In enterprise, one example is when active directory groups are put into the token.

This makes sense because permissions are often managed by groups (read write, read only, user, admin, etc...), so employees can request a specific group to access some business application(s). This causes issues when an employee invariably has a hundred groups, adding multiple kilobytes to the token, more than is permitted by HTTP headers.

tehbeard · on May 23, 2020

Ah, I haven't had the pleasure of dealing with enterprise AD that convoluted yet.

tlarkworthy · on May 23, 2020

The access token is designed for saving size. Scopes are in the ID token. The disadvantage of the access token is the required back channel. But then this scheme also needs a shared backchannel so it's pretty much an access token implementation (you could express it as such)

https://developer.okta.com/docs/guides/validate-id-tokens/ov...

cafxx · on May 23, 2020

How about side-stepping the problem of big lists of scopes by training a zlib or zstd dictionary with the list of scopes, and then compressing the scopes in the jwt token using this dictionary?

Obvious benefit is that you can still represent a scope that is not included in the ones used to train the dictionary (vs. the proposed approach that breaks down in this case)
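A minimal sketch of this idea using the preset-dictionary support in Python's standard `zlib` module (the `zdict` argument). The dictionary here is just the known scopes joined by spaces, which is an assumption; a trained zstd dictionary would be built differently.

```python
import zlib

# Hypothetical list of known scopes; both sides must share the exact same bytes.
KNOWN_SCOPES = ["openid", "profile", "email", "repo:read", "repo:write"]
ZDICT = " ".join(KNOWN_SCOPES).encode("utf-8")


def compress_scope(scope, zdict=ZDICT):
    """Deflate a space-separated scope string against the shared dictionary."""
    compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(scope.encode("utf-8")) + compressor.flush()


def decompress_scope(blob, zdict=ZDICT):
    """Inverse of compress_scope; requires the same dictionary."""
    decompressor = zlib.decompressobj(zdict=zdict)
    return (decompressor.decompress(blob) + decompressor.flush()).decode("utf-8")


# "custom:thing" is not in the dictionary, yet it still round-trips.
original = "openid email custom:thing"
assert decompress_scope(compress_scope(original)) == original
```

The compressed bytes would still need base64url encoding before being placed in a claim.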

abhishektwr · on May 23, 2020

I am curious: why not use the OAuth 2 userinfo endpoint, which can serve a lot more detail and keep the claims in the JWT simpler and lightweight?

quaffapint · on May 23, 2020

If you can just pass around the JWT you can save a network call. I would say the size of the JWT wouldn't matter as much as that call.

abhishektwr · on May 24, 2020

You still have to make network calls to obtain public key (JWKS) to validate token signature. Unless you are using shared private keys. With userinfo you will know if token is invalidated or not.

I guess it also depends on use case. If you are in domains such as banking with elevated security requirements, then probably you want to hit userinfo endpoint else you can continue with token validation with cached or stored keys.

jepcommenter · on May 24, 2020

You don't pull JWKS on every request

quaffapint · on May 23, 2020

Should http/2 header compression not take care of the JWT size to the point that would make this more work than it's worth?

rmedaer · on May 23, 2020

Thanks quaffapint for raising this point. To be honest, I hesitated to add this question in the post. Indeed HPACK could partially solve the issue. But as you said, it requires HTTP/2. Btw, HPACK is well explained here: https://developers.google.com/web/fundamentals/performance/h...

I tried in this post not to talk about the "transport". Indeed JWTs can be used with HTTP/1.1, HTTP/2 or even SIP. Furthermore HPACK may be disabled in some cases. Here is what RFC 7541 (HPACK) says about the Authorization header (https://tools.ietf.org/html/rfc7541#section-7.1.3):

  "An encoder might also choose not to index values for header fields that are considered to be highly valuable or sensitive to recovery, such as the Cookie or Authorization header fields."

amaccuish · on May 23, 2020

This is very similar to SID compression in Windows Kerberos. Funny to see the same challenges and problems in the web space.

compassionate · on May 23, 2020

I would like to see a standardized scopes composer tool to complement this.

Ken_Adler · on May 23, 2020

I commend you for the attempt...

But the issue you are trying to mitigate (heavy tokens due to complex scope strategy) is a symptom of a bigger problem that has caused OAuth-using folks to scratch their heads for a long while. (Of course, it also relates to non-OAuth JWTs.)

Tldr: The new "Cloud native" way of solving for this is to not push your "Permissions" thru the token.

Basically, you limit the scopes included in a token to just a few basic ones (essentially assigning the user to a "Role" - think RBAC)....

... and then you use a modern Authorization approach (e.g. CNCF Open Policy Agent) to implement the detailed/fine grain authorization.

It's hella cool, declarative, distributed, and infinitely scalable...

... and it obviates the whole "heavy JWT" issue before it starts....

Source: This is what I do day in day out in my day job....

cordite · on May 23, 2020

What libraries or services do you recommend using to implement that very approach?