Public Suffix List (publicsuffix.org)
76 points by aleyan on July 15, 2021 | 44 comments


Before you begin to make use of the PSL, consider some of its problems: https://github.com/sleevi/psl-problems

FWIW, the link above successfully convinced me and a coworker not to use the PSL.


I've had hours of discussions with Azure support patiently explaining to them that using the PSL is the incorrect way to verify domains for certificates.

I had to create a cert like: "dev.app.org.dept.nsw.gov.au"

They dutifully looked up the "owner" of this domain using the PSL and incorrectly determined that it should be "nsw.gov.au". That's a state, not a department. Secondly, that specific registration is managed by a different super-department that has never heard of me, my app, or the "org" that it is actually managed by.

For contrast, Let's Encrypt correctly verifies management of the FQDN, not some bizarre "TLD+1" that can't be reliably identified. Worse, even if it could be magically identified reliably, it's still wrong! The entire point of the Domain Name System is to delegate ownership hierarchically for scalability.

You can't have some random DevOps guy pestering a completely separate team on the other side of the planet in a Mega Corp for certificate validation! That just doesn't work at scale. It works for "myfirstblog.io", which is the only thing Azure ever uses for their tests and demos, however.

I'm informed that this misfeature is "working as designed", which means that all but one NSW Government department can never use automatically issued Azure certificates... by design.
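For anyone wondering how a PSL lookup lands on "nsw.gov.au": the usual computation finds the longest matching public suffix and adds one more label (the "TLD+1" or registrable domain). Here is a minimal Python sketch of the PSL matching rules (implicit default rule, wildcard rules, exception rules). It assumes a tiny hypothetical rule subset instead of the real list, and it is not Azure's code, just the standard algorithm they appear to be leaning on.

```python
from typing import Optional

# Hypothetical, hand-picked subset of PSL rules for illustration only.
# The real list has thousands of entries and changes over time.
RULES = {"au", "com.au", "edu.au", "gov.au"}


def public_suffix(host: str, rules: set) -> str:
    """Return the public suffix of `host` using the PSL matching rules."""
    labels = host.lower().rstrip(".").split(".")
    best = labels[-1:]  # implicit default rule "*": the last label is a suffix
    for i in range(len(labels)):
        candidate = labels[i:]
        name = ".".join(candidate)
        # An exception rule ("!name") always prevails; its suffix drops the leftmost label.
        if "!" + name in rules:
            return ".".join(candidate[1:])
        # A wildcard rule ("*.rest") matches any single label in the "*" position.
        wildcard = ".".join(["*"] + candidate[1:])
        matched = name in rules or (len(candidate) > 1 and wildcard in rules)
        # Otherwise the matching rule with the most labels wins.
        if matched and len(candidate) > len(best):
            best = candidate
    return ".".join(best)


def registrable_domain(host: str, rules: set) -> Optional[str]:
    """Public suffix plus one more label, or None if `host` is itself a suffix."""
    labels = host.lower().rstrip(".").split(".")
    suffix_len = len(public_suffix(host, rules).split("."))
    if len(labels) <= suffix_len:
        return None
    return ".".join(labels[-(suffix_len + 1):])


print(registrable_domain("dev.app.org.dept.nsw.gov.au", RULES))  # nsw.gov.au
```

Note that nothing in this computation looks at DNS delegation at all, which is exactly the complaint: the answer depends only on which strings happen to be in the list.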


I'm having trouble understanding your example. Who is the correct owner of dev.app.org.dept.nsw.gov.au and how does Let's Encrypt get it without the use of such list?


Each and every level of that hierarchy has a different owner! That's why the domain name system is a hierarchy and not a flat list, to enable this delegation capability.

In this case, it was something like:

    dev.app.org.dept.nsw.gov.au -- Outsourcer (private company)
        app.org.dept.nsw.gov.au -- Project Team in "org" managing the outsourcer
            org.dept.nsw.gov.au -- A small government agency
                dept.nsw.gov.au -- A huge super-department that manages the agency
                     nsw.gov.au -- A *different* super-department
                         gov.au -- The federal government
                             au -- The federal government

Each level is a separate set of name servers, DNS zones, and hosting platforms. Each group "owns" and manages DNS zones at their level. Not some arbitrary, random level above them based on some random text file some dude at Mozilla maintains for GUIs and cookies.

At the leaf level of that hierarchy I was trying to deploy Azure "ARM templates", which are supposed to be fully automated. My account permissions were scoped to the Resource Groups and existing resources associated with the non-production application environment. I could create DNS records, for example, but only under "dev.app.org.dept.nsw.gov.au".

Instead, the Azure platform insisted on trying to email "admin@nsw.gov.au", which is a totally unrelated department. They would rightly see this as a phishing attempt, and even if I were to call them on the phone, their right move would be to reject the request. Or even call the police. The best option here is to get people from my org's department to pass things up the chain, then across to the dept managing "nsw.gov.au", then down to some tech. This takes about a month.

What the Azure platform should have done is request that I create a marker record, such as a TXT record at "_verify.dev.app.org.dept.nsw.gov.au" or whatever. This can be fully automated as part of a scripted or templated deployment. Emails to random bureaucrats... not so much.


Got it, thanks for the thorough response.

I guess that, regardless of this, there is still an ownership difference between the multiple levels in your example, and the ownership difference between a TLD and a SLD. In your example, nsw.gov.au controls the TLD and they could register dept2.nsw.gov.au without asking anyone else, they could even register a.b.c.d.dept2.nsw.gov.au, but they can't register nsw2.gov.au. I assume that's why the PSL is still useful for some use cases, but I agree this one is definitely not one.


With Let's Encrypt your script can just publish a dns record "_acme-challenge.dev.app.org.dept.nsw.gov.au", and Let's Encrypt will verify it based on DNS delegations. The fact that you can publish it means you control/own the domain. A similar thing occurs implicitly with HTTP verification, the A record verifies that the owner of the domain trusts the web server (in some sense).
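For reference, the ACME dns-01 challenge (RFC 8555) works roughly like this: the TXT record lives at "_acme-challenge." plus the exact FQDN, and its value is the unpadded base64url SHA-256 of the "key authorization" (challenge token, a dot, then the base64url account key thumbprint). A minimal Python sketch is below; the token and thumbprint are made-up placeholders, since a real client gets the token from the CA and derives the thumbprint from its account key.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Unpadded base64url, the encoding ACME (RFC 8555) uses throughout."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_record(fqdn: str, token: str, jwk_thumbprint: bytes) -> tuple:
    """Return (record name, TXT value) for an ACME dns-01 challenge on `fqdn`."""
    # Key authorization: the challenge token joined to the account key thumbprint.
    key_authorization = f"{token}.{b64url(jwk_thumbprint)}"
    # The TXT value is the unpadded base64url SHA-256 digest of that string.
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return f"_acme-challenge.{fqdn.rstrip('.')}", b64url(digest)


# Placeholder inputs only: these values are invented for the example.
thumbprint = hashlib.sha256(b"placeholder-account-key").digest()
name, value = dns01_record("dev.app.org.dept.nsw.gov.au", "example-token", thumbprint)
print(name, "TXT", value)
```

The only thing being proven is that you can write into the zone that answers for that one name, so a deployment scoped to the leaf zone can pass without anyone further up the tree being involved.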

It sounds like Azure requires some kind of manual, out-of-band verification. Maybe they tried emailing a well-known address (like postmaster@nsw.gov.au), based on information from the PSL. A tiny contractor deploying a single application may control that one hostname, but not the whole domain.


Let's Encrypt still has a notion of a domain against which all the quotas are computed. If they incorrectly determine the domain, an operator could make a configuration mistake and deplete the quota of an unrelated domain.


Looking at the list, that URL hasn't been on the PSL for 12 years: https://bugzilla.mozilla.org/show_bug.cgi?id=547985


Similarly, Azure can't create certificates correctly for agencies or departments in the Australian Capital Territory (act.gov.au) and Northern Territory (nt.gov.au).


I see the value of this, but I find the wisdom of it to be highly questionable for anything but the highest-level TLDs.

For example, it enumerates the domains of many US state school districts:

  k12.pr.us
  // k12.ri.us  Removed at request of Kim Cournoyer <netsupport@staff.ri.net>
  k12.sc.us
  // k12.sd.us  Bug 934131 - Removed at request of James Booze <James.Booze@k12.sd.us>
  k12.tn.us
  k12.tx.us
  k12.ut.us
  k12.vi.us
  k12.vt.us
  k12.va.us
  k12.wa.us
  k12.wi.us
  // k12.wv.us  Bug 947705 - Removed at request of Verne Britton <verne@wvnet.edu>
  k12.wy.us
These seem like awfully specific subdomains to be hardcoded into general-purpose software and entirely reasonable ones to want to set a cookie on or otherwise treat as not-TLDs. The list itself includes evidence of this in the form of exclusions due to bug reports and even makes this point specifically in the case of Hawaii:

  // k12.hi.us  Bug 614565 - Hawaii has a state-wide DOE login
It’s regrettable that browser vendors, even generally responsible ones like Mozilla, feel an incentive to do this.


You think that's bad? Take a look at cn-northwest-1.eb.amazonaws.com.cn or s3.dualstack.us-east-2.amazonaws.com. There are 22 monstrosities like that.


If everything was a neat tidy standardized set of TLDs, the PSL wouldn't exist.

But it isn't. PSL allows computers to understand political, social, financial, and arbitrary decisions that have an impact on security.

For example, uk.com has no legitimate right to be a TLD. But someone managed to get enough other people to buy subdomains that we have to treat it as such. If we refuse to do so, it harms end users and not uk.com.


This could never happen because you get things like GitHub/GitLab Pages, which give you a subdomain you can run JS on. Anyone could create a new public suffix at any time.


And if they're aware of the implications, they add it to the list of public suffixes when they do create one.

github.io and gitlab.io are on that list, for example.


> It’s regrettable that browser vendors, even generally responsible ones like Mozilla, feel an incentive to do this.

The 'supercookie' problem they mention does seem to warrant the effort though, even for these 3rd level hosts.

Can't recall which TLD it was under but there was a fairly high profile example of this a few years back.


k12.XX.us isn't a state school district, it's an organization of school districts within the state. For example, I used to work for ggusd.k12.ca.us (they no longer publicize that domain), and there were other districts registered under that hierarchy too.

It wouldn't be ok for my school district to have set a cookie for all school districts within the state, and that's why k12.XX.us would make sense to be in the list.
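This is roughly the check browsers apply to a cookie's Domain attribute (RFC 6265, section 5.3): if the Domain is a public suffix, the cookie is rejected unless the Domain equals the request host, in which case it becomes host-only. A minimal Python sketch follows; the suffix set is a hypothetical illustrative subset, not the real list, and IP-address hosts are not handled.

```python
# Hypothetical subset of public suffixes for illustration only.
PUBLIC_SUFFIXES = {"us", "ca.us", "k12.ca.us"}


def domain_match(host: str, domain: str) -> bool:
    """RFC 6265 domain-match: identical, or `host` ends with "." + `domain`."""
    return host == domain or host.endswith("." + domain)


def cookie_domain_allowed(request_host: str, domain_attr: str, suffixes: set) -> bool:
    """Would a browser keep a cookie set by `request_host` with this Domain attribute?"""
    host = request_host.lower().rstrip(".")
    domain = domain_attr.lower().lstrip(".")  # a leading dot is ignored
    if domain in suffixes:
        # A public-suffix Domain is only tolerated when it equals the request
        # host, in which case the cookie is treated as host-only.
        return host == domain
    return domain_match(host, domain)


print(cookie_domain_allowed("www.ggusd.k12.ca.us", "k12.ca.us", PUBLIC_SUFFIXES))
```

So with k12.ca.us on the list, one district can still share cookies across its own hosts, but cannot plant a cookie that every other district in the state would receive.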


FWIW, this is the same list Facebook told[0] businesses “not” (wink, wink) to add their domain to after Apple announced all the tracking restrictions.

[0] https://www.facebook.com/business/help/331612538028890


The public suffix list is an abomination --- a useful, pragmatic, largely successful abomination, but an abomination nevertheless. The PSL centralizes and makes static a database that should be dynamic and distributed. It's a throwback to the bad old pre-DNS internet where everyone would copy around /etc/hosts files and rely on ad hoc human updating to keep host->address mapping up to date.

The information in the public suffix list belongs in DNS.


I guess the problem is there is no clear, technical definition of what a TLD is. There are many weird cases as described on the public suffix page (e.g. edu.au is a TLD unless it is catholic.edu.au, which is a TLD itself).

I believe it would be much easier to work with domain names if everyone agreed that all domain names should follow the same convention (e.g. hostname(s) + domain name + TLD type + country suffix) from the beginning.


Not just catholic.edu.au, states have their own things, e.g. schools in Victoria are *.vic.edu.au. Universities tend to not be conceptually so local and tied to the state, so they go with *.edu.au.

Same deal with government, you’ve got *.gov.au for federal matters and *.{vic, …}.gov.au for states.


That would be potentially dangerous.

This is old, but according to their site[1], IE used to use it for "Zone determination" and "ActiveX opt-in list security restriction".

If DNS were referenced for these purposes, that's how you get 0wned by accepting DHCP at your local coffee shop.

[1] https://publicsuffix.org/learn/


> that's how you get 0wned by accepting DHCP at your local coffee shop.

Code authentication is what certificates are for. What is the precise attack you're describing? "Potentially dangerous" doesn't cut it. In any case, DNSSEC addresses any possible coffee shop attack.


No, DNSSEC doesn't address any coffee shop attacks. DNSSEC is a server-to-server (or, I guess more fairly, a recursor-to-recursor) protocol. Unless you're running something other than a stub resolver on your machine (if you didn't install something, you're not), all of DNSSEC's security coalesces to a single bit in the header that says "yes, I promise, as your name server, that I actually verified signatures". I hope it's clear that a coffee shop attacker can spoof a DNS record with a single bit set.
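To make the "single bit" concrete: the bit is AD (Authentic Data, RFC 4035), one flag in the 16-bit flags word of the 12-byte DNS header described in RFC 1035. A minimal Python sketch of what a stub resolver effectively checks, and how trivially an on-path attacker can produce it; this builds a bare header only, with all section counts zero.

```python
import struct

# DNS header flag bits (header layout from RFC 1035; AD bit from RFC 4035).
QR, RD, RA, AD = 0x8000, 0x0100, 0x0080, 0x0020


def ad_bit_set(message: bytes) -> bool:
    """Return True if the DNS message header has the Authentic Data bit set."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    _query_id, flags = struct.unpack("!HH", message[:4])
    return bool(flags & AD)


def spoofed_header(query_id: int) -> bytes:
    """A bare response header with AD=1; no signature is involved anywhere."""
    flags = QR | RD | RA | AD
    # ID, flags, then QDCOUNT/ANCOUNT/NSCOUNT/ARCOUNT all zero (header only).
    return struct.pack("!HHHHHH", query_id, flags, 0, 0, 0, 0)


print(ad_bit_set(spoofed_header(0x1234)))  # True
```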

DNSSEC is full of big problems, which is why it's moribund (virtually none of the most widely-used zones have signed). But the lack of any service model in the "last mile" is probably the biggest.

If you're worried about coffee shop attacks, you're looking for DoH, which actually does directly address these attacks, which, unlike cache corruption, actually occur all the time --- which is why browsers are moving to make DoH a default.


Nothing stops you running a recursive resolver on your local machine. DoH is great too (provided you trust the endpoint), but what I really want is a chain of trust extending from the root to the individual DNS record, and DoH by itself doesn't give me that. But DoH sure is great for further centralizing control over internet name resolution in a few hands!


Nothing stops you from behaving like a DNS server, that's true. But nobody is going to forklift in error-prone new DNS security infrastructure just so a couple hundred Linux nerds (hey, it me) can have a new toy to play with. What's not going to happen is for every phone in the world to run its own recursor, and that's what needs to happen for DNSSEC to work on the last mile.

There are lots of problems with DNSSEC, not just this one. But if you want to have a discussion that involves things DNSSEC might help on the Internet, it's best not to start with coffee shop attacks, which are the problem DNSSEC is literally least suited to solve --- in fact, practically designed not to solve.

The coffee shop problem --- or, more accurately, the wide-scale equivalent problem of ISPs behaving like malicious coffee shops --- is exactly the problem DoH addresses.


That's why nowadays most web standards use "origin" rather than "domain".
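An origin is just the (scheme, host, port) tuple, so two github.io users are distinct origins without consulting any list. A minimal Python sketch using the standard library's `urlsplit`; the default-port table is an assumption covering only http and https. Cookies and the notion of "site" (used by SameSite) are still scoped by registrable domain, which is where the PSL comes back in.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumed default ports; only http and https are covered here.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """(scheme, host, port) tuple used for same-origin comparisons."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lowercases .hostname
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return scheme, host, port


print(origin("https://alice.github.io/") == origin("https://bob.github.io/"))
```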


DNS over HTTPS has already solved that issue.


Some related past threads:

Public Suffix List Problems - https://news.ycombinator.com/item?id=20889474 - Sept 2019 (15 comments)

The Public Suffix List - https://news.ycombinator.com/item?id=12311530 - Aug 2016 (40 comments)

Public suffix list - https://news.ycombinator.com/item?id=9634824 - May 2015 (1 comment)

Public Suffix List - https://news.ycombinator.com/item?id=850115 - Sept 2009 (3 comments)


iOS 14 and Facebook Pixel causing increase in PSL inclusion requests - https://news.ycombinator.com/item?id=26726722 - Apr 2021 (107 comments)


The IETF WG DBOUND tried to find a better solution to this problem and did not reach any consensus. fwiw.

https://datatracker.ietf.org/wg/dbound/about/

The current way most of this is handled is via a list published at publicsuffix.org (commonly known as the "Public Suffix List" or "PSL"), and the general goal is to accommodate anything people are using that for today. However, there are broadly speaking two use patterns. The first is a "top ancestor organization" case. In this case, the goal is to find a single superordinate name in the DNS tree that can properly make assertions about the policies and procedures of subordinate names. The second is to determine, given two different names, whether they are governed by the same administrative authority. The goal of the DBOUND working group is to develop a unified solution, if possible, for determining organizational domain boundaries. However, the working group may discover that the use cases require different solutions. Should that happen, the working group will develop those different solutions, using as many common pieces as it can.


Hey, it works fine as long as you don’t think too much.


Couldn't this be done in DNS? The same way zone delegations appear in there, a way to encode what's a public suffix?

For example (I'm bad at DNS)

  _suffix.gitlab.io TXT "type=public,cookies=restrict,cross-origin=forbid"

would tell everyone that remram44.gitlab.io is under the gitlab.io public suffix, and how to deal with cookies etc?
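Client-side, consuming such a record could be as simple as the Python sketch below. Everything here is hypothetical: the "_suffix" label, the key=value format, and the helper names exist only in this proposal, not in any standard or DNS library.

```python
def suffix_record_name(suffix: str) -> str:
    """Name of the hypothetical policy record for a public suffix."""
    return "_suffix." + suffix.rstrip(".")


def parse_suffix_policy(txt: str) -> dict:
    """Parse a comma-separated key=value TXT payload (hypothetical format)."""
    policy = {}
    for item in txt.split(","):
        key, sep, value = item.strip().partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed policy item: {item!r}")
        policy[key.strip().lower()] = value.strip()
    return policy


print(suffix_record_name("gitlab.io"))
print(parse_suffix_policy("type=public,cookies=restrict,cross-origin=forbid"))
```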


1. Nobody would adopt it.

2. Clients would need to do extra DNS queries, which introduces latency.


I don't know about 1. Very few people have to adopt it, people who operate public suffixes like github.io and gitlab.io, and those people already spend a lot of effort on security (which currently means getting a separate domain, getting registered in the PSL, dealing with various HTTP security headers, etc). Deploying a DNS record is not much more effort than HTTP headers, and it's easier to understand than CORS-related headers.

There is a problem with the number of queries. One extra is not very bad (connecting to a website already requires many queries, for CDN, ads, analytics, ...), but if you have to query all parent domains I can see how that might not scale.
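To put a number on "query all parent domains": with a per-ancestor record like the hypothetical `_suffix` proposal above, a cold-cache client would need one lookup per proper ancestor of the host. A minimal Python sketch of the query names (the "_suffix" prefix is the hypothetical one from the parent comment):

```python
def parent_query_names(host: str, prefix: str = "_suffix") -> list:
    """Policy-record names for every proper ancestor of `host`, closest first."""
    labels = host.lower().rstrip(".").split(".")
    return [prefix + "." + ".".join(labels[i:]) for i in range(1, len(labels))]


names = parent_query_names("dev.app.org.dept.nsw.gov.au")
print(len(names))  # 6
```

These lookups could be sent in parallel and cached per suffix, which would soften the latency, but a deep name like the nsw.gov.au example still fans out into several extra queries.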


Something I’ve always wondered: why is `co.uk` a TLD? What’s the story behind that?


It was inherited from JANET pre-DNS, which had UK.AC.FOO for academic and UK.CO.FOO for commercial. So instead of .edu.uk and .com.uk, we kept the established inertia and just flipped the word order. It's also why we ended up with .uk as a ccTLD instead of .gb (as ISO 3166 would have us use).


Thanks for that little story. I always wondered. TIL :)


A few other oddities were born out of this too. For example, major universities using two letters with a full-length alias (eg UK.AC.OX & UK.AC.OXFORD, with ox.ac.uk still the more common form today), and also the British Library ending up outside of a top-level element 25 years before anyone else (UK.BL became bl.uk and british-library.uk).

A lot of what's weird in the UK's hierarchy is simply because NRS predated DNS.

(Other trivia: NRS contexts did, 40 years ago, what SRV is still trying to do today. And a huge number of govt sites still serve their DNS records through ja.net, which is a nice nod.)


Because only the US could make their own "top level" TLDs.


Getting a domain listed is pretty hard.

Getting vendors to update their PSL in less ubiquitous products is near impossible. For instance, 1Password hasn't shipped a new version in years.


> Getting a domain listed is pretty hard.

I disagree. I've made two PRs [1] to that list to add domains where we assign subdomains to mutually non-trusting parties to ensure proper cookie security for these.

Both times the turn-around times were less than a week. In my first PR I even made a small mistake, because I did not read the instructions correctly. I promptly corrected it (< 5 minutes later) and the maintainer then merged my PR like 2 minutes later.

However we properly researched what the PSL does for us and what it does not before filing the PR. Also the domains do not hold anything other than customer data (similar to github.io).

[1] https://github.com/publicsuffix/list/pulls?q=is%3Apr+author%...


A friend had 5 months turnaround time in 2019. There weren't any issues with the PR, in fact the maintainers have not commented on it and merged it directly. Just after 5 months.

Meanwhile, updates to existing entries were handled within hours. Since then, in my mind, the PSL has been one more thing that shouldn't be centralized if some people get preferential treatment.

Happy to see they seem to have gotten better at getting through PRs in a timely manner, despite the sudden iOS surge that hit somewhat recently.


Small plug for a random python tool I maintain that uses this.

Parsing domains is a pain in the ass. It can be impossible to know what is part of tld, what is a subdomain etc without a canonical list and parser.

Here's a sansio domain / tld splitter: https://github.com/theelous3/sansio-tld-parser

Use case: you want to block all edu domains, but TLDs like wa.edu.au exist, so you gotta parse it out.
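For the narrower "block everything under these suffixes" case, one pitfall is plain string matching: "notedu.au".endswith("edu.au") is True. A minimal Python sketch of a label-by-label comparison is below. This is a simpler, swapped-in technique, not how sansio-tld-parser works, and the block list is hypothetical; it answers "is this host under a listed suffix" but does not split out registrable domains the way a PSL parser does.

```python
# Hypothetical block list; entries are compared label by label.
BLOCKED_SUFFIXES = {"edu", "edu.au"}


def is_under(host: str, suffix: str) -> bool:
    """True if `host` equals `suffix` or sits below it, comparing whole labels."""
    host_labels = host.lower().rstrip(".").split(".")
    suffix_labels = suffix.lower().rstrip(".").split(".")
    n = len(suffix_labels)
    return len(host_labels) >= n and host_labels[-n:] == suffix_labels


def is_blocked(host: str) -> bool:
    return any(is_under(host, suffix) for suffix in BLOCKED_SUFFIXES)


print(is_blocked("school.wa.edu.au"), is_blocked("notedu.au"))
```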


Nice. One suggestion, your README.md references the GitHub-hosted list, but in that list itself it says:

> Please pull this list from, and only from https://publicsuffix.org/list/public_suffix_list.dat, rather than any other VCS sites. Pulling from any other URL is not guaranteed to be supported.
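Following that instruction, a minimal Python sketch of fetching the list from the official URL and splitting it into the ICANN and private sections, which the file delimits with "// ===BEGIN ICANN DOMAINS===" and "// ===BEGIN PRIVATE DOMAINS===" comment markers. Per the list's format, each rule is read only up to the first whitespace, and lines starting with "//" are comments. The function names are illustrative, and a real tool should cache the download rather than fetch on every run.

```python
import urllib.request

PSL_URL = "https://publicsuffix.org/list/public_suffix_list.dat"


def parse_psl(text: str) -> dict:
    """Split PSL text into ICANN and private rule sets (outside markers ignored)."""
    sections = {"icann": set(), "private": set()}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("// ===BEGIN ICANN DOMAINS==="):
            current = "icann"
        elif line.startswith("// ===BEGIN PRIVATE DOMAINS==="):
            current = "private"
        elif line.startswith("// ===END"):
            current = None
        elif line and not line.startswith("//") and current:
            # A rule ends at the first whitespace; the rest of the line is ignored.
            sections[current].add(line.split()[0].lower())
    return sections


def fetch_psl(url: str = PSL_URL) -> dict:
    """Download the list from the official URL and parse it."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_psl(resp.read().decode("utf-8"))
```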

I'd submit a pull request but I'm on mobile.

Also, is "sansio" a commonly-used phrase? I've not seen it before. I appreciate that it means "without i/o", I've just never seen it written like that before. Not even by a French person :)



