Parse registered domain in PHP
Getting the registered domain (excluding any subdomains) from a URL in PHP is easy, right? It's actually not as easy as it may first appear.
In a recent project I had a large list of URLs which I had to compare against a list of domains to find matches. At first I was simply using parse_url($url, PHP_URL_HOST) to get the host from the given URL. This worked fine for results such as domain.com/some-page.html but not for www.domain.com/some-page.html, so I started stripping out occurrences of 'www.'. That held up until I noticed results like uk.domain.com/some-page.html and sub1.sub2.domain.com/some-page.html.
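To make that first attempt concrete, here is a minimal sketch of it. The helper name naiveDomain() is made up for this illustration, and the sample URLs carry an http:// scheme because parse_url() only reports a host when the URL has one.

```php
<?php
// Naive approach: take the host from parse_url() and strip a leading "www.".
// naiveDomain() is a hypothetical helper name used only for this illustration.
function naiveDomain(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // no host found, e.g. the URL has no scheme
    }
    // Remove only a leading "www.", not every occurrence in the host.
    return preg_replace('/^www\./i', '', $host);
}

echo naiveDomain('http://domain.com/some-page.html'), "\n";           // domain.com
echo naiveDomain('http://www.domain.com/some-page.html'), "\n";       // domain.com
echo naiveDomain('http://uk.domain.com/some-page.html'), "\n";        // uk.domain.com
echo naiveDomain('http://sub1.sub2.domain.com/some-page.html'), "\n"; // sub1.sub2.domain.com
```

The last two calls show the problem: any subdomain other than 'www' passes straight through.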
Your first thought is probably some kind of regular expression to extract a string based on the dots in the hostname. You might assume that after splitting the hostname on dots the last component is the TLD, the second to last is the domain and anything before that is a subdomain. That's fine for simple domains like www.domain.com, but what about .co.uk? In that case the last two components together make up the suffix. Or take a domain like activated.act.edu.au: the registered part of the domain isn't 'act', it's 'activated', because 'act.edu.au' as a whole plays the role of the TLD.
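A quick sketch of that dot-splitting assumption shows where it goes wrong. The function name lastTwoLabels() is hypothetical, and it takes a bare hostname rather than a full URL.

```php
<?php
// Naive dot-splitting: assume the last label is the TLD and the one before it
// is the registered name. lastTwoLabels() is a hypothetical helper name.
function lastTwoLabels(string $host): string
{
    $labels = explode('.', strtolower($host));
    // Keep at most the last two labels.
    return implode('.', array_slice($labels, -2));
}

echo lastTwoLabels('www.domain.com'), "\n";       // domain.com (correct)
echo lastTwoLabels('www.nomisoft.co.uk'), "\n";   // co.uk (wrong, that is only the suffix)
echo lastTwoLabels('activated.act.edu.au'), "\n"; // edu.au (wrong)
```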
It seems the only reliable way is to use a list. The Public Suffix List (https://publicsuffix.org/) is a community-maintained database of public suffixes, meaning the top-level domains (TLDs) plus multi-part suffixes such as .co.uk under which names are registered. There are numerous libraries and code snippets out there for making use of this list, but the most popular appears to be https://github.com/jeremykendall/php-domain-parser. Usage is as simple as the following few lines:
// Load the Public Suffix List and build a parser from it
$pslManager = new Pdp\PublicSuffixListManager();
$parser = new Pdp\Parser($pslManager->getList());

// Parse a URL and print its registered domain
$url = $parser->parseUrl('http://www.nomisoft.co.uk/articles/php-parse-registered-domain');
echo $url->host->registerableDomain;
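To give a feel for what a list-based lookup does under the hood, here is a rough sketch. It is not how php-domain-parser is implemented. The suffix array is a tiny hand-picked sample, and both SAMPLE_SUFFIXES and registrableDomainSketch() are names invented for this example. The real list is far larger and also contains wildcard and exception rules, which this sketch ignores.

```php
<?php
// A tiny hand-picked subset of suffixes, for illustration only. The real
// Public Suffix List is far larger and also has wildcard and exception
// rules, which this sketch ignores.
const SAMPLE_SUFFIXES = ['com', 'uk', 'co.uk', 'au', 'edu.au', 'act.edu.au'];

// registrableDomainSketch() is a hypothetical helper, not part of any library.
function registrableDomainSketch(string $host, array $suffixes = SAMPLE_SUFFIXES): ?string
{
    $labels = explode('.', strtolower($host));
    $count = count($labels);

    // Walk from the longest candidate suffix to the shortest, so that
    // "act.edu.au" wins over "edu.au" and "au".
    for ($i = 0; $i < $count; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (in_array($candidate, $suffixes, true)) {
            // If the whole host is a suffix there is nothing registered on it;
            // otherwise the result is the suffix plus the label before it.
            return ($i === 0) ? null : ($labels[$i - 1] . '.' . $candidate);
        }
    }

    return null; // no known suffix matched
}

echo registrableDomainSketch('www.nomisoft.co.uk'), "\n";   // nomisoft.co.uk
echo registrableDomainSketch('activated.act.edu.au'), "\n"; // activated.act.edu.au
echo registrableDomainSketch('sub1.sub2.domain.com'), "\n"; // domain.com
```

Checking the longest suffix first is what makes the .co.uk and act.edu.au cases come out right, which plain dot counting cannot do.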