I recently wrote a blog post about DNS-over-HTTPS and DNS-over-TLS, investigating how they could help with your privacy and where their limitations lie. However, since writing that post, I've been thinking of ways in which an eavesdropper (a government, an ISP, etc.) may be able to work out the visited domain with even greater accuracy.
I previously concluded that while it is possible to obtain the IP address used to access the server, it is not always possible to work out the exact domain name used to access it. I showed how, simply by using the service's certificate, one could form a general idea of which domain was accessed. However, if a wildcard certificate is used, and this certificate contains many other wildcard entries in the Subject Alternative Name (SAN) field, it would be pretty difficult to work out which domain was hit. Take Google's certificate for https://www.google.co.uk/ for example:
Good luck trying to work out which domain name was used from that list!
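If you want to reproduce a SAN list like that yourself, here is a minimal Python sketch using only the standard library. It assumes the host presents a certificate that validates against your system's trust store. `fetch_san_names` and `dns_names_from_cert` are names I've made up for illustration; the only real APIs used are `socket.create_connection`, `ssl.create_default_context` and `SSLSocket.getpeercert`.

```python
import socket
import ssl


def dns_names_from_cert(cert: dict) -> list[str]:
    """Return the DNS entries from a certificate dict in the format
    produced by ssl.SSLSocket.getpeercert() (non-DNS SANs are skipped)."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def fetch_san_names(host: str, port: int = 443, timeout: float = 5.0) -> list[str]:
    """Open a verified TLS connection to host:port and list the SAN DNS
    names on the certificate the server presents."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return dns_names_from_cert(tls.getpeercert())
```

Calling `fetch_san_names("www.google.co.uk")` should return the SAN DNS entries for that host; the exact list will change whenever Google rotates its certificate.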
Hooray for privacy! Or is it?
So this got me thinking:
- We know the IP address
- This IP address is publicly accessible
- We can view the information from this IP address
So, using the Google example from yesterday (the IP address that www.google.co.uk resolves to changes periodically, which is why it differs from the one in my previous blog post):
So what happens if we now take a look at the content on the page:
One of the things sites do is redirect you to their landing page ('/') if you do not hit it initially. All we need to do is take the value from the Location header and parse the domain name out of the URL. This gives us the domain name of the server! While it is not www.google.co.uk, it is close enough!
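As a rough sketch of that idea, assuming the server answers plain HTTP on port 80 and puts the redirect target in the `Location` header, the Python snippet below requests `/` from a bare IP address and pulls the hostname out of the redirect without following it. `probe_redirect` and `hostname_from_location` are hypothetical helper names, not part of any existing tool.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit


def hostname_from_location(location: str) -> Optional[str]:
    """Extract the (lower-cased) hostname from a Location header value.

    Relative redirects such as "/search" carry no hostname, so None is returned.
    """
    return urlsplit(location).hostname


def probe_redirect(ip: str, port: int = 80, path: str = "/",
                   timeout: float = 5.0) -> Optional[str]:
    """Send a plain-HTTP GET to a bare IP address and return the hostname
    named in any Location header, without following the redirect."""
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        conn.request("GET", path)  # Host header will simply be the IP
        response = conn.getresponse()
        location = response.getheader("Location")
    finally:
        conn.close()
    return hostname_from_location(location) if location else None
```

A server that only redirects relative to itself, or does not redirect at all, gives back `None`, which matches the glitches described further down.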
Another thing sites do is redirect users from HTTP to HTTPS if they hit the site over plain HTTP. Using our, let's say, sensitive example from the previous blog post:
Hey, look at that: from the IP address alone, I now know you've been visiting naughty sites!
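The same `Location` header also tells an observer whether the redirect is an HTTP-to-HTTPS upgrade. A small, hypothetical classifier might look like the sketch below; it assumes the usual redirect status codes (301, 302, 303, 307 and 308) and takes the status and header value from a response you have already fetched.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


class Redirect(NamedTuple):
    status: int
    hostname: Optional[str]
    upgrades_to_https: bool


def classify_redirect(status: int, location: Optional[str]) -> Optional[Redirect]:
    """Describe a plain-HTTP response: the hostname it redirects to and whether
    the target uses https. Returns None if it is not a redirect with a Location."""
    if status not in REDIRECT_CODES or not location:
        return None
    parts = urlsplit(location)
    return Redirect(status, parts.hostname, parts.scheme == "https")
```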
Other things from the pages served by the server that could be used include content such as:
- Resources such as:
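Pulling hostnames out of a page's resources could be sketched with the standard library's `html.parser`. This assumes the page has already been fetched via the IP address and only looks at `src` and `href` attributes, so it will miss URLs built by JavaScript or referenced from CSS. `resource_hosts` and `ResourceHostCollector` are made-up names. Relative links resolve back to the IP we already know, so they are filtered out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ResourceHostCollector(HTMLParser):
    """Collect hostnames referenced by src/href attributes in an HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.base_host = urlsplit(base_url).hostname  # the IP we fetched from
        self.hosts: set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                # Resolve relative and protocol-relative URLs against the page URL.
                host = urlsplit(urljoin(self.base_url, value)).hostname
                if host and host != self.base_host:
                    self.hosts.add(host)


def resource_hosts(html: str, base_url: str) -> set[str]:
    """Return the set of external hostnames a page links to or loads from."""
    collector = ResourceHostCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.hosts
```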
However, I did hit some glitches which kill some of this dead:
- This very blog does not return any response when I navigate to it directly using the IP address:
- Some sites will return a response, but not a very useful one:
There's a reason why attackers go to great lengths to mask their IP address: it can often be tracked back to them by some means or other. DNS-over-TLS and DNS-over-HTTPS provide a degree of privacy, but I would not consider it a significant one.
If you want total privacy, the best solution is to use a VPN service such as ProtonVPN. This hides your traffic from prying eyes on your network and at your ISP, since everything sent to and from your system is encrypted between you and the VPN server. However, you should be careful to select a reputable provider, because the VPN provider now sits where your ISP used to and can see your traffic instead. Choosing the wrong one could harm your privacy even more:
Edit 12 April 2018
Since writing this post, I've seen several people point to Server Name Indication (SNI) as a way around the privacy offered by DNS-over-HTTPS and DNS-over-TLS. Notably, the client sends the hostname in plain text during the TLS handshake. This is most certainly another avenue for determining the hostname used to connect to the server.
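To illustrate why SNI matters, here is a sketch that parses the `server_name` extension out of a raw TLS ClientHello record. It assumes the whole ClientHello arrives in a single record and does only minimal bounds checking, so treat it as an illustration rather than a robust parser. The helper `make_client_hello` uses Python's `ssl.MemoryBIO` to generate the bytes a client would put on the wire, without opening any network connection. Both function names are my own.

```python
import ssl
import struct
from typing import Optional


def make_client_hello(hostname: str) -> bytes:
    """Produce the raw ClientHello bytes Python's ssl module would send."""
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # stalls waiting for the server; the ClientHello is already written
    return outgoing.read()


def extract_sni(record: bytes) -> Optional[str]:
    """Return the host_name from the server_name extension of a TLS ClientHello
    record, or None if the record is not a ClientHello or carries no SNI."""
    try:
        if record[0] != 0x16:       # content type 22 = handshake
            return None
        body = record[5:]           # skip the 5-byte record header
        if body[0] != 0x01:         # handshake type 1 = ClientHello
            return None
        pos = 4 + 2 + 32            # handshake header, client_version, random
        pos += 1 + body[pos]                                # session_id
        pos += 2 + struct.unpack_from("!H", body, pos)[0]   # cipher_suites
        pos += 1 + body[pos]                                # compression_methods
        (ext_total,) = struct.unpack_from("!H", body, pos)
        pos += 2
        ext_end = pos + ext_total
        while pos + 4 <= ext_end:
            ext_type, ext_len = struct.unpack_from("!HH", body, pos)
            pos += 4
            if ext_type == 0x0000:  # server_name extension
                # Skip the 2-byte server_name_list length, read the first entry.
                name_type, name_len = struct.unpack_from("!BH", body, pos + 2)
                if name_type == 0:  # 0 = host_name
                    return body[pos + 5 : pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        pass  # truncated or malformed input
    return None
```

`extract_sni(make_client_hello("naughty.example"))` should give back `"naughty.example"`, which is exactly what anyone watching the connection can read, regardless of how the DNS lookup was done.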
Another avenue I thought about is a reverse lookup table. ISPs will no doubt have many customers still using their DNS servers, or other DNS servers over plain text. They could record these lookups in a database and then use it to map the IP address a DNS-over-HTTPS or DNS-over-TLS user connects to back to the hostname(s) that resolved to that address.
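A reverse lookup table of that kind could be as simple as the Python sketch below. It assumes the ISP can see plain-text DNS answers as (hostname, addresses) pairs; `ReverseLookupTable` is a hypothetical name. Since IP addresses change and are shared between sites (as with the Google example earlier), a real table would also need timestamps and some way to age out old entries, which this sketch leaves out.

```python
from collections import defaultdict
from typing import Iterable


class ReverseLookupTable:
    """Map IP addresses back to the hostnames that resolved to them, built
    from DNS answers observed in plain text."""

    def __init__(self) -> None:
        self._by_ip: defaultdict[str, set[str]] = defaultdict(set)

    def record(self, hostname: str, addresses: Iterable[str]) -> None:
        """Store one observed DNS answer: hostname -> A/AAAA addresses.
        Hostnames are normalised to lower case without a trailing dot."""
        for ip in addresses:
            self._by_ip[ip].add(hostname.lower().rstrip("."))

    def candidates(self, ip: str) -> set[str]:
        """Hostnames previously seen resolving to this IP (empty if none)."""
        return set(self._by_ip.get(ip, ()))
```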