DNS Working Group session
RIPE 85
27th October 2022
11 a.m.
LUCA SCHUMANN: I think I am supposed to be seeing the slides now. Which I am not. I hope you can all hear me. Hi everyone, first of all thank you for the invitation. My name is Luca and I am a recent graduate from the Technical University of Munich. With me today is also Mike, who will help answer some of the questions afterwards.
So, we were wondering if encrypted DNS can be fast, and therefore had a look at the recently standardised DNS over QUIC and its impact on web applications.
So, how can I click the next slide? Sorry.
So, as we all know, web traffic has shifted towards HTTPS in the last decade. Still, DNS traffic is largely unencrypted, which leads to privacy issues such as ISPs being able to derive user profiles from observing DNS traffic. This is addressed by DNS over TLS and also DNS over HTTPS, the latter being more popular due to its simple integration into web browsers.
Yet, both are constrained by the round trips required due to TCP in the protocol stack, so you will have at least two round trips to establish a connection.
Now, QUIC has been introduced as a modern transport protocol that combines connection establishment and encryption in a 1-RTT or even a 0-RTT handshake. Similarly, DNS over QUIC has been recently standardised and aims to provide DNS privacy with minimal latency, and this leads to our main question: What is the impact of DoQ on web performance? Good. To analyse this, we used ZMap to scan the IPv4 address space from a machine in our university network in April 2022. We find over 1,200 DoQ resolvers, with 313 additionally supporting DoH and DoUDP, and these are the ones we use for our measurements. As you can see here, the red dots are the targets that we found. There is a bit of an uneven distribution, so most of them are in the EU, in Asia and in the US. The blue dots here are our vantage points, so we still conduct measurements from all six continents.
We used Selenium with Chromium and the ten most popular web pages from the list. And as local resolver we used dnsproxy, an open source project that enables us to query upstream resolvers using DNS over QUIC, over HTTPS and also over UDP. Now, for every web page, every DNS protocol and every upstream resolver, we conduct measurements from all six vantage points in parallel. On the individual vantage points the measurements are done sequentially.
Now, measurements are repeated every 48 hours over the course of one week in April 2022, and measurements include two navigations: the first one is the cache warming and the second one is the actual web performance measurement.
Now, the cache warming is done for three reasons. First of all, and mainly, to populate the DNS cache of the upstream resolver. This should rule out any influences of recursive lookups by the upstream. Secondly, we negotiate the QUIC version and also receive an address validation token from the server. And last, we receive a TLS 1.3 session resumption ticket from the server, if that is supported by the server.
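As a rough illustration of this two-navigation pattern, here is a minimal sketch using the Python Selenium bindings with Chromium. The page list, the assumption that a local dnsproxy instance is already configured as the resolver, and the way FCP and PLT are read out are illustrative simplifications, not the exact harness used in the study (for instance, browser-side caches would need to be controlled separately between the two navigations).

```python
# Minimal sketch: cache-warming navigation followed by the measured navigation.
# Assumes Chromium driven via Selenium and a local dnsproxy already acting as
# the system resolver; page list and flags are placeholders.
from selenium import webdriver

PAGES = ["https://www.wikipedia.org/", "https://www.instagram.com/"]  # placeholder list

options = webdriver.ChromeOptions()
options.add_argument("--headless")

for url in PAGES:
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)  # 1st navigation: warm the upstream resolver's cache,
                         # learn the QUIC version, obtain a TLS session ticket
        driver.get(url)  # 2nd navigation: the actual web performance measurement
        fcp = driver.execute_script(
            "const e = performance.getEntriesByType('paint')"
            ".find(p => p.name === 'first-contentful-paint');"
            "return e ? e.startTime : null;"
        )
        plt = driver.execute_script(
            "const nav = performance.getEntriesByType('navigation')[0];"
            "return nav.loadEventStart - nav.startTime;"
        )
        print(url, "FCP(ms):", fcp, "PLT(ms):", plt)
    finally:
        driver.quit()
```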
So, onto the evaluation.
In the end, we had roughly 57 thousand samples per protocol. And for DoQ we find that all of them support TLS 1.3 session resumption. At the same time, there is no support for 0-RTT connections. Furthermore, the vast majority of servers use QUIC version 1 and DoQ draft version 2.
For DoH, we see 99% of servers supporting TLS 1.3 session resumption. And as with DoQ, there are no servers that support 0-RTT. Also no servers that support TCP Fast Open. And all DoH requests are using HTTP/2.
Okay, one last thing before I show you the plots. We mainly looked at two metrics. The first one is the First Contentful Paint event, that is, the time until the first visible image or text is shown on the screen. This happens fairly early in the page load, which is why we expect a larger influence of DNS on this metric.
The other one is the page load time, which we define as the difference between the start of the page load and the load event.
We then compute the medians for each combination of vantage point, resolver and DNS protocol. And this should account for the differences between resolvers and vantage points. As you have seen, the geographical distribution, for instance, is a bit uneven.
And we then go on and compare the protocol medians that correspond to a pair of vantage point and resolver.
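To make the aggregation step concrete, here is a small sketch of how such per-combination medians and the protocol comparison could be computed with pandas. The column names and the tiny example frame are illustrative, not the study's actual data layout.

```python
# Minimal sketch of the aggregation: median per (vantage point, resolver,
# protocol), then the relative delay of DoQ over DoUDP per pair.
import pandas as pd

samples = pd.DataFrame({
    "vantage_point": ["eu", "eu", "eu", "eu"],
    "resolver":      ["r1", "r1", "r1", "r1"],
    "protocol":      ["doudp", "doq", "doudp", "doq"],
    "fcp_ms":        [310.0, 335.0, 290.0, 320.0],
})

medians = (
    samples.groupby(["vantage_point", "resolver", "protocol"])["fcp_ms"]
    .median()
    .unstack("protocol")
)

# Relative FCP delay of DoQ compared to the DoUDP baseline, in percent.
medians["doq_delay_pct"] = (medians["doq"] / medians["doudp"] - 1.0) * 100
print(medians)
```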
Let us start with the first contentful paint event.
So, here we can see how DoQ in blue and DoH in green perform against DoUDP, which is the vertical line here and also the baseline for this plot. As to how to read this plot: for example, here you can see that 60% of DoQ measurements delay the FCP event by 10% or less.
And at the same time, 40% of DoH measurements delay it by 20% or less. You may also notice a small fraction of DoQ and DoH queries here being faster than DoUDP, and we attribute this to the missing retransmission mechanism on the transport layer for UDP. So instead, DoUDP has to wait for a query timeout before retransmitting.
And now looking at the table here, 20% of the DoQ measurements delay the FCP event by 20% or more, whereas for DoH, 20% of measurements delay it by almost 40% or more. And that leads us to a first key takeaway: DoQ significantly improves over DoH.
But now let's have a look at the page load time.
All right. So, we have prepared here a bit of a matrix. On this axis you can see the different vantage points: we have Europe, Asia, Africa and Oceania. And here we have four different web pages: Wikipedia, Instagram, Microsoft and YouTube, and the numbers in the brackets here represent the average number of DNS queries that are required to load the page. The plots are also colour coded, so a lighter colour means a higher percentage of DoQ page loads being faster than DoH, and we quickly see that it gets darker towards the more complex web pages. So, the improvement diminishes when we have more DNS queries, right.
Also, maybe I give you a bit of time to just digest this graph. So here the middle line is basically the baseline, DoQ; the purple one is DoUDP and the green one is DoH.
Maybe let's just start looking at the basic, or the most simple web pages.
Here, especially in Africa and Oceania, DoQ improves the page load time over DoH by up to 10% in the median. And at the same time, it worsens the PLT over DoUDP by up to 10% in the median. So DoUDP is left of the baseline and DoH is right of the baseline. So the takeaway is that the cost of encryption is the largest for these simple web pages.
And looking at the more complex pages, the page load time for the protocols gets closer, as the cost of encryption amortises over the more DNS queries that are required. So here we can see that DoQ pretty much catches up to DoUDP and DoH, so there is not much of a difference to see any more, which means that here the cost of encryption is the smallest.
So, our key takeaway is that DoQ catches up to DoUDP with increasing complexity of web pages.
All right. So, onto the conclusion: we saw that encrypted DNS does not have to be a compromise. DoQ improves over DoH with up to 10% faster page loads for simple web pages, and it also catches up to DoUDP with increasing complexity of web pages.
Please remember that we have only observed ten websites and roughly 300 resolvers, so it will be interesting to see measurements with more pages and resolvers in the future.
This is also ongoing work. And we are also interested to see if 0-RTT will actually be supported at some point. And lastly, another dimension to look at is DNS over HTTP/3. Support for that was recently added by Cloudflare DNS, Google Public DNS and also Android, and then we will be able to see how the added overhead of the HTTP request/response model holds up against plain DoQ without HTTP overhead.
And to finally answer the question that we posed in the title: Yes, DoQ makes encrypted DNS much more appealing for the web. And I hear an echo, but it's okay.
And yeah, that's it from my side. So, have a look at the paper if you are interested in a deep dive and some more findings. Also, code and data are all publicly available, and we are ready for any questions that you might have.
(Applause)
SHANE KERR: Thank you. I think that was really interesting. So, I don't see any questions online or at the queue, so I'll add one of my own.
I guess I think these results are nice, because they kind of match our intuition, right. Or, do you think there is anything unexpected in the results that you got here?
LUCA SCHUMANN: Actually, no. I wouldn't say so. It's pretty much what you would have expected. Mike, maybe you have another opinion on that?
SPEAKER: Thanks Luca for presenting this work. I presented this work two days ago here at IMC, and sorry, I couldn't be there in Belgrade with you. So, yeah, this was pretty much what we expected, given the faster handshakes for encryption and transport. However, it was interesting to see how those differences shift the more DNS queries you use, right. So, at the point where Luca showed the impact of the handshake time for the simpler web pages, this is a pretty huge impact, and on larger web pages this is essentially the same as before.
SHANE KERR: Yeah, I think that makes sense. I think that's an interesting takeaway, definitely. I see we have a question.
AUDIENCE SPEAKER: I just wanted to ask, Luca, when you looked at the resolver providers: so there is Cloudflare, there is Google, and there are a bunch of other tech corporations. Did you see some kind of diversity among the providers, or are they just these big corporations that are providing it?
LUCA SCHUMANN: For DNS over QUIC, it was quite a mix, and I believe there was no Google DNS involved. So, the biggest provider for DoQ was AdGuard DNS, correct me if I'm wrong, Mike. And it was a huge mix, so I believe most were probably also not designed to be production ready, but we are not sure about that.
AUDIENCE SPEAKER: Thank you.
SHANE KERR: I also had one ‑‑ I hope ‑‑ maybe a final question here. I was surprised to see that there is no 0-RTT support anywhere in the universe. Do you have plans to run an experiment in a lab or something like that where you can just do a comparison with projected results?
LUCA SCHUMANN: So that's a question for Mike.
SPEAKER: So, we looked into this in a study in the lab where we compared 1-RTT DoQ versus 0-RTT DoQ, and you pretty much get what you would expect to get. So with 0-RTT you send the DNS query in the first flight and you directly get back the answer, and of course this makes DoQ, like we have seen here, even faster and on par with DNS over UDP. So, this is also what we would expect to get from resolvers in the wild if they support 0-RTT.
SHANE KERR: That's good to hear. For me, the really exciting end point is near zero-RTT performance for this. I guess we're done. Thank you very much, very interesting.
(Applause)
So, our next presenter is Dave Knight, one of the former co-chairs of this Working Group, showing some of the recent research he has done on how resolvers treat different numbers of addresses.
DAVE KNIGHT: Thanks Shane. I work at Neustar Security Services as an architect. What I have got today is a simple operational story, a workaround for confusing things on the registry side of the shop.
Recently, in the first half of the year, I spent my time building a second edge platform for UltraDNS, which launched a couple of months ago, and some of those customers have expressed concerns about how we make this available to them. A typical UltraDNS customer has up to six name servers in a domain, and when they adopt our second edge platform, we want them to add two more. A bunch of customers have said to us, well, eight is a lot; there are some registries out there who won't let you have eight name servers in a delegation. So, I looked into how big of a problem this is. And it was surprisingly hard to find data specifically on what the constraints on a delegation are per TLD registry. Maybe I am not good at Googling stuff or maybe it's hard to look for, but after asking around, Michele from Blacknight was able to point me at a public list which has metadata for all TLD extensions, and I also got a list from a registrar. It's a smaller list, it didn't have everything in it, and the numbers also didn't line up, so I'm not sure which is accurate. I'd hope that the registrar is more accurate, but I have merged the lists in favour of the lower numbers, because that's what I was interested in chasing down.
So, in this little table here I have bucketed those by the maximum size of the delegation NS set. It looks like the sweet spot, if these numbers are to be believed, the sweet spot is 10. Around half of them seem to allow 10 name servers but down at the right‑hand side there, it looks like around a third of all TLD extensions don't allow as many as eight name servers in a delegation.
So, how do we work around this? The simple solution, and the one that people use right now, is to just use fewer name servers. And, you know, a naive look at that is: we do Anycast, every one of our nodes is capable of answering for any address, routing is probably going to take you to the closest node, this is fine, why do we need lots of name servers?
The way that we and others do our Anycast, in pursuit of a balance between performance and resilience, we don't flatly advertise all of our prefixes exactly the same from every node. We don't want that, from any point in the topology, all of your queries will always go to the same place, because if that node has some kind of problem that we can't quickly mitigate, then we have created a blackhole. So the way that we solve for that is that every one of our nodes de-preferences some of the prefixes as it advertises them, so that from any point in the topology, we try to guarantee that for the set of name servers for your domain, at least one of them is always going to go further away than it would have to. We try to bias that. So most of the queries will go to a node that is topologically closest to you, but some of them should always go elsewhere, so that if that node starts to blackhole traffic, you have an authority server of last resort to go to to get an answer.
From our perspective, we don't care about names. We care about IP addresses, and we want resolvers to exercise all of our different announcement strategies.
So then that made me have the idea: well, if we can't have lots of names, can we still get lots of addresses? Can we give a name server name more than one v4 and v6 address? Obviously multiple addresses are already not surprising, because since we deployed v6 it's been common to have a name resolve to two addresses, but could we do four or ten or 20? How far could we take it and where does it break?
So the main questions that we have are: if we give a name server name lots of addresses, will a resolver use all of them? And do resolvers treat those as a set belonging to a name? So if one of the addresses behind a name is broken or has very poor performance, will the resolver treat all of the group of addresses that belong to that one name in the same way?
So, I came up with an experiment that I could do. I used my very narcissistic personal domain names to set that up. And gave two name servers a bunch of addresses, which represent our two platforms: so on the Ultra 1 platform we'd have lots of addresses and on our new Ultra 2 platform we have fewer, and so this is set up to kind of represent that. So what I have done is set up a test name server in the lab, it's running Knot DNS, and I configured it with all of those addresses, and have a little bit of Python to send queries to a bunch of recursives. I have set up a BIND, a Knot and an Unbound, and those were just installed in the most basic way: I didn't touch the configuration if I didn't have to, just made them be a recursive, and I tuned nothing. And I also sent queries to Cloudflare, Google, OpenDNS, Quad9 and the UltraDNS open recursive. And the queries that I send are cache-busting queries with some information in there so I can easily identify the queries as I capture them on my authority server.
And in each iteration, I walk through the list of IP addresses, break one of them at a time, and then send out 100 queries to each of the recursors, and then capture all of this. And what I want to then count is the number of queries which arrive at a broken address, because what I want to know is: after it sends a query to a broken address, does it then use the next address, or is it going to jump to an address that belongs to the set of the other name server? And when a broken address starts to work again, how quickly does the recursor recover and start to send queries back to that address?
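A minimal sketch of what such a query loop can look like with dnspython is below; the zone, the recursor list and the label format are illustrative placeholders, not the actual lab setup.

```python
# Minimal sketch: send cache-busting queries to a set of recursors so each
# query can be matched later in the packet capture on the authoritative server.
import time
import uuid
import dns.exception
import dns.message
import dns.query
import dns.rdatatype

RECURSORS = ["192.0.2.53", "1.1.1.1", "8.8.8.8", "9.9.9.9"]  # placeholder list
ZONE = "example.net."  # placeholder test zone served by the lab authoritative

def cache_busting_query(recursor: str, iteration: int, seq: int) -> None:
    # Encode the iteration and sequence number in a unique label so the query
    # bypasses caches and identifies itself in the capture.
    qname = f"i{iteration}-q{seq}-{uuid.uuid4().hex[:8]}.{ZONE}"
    query = dns.message.make_query(qname, dns.rdatatype.A)
    try:
        dns.query.udp(query, recursor, timeout=5)
    except dns.exception.Timeout:
        pass  # timeouts are expected while an address is broken

for iteration in range(24):      # e.g. one iteration per address that gets broken
    for seq in range(100):       # 100 queries per recursor per iteration
        for recursor in RECURSORS:
            cache_busting_query(recursor, iteration, seq)
    time.sleep(1)
```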
Now, first an apology for what is to come. I am not very good at visualising data, I should really do more homework on that, so there are a lot of tables of numbers. I'll try to explain things and make it not too painful.
These tables of numbers show counts of the initial query to a dead address, that's the thing that I am interested in, and to read the table, the columns are the name server address where the query has landed, and the rows are each iteration where I have broken an address.
And so what I want to see, what I expect to see is a low number of queries arriving at a broken address because I would expect that a recurser will quickly detect the brokenness and stop sending queries to it. What I want to then see is in the row beneath that that the number quickly goes back up again as the recurser detects that the name server has become available again and starts using it.
What I don't want to see is blocks of addresses associated with a particular name all not getting queries, because it would suggest that a recursor might be grouping them by name. And in order to fit all of the IP addresses into the table headers, I have shortened them, so .13 is the v4 address and ::13 is the v6 address.
Now, I have done the list of recursors in alphabetical order, so we are starting with BIND, and it just so happens that BIND has the weirdest behaviour. So, first of all, BIND strongly prefers IPv6 if it's available. So, looking at this, I have highlighted the part of the table that we're interested in. The v4 parts don't matter; if v6 is available, BIND doesn't send queries to v4. And then I have also highlighted the weird behaviour. So, in the first four rows here, where I break just the v4 addresses, nothing happens, because BIND doesn't use v4.
When I get to v6, then things start to look weird and frightening, because it looked like, oh, one of the addresses in the set of addresses belonging to NS1 is broken, and now it doesn't send any queries to any of the other addresses for that name server. But that's not quite what's happening. It looks like BIND holds a sorted list of the name server addresses, and when it detects that the first one in the list is dead, it jumps straight to the end of the list and sends all the queries there. And then as we walk down the rows, you see this being borne out. It seems to hit the first address, sends some queries, but as soon as it hits a dead one, jump to the end, jump to the end, jump to the end. No one else does this. All of the recursors ‑‑ all but one, but we'll come back to that ‑‑ seem to have a reasonably recognisable pattern of behaviour. BIND is by far the weirdest. It's also slow in getting an answer when it has encountered a dead address. I think it was pretty consistently taking 1,600 milliseconds to successfully get an answer once it has timed out once.
So, moving on, the next in the list was Knot Resolver. And it doesn't seem to have a strong IP version preference, and it is really fast to recover from a timeout of a dead address. It pretty consistently gets an answer when it hits a dead address in 40 milliseconds, which was very impressive. And also, it doesn't seem to penalise dead addresses for very long.
Then Unbound is wonderfully normal. It's kind of in the middle of everything, it doesn't seem to have a significant IP version preference, and it doesn't significantly penalise broken addresses for long.
And then I start moving into the big public open recursives. Cloudflare very strongly prefers IPv4. Bear in mind that the counts here are only the counts when it has encountered a broken address, not all of the counts, but certainly in those it never uses v6. It also does something that no one else does: the Cloudflare recursor sends a retransmit of the initial query using the same socket. So I see it coming from the same source port about 200 milliseconds after the first query. No one else does that. I am convinced I remember someone talking about this once a couple of years ago, but I can't remember where or when, but that's a thing that they do that nobody else does.
But they also don't group anything, and they seem to recover from a broken address pretty quickly.
Google also does a retry to the broken address once, not using the same socket, just a normal retry on a new socket, and it also doesn't seem to significantly penalise a broken address, neither does it have an obvious IP version preference.
OpenDNS very much prefers IPv6, but it will use IPv4 when it first encounters a broken IPv6 address. It doesn't significantly penalise broken addresses; it very quickly starts using them again. It's also very fast at recovering from a timeout on an address that doesn't work. So, the Knot Resolver results had a consistent 40 millisecond recovery from a timeout, and that was on a box that sat in the same LAN as the authority server. OpenDNS, over the Internet, is able to get an answer within 100 milliseconds pretty consistently after encountering a timeout.
Quad9 has a slight IPv4 preference, and you don't see it here, but looking at the data more closely, Quad9 is the one I mentioned earlier which doesn't have an obvious pattern of behaviour. But I think the rumour is Quad9 uses multiple implementations, and yeah, I see that in the data.
Then finally, UltraDNS: no obvious IP version preference, and it also doesn't significantly penalise broken addresses.
So, does it work?
Will resolvers use lots of addresses from a name? So aside from the address family biases that we saw in that data, yeah, they use all the addresses. And do they penalise all the addresses for a name if one of them is bad? Only BIND 9 seems to aggressively penalise anything and I think that's doing something weird rather than something intentional. And so, yeah, really we see no evidence of a resolver penalising groups of addresses for a name.
Resolvers use the name server names to populate a list of addresses that they are going to use to then find authority data, and otherwise name server names don't matter. So this seems to work.
But then we have to think about glue. In my zone, I can give a name server name as many addresses as I want. But because these names are used in a delegation, I also often have to put those names into a parent zone, and updating the parent zone means going through a registry, and if you remember the start of this conversation, dealing with the registries is inconsistent, perplexing, confusing. So let's look into this.
EPP allows one or more address attributes on a host record. So, more: great, I want more. I tried to add 12 addresses to host objects. .ca allows you to have one address. Not a v4 and an IPv6, just one. Your choice.
.uk allows two, so you can have a dual-stack name server in glue at .uk. For .com and .org, I didn't test for the maximum, but I was able to add all 12 of my v4 and v6 addresses in .com and .org. So if we're going to have to have glue, we really want it to be in .com and .org. Probably other TLDs are available with such generosity of spirit around how many addresses you can have, but I haven't investigated that yet.
So, kind of to wrap things up: this is what our optimum configuration would look like, and how is the behaviour of that? I registered a few names, including in .uk, although I didn't include that in the test here. And so, talking about the optimum:
In UltraDNS, we have four different route announcement strategies for our address space, which means that an ideal domain has four name servers, each with a v4 and a v6 address. In UltraDNS 2 we have two strategies, so we would like any UltraDNS 2 user to have two name servers. We have a total then of six strategies, which means six v4 and six v6, so we have 12 addresses that we want to get into a name. And I just care about getting these addresses into a resolver; it's clear that the names don't matter now. I just want to populate that list as quickly as I can. So, I can put all twelve of these addresses on all the names, so then whichever name you look up first, or if you happen to get glue in your referral from a TLD, you have got every address that you are ever going to need in the first lookup. It seems we really can't get a lot better than that.
Now, the obvious logical conclusion to this as well is: well, maybe we only ever need to have one name server name, which I don't think is a good idea, because TLD diversity is still important. If VeriSign has a bad day, I want whatever the .org operator is called now to be able to give me an answer. And so, you know, I still want to have at least .com and .org name servers for a domain, potentially more.
Now, looking at what this looks like on the Internet, a signed referral doesn't get too big. You know, here is a lookup of the delegation at the gTLD servers for .com; it returns glue for the .com name server. This is signed with, I forget exactly, algorithm 13, and this is about the same size as a referral response that we get for some of our .com customers who don't do this stuff.
And yeah, as I mentioned, when glue is present, a resolver is always going to get every IP address that it will ever need in the referral from the TLD.
So, conclusions:
I'm not proposing this as a new best practice. This is definitely something that is going to be useful for us, but not everyone has our set of constraints.
It does seem to work though.
And it doesn't seem to significantly change resolution behaviour, so, as I have said, there is a slight optimisation because one look‑up gets the resolver every address that it will ever want to use.
I think the customers should probably still try to use more name servers if they can. But if we were to adopt a practice where we just gave all the name servers all of the addresses, they can pick and choose whatever is appropriate. TLD diversity is probably still useful, but it means that in the constrained environments they can use two and things will work.
The one worry that I have is that it's surprising. You know, what will tools do when they encounter names like this? gethostbyname returns one address; do they think that's enough, are they properly testing and measuring? We can adapt our own tools, but, you know, there are a lot of naive tests out there. I'm sure anyone who has done things in the TLD space has perhaps encountered tests that don't quite understand how modern DNS works.
So, our next steps are to take this out of the lab and start doing some trialling with some customers.
And thank you, any questions.
(Applause)
SHANE KERR: All right. So there appears to be some small amount of interest in this topic. So, I am actually going to close the mic queue soon, because we don't have that much time for this, but you can join the queue now. First of all, we have a question from the meeting system, which is: "Did you file a bug with ISC for BIND? Which version did you test?"
DAVE KNIGHT: I don't want to admit to quite how late in the day I was putting this together, but since eleven o'clock last night, no, I haven't filed a bug with ISC.
AUDIENCE SPEAKER: Chris, RIPE NCC. Interesting talk, Dave, thanks. So, for DNSMON, and by extension DomainMON, we sort of faithfully iterate through the NS records and enumerate the As and AAAAs. We made the decision not to treat the DNS names as anything special; we have kind of a bundle of IP addresses. So, this was actually useful to hear that that's probably on the right track, and it's interesting to see how it works in the real world.
DAVE KNIGHT: Fantastic. That's exactly what I would expect. Thank you.
AUDIENCE SPEAKER: Ed Lewis. I'll keep it short. Did you try giving the same IP address to two name servers?
DAVE KNIGHT: No, not yet.
AUDIENCE SPEAKER: I would try that. I'd be interested to see what happens there, because I think that's one thing you should try.
DAVE KNIGHT: Okay. Thank you.
SHANE KERR: I think that's a good question and I had a similar thought.
AUDIENCE SPEAKER: Great talk. So, you said that at the end you had kind of an experiment where you gave out every address to every name server. So that means the domain has kind of each IP twice.
DAVE KNIGHT: Yes.
AUDIENCE SPEAKER: Okay. So I mean it works. There is one in 12 chance in your case that the second kind of request from a resolver will also hit the same broken name server but that's about it.
DAVE KNIGHT: Yeah, yeah, but then ‑‑ my imagining of how this actually works is that the set of addresses for an authority server is a union set that just ‑‑
AUDIENCE SPEAKER: That's pretty much how all modern DNS treats it, yes.
DAVE KNIGHT: One thing I didn't mention ‑‑ I have forgotten the thought actually, sorry.
AUDIENCE SPEAKER: Lars Liman from Netnod. A question and a comment. Do you require that all your customers use something with glue?
DAVE KNIGHT: So, we have customers who want to use vanity domains. Where they are using our name server names, then we control the glue and it's fine. But sometimes they use vanity names, and so we don't know where they are going to need to create glue or...
AUDIENCE SPEAKER: I was suggesting exactly that as a way out of the problem, if it is perceived as a problem. My second comment is regarding testing: we heard from the RIPE NCC that they do the right thing, but for the others, just let them fix their systems, because this is actually an old thing; having multiple address records is nothing new, it's just not very commonly used. So, if testing systems cannot handle that situation, I would argue that they are broken. So, let them fix themselves.
DAVE KNIGHT: I would say in response to that, that some of these things are performance measurement tools that list how good we are, so we are kind of motivated to make them work properly. And then the other side of it is the compliance stuff, where, you know, if you have done things in the TLD registration space, there is a big organisation which has compliance rules and tests to enforce them, and, yeah, it's quite inconvenient when they are a bit behind the times.
AUDIENCE SPEAKER: Points taken. Thanks.
AUDIENCE SPEAKER: Tell, from Salesforce. That's actually a really good segue into something we talked about already, but I wanted to share with the rest of the room the flip side of this: not how many name servers you can have, but how few. As a company that has a very large number of domain names, we actually prefer that parked domains not have any DNS at all, because that way we don't have to worry about defensive SPF or any other records that would be necessary for it. Some domains like .com and .net of course allow you to have zero name servers, so you can NXDOMAIN. Some have very stringent testing requirements for what they expect the name servers to be, and you can even encounter very weird situations: there is one ccTLD where you can have zero name servers as long as you only start out with zero name servers; if you ever had name servers, then you are still required to have name servers. So the registry landscape in this regard is very peculiar as well.
DAVE KNIGHT: Yeah, and that's exactly why I am here with a workaround, because registry behaviour is perplexing.
SHANE KERR: Thank you Dave.
All right. That was great. Our next presentation is about measuring encrypted DNS, and this is from Arturo and he is also going to be presenting online.
ARTURO FILASTÒ: I should be able to share my slides. Hopefully you can see a slide now? I will assume so.
SHANE KERR: Yes, we can hear you and see the slides.
ARTURO FILASTÒ: Perfect. Thank you everyone for being here and thanks for inviting me to present. I am Arturo Filastò. I should start off by saying that most of the work that I am going to be presenting in this presentation was done by my colleague, who unfortunately could not make it here, so I'm sort of filling in in his place.
So, to start off, I guess I'll just spend a few words to tell you what OONI is about. So OONI is a free software project that started back in 2012, and our goal basically is that of empowering people around the world to document cases of Internet censorship in a way that brings about more transparency and we do this through the use of free and Open Source software and all the data we collect is made available as open data.
Since the project started ten years ago, we have collected and published more than a billion measurements from more than 200 countries and territories all around the world. And through these measurements we have been, mostly ourselves, directly publishing research and documenting evidence of censorship, but also supporting many different organisations all around the world that use this data to carry out various forms of advocacy, policy making, journalistic investigations, as well as legal litigation.
The way through which we collect this data is a software called OONI probe, that ultimately people install on their mobile phone or on their computer, and what this tool does is it runs a series of network experiments which collect very technically rich network measurements on various forms of blocks, which can be the blocking of websites, of instant messaging apps, and we also have some performance tests.
This is not necessarily what I will be talking about today; I will be talking about one specific experiment inside of our suite, but just to show you what the value chain of a measurement in general is. You know, we have this tool, OONI Probe, that is run on some user's network and carries out these experiments. We then take these measurements and we process them, we analyse them, we do some sort of ETL where we put them in our database, we put it in a public S3 bucket, and then we make them available to researchers in raw form, but also through a web interface that makes it understandable to a wider public.
So, I invite you to check out any of the key words that you see here if you want to learn more about it.
What I am here to talk to you about specifically is an experiment called DNS Check, which basically looks into the reachability of DoH and DoT servers. It was talked about before, sort of, that if we are to have a wider deployment, it's important to understand the performance impact of DoT or DoH. But another thing is that DoT and DoH are also used as a circumvention tool, in that traditional DNS over UDP is trivially blockable because it goes in plain text, whereas something like DoT or DoH avoids that. And so we were interested basically in trying to understand where and to what extent certain popular DoT or DoH servers were being blocked around the world, and so we came up with this experiment, which is called DNS Check, and basically it works in two stages.
On the one hand, it has an input which is a target server that it is going to measure, whose address can be specified as a domain name, or it can also be enriched with some known IPs of that service.
We then go through a bootstrap phase where we basically use the system resolver, like the basic DNS over UDP system resolver of the probe, to discover what additional addresses may correspond to something like dns.google. And then we carry out the actual experiment. And the actual experiment uses what we resolved using the system resolver; in this case what it resolved to is clearly a bogus address, it's not the actual address of dns.google, but we use it nonetheless. And we enrich it with the other addresses that we know to be valid resolutions for it, and we perform DoT or DoH lookups using all of those addresses, filling in the SNI field during the TLS handshake, which is the server name encoded in the client hello. And we measure if we're able to establish a TLS handshake and then perform a DNS query successfully.
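As a very rough sketch of what one DoT probe in the lookups phase amounts to, the snippet below uses dnspython to try one (address, SNI) pair; the address, SNI and query name are placeholders, and the real experiment records much richer detail than this coarse classification.

```python
# Minimal sketch: attempt a DoT lookup against one address while setting a
# chosen SNI, and classify the outcome coarsely.
import dns.exception
import dns.message
import dns.query
import dns.rdatatype

def check_dot(address: str, sni: str, qname: str = "example.com.") -> str:
    query = dns.message.make_query(qname, dns.rdatatype.A)
    try:
        # dns.query.tls connects to port 853, performs the TLS handshake with
        # server_hostname as the SNI, and then sends the DNS query.
        dns.query.tls(query, address, timeout=10, server_hostname=sni)
        return "ok"
    except dns.exception.Timeout:
        return "timeout"
    except OSError as exc:
        return f"connection_error: {exc}"

# Probing the same address with different SNI values, or the same SNI against
# different addresses, is what separates SNI-based from IP-based blocking.
print(check_dot("8.8.8.8", "dns.google"))
```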
We presented back in 2021 a paper which looked into a measurement campaign examining 123 DoT and DoH services, through some measurements that were collected directly by us from December 2020 to January 2021. The countries that we looked at were Kazakhstan, Iran and China. We chose these countries because they are known to implement a lot of Internet censorship, and specifically the vantage points that we used were, in the case of Kazakhstan and China, VPS services that we rented, and in the case of Iran, we had a vantage point on a mobile network.
The results that we found were quite interesting, in the sense that we found in all of these countries some form of interference with some of the DoT or DoH services.
Specifically, the bootstrap phase, surprisingly, at the time did not fail in any of the circumstances except one, which was for one service in Iran, where the name resolved to this private IP that is known to be used by Iran to implement DNS-based censorship.
In terms of the lookups phase, we found that when endpoints were failing or succeeding, they were mostly failing or succeeding consistently, which leads us to believe that, you know, the block was in fact some form of intentional block.
In particular, in the case of Kazakhstan, we found that these two addresses, 1.1.1.1 and 1.0.0.1, which are Cloudflare DoT services, would switch between being blocked and unblocked pretty regularly, and we saw similar fluctuations in the blocking of Cloudflare in Iran.
One interesting thing that happened during the measurement campaign was that in China, this Japanese DoT service started to be unblocked for some reason on January 1st, 2021. And in this table below you can sort of see the breakdown of, you know, the successful lookups of DoT and DoH per country.
The first sort of takeaway from this is that in most cases, more than 80% succeed, which is, I guess, a reassuring and encouraging finding. The only place in which we see them fail a lot is in Iran; specifically in the case of DoT, around 50% of them are failing.
But as mentioned before, this is sort of looking at the corpus of 137 resolvers, so, maybe some of the most popular ones that people are most likely to be using will still not be functioning.
Another thing that's worth, and interesting to look at, is sort of the distribution of the failures, comparing them between DoT and DoH. So here what we're looking at is basically of all the measurements that suggest some form of failure, what were the types of failures that we saw?
In particular, we make a distinction between a timeout after the TLS handshake versus a timeout during the TLS handshake. What this means in fact is that you are able to send the client hello, get the server hello, and then you try to send some data over an established TLS session and it times out. That's the timeout after. And then timeout during the handshake would be you send the client hello and then it just times out.
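To make that distinction concrete, here is a minimal sketch of how the two timeout cases can be told apart with the Python standard library, plus dnspython for the DNS wire format; the target address, SNI and query are placeholders, and a real probe would record far more detail.

```python
# Minimal sketch: distinguish a timeout during the TLS handshake from a
# timeout after it, on a DoT endpoint (port 853, two-byte length prefix).
import socket
import ssl
import struct
import dns.message
import dns.rdatatype

def classify_timeout(address: str, sni: str, port: int = 853) -> str:
    wire = dns.message.make_query("example.com.", dns.rdatatype.A).to_wire()
    ctx = ssl.create_default_context()
    raw = socket.create_connection((address, port), timeout=10)
    raw.settimeout(10)
    try:
        tls = ctx.wrap_socket(raw, server_hostname=sni)  # sends the client hello
    except socket.timeout:
        return "timeout_during_handshake"  # client hello out, no server hello back
    except ssl.SSLError:
        return "tls_failure"               # e.g. handshake reset or cert problems
    try:
        tls.sendall(struct.pack("!H", len(wire)) + wire)  # query over the session
        tls.recv(2)
    except socket.timeout:
        return "timeout_after_handshake"   # session established, then data times out
    return "ok"

print(classify_timeout("8.8.8.8", "dns.google"))
```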
I guess the key takeaways here are that on the one hand, it doesn't seem to make much of a difference whether or not we're using DoT or DoH. They both seem to be blocked, more or less, in the same way, given the same endpoint, which I think is a pretty interesting finding. Like, I wouldn't have expected that given that they are, you know, using different ports and to some extent also a little bit different protocols.
The other, I guess, takeaway is that the blocking methods in most cases are consistent between the two protocols. So, as we can see in Kazakhstan, it's a timeout after the TLS handshake in both DoT and DoH, and in the case of China it's the connection that is timing out, in both DoT and DoH. The exception to this is Iran, where we see the DoT services time out after sending the client hello, whereas DoH has a bit more of a different distribution of blocking methods.
And this might be due in part to the fact that Iran implements censorship in a more advanced way, and already has existing blocking infrastructure in place.
I just wanted to show some examples of how we can see that there is SNI-based blocking. So, in Kazakhstan we saw that we would speak to the same address but we would change the SNI field, and in some cases it would time out and in other cases it would succeed, which leads us to believe that blocking was happening in that way.
Conversely, we noticed another thing happening in Iran, where the SNI field actually didn't matter: you could use the same SNI speaking to two different addresses, and you would see a block in the case of one but not in the case of the other.
So, this is regarding the paper; you can find more details in it. I also wanted to give you some updates on some more recent research we have been doing and things that have been happening lately.
So, for example, recently in Iran, following the death of Mahsa Amini, there was a wave of protests and an intensification of blocking. One of the things we noticed was that they started to crack down more on encrypted DNS services. Specifically, in the beginning I mentioned that the bootstrap was working in Iran in all cases except for one. Now we notice that the resolution of many popular DoH services fails in Iran, and fails in a way that is confirmed, which means that the answer is a known IP used by the Iranian censor.
So to summarise what we found recently in Iran is that DoH endpoints that previously worked are now starting to be blocked and they are using also an additional technique which is based on traditional DNS over UDP blocking of the address of the DoH services.
You can find more details about this in our report as well.
The last thing I wanted to mention is that DNS Check is now in OONI Probe. Whereas previously, when we published that paper, the measurements were collected by us through running manual tasks, now everybody running OONI Probe is automatically collecting these measurements. And in this chart here you can see the volume of measurements; we're now at around 600,000 measurements from more than 50,000 networks every day, so there is a ton of data out there, which I encourage you to take a look at and dig into. And we quickly did a little bit of investigation into this data, just to see if there was anything new or interesting.
We found the results mostly to be consistent with what we had already seen, except in the Iran case. Additionally, we found blocks in Saudi Arabia at a TCP level, so they are basically just blocking the addresses of some DoT and DoH services, and then, when extending it to also look at the TLS measurements, we can see that Qatar also seems to be implementing some block of the DoH services of Google.
Future work: we have several improvements we want to make to this test, namely to try to make it closer to what a real DoH/DoT client would look like by parroting the TLS stack of popular browsers. One limitation is that we are currently only doing this over TCP, so we would like to extend it to run over QUIC and measure if something changes there.
Currently, the way DNS Check is deployed, it's not very resilient to bootstrap failures, so we would like to make improvements to that. And as mentioned, the data is out there, so you should take a look at it and dig into it.
I'll leave you with some contact information if you want to reach out, join our Slack channel or get in touch. I am happy to take any questions.
SHANE KERR: I think this is really interesting work. I am going to start off with a question that we have from our online system, and the question is: "It seems that only IPv4 DoX servers were tested. Were IPv6 DoX servers also tested? The reason for asking is that censorship devices may behave differently."
ARTURO FILASTÒ: That's an excellent point. So, currently in the input targets that we give to probes, we don't have IPv6 addresses. We will, however, use IPv6 addresses if, during the bootstrap phase, the client is able to discover an IPv6 address. In the charts that I showed previously, we were sort of merging together IPv6 measurements with IPv4 measurements. It would probably be interesting to cut the data in such a way that you are looking at those two differently. It should be noted, though, that we have seen that a lot of our probes do not have IPv6 support, so it might be that in the cases in which there is blocking ‑‑ like in Iran, for example, I believe it's only one of the major national networks that has IPv6 support, and more recently, that has actually been entirely blocked. But I agree, it's definitely something that should be looked into, like doing a comparison between the two.
AUDIENCE SPEAKER: Have you considered repeating this experiment in Russia? They are currently actively improving their digital censorship methods, and the results might be interesting, because different providers use different technologies to achieve what the authorities require from them.
ARTURO FILASTÒ: Yeah, that's an excellent question. I mean, we have many measurements coming from Russia. As a matter of fact, I think the largest volume of measurements we have, apart from the US, is coming from there, so the data is there. We just haven't had the time to look at it yet. But if you are interested in digging into it, please do reach out and I can provide you with some pointers and indications.
AUDIENCE SPEAKER: Hi. There is this hypothesis that because DNS over TLS has its own port, port 853, it is of course easier for irrational governments to block it than DNS over HTTPS. Could this be one of the reasons that the Iranian government is successful in blocking DoT and not DoH?
ARTURO FILASTÒ: Well, so, what we saw actually was that it seemed like there wasn't that much difference between the blocking of DoT or DoH. Like, I'll go back, maybe here: here we can see that basically if you sum up the number of failures you have more or less a similar distribution of failures, and we noticed that it doesn't seem like they are having an easier time at blocking DoT versus DoH.
AUDIENCE SPEAKER: And in the previous slide you had ‑‑ okay, so the successful DoT lookups are 75% in Iran and the successful DoH lookups are 92%.
ARTURO FILASTÒ: Yeah, that's a good point. I mean, for Iran, yeah, you're right, that is a good point.
AUDIENCE SPEAKER: Thank you, I just wanted to point out that maybe that's like an area that we can look at and see whether this is one of the reasons. Thank you.
ARTURO FILASTÒ: Yeah...
SHANE KERR: All right. Thank you very much. This is very interesting, I am sure we could talk more but we're out of time. Thank you for the research and the presentation.
(Applause)
Our next talk is going to be an introduction of new DNS software.
BALINT CSERGO: Hi, I work as a production engineer and I would like to give a brief talk about how we do DNS at Meta and what the history is, and I would like to make an important announcement at the end.
So, here is the agenda for today's talk. We will briefly go through the history, we will go through the different iterations of the software we have used, and there is an announcement at the end: just this morning we open sourced our DNS server, which we are happy about.
So, in general, what do we need from a DNS server? We use resolver-IP based maps and EDNS Client Subnet based maps so we are able to efficiently direct traffic. These views need to be simple to generate and configure, and updates need to be easy to deploy. Plus, obviously, being a production engineer, query logs and health metrics are also quite important, and we have very, very strict latency requirements; on average it's sub-1 millisecond, so the DNS server needs to be relatively fast.
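As a toy illustration of what answering from such a map means, here is a small sketch of a longest-prefix match over an ECS or resolver-IP map in Python; the prefixes and answer addresses are made up for the example and have nothing to do with Meta's actual map data.

```python
# Minimal sketch: pick an answer by longest-prefix match over a subnet map,
# the way an ECS- or resolver-IP-based view selects a nearby endpoint.
import ipaddress

SUBNET_MAP = {
    ipaddress.ip_network("192.0.2.0/24"):   "198.51.100.10",  # e.g. steer to POP A
    ipaddress.ip_network("203.0.113.0/24"): "198.51.100.20",  # e.g. steer to POP B
    ipaddress.ip_network("0.0.0.0/0"):      "198.51.100.30",  # default answer
}

def answer_for(client_subnet: str) -> str:
    client = ipaddress.ip_network(client_subnet)
    # The most specific configured prefix containing the client subnet wins.
    matches = [net for net in SUBNET_MAP if client.subnet_of(net)]
    best = max(matches, key=lambda net: net.prefixlen)
    return SUBNET_MAP[best]

print(answer_for("192.0.2.128/25"))  # -> 198.51.100.10
```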
So, this is how we do DNS. Records can come from many, many sources. These sources can be automations, as you see on the left, one of them being the host DB: a record is created or updated every time a host is installed or the name changes for that host. And then the configured ones at the top: humans can also create DNS records. At the end of this pipeline, there is the DNS publisher that combines all these records into a database that we then distribute internally.
So, here is the history. From 2012 to 2018, we were using TinyDNS; that was the choice for the server implementation. But we needed plenty of patches on it, one of them being IPv6 support, because we use IPv6 quite heavily, and also the EDNS Client Subnet location matching and the maps I talked about previously.
In 2018, one of our colleagues had the idea to start innovating and basically write our own DNS server using the same database back-end.
In 2020, we replaced our database back-end with RocksDB. I will talk about that later. And this year we are here to announce the open sourcing.
So, TinyDNS:
We liked TinyDNS because it was very simple and efficient. It was also very, very fast. The configuration file is quite simple, very simple for automations to add to, and also, as I mentioned, the routing database was quite simple: it's one single constant database file that you can just deploy.
But we didn't like TinyDNS because it was very hard to read C code, not easy to onboard people onto. For example, I would have had difficulties if I were talking about the TinyDNS code now, in contrast to the lovely DNSRocks software. It obviously didn't have enough tests, or tests at all, which made working on it quite difficult. Plus, it was single threaded, which was not great.
We looked at open source alternatives in 2018, and none of them matched our requirements, either from the performance perspective or in terms of support.
So we wrote DNSRocks, in Go, because it's a modern language. The DNS library we use is quite good, so we use that, and also we use a lot of plumbing from CoreDNS to make our life easier. Go is an amazing language, so building software and testing it is quite easy. We have very good coverage on the software, so we can iterate fast and safely. Initially we used TinyDNS data to generate the database, so we were a drop-in replacement for TinyDNS back then; now we have our own tooling around that. But we had feature parity, and then the rollout was quite simple.
But then time has passed, we have been living with CDB for a while, and we learned things. For example, we liked CDB because it's quite fast: it's basically a hash map on the disk, you mmap it, and we know how to work with it and how to deploy it. But it also had problems, because it's a constant database. Every time you change something, you need to redeploy the whole database to the disk, which obviously puts stress on the SSD, so we had to put our database onto a RAM disk, and it also greatly increased propagation time. And the last thing, which is also quite important: it's a 32-bit hash map, so 4 gigs is the maximum size, into which we didn't fit. Therefore, we switched to RocksDB. We evaluated multiple database implementations, but we ended up going with RocksDB. It is also quite fast, and mutable, which means we don't need to redeploy the whole database any more. It's flash ready, so we can finally move back to SSD and we don't need to worry about burning the SSD out by deploying updates. And the 32-bit limitation no longer applies.
So, how we went about it: we wrote a CGO wrapper around the RocksDB library. We added back-end support so we can switch between CDB and RocksDB as of today, and we added a parser and compiler for the TinyDNS configuration so you can populate the RocksDB database.
Obviously it came with trade-offs: it's not as fast as CDB, but close, and on the propagation time side of things we had a pretty significant win.
And coming to today: the steps we took to be able to open source the software. Obviously it was an internal piece of code. It had quite a few internal dependencies, for example for logging and metrics collection. We switched to open source alternatives that made sense, for example Prometheus for metrics and dnstap for logging. We are finally on top of all of our external dependencies, we are not depending on forks, there is a pretty good CI pipeline on GitHub, and we have some good documentation.
I would like to thank all of these lovely colleagues who helped us get here on this journey.
The presentation doesn't have the GitHub URL; it's facebookincubator/dns, because we just open sourced it this morning and the presentation was already ready. So if you are interested in, for example, DoQ support, go to the repo, and you are welcome to join in making the software better.
And time for questions.
(Applause)
SHANE KERR: Thank you. Are there any questions online? No. Anyone in the room?
So, I have a question: Has anyone ‑‑ so you did mention this at the DNS OARC meeting over the weekend so this is not the very first, I am sorry, it's not exclusive. But has anyone approached you yet about working with this software, integrating it, things like that?
BALINT CSERGO: Yes, there have been talks. I have been talking with Dmitry, because he wanted to check out how our DNS server can perform for his use cases.
SHANE KERR: That's good to hear.
BALINT CSERGO: And also there was plenty of feedback in the chat.
SHANE KERR: Cool, because I think there are a lot of different implementations in this space, and I think a company just publishing it and then managing it on their own is not as interesting, but if there is actually interest from the community, I think it's really ‑‑ more is always better.
BALINT CSERGO: We hope at least we could provide some alternative to people, for example those who still run TinyDNS and who are looking for something quite easy to deploy and manage. That's the whole idea behind open sourcing it.
SHANE KERR: Great. Well thank you very much.
Okay, and for our final presentation, the RIPE NCC is going to be giving an update about what they have been doing with DNS lately.
ANAND BUDDHDEV: Good afternoon. I am Anand Buddhdev from the RIPE NCC, and I am going to give you a very short update on what we have been doing since the previous RIPE meeting.
First, I would like to talk about the two DNS Anycast clusters that we run. The RIPE NCC operates one of the root name servers, K-root, and we have a second Anycast DNS cluster where we carry all the other zones that the RIPE NCC either operates or secondaries for other organisations, and we call that the AuthDNS cluster. Since the previous RIPE meeting, the changes have been that our K-root sites in Frankfurt and Miami have been upgraded: we have replaced them with new routers and new servers. These routers are now also capable of handling 10G connections, and we have newer servers with faster CPUs, so we can handle more queries and all that.
We also did plan to deploy a new site for our authoritative DNS Anycast cluster towards the end of this year, but this has unfortunately been delayed until mid‑next year, and this is mainly because of global supply chain issues and we're not able to get a hold of some equipment like routers. So we will be providing an update on this at the next RIPE meeting.
One of the nice things that has happened with all these upgrades is that the management of all these sites is exclusively over IPv6. We no longer use IPv4, and we plan to continue doing this and just pushing IPv6 everywhere we can. So, we aim to lead by example.
Next, I would like to talk about hosted DNS. The RIPE NCC works with our community, and members of our community are able to host either a hardware server or a virtual server, and on this we can run either an instance of the K-root name server or an AuthDNS instance. The numbers have been growing, so the chart here shows that in April this year we had 87 instances of K-root and 7 instances of AuthDNS, and as the months have gone by, we have gone up to 92 instances of K-root and 9 instances of AuthDNS.
There is an app for requesting this service, so if you are a community member and you'd like to support the RIPE NCC's efforts to improve the reach of our DNS clouds, you can go to hosted-dns.ripe.net, sign up, and then request to host either a K-root instance or an AuthDNS instance, or both. The requirements are that you need to be able to provide a Dell server or a virtual server, and a minimum of two network interfaces, one for management and one for the DNS service. But we also support extra interfaces in case there are multiple connections for the DNS servers.
As you saw in the previous chart, we have fewer authoritative DNS servers out there than k‑root instances, and we would like to request again for support from the community for deploying more authoritative DNS instances, so that we can increase the availability and resilience of this very important service of ours.
So the authoritative DNS Anycast cloud carries ripe.net and related zones. It also has all the reverse DNS zones of the RIPE NCC, as well as secondarying the reverse DNS zones of all the other RIRs. So by deploying one of these instances, you bring all the reverse DNS space closer to you.
And the RIPE NCC also provides secondary DNS for some smaller ccTLDs. So these also get deployed as part of deploying such an instance.
A quick word on software diversity. At the RIPE NCC we have been operating our clouds using a mix of BIND, Knot DNS and NSD name servers. So, a big thanks again to ISC, CZ.NIC and NLnet Labs for providing this high quality software. We use software diversity so that a bug in one particular implementation doesn't bring down the entire fleet of servers in one go.
Additionally, we also have diversity at the routing level, so we have Juniper routers, Arista routers, and BIRD and FRR for the hosted DNS instances that we run. So, we have a fair bit of diversity there.
We currently don't have diversity at the operating system level, because that is a little bit more complicated to do, but we may still consider that in the future.
And finally, a slide on Zonemaster. Zonemaster is DNS checking software, written jointly by AFNIC, the French registry, and .SE, the Swedish registry, and we use it at the RIPE NCC for performing pre-delegation checks for any reverse DNS delegation changes. So when a user submits a domain object to the RIPE database to request reverse DNS delegation, Zonemaster does all kinds of checks, and if the checks pass, then the reverse delegation request is accepted. If there are any failures, they are signalled back to the user so that they can fix their name servers.
We are currently running fairly recent version, 2022.1.1, and we have a few extra hot‑fixes in there for a few bugs that we found.
The RIPE NCC contributes back towards Zonemaster with bug reports, sometimes we provide patches, suggestions for how it could work better, and I'd like to thank the Zonemaster team for all the support that they give us.
And finally, the current web UI for Zonemaster is the standard one that ships with Zonemaster, but we plan to integrate all the functionality into RIPEstat so that we have just a single portal for the checks that the RIPE NCC performs, and that way we will have a more unified approach for our users who use Zonemaster.
And that's it. Thanks for listening, and please let me know if you have questions.
(Applause)
SHANE KERR: Thank you Anand, oh, I see someone running to the microphone.
AUDIENCE SPEAKER: Lars Liman. Thank you very much for an interesting presentation. With regard to Zonemaster, my personal view is that it's an excellent tool that does a lot of tests that help you with a lot of things, but it is a rather rigid and opinionated software. Are your tests for getting a delegation solely dependent on the exit code of Zonemaster, or is there a way to have a delegation done with my preference rather than Zonemaster's preference?
ANAND BUDDHDEV: That's a very good question, thank you. There are two things I would like to mention here. The first is that Zonemaster comes with a default set of policies, but these can be overridden locally and in fact we do have a couple of overrides for the RIPE database and the reverse DNS specifically. And very occasionally we get users who do have very strange situations that Zonemaster doesn't seem to like, and in that case the user can ask us to pass, you know, to force their delegation update anyway and we can do that, we can override the Zonemaster checks. But it's a little bit of manual process where the user has to open a ticket with us. But it's very rare.
AUDIENCE SPEAKER: I understand. It should be very rare, so I just want to make sure that this isn't too square shaped. But thank you very much.
SHANE KERR: Great. Thank you Anand. And that's it for our session, enjoy your lunch and the rest of the meeting and I'll see you all in Rotterdam hopefully.
(Lunch break)
LIVE CAPTIONING BY
MARY McKEON, RMR, CRR, CBC
DUBLIN, IRELAND.