25th October 2022
CHAIR: Hello, everyone, take your seats. In a minute we will start the next Plenary session.
BRIAN NISBET: Hello. Welcome to the afternoon session of the Plenary at RIPE 85. I am Brian, this is Alexander, we will be chairing this session.
Before we begin, something exciting, and I know this is the high point of your week: the PC elections that we keep going on about. Honestly it's very important to get new people in to generate the conversations and ideas and vet the presentations and all the rest for this Plenary.
We have three people who put themselves forward for the PC and there are two seats, so exciting we get to have an election. If you go to the front page of the RIPE 85 meeting page, so ripe85 [at] ripe [dot] net you'll see a nice box there regarding the PC election where you can read the bios of the people who stepped forward and vote. You need to do this before Thursday afternoon, the timings are on the website as well but you have time to do so, but please do vote and do think at a future RIPE meeting about being on the PC. And with that I'll hand over to Alexander.
ALEXANDER AZIMOV: And our first report during this Plenary will be about a growing support of QUIC in modern browsers, and how it may affect user experience. Please, meet Geoff Huston, and his report a quick look at the QUIC.
GEOFF HUSTON: Good afternoon all, my name is Geoff Huston, I am with APNIC.
You know, I was going to say it's wonderful to actually see you all here in person, and not because it's been virtual for the last few years, it's because for many years before that, the PC used to schedule me on Friday mornings. There was no one in the room. None! And coming in on Thursday afternoon I thought, yeah, people! But as I look around the room I was only about a quarter right. So, because it's only a few of us, let's get into this and share a few more slightly more cynical observations than I would make to a larger crowd.
So, this is actually a talk about transport protocols. And I start with QUIC. Now, I don't know how many of you have been exposed to QUIC, but we only ever had two transport protocols in the Internet: UDP, the transport protocol where you didn't have a transport protocol, unreliable datagrams, which, you know, the whole dismal performance of the DNS is actually because of UDP. If you put DNS on anything better, it probably would have worked well and worked better than what we have with UDP. There was nothing in between UDP, which is unreliable datagrams, and TCP, the full weight of reliable flow controlled streaming algorithms.
You either had too big or too small, and the whole reason why DNS sucks so badly at performance is that the fit to UDP is about as bad as you can possibly imagine; the only thing that's slower is TCP.
Now, it's not just DNS that suffered, because if any of you have gone through part of the paradigms of networking, you didn't just have reliable streaming and datagrams, there was something right in the middle. Remember remote procedure calls? Remember this idea of, I suppose you could say, reliable datagrams, but it's more than that, it's actually multiplexed transport. I do multithreading in code, but I can't do multithreading in TCP. Well I can try, but it does head of line blocking.
I can't have a whole bunch of simultaneous transactions to somewhere else sitting inside the one encryption and the one connection state because TCP doesn't support that.
And so, when you look at QUIC, and you go ah, well, it was just a way to lift transport out of the kernel and put it into the application. If you just wrap all of that TCP in UDP, and send, if you will, UDP control sockets to the platform and run your own TCP. Well that's QUIC, isn't it? No, it's not. It could be and that's the simplicity view. But inside QUIC is this phenomenal amount of really advanced engineering around multistream support, multiflow support, sharing a single encryption state, but then having independent flow control.
QUIC is the protocol we needed 20 years ago. And quite frankly, refining a 1980s transport protocol, TCP, was always going to be a dead prospect eventually. There are only so many ways you can diddle the ACK flow and try and extract performance out of TCP. It's a 1980s protocol.
Just like v6.
Yeah, right! I'll get into trouble for that.
So, when you sort of look at the stack and think, well, it just mashed up the bits of encryption end to end and the bits of TCP and put it in the one bundle, that's actually leading you way astray. In terms of network engineering and creative protocol engineering, QUIC is actually a leap into what we need, in the same way that multithreaded programming languages are what we need for today, and so QUIC is fundamentally very important.
It's also one of the few protocols that kind of goes: there is nothing to see here if you are a network operator. Nothing. Everything is encrypted. So all those TCP control blocks you like to fiddle with as network operators, squish the window down to make it go slower, what window? You can't see it.
So all of a sudden, this stuff is treating the network as an undistinguished commodity provider, because that's your real role, get used to it. Because everything else is being lifted up. It's actually been lifted up off the platform. Facebook doesn't have to wait for Apple to release some new version of TCP and some new version of iOS to make things go better if it ran QUIC, because QUIC can happen at the application level. It has its own control.
So that's good. I have got control. I'm not waiting for the network, I am not exposing anything to the network. I am running in user space. This is brilliant. This is what we have always wanted.
So, if that's the case, and Google have been working on this for about ten years, the real question is: How well are we going?
So, this is the text around, you know, that last rant. QUIC is actually multistream TCP in a shared congestion state. It uses standard TLS, and is heading towards TLS 1.3 with encrypted SNI, server name indication; that bit of encrypted server name indication is yet to come, but everything else in QUIC is TLS 1.3. It's only three packets to start sending data, but you can send data in the first three packets anyway.
And you can leave a session state sort of static and resume again at the first packet.
So all of a sudden you have got what we call zero RTT. This stuff is modern. This stuff is what we have been waiting for forever.
So, all the rest of this stuff, session integrity, sure you can do streams that are reliable. A stream can be one packet, a stream can be tens of millions of packets, each stream inside QUIC has its own flow control. It reacts to the network on a shared state of encryption. But other than that can be treated independently.
No head of line blocking. You can do a whole bunch of things in parallel, which is what the web world really, really really wanted.
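The difference can be illustrated with a toy delivery model. This is a sketch, not QUIC or TCP: the packet tuples, the `lost` set and both function names are made up for the example, and retransmission is left out entirely.

```python
def deliver_single_ordered(packets, lost):
    """One shared ordered stream (TCP-like): delivery stops at the first gap."""
    delivered = []
    for index, (stream, data) in enumerate(packets):
        if index in lost:
            break  # everything behind the hole waits for the retransmit
        delivered.append((stream, data))
    return delivered


def deliver_per_stream(packets, lost):
    """Independent streams (QUIC-like): a gap stalls only its own stream."""
    delivered = []
    blocked = set()
    for index, (stream, data) in enumerate(packets):
        if index in lost:
            blocked.add(stream)
            continue
        if stream not in blocked:
            delivered.append((stream, data))
    return delivered


packets = [("a", 1), ("b", 1), ("a", 2), ("c", 1), ("b", 2)]
lost = {1}  # the first packet of stream "b" goes missing
print(deliver_single_ordered(packets, lost))
print(deliver_per_stream(packets, lost))
```

With one loss on stream "b", the single ordered stream hands over only the first packet, while the per-stream model keeps delivering "a" and "c".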
And everything is invisible to the network. One other thing too. You v6 people, this stuff works like a phantom over NATs. It really, really works. Why? Because a connection is not the IP addresses. When a server and a client in QUIC start talking, each of them gives the other a 64 bit connection ID. It doesn't matter what the source address is, if I present the same connection ID to the other end, it's me.
So, I can run a NAT that changes my address every round‑trip time. And it doesn't matter. It's still me because it's the same connection ID.
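A minimal sketch of that idea, assuming a receiver that looks up state by connection ID alone. The class and method names are hypothetical, and a real QUIC stack does more (for example, it validates a new path before trusting it).

```python
class ConnectionTable:
    """Find connection state by connection ID, ignoring the source address."""

    def __init__(self):
        self._by_id = {}

    def open(self, connection_id, peer_address):
        self._by_id[connection_id] = {"peer": peer_address, "packets": 0}

    def receive(self, connection_id, source_address):
        state = self._by_id.get(connection_id)
        if state is None:
            return None  # unknown connection ID: not one of ours
        state["peer"] = source_address  # follow the peer to its new address
        state["packets"] += 1
        return state


table = ConnectionTable()
cid = bytes.fromhex("0123456789abcdef")  # a 64-bit ID, as described in the talk
table.open(cid, ("192.0.2.10", 40000))
table.receive(cid, ("192.0.2.10", 40000))
state = table.receive(cid, ("198.51.100.7", 51234))  # NAT rebinding mid-session
print(state)
```

The second packet arrives from a different address and port, yet it lands on the same connection state because the lookup key never included the address.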
So this stuff ‑‑ well, actually, that was important anyway because NATs were built for TCP. The SYN packet created the NAT state. It held it open until it saw a FIN or until it got bothered; when it saw the FIN it tore it down. When I am running UDP what do you do as a NAT? 20 packets, okay, gone, and of course they do that, and because the IETF refused to standardise NATs because they thought NATs were too smelly, every NAT behaves differently with UDP, every last one. Even different models from the same vendor, or the same model from the same vendor bought at different times, will behave differently in their binding behaviour. QUIC decided that's okay, I'll make that irrelevant.
So QUIC is address agile as well. Like I said, for v6 it's kind of oh, you mean we are getting used to NATs. Oh, yes, we are getting bloody good at accommodating them. Oops!
So this is what the Internet needs today. It probably needed it ten years ago, but this is what TCP should have been a long time ago, it is an amazingly good algorithm. Worth spending some time looking at the middle of it. I haven't got time, you haven't got time. None of us have got time. Let's move on to what I was looking at.
Because, it should have been easy, it should have been so easy.
Now, there are versions of servers out there. There is a certain amount of history here about the tyranny of Open Source and the opinionated fights just below the surface that make some open source projects stall like crazy.
There is a piece of work to try and put support for QUIC in OpenSSL. And it's a damn fine idea, because guess what? Everybody uses OpenSSL. Yes? Yeah. But they said, oh, hang on, we're about to do OpenSSL 3.0 or some major version number, we're going to wait until we release that to get around to doing QUIC. Fine. We all hold our breath and wait, out comes version 3 or whatever, you are going to do QUIC right now. All you need to do is add the crypto libraries, QUIC implementations are already out there. No, no, no says OpenSSL, we're going to do our own top to bottom implementation of QUIC. How long is that going to take? Oh, about three years. What?
So, if you are waiting for OpenSSL to do this kind of work, there is about another three years coming down the pipeline.
So, when we put up this server with nginx, we had to use Google's BoringSSL to get the right support for QUIC, because OpenSSL is not going to do it until 2025 or whatever.
There were a number of ways to actually make a dialogue in the web say let's do this QUIC thing, because what you'd like to do is avoid this whole negotiation framework because you don't live long enough and neither do I. So you'd like it to be a quick handshake, literally. Can we do QUIC? Now, the way Google solved this was actually second use, so they embedded "I can do QUIC" in the content that was delivered. So the first time you hit a web server and you got back some content from that server name, you'd have inside the response an Alt-Svc directive: By the way, I can do QUIC, h3, HTTP/3 on port 443. Now there were other magic incantations in the past, but the idea was you put that signal in your content, so the client then goes oh, QUIC, and the second time they come to your server, they use QUIC. Fine. It's kind of backwards compatible, no great change.
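For illustration, here is a small parser for an Alt-Svc value of the kind described. Alt-Svc is carried as an HTTP response header; the function name is made up, and the parsing is deliberately naive (it would mis-split a quoted value that itself contains a comma or semicolon).

```python
def parse_alt_svc(value):
    """Parse an Alt-Svc value into (protocol, authority, params) tuples."""
    if value.strip() == "clear":
        return []  # "clear" withdraws all previously advertised alternatives
    services = []
    for entry in value.split(","):
        parts = [part.strip() for part in entry.split(";")]
        protocol, _, authority = parts[0].partition("=")
        params = {}
        for part in parts[1:]:
            key, _, val = part.partition("=")
            params[key.strip()] = val.strip().strip('"')
        services.append((protocol.strip(), authority.strip().strip('"'), params))
    return services


advertised = parse_alt_svc('h3=":443"; ma=86400, h2=":443"')
print(advertised)
can_use_quic_next_time = any(protocol == "h3" for protocol, _, _ in advertised)
```

A client that sees `h3` in this list can try QUIC on its next connection to the same name, which is why this method only ever helps from the second fetch onwards.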
But of course over in DNS land, there is this theory that the DNS is so much more than a name resolution protocol. It's the thing you use in order to create the entire session establishment. And so if you look at that area of service definitions in the DNS, you now start to see some pretty curious things going on where the ALPN, the application level protocol, is now embedded in the DNS value. And so, with one query, you can actually figure out what to ask, what protocol to use, how to ask; all the bits and pieces you spent a lot of time actually discovering are now one query, one answer. Fantastic. Although if someone can explain to me why there is a v4 hints field in that area of the SVCB record and why everyone ignores it, I'd love to know. But I am digressing.
Now, that DNS model of doing it in the ALPN field is very fast, and it's usable first hit. Because you look up the name, basically I want to go to this website, tell me what protocol I should use, the DNS goes this person can support QUIC. Yeah! ALPN equals h3. So that's the way you set it up. If you are prepared to run a beta version of nginx with BoringSSL, you can do it right now. If you are waiting for OpenSSL, go back to sleep.
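As a sketch of what the client gets from that one query, the following parses the presentation form of an HTTPS (SVCB-style) record's data. The record content and helper names are illustrative only; a real client would use a DNS library and the wire format rather than splitting text.

```python
import shlex


def parse_https_rdata(rdata):
    """Split 'priority target key=value ...' into its parts."""
    tokens = shlex.split(rdata)  # also removes the quotes around alpn="..."
    priority, target = int(tokens[0]), tokens[1]
    params = {}
    for token in tokens[2:]:
        key, _, value = token.partition("=")
        params[key] = value.split(",") if value else []
    return priority, target, params


def supports_h3(rdata):
    """True if the record advertises HTTP/3 in its alpn parameter."""
    _, _, params = parse_https_rdata(rdata)
    return "h3" in params.get("alpn", [])


record = '1 . alpn="h3,h2" ipv4hint=192.0.2.1'  # example data, not a real zone
print(parse_https_rdata(record))
print(supports_h3(record))
```

Because the answer arrives before any connection is opened, the client can pick QUIC for the very first fetch.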
As I said, there is this whole issue of first fetch/second fetch. All the stuff that is actually doing the DNS look‑up happens immediately, otherwise you have actually got to do a second fetch in order to flip over.
So, we started up the experiment using precisely this. Found, as you can see there, about 1% of users use HTTP/3, which is QUIC, on the first fetch. And around 3.5% of users were using HTTP/3 on the second fetch.
Bloody low, isn't it? It's really low. 90% of all the browsers we see out there are Chrome. You know, if you ever want to talk about a strangle hold on the market, look at the browser market and the answer is Chrome. There is no other answer out there. Dear old Safari makes up the rest and there are a couple of people still running Firefox, God bless you! So, you know, Chrome has done QUIC since I don't know, Methuselah was a young lad, they have been doing QUIC forever.
So, there is that blog entry back in, you know, 2020 but it's actually earlier than that when they started doing it, so I'm seeing 3.5%, whereas, in theory, you are all running up to date Chrome, I should be seeing 90%. Interesting!
So, what does everyone else see? CloudFlare. It's not 90. CloudFlare see 0. That's way better than 3%. But it's not 90. So it's kind of oh, even CloudFlare's numbers, much better than the ones I was getting, are still kind of whacko. If QUIC is everywhere, and, you know, everyone uses Chrome so yes it's everywhere, why aren't we actually seeing it in the wild?
So, I thought that when they said second fetch, I believed them. But I was wrong. So, what we did after some discussion was to go: Let's get the users to fetch it seven times. Surely to God, out of those 7 fetches I'll hammer home somewhere that you can use QUIC. And surely, you know, this will flush out QUIC use.
No. Absolutely not. Because, here is this issue of signalling, and the new application behaviour, HTTP/2 does persistency, so if you are fetching 7 times from the same server, it doesn't take it down and build the next connection. It goes hang on a second, I'll just wait and see if you are going to come to me again. Oh you did. Oh you did. And so, the original TCP TLS session never gets torn down. So, this doesn't work because HTTP/2 is too good at being persistent.
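A toy model of why repeated fetches do not help while the first connection stays open. Everything here is an assumption made for illustration: the fetch spacing, the `idle_timeout` parameter and the rule that a client only reconsiders its protocol when it has to open a new connection.

```python
def simulate_fetches(fetch_times, idle_timeout):
    """Return the protocol used for each fetch under a simple reuse model."""
    protocols = []
    knows_alt_svc = False   # learned from the first response
    current = None          # protocol of the open connection, if any
    last_used = None
    for now in fetch_times:
        expired = last_used is None or (now - last_used) > idle_timeout
        if expired:
            # A new connection is needed, so the client may pick QUIC now.
            current = "h3" if knows_alt_svc else "h2"
        protocols.append(current)
        knows_alt_svc = True  # every response carries the Alt-Svc signal
        last_used = now
    return protocols


times = list(range(0, 14, 2))  # seven fetches, two seconds apart (assumed)
print(simulate_fetches(times, idle_timeout=65))
print(simulate_fetches(times, idle_timeout=1))
```

With a long idle timeout all seven fetches ride the original HTTP/2 connection; only when the connection is torn down between fetches does the Alt-Svc signal get a chance to take effect.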
But if you read the documentation in NGINX, there is this parameter called keep alive session which seems to be something to do with HTTP/2. Fine, don't. Set it to zero. Don't keep these sessions alive. Just tear it down, I want to see QUIC. Yeah? No. Worse.
Now, again, nothing, no QUIC whatsoever. And the way it works is the implementers at NGINX were much cleverer than me and they said the whole thing about HTTP/3 is it's a better way of doing keep alive, but you just told me you didn't want it. So you are telling me you don't want QUIC at all, because if you don't want keep alive you really don't want QUIC, so I'm not going to do it. I thought, okay. Sigh! Not zero. Now, I always thought there are only three numbers in computing: 0, 1 and a humongously big number. No other number matters, right? So zero didn't work, let's try 1. So, you know, up comes 1 second. Is this going to be any better? Here is the graph of what we tried: when we set it to zero it plummets down to the point where there is no QUIC to be seen. And then at 1, it sort of works, but you notice the blue line, the QUIC on first fetch, doesn't work any more.
Now I am doing persistency for only one second, I have got it in the DNS. Surely to God this must be working. But it isn't. Nada. No signal, nothing.
57% on repeat fetches. So fetching 7 times with the 1 second session timer seems to have burst through the barrier; all of a sudden now the session gets torn down, the Alt-Svc directive works, I can see QUIC. By the way, that's Chrome, okay, that's Chrome.
So, you know, this looks good. But that first fetch, which is Apple's Safari, isn't working at all, and it's both iOS and the Mac, and even Chrome and Firefox were finding it weird: they do a bit of QUIC, they do a bit of TLS and TCP, and they flip between the two states. So we started sort of looking at this, and found, you know, it's kind of curious: America isn't that good, Canada is a bit crap, South America has got a huge amount of QUIC use and so does a whole lot of Africa. This is not a normal kind of map of technology adoption. In fact, it's almost quite the reverse. And of the countries that were doing it, an awful lot of European countries were actually high in QUIC.
But you notice there are two columns, and the first query column was all zeros. No one ‑‑ and there is a lot of Safari out there, 12% of the world ‑‑ no one was doing QUIC. So there is something about Safari and our server, something. And so we started looking under every single rock we could find, talking to both the Google folk and the Apple folk and the folk who had been working on the QUIC standards, going: what the hell? And the answer is 1 is a really, really bad number. Dijkstra was wrong, there is another number and it's bigger than 1.
And part of the reason why is that at 1 second it sort of goes: I am bringing up a session, 1 second has elapsed, I am going to tear it down again. You never actually got to do anything, because the keep alive value was actually the entirety of the session lifetime, not the idle keep alive. Who would have thought a thing called keep alive was actually lifetime, not keep alive time? We didn't. The default value of 65 seconds seems to be way too long. That's going back to the original problem of persistency. So we thought 20 seconds. Why 20? Less than 65, more than 1.
Got to have a number. Here is a number: 20 seconds.
And all of a sudden, you know, the signal comes back from Safari, and again, it kind of looks like Apple has got a big market around areas inside Europe, which is, you know, not surprising, they are an expensive device, and less so elsewhere, so that QUIC on first use is now up to a phenomenal 16% on the Isle of Man. They all must have iPhones there. This is kind of a map of where Apple is. Whereas the other one, which is the Chrome map, is subtly different.
And so this is the bigger one. When we set the keep alives to 20, we got back into this Safari thing. But it's not 12%. It's a whole lot less.
Now, there is this new view of the world from the application giants. You don't just deploy code and see what happens. You actually monitor your deployment and you have your instances of your browser talking to home base all the time. So what you are actually seeing with this relatively low 2 to 3% is Apple going: not everyone who gets the signal gets to use QUIC. I am going to do one in N, and everyone else, even though we both know QUIC is out there, you are not going to run it. And that's why the number is actually persistently low. What you are actually finding, and Apple aren't the only ones doing this, is that the application is now doing direct feedback from deployment and actually controlling behaviours as part of the built‑in application behaviour. So it's not just I am running your code. It's the code that you have let me run is talking to home all the time, controlling the way I behave as an application.
Welcome to 2022, I guess.
So, there are some questions about this and I'm going to race through, and I think I have given you most of the answers so you can shout along too. Who is doing it? What are the MSS values? Because it's UDP and it hates fragmentation. If you fragment you die, no other way around it. What's the failure rate? Does everyone like UDP over 443? And is QUIC faster than what it replaces?
So, who is doing QUIC? On first fetch, it's all Apple, it's all Safari. Yes, we knew that.
On the second fetch it's all Chrome. Yes, we knew that, and whatever Firefox is left on the planet and yes there are some people still running boxes that report they are running Windows. Love to see who they are. God knows, I thought they are all dead.
So, anyway, that was what we see on that QUIC client profile and of course in the browser world, obviously Chrome, Safari, and Firefox takes a bit edgeway, Firefox is meant to be triggering on the DNS records. Yes, there is somebody running Edge, well done, and someone still runs Opera, well done.
So what we see is who is doing it? Safari does it on terms of the DNS. Chrome is still running the stuff from four years ago. It's second fetch and if you look at the CloudFlare numbers from a standard web server perspective and what the web content looks like today, second fetch only happens in these parameters less than half the time.
So this whole thing about saying when you come back to the second time we're going to use QUIC, a minority of folk ever come back the second time. So, content switching, if you think that's a foolproof method, you are kidding yourself. It's not a foolproof method for switching it on because folk don't bang away at the similar server with a degree of regularity that someone might have assumed, that's why the Chrome numbers were so low in CloudFlare.
The packet size distribution: Yes, QUIC is conservative. 46.6% of what we observed keep their packets at 1,200 and lower. Which is even more conservative than v6: it allows a header plus a bit more. So QUIC packets are small, quite deliberately.
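The comparison with v6 can be worked through with simple header arithmetic. This sketch assumes the 1,200 figure is the UDP payload size, which the talk does not state; the header sizes are the standard fixed ones, and the function names are made up.

```python
IPV6_MIN_MTU = 1280   # smallest MTU every IPv6 link must support
IPV6_HEADER = 40      # fixed IPv6 header, no extension headers
IPV4_HEADER = 20      # IPv4 header without options
UDP_HEADER = 8


def ip_packet_size(udp_payload, ipv6=True):
    """Size on the wire at the IP layer for a given UDP payload."""
    return udp_payload + UDP_HEADER + (IPV6_HEADER if ipv6 else IPV4_HEADER)


def headroom_under_ipv6_minimum(udp_payload):
    """Bytes left before the packet would exceed the IPv6 minimum MTU."""
    return IPV6_MIN_MTU - ip_packet_size(udp_payload, ipv6=True)


print(ip_packet_size(1200, ipv6=True))    # 1248
print(ip_packet_size(1200, ipv6=False))   # 1228
print(headroom_under_ipv6_minimum(1200))  # 32
```

Under that assumption a 1,200 byte payload fits inside the 1,280 byte IPv6 minimum with its headers and 32 bytes to spare, so it should never need fragmenting on a conforming path.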
A bit more at 1250, a bit more at 1252. A few more die‑hards up over 1350, well done them, but on the whole QUIC packets are relatively short because they don't fragment.
Connection loss: what have you done with your firewalls? I would have thought UDP port 443 would have been no, not now, not ever. And the answer is, no, it's fine. Absolutely fine. 0.24% drop rate on QUIC connections. So I see an incoming packet, you sent me a packet, I send you back a QUIC packet saying hi, let's do this dance, and 99.8% of the time, it bloody worked. It had my jaw dropping as well. As I said, I thought you security guys were better than that. Evidently you are not. You just let it through.
Is it faster? Well, your machine has a clock, I have a clock, we all have clocks. Let's just do a really, really crude measurement and ask the client's browser to measure how long it took. You kind of go, well, that's crap, isn't it? Yes, but if you do it 20 million times it actually becomes a decent number, because, you know, noise just gets absorbed into the sheer number of measurements. Here is the distribution we're seeing, the elapsed time to do QUIC versus the elapsed time to do TLS and TCP. QUIC is faster most of the time, up to 250 to 300 milliseconds at times.
Now this industry has been spending millions, tens of millions of dollars to shave 1 millisecond off a connection time, 1 millisecond. And this thing is kind of going yeah, 50 milliseconds, yeah, 100 milliseconds faster. It is just blindingly, you know, faster. It's good.
Cumulative distribution: Two thirds of the time it's just quicker. No matter what, it's just fast.
So those are the answers. If you weren't paying attention you can look at the slides. And that's the end of it. If you want to see the ongoing stats, there is the URL there, I should have done a little QR code but I am lazy, so if you can't read and type go and look at the slides later and it's time for questions. Thank you very much.
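The point about noise being absorbed can be checked with a small simulation. All numbers here are invented for the sketch (a 300 ms baseline, a 50 ms true gain, 50 ms of Gaussian clock noise per measurement); none of them come from the measurement in the talk.

```python
import random
import statistics


def measured_difference(samples, true_gain_ms=50.0, clock_noise_ms=50.0, seed=1):
    """Mean of (TCP time - QUIC time) over many noisy client-side timings."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(samples):
        tcp = 300.0 + rng.gauss(0.0, clock_noise_ms)
        quic = 300.0 - true_gain_ms + rng.gauss(0.0, clock_noise_ms)
        diffs.append(tcp - quic)
    return statistics.fmean(diffs)


for n in (10, 1_000, 200_000):
    print(n, round(measured_difference(n), 2))
```

Each individual timing is as noisy as the effect being measured, but the error of the average shrinks roughly with the square root of the sample count, so a large enough sample recovers the underlying gain.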
BRIAN NISBET: Thank you. So we have time for ‑‑ a little bit of time for questions, we do have one from chat, which is from Kurt Kayser, who is apparently a spy or a private or something: What level of packet loss does QUIC tolerate?
GEOFF HUSTON: It's all encrypted. It tolerates loss the same as TCP tolerates loss, and it depends on the flow control algorithm that you select. So, yes, this is a reliable flow control algorithm, it recovers from packet loss, absolutely. And as you know with TCP, and the early days of rollout in mobile, once you get loss rates of around 10%, abandon hope, it just starts to stutter, and as far as I can see, our flow control algorithms haven't improved much, so my guess is it's the same kind of behaviour, but now you can't see it because it's encrypted. One thing: QUIC does flow control on packet numbers, not stream offsets. So the beauty is a retransmit is obviously a retransmit, and so part of the problem that TCP has in recovery, when an old packet reappears going hello, I am here, doesn't happen in QUIC. Because packet numbers go this is a retransmit, you know, treat me as such, versus this is the original, so I suspect it might be slightly more robust because of that.
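A minimal sketch of that distinction, assuming a sender that never reuses a packet number and re-sends lost stream data inside a fresh packet. The class and its methods are hypothetical and leave out timers, acknowledgement ranges and congestion control.

```python
class Sender:
    """Track in-flight data by packet number; retransmits get new numbers."""

    def __init__(self):
        self.next_packet_number = 0
        self.in_flight = {}  # packet number -> (stream_id, offset, data)

    def send(self, stream_id, offset, data):
        number = self.next_packet_number
        self.next_packet_number += 1
        self.in_flight[number] = (stream_id, offset, data)
        return number

    def on_loss(self, number):
        # Same stream bytes at the same offset, carried in a new packet.
        stream_id, offset, data = self.in_flight.pop(number)
        return self.send(stream_id, offset, data)

    def on_ack(self, number):
        # An ACK names exactly one transmission, so there is no ambiguity
        # about whether it refers to the original or the retransmit.
        return self.in_flight.pop(number, None)


sender = Sender()
original = sender.send(stream_id=0, offset=0, data=b"hello")
retransmit = sender.on_loss(original)
print(original, retransmit, sender.in_flight)
```

In TCP the retransmitted segment carries the same sequence number as the original, which is where the ambiguity about late-arriving packets comes from.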
BRIAN NISBET: Okay. And I don't know, maybe you would have gotten more questions on a Friday morning.
GEOFF HUSTON: It's one more than the usual record on Friday.
BRIAN NISBET: Whatever time of the day you are on, thank you very much.
Right, so our second presentation this afternoon is from Rinse Kloek and it's all about fiber to a random part of the European countryside.
RINSE KLOEK: Thank you. Four years ago I did a presentation during an NLNOG meeting, it was called Fiber to the Farm. You see a picture here, this is the farm where I was born 42 years ago. It's still in the family and my brother took it over. He has to milk the cows every day at six o'clock, but the fact is that four years ago, when I did the presentation, he got his fiber to the home connection. So while I still had a 20 megabit DSL connection, he had a 1‑gigabit fiber connection. He was living on a farm, I was living in an urban area and he had a faster connection, but luckily things are changing in the Netherlands and that is what my presentation is about.
The first topic is about the current status of fiber in the Netherlands. The second topic is about the techniques used, the PON technique and the P2P technique. The third topic is more technical, about PON. The last of the four topics is about the activation of PON devices and the automation of the installation of the PON switches, and maybe we have time for some questions.
I am a freelance network engineer, working for 20 years in the ISP business. This is my fourth RIPE meeting, the first one was in Amsterdam; Sander Steffann introduced me to the RIPE meetings while I was working together with him at a service provider.
My current project is at DELTA Fiber, this is a Dutch company, a merger of two companies, and DELTA Fiber has a big ambition to build 2 million fiber to the home connections in the Netherlands. They also have two big investors investing more than 2 billion in the coming years, you can do something with that.
This is a map of the Netherlands and this is a map of the fibre penetration, what you see is that the most fiber rollout is in the eastern part of the Netherlands, and that's the more rural part where you have a lot of small villages, farms, not the big cities that are in the west. And especially you see some hot spots around Eindhoven in the south. This is famous for the university where a lot of optical innovation is going on, so maybe that's the reason that there is a lot of fiber rollout started.
Then something about fiber to my own home. So, last year, I finally got ‑‑ they started rolling out fiber to the home in my city. So, in June 2021, they started building the backbone network, laying the ducts and the fibers to the POPs. In the second picture they arrived at my driveway, and they were bringing the fiber to my front door. They were using a special drilling machine, a torpedo, so they don't need to open the driveway; they just put it in and it drills under my driveway and it comes up at my doorstep. The third picture is my doorstep, and the day before Christmas they arrived and they planned to bring the fiber inside my house, but there was a small issue: I have a 1 metre doorstep with a lot of concrete underneath, so they can't drill under it. I asked them, can you use the existing power pipe, the duct where the power cable is lying, and bring your fiber inside with the pipe, and don't bring it into my living room, that's something my wife doesn't like. So, yeah, they said this is possible but you need to drill a hole in the power pipe. I drilled a hole in it, they tried to bring in the fiber. They didn't succeed the first time, but after a senior engineer came, with more experience, they managed it, and one month later, in the last picture, you see my fiber unit, and finally, starting in January, I had my fiber to the home connection, so I also had a 1 gigabit Internet fiber connection.
Now something about the network technology that is used. At the start of the fiber rollout, they used the point to point technique. That's a very common technique. You have a CPE and an NT, a network terminator, on the customer side in your home, and there is a switch in the central area POP, and every customer has its own port on that switch. So it's a fairly easy technique, you can just provision the configuration on the port and you know what customer is on what port, and the switch is connected to an aggregation switch or router.
With the PON technique, which is used more and more in the Netherlands, you see on the bottom side of the image that you still have the NT, the ONT, on the customer side, but instead of the switch you have splitters in the area POP. Those splitters are lit up by the OLT. The OLT is the optical line terminal, also a kind of central switch, but that switch is not located in the area POP but in the central POP. The splitters get fed by the OLT and they split the signal, in this case first 1 to 2, then 1 to 32, so we have a total split ratio of 1:64. So with one active port on the OLT you can connect to, in this case, 64 different ONTs, and they are at the customer premises.
Why should you choose the PON technique instead of point to point? I have four topics. The first one is the flexibility in the layer 1 network. With point to point you are limited to the maximum distance that the point to point optics can reach, this is about 10 km, so you have to place a switch in every area POP where the customers are within a range of 10 km. XGS‑PON optics can reach up to 40 km, and especially in one project I had in a large area in the Netherlands, where we had a lot of farms far away, we could reach every customer with just one active device in the whole area, so it was very economical.
Second point, cost per port, I think is also an important point, because you need fewer active ports, and optics are a very expensive part of the network. It's a very cost effective technique. With point to point you have an active port for every customer, and that's the most expensive part of the network.
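The split arithmetic from this description can be written out as a short sketch. The loss figure uses the ideal power-division formula only; real splitters add some excess loss on top, and the function names are made up.

```python
import math


def total_split(stages):
    """Overall split ratio of cascaded splitters, e.g. 1:2 followed by 1:32."""
    return math.prod(stages)


def ideal_split_loss_db(ratio):
    """Ideal optical loss from dividing power equally over `ratio` outputs."""
    return 10 * math.log10(ratio)


ratio = total_split([2, 32])
print(ratio)                                  # 64 ONTs per OLT port
print(round(ideal_split_loss_db(ratio), 2))   # about 18.06 dB
```

Every doubling of the split costs about 3 dB of optical budget, which is the trade-off against serving more customers from one active port.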
The third point is oversubscription; that's a topic where point to point is a bit favoured, because with point to point you have a dedicated port for every customer, mostly a gigabit port, sometimes a 10 gigabit port, so you don't share it with your neighbours. With PON you share the capacity, but if you have a split ratio of 64 customers per port and you have 10 gigabit, I think that's plenty for the coming years.
And last but not least, power usage. With the current rising power prices I think that's also in favour of PON. With PON you have fewer active ports. An optic uses about 1 watt, and if you have 2 million connections you have 2 million watts of power you are continuously using; with PON you need about 60 times less power, so with the current rising price of power, it's very much in favour of the PON technique. So you could say PON is a very green technology.
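The port, power and sharing figures can be reproduced with a few lines. This is a rough sketch using only the numbers mentioned in the talk (2 million connections, 1 watt per optic, a 1:64 split, a 10 gigabit port); it counts only the network-side optics and ignores everything else that draws power.

```python
import math


def active_ports(customers, split_ratio):
    """Network-side ports needed; split_ratio=1 models point to point."""
    return math.ceil(customers / split_ratio)


def optics_power_watts(ports, watts_per_optic=1.0):
    return ports * watts_per_optic


def worst_case_share_mbps(port_gbps, split_ratio):
    """Per-customer rate if every customer on the port sends at once."""
    return port_gbps * 1000 / split_ratio


customers = 2_000_000
p2p_power = optics_power_watts(active_ports(customers, 1))
pon_power = optics_power_watts(active_ports(customers, 64))
print(p2p_power, pon_power, p2p_power / pon_power)
print(worst_case_share_mbps(10, 64))
```

The power ratio equals the split ratio, and even in the unlikely case that all 64 customers saturate the port together, each still gets roughly 156 Mbit/s under this simple model.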
It go about the optical part. The rollout of fiber to the home is the user gets a simplex fiber, so it's not duplex, it's just a single fiber. You need a special technique to do upstream and downstream over a single fiber strain. So, common way is to use different wavelengths, for point to point and GPON they use 3010 for upstream and 4019 for the downstream, so you have different wavelengths where you don't get interference.
But there is a drawback, DELTA Fiber has currently point to point network and they co‑employ G PON but you will get some issues for example if you roll out G PON and you have currently also point to point in the same region, for example, if if a customer connects a point to point device to a GPON network they use the save wavelength and they will get interference, because point to point will send continuous light on the fiber and PON doesn't like that. I have seen that happen and all customers on that splitter, if you put a point to point device on a G bogon network you will get impact on your neighbour. So that's not something you like.
XGS‑PON uses different wavelengths: it uses 1270 and 1577. Because the wavelengths are different, you can safely deploy XGS‑PON; you could connect a point to point device to the network and you won't get interference, because the wavelengths are different and they don't impact each other.
The newest technique, NG‑PON, also uses different wavelengths. In the future you could deploy NG‑PON, and NG‑PON can reach higher speeds; you can deploy it next to the current technique and you could even do it on the same fiber. So you could do XGS‑PON and NG‑PON on the same fiber and, depending on what equipment the customer has, it will get NG‑PON or XGS‑PON.
Something about upload and download. In the upstream, XGS‑PON uses a time division multiplexing technique. The reason for that is that if every device sent in the same time frame you would get problems. So, during the start‑up phase of a new ONT device there is a ranging mechanism. The OLT measures the latency between the ONT and the OLT and it tells the ONT what its time frames are, so every ONT has its own time frames in which it is allowed to send traffic. In the picture you see that ONT 1, 2 and 3 send in different time frames, and because they are doing that, the traffic will arrive sequentially at the OLT.
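The ranging and time frame idea described here can be sketched in a few lines. This is a simplified model, not the actual XGS‑PON procedure: the 5 microseconds per kilometre propagation delay, the 10 microsecond slot length, the 20 km reference distance and all names are assumptions chosen for illustration. Each ONT is given an extra delay so that every ONT appears to sit at the same distance, and the OLT then hands out back-to-back slots.

```python
from dataclasses import dataclass

# Assumed one-way propagation delay in fiber, in microseconds per km.
FIBER_DELAY_US_PER_KM = 5.0


@dataclass
class Ont:
    name: str
    distance_km: float


def round_trip_us(ont: Ont) -> float:
    """Round-trip time the OLT would measure during ranging."""
    return 2 * ont.distance_km * FIBER_DELAY_US_PER_KM


def equalization_delays(onts, max_distance_km=20.0):
    """Extra delay per ONT so all ONTs look as if they sit at max_distance_km."""
    max_rtt = 2 * max_distance_km * FIBER_DELAY_US_PER_KM
    return {o.name: max_rtt - round_trip_us(o) for o in onts}


def arrival_windows(onts, slot_us=10.0, max_distance_km=20.0):
    """Return (name, start, end) of each upstream burst as seen at the OLT."""
    eq = equalization_delays(onts, max_distance_km)
    windows = []
    for index, ont in enumerate(onts):
        grant_start = index * slot_us  # slot offset assigned by the OLT
        start = grant_start + round_trip_us(ont) + eq[ont.name]
        windows.append((ont.name, start, start + slot_us))
    return windows


onts = [Ont("ONT1", 3.0), Ont("ONT2", 12.5), Ont("ONT3", 19.0)]
for name, start, end in arrival_windows(onts):
    print(f"{name}: arrives {start:.1f} to {end:.1f} us")
```

Although the three ONTs sit at different distances, their bursts arrive one after another at the OLT without overlapping, which is the behaviour described for the picture on the slide.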
The downstream side is a whole different story, it's just broadcasting. So if there is a packet for an ONT, the OLT will put it on the fiber and it will arrive at the same time at every ONT behind the splitter. So every ONT will see the traffic, but the traffic has a header, and in that header is the destination ONT the traffic needs to be sent to, and the ONT knows: hey, this is my traffic, I'll grab it off the line, and if the traffic is not for me I will drop it. The ONT will then send it on to its CPE, the device that is connected. But that's not the only thing, because you could sniff before the ONT and you could see the packets of your neighbour. That's not something you would like, so next to that we have also implemented an encryption technique. When the ONT comes online it sends a key to the OLT, and the OLT uses this key to encrypt the traffic, so only encrypted traffic is sent downstream and the ONT has the key to decrypt it. Only the traffic for you, you see, because you can decrypt it.
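The broadcast, filter and encrypt behaviour described here can be illustrated with a toy model. Everything in it is an assumption for illustration: frames are plain (ONT id, payload) pairs, the keys are made-up byte strings, and the cipher is a simple hash-based XOR keystream standing in for the real, standardised encryption, so it must not be used for anything serious.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key + offset) blocks.

    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(4, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def olt_broadcast(frames, keys):
    """Encrypt each frame with the destination ONT's key; all ONTs see all frames."""
    return [(ont_id, keystream_xor(keys[ont_id], payload))
            for ont_id, payload in frames]


def ont_receive(my_id, my_key, downstream):
    """Keep and decrypt only the frames addressed to this ONT, drop the rest."""
    return [keystream_xor(my_key, data)
            for ont_id, data in downstream if ont_id == my_id]


# Hypothetical per-ONT keys, as if each ONT had sent its key at start-up.
keys = {1: b"key-of-ont-1", 2: b"key-of-ont-2"}
downstream = olt_broadcast([(1, b"hello ont 1"), (2, b"hello ont 2")], keys)

print(ont_receive(1, keys[1], downstream))
```

A neighbour sniffing the fiber still sees the frame for ONT 2, but only in encrypted form, and decrypting it with the wrong key does not give back the payload.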
The XGS‑PON, the name says it's 10‑gigabit upstream and downstream.
Something about the activation process. With point to point it's very easy. The customer connects the device at home. You see the switch port coming up and you know this is this customer, and the services are ready for the customer.
With PON, that's not possible. You have a shared medium. So if a customer connects a new device, it sends a so‑called alarm to the OLT. In the alarm there is the serial number of the device that the user connected. So, we as provider manage the OLT and we see that alarm coming in. And before we see that, we have sent the ONT to the customer, and we have registered its serial number in our database with the profile the user needs. We see the OLT alarm coming in, we recognise the serial number, we know this is the customer we sent the device to, and we can provision all the services for the customer.
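The activation flow described here boils down to a lookup on the serial number carried in the alarm. The sketch below is a minimal illustration under assumptions: the registry is a plain dictionary, and the serial numbers, customer ids and profile names are invented for the example and do not come from the talk.

```python
def handle_new_ont_alarm(serial: str, registry: dict) -> dict:
    """Decide what to do when the OLT reports a newly connected ONT.

    `registry` maps serial numbers registered at shipping time to the
    customer and service profile that belong to that device.
    """
    entry = registry.get(serial)
    if entry is None:
        # Unknown device: nothing was registered for this serial number.
        return {"serial": serial, "action": "ignore"}
    return {"serial": serial, "action": "provision", **entry}


# Hypothetical registrations, filled in before the ONTs are shipped.
registry = {
    "ABCD12345678": {"customer": "customer-1001", "profile": "1g-internet"},
    "ABCD87654321": {"customer": "customer-1002", "profile": "10g-internet"},
}

print(handle_new_ont_alarm("ABCD12345678", registry))
print(handle_new_ont_alarm("ZZZZ00000000", registry))
```

Because the serial number is the only link between the device and the subscription, a device shipped to the wrong address would be provisioned with the wrong customer's profile.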
So, with PON it is essential to have a good logistics process, otherwise if you send the wrong device to the wrong person, he will get not his own service but the service of his neighbour, for example.
Last but not least, we have in the Netherlands free modem choice, so every user is allowed to buy his own device. So if you don't like the device DELTA Fiber is sending, you can buy your own ONT, but the thing you need to do is hand over the serial number of the device to us so we can register that device and we can link it to your subscription, and hopefully, if the device honours the standards, it will get online and will get the services.
Then something about the automation process. DELTA Fiber has a big ambition to connect a lot of customers, and for that they need to install hundreds of OLTs, the central devices that are installed in the POP. And those are not installed by ourselves, but there are contractors that do that. We need to create an idiot‑proof automation process.
First we have Nautobot. The nice thing about that is it's a fork of NetBox in which you have the option to create so‑called job modules, and they are easy wizards for the contractors to create a new OLT device in the CMDB system. So the contractor only needs to say: I want to install it in this location with this host name. That's it. All other things are automatically created, so the whole device infrastructure, interface language, is created by this wizard; we have templates for that.
So, for example, after a week, the contractor goes to the area POP, to the building, installs the OLT, racks the whole stuff, connects the fibers, and the only thing he needs to do is change the state in Nautobot from planned to installed. After putting it to installed, some automation is started, something I created in the project, and the automation connects to the API of Nautobot, knows this is this OLT, it comes online, and it grabs all the information about VLANs, templates, all kinds of stuff. Next to that it connects to the API of our OLT vendor, and all API calls are done so the configuration is created. The only drawback of this system was that we still needed to do some CLI scripting, and that was for the software that needs to be sent to the ONT. So the software the ONT gets is installed on the OLT, but the base network system didn't have an API for that yet, so I still had to do some CLI scripting.
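The trigger described here, where a status change from planned to installed starts the provisioning, can be sketched as a small handler. This is only an outline under assumptions: it makes no real Nautobot or vendor API calls, and `fetch_intended_config` and `push_to_olt` are hypothetical callables standing in for whatever the project actually uses.

```python
def on_status_change(device: str, old_status: str, new_status: str,
                     fetch_intended_config, push_to_olt) -> bool:
    """Run provisioning only on the planned -> installed transition.

    fetch_intended_config(device) is assumed to return the intended
    settings (VLANs, templates, ...) from the source of truth, and
    push_to_olt(device, config) is assumed to apply them to the OLT.
    """
    if (old_status, new_status) != ("planned", "installed"):
        return False  # any other transition is ignored
    config = fetch_intended_config(device)
    push_to_olt(device, config)
    return True


# Stand-ins so the sketch runs without any network access.
pushed = []


def fake_fetch(device):
    return {"hostname": device, "vlans": [100, 200]}


def fake_push(device, config):
    pushed.append((device, config))


print(on_status_change("olt-example-01", "planned", "installed",
                       fake_fetch, fake_push))
print(pushed)
```

Keeping the contractor's part down to a single status change means every OLT gets its configuration from the same templates rather than from manual steps.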
Then some tales from the trenches. Some things we discovered. The first issue we discovered was we had one ONT that didn't come online and didn't connect very well, so I thought maybe I have to send a reboot to the ONT. In PON you have an option to send a reset with a specific serial number in it, and only the ONT with that serial number should get rebooted. So I sent the signal, but after sending the signal all ONTs on that port got rebooted. What it appeared to be was that there was a bug in the ONT: the ONT didn't look at the serial number, it saw the reboot coming in, saying oh, I think I need to reboot, so all ONTs, regardless of the serial number, got rebooted. So that created some impact. But I sent it in a ticket to the vendor and they fixed it.
The next thing: we had a new ONT, it was pretty new, it was an XGS‑PON ONT. And one thing we did was we implemented an ACS server for the ONTs, and every hour the ONT sends some metrics to the server and they are stored. I put them in some nice graphs, and one metric I was monitoring was the free memory of the ONTs, the free memory of the device at the customer side. If you take a look at the picture, you see some lines going down every time. So I had some lines going down and going up, going down, going up and going down, and it appeared that the free memory kept getting less until the ONT got rebooted. There was a memory leak in the software on the ONT. I also filed this in a ticket to the vendor and they fixed it. Now every ONT is stable regarding memory usage.
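The pattern in that graph, free memory steadily falling between reboots, can also be flagged automatically. The sketch below fits a straight line through hourly samples and reports a leak when the slope is clearly negative. The threshold of 50 kB per hour and the sample values are assumptions for illustration, not figures from the talk.

```python
def slope_per_sample(samples):
    """Least-squares slope of the samples against their index."""
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def looks_like_leak(free_kb_hourly, min_drop_kb_per_hour=50.0):
    """True if free memory falls by at least the threshold per hour on average."""
    if len(free_kb_hourly) < 3:
        return False  # too few points to say anything about a trend
    return slope_per_sample(free_kb_hourly) <= -min_drop_kb_per_hour


# Made-up hourly free-memory readings in kB since the last reboot.
leaking = [100_000 - 200 * hour for hour in range(24)]
stable = [100_000, 99_990, 100_010, 100_000, 99_995, 100_005]

print(looks_like_leak(leaking), looks_like_leak(stable))
```

The samples should cover one uptime period only, since a reboot resets free memory and would hide the downward trend.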
Last but not least, I had a colleague that had an XGS‑PON optic from some Chinese website. He installed it in the lab, and after he did that and connected it to a splitter, all ONTs on the splitter went down. So it appeared that the XGS‑PON optic didn't fully implement the standard, or it was a faulty optic, but it created impact. So we learned from that: always check before you connect a strange XGS‑PON device to a network, because you can create impact on your neighbours.
Conclusions about the project. The XGS‑PON technique is a very mature technique. The OLT, so the central part of the network, the OLT that we installed in the POPs, is very stable. I did PoC testing with all kinds of features like v6, source guard, VLANs, you name it; everything worked out of the box, so hardly any bugs. Most issues were on the ONT/RG side, the box that's installed at the customer side. We had a lot of issues and we have filed a lot of tickets, especially on the RG side, so the side that is doing the wi‑fi. We saw a lot of issues and currently it's pretty stable, but still with every firmware update we need to do testing to make sure there are no issues.
Next to that, you can test everything in a lab, but always if you are going to production with a new version you will have issues you didn't see in the lab. Monitoring will help detect strange issues, so we monitor every ONT, and we even do some syslogging to do some investigation if we have issues.
Automation is key. We created automation systems so the whole installation of the OLTs, the network, is almost fully automated. Because we are planning to install hundreds of devices, you want to be sure that every device has the same config. So making workbooks for engineers is not the best thing. You have to make it as idiot‑proof as possible, so that the engineer only needs to do the things he needs to do and no extra configuration.
Last but not least, I have a screenshot of some testing I did in the lab two months ago. It was with a new device from another vendor, and the vendor promised this device can do 10 gigabits symmetrical. I was not quite sure if he was lying, but I did a test, and this testing was done on a device with network address translation. I did the testing with some professional equipment and I could get to 8 and a half gigabits. It's not fully 10 because we have some error correction in place in the PON network, but I think 8 and a half is pretty nice for a consumer grade device.
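The gap between 10 gigabits and the measured 8 and a half can be estimated from the error correction overhead mentioned here. The numbers in the sketch are assumptions that were not stated in the talk: a line rate of 9.95328 Gbit/s and a Reed-Solomon (248, 216) code, which is my understanding of the XGS‑PON specification. Other framing overhead is ignored, so the real usable rate is somewhat lower still.

```python
# Assumed XGS-PON parameters (not stated in the talk).
LINE_RATE_GBPS = 9.95328   # nominal line rate in Gbit/s
FEC_CODEWORD_BYTES = 248   # total bytes per Reed-Solomon codeword
FEC_DATA_BYTES = 216       # payload bytes per codeword


def rate_after_fec(line_rate_gbps: float = LINE_RATE_GBPS,
                   data_bytes: int = FEC_DATA_BYTES,
                   codeword_bytes: int = FEC_CODEWORD_BYTES) -> float:
    """Throughput left once the FEC parity bytes are subtracted."""
    return line_rate_gbps * data_bytes / codeword_bytes


print(f"{rate_after_fec():.2f} Gbit/s available after error correction")
```

Under these assumptions roughly 13 percent of the line rate goes to parity bytes, which puts a measured result of about 8 and a half gigabits close to what the link can carry.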
This was my presentation, if you have any questions.
Please speak a bit loud because ‑‑
AUDIENCE SPEAKER: Hi. So, did you consider that with the PON and, if so, how come it's not in the slides or the WDM PON, so if the customer has a different wavelength?
RINSE KLOEK: It's not really considered. It's mainly that it's not very common yet in Europe, WDM PON. It's also about cost, being cost effective. You can do, for example, NG PON, which I think is done in one or two countries, but it's very expensive per port. Currently, the... is in the XGS‑PON. But NG PON, the next technology, will use some WDM options so you could do more wavelengths over one fibre, so I think it's the next phase, but it could take some years before there will be some WDM PON variant. But there are different variants and it's not clear what's going to be the best or the most cost effective.
AUDIENCE SPEAKER: I have two questions. First, attenuation at 1270 on standard G.652 single fiber is considerably higher compared to other wavelengths. What's the reach of your system regarding that? And the second question is that manufacturing of lasers at 1270 nm is problematic and, from our experience, the lifetime of that kind of laser is not very long. So, what is your experience with that?
RINSE KLOEK: About the first question, the optic sends with 6dB strength, so it's very high. And it is sensitive to, I think, minus 29dB, so you have an attenuation span of 34dB. It also depends on the split ratio, so if it's higher, you have a higher attenuation, but with the current design, you know, we can go up to 20km easily and we have a lot of room for extra. It's not a big issue for us; you can go up to 40 kilometres, but we don't have customers that far away so we chose not to do connections that far away. So that's not an issue right now. And we are also a small country, the Netherlands, so that's not an issue.
Second question, it's just rolled out for one year, I don't see a lot of optic issues right now. I think one or two optics died, so they were replaced. Yeah, so we have to see if it's doing what you are saying. So currently it doesn't seem that a lot of optics are failing, but I think that is more something about four, five years, so we can't say that right now.
BRIAN NISBET: So, a question from the chat, from Antoine, who has no affiliation but himself. So a question on the power usage comparison between point to point and PON. Since the ONT is on the customer premises and gets power from the customer's power socket, isn't it just a shift in where the power is taken from? Lower power cost for the network operator but higher power cost for the customer. The ONT is in the customer premises, so it shifts where the power cost is, is the question. So is it cheaper for the network operator but more cost for the consumer?
RINSE KLOEK: Yeah, true. Yeah, that's ‑‑ I think the ONT on the customer side, that is always the case. Yeah, if you have XGS‑PON, you have 10 gigabits and not 1 gigabit. 10 gigabit will use more power, that's true. But you will get more bandwidth. And anyway, you will need something at the customer premises. So, with point to point you also need a device, and if you want to do point to point with 10 gigabit you will have the same power usage as if you do PON with 10 gigabit. So it's not about point to point or PON using more power, but it's more that if you go from 1 gigabit to 10 gigabits, that will use a bit more power. So a 10g point to point optic just uses more power, but it's not that much, I think it's between 1 and 2 watts, so I think that's not a big power increase for the customer. I think the bigger increase is if the customer installs more wi‑fi extenders, for example; that will cost more power for the customer.
BRIAN NISBET: Okay. And the, there is one other question from Maximilian Immink, which is: Do you still use PPPoE?
RINSE KLOEK: We, as a provider ourselves, don't use PPPoE, but we also do wholesale, so we have some customers, they made an announcement last month, and they are going to use PPPoE. So we have built a network that can support both DHCP and PPPoE, those are the two main techniques. We as a provider do use DHCP, but if a wholesale party wants to deliver services using PPPoE it's fully supported in our network.
BRIAN NISBET: Okay. And that seems to be it for the questions. So, thank you.
ALEXANDER AZIMOV: Now we have a coffee break, after that we will have two BoFs, and please, don't forget to rate the talks. It helps the PC to form the programme.
BRIAN NISBET: Of course do not forget to vote in the elections please, which helps the PC to produce the programme. Thank you all very much.
LIVE CAPTIONING BY
MARY McKEON, RMR, CRR, CBC