RIPE 85

Plenary Session
28th October 2022
At 9 a.m.:

BRIAN NISBET: Good morning, how are we all this morning? After the slight ‑ of the dinner last night. Good morning to the Friday morning plenary session of RIPE 85 as we head towards the end of the meeting, hopefully a very successful one for you all. Fernando and I will be chairing this session, and first up we have a talk on unbiasing Internet measurements, the findings from Pavlos Sermpezis.

PAVLOS SERMPEZIS: Hello, good morning, everyone. I am a researcher from the Aristotle University and today I am going to talk to you about a project of ours called AI4NetMon, whose topic is how to unbias Internet measurements. The project was funded by RIPE NCC and RACI, the academic programme funding, last year, and the work was mainly conducted by our lab, the Data and Web Science Lab, but I would like to give special thanks to my collaborators, Emile Aben from RIPE NCC and Lars. You can find information about the project and our code at the links there, and I have some more links in the presentation.

So, what is the AI4NetMon project about? We had two goals. The first goal was to find if there is bias in the Internet measurement platforms and how much bias there is there.

And the second goal is to reduce this bias. So with respect to the first goal, in this talk, I will first give you a brief introduction about the concept of bias, what it is and why it exists and give you some main results about the bias that measurement platforms have. And I will present you some online tools we have built within our project.

However, you can find more material and more information in our previous talk in the previous RIPE meeting or you can listen to our podcast on RIPE Labs or our article there.

With respect to the second goal, how to unbias Internet measurement platforms, we try to do it following two approaches: either by carefully selecting vantage points or by recommending where to deploy more vantage points. In this talk I will present some main results from what we have found and some online tools you can use and play with. Also, there were two very nice presentations yesterday at the MAT Working Group on related topics, on how to select vantage points, so I definitely recommend you check those talks as well.

So, let's go to our first goal: how to quantify bias in Internet measurement platforms.

At first, we think that Internet measurement platforms, for example RIPE Atlas or the route collectors, give us a view, so they are like a window to the Internet: we see routing things we cannot see from our own network, and we use these to measure the Internet. However, in practice the window they give us to the Internet is not exactly like this but more like a stained glass window. Why is that? Because we don't have vantage points everywhere: some parts we can see and some parts we cannot, and some parts we see better than others, so they are behind different colours. A couple of examples: one is location bias. We all know RIPE Atlas probes; there are a lot of them in Europe and fewer in other continents, and this is location bias. A different type of bias is topology bias. It is in the route collector projects, because their peers tend to be networks that have a lot of peering connections, and this is because the route collector projects are mostly focused on IXPs, so there is some topological bias there. In fact, there are many dimensions of bias one can think of, and we have considered several of them: location, topology, connectivity or network‑type bias. For each of these dimensions we calculated how biased the infrastructure is. We defined a metric, the bias score, which takes values from 0, which means no bias, to 1, very high bias. In this plot we present the bias across all dimensions.

Let me explain. Every radius of this plot corresponds to a different dimension of bias; you can see location, network size, topology, etc.

The coloured lines and areas correspond to different measurement platforms: the orange one is RIPE RIS, the blue one is Atlas and the green one is RouteViews. When we have high bias, the line is far from the centre; in terms of location, for example, you can see the line is at 0.2, so we have 0.2 bias. If in some dimension we are close to the centre, it means we have less bias.
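As a rough illustration of a per-dimension bias score between 0 and 1, the sketch below compares how one feature (a made-up continent label) is distributed across all networks with how it is distributed across the networks that host vantage points. It assumes the Jensen-Shannon divergence with base-2 logarithms, which is bounded in [0, 1]; the project's documentation defines the metric actually used, and every name and number here is hypothetical.

```python
import math
from collections import Counter


def distribution(values):
    """Turn a list of category labels into category -> share."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) of two discrete distributions.

    0 means identical distributions, 1 means no overlap at all.
    This is an assumed stand-in, not necessarily the project's metric.
    """
    categories = set(p) | set(q)
    mix = {c: 0.5 * (p.get(c, 0.0) + q.get(c, 0.0)) for c in categories}

    def kl_to_mix(a):
        return sum(a[c] * math.log2(a[c] / mix[c]) for c in a if a[c] > 0)

    return 0.5 * kl_to_mix(p) + 0.5 * kl_to_mix(q)


def bias_score(population, sample):
    """Bias of `sample` with respect to `population` along one dimension."""
    return js_divergence(distribution(population), distribution(sample))


# Hypothetical data: one continent label per network.
all_networks = ["EU"] * 40 + ["AS"] * 40 + ["NA"] * 20
vantage_points = ["EU"] * 8 + ["AS"] * 1 + ["NA"] * 1

print(round(bias_score(all_networks, vantage_points), 3))
print(bias_score(all_networks, all_networks))  # identical sets give 0.0
```

Repeating the same calculation per dimension (location, network size, topology and so on) would give one value per radius of a radar plot like the one described.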

There are different observations you can make here. You can check our previous talk, but some main findings are that the route collector projects are more biased than RIPE Atlas, especially in terms of topology.

So, this plot we provide online in this Observable notebook; you can go and play and check for yourself in detail. What you can see here is again the same lines, the same curves for RIPE Atlas and RIPE RIS, and what you can also do is put in your own list of ASNs. For example, you select some RIPE Atlas probes and you want to check how biased this set is: you put the list of ASNs there and it draws a line for your set. Or you have a custom measurement system and you want to see how biased it is; you can do it. You can even do it for a list of networks that are not measurement platforms. For example, an idea is to check how biased the networks are that are ranked higher in ‑ is there bias there? Are they large networks or small networks, is there location bias there? You can try it with any set of networks; it doesn't have to be a measurement platform.

You also have other options; for example, you can select the dimensions of bias you would like to see, because maybe not all dimensions are of interest to you.

On top of this we have created an API to make things easier for you, so you can send a request to the API and get the bias score for Atlas, RIS, RouteViews and the route collectors of RIPE RIS individually, or for your custom set. I have some examples here so you can see how to poke the API.

What the API gives us is a JSON with the bias dimensions and the values of this set along these dimensions.
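To make the shape of such a response concrete, here is a small sketch that reads a JSON object mapping bias dimensions to values and lists the most biased dimensions first. The payload below is invented for illustration; the dimension names, the values and the flat layout are assumptions, and the real response format is whatever the API documentation specifies.

```python
import json

# Hypothetical payload: names, values and structure are made up.
sample_response = (
    '{"Location (continent)": 0.21, "Network size": 0.08, "Topology": 0.35}'
)


def most_biased(response_text, top=2):
    """Return the `top` (dimension, value) pairs with the highest bias."""
    scores = json.loads(response_text)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


for dimension, value in most_biased(sample_response):
    print(f"{dimension}: {value}")
```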

Now, just a final thing: be gentle with our API, it's an alpha version. If you encounter any problems or have some suggestions, reach out and we are going to improve it in the following period.

Now, let's move to our second goal, how to unbias Internet measurement platforms. We said we can do this either by selecting subsets of the vantage points, which we call sub‑sampling, or by extending the infrastructure. Let's go first with sub‑sampling. What does this plot show? On the X axis is the number of RIPE Atlas probes we select. On the Y axis is the bias score; the lower the better, because we have less bias.

If you see the green line in the middle, it's from some experiments where we randomly selected some RIPE Atlas probes; as we can see, the more probes we have, the less bias we get. On the top line we can see what the bias is when we ask the RIPE Atlas interface to allocate monitors to us with the automatic algorithm, and we observe that the bias is higher there, probably because there is some load balancing algorithm in there. So a first step is just to select randomly yourself, but this is still not enough and you can do some things better; we will show you later. Here is the bias for the RIPE RIS route collectors. Each dot corresponds to a route collector; on the Y axis we have the bias score again, the lower the better, and on the X axis is the number of peering ASNs each route collector has. What we can observe here is that having more peers does not necessarily mean lower bias. We can see some route collectors with many peers that have higher bias than other route collectors with 20 peers, let's say.

However, an important observation is that route collectors that are multi‑hop are less biased, and this is because they get access to a more diverse set of networks.

So, now how to unbias measurements: we have designed a sampling algorithm and tested it. I won't get into details; you can see the documentation, which has all the details about how the algorithm works and how we checked it, but I just want to state a couple of main findings.

We found that carefully selecting 50 peers from RIPE RIS, from different route collectors, a good set let's say of 50 peers, can give a bias four times lower than that of the entire RIPE RIS, so there's room for improvement. The same holds for RIPE Atlas: with 300 probes, carefully selected with algorithms, we managed to have almost zero bias, so there is indeed room for improvement there.
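One simple way to picture "careful selection" is a greedy loop that adds, at each step, the vantage point that leaves the selected set least biased. The sketch below is only that: a generic greedy heuristic over a single dimension, using total variation distance as an assumed bias measure. It is not the project's sampling algorithm, and the identifiers and data are hypothetical.

```python
from collections import Counter


def tv_distance(population, sample):
    """Total variation distance between category shares (0 = same, 1 = disjoint)."""
    p, q = Counter(population), Counter(sample)
    return 0.5 * sum(
        abs(p[c] / len(population) - q[c] / len(sample)) for c in set(p) | set(q)
    )


def greedy_subsample(population, candidates, k):
    """Pick up to k vantage points, each time adding the one that minimises bias.

    population: list with one category label per network.
    candidates: dict mapping vantage point id -> category label.
    Ties are broken by the sorted order of the ids.
    """
    chosen = []
    remaining = set(candidates)
    for _ in range(min(k, len(candidates))):
        best = min(
            sorted(remaining),
            key=lambda vp: tv_distance(
                population, [candidates[c] for c in chosen + [vp]]
            ),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical example: the population is half EU, half AS,
# but most available vantage points sit in EU.
population = ["EU"] * 50 + ["AS"] * 50
candidates = {f"eu{i}": "EU" for i in range(1, 9)}
candidates.update({"as1": "AS", "as2": "AS"})

subset = greedy_subsample(population, candidates, 4)
print(subset)
print(tv_distance(population, list(candidates.values())))     # all ten points
print(tv_distance(population, [candidates[c] for c in subset]))  # chosen four
```

In this toy case four well-chosen points are less biased than all ten together, which mirrors the observation that a small, carefully selected set can beat the full platform.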

Now, the second approach: let's deploy extra vantage points to decrease the bias of the measurement platforms. We calculated for each ASN how much the bias would decrease if we added it to the infrastructure of RIPE Atlas or RIPE RIS; the more it decreases, the higher this ASN is ranked. Based on these metrics we provide some recommendations, and we have two online tools where you can go and check and play with the data.

So, we present a ranked list that takes into account all ASNs. However, we wanted to let you parameterise this list; we didn't want to give you a list without further explanation. So what you can do is filter this list and select, for example, networks in a given region or a given country, or that have a customer cone larger than X, or networks that are connected to IXPs or are in PeeringDB. You have a lot of filtering options based on your criteria or the cost of adding an ASN to the infrastructure, and you still see the ranked list after this filtering.
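The ranking idea can be sketched as follows: compute the bias of the current set, recompute it with each candidate ASN added, and sort by the decrease, optionally after filtering. This is a minimal sketch on a single dimension with total variation distance as an assumed bias measure; the attribute names (`region`, `country`), the ASNs (taken from the documentation range) and the data are hypothetical, not the tools' actual inputs.

```python
from collections import Counter


def tv_distance(population, sample):
    """Total variation distance between category shares (0 = same, 1 = disjoint)."""
    p, q = Counter(population), Counter(sample)
    return 0.5 * sum(
        abs(p[c] / len(population) - q[c] / len(sample)) for c in set(p) | set(q)
    )


def rank_candidates(population, current, candidates, keep=lambda info: True):
    """Rank candidate ASNs by how much adding each one would decrease bias.

    population: one region label per network in the whole Internet.
    current: region labels of networks already hosting a vantage point.
    candidates: dict asn -> {"region": ..., "country": ...}.
    keep: optional filter on the candidate attributes.
    Returns (asn, bias_decrease) pairs, largest decrease first.
    """
    base = tv_distance(population, current)
    ranked = []
    for asn, info in candidates.items():
        if not keep(info):
            continue
        decrease = base - tv_distance(population, current + [info["region"]])
        ranked.append((asn, decrease))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


population = ["EU"] * 50 + ["AS"] * 50
current = ["EU"] * 4 + ["AS"]
candidates = {
    64496: {"region": "EU", "country": "GR"},
    64497: {"region": "AS", "country": "IN"},
    64498: {"region": "AS", "country": "JP"},
}

ranked = rank_candidates(population, current, candidates)
only_jp = rank_candidates(
    population, current, candidates, keep=lambda info: info["country"] == "JP"
)
print(ranked)
print(only_jp)
```

A negative decrease means that adding the network would make the set more biased, which is why the EU candidate ends up last here.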

Another thing we calculated, and you can find in this online tool, is this: when we deploy extra vantage points we don't have only the bias criterion in mind; we also consider how much they improve our visibility of the Internet, so bias is not our only objective. In this plot, which is in the online tool, we see on the X axis how much bias is decreased by adding an ASN; each dot corresponds to an ASN. Values on the left mean that we decrease bias a lot. On the Y axis is our improvement metric; we calculate an improvement metric of how much closer to the edge of the network ‑‑ of the Internet ‑‑ we will be. So the higher on the Y axis, the better. The best option would be values on the left of the X axis and at the top of the Y axis. We see there are not that many networks there, but there are some, and we can still select vantage points that both reduce bias and, at the same time, improve our view of the Internet, the completeness of what we see.

So, summarising:

What we have done in the AI4NetMon project: first we quantified the bias, and we believe this is a very important finding because in this way we can raise awareness about the issue of bias. There are some expert users who may have thought of bias or were aware of it but may not have thought of all dimensions, and maybe there are some other not‑so‑expert users who may not have considered this.

So, what we suggest is that when you do measurements, always take into account that there is bias. By going to our tools you can check what types of bias you have, and this, hopefully, can help you to better interpret your results and avoid wrong generalisations or pitfalls.

With respect to the second goal, unbiasing the platforms, our analysis has shown that there is a lot of room for improvement. Hopefully, the recommendations we make are a first step towards reducing bias. For example, outside there is a board with the most wanted ASNs for RIPE Atlas; we have already contacted the guys so they can take our findings into account in order to find other most wanted AS numbers that could reduce bias.

And I think that's really promising because some other people have started working on this; it's the work in the MAT Working Group that I mentioned before. Hopefully in the next period some of us, our group or their groups, will have some solid results about how to decrease bias in our measurements, in our infrastructure, and that would be good for the community.

And with this, I would like to thank you and ask if you have any feedback, positive or negative, or any inquiries or suggestions, etc. Contact us or ask a question here now. Thank you.

(Applause)

FERNANDO GARCIA: Is there any question for Pavlos? Please state your name and affiliation.

GEOFF HUSTON: APNIC. I am still left with a number of kind of dangling questions. I don't think I understand what you believe is not biased, and there are so many ways in the Internet to think about what is unbiased. If I'm talking about averages, then the average user is probably Chinese or maybe Indian, and if I take those two together that's 40% of the user population, and it has no Atlas probes, so Atlas is beyond all hope of getting an unbiased view if that's your target. Or are you trying to say that every single one of those 70,000 ASes, 75,000, is equal to every other? Again, that's not really true, because 65,000 are stubs with one or two people in them and the others are large. So what do you define as not biased? And why? The other thing, too: 90% of all users use these odd little things called mobile phones, and as far as I am aware there's no Atlas probe near them, so again, is Atlas hopelessly biased because it has nothing to do with average user behaviour? I am just left dangling behind this: what's your zero point, what's unbiased for you?

PAVLOS SERMPEZIS: That's a really nice question, and hopefully I can give you more insight into this. Starting from your second question, yeah, RIPE Atlas is hopelessly ‑‑ it cannot be representative in terms of network type because we always have this thing that it cannot cover mobile phones. So RIPE Atlas will always have a bias in this dimension of network type, but maybe in other dimensions it doesn't. Now let's get back to what is representative ‑‑ what is unbiased. Unbiased is to be representative. Indeed the average, let's say, Internet user is in China, but the good thing with a measurement infrastructure is that it doesn't have only one vantage point, so we don't talk only about the average guy. If we have ten networks in China and one in Serbia ‑‑ if we have 100 networks in China and 10 in Serbia and we have eleven probes, for an unbiased result we should put 10 and 1, so it's representative.
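The 100-to-10 example can be written down as a proportional allocation. The sketch below splits a number of probes across groups in proportion to their network counts, using the largest-remainder method to handle fractions; the method choice and the function name are assumptions made for illustration, not something stated in the talk.

```python
def proportional_allocation(networks_per_group, n_probes):
    """Split n_probes across groups in proportion to their network counts.

    Uses integer arithmetic and the largest-remainder method, so the
    allocated probes always add up to n_probes.
    """
    total = sum(networks_per_group.values())
    alloc = {g: n_probes * n // total for g, n in networks_per_group.items()}
    remainders = {g: n_probes * n % total for g, n in networks_per_group.items()}
    leftover = n_probes - sum(alloc.values())
    # Hand the remaining probes to the groups with the largest remainders.
    for group in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        alloc[group] += 1
    return alloc


print(proportional_allocation({"China": 100, "Serbia": 10}, 11))
# {'China': 10, 'Serbia': 1}
```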

GEOFF HUSTON: Why don't you just weight the results by the degree that you need extra weight on some measurements? So if I have only got one probe in China and 500 in the US and I want to talk about the average across the two, why don't I either divide the US numbers by 500 or multiply the one Chinese number by 500 to get them ranked up? Why do I need more probes, why don't I just weight the numbers?

PAVLOS SERMPEZIS: You can do ‑‑ you can weight the numbers. Our framework is quite generic; what we did is a first approach. In the way we calculate the bias score you can do a lot of things, we can parameterise it, and one thing we don't know and need to know is how exactly we should parameterise it in order to get optimal results. Maybe the parameterisation and optimality are different for different use cases, so you may need to do a different parameterisation. For some things you have to do something different. But yeah, this is more complex and it's one of the research directions we are following, because this is only the first step.
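The weighting alternative raised in the question can be sketched as a post-stratified average: average the measurements within each group first, then combine the group averages using each group's share of the target population instead of its share of probes. The group labels, latency values and the 50/50 shares below are invented for illustration.

```python
def naive_average(measurements):
    """Plain mean over all probes, dominated by the best-covered group."""
    values = [v for group_values in measurements.values() for v in group_values]
    return sum(values) / len(values)


def weighted_average(measurements, population_share):
    """Mean of per-group means, weighted by each group's population share.

    measurements: group -> list of measured values (at least one per group).
    population_share: group -> share of the target population (sums to 1).
    """
    return sum(
        population_share[group] * (sum(values) / len(values))
        for group, values in measurements.items()
    )


# Hypothetical latencies in ms: one probe in one country, 500 in another.
measurements = {"CN": [200.0], "US": [40.0] * 500}
population_share = {"CN": 0.5, "US": 0.5}

print(round(naive_average(measurements), 2))
print(weighted_average(measurements, population_share))
```

The weighted figure corrects the imbalance in probe counts, but it rests entirely on the single measurement from the thinly covered group, which is one reason more vantage points can still matter.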

GEOFF HUSTON: The takeaway I get, is that unbiased actually depends on what you are trying to measure, it's not an abstract fact independent of what you are trying to measure.

PAVLOS SERMPEZIS: Exactly. That's why in our online tools we try to do as much parameterisation as possible, so you can go and play with the results depending on the use case, because there are hundreds of use cases and we cannot cover all of them.

AUDIENCE SPEAKER: It was natural that Geoff would partially cover my question. My question has to do with the bias metric between 0 and 1 that you use. Can you share, or do you have documented anywhere, the exact algorithm of how you calculate it?

PAVLOS SERMPEZIS: Yeah, yeah, it's in our read me docs, so the statisticians will understand it's a divergence; all the other guys can think of it as a value between 0 and 1.

AUDIENCE SPEAKER: Developer on the Atlas team. I am wondering if you have a minimum and maximum number of probes per AS: is one probe enough and, the other way round, is 100 probes too many? What is the range in order of representation for probes?

PAVLOS SERMPEZIS: We didn't look at this, so we considered that if an AS has at least one probe, it's covered. For some cases this can be wrong, but in our initial investigation we wanted to consider as many dimensions as possible, and the data we have available for these dimensions doesn't have finer granularity on how ASNs are split within their internal topology. That's why we started at the AS granularity, so we considered that each network having a probe is covered.

AUDIENCE SPEAKER: Thank you.

FERNANDO GARCIA: Okay. Thank you, there are no more questions.

(Applause)


Next presentation Konstantinos Zorbadelos

KONSTANTINOS ZORBADELOS: Hello, everyone. It's always a pleasure to be here. I am Konstantinos Zorbadelos, working as a lead network architect at CANAL+ Telecom, and this is a presentation about a practical network automation tool. So from academia, the academic presentation of Pavlos, we will go to something very practical; hopefully this will be interesting to you.

So, more or less the outline of the presentation: first of all, of course, we need to have a problem that we are trying to solve. Then we will describe, according to the problem, what our design goals were for the tool. Then we will give a presentation of, and actual pointers to, the tool, which is Open Source. And then we will switch the discussion a bit to some operational considerations: what issues actually arise, or not, by using such an automated way of traffic engineering.

So, what is the problem here? Suppose you have an IP network with multiple points of presence, geographically dispersed, many POPs, transit providers, peers, internet exchanges, private interconnections, whatever, of course you also have in these cases varying costs in transit capacity, submarine capacity might also be involved and as we all know in traditional IP networks, it is actually inbound traffic that is dominant in the network. So, we need to have a way to optimise the incoming traffic streams and distribute them among available capacity. That's number one, optimisation in economics, we would say.

But on the other hand, we also have emergency situations, situations when we need to divert traffic. This is first of all a security incident, a DDoS attack, where we need to act immediately, or we could have a failure. Before my current role I had very little to no experience with submarine cables; I thought they were pretty stable. Oh boy, do they break. That's a thing.

So, this is more or less what the CANAL+ Telecom network looks like. We are a small company, but we serve the French overseas territories, and as such, the network is, I would say, global. From a small island in the Indian Ocean to the French islands, we need to cover all of that, so we have POPs in various locations, plus a lot of PNIs and transits.

We have done our best to document our network in PeeringDB, of course, as good citizens, so you can see we have various transit providers in various locations, plus a presence in various internet exchanges to accommodate that. So for a very small company, I consider it a very interesting network, in general.

So, what were the design goals of introducing traffic engineering for our inbound traffic in an automated way?

So the main point is manual configuration on the routers is very cumbersome, especially if you manually configure the network, you will definitely have inconsistency in configuration here and there, so the whole operation generally is error‑prone. On the other hand, there might be a lot of routers that are involved in an incident; you might need to do an action to many routers at a time so realtime reaction, let's say, is not possible if you do it manually.

So, we would need a tool where the configuration could be performed by network operators, but in an easy way, or which could also offer an API or an automated way for a program to do the traffic engineering without human intervention. Ideally the tool should be network vendor neutral because in many networks there are a lot of vendors involved. So, we need to do the traffic engineering quickly in general, do it reliably, do it without any errors, optimise our economics and provide some automation mechanism to react in real time in extreme cases, either security incidents or failures in cables or whatever else.

So, here is what the automated tool does and what it looks like. Of course, we are talking about BGP here because BGP is the only exterior gateway protocol out there. It is already used extensively for traffic engineering using various tricks, some would call them hacks, or whatever, and the tool's purpose is to automate the BGP announcements of a network to various peers, and this affects how traffic flows inbound.

So, our solution, the design:

There is a central configuration point. There are some sources of truth, the sources of truth are actually databases, it's a very nice fancy term to use instead of databases. So they represent the intended state of the network, how our peerings and our announcements look like, and out of these sources of truth, the tool generates a standardised BGP policy configuration and we are talking especially about the outbound policy in this context here.

The idea is that we tag the prefixes we announce with BGP communities and this tagging affects the policy, so the only thing an operator needs to do is to think about what kind of tags to put on a prefix, tag it accordingly and enforce the state.

And hopefully the tool gives all the necessary flexibility to do the traffic engineering tricks operators do.

So, we tried to use standards as much as possible in our design. In terms of community tagging, we utilised BGP large communities. Large communities are rather recent, we would say, within quotes, but, okay, it's a 2017 RFC, and they give you the functionality to express policies with 32‑bit ASNs. A large community is actually a set of three 4‑byte integers that looks like the community I have on the slide. It overcomes the policy limitations that you would otherwise have with 32‑bit AS numbers. And a very nice thing: together with RFC 8092, which defines the large community attribute, there's a companion RFC that gives some very nice examples of how you could utilise large communities to design policies in your network. So we actually have an IETF‑blessed way to create policies, and we utilised that in the design of our tool.

So, according to this informational RFC, the second number inside the BGP large community is actually a function identifier, and generally speaking communities are split into two different categories. You have informational communities, where, for example, you tag a route depending on its type, whether it is a transit route, a peering route, whatever, where you learned it from, the geolocation you learned it from; all of these are tags that give us some information about the route. And you can also have action communities; by defining, for example, this community here with 40 as a function identifier, you can express the policy of "do not announce a route to a specific peer".

So, the supported traffic engineering actions in the tool are actions on an individual peer or on a geographic location. You can choose to announce or not announce a prefix to a peer, or prepend it 1, 2, N times, or do the same thing for an entire location: choose to not announce a prefix in a location, or announce it, prepend it, whatever, all the tricks that operators do.
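To illustrate how action communities of the form ASN:function:parameter could drive a per-peer decision, here is a minimal sketch. Function identifier 40 ("do not announce to a specific peer") comes from the talk; the prepend identifier 41, the helper names and the example ASNs (from the documentation range) are hypothetical and do not describe the tool's actual numbering or code.

```python
from typing import NamedTuple


class LargeCommunity(NamedTuple):
    global_admin: int  # the operator's own ASN
    function: int      # function identifier
    parameter: int     # here: the peer ASN the action applies to


def parse_large_community(text):
    """Parse 'A:B:C' into three unsigned 32-bit integers (RFC 8092 layout)."""
    parts = text.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected three fields in {text!r}")
    values = [int(part) for part in parts]
    if any(not 0 <= value <= 0xFFFFFFFF for value in values):
        raise ValueError(f"field out of 32-bit range in {text!r}")
    return LargeCommunity(*values)


NO_ANNOUNCE_TO_PEER = 40   # function identifier mentioned in the talk
PREPEND_ONCE_TO_PEER = 41  # hypothetical identifier, for illustration only


def action_for_peer(communities, own_asn, peer_asn):
    """Return ('deny', 0) or ('announce', number_of_prepends) for one peer."""
    prepends = 0
    for community in map(parse_large_community, communities):
        if community.global_admin != own_asn or community.parameter != peer_asn:
            continue  # tag belongs to another operator or targets another peer
        if community.function == NO_ANNOUNCE_TO_PEER:
            return ("deny", 0)
        if community.function == PREPEND_ONCE_TO_PEER:
            prepends = max(prepends, 1)
    return ("announce", prepends)


tags = ["64496:40:64500", "64496:41:64501"]
for peer in (64500, 64501, 64502):
    print(peer, action_for_peer(tags, own_asn=64496, peer_asn=peer))
```

A location-wide action would follow the same pattern, with the third field carrying a location code instead of a peer ASN.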

What do we use as sources of truth in the automation tool?

Very well known Open Source tools. We use NetBox as our IPAM system, so this actually holds our prefixes, and in there we have our BGP announcements, so all the announcements plus their tagging with large communities are in there. Then we utilise Peering Manager to document all our peerings, all our eBGP peerings with third parties, whether PNIs, internet exchanges, transits, whatever.

Now, as I mentioned, the purpose of the tool is to get the intended state out of the sources of truth and generate the necessary configuration for the routers, so configuration regarding the BGP announcements and the outbound BGP policies, and then enforce it on the routers. We have, again, a bunch of very nice Open Source tools in this area, and we chose what we think is a combination that is used by a lot of other players as well. The whole thing is based around Python, it's Python‑centric; we use NAPALM and, as the main configuration management engine, Salt, which provides a lot of very interesting features. The development was actually based on these Open Source tools.

There is a GitHub repository; I spent a lot of time trying to document what is needed, so it is published on GitHub. This repository actually provides a network simulation, and we are talking here about an actual vendor IP network, so it is not, let's say, some sort of academic tool. We tried to simulate a production IP network as close to reality as possible. We used Docker containers and Docker Compose for that, and we utilised GoBGP in our topology to be able to test the tool. The current implementation supports only Juniper routers, but hopefully the ideas of the tool are general enough that they can be ported to other vendors as well. And if you can contribute and provide extra vendor support, that would be highly welcome, of course.

Now, this is what the tool looks like from the users' perspective, let's say. We document all our peerings in Peering Manager, in a web interface. We go to the IPAM system and tag the specific announcement we are interested in with specific large communities, and then with one command we apply the state to the network and magic happens, hopefully without errors.

How did the production network rollout go? Because we are currently using this tool in our production network.

There was a gradual deployment inside our AS, so we did it router by router and location by location, which was nice, which was very controlled in general. Up until now the deployment has generally been smooth, and the tool currently handles our 260 peerings in six diverse geolocations and the six internet exchanges where we peer. Hopefully it can scale to much bigger numbers; generally speaking, we consider our network rather small to average compared to the big networks out there.

The experience:

So, the implementation, I wouldn't say it had great difficulties. Most of the time was spent in design: thinking about what the needs are, trying to accommodate all possible cases in the design, being flexible enough and stuff like that. So 90% of the time was actually spent thinking about low‑level details, plus preparing the network in order to introduce the tool. Of course, operations now require a paradigm shift from us, as in all automation cases.

Now, having an automated tool is a very nice thing in general, but there is also the other side of the coin; there are some considerations in operating your BGP announcements in an automated way. So, what are these considerations?

In general, I think all or most of the participants in here are well‑mannered citizens of the Internet. They appreciate security, they appreciate the clear documentation of route policies and contacts, and all of that should be documented in IRR databases in general. But what do we do about on‑demand announcements? If you need to react to an event and, out of your entire address space, you need to take an action on a very specific portion of it, what do you do? Do you pre‑provision everything in the IRR, all possible route and route6 objects? This is not practical, most of the time.

And of course, if you try to update the IRR at the time you need it, it's too late; you are not agile enough to be able to respond to incidents. On top of that, we have the next generation IRR, we have the RPKI with its ROAs, which is another source, of course, of validation.

Now, there is a very nice attribute, or not so nice, as we will see, inside ROAs: the max length attribute. It can give us the needed flexibility for traffic engineering, but as with all things in life, nothing comes for free. There are security implications with using too‑flexible ROAs, let's say: you are susceptible to what is called a forged‑origin prefix or sub‑prefix hijack. There is a very nice RFC that got its number, if I am not mistaken, within this week, so I had to adapt the presentation a bit; it was a draft at the point where I was writing the presentation, but now it has its number and it is a best current practice document. It gives a discussion and an overview of the implications of using the max length attribute in ROAs. Of course, these kinds of forged‑origin prefix or sub‑prefix hijacks cause greater harm for non‑announced address space, and the general idea and the recommendation out of this is that you should use minimal ROAs, and your ROAs should reflect the current operational state of your BGP announcements. Very nice, in general.
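The max length trade-off can be made concrete with a small route origin validation sketch in the style of RFC 6811: a route is valid if some covering ROA matches its origin AS and its prefix length does not exceed the ROA's max length, invalid if it is covered but nothing matches, and not-found otherwise. The prefixes and ASNs below come from the documentation ranges and the function name is hypothetical.

```python
import ipaddress


def rov_state(prefix, origin_asn, roas):
    """Classify a route as 'valid', 'invalid' or 'not-found'.

    roas: iterable of (roa_prefix, max_length, asn) tuples.
    """
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue  # this ROA does not cover the route
        covered = True
        if asn == origin_asn and route.prefixlen <= max_length:
            return "valid"
    return "invalid" if covered else "not-found"


# The operator (AS 64496) only announces the /32 in BGP.
loose_roa = [("2001:db8::/32", 48, 64496)]    # flexible: allows up to /48
minimal_roa = [("2001:db8::/32", 32, 64496)]  # matches what is announced

# An attacker announces an unused /48 and forges 64496 as the origin.
hijack = ("2001:db8:1::/48", 64496)

print(rov_state(*hijack, loose_roa))
print(rov_state(*hijack, minimal_roa))
print(rov_state("2001:db8::/32", 64511, minimal_roa))
print(rov_state("2001:db9::/32", 64496, minimal_roa))
```

With the flexible ROA the forged-origin sub-prefix passes validation and, being more specific, attracts the traffic; with the minimal ROA it is rejected, at the cost of having to update the ROA before announcing a more specific prefix on demand.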

What are our experiences with transit providers since we use many in various different locations?

Now, of course, all respected transit providers care about security, but they also have varying degrees of flexibility. There are providers where you need complete manual communication in each and every case where you need to do a traffic engineering action, all the way to others that give you the ability to announce more specifics or do it on demand, whatever. And there are others in between, which generate their filters automatically but not in real time, let's say; it is an action that they do a few times per day or whatever.

From our side as customers, of course we require the greatest possible flexibility. We want to have the ability and the control to do whatever we want with our announcements. So we need to find a way in between.

Now, whatever automation mechanism you use, of course a very big thing is that you need to monitor your BGP announcements: you need to monitor what your announcements look like from the outside, from the Internet, and ideally, also, how your announcements are configured on your routers. For the first case there are various free services out there; we all know and love RIS Live, let's say. For number 2, the best option is the BMP protocol, and especially its variation for Adj‑RIB‑Out; this protocol is very nicely supported in at least one Open Source tool that I know of. Theoretically, if you monitor your announcements from the outside and as configured on your routers, you should be able to notice discrepancies, create alerts and all that stuff. For us it is currently a work in progress; we work together with the very nice people behind ARTEMIS, they have a group called Code BGP, but there are also very nice Open Source solutions out there: there is BGPalerter, a tool that can provide monitoring and alerting in real time.
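The comparison between what the routers are configured to announce and what is seen from the outside boils down to a set difference. The sketch below assumes both views have already been collected into sets of (prefix, peer ASN) pairs, for instance from a BMP Adj-RIB-Out feed and from an external monitoring service; the collection step, the function name and the sample data are hypothetical.

```python
def announcement_diff(intended, observed):
    """Compare two sets of (prefix, peer_asn) pairs.

    Returns the announcements that should be visible but are not,
    and the ones that are visible but should not be.
    """
    return {
        "missing": sorted(intended - observed),
        "unexpected": sorted(observed - intended),
    }


intended = {("2001:db8::/32", 64500), ("2001:db8::/32", 64501)}
observed = {("2001:db8::/32", 64500), ("2001:db8:1::/48", 64501)}

report = announcement_diff(intended, observed)
if report["missing"] or report["unexpected"]:
    print("ALERT", report)
```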

So, this is all currently work in progress for us.

Now, which brings me to my last slide and the discussion points.

Okay. We know the security implications of RPKI ROA max length, but what do we do if we actually need flexibility and operational agility? It is actually the hijack attack, which, by the way, RPKI was not designed to address specifically. Is it a blocking point? Should we just create very, very strict ROAs and lose all the agility? Should we have very flexible ROAs and be agile? Personally I think the solution is somewhere in between, as always, except for security incidents, because in security incidents you never know where you will be attacked. So there, I think, there is a point for discussion, to say the least.

So, I don't know of any best practices out there for operators and transit providers, specifically on how to address the dynamic and on‑demand announcement issue, but, of course, we are here to discuss. Hopefully this is an interesting point for discussion, and something good will come out of it, a best current practice document or whatever. With this, I managed to do it and leave three minutes for questions. I am really sorry, thank you very much.

(Applause)


BRIAN NISBET: Cool. Thank you very much. Apparently someone is running to a mic to ask a question.

AUDIENCE SPEAKER: Very, very interesting, thank you for sharing. Sorry for my voice, it's from yesterday. Your solution is heavily based on BGP communities, and mostly on large communities, which is nice. I would like to ask you about your experience after rolling out this solution, because from my experience it looks like not so many people always honour the communities, so you have expectations but sometimes this doesn't work. So, have you rolled it out? Have you tested it in real life? Have you seen people honouring the communities and doing what they promised?

KONSTANTINOS ZORBADELOS: Thank you first of all for being here after a rough night. So, the thing is that this affects your outbound announcements, so all you need is support for large communities in your own network, not in other third parties. And to be honest, the approach we currently use is that these communities have semantics inside our AS, and we delete them and clean them from our announcements. So we just enforce the policy internally: the communities help the tool to do the right thing in the announcements, but on the outside these communities do not appear. This is something that is internal to our AS.

AUDIENCE SPEAKER: Okay. So they don't go to the transit provider ‑‑ they ‑‑

KONSTANTINOS ZORBADELOS: They don't go there.

AUDIENCE SPEAKER: So it's a signal inside the AS: you stop the announcement and you do more specific ‑‑

KONSTANTINOS ZORBADELOS: Correct. You can prepend or not announce and prepend either to one peer or to an entire location, etc., etc. For example, if I want let's say on demand to stop my announcements in a specific location, to the US, for example, with just one tagging and one CLI command, I can enforce that.
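The mechanism discussed in this exchange, large communities that act as internal signals and are stripped before the route leaves the AS, might be sketched in Python as below. Everything specific here is an assumption for illustration: the AS number, the function codes (1 for "do not announce", 2 for "prepend once"), the numeric location codes and the function name are hypothetical and are not the speaker's actual scheme.

```python
LOCAL_ASN = 64500   # hypothetical AS number (documentation range)
NO_ANNOUNCE = 1     # hypothetical function code: do not announce
PREPEND = 2         # hypothetical function code: prepend once


def export_route(as_path, communities, peer_location):
    """Apply internal action communities for one peer, then strip them.

    as_path       -- list of ASNs as it would be exported, local ASN first
    communities   -- list of large communities as (asn, function, target)
    peer_location -- hypothetical numeric code of the peer's location
    Returns None if the route must not be announced, otherwise a tuple
    (as_path, communities) ready for export.
    """
    path = list(as_path)
    for asn, function, target in communities:
        # Only our own communities aimed at this location carry actions.
        if asn != LOCAL_ASN or target != peer_location:
            continue
        if function == NO_ANNOUNCE:
            return None
        if function == PREPEND:
            path.insert(0, LOCAL_ASN)
    # The action communities are internal semantics: remove them on export.
    cleaned = [c for c in communities if c[0] != LOCAL_ASN]
    return path, cleaned


# Tag a prefix so that it is withdrawn from location 840 only.
tagged = [(LOCAL_ASN, NO_ANNOUNCE, 840)]
print(export_route([LOCAL_ASN], tagged, 840))
print(export_route([LOCAL_ASN], tagged, 276))
```

Because the policy is evaluated per peer at export time, one tag on the prefix is enough to change the announcement towards a whole location, which matches the "one tagging and one CLI command" description.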

AUDIENCE SPEAKER: Alexander, thank you for this fascinating talk, and I just have a comment on your question ‑‑ the question about max length. So, it's a BCP, it's just a suggestion, and of course you have a trade‑off between prefixes that have never been seen and flexibility, and it's up to you what you choose. So you have chosen to go for the flexibility, as far as I understand. For my network it's also true, and it's fine.

KONSTANTINOS ZORBADELOS: Thank you so much, yes. I think, just to complement this a bit, currently my ROAs are too flexible and I think there is some room for improvement there. But as you say, I totally agree, it's a trade‑off, yes.

AUDIENCE SPEAKER: University of Twente. One question: okay, you can use communities and prepends and everything to control where you are actually receiving the prefixes, so you have a lot of different possibilities in case you lose a link, for example. Do you have some way to forecast the impact of one policy or another?

KONSTANTINOS ZORBADELOS: A very nice question, thank you. I think this has to do with your monitoring plus capacity planning tools, whatever you have, so I guess this is a discussion for NetFlow, a lot of NetFlow, and whether you can have simulation tools to show you how traffic will flow differently. Ideally, you could have such tools and have pre‑calculated the scenarios in advance for various failures, and then be ready at the point of failure, let's say, to apply the correct policy. Very interesting.
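A very rough Python sketch of what "pre‑calculating the scenarios" could look like for single‑link failures is given below. It is deliberately simplistic and rests on assumptions: the link names, capacities and loads are invented, the function name is hypothetical, and it only checks whether the total offered load still fits on the surviving links, ignoring how BGP policy would actually distribute the traffic.

```python
def precompute_failures(links):
    """Check every single-link failure against the remaining capacity.

    links -- dict mapping link name to (capacity_gbps, load_gbps);
             the values would normally come from flow/capacity data.
    Returns a dict mapping the failed link to a small report.
    """
    total_load = sum(load for _, load in links.values())
    report = {}
    for failed in links:
        remaining = sum(
            capacity
            for name, (capacity, _) in links.items()
            if name != failed
        )
        report[failed] = {
            "remaining_capacity": remaining,
            "fits": total_load <= remaining,
        }
    return report


# Hypothetical two-cable network.
links = {"cable-a": (100, 60), "cable-b": (40, 30)}
for failed, result in precompute_failures(links).items():
    print(failed, result)
```

In this invented example, losing the larger cable leaves less capacity than the offered load, so that scenario is the one that would need a prepared policy in advance.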

AUDIENCE SPEAKER: The question is ‑‑

KONSTANTINOS ZORBADELOS: If we have ‑‑

AUDIENCE SPEAKER: Yes.

KONSTANTINOS ZORBADELOS: No, we don't, but generally speaking, our network, I would say, is rather simple. The capacities in the areas we serve are very scarce, so we don't have a lot of options. If a cable breaks here, there are only one or two other alternatives, nothing else. So, we just try and see on demand, but from previous failures we know more or less what we can do and where the limit is of what we can do with our capacities.

AUDIENCE SPEAKER: Thank you.

RUDIGER VOLK: Great stuff. I'm kind of regretting that we didn't cooperate in our previous affiliations. Well, for the max length stuff, I would like to say: oh, I'm really happy to see someone finding good use of it. And for the dangers, actually, one could kind of think of any ROA that is in the RPKI system and is not backed up by an actual route as an invitation for someone to attack, whether that is because of many routes, or many prefixes generated via max length, or it is some carelessness offering something else. Now, the max length thing, okay, may be a little bit more interesting. Next topic, the communities: I remember the times when people were bothering me because my network did not provide external communities, quite obviously, and, in particular, with the large communities available, there is plenty of room for doing the information and action communities, also for outside. An interesting question there is how to communicate your signalling system, and I guess we will discuss what has been done on that off‑line.

KONSTANTINOS ZORBADELOS: Yes.

RUDIGER VOLK: And well, okay, I stop with that. Thanks.

KONSTANTINOS ZORBADELOS: Thank you for the observations. This kind of communities, of course we use them, as I said, internally in our network, and our customers can also use them. The best way to communicate them is to properly document them somewhere in a public location.

BRIAN NISBET: Okay. And on that note, thank you very much.

(Applause)


So, our next ‑ first lightning talk of this morning is on ‑‑ it's the Internet in Ukraine, 200 days of resistance. Unfortunately, the planned speaker, Svitlana Tkachenko, cannot join us this morning, but wonderfully, Oksana has agreed to speak on her behalf. So thank you very much.


SPEAKER: I am speaking on behalf of Svitlana Tkachenko, who has problems with her health. She made an excellent presentation and I am sure that you have to see this presentation. I am not a technical expert, I can't answer technical questions, but I will do my best to give you an impression of the situation in Ukraine.

First of all, today is the 247th day of the full scale aggression against Ukraine and it's year ‑‑ in its year of war.

On February 24th, we awoke from bombing. It was unbelievable, but it happened, and since then we have had a huge experience, very unexpected for us.

Svitlana Tkachenko works at the Hostmaster company, this is the registry of our .ua ccTLD. I can't comment a lot on all this information, but I have to say that on December 2nd we will celebrate the 30th anniversary of .ua, and usually Hostmaster organises a very interesting conference with the most progressive information and messages and all developments.

What happened with .ua after the beginning of the full scale aggression: first of all they did their best to secure their people. As far as I know they have evacuated their staff to the western part of Ukraine, they secured all information, all equipment, and they met a lot of financial problems but, as far as I know, they managed them rather well.

These are pictures of our Internet infrastructure. I have to say that in the Kyiv area all Internet and mobile equipment was just destroyed. For example, a sister of my very close friend lives in Bucha and did not have any communication tools for 35 days. She just wrote paper notes and gave them to everyone who managed to evacuate, and some of them called the phone numbers she wrote and said that she is alive.

I don't know what to say because I don't understand what it means. It will be in the e‑mail, and maybe you can send your questions and she will answer them.

Before ‑‑ from the beginning of the full scale aggression, we met problems with Internet connection and mobile connection, especially on de‑occupied territories, and a huge help was Starlink: we received 25,000 terminals and this was the best way to immediately restore connection in the de‑occupied territories. Maybe you know that Elon Musk even cancelled the fee for Ukrainian representatives; then there was another development of the situation, but now we, again, do not have to pay, it seems to me for three months or something like that, for using Starlink.

After October 10 we met huge problems with connection on the whole Ukrainian territory, because nearly 40% of our energy infrastructure was destroyed on October 10th, and then on October 22 and on other days. So the damage to our energy infrastructure is huge, and that is why our operators can't deliver connection.

Now, this is the picture of our place. Usually we have a lighting, how to say, spot which is seen from a lot of kilometres from Kiev, but now this is the picture after our blackout. The only advantage just now is diesel generators, and representatives of our Ukrainian operators asked everyone to help with these generators.

What I would like to say: it's extremely important for us users to see how Ukrainian telecom reacted to the full scale aggression. For example, we have three mobile operators which in peaceful times fiercely competed with each other, but in March they launched nationwide roaming, and each user can call without any additional fee via other operators and use Internet services from other operators also.

Ukraine received a huge amount of help and support from the international community. I know in person about the Global NOG Alliance, and I want to thank them: thousands of thanks to all of you and to them. And I know about smaller initiatives which are not so popularised, but I know in person about initiatives which collect money and equipment from abroad and send them to each operator, each, how to say, military division and so on. Thank you very, very much.

Keep Ukraine connected, it's a very well known initiative and, again, thank you very, very much.

And please, diesel generators, and maybe some final slides. I would like to comment: be part of the global Internet community. Svitlana Tkachenko and I are fellows, and we highly appreciate RIPE, not only for financial support, for building very nice ‑ for us, because all airports in Ukraine are destroyed, we need trains or buses to go from Ukraine, but even if this is fantastic, we spent five hours on Ukraine ‑ on the shelling, and it's not easy for us to come and deliver to you all the information regarding the situation in Ukraine.

This is the e‑mail of Svitlana; if you have technical questions, write to her, and I will be happy to answer non‑technical questions.

(Applause)


BRIAN NISBET: So do we have any not technical questions? We have brief time. No, I think, thank you, again very much for that.

(Applause)


Our second and last lightning talk of the session, Daniel Wagner, telling us about the wonders of actually doing things with P4, I think is ‑‑ the exertions, indeed, thank you.

DANIEL WAGNER: Good morning. Nice to see you and so many smiley faces that early on a Friday. I am a researcher at DE‑CIX and today I am going to talk about a two‑year journey of setting up bleeding edge technology at an IXP.

First of all, what is this experience talk aimed to do? It is here to tell you what it took to set up the hardware and software stack of a P4 switch. If you wondered what this looks like, this might be very informative for you, especially with regard to the process itself, because there is much trouble you might avoid when you follow carefully. I will not explain what P4 is, I will not show any code, I won't evaluate anything just to show off the performance of it or what you could do with it; it's all up to you to look up the source code yourself, our evaluations, it's all there. I do not aim to bash the hardware vendor, we had trouble with our P4 language itself.

Maybe a short motivation: why are we looking into it and what do we do with it? Basically P4 enables data plane programmability, with lots of advantages, but it still comes with some constraints. You can build your custom protocols and solutions; especially for us at the IXP this might make great sense. You name it, do everything you want with it, basically.

So what do we need to get this started:

I will show you the whole process with this lovely timeline, just to keep basic track of the timing and how long it took things to fall into place. So, somewhere in February 2020 we started with purchasing these switches and they got delivered a couple of weeks later. Coronavirus kicked in and it was hard to get physical access to the machines, and it took some months to get the first hands‑on with them. We have got those switches, and it's not like you put power in on the one hand and packets or transceivers in on the other; that's not the case. You fire up the switch and nothing is showing up, and so we were pretty much left without a clue how to use these devices, but luckily there's this help desk to help us out.

First ticket, here we go: hey there, I am looking for some documentation on how to set up P4, can you please help us? And the help desk said: sure, we have got a couple of scripts for you, here is the link to the knowledge base, you are welcome to look this up, and you need to get the Barefoot software development environment kit from Barefoot, from another vendor. So they do the P4 things themselves, and the hardware vendor we were talking to designed the package for the whole switch where it is living in.

So, I did all of that, and by running the Python script I found out on the switch that there is no Python installed in Open Network Linux. ONL is shipped by the hardware vendor itself, and I thought this should have a reason: they ship the switch with a particular open operating system, but the scripts they designed for it do not run on it. So I said: I can't run your scripts, what should I do? They said: if you don't have any special reason to use Open Network Linux, use Ubuntu. Just Ubuntu, and no further information on that.

With the timeline: I got in July all of these software pieces that I allegedly need. In August I went ahead and tried to change the operating system that was installed on the switch, which was not too easy: it's not easy to install another operating system on a switch using a console interface, because they don't have any, I don't know, VGA port to look at what is actually happening. Before I did that, I tried to figure out which Ubuntu I have to use, because the help desk wasn't too informative about that, so I was looking for a compatibility list and there was none, and I took an Ubuntu image I had. While doing the set‑up of this I found errors in the knowledge base and reported them back to the help desk, and these fixes that I provided were silently integrated into the knowledge base without giving me any credit for that, but here we go.

Then the script could run, great. Do I now have P4? Well, I don't know, I don't know how to test this, so another support ticket. Hi support, I found out there is a mismatch between what your software says and what I have physically inserted into the switch. Maybe do a loop and see what happens. And apparently this software is reporting something completely different from what I have set up, so I said: there is a mismatch in your software between what I have done and what the software reports was done. They said: okay, ports 1 to 16 are always in the ready state because of the gearbox, that's good to know, and you might install the switch abstraction layer. Okay, so I installed another piece of software and I found out that helped nothing. And the support said: can I maybe have access, I want to see what you are doing there? So here is the timeline again.

I got access for the help desk on to our switches, and after that the company was bought by some other company. So the hardware vendor went into mergers and acquisitions, and the ticket was never closed. Okay, it happens. But we have more things to help us out, because we have friends who have the very same switches from the same hardware vendor. How did you do this? We have a problem, it looks like that, and we said: yeah, that's exactly what we have too. Great. So any progress? No, they don't have it either. Interesting. The days passed.

In March I said I don't want to wait anymore, let's fire up our networking department and ask these guys: can you please help us out with our particular switches, we have got problems. A colleague of mine was so kind and lent me a light metering device you can insert into the transceiver to see what is actually happening, and apparently all the ports are in reset state. Why is that? They are not supposed to be in reset state; the transceivers reported very high light levels. So, I took the documentation, which is quite nice, with the low level details, and I debugged the registers holding the actual information about how the switch should make use of all of these ports. I found in the documentation that there is a register which is reset negative, so this is an active‑low register, meaning if there is a logical 0 in it the reset state is enabled. I thought: okay, I have got the address and my SPI driver, let's see if the ports are freed from the reset state when reading from it. That didn't happen. Okay, maybe I was too naive on that.
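The active‑low reset logic described here can be illustrated with a few lines of Python bit manipulation. This is a generic sketch, not the vendor's register layout: the bit position, the function names and the 8‑bit register value are assumptions, and reading or writing the real register would go through whatever SPI driver the platform provides.

```python
RESET_N_BIT = 0  # hypothetical bit position of the active-low reset flag


def in_reset(reg_value, bit=RESET_N_BIT):
    """Active-low: a logical 0 in the bit means the port is held in reset."""
    return ((reg_value >> bit) & 1) == 0


def release_reset(reg_value, bit=RESET_N_BIT):
    """Return the register value with the reset bit set to 1 (not in reset)."""
    return reg_value | (1 << bit)


def assert_reset(reg_value, bit=RESET_N_BIT):
    """Return the register value with the reset bit cleared to 0 (in reset)."""
    return reg_value & ~(1 << bit)


value = 0b00000000                      # as read back: port held in reset
print(in_reset(value))                  # the port is in reset
value = release_reset(value)            # value that would be written back
print(in_reset(value))                  # the port is out of reset
```

The point of the naming is that "reset negative" inverts the usual intuition: writing a 1 takes the port out of reset, and a register full of zeros means everything is held in reset.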

In July, a couple of months later, a new help desk comes to life, so I went ahead and wrote another support ticket. I was very confident in writing this: your support package puts all ports in reset state, so the switch is unusable. I was very confident. Let me check. It took some time. Have you seen our new support package version? I haven't, thanks for that. I tried to install it, all the scripts are running, and apparently this board support package deals with modules that are just incompatible with my kernel. They got more precise, asking: okay, what OS are you using? I have some Ubuntu lying around. Have you seen our OS compatibility list? Oh, okay. So I installed an Ubuntu version which was now officially compatible, 18.04 LTS with kernel version 4.18.

Now we see the stats are fixed, all right, cool. The ports 1 to 16 with the gearbox are still weird; I don't use them, I go with other ones, there are plenty of them. And the loops that I configured came up, so operational and administrative up, which is great. So okay, I have light, it appears to be working, but the links to the servers where I am doing my stuff, injecting packets to see what happens, remain down. The same day I got an e‑mail from our friends saying: have you seen the new BSP package, it appears to be working. I did that, I guess.

As the links to the servers still remained down, I wrote another support ticket. Well, they remain up/down, so operationally down; do you have any idea what transceivers I am supposed to use, maybe? Have you seen our new transceiver compatibility list? Yes, thank you. And then, by themselves, once again, they followed up on their own ticket and said: ah, and you may play with the layer 1 parameters on the switch. Okay, thank you. We got the right transceivers, I played around with the layer 1 parameters and, thank you, that appears to have been working. The link to the server is up/up.

I could go on about learning P4 code and debugging and writing, but that would blow up the scope of this talk. Then in March 2022 we got our proof of concept running with a simple ICMP ping. It took about two years. That's a summary.

Now today we have got this more or less research‑operational, I would consider it to be, with the OS Ubuntu 18.04, exact versions as well for the software development kit, and this switch abstraction layer, I do not use it. Maybe you want to take a photo of that combination; that's how it is deployed today and working. The documentation was evolving while we were struggling and reporting errors in it, and it might be a better option upfront to go for the Wedge switches, which are used by the larger hardware vendors, instead of going for the ones that we had. This is a graphical visualisation of how it looked in 2020: this was quite a mess, we didn't know what to do, so we threw everything in to see what happens. Now things are pretty much more tidied up, so it looks nicer.

Last thing: what we actually do with that is, we prototyped a whole P4 pipeline for IXP solutions, and there is a publication at SIGCOMM in Amsterdam where I have been talking about this. With this information I am happy to conclude my talk, and if you have any questions feel free to stop by and drop a question. Thank you very much.

(Applause)


FERNANDO GARCIA: Thank you. Any questions? No. No questions online also?

BRIAN NISBET: No.

FERNANDO GARCIA: Thank you.

(Applause)


BRIAN NISBET: So that's it for this session, we would remind you all to please rate the talks so we get more information on what you all like, thanks to all our speakers this morning and the GM results will be in this room in 10 minutes or so, and then we have the closing plenary session starting at 11:00 so thank you all very much.