25th October 2022
WOLFGANG TREMMEL: Grab a seat.
MASSIMILIANO STUCCHI: Hello, everyone. It's 2 o'clock, Swiss time, which is when we start, and I live in Zurich, so let's start. I am Max and I am here with Wolfgang, and this is the first session of Tuesday afternoon, a plenary session. Welcome.
WOLFGANG TREMMEL: I would like to say something first, yes. There is a PC election still going on, so if you are interested in joining us at the Programme Committee, just put your name forward, send an e‑mail to pc [at] ripe [dot] net. And also, please rate the talks afterwards.
MASSIMILIANO STUCCHI: If you would like to stand with one of us at the next RIPE meeting in front of everyone else and embarrass yourself, please run for the Programme Committee. Anyway, today we are here for a very interesting session. First of all we have Ivana Golub, who is going to present the work she has been doing at GEANT, all the latest and greatest. So please, thank you.
IVANA GOLUB: It's a pleasure being here with you and having the opportunity to present our work. I work for the Poznan Supercomputing and Networking Centre, and today I will present network technologies and services development in the GEANT project. Let me first set the scene with some information about the GEANT project, and then we will look at the different topics that we are working on from the perspective of their maturity and readiness for use.
GEANT is the association of European national research and education networks, and its vision is to ensure equal access to infrastructure, resources and services for scientists and education and research institutions. It does so in several different ways, one of them being through the GEANT project, which is part of the European Union Horizon 2020 research and innovation programme. This is currently the seventh generation of GEANT projects, called GN4‑3, and it ends at the end of this year, but the next one is scheduled to start in January next year and run for two years.
This is the project structure. It is a relatively big and complex project with nine separate sub‑projects, and the one I will be presenting today is what we call work package 6, the network technologies and services development project, which I am leading together with my colleague Tim, whom some of you might know from his IETF IPv6 work.
In our work package, or in our project, we have more than 30 different topics that are organised in three groups. One is network technology evolution. The second one is network services evolution and development, which Maria Isabel, who is here, is co‑leading. The third is monitoring and management. I will be presenting these topics, and I understand that not all of you come from the research and education environment, so for you this might be of interest either as production services that are ready to use, or as completed work that you might learn some interesting points from even if it is not your first topic to work on, or simply as information.
So, I will be presenting the individual work items from the perspective of their maturity. We currently have seven production services, but I will also present four topics that we are currently working on in research and development, from the perspective of their usefulness for our community, and some work that has been completed so far. More information about any of these things can be found on our wiki, and you will see this URL repeated on several of the slides.
So, when I say production services, what does that actually mean? It means that each of these services is run in an operational environment; it is not just developed and working somewhere, it really works in an operational environment, in somebody's network.
Each has also passed several sets of independent audits from different teams, including code, security and quality audits, an IPR check and GDPR, and there are service definitions in place so it is clear what the service actually is. A cost‑benefit analysis is in place, as well as business development and a roadmap. Here is the list of these production services: perfSONAR, which I will not talk too much about because Andrijana and Katarina will have a presentation about it right after this one, the performance measurement platform, network management as a service, WiFiMon, the service provider architecture, Argus and timemap.
perfSONAR is a well‑established toolkit for monitoring. Development and support are done within a global team, with Indiana University and the University of Michigan from the United States, RNP from Brazil and GEANT, and this map represents the currently known installations, although there are many more.
perfSONAR is also installed on a set of small nodes distributed across Europe, which we call the performance measurement platform. We find this of interest because it allows our users to explore the performance of the GEANT backbone and at the same time experience perfSONAR on small nodes. Of the links at the bottom, the first one will take you to the current status of these measurements, and the second one provides more information.
Network management as a service is a platform for network management which provides a portfolio of network management and monitoring applications and allows you to set up a per‑user, secure network monitoring infrastructure. This can be useful, for example, for projects that have equipment bought specifically for that project, which as such might not fit into any existing monitoring system, primarily for administrative‑domain reasons rather than technical ones. In that case, it might be useful to have your own instance just to monitor that set of equipment.
Here you can see the list of applications that are currently made available through the network management as a service platform, some of them well known, like Zabbix, InfluxDB, etc., but also some from our project, like perfSONAR and SPA.
WiFiMon is a network monitoring and performance verification system that works in two ways: one is when it is installed on hardware probes, and the other is when it is used for crowdsourced measurements. We also have somebody from the team here in the room, so for any additional information you can ask either me or her. Among its features, it supports IPv4 and IPv6, and a WiFiMon analysis server is also available, so you can focus on actual performance monitoring instead of additionally maintaining components, and it is suitable for monitoring, which is very important in our community.
The service provider architecture platform is a platform for digital business and service management. On the right‑hand side you can see a set of components: it has a service catalogue, service and resource inventory and CRM, and it enables order management, service orchestration, service activation, etc. It allows flexible service management and component‑based scalability. Its APIs are compliant with the TM Forum, and it is also compliant with the Open Digital Architecture. The picture on the right‑hand side actually presents the architecture of SPA mapped to the TM Forum Open Digital Architecture, so in case you are familiar with the ODA, this might make the architecture of SPA easier to understand. We have also made two aspects of SPA available: one is the SPA inventory, in case you would like to see how it works and assess whether it might be of use for you.
The other is SPA for the e‑line service, which is in production for GEANT connectivity services, to set up and manage circuits without much administrative and manual work.
Argus is a tool for alarm aggregation and correlation. We found it useful in situations where operational teams have several sets of tools, each with its own alarming system; in order not to be forced to have a screen per tool, Argus allows them to gather all such alarms into one system and then choose whether to present them on a screen or send them to the operational teams via SMS, e‑mail, Slack or a trouble ticketing system. If you are interested to learn more, we will have an Argus info share on 28th November.
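To make the aggregation idea concrete, here is a minimal sketch of an alarm aggregator in Python. It is purely illustrative, not Argus's actual API or data model; all names here are made up:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Alarm:
    source: str      # which monitoring tool raised the alarm
    host: str
    severity: str    # e.g. "warning", "critical"
    message: str

class AlarmAggregator:
    """Gather alarms from several tools into one stream and fan them
    out to chosen notification channels (screen, e-mail, Slack, ...)."""

    def __init__(self) -> None:
        self.alarms: list[Alarm] = []
        self.sinks: list[Callable[[Alarm], None]] = []

    def add_sink(self, sink: Callable[[Alarm], None]) -> None:
        self.sinks.append(sink)

    def ingest(self, alarm: Alarm) -> None:
        # Simple de-duplication: drop alarms we have already seen.
        if alarm in self.alarms:
            return
        self.alarms.append(alarm)
        for sink in self.sinks:
            sink(alarm)

    def by_host(self, host: str) -> list[Alarm]:
        # Correlate: all alarms for one host, whatever tool raised them.
        return [a for a in self.alarms if a.host == host]
```

A sink here is just any callable, which stands in for the real choice between showing the alarm on screen or pushing it to a notification channel.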
Last but not least, the timemap service is a new one, just two weeks old. It allows per‑segment backbone latency and jitter monitoring. It is software that allows you to gather measurements from your core network routers, in this case Juniper, and present them on a weathermap‑style view like the one in this picture; the architecture of the timemap system is also provided here on the slide.
We are also preparing three new services for production, which means that they are undergoing all of the preparations I mentioned earlier. They are the Router for Academia, Research and Education (RARE), the GEANT P4 lab and the network eAcademy. RARE is an Open Source routing platform that is used to create a network operating system for commodity hardware, that is, a combination of commodity hardware with Open Source software that you might use for some specific use cases, which we call white box. RARE uses freeRtr on the control plane, so in the documentation and in all the other references we very often refer to this as RARE/freeRtr.
RARE uses P4 on its data plane, and at the moment it has just freeRtr on its control plane, but it covers different data planes like BMv2, DPDK and XDP, and we are aiming to explore all possibilities that could provide a similar approach and to integrate them with RARE.
RARE has a lot of features, and this list is constantly updated as development goes on. There are a lot of things related to interior routing protocols, data plane forwarding, exterior routing protocols, link‑local protocols and network management, and on this page you can see how it looks on the wiki: there is a complete feature list spanning some 20 similar screens, and for each of these categories you can see the current status, the supported platforms and more explanation about what exactly it includes.
After RARE was developed, we were looking for a way to deploy it in a real operational environment, so we bought four switches that we distributed to four cities on the GEANT network in Europe, including Amsterdam, Frankfurt and Budapest, and we foresee two possible ways in which these might be used. One is to install RARE and actually perform testing on the RARE software, but the other is to use these switches primarily as bare‑metal switches providing a clean‑slate environment, so anyone can install their own software and use it on these switches; that is the primary usage of the GEANT P4 lab. Very soon after it was set up, we received a lot of interest from people wanting to connect to the GEANT P4 lab, first from European NRENs like SWITCH and HEAnet, but then we also received requests from RNP in Brazil and StarLight in the US, so this is how it looks today: it has really become a global lab with more than 20 locations. The picture on the right‑hand side of the map is the setup for the Supercomputing demo that will take place in November in Dallas, and here is a list of different use cases showing how this was used. Apart from the tests done as part of the GEANT project, one of them including topology monitoring with BGP‑LS, we also had a project with a federal university in Brazil, which tested its source‑routing solution called PolKA, as well as work on the IPv6 flow label and with Juniper's AMT multicast solutions.
Last but not least, the network eAcademy was created for automation and virtualisation in our environment, and we are doing this in several areas. One is the training programme, where 25 modules were created and published on the eAcademy portal; logging in can be done using social media accounts, and it covers different topics at different levels of complexity. The training map presents which path one might take, either related to a specific training area or to the complexity level of the individual materials.
Another area is architecture analysis and mapping. We realised that it might be much easier to understand somebody else's architecture if it is looked at from the perspective of some reference architecture, so we have chosen the TM Forum Open Digital Architecture, the picture you can see on the right‑hand side, and performed mappings between that architecture and other architectures of interest to us, starting from European NRENs and a few frameworks like SPA and 5G. So if you are interested in how your architecture might look mapped to the ODA, please let us know.
Then, another thing that helps us understand each other better is the terminology in the area of OAV. The first version was published and is available online, but we are currently preparing the second version, which includes terms from AI and from the maturity model. The maturity model is work that was completed recently and published last week. It is a self‑assessment survey that helps you recognise the current state of your organisation or network from the perspective of OAV, and to recognise the areas where future progress might be possible. Four dimensions are covered: architecture and technology, processes and services, vision and strategy, and people and organisation. Each of them has several sub‑dimensions, and for each of those there are six stages defined: none, ad hoc, use‑case based, integrated, proactive and self. So it would be great if you would take the maturity survey yourself; it is available at the link in this presentation, or contact us for more information.
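A self-assessment like this boils down to mapping each sub-dimension answer onto the stage scale and summarising per dimension. A minimal sketch, where the stage names come from the slide but the numeric scoring is an illustrative assumption, not GEANT's actual method:

```python
# Stage names as listed on the slide; their list position serves as a
# score (0 = "none" up to 5). The scoring itself is an assumption.
STAGES = ["none", "ad hoc", "use-case based", "integrated", "proactive", "self"]

def maturity_summary(answers: dict[str, list[str]]) -> dict[str, float]:
    """Average the stage index over each dimension's sub-dimension
    answers, giving one score per dimension."""
    summary: dict[str, float] = {}
    for dimension, stages in answers.items():
        indices = [STAGES.index(s) for s in stages]
        summary[dimension] = sum(indices) / len(indices)
    return summary
```

A low score in one dimension then points directly at the area where, as the talk puts it, future progress might be possible.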
Last but not least, a few areas where we are currently exploring the importance for our community: network telemetry and quantum key distribution. The first work is related to P4‑based flow monitoring, where software was developed to analyse the code and the system capabilities from the perspective of the number of flows and packets that can be processed in the network, and the flow accuracy. The results are currently being summarised; they will be presented at the SIG‑NOC meeting in Paris in November and followed by a white paper.
For in‑band network telemetry using data plane programming, several components were developed for the source, transit and sink nodes, including FPGA, DPDK and BMv2 implementations. It was run over a production network in tests across six cities in five countries, and the results are summarised in a white paper that will also be shared this year.
Optical time and frequency networking is assessing the possibilities, use cases and solutions for implementing time and frequency services in European NRENs, and so far the results are summarised in these white papers and past info shares, where you can see how such services are implemented in individual European NRENs.
Quantum key distribution: there will be a BoF this afternoon with much more information about what is done within our GEANT project and within PSNC, where I come from. Some activities that we are currently working on are knowledge sharing in different areas, testing on simulators, building the community, etc.
Then there is some work that is completed so far, which means that some tests were done, the work is documented, and it is parked for now, so we do not plan to work on it, at least in this project. One item is white box, where we looked at different use cases that might be of interest for our community for using white box switches with Open Source software; the use cases in scope were customer premises equipment, internet exchange point and data centre.
For data transfer nodes, we ran a survey to see what might be of interest for our community in that area, performed some tests on hardware tools and on optimising the DTN configuration, and with this we have closed the work.
Campus network management as a service is becoming a hot topic, primarily in the Nordic countries, where NRENs are being asked to provide such services for their universities instead of the universities managing their networks themselves. So we have created a service definition checklist for those that might consider providing such services, and also held a number of meetings where individual organisations presented either their use cases or their solutions for this particular service.
As I said, this project finishes at the end of this year, but we have already prepared the next one, which starts in January next year and will last for two years. Most of the topics presented today, apart from those I mentioned as parked for now, will continue, but we will also have new work, like a lab work item that will bring together the testing facilities we are providing, and an incubator that will allow us to start new work during the project's lifetime.
I would also like to mention two collaboration groups that we are very much involved with. The first one is the Global Network Advancement Group, a community of research and education network professionals worldwide who are working together to align their resources and achieve efficient global interconnections. Their work is done in several working groups, such as AutoGOLE/SENSE, GNA‑G routing and connecting offshore students, and the two we collaborate with the most are data‑intensive science, for the RARE work (they are also participating in the global P4 lab), and network automation, which has already adopted our terminology document; we are also working together on the eAcademy.
The other one is the special interest group on network operation centres, or SIG‑NOC, whose steering committee Maria Isabel is chairing. This is an open group for network operators to exchange technical and business‑orientated information, knowledge, ideas and best practices, similar to the NOGs that RIPE also supports, focused on networks but not limited to them. Our next in‑person meeting is on 16 and 17 November in Paris.
I have listed here a number of upcoming events that we are either participating in with our topics or co‑organising: the Croatian NOG, which Branimir will talk more about soon; SIG‑NOC; the GNA‑G community meeting, held twice, once in the morning and once in the afternoon, to accommodate as many time zones as possible; the in‑band network telemetry info share; the Argus info share; and the Quantum Internet hackathon that Vesna will talk more about this afternoon, which we are organising with RIPE. One thing I would like to mention is that we plan to somehow celebrate the World Quantum Day together on 14th April next year, so if any of you are interested in participating or co‑organising with us, let us know.
This is the last slide for today, so we have some time for questions. On this page you can find a lot of information about our work, including all the white papers that were mentioned, code repositories and other documents, so please check it and contact us if you have any questions. I hope that I have left enough time for a few questions. Thank you very much.
WOLFGANG TREMMEL: Okay, thank you. Are there any questions? It does not look like it, so ‑‑ there is one in the queue, from Christian: how would you link RIPE Atlas probes with your probes? I think that is about perfSONAR.
IVANA GOLUB: It can be done for perfSONAR. Most of our participating organisations and NRENs already have RIPE Atlas probes, so those are already deployed in the individual national research and education networks. And as you mentioned perfSONAR, perfSONAR is also deployed; there is one node at RIPE, so we are collaborating with RIPE on that. As for the difference, maybe Andrijana will talk more later to help you better understand the difference between the RIPE solutions, the Atlas probes and anchors, and perfSONAR. They are similar in that they both aim at providing more information about the network itself. In the case of perfSONAR, you can deploy it within your own domain, so nobody needs to know about the measurements you are performing, and it can be, let's say, a purely single‑domain solution. But it can also be deployed, and already is deployed, as a worldwide solution in a multi‑domain network.
WOLFGANG TREMMEL: Okay, thank you very much. Any more questions? All right. Thank you.
The next two ‑‑ I think two speakers: we start with Andrijana Todosijevic, who is talking about perfSONAR, 20 years monitoring Internet performance, and we also have a second speaker, right? Katarina Simonovic.
ANDRIJANA TODOSIJEVIC: Hi, everyone. I come from the national research and educational network in Serbia, and I work for GEANT, where we support perfSONAR, the tool that I am going to talk about today.
In the picture you can see the global map of research and education networks, together with the interconnections of the GEANT network that Ivana mentioned in her presentation, spanning not only Europe but the whole world. It is a complex ecosystem that involves national research and education networks as well as commercial operators; multiple networks interconnect, yet they are owned and operated by different organisations, so GEANT collaborates with the national research and education networks in order to provide high‑performance connectivity.
This heterogeneous ecosystem has to work seamlessly from one end to another. When two users or two researchers would like to transfer some data from one point to another, the data is transmitted through multiple networks, but an issue arises when there is a problem in the network and performance is lower than expected. We need a tool that can help to efficiently troubleshoot the network in such a complex environment, and for that purpose I would like to introduce perfSONAR, a tool that can provide such measurements between organisations.
perfSONAR stands for performance service‑orientated network monitoring architecture, and it is a collaboration of multiple institutions that gathered in order to build a tool and an architecture that can test network performance, help users, researchers and operators alike, and give some answers about what they can really get from the network.
It is very easy to detect a problem in a network when, for example, a network interface is down or a cable is cut, but it is not so easy to detect a problem when performance is degraded and lower than expected with no obvious reason. perfSONAR is a tool that can detect such problems, so‑called soft failures, and help us fix them.
I already mentioned that perfSONAR is a collaboration of institutions gathered around its development, so I am very proud to say that our partners come from different parts of the world: from the United States we have ESnet, the University of Michigan, Indiana University and Internet2, then the national research and education network from Brazil, and, last but not least, the GEANT community.
When you first encounter perfSONAR, you will look for something called the perfSONAR toolkit. This is a single installation that you can easily download; it is a single‑deployment instance provided as an ISO image that contains a custom deployment of the CentOS operating system together with the tools and services. Moreover, there are other installation options available, as you can see from the slide.
It is very easy and straightforward to install one instance of perfSONAR. However, perfSONAR is an architecture of multiple distributed instances and nodes, so we support the concept of meshes; a mesh coordinates multiple nodes so that test measurements can be run between them.
Here in the picture you can see the map and locations of the perfSONAR node instances that are registered in the lookup service directory; the lookup service directory is a special service that Katarina is going to talk about later on. You can see from the picture that we have more than 2,000 registered instances worldwide; however, since users can choose whether or not to register an instance of perfSONAR in the lookup service, we assume that we have almost that many again, so more than 2,000 private nodes that are not visible on the map.
While we are at meshes, I would like to present two use cases of perfSONAR. I have chosen two problems that happened between the same two points: the Institute for Astronomy at the University of Hawaii and Queen's University Belfast in Northern Ireland. They are related to the same perfSONAR mesh, owned by the ATLAS project. ATLAS stands for Asteroid Terrestrial‑impact Last Alert System, and I memorised that; it involves researchers monitoring the sky through telescopes, two of which are located in Hawaii, and the project involves large data transfers from Hawaii to Belfast. The distance from Hawaii to Belfast is 11,000 kilometres, with a round‑trip time of 180 milliseconds, and when the data is transmitted it traverses multiple networks, starting from the University of Hawaii, through Internet2 and the pan‑European GEANT network, and in the end a local network at Queen's University Belfast. What was very helpful in this situation is that the ATLAS project had already implemented the perfSONAR mesh, so they had a history of the data and could see fluctuations over time. Last year in October, I think, researchers reported that something was impacting the data transfers, down to, let's say, 100 kilobytes per second, so they experienced some slowness in the application's work. They established the path of the data and then investigated packet loss. Why packet loss? Because, as you probably know, TCP traffic with high latencies is sensitive to losses. They then noticed a steady increase in packet loss over a couple of days.
After that, they investigated the throughput, and you can see from the slide that they saw some significant drops in throughput over time. Finally, they realised that one of the eight aggregated links between London and Birmingham was faulty, so the network engineers decided to remove this faulty link from the aggregate; the throughput was back again and the data loss disappeared.
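The diagnostic pattern in this story, a steady rise in packet loss preceding a throughput collapse, is simple enough to check for automatically on top of archived measurements. Here is a minimal sketch of such a check; the function name and thresholds are made up for illustration and are not part of perfSONAR:

```python
def loss_is_steadily_rising(loss_rates: list[float],
                            window: int = 5,
                            min_increase: float = 0.001) -> bool:
    """Flag a steady rise in packet-loss rate over the last `window`
    samples, the pattern the ATLAS operators saw over several days.

    `loss_rates` are fractions (0.01 == 1% loss), oldest first.
    """
    if len(loss_rates) < window:
        return False
    recent = loss_rates[-window:]
    # Every consecutive sample must be at least as bad as the previous,
    # and the overall climb must exceed a small noise floor.
    monotonic = all(b >= a for a, b in zip(recent, recent[1:]))
    return monotonic and (recent[-1] - recent[0]) >= min_increase
```

In practice the loss rates would come from the mesh's archived measurements, and an alert like this would fire well before users report slow transfers.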
The other use case is related to the same two locations; however, it happened, I think, one or two years ago, when the researchers reported that they couldn't keep up with caching the data during their measurements. They first searched for throughput problems from Hawaii to Belfast, but not only that: they saw significant drops in throughput from the Baltimore node to the Jisc Slough perfSONAR node. What you are looking at is the dashboard of the ATLAS perfSONAR mesh. In this intersection you can see that the average throughput from the University of Hawaii to Slough was about 3.3 gigabits per second, which was okay, given that the distance is so long. The network engineers and the network operations centre couldn't immediately solve the problem. However, when they took a look at the throughput, they again noticed some significant drops from the Hawaii node to the Slough node, and at the right part you can see some peaks; I assume they tried to tune the TCP traffic a little bit, but they failed. They also noticed some packet losses between the GEANT perfSONAR node and the Jisc node.
Then they decided to contact the GEANT network operations centre, and the network engineers realised that there was a faulty optic, again in an aggregate link, of six 100‑gigabit links between Janet London and London Powergate. They decided to remove the faulty link from the aggregate, and the throughput was back again and the packet loss disappeared.
I think that was it from my side. Now I would like to hand over to Katarina, and she will talk more about the architecture of perfSONAR, our current development and our plans for the future release.
KATARINA SIMONOVIC: Thank you. I am Katarina, I am also working for the Serbian academic network, and I am also a member of the perfSONAR team, so I will continue Andrijana's story. This figure presents the architecture of the perfSONAR service. As you can see, it is not monolithic; instead, it consists of a few building blocks, which means that this is an extensible architecture and you can easily add new components or new functionality to it. If you look at the bottom of the figure, you can see the tools we are using to run the measurements over the network. I am not going to talk about these tools, I guess you are already familiar with them, but what I want to say is that you can pick any of them to run measurements over the network, and that we have a mechanism we call plug‑ins, so you can easily extend the tool options if you are interested in some specific measurements.
The next component is pScheduler, which is obviously used for scheduling, but it also makes sure that tests don't overlap, that all parameters are set correctly, and that all the policies and limits you have set are fulfilled, for example if you don't want to allow some networks to run measurements towards your node.

We collect the data from the tools and then send it to archiving. We have a few ways of archiving, but the most common one is something we call Esmond; on the next slide I will show you how it works and introduce you to some new ways of storing data. Next to it, the orange box presents something we call pSConfig, which we use for the meshes that Andrijana mentioned in the previous slides, so this component supports those complex configurations.

On top of the architecture we have the visualisation layer. Here we have tools that you can use to visualise different types of parameters: graphs, the toolkit user interface, which is the user interface for a single instance, pSConfig Web Admin and MaDDash. Finally, the green box is something we call the lookup service, and this component can help you discover all perfSONAR nodes, because every node has the possibility to register with the lookup service; in that way we have all the information about all the nodes that make up the perfSONAR community.

In the current releases of perfSONAR, the default archive is Esmond. Esmond is something that our developers developed and also maintain. It is based on two different types of databases: we use PostgreSQL, and Cassandra for storing the measurements. If you install the perfSONAR toolkit or central management, you get this type of archive.
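The non-overlap guarantee that pScheduler provides can be illustrated with a toy version in Python. This is a sketch of the idea only (greedy, a single host, made-up names), not pScheduler's actual algorithm:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    wanted_start: int   # seconds from now when the test would like to run
    duration: int       # seconds the measurement occupies the host

def schedule(tasks: list[Task]) -> dict[str, int]:
    """Assign each task a start time so that runs never overlap, a toy
    version of what pScheduler does when it places measurements.

    Greedy: take tasks in order of desired start, and push a task back
    until the host is free again."""
    busy_until = 0
    starts: dict[str, int] = {}
    for task in sorted(tasks, key=lambda t: t.wanted_start):
        start = max(task.wanted_start, busy_until)
        starts[task.name] = start
        busy_until = start + task.duration
    return starts
```

The real pScheduler additionally checks parameters, policies and per-network limits before a task is admitted at all.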
So, we needed to replace this database because it was getting very difficult for our developers to maintain. Also, the stability of the Cassandra database is not good, and moving to the next version is a bit challenging because of the way the database was integrated within the perfSONAR code. Our developers wanted to move to a tool that they didn't have to maintain themselves, because a lot of energy was going into maintaining the archive, and now there are so many tools that can do that by themselves. So the obvious choice was Elasticsearch and the different tools in the Elastic stack. Elasticsearch is now widely used in the research and education community and is also well maintained. We use Elasticsearch in the back‑end, then we use Logstash for ingesting the data, processing it and sending it on to Elasticsearch, and then we use different visualisation platforms for querying the data and displaying it in various ways. Kibana comes with Elasticsearch, but you can use Grafana as well.
I said Elasticsearch, but last year the company behind Elasticsearch decided to change its licensing model, and that made it very difficult for Open Source projects to use their products. So instead it was decided to move to OpenSearch, which is the open source fork of Elasticsearch, and in the 5.0 release of perfSONAR we are using this database.
And here, this is maybe the main piece of the new software. First, the tools run some tests and measurements, then pScheduler, with its HTTP archiver, sends the results to Logstash; Logstash processes them and sends them to Elasticsearch, where the archiving happens. To make this work with the existing perfSONAR components, our developers developed something we call Elmond, which imitates the Esmond API; what it does is translate Esmond queries into queries that Elasticsearch can understand.
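To make the Elmond idea concrete, here is a minimal sketch of the kind of translation such a shim performs: an Esmond-style set of REST query parameters is mapped onto an Elasticsearch query-DSL body. The field names (`meta.source`, `@timestamp`, etc.) are assumptions for illustration, not the actual perfSONAR schema.

```python
# Hypothetical sketch of an Esmond-to-Elasticsearch query translation.
# Field names are illustrative, not the real perfSONAR index mapping.

def esmond_to_es(params):
    """Translate Esmond-style query parameters into an ES query body."""
    filters = []
    if "source" in params:
        filters.append({"term": {"meta.source": params["source"]}})
    if "destination" in params:
        filters.append({"term": {"meta.destination": params["destination"]}})
    rng = {}
    if "time-start" in params:
        rng["gte"] = params["time-start"]
    if "time-end" in params:
        rng["lte"] = params["time-end"]
    if rng:
        filters.append({"range": {"@timestamp": rng}})
    return {"query": {"bool": {"filter": filters}}}

body = esmond_to_es({"source": "ps.example.net", "time-start": 1666000000})
```

The existing visualisation tools keep talking to the old API while the answer is fetched from the new back‑end.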
So, I was talking about the major changes in the back‑end, but we have some changes in the front end as well. We have various pScheduler improvements and some new plug‑ins, the toolkit user interface is improved, pSConfig Web Admin was also changed and improved, and we have some optional packages if you still want to use Esmond or maybe start using Kibana.
Here, I wanted to show you that we also moved the lookup service API to an Elasticsearch API, and now it's very easy to integrate it with the Elasticsearch data source plug‑in in Grafana.
And here I wanted to show you the hierarchy of the information that is stored in the lookup service. Each node is stored in the lookup service as a record, and each record has various types of information. For us, the most important information is of the host type, and our visualisation tools use this type to visualise statistical information about the usage of perfSONAR.
And here I wanted to show you part of the new public perfSONAR dashboard. As you can see, it consists of lots of panels, and we have information about hosts and services, as well as some general information. We have also implemented some filters, so it's very easy to search for the information you need. This is a public dashboard, so below you can find a link to it; maybe check it later if you want.
So, here is the worldwide map of all the countries where registered nodes reside, and you can also see the number of nodes per country. And I think this is our last slide. I wanted to show you the way you can use the perfSONAR public dashboard, with the node that Andrijana mentioned in one of the use cases: this is the location of the Slough node, and there is some information about it.
So, thank you very much for your attention, if you have any questions or comments, be free to ask.
WOLFGANG TREMMEL: Thank you. Are there any questions? Please use the on‑line question tool or make your way to the microphone.
AUDIENCE SPEAKER: Christian Petrasch, DE‑CIX. I have one question: for what reason have you chosen the Elastic stack and not a column‑based database? Can you tell us a little bit about that?
IVANA GOLUB: Most of such choices are based on requests from the community. The map I have shown — and I think a similar map shows how perfSONAR is deployed worldwide — also reflects our partners, their use cases and how it is used. What we have realised is that the Elastic stack is very much used within our community, so the idea was to enable integration with that tool stack. But this should not be limited just to that, as perfSONAR itself is a relatively open architecture, and pScheduler welcomes other APIs etc., so it can be easily extended. Similar communities, like WLCG, might as well have their own implementations, which can then be linked to something else like Grafana, not necessarily Kibana, etc.
WOLFGANG TREMMEL: Thank you very much.
Please do not forget to rate the talks. Yes. So now we are moving to the second part of the session, which is about lightning talks, and the first one is from Branimir Rajtar, who will present how the founding of the HR NOG went.
BRANIMIR RAJTAR: Let me tell you how the latest NOG in the RIPE family was founded: the creation of NOG.HR. So, first a little bit of self‑promotion from my side. I worked for Croatian Telecom for seven years, mostly on jobs developing data services, both fixed and mobile. After that I co‑founded 5x9 Networks, a small company in Croatia, and currently I am the president of NOG HR. A bit of background on the company — this will become relevant a bit later on. The core team in the company has over 100 years of telco experience, mostly working for telco companies, and currently we are developing software‑based telco products, so NFV and similar stuff, like BNG, CGNAT and active performance monitoring tools.
So, how did NOG HR come to be? One of my colleagues was scrolling and actually talking to people on IRC when he was contacted by a friend asking if there was any interest from our side, from the community in Croatia, to start a Croatian NOG. We had a discussion, we had a call with ‑‑ and we decided to contact colleagues from other companies to see if there was something we could do in Croatia. And here is the part where the 100‑plus years of telco experience kick in, since the community in Croatia is relatively small, more or less everyone knows each other, and people fluctuate between more or less the same companies in Croatia.
The first meeting was on 14th April — I actually had to look it up in my calendar — and there were around 15 people from various companies in the meeting. Vesna also came to present about RIPE, and they did a pretty good job actually. In the end we decided there is a community of people who want a NOG, who want a place where we can discuss issues, where we can see new stuff, where we can do presentations of best practices and basically exchange experiences.
It actually came at a really good time, because before Covid the only place we could meet was a Cisco conference — I am not sure if I am supposed to mention companies, but Cisco had its yearly event, which is not actually aimed at ISPs but more at enterprise customers, yet it was the only conference in Croatia where people from the networking industry could meet, exchange experiences and basically have a beer together. So we decided that what we wanted to do first and foremost was conferences, so that we have a local conference where we can exchange ideas and be up to date with the latest technology trends. We decided at that meeting that we should prepare a statute and prepare all the papers to register an association, because since we wanted to do a conference, we needed to make it official.
The founding meeting was on 27th June, and 11 founders from 9 different companies were there — all the major ISPs, Ericsson, Cisco, the local computing centre — and we decided on a name, NOG HR, and on the next step: to host a conference. We thought of taking a picture right after, but half the people had already left, you know. They didn't even stay for a beer, which was a tragedy, actually.
So what does it take to start an association, a nonprofit organisation, in Croatia? In Croatia you can do most stuff online — we have smart personal IDs, so if you have a baby you can register it online, you can get married online and do all sorts of stuff — but if you want to start an association or organisation, well, no can do. There's a lot that needs to be done, a lot of registries where you need to apply with your organisation. First of all there is the registry of associations, where they look over your statute and your founding documents and review them to tell you whether they are okay or not. You need to get a stamp — that's really, really important in this day and age. You need to get a VAT ID and an identification number, which is absolutely different from the VAT ID — it's a unique number. Then there is the registry of real owners and of nonprofit organisations, you need to open a bank account and hire an accountant. You need to set up a mailing list, and I really want to thank Sander Steffann — I don't know if he is here, I saw him on the IPv6 panel, but thanks, Sander, wherever you are.
The first conference is scheduled for 10th November. It will be a half‑day conference, from 1 p.m. to 4 p.m., at the University Computing Centre. It will be half in English and half in Croatian, so those of you who understand Croatian are welcome for the whole event; otherwise, stay for half. We decided to go with a format of six presentations, 15 minutes each with five minutes of Q&A, and we want to have a broad spectrum of themes, since not only ISPs will be there but also enterprise customers, so we want to give a really good overview of what is being done in the community, in the worldwide community, and we want to get a feeling after the conference of what people are interested in, so we can plan further conferences.
None of this would be possible without RIPE, so thanks to Alastair for this. You can see the agenda on the right‑hand side — I think you know some of the people on the agenda, whom I also really want to thank for their help. But we wanted to cover really broad themes, and this is the starting point for the next conferences.
If it all goes well and the interest persists, the idea is to have bi‑yearly meet‑ups or conferences — we are not actually sure what to call this, not really a conference but not really a meet‑up, so if somebody has a better idea or a proposal for a name, please feel free to look me up after the talk.
The idea is to get as many people involved as possible. Currently there is only a handful of us enthusiasts doing this, pushing the registration, organising the web, organising the bureaucracy, and the idea is to get people interested in the mailing list, get people working on a Programme Committee for future conferences and to see, basically, if we can keep this going.
Yeah, hopefully people are genuinely interested, and we hope to have really lively discussions and a really lively conference.
And this is all from my side. Thank you for listening.
WOLFGANG TREMMEL: Thank you. Are there any questions? Please walk to the microphone or use the on‑line question tool.
AUDIENCE SPEAKER: Why do you call this NOG dot HR, I first thought Hungary or something?
BRANIMIR RAJTAR: Because of the domain. According to Croatian law we have to have a Croatian name, so we did it in a way that the translation to English is "network operators' group", and we can have the domain NOG dot HR — that's why the name.
MASSIMILIANO STUCCHI: I think I have an additional question. Have you thought, besides the mailing list, of using some other means of communicating, something more direct? I mean, there are network operator groups that use Telegram or Mattermost — NOG.fi uses Mattermost ‑‑
BRANIMIR RAJTAR: We had a discussion, but in the end people were fed up with different means of communication, and we couldn't agree on a common denominator, so we said okay, let's do a mailing list, which everybody uses, and we will see from there. If the discussion gets lively I would like it to transfer to some other platform which is more realtime, but let's see how it goes.
MASSIMILIANO STUCCHI: Is the mailing list ‑‑ are the mailing list archives available?
BRANIMIR RAJTAR: Actually Sander is hosting it so it's RIPE mailing list so I'm guessing it's ‑‑ everything is supported.
MASSIMILIANO STUCCHI: Thanks. I don't see any other questions and we don't have any question from the on‑line participants so thank you very much.
WOLFGANG TREMMEL: The next speaker is Lefteris Manassakis from Code BGP, and he will talk about ARTEMIS, an Open Source tool for detecting BGP prefix hijacking.
LEFTERIS MANASSAKIS: Thank you. I will present to you ARTEMIS, an Open Source tool for detecting BGP prefix hijacking attacks in realtime. It is a tool that was funded in the past by two RIPE NCC community projects, in 2017 and 2019. A few things about myself: I was a researcher until last year, working for the Foundation for Research and Technology in Greece, and last year we started Code BGP, which is a BGP observability company that spun out of the ARTEMIS Open Source project. As for the team that has worked on ARTEMIS: ARTEMIS was originally a paper, published in 2018, and it was collaborative work between the Foundation for Research and Technology in Greece and the research group of CAIDA in the US, in San Diego, and you see the team of researchers, developers and designers that have worked on it in the past or still maintain ARTEMIS currently. Currently the organisation maintaining ARTEMIS is Code BGP, and we do it as a contribution to the community.
To give you a high‑level overview of what ARTEMIS tries to achieve: we provide realtime BGP observability and advanced BGP hijacking detection — I will explain what I mean by advanced in a bit. It's an open, on‑premise tool, meaning operators need to download and install the software on their own servers. It provides a custom UI so that you can use it via a web interface. And it's a modern software stack; I will show you a slide on it in a bit.
This slide here describes a high‑level view of how ARTEMIS works, and I will try to explain what we see. On the left side, we see the various data sources that contribute data to the system. These are public BGP feeds like RIS Live — the RIS Live that most of you know about — and also BGPStream, which is a combination of RIS and RouteViews; it's a project that is maintained by CAIDA.
The operators can also connect their own routers to the system via BGP sessions. And on the right side, the operators need to configure what we call the heart of ARTEMIS, which is the configuration file — a YAML file where the operators configure their autonomous system, their prefixes, their neighbours and some specific BGP policies, such as, for example, no‑export policies.
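To give a feel for what such a configuration file contains, here is a minimal YAML sketch in the spirit of the ARTEMIS format. The structure is loosely modelled on the project's documentation, but the keys and all values here are illustrative examples, not a verbatim excerpt.

```yaml
# Hypothetical minimal ARTEMIS-style configuration (values are examples)
prefixes:
  my_prefix: &my_prefix
    - 192.0.2.0/24
asns:
  my_asn: &my_asn
    - 64496
  my_upstreams: &my_upstreams
    - 64511
rules:
  - prefixes:
      - *my_prefix
    origin_asns:
      - *my_asn
    neighbors:
      - *my_upstreams
```

The rules declare which origin ASes and which first-hop neighbours are legitimate for each prefix; anything observed in the public feeds that contradicts a rule becomes an alarm candidate.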
What ARTEMIS does is compare the configuration file against the data it receives from the public feeds via the monitoring module, and if it finds discrepancies, it raises alarms. The types of hijacks that can be detected are exact‑prefix hijacks — what we call Type 0 and Type 1 in the ARTEMIS paper; Type 1 hijacks are basically when a hijacker falsifies the AS path and presents himself as a neighbour, which is an advanced type of hijack that RPKI cannot protect you against — and other types of violations and attacks.
The user is notified via Slack, e‑mail and syslog, and can also be notified by using the web UI.
This is our software stack. It's a collection of loosely coupled microservices running on top of a cluster. For the back end we use Python, and for the front end, React. And we use quite a large collection of other tools in our stack, which you see here.
In terms of features, ARTEMIS was presented twice at previous RIPE meetings, the last time in 2019. Since then, we have introduced a lot of features in the back‑end; I would highlight a couple of them. One of them is RPKI validation of hijacked prefixes. Another is the ability for users to extract the data using GraphQL and REST APIs and by using Grafana, for example.
In the front end, we have developed a brand new front end using React.js and Mongo, and we are currently developing a mobile application for users to access some basic functionality from their own mobile phones and get alerts there.
For the roadmap, we have quite a few things planned. One thing I would highlight is that we will create a script that gives users the ability to install ARTEMIS with one command, to make it easier for people to install, but we have a lot more plans, and we would like feedback from the network operator community on what they would like to see ARTEMIS do in the future. We are aware of 25 organisations that are currently using ARTEMIS, but we don't know all of them; some of them are Tier 1 ISPs in the US and large corporations, and we keep learning about new companies that are using ARTEMIS. So if one of you is using ARTEMIS and we don't know about it, please reach out to my colleague and myself while we are at this RIPE meeting, because we want your feedback: what would you like ARTEMIS to do? What do you like about ARTEMIS, what don't you like about ARTEMIS, what is the use case you use ARTEMIS for? And for those who are not using ARTEMIS: give it a try, go to the URL, download ARTEMIS and take it for a spin.
We also provide a demo at this URL; in the demo, you can use guest accounts to see what it looks like. Also, tomorrow in the Open Source Working Group, I plan to do a hijack of one of the Code BGP prefixes live on stage to show you how ARTEMIS works in practice. If any of you are interested, you can join me in the Open Source Working Group tomorrow to see how it works. Hopefully, it will work; we will see.
Thank you very much for your attention, and I'm happy to take any questions.
AUDIENCE SPEAKER: Have you thought about monitoring transits, AS transits?
LEFTERIS MANASSAKIS: This tool is only a hijack detection tool; it does one thing and tries to do it correctly. So when you say transits, you mean that you want ‑‑ we want to monitor the AS path between ‑‑
AUDIENCE SPEAKER: Yes
LEFTERIS MANASSAKIS: ‑‑ between downstream and upstream of the transits? No, this is not currently what ARTEMIS does, but it's quite a hard cookie to crack, I would say. As you might imagine, what you are asking is a bit difficult, because you need to take a lot of data into account, but we can discuss how it could be done.
AUDIENCE SPEAKER: But the data is the same because the routing data is the same and just to search ‑‑ it is additional work.
LEFTERIS MANASSAKIS: Yes, the volume of the data is big, that's what I mean.
AUDIENCE SPEAKER: About 80,000 routes every five minutes, if I think ‑‑ yeah. Okay. Thank you.
LEFTERIS MANASSAKIS: Thank you.
MASSIMILIANO STUCCHI: I don't see anyone else in the queue, and we don't have anyone online. So, any other questions? Thank you very much.
MASSIMILIANO STUCCHI: While Wolfgang changes style now — he will have the presenter style rather than the Chair style — he is going to talk to us about RFC 9234 and how roles in BGP can be leveraged.
WOLFGANG TREMMEL: I love BGP. I am going to talk about a new RFC today, and the nice thing is that one of the RFC authors, Alexander, is also here. This lightning talk should just give a short overview of what this is about; there will be a much more in‑depth presentation in the Routing Working Group.
So, I am getting a bit of feedback here but never mind.
You all know this: you are a customer, you are connected to a transit provider and you see kind of everything, the whole BGP table, and then you set up peering or a second transit provider, you set up BGP and, oops, you forgot filtering. You do not want this to happen, but to be honest, it has happened to all of us. So let's talk about BGP neighbours — let's talk about types of BGP neighbours.
Well, we have transit providers and they have customers: the providers announce everything, and the customers announce just their own and their customers' prefixes. We have peers: both announce to each other their own and their customers' prefixes. And of course, at an internet exchange, we have route servers and clients: the route servers announce what they know, and the route server clients announce their own and their customer prefixes. So we have kind of defined roles in BGP peering, and we have pairings which are basically fixed, so only some combinations of pairings are actually allowed — customer to transit provider, peer to peer and so on; I think you get the picture. If these roles exist, why not tell BGP what you actually are? So you configure on your BGP session: my role in this session is customer, and the other side configures: my role in this session is transit provider. And then you can have BGP do some magic and actually check whether the roles in a BGP session match. So your role and your neighbour's role in BGP are checked. If the pairing is not valid, there are two options: either you put a warning in your log file, or you simply do not establish the BGP session, and that's called strict mode. This RFC allows both: you can say, okay, I have no idea if the other side really configures that, so put in my role but give me a warning if it doesn't match; or, if you know the other side configures it properly, use strict mode — I only peer with a peer and I only get transit from a transit provider. Both are possible.
So is that all that is defined in RFC 9234? No, there is more. Remember what we want to prevent: we want to prevent leaks, where stuff we received from a transit provider goes to another transit provider. What do we do? If we receive a prefix from a transit provider, we mark it with Only To Customer (OTC); that means prefixes received from transit are announced only to our BGP customers. How does BGP know who is the customer? Well, we have configured it. So on the session to another transit provider or a peer, we simply block everything which has this OTC flag set. You might say, I have been doing this for ages using BGP communities — but who has not misconfigured communities before? It happens. This basically takes care of automatically flagging prefixes we receive from transit providers so that they are not announced to other transit providers or peers.
The Only To Customer attribute is a little bit more complex: it also carries an AS number, and with that it can be checked in ingress and in egress, when receiving or announcing prefixes. In ingress: if OTC is present and the route comes from a customer or route server client, it is a leak, so throw it away; if it comes from a peer and the AS number in the attribute is not equal to the sender's AS, disregard it as well. And if OTC is not present and the sender is a transit provider, a peer or a route server, set OTC to the sender's AS number. It's not very complicated, and especially it's very easy to configure, also for new players entering the market who configure BGP for the first time, and I think that takes a lot of stress off operators. On egress, similarly: if OTC is not present and the receiver of the prefix is a customer, peer or route server client, set OTC to your own AS number; if OTC is present, do not announce the prefix to providers, peers or route servers.
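The ingress and egress rules just described can be sketched in a few lines. This is a simplified illustration of the RFC 9234 OTC logic, with routes modelled as plain dictionaries; a real implementation lives inside the router's BGP policy engine.

```python
# Sketch of RFC 9234 Only-To-Customer (OTC) handling, simplified.
CUSTOMER, PROVIDER, PEER, RS, RS_CLIENT = (
    "customer", "provider", "peer", "rs", "rs-client")

def ingress(route, neighbor_role, neighbor_asn):
    """Apply OTC ingress rules; return None to reject the route."""
    otc = route.get("otc")
    if otc is not None:
        if neighbor_role in (CUSTOMER, RS_CLIENT):
            return None                  # a leak: OTC route from below
        if neighbor_role == PEER and otc != neighbor_asn:
            return None                  # peer forwarding someone else's OTC
        return route
    if neighbor_role in (PROVIDER, PEER, RS):
        # Mark that the route came from "up or sideways".
        return dict(route, otc=neighbor_asn)
    return route

def egress(route, neighbor_role, my_asn):
    """Apply OTC egress rules; return None to withhold the announcement."""
    if route.get("otc") is None:
        if neighbor_role in (CUSTOMER, PEER, RS_CLIENT):
            return dict(route, otc=my_asn)
        return route
    if neighbor_role in (PROVIDER, PEER, RS):
        return None                      # never send OTC routes up or sideways
    return route
```

The effect is exactly the leak prevention from the previous slide: once a route carries OTC, it can only ever flow downwards, towards customers.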
Want to know more? Come to the Routing Working Group. And that's it. Thank you.
MASSIMILIANO STUCCHI: We don't have any questions online.
WOLFGANG TREMMEL: Ask your questions in the Routing Working Group.
MASSIMILIANO STUCCHI: Yes, but we also have some time. So, no questions — either you were clear enough or there will be many more at the Routing Working Group. So thank you, Wolfgang. Now you can come back here to chair, and with this we close our session for today. Don't forget to rate the talks, and you still have eight minutes to send an e‑mail to the Programme Committee to get into the elections, if you would like to join us. So, you still have eight minutes during the break, and we will see you after the break in 30 minutes.
LIVE CAPTIONING BY AOIFE DOWNES, RPR