IoT Working Group
27th October 2022
At 11 a.m.:
SANDOCHE BALAKRICHENAN: I am one of the chairs for the RIPE IoT Working Group.
PETER STEINHAUSER: I am co‑chair of the IoT Working Group. As Sandoche said, we have a pretty packed agenda so let's jump in. Before we start I want to thank our scribe for today, Aoife, I hope I have spelled your name correctly. Thanks for the technical support. Thank you so much. So these are actually the slides for Michael.
This is our agenda for today, so first we start with housekeeping, then we have three talks, one will be done remotely by Nick Allott, he couldn't come to the RIPE meeting this time. Very quickly, housekeeping, as I said, we should cover the meeting minutes from the previous RIPE meeting, I hope you had a look at them in advance.
SANDOCHE BALAKRICHENAN: Is there anybody who needs an update? If you think it's okay, we can validate the minutes.
PETER STEINHAUSER: Looks okay, great. So, also a quick note. We have a code of conduct for the RIPE meetings which was approved so please everybody get familiar with the code of conduct and act accordingly. So this is also something we should say. And very important for the Q&A sessions after the talks, go to the mic, say your name and affiliation and ask your question, so everybody knows who you are and who you are speaking for. Okay. For remote participation, please make sure that you use the QA window, not the chat window, to ask your questions. And for the rest, as usual.
Then let's start with our first talk, given by Michael Richardson from Sandelman Software Works.
MICHAEL RICHARDSON: Hi, so I am stuck to the microphone, I am here to talk to you about an IETF document ‑‑ IRTF document that has some relevance to IoT and trust and actually for that matter, to RPKI, any RPKI enthusiasts in the room? They probably went somewhere else. We have one, so you might care about this as well.
So, a little bit about the talk. I am going to tell you who I am, why we are here, the challenges of this problem and some of the results that I have so far, and then what I really want you to do is to line up at the microphone and tell me why ‑‑ I would rather you did the work. I have been involved in the Internet since the 1980s, I have been involved in a whole bunch of start‑ups and I write RFCs regarding security and IoT onboarding. Well, it's interesting, so I sat through the metrics Working Group and I think it's really important, you know, this quote here: tell me how you will measure me and I will tell you how I behave. This is attributed to Eliyahu Goldratt, I recommend his books very much, and some other books were written in the same style, and these aren't dry business books, these are novels about people who have adventures with scheduling in software, and they're things you could read at the cottage and not realise you were learning, so that's something. I am not here to talk about those things, I am here to talk about my document: The taxonomy of operational security considerations for manufacturer installed keys and trust anchors. There is the QR code. You need to take out your phones and put that on your reading list, and you will also notice that the URL is at the bottom of the slide now so you have no excuse for not knowing where the document is. That's what it looks like and I am going to tell you why I came to write it in the first place.
You know of course the problem ‑‑ the answer to all security problems is cryptography, right? Let me just go home now. That's the answer. There's some stuff there, you know, and someone will tell you why 256 is better than 128, and the reality is that's completely irrelevant because what matters is how you trust and who you trust. So let's just say you are the principal of this Japanese school and this thing shows up on your sidewalk one day. The original IoT device is a Coke machine, and this one has legs and arms and apparently does facial recognition and knows what drink you like. You have a couple of questions about the safety of this device as it's wandering around your schoolyard, okay? You are not actually sure, there might be a human inside but you really don't know. What if it has a refrigerant leak, it's going to poison the kids, maybe, you might be concerned about that. You might want to know something about the recent maintenance facilities of the software, okay? You might want to know whether the refrigerant microcontroller has been updated lately to the latest specs. Is there something that says it's been inspected lately and there's no leak, how would you know that? It's not your Coke machine, it's someone else's Coke machine, and even if it's someone else's, what are you going to do? What about the facial recognition? Your students have some privacy, is the facial recognition running locally in a container or does it send it all up to the Cloud somewhere? How can you know which is which? Can you get some kind of a statement about that? Can you find out? Mostly you can't because these are not open to you to inspect.
How can you even know if the key was any good, this attestation that says there's no leaks, how can you know that was a good key and reasonable?
So, what do we typically do? You have these private keys and they relate to certificates and you worry about the safety of these things and how they are doing. Okay, so we can put this one into a secure element or TPM of some kind and we can have some better assurance of the physical safety of this key. On this end, we can put it into a hardware security module and say okay, everything is good now. Well, maybe, okay.
So probably that's a stupid way to arrange things, probably actually you should have two keys, one of which you keep safely away in a vault and one which you use on a somewhat regular basis in the factory. Imagine if you have a factory that's creating devices and every time it creates a device it creates a new identity certificate, you have to have the key that signs that certificate, the IDevID, online. So that's what it might look like.
So, first question: What is this architecture called? Any ideas? Any suggestions? Sandoche?
SANDOCHE BALAKRICHENAN: Public key infrastructure.
MICHAEL RICHARDSON: What is it that tells you there's a number 2 there, or a number 3 or 2 or 0? It turns out no one has a metric and that was the whole point, and this is what led to this document. One of the questions is: what is this? There might be a Trust Anchor, so we all talk about it in IoT and in fact it's in some ETSI regulations about the need for software updates, and it needs post‑quantum cryptographically signed software updates because if there's a quantum event we are going to use new software and you can't use the broken algorithm to update your software. What do you know about this key? Who has access to it? Who can issue a software update? Or maybe a question closer to home: who can update your RPKI? That is another question. And how would you explain that to the board of directors of the company? Okay. And a more interesting question: What if those people get wiped out in a tsunami, or maybe they are on the wrong side of the Russia/Ukraine border, and maybe a pandemic means they can't travel and get together again. What happens? You have to ship a critical software update but a bunch of your people have no power, have no house, and they were required to do this. So what do you do? And how do you know about that situation, how does that guy back at the school know whether he can get a software update for a refrigerant on a timely basis even if there was a tsunami that happened? Shall we just accept that it's okay for autonomous cars to not get software updates when there's a tsunami? That would be an interesting take.
So you've got to ask some questions, right? And can anyone see the animal behind that thing? Do you know what it looks like, do you know the story about the people, they all look at the elephant and all came up with different things because they were all looking at a different part of the elephant? And none of them saw the whole elephant. And this is the case we have. You don't get to see the manufacturer's whole story, they won't show it to you because they are scared, because their lawyers are scared, because they have actual business reasons why they don't just publish it, so you get little bits here and there of what they may have and the rest is hidden behind a non‑disclosure agreement.
The first thing that this document does is it says that the architecture you saw there, okay, this is a PKI level, this one is 3, there are 3 levels in this thing so it has a PKI level of 3. Now, we could count from 0: 0, 1, 2, which would be nice because a self‑signed certificate has no trust and therefore it should have zero trust, but it turns out if you talk to some people that that's not easily communicated and so we will say we will count from 1. This one has a level of 3. The end certificates are at Level 3, therefore that's something new. You know how to talk about the diagram and the previous thing, and this is what this is about. The other question you might ask: How was that private key generated? Did it happen on the device and then we did some kind of certificate enrolment protocol? Did it happen in the factory and we loaded the private key into the device through, for instance, it could happen during testing when it's on a bed‑of‑nails test? Or there's a more complicated situation where it turns out you outsourced part of the problem to the CPU vendor and they put a 256 bit random key in it, or maybe a PUF, and you were told there's this magic key in this device which you know about and the device knows about and you can come up with the same private key, one in the factory and one in the thing. If that sounds complicated, it is, if it sounds suspicious then yeah, I am a little bit suspicious too, but it seems okay in many cases. What is it even called and could you at least tell me which one you did? When you do get that device, that refrigerator, that Barbie or smart lock for your front door, you maybe would like to know how that key was generated so you can know what faults are there.
And I will just say there's a Dan Bernstein paper from a decade ago about a flaw in the Taiwan smart card system that was current at the time, and it turned out the private keys weren't really random. He determined this by looking at thousands and thousands of public keys and realised that there was a pattern, and he was able to use this to derive essentially all the private keys because the random number generator in the factory, because they were using B, was not properly initialised. I think they replaced all the keys, it was time to do it anyway. Who has control over the private key? You take a key and do a Shamir secret splitting, SSS it's often called, and you get essentially maybe five keys. So if you remember grade 10 or 11 high school math, you can remember you have two equations and two unknowns, but if I write down five equations in three unknowns, then I can have a situation where I only need to have three of the people present and I can figure out what the three unknowns are and I can put the key back together, and it doesn't matter which of the five show up. So I could divide it into five pieces and I can have three people present to unlock the key, or it could be four of seven or it could be six of nine or something like that. Okay? Now you have to ask your question: Okay, which of those nine people, who am I going to pick? How many on each continent, maybe, is a question you'd ask? Who can know who those people are? That's one of the things I think probably they should not reveal, that should be under NDA, I don't think I as a customer need to know who actually possesses those keys, okay? Those people who possess them are under threat, they are under threat of essentially some kind of extortion that would force them to divulge the key in a way. How many do you need to threaten? If it's six of nine you need to threaten six people in order to get the key to be revealed. Should we even talk about the number 6 or number 9?
I don't have a conclusion on that, that's ‑‑ I think that we could safely reveal three: 9 minus 6, there are 3 spare people, that would be a reasonable thing, and you could also say there are 3 spare people and I can reconstruct the key on two continents, maybe. You could tell me what they are so I can know if a tsunami matters. That's a place where I am much less sure of what to do and I would like feedback on what you think. There's a real business continuity question here: If you lose the key not because a bad guy stole it but because you lost the key when the building burnt down, what do you do? If you have embedded that key in a million devices that you have shipped, you have a million dead devices, now you are going out of business. Or something bad is happening, you are not getting security updates or something is going on, right? That's actually a bigger risk to the business, if you think about it, than the bad guy getting the key. That's why we have bad security, because you think of the risk benefits of the whole process, or people refuse to DNSSEC sign their stuff because they are afraid of what if I lose the key? Now, I am screwed, right? So this I think applies to all of those kinds of stuff.
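The level counting described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a certificate is reduced to a hypothetical (subject, issuer) pair, the chain is listed from the trust anchor down to the end certificate, and the function name `pki_level` and the example names are invented for the sketch rather than taken from the document.

```python
from typing import List, Tuple

# Hypothetical simplified certificate: (subject, issuer)
Cert = Tuple[str, str]


def pki_level(chain: List[Cert]) -> int:
    """Level of the end certificate, counting the self-signed root as level 1."""
    if not chain:
        raise ValueError("empty chain")
    root_subject, root_issuer = chain[0]
    if root_subject != root_issuer:
        raise ValueError("chain must start at a self-signed trust anchor")
    # Every certificate must be issued by the certificate listed before it.
    for (parent_subject, _), (_, child_issuer) in zip(chain, chain[1:]):
        if child_issuer != parent_subject:
            raise ValueError("broken chain")
    return len(chain)


three_tier = [
    ("Offline Root", "Offline Root"),  # kept safely away in a vault
    ("Factory CA", "Offline Root"),    # used regularly in the factory
    ("device-0001", "Factory CA"),     # end (device identity) certificate
]
single_key = [("Key In A Drawer", "Key In A Drawer")]

print(pki_level(three_tier))  # 3
print(pki_level(single_key))  # 1
```

Counting from 1 means a lone self-signed key already has level 1, which matches the way the talk counts.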
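The "three of five" splitting described above can be sketched as follows. This is a minimal illustration of Shamir secret sharing, assuming arithmetic modulo the Mersenne prime 2^127 - 1 and the hypothetical function names `split_secret` and `recover_secret`; it is not the procedure of any particular vendor, and a real deployment would rely on vetted tooling or an HSM.

```python
import secrets
from typing import List, Tuple

PRIME = 2**127 - 1  # field size chosen for this sketch only

Share = Tuple[int, int]  # (x, y) point on the secret polynomial


def split_secret(secret: int, threshold: int, shares: int) -> List[Share]:
    """Split `secret` so that any `threshold` of the `shares` recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1; the constant term is the secret.
    # A threshold of 3 gives three unknown coefficients, as in the talk.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        result = 0
        for c in reversed(coeffs):  # Horner's rule
            result = (result * x + c) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points: List[Share]) -> int:
    """Lagrange interpolation of the polynomial at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = secrets.randbelow(PRIME)
pieces = split_secret(key, threshold=3, shares=5)
# Any three of the five holders can put the key back together.
assert recover_secret(pieces[:3]) == key
assert recover_secret([pieces[0], pieces[2], pieces[4]]) == key
```

With `threshold=6, shares=9` the same code gives the six-of-nine arrangement, leaving three spare holders.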
I went through most of these pieces already here.
My goal is that you could hire an auditor who could go and sign the NDA and find the details behind there, and they would reveal to you a set of standardised metrics: vendor one has a PKI level of 3 and their resilience number is 4 on two continents, and I could find the same thing about a different vendor, they have a PKI number of 1, which means they just have a key that they keep in a drawer, I guess, a USB key in a drawer, and they have a resilience number of 0, if they lose it they are toast. That might be okay, that may be an acceptable level of security if what we are talking about is a talking doll, it would be a completely unacceptable level of security for an autonomous vehicle. Which is better? I am not making a point, until we have a way of measuring it we are not going to be able to say which is better or appropriate. That's about it.
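One way to picture such an auditor's summary is as a small record that a buyer compares against their own requirements. The sketch below is purely illustrative: the class `KeyPracticeSummary`, its field names, the reading of "resilience" as the number of spare key holders, and the thresholds used for the doll and the vehicle are all assumptions made for this example, not definitions from the document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyPracticeSummary:
    """Hypothetical audited summary of a vendor's key handling."""
    vendor: str
    pki_level: int   # 1 = a single self-signed key, 3 = root, intermediate, end cert
    threshold: int   # holders needed to reconstruct the key
    holders: int     # total number of key holders
    continents: int  # continents on which the key can be reconstructed

    @property
    def spare_holders(self) -> int:
        # Assumed reading of "resilience": holders that can be lost
        # before the key can no longer be reconstructed.
        return self.holders - self.threshold


def acceptable(s: KeyPracticeSummary, min_level: int,
               min_spare: int, min_continents: int) -> bool:
    """Compare a summary against requirements chosen by the buyer."""
    return (s.pki_level >= min_level
            and s.spare_holders >= min_spare
            and s.continents >= min_continents)


vendor_one = KeyPracticeSummary("vendor one", 3, threshold=5, holders=9, continents=2)
vendor_two = KeyPracticeSummary("vendor two", 1, threshold=1, holders=1, continents=1)

# Invented requirement profiles: (min_level, min_spare, min_continents)
talking_doll = (1, 0, 1)
autonomous_vehicle = (3, 2, 2)

for vendor in (vendor_one, vendor_two):
    print(vendor.vendor,
          acceptable(vendor, *talking_doll),
          acceptable(vendor, *autonomous_vehicle))
```

The point of the sketch is that neither vendor is "better" in the abstract; the same numbers pass one requirement profile and fail another.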
Questions? Come to the mic.
SANDOCHE BALAKRICHENAN: We have a lot of time, please say your name and affiliation.
BENEDIKT STOCKEBRAND: Talking for myself. You missed something.
MICHAEL RICHARDSON: Okay.
BENEDIKT STOCKEBRAND: The number of auditors.
MICHAEL RICHARDSON: Ah. So I can have different auditors ‑‑
BENEDIKT STOCKEBRAND: If I wanted to compromise your system I would go for the auditor if there's only one.
MICHAEL RICHARDSON: Okay. How ‑‑ but the auditor is simply telling me whether the organisation is following their processes.
BENEDIKT STOCKEBRAND: When I buy the stuff and the manufacturer doesn't.
MICHAEL RICHARDSON: Oh, so, the manufacturer ‑‑ I see, the manufacturer buys the auditor ‑‑
BENEDIKT STOCKEBRAND: Whoever, yes, yes. And when it comes to the numbers, there's something else: We have certificates and we have certificate authorities and I don't know how many tens of hundreds of thousands of them, but we know from experience that these don't always do their job, so there we have really the problem if any one of them screws up, basically the entire system is broken and as such you might say it is. And on the other hand if you say we have one CA which people really take care about, there is some government or whoever applying whatever pressure they need, do you think that there is sort of a sweet spot where we are ‑‑ the numbers actually make sense and beyond which they start to get counterproductive?
MICHAEL RICHARDSON: I think it's important first of all, to your first point ‑‑ or your second point, we have web PKI CAs that are involved in the CA/Browser Forum and those are one side of things, that's what your web browser trusts. Mostly, to a large extent, in the IoT space we can't use those, the lifetimes that are enforced are too short, the policies are wrong, so in most cases manufacturers, for the purposes of producing a birth certificate or an IDevID, are talking about spinning up their own private PKI in their factory and doing that, and they are talking about the opposite, which is the trust relationship where the software updates happen, we don't need a public CA for that. You could go to one and hire DigiCert to run that for you, but you don't have to, so in that sense, the number of CAs that we are going to have in the future is not, you know, 37 public ones but thousands and thousands of private ones, and so that's what I'm really concerned about, is that part. I think that your point about the ‑‑ the public CAs and they can screw up and issue Microsoft.com, which happened, we know, and that whole certificate transparency problem, that's an interesting problem but it really only applies to IoT devices that exclusively call the Cloud using a public anchor, and it's not always the case, you don't have to use a public anchor for that.
BENEDIKT STOCKEBRAND: Thank you.
SANDOCHE BALAKRICHENAN: Do we have any other questions?
MICHAEL RICHARDSON: Don't be shy. So, the URL is still at the bottom there, I would really love your comments, I really need your comments, essentially what I was told is that there's no point in going forward with this document if it doesn't have a very widespread review so please read it and even if you say it looks good to me, I would love to hear that, especially in a public way, so thanks a lot.
SANDOCHE BALAKRICHENAN: I have a question here. So we were talking about the private PKI and the certificate authorities. There is work going on in the Working Group at the IETF where they are looking at how DNS could be used as a public key infrastructure for IoT. So, my question here is whether, with the certificate authority infrastructure and the DNS as the PKI, it is possible to reinforce the auditing for IoT?
MICHAEL RICHARDSON: No, I don't think so. So, what we are really interested in is not how does the public key get to the relying party, which is ‑‑ which in X.509 is through a series of certificates and CAs, and subordinate DNSSEC records. What we are really interested in here is what care the owner of the private key took to make sure the private key stayed private, so in the device, if we are talking about DANCE with client side ‑‑ client‑side TLS authentication, then the private key is going to be in the IoT device, somehow, right? So, you know, the stupidest way is it's just sitting there in flash and you can just pull the flash out and read the private key out. The other side of it, which is way, way more complicated, is there's an fTPM inside the CPU and it has the key buried three levels down and only the maker of the CPU could ever break it, or you'd have to take the ‑‑ one at a time you'd have to take the CPU apart and literally sand the silicon off, that's what people do, and then they take a picture and sand some more and they can recover the data that way, but beyond ‑‑ those are two completely opposite levels of security, right? And so what we are really asking is, tell us where you are in the spectrum, right, because I don't want my medical key sitting in flash. On the other hand I don't want to pay top buck for my talking Barbie that doesn't really need that much security, right? Any other questions?
SANDOCHE BALAKRICHENAN: Thank you, Michael.
(Applause)
The next presentation is from Peter Steinhauser.
PETER STEINHAUSER: Thank you very much. I hope you can hear me.
SANDOCHE BALAKRICHENAN: You can take ten more minutes if you want.
PETER STEINHAUSER: Let's see how it goes. First of all, thanks a lot, Michael, for your very, very interesting talk, this is about trust and onboarding IoT devices. What I am going to talk about is a bit more on the management level, so I mean, MUD is something we talked about many times in the RIPE meetings already, in talks, in private conversations, and what I am just going to talk about is another approach to solving the missing S in IoT, I think everybody is familiar with that famous talk a couple of years ago. Let's have a look into this.
First of all a disclaimer, so the focus here is more about home, IoT, not industrial IoT. Of course the technologies underlying could be used for industrial IoT as well in any case. Anyways.
In IoT we still have the same challenges. We put a device on the network, the management software still has a life of its own, the device is talking to some places in the Internet, we have no clue, the user has no clue what's going to happen, so those devices are penetrating firewalls, establishing random connections with some servers somewhere and randomly go crazy, even. I think most people don't want that, right?
So, many years ago, some very smart, wise people came up with some ideas and said okay, let's just go and define what a device is supposed to do on the network level. So, they were talking about how to structure this information, which format to use, and of course the source of the device should provide the file, makes sense, and they called it MUD. Well, a nice idea. The name actually was not so misleading because, well, this is how it looks today. Okay. Just kidding. Actually it's a really great idea to do it that way, the format is very simple, it's human readable, kind of a JSON syntax, you can transform it into different formats, so a very flexible approach, and we have an IETF standard for it so it is something everyone can rely on, but we still have the chicken and egg problem here. Because at the end, where is the incentive for the manufacturers? We have MUD, everybody could use it, but which device is shipped with MUD files in the home IoT space? Because everything here is about cost, so we don't have regulations from the governments, there are conversations with them, there are conversations with the carriers, with the ISPs, but nobody is really willing to do something about it. So there is no unified approach to this concept. And maybe, well, let's try to do something else.
Now, I am switching to OpenWrt. So OpenWrt is not really in the focus of RIPE but it became, over the years, a de facto standard for CPEs and home gateways. You don't meet the name very often in the industry, if you talk to chipset makers they talk about their SDK, QSDK, if you look into it it's open [inaudible]. Keep in mind, per year we estimate 200 to 300 million devices with OpenWrt are being shipped. And so maybe this could be a point where we can introduce some technology that can help.
So, as a kind of a side work in the TIP OpenWiFi project, a little piece of software was developed, it's called Unet‑acl, so people familiar with OpenWrt know that these daemons, these micro services, start with a U for micro, net for network and ACL for access lists.
So, Unet‑acl was built to perform a couple of tasks. It can do client detection using DHCP snooping or static configuration, it can do client MAC/IP tracking and, if a client is not recognised, unregistered, it can also enforce that the traffic is discarded. It can enforce per‑MAC bandwidth limits, full traffic accounting, traffic limits, etc.
The whole technology is built upon eBPF, the extended Berkeley Packet Filter. The advantage of using eBPF is that you technically have zero cost, especially on small devices this is a very important factor, and it's a very, very flexible concept. Originally, the goal of Unet‑acl was to do captive portals. The best example is, I am not sure if you are familiar with this, we have had [inaudible] for 15 years, it was something everybody was using, the whole implementation was a little bit outdated so even on a gigabit line you maybe get 40 megabits throughput, which is not really satisfying. With Unet‑acl you get, on a gigabit line, the theoretical 900 megabits per second roughly, which is what you want to achieve. It can be used for parental controls, and the developers came up with the idea that we could also do something about this for IoT. So Unet‑acl is able to track interfaces and then apply rules and classes that can be attached. So with these classes, you can rewrite the egress, the outgoing interface, you can rewrite the destination MAC, add fwmarks and remove VLANs.
So, those client mapping rules then can be used for any protocol/port, destination IP, etc., any combination, and if you look a little bit at what you can do with it, just switching back, so protocol/port, destination IPs, DNS snooped, this actually looks like a MUD file, and in a MUD file you have the set of descriptions, what a device is supposed to do, what it is supposed to talk to, so why not use the technology of Unet‑acl to work with MUD?
So the implementation currently for Unet‑acl is in the works, it's ongoing. The first step is that you can say okay, we have a certain device class, we have a certain device and we have a matching MUD file, you simply use the MUD file, it applies the rules. So the device can only work in the context that was originally ‑‑ originally defined in the MUD file. But there is also an important other aspect of it, because Unet‑acl can also do the monitoring, and the monitoring part is really, really important. We had some conversations with ISPs about network health, and if you look at the latest attack vectors botnets are using, the critical question is always becoming aware of something going on. So, ISPs and carriers tend to do central management of devices, and also OpenWrt or OpenWrt based projects like prpl or TIP, they have central management and central monitoring telemetry. And imagine there is a new attack vector for a certain IoT device type and now the CPE recognises there's something going on that's not meeting the specification, what the device is supposed to do, then it can send a telemetry event and say there's something weird going on. If that only happens for one customer, not a big deal, but what happens if the same thing happens at hundreds, thousands of customer places? Then, the network operator can take this information and say okay, we have an attack vector, we have to look into it and act, and not, let's say, after two or three days when the network starts to get a bit flakey, then they start investigating it.
So, this is one of the capabilities, and another capability is also that for devices that are not registered in a network, you simply discard the network traffic, so the device is there, it's kind of in the network but it cannot do anything. Okay. This is something, I would ‑‑ I will talk with Michael about it very shortly, we need some more discussion because of course in the device onboarding phase, something needs to enable the device at least to communicate for a certain time, otherwise the device says okay, I cannot reach my gateway, I cannot reach my control server, so it just suspends activity and says okay, start over again. All right.
Anyways. With that we are close, but still no cigar. Because as we ‑‑ as I said in the beginning, the challenge here is the chicken and egg problem, so we need MUD files but we don't have them, especially in the home user field. So, the developers said okay, let's do something we call MUD auto learning. There are several projects that are already doing this but we wanted to bring it into OpenWrt so it's available to everybody who is using OpenWrt as a base. Technically it's quite simple. So Unet‑acl is monitoring the traffic pattern of an unknown device for a defined time frame, this is still something we are working on, what time frame is necessary, we will connect with other research facilities to find the right parameters for that. Nevertheless, after the learning phase, you have the output of what the device was doing in this time frame, and this, then, you can export as a MUD file or in any other file format you want. And this you can then use as a foundation to build databases of devices. Is this the ultimate cure? Of course not. I mean, this is technology, which can be used for something but, at the end, what we need is a community, it's people who say okay, we want to use this technology, we want to collect data, we collaborate together to build those device descriptions. Fortunately there are many people working on that so I am positive that this could be a step forward.
Maybe also as a sidenote, since this technology is getting into, for instance, the TIP OpenWiFi project, there is at least some industry interest to get those things accelerated.
And that's my very, very little, short intro into Unet‑acl. So questions?
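The idea of using a device description as an allow-list can be sketched in Python. This is a simplified illustration only: the `Rule` and `Flow` types, the `allowed` function and the example names are hypothetical, the rule format is not the MUD (RFC 8520) syntax, and Unet‑acl itself enforces rules with eBPF in the kernel rather than with code like this.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Rule:
    protocol: str  # "tcp" or "udp"
    port: int      # destination port
    dst_name: str  # DNS name the device is expected to contact


@dataclass(frozen=True)
class Flow:
    src_mac: str
    protocol: str
    port: int
    dst_name: Optional[str]  # name learned via DNS snooping, None if unknown


def allowed(flow: Flow, profiles: Dict[str, FrozenSet[Rule]]) -> bool:
    """Default deny: forward only flows the device's profile describes."""
    rules = profiles.get(flow.src_mac)
    if rules is None:
        return False  # unregistered client: traffic is discarded
    if flow.dst_name is None:
        return False  # destination could not be tied to a known name
    return Rule(flow.protocol, flow.port, flow.dst_name) in rules


# Hypothetical profile for one registered power plug
profiles = {
    "02:00:00:00:00:01": frozenset({
        Rule("tcp", 443, "updates.vendor.example"),
        Rule("udp", 123, "ntp.vendor.example"),
    }),
}

ok = Flow("02:00:00:00:00:01", "tcp", 443, "updates.vendor.example")
odd = Flow("02:00:00:00:00:01", "tcp", 443, "tracker.example")
unknown = Flow("02:00:00:00:00:99", "tcp", 443, "updates.vendor.example")

print(allowed(ok, profiles), allowed(odd, profiles), allowed(unknown, profiles))
```

A flow that fails the check is also exactly the kind of event a gateway could report upstream instead of silently dropping.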
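The operator-side reasoning, one customer is noise but the same deviation at many customers is a signal, can be sketched as a simple aggregation. The event format, the function name `widespread_violations` and the threshold are assumptions made for illustration; the talk does not describe how the telemetry is actually structured.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical telemetry event: (cpe_id, device_type, violation)
Event = Tuple[str, str, str]


def widespread_violations(events: Iterable[Event],
                          min_cpes: int) -> List[Tuple[str, str]]:
    """Return (device_type, violation) pairs reported by at least `min_cpes` distinct CPEs."""
    seen: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for cpe_id, device_type, violation in events:
        # Count each CPE once, however often it repeats the same report.
        seen[(device_type, violation)].add(cpe_id)
    return sorted(key for key, cpes in seen.items() if len(cpes) >= min_cpes)


events = [
    ("cpe-1", "camera-x", "tcp/23 to unknown host"),
    ("cpe-2", "camera-x", "tcp/23 to unknown host"),
    ("cpe-2", "camera-x", "tcp/23 to unknown host"),  # repeat from the same CPE
    ("cpe-3", "camera-x", "tcp/23 to unknown host"),
    ("cpe-1", "plug-y", "udp/53 to unexpected resolver"),
]

print(widespread_violations(events, min_cpes=3))
# [('camera-x', 'tcp/23 to unknown host')]
```

Counting distinct CPEs rather than raw events keeps a single noisy device from triggering an alert on its own.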
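The auto-learning step can be pictured as collecting the distinct destinations seen during a learning window and exporting them. The sketch below assumes a hypothetical observation tuple and a function `learn_profile`; the output is a plain JSON list for illustration, not a valid MUD file, and the one-day window is only the example figure mentioned in the discussion.

```python
import json
from typing import Dict, Iterable, List, Tuple, Union

# Hypothetical observation: (timestamp_seconds, protocol, port, dst_name)
Observation = Tuple[float, str, int, str]


def learn_profile(observations: Iterable[Observation],
                  start: float,
                  duration_s: float) -> List[Dict[str, Union[str, int]]]:
    """Collect the distinct (protocol, port, name) tuples seen in the window."""
    end = start + duration_s
    seen = {
        (protocol, port, name)
        for ts, protocol, port, name in observations
        if start <= ts < end
    }
    return [
        {"protocol": protocol, "port": port, "dst_name": name}
        for protocol, port, name in sorted(seen)
    ]


DAY = 24 * 60 * 60
observations = [
    (10.0, "tcp", 443, "updates.vendor.example"),
    (500.0, "udp", 123, "ntp.vendor.example"),
    (900.0, "tcp", 443, "updates.vendor.example"),   # duplicate, counted once
    (DAY + 5.0, "tcp", 443, "late.vendor.example"),  # after the learning window
]

profile = learn_profile(observations, start=0.0, duration_s=DAY)
print(json.dumps(profile, indent=2))
```

Anything the device only does after the window, such as a weekly update check, is missing from the learned profile, which is why the length of the learning phase and later re-learning matter.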
SANDOCHE BALAKRICHENAN: Thank you, Peter. So are there any questions for Peter?
MICHAEL RICHARDSON: Tell me a bit about how ‑‑ how long you are watching the traffic and how do you come ‑‑ become confident that you have seen all the behaviours that you are going to see?
PETER STEINHAUSER: That is the real tricky question about it, yeah.
MICHAEL RICHARDSON: Is it possible to relearn, or what happens when there's an alert?
PETER STEINHAUSER: At the moment the concept is about, let's say we, for instance, we monitor the device for a day and what happens during this day is used as, let's say, initial collection of information.
MICHAEL RICHARDSON: Right.
PETER STEINHAUSER: I mean, the idea is just to have something to start from. So even, let's say, a home user, he gets his new, the famous example actually, the ‑‑ his new Samsung TV and he says well, does this Samsung TV need to talk to Facebook, really? I just want to watch TV, right? And one approach here is to say, okay, the user can say okay, this device does a lot more things than I want it to do so I start the auto‑learning mode, then I get a user‑friendly representation, let's say talks to Facebook, does this and this, and the user can say okay, I don't want this, I don't want this, this ‑‑ but this would be an individual approach. But of course you are totally right, in case the behaviour changes, I mean the behaviour change can have a good reason, and this is ‑‑ this is actually, in general, for MUD files and for these device descriptions the interesting part: How can we maintain them? It's not only about creating the files, but we have to maintain them, we have to see what happens, iterations, ports can change, addresses can change, there are many, many things that can happen.
MICHAEL RICHARDSON: So that's the user focus, the home user focus side of this, but you are also doing work on the ISP side of things, you said something about ISPs would get alerts and that scares me, I think it scares, I suspect it would scare ISPs because false alarms are so expensive and so do you have any ideas where ‑‑ how that's going to evolve?
PETER STEINHAUSER: ISPs, frankly speaking, are a very, very interesting field. I had several talks in the last weeks and, the bad news is, most of the ISPs still have no clue how to approach IoT devices. They have no clue on the technical side how to do things, what they want to achieve. On the marketing side they don't know how to monetise it, so to me, frankly speaking, it looks like we should tell them what they can do with it, which is not our job actually, it's not our job.
MICHAEL RICHARDSON: Someone has to have an opinion is what you are saying.
PETER STEINHAUSER: Exactly.
MICHAEL RICHARDSON: We might as well form an opinion and they can disagree but at least we are having a conversation at that point.
PETER STEINHAUSER: Right, right. It's a really good topic, Michael. I would suggest ‑‑ I mean we did in the past the first BCOP document of the RIPE IoT Working Group, and it was a good starting point, like a collection of ideas, things, things we have, but maybe we should really do a second iteration of the document or maybe another document, which is like focusing on ISP use cases, to give them an idea.
MICHAEL RICHARDSON: I agree with you. I was going to say exactly that, maybe we should have a new document so we are on the same thought here. Wonderful.
MARTIN WINTER: Sorry I missed some of the earlier discussion, can you tell me what's the current state actually if someone wants to try experimenting with this, in what shape is something available these days?
PETER STEINHAUSER: So the Unet‑acl itself is already there, I can point you to the ‑‑ the IoT part is currently in implementation ‑‑ I think we will have a first running version at the end of the year.
MARTIN WINTER: Is there experimental code that's going from your own experience?
PETER STEINHAUSER: This is still under implementation, but I can connect you with the head of development who is doing this part of work. I think it's going to be available in the next six or eight weeks.
SANDOCHE BALAKRICHENAN: Any other questions for Peter? So I have one for you. From AFNIC. Do you have an idea of what the percentage is of IoT devices that are being implemented using OpenWrt?
PETER STEINHAUSER: IoT devices using OpenWrt, I haven't heard about those, because I think the resource requirements are way too high for a common IoT device. I mean it would be possible to run it on a surveillance camera, for instance, the hardware would be powerful enough, but for the typical types like power plugs, sensors, etc., OpenWrt is not the right software platform.
SANDOCHE BALAKRICHENAN: It would be used in the gateway end?
PETER STEINHAUSER: Yes, it's in the gateway, that's the main purpose.
SANDOCHE BALAKRICHENAN: Okay. Thank you.
PETER STEINHAUSER: Thank you.
AUDIENCE SPEAKER: I work at SURF. I mainly hear you talking about ACLs, so source and destination IPs. Have you taken DNS resolvers into account?
PETER STEINHAUSER: The concept of Unet‑acl is also covering DNS‑based requests, so you can also say okay, we are also learning which DNS addresses the device is contacting, to create the profile and also to enforce rules.
MICHAEL RICHARDSON: May I also answer that? So the question of devices that you want to write MUD ACLs on DNS names for is a complicated one and there's an IETF draft for that and I would love your comments on it, but fundamentally it amounts to: you'd better make sure your IoT devices do not do DoH to Google or Cloudflare because then you can't learn what names they looked up and, more importantly, when you look up the names in the ACL, you might get a different answer than they did, and then your ACLs don't work and you get false positives and the ISP bitches and MUD gets turned off. So that's the kind of scenario where we wind up back where we are now, right?
So, this is sort of the, you know, the negative side, it's wonderful for personal privacy and it's unclear if my ‑‑ use local because that's way more private than anything else, the question is what can we do to make local DNS so reliable that the IoT vendors would never think about going outside of the home for DNS, right? And well ‑‑ you know, I think there's a bigger discussion there but it does impact us, right, and you can't do malware ‑‑ most of the malware, what are they called? Network nannies that look for malware on your network, most of them are based upon looking at the DNS requests. You know certain requests are for malware command and control channels and when you see that you know the device has been compromised, but if you can't see that data because it's all encrypted to someone else, then there's a problem, right?
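The mismatch Michael describes can be illustrated with a minimal sketch. It is not taken from the talk or from any MUD implementation: the host name, the addresses (documentation ranges) and the helper names `build_acl` and `is_allowed` are all hypothetical, and the two dictionaries stand in for what the gateway's resolver and the device's DoH resolver might each return for the same name.

```python
# Hypothetical resolver views: the same name, two different answers.
GATEWAY_DNS = {"updates.example.com": {"192.0.2.10", "192.0.2.11"}}
DEVICE_DOH = {"updates.example.com": {"198.51.100.7"}}  # e.g. a CDN answering differently


def build_acl(allowed_names, resolver_view):
    """Expand a name-based allow list into the set of IP addresses to permit."""
    acl = set()
    for name in allowed_names:
        acl |= resolver_view.get(name, set())
    return acl


def is_allowed(dst_ip, acl):
    """Return True if the destination address is in the expanded allow list."""
    return dst_ip in acl


allowed_names = ["updates.example.com"]

# The gateway expands the ACL using its own resolver...
acl = build_acl(allowed_names, GATEWAY_DNS)

# ...but the device connects to the address its DoH resolver handed out.
device_dst = next(iter(DEVICE_DOH["updates.example.com"]))
false_positive = not is_allowed(device_dst, acl)

# If the device used the same (local) resolver, the lookup and the ACL agree.
local_dst = next(iter(GATEWAY_DNS["updates.example.com"]))
consistent = is_allowed(local_dst, acl)
```

Here the legitimate connection is blocked only because the two resolvers disagreed, which is the false-positive case; when both sides use the same local resolver the ACL and the traffic line up.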
SANDOCHE BALAKRICHENAN: Thank you, Peter, and a round of applause for Peter. The next presentation will be remote, it's from Nicholas Allot, on the security of IoT gateways, highlighting the need to address them.
NICHOLAS ALLOT: I hope. First I will start with apologies, I had hoped to be there in person but various travel constraints meant that wasn't really possible.
Hopefully what I'm going to present today is going to dovetail quite nicely with what Peter and Michael have presented on, and in some sense the entirety of what I am going to talk about is some sort of glue that sits between some of those problems. I am from NquiringMinds; fundamentally what I'm representing today is the collective work of a number of organisations that have been running under the ManySecured project that sits under the Internet of Things Security Foundation. There's approximately about 15, 20 organisations affiliated to that at the moment, some are ISPs, some router vendors and some technology designers. Fundamentally the subject of my topic is securing IoT gateways and what I am going to try and do is dig into some of what the challenges are and the approaches we are taking and starting to take to try and solve those.
So, in summary, the ManySecured project takes the contentious approach that device security is good but gateway security is better and that's for one very pragmatic reason: There are a lot of very poor quality IoT devices already out there. You can come up with some sort of ideas for creating the end point, some of them have the ‑‑ one excellent method for doing that but you have got to be practical in terms of when that's going to hit the market and also managing the transition phase between having some IoT of low end points and some are high end point addressing. We see addressing the gateway as important ‑‑ it's certainly complementary to the end point strategies and eminently practical and potentially deployable in the near term without requiring a massive number of IoT devices.
Pretty much everything I am going to be talking about is from the specific context of that ManySecured tech stack, so that's a tech stack, like an emerging standard in draft form, it's available on the website but there's also an Open Source stack that's fully accessible which embodies a lot of these concepts. I am going to be talking about a lot of concepts and I think the key point I would like to make is, although everything I talk about is manifest in this stack, some of the ideas can be peeled off sort of like as individual packages, you don't need to swallow the whole beast, you can bite it off one chunk at a time if you believe the technology merits it.
And why routers? It's a bit of old news now, there's an infinity of these types of reports on the Internet, but certainly one recent Symantec analyst review estimated about 75% of infected IoT devices are really caused either directly or indirectly by the router. So that fact alone is probably sufficient to generate the gateway focus. But additionally, it is a very, very attractive attack target: any attack on the router generally can be done from the outside, can be issued at a low economic cost, and is massively scalable, so it is a device that is empirically being attacked at the moment and will continue to be attacked because of the ease with which it can be done. But then there's the inverse reason which is okay, not only is it an attack target in its own right, it is a line of defence: the firewall and the other defensive methods the gateway has are a line of defence for a whole host of other devices sitting behind the gateway or router. And that's very much why we have taken sort of the ManySecured branding, it's to try and capture this concept, this single thing is responsible for the protection of all the devices that sit under it. There's a positive side of it as well, so there's the negative side which is okay it's going to be attacked but there's the positive side as well and it's a bit like Peter was just saying, it is a potential line of defence that can be strengthened, it has visibility of a lot of the activity and has the ability to contain some of these ‑.
So challenges. I am going to skip ‑‑ there's quite a lot of these but I am going to try and synthesize them into a few simple ones that's easy to capture.
The first one is a whole massive problem in and of its own right and that's one of the problems we started with and I want to introduce it to make sure people understand it. One of the big problems we have with IoT security is that at the moment it is virtually impossible to practically secure the access between the browser and the IoT device when it's sat on the internal network. Fundamentally, go to 192.168.0.1 and have a think about how you secure that, it's really a lot harder than it first appears. And why is this problematic? It's problematic because almost every IoT device and almost every router recommends that this is the method by which you configure and bootstrap your device on to the network so your initial touch point for the user is fundamentally insecure because it's telling you to go over a website that has that certificate and therefore anyone in that internal network can essentially ‑ the traffic and the passwords. There are alternatives like installing certificates in browsers but there's a whole load of reasons why that's not a good idea, but the pragmatic alternative people are taking is essentially issuing a new application via the App Store for each new device or router you want to configure and there's all sorts of reasons why that may be a bad idea both security‑wise and, more impactfully, system‑wise. This is certainly a problem, and we will come back to it in a moment, which is quite a fundamental foundational issue to do with the IoT and the gateway interaction.
And just as evidence, I picked up the BT router, you can pick up any one: go to the "how do you configure it" instructions, see what it tells you to do, and answer some questions.
The other question is and I think the stuff that both Peter and Michael talked about are different dimensions of this which is if I want to use the router as a method of both detecting and protecting the security I have got one fundamental tricky question: Where are you? What is that device at the end of the IP address? How are you behaving? Do you touch some... of the unit stuff and what can I do about it? To some extent this challenge in the macro sense is what we are trying to address with the sort of the ‑ stack but in quite an expansive way.
I will just throw in a few other nuances but they are really important because these are sort of, as far as we are concerned, really foundational challenges that really change the flavour of the practical solutions you can produce. One, heterogenous networks, not everything in IoT is IP based, there's a few mainstream ones on the market but certainly when you start going industrial the practical problem you have got if you want to contain security across the IoT ecosystem is you need support for quite complex network topologies but also quite a wide variety of fundamentally different transports and network types.
Second problem is, and I think the last presentations or certainly Peter's, touch on this is okay, where are you getting your truth from. I will come back to this later. You are taking a MUD statement, where is that statement coming from and how can you trust it and the hard reality is well, it's a pretty fluid thing, there is no single source of truth and a practical solution needs to be able to reason under a reasonable degree of uncertainty and to some extent the final point echoes or reinforces that. Certainly in the consumer space, it's also true in the industrial space. The half‑life of the device is probably longer than the half‑life of the manufacturer on average. There's a lot of devices you can buy where the manufacturer has ceased to trade, and, therefore, any systems that require an operational flow that goes back to the manufacturer are substantially problematic, but also you need to right from day one start thinking about legacy support, how do you support devices for which the manufacturer is not providing information because it doesn't want to do so and how do you support devices for which the manufacturer has not provided information because it isn't there any more.
And just to pick up, this is a slide I think I originally did about three years ago but it just ‑‑ this is just looking at the typical consumer roll‑out in terms of what ‑‑ what a common or garden topology looks like. You can spend about 20 minutes on this slide alone but the big takeaway for this is, it's a lot more complicated than you might first think, there's a lot of routers and extenders, there could be multiple subnets on different routers and the big one is what we put on the protocol bridges, basically your standard Echo Plus, which are things you can buy in shops now, are protocol bridges, they are part of the IoT extended network, they are talking largely non‑IP traffic, transcoding it on to the IP network and sometimes that traffic goes to the Cloud and sometimes locally, and it's quite complicated. Also the other thing to draw attention to, the one in the middle, the web user app: any line going from there to a device on the Internet is problematic because it's really hard to secure.
Before I go to the technical meat of it, I am going to do a slight pseudophilosophical distraction into what do we think the fundamental problem is here and what part of the implicit assumption of how we are approaching it is what we need is a theory of types, what do we mean? Let's just change perspective and think I am going out to the woods and living a subsistence existence and I need to survive, what am I going to eat. One of the things is mushrooms, fungus, as you walk in the woods you see and smell ‑‑ its taste, its colours, its smells, and what you are perceiving is the evidence that an individual instance of a thing is presenting to you. But if that's all you know it's very hard to survive because your fundamental challenge here is working out which are going to kill you and which aren't going to kill you. You need a theory of types, you need to take this infinity of abstract instances and construct a taxonomy, a set of methods of classification whereby you can start to delineate what's poisonous versus what's not, and that is a theory of types for the taxonomy but pragmatically this is what happens, you need some sort of flowchart to be able to classify your individual instances based upon the phenomena that you see into the type with some sort of ideally risk measure because it's never perfect. There's an inverse problem as well, in a worst case scenario you are poisoned, you need an inverse bit of logic that can work out from the poison symptoms, go back to see what the threat was.
It's a little sort of abstract little cul‑de‑sac to go in and what we contend is all of those steps that you need when you are dealing with fungus, you basically need the same thing dealing with IoT devices fundamentally and really the linchpin of almost everything that I am going to talk about from now on is this fundamental distinction between a device instance and a device type, so the device instance is physical, it's an observable event but it's ephemeral, it's there today, gone tomorrow. But a device type is conceptual, it's not really described by individual events, but it's perpetual, it lasts forever. One of the fundamental challenges as you are trying to monitor and maintain the security of the device is being able to manage the map between instance and type in a practical way that recognises that the information which you are being presented with can change over time and sometimes isn't necessarily fully trustworthy.
So I am going to jump right in. What is it we have done? I will bubble back out of the technical detail and explain how it's built up.
So what we have really done is identified the simplest sort of three standardised interoperable interfaces that we think we can define that allow us to build these architectures at scale and these are the ones with little orange lines on, DB 3, B 3 life cycle. We start with the router or gateway, and there's basically an outbound line and inbound line. So the outbound line is what information does one or multiple routers have to send upstream to make the risk calibration exercise a tractable problem? It could be a full PCAP but that's probably too much and it's too heavy and too much data going over the wire so pragmatically it's going to be some subset of that that needs to be moved northbound [[inaudible]] and then the inverse of it is what can we do about it? We just ‑‑ it's basically a finite set of remotely attributable APIs that allow you to contain or disable the device, or the inverse, onboard the device. It can be more complex than that but we are trying to be as simple as possible. To some extent the stuff Peter was talking about, which is the Unet, is an implementation detail of what could happen inside the router to actually manifest and make those two APIs tractable. But there are plenty of other ways of actually ‑‑ the way we have, is we have tried to keep it as agnostic as possible from the actual underlying implementation technology, so the control plane, you could implement it with an eBPF filter, with a firewall, you could implement it with a completely separate router, so we are trying not to lock down how it's done but manifest and define the abstract interface that needs support.
What's happening is happening you can call that the brains, this is the thing that's reasoning on what it sees, constantly changing the risk profile of the individual devices that it sees and possibly recommending an intervention. So you have got these like three layers here. Cognitive reasoning, this is a very fancy way of saying it's a set of rules that is based upon what it's seeing in terms of the instance and what it knows in terms of the theory of types and any history it knows about the device: based upon all that information what can I infer? And then I think Michael touched on this in one of his questions, so what do you do about it and how do you deal with the false positives? To some extent we don't have an answer to that problem but we provide a framework in which it's possible to adapt many solutions. There's an autonomous response layer: if you are inferring something has a risk level N and you have a high degree of probability, you can decide to automatically intervene and lock down; if you don't have that confidence you can, via the interface we have defined, also escalate those incidents, issues up to [[inaudible]]
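The response layer described here (intervene automatically when confident, escalate otherwise) can be sketched as a small decision function. This is an illustration only, not the ManySecured interface: the `Action` names, the thresholds and the `choose_action` signature are assumptions made for the example.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"            # risk too low to act on
    ESCALATE = "escalate"    # hand the incident to a human or upstream system
    LOCK_DOWN = "lock_down"  # contain the device automatically


def choose_action(risk_level: int, confidence: float,
                  risk_threshold: int = 3,
                  auto_confidence: float = 0.9) -> Action:
    """Pick a response from an inferred risk level and the confidence in it.

    Both thresholds are hypothetical tuning knobs, not values from the talk.
    """
    if risk_level < risk_threshold:
        return Action.NONE
    if confidence >= auto_confidence:
        return Action.LOCK_DOWN
    return Action.ESCALATE


low_risk = choose_action(risk_level=1, confidence=0.99)
confident = choose_action(risk_level=4, confidence=0.95)
unsure = choose_action(risk_level=4, confidence=0.5)
```

Keeping the escalation path separate from the automatic one is one way to limit the cost of false positives: only high-confidence inferences change the network without a second opinion.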
On the right‑hand side, this is where all of the real intelligence comes from. So basically, going back to what I said at the very start, a lot of what we are trying to do here is reason sensibly, intelligently about type, how do we do that? Basically, there's a whole load of stuff we can use. We have a formal description of types, of firmware, vulnerability databases, device hierarchies and then also here with the MUD file, where that comes in there and we have got a more sophisticated MUD file that I will talk about in a minute. So all of this data is data which to some extent already exists, there are ecosystems out there but actually you can really see they are all basically from a database perspective following on one thing and that is your firm understanding of what that type is. And to some extent, to really simplify it down and break it down into something pragmatic: MUD files are a great idea, they describe the restrictions you can put on a device, but where is that data going to come from? You can see this as a method of implementing fuzzy MUD, a method of softly inferring what that instance is, looking up the constraints that should apply and allowing the gateway to act on the information. So whereas the standard MUD definitions are relatively tightly coupled, we can basically loosely couple it based on inference [[inaudible]]
The second blue box to the right is distinct from the first because the information in the first box really relates to type. All the information in the second box relates to instance. The purchasing, the servicing, provisioning, transfer of ownership. However, there's a lot of information, if you know what type your instance is, that you can look up in the box underneath that fundamentally helps with all of those exercises, because most of those exercises, onboarding, monitoring etc., are essentially dynamic risk assessments, and what do you need to do to do that risk assessment? I need to look at what you are and your history of what you have done and what you came to be and does that relate to other instances of a similar type. That is in abstract how the whole thing fits together. So to some extent, it's ‑‑ Peter has talked about the MUD file so the way that we use the MUD files in ManySecured, or specifically the D3 statements which are the foundation of it, is: we take the MUD description, we encapsulate it and then what we actually do is we digitally sign it with provenance. Why? Because it's not necessarily the manufacturer's description of the use, it might be somebody else's description of the use and by signing these statements and being able to assemble them enables us to quite fluidly [[inaudible]]
How do we know what type you are? So I see an instance how do I know what type it is? Could be done using ‑‑ got slightly different method of doing it, which is basically certificate‑based. FIDO has also got a method of tying an instance to type, all certificate based, all subtle nuance to deviations from each other, and the reason that the system is designed the way it is to be able to embrace all those different sources of truth into one view, but also, pragmatically, solve that same problem of what type are you using other methods. It could be ‑‑ what did you think I was when I was there or it could be a purchase taker or you could have some fancy AI that infers it from the behaviour, the point is every single one of those different methods has a different sort of trust chain, a different truth but the statement is always the same, the statement is always a qualified statement: I think this instance of the ‑ device is this type with this level of certainty.
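The qualified statement described here ("I think this instance is this type with this level of certainty") can be sketched as a small record plus one possible way of combining statements from several sources. This is a hedged illustration, not the D3 schema: the `Recognition` fields, the `urn:example:` type identifiers, the source names and the trust weights are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recognition:
    instance_id: str   # e.g. a MAC address seen by the gateway
    device_type: str   # identifier of the claimed type
    certainty: float   # 0.0 to 1.0, as asserted by the source
    source: str        # who is making the claim


def best_guess(statements, source_trust):
    """Return (device_type, score) for the most credible claimed type.

    Each claim is scored as certainty * trust in its source; unknown
    sources get zero trust. Returns None if there are no statements.
    """
    scores = {}
    for s in statements:
        score = s.certainty * source_trust.get(s.source, 0.0)
        scores[s.device_type] = max(scores.get(s.device_type, 0.0), score)
    if not scores:
        return None
    device_type = max(scores, key=scores.get)
    return device_type, scores[device_type]


MAC = "02:00:00:00:00:01"
statements = [
    Recognition(MAC, "urn:example:type:camera-v1", 0.99, "certificate"),
    Recognition(MAC, "urn:example:type:doorbell", 0.8, "user"),
    Recognition(MAC, "urn:example:type:camera-v1", 0.6, "behaviour-inference"),
]
trust = {"certificate": 1.0, "user": 0.5, "behaviour-inference": 0.7}

guess = best_guess(statements, trust)
```

The statement shape is the same whatever the source; only the trust attached to the source differs, which is the property the talk emphasises.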
I think the software bill of materials and ‑‑ the interesting thing about them, let's take CVE: go to the register and there's a field that says what device type are you. If this project does nothing else but actually manages to provide a universal language or statement of types that allows CVE statements to be resolved, I think that would be a massive bonus because that allows the system, the gateway or the gateway management system, to be able to read about the devices on its system based on the current name, in a very flexible way that works pragmatically both through multiple competing future standards but also legacies [[inaudible]] as well.
So, building blocks. How do we actually do it? I don't have time to go through the full details. Everything is specified in sort of draft form on the ManySecured website. Fundamentally everything we have done is built on the W3C verifiable credentials model and it's basically a form of signature, it's a signed statement, but it's a signed statement that doesn't have a universal root anchor so it is ideal for the problem that we are talking about, where there are potentially multiple sources of truth. And then over and above those verifiable credentials, and we have one statement for declaring firmware and another one for declaring web files etc., what we try and do is make sure they interoperate: when they talk about types and instances we use an identity that will always and consistently resolve uniquely.
We get to, and I think this was touched on in Michael ‑‑ in Peter's presentation but where does the information come from to some extent and how do you trust it?
What we tried to do is not lock down a fundamental method. We haven't tried to say there is only one single source of truth, but what we have tried to do is create a set of standards where each ‑‑ each ought on I can statements can be made in an interoperable way so we define the schema, and that can be signed in an interoperable way so we can define the trust face. There will be one or more of what we call D3 aggregators emerge and the method of assembling it, it could be algorithmic, where a community of people come to a consensus view over which statements are more qualified than others; it basically assembles all the D3 statements into a single view of the truth where we know everything links together under a common schema and that could be presented to the gateway and the gateway itself could take its sources from other ‑‑ ‑ it's a very flexible method in terms of we are not insisting there's only a single source of truth in the originating data but what we are trying to do is define the mechanical methods by which [[inaudible]]
There are actually lots of use cases for these D3 statements. The first and most obvious is the type assertion: this physical device, this physical machine exists, is this physical set of components which has got this firmware running on it at any moment in time. In each case you publish a URI, it's got a public key identifier and that defines your types, and then we can also define the hierarchy of types where we believe some types inherit properties from another: all devices that actually use the same firmware, which could be many physical instances, may share behaviours and vulnerabilities etc. So it's imperative we can
Recognise a type. This is I as a human being or I as a gateway think that this MAC address relates to this specific type, this well known type which is a reference to UC 1 above. And the design of the system is such that that recogniser of events can be issued from many sources. It could come from a FIDO certificate or it could come from the user just saying I actually think it's one of these.
Then use case 3, that's essentially the MUD statement and actually what we have done here, we have absorbed the best of what's out there already but coupled it using this framework. I will just talk about use case 4, which is like an advanced MUD. So what is a MUD? Essentially it's a form of a white‑list, an allow list, but actually there's a lot of things you can do in terms of qualifying that behaviour: you can use statistics, you can qualify by velocity, by volume, but actually these descriptors, we are not intending on standardising, but we are intending on providing the language wrappers for people to express them and to be able to use them. By folding it into the same schema allow the ‑‑ [[inaudible]]
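As a rough illustration of what "qualifying" an allow list by volume could look like, the sketch below adds an optional hourly byte budget to a plain allow-list rule. It is not the MUD format nor the ManySecured descriptor language; `Rule`, `check`, the host names and the limits are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    host: str
    port: int
    # None means a plain allow-list entry with no volume qualification.
    max_bytes_per_hour: Optional[int] = None


def check(host, port, bytes_last_hour, rules):
    """Classify a flow as 'allowed', 'over_volume' or 'not_listed'."""
    for rule in rules:
        if rule.host == host and rule.port == port:
            limit = rule.max_bytes_per_hour
            if limit is not None and bytes_last_hour > limit:
                return "over_volume"
            return "allowed"
    return "not_listed"


rules = [
    Rule("updates.example.com", 443),                        # plain allow
    Rule("telemetry.example.com", 443, max_bytes_per_hour=50_000),
]

normal = check("telemetry.example.com", 443, 10_000, rules)
chatty = check("telemetry.example.com", 443, 5_000_000, rules)
unknown = check("tracker.example.net", 443, 100, rules)
```

A destination that is on the list but suddenly sends far more than expected is flagged separately from one that was never listed, which is the kind of distinction a plain allow list cannot express.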
And hopefully it will start to become apparent how this stuff can be assembled to address some of the challenges that Michael and Peter were talking about. The first question is: How long do I need to observe something to be able to write a rule? I have got a bigger question: How many different devices declared they are the same type do I need to observe before I can actually consistently infer the behaviour?
And hopefully you will see that the individual building blocks give you everything you need to be able to make that reason in a qualified way, it gives you a different method for different organisations to sharing that data very practically between each other, and we have also locked down the evidence base which is the ‑‑ will be able to make those [[inaudible]]
So, fundamentally, so where are we? There's a website which is public and accessible, the specs are there in draft form and there's a GitHub from which those results are automatically generated and any member of the community can actively contribute to that. But we have also got an Open Source gateway reference, it's an OpenWrt package, it does implement most of the sort of technologies that you can actually see in the bounce back so we are trying to iterate those against the implementation in quite clear iterations. And fundamentally, it is quite ambitious, it does sometimes feel like we are trying to ‑‑ hopefully I think we are addressing some of the really underlying challenges to do with the trustworthiness of the data and the into the future looking specs. And this is relatively ‑‑ we have still got a long way to go and got a lot of stuff out in the public domain already. The final thing I will mention, partly because I am interested in gathering more opinions, we are in the midst of a virtual IoT cyber lab and really what is that, it's a method by which different organisations, whether they be research organisations or ISPs or ‑‑ can practically collaborate on the data, the behaving data to be able to inferring these are practical and trusted.
So that's pretty much all I am going to say. There are a lot more slides in there with more detail for people who are interested, I will make sure they are shared and available and I am happy to take questions both online and off‑line.
SANDOCHE BALAKRICHENAN: Thank you, Nicholas. It was a very good presentation. We have time for questions.
PETER STEINHAUSER: Nicholas, those D3 aggregators, those to me from what I am working on are a very, very interesting part, so are there already aggregators available for research reasons you could work with? And what's the concept about the aggregators? Are those like community‑funded provided or is there a commercial background, are both models possible?
NICHOLAS ALLOT: What we tried to make sure the are ‑‑ I know the minute you do that, you are tying your asset to a post ‑‑ the commercial thing is complicated. The current interim provisional stakeholder that has asserted an interest in being a D3 aggregator is the Internet of Things Security Foundation, that is obviously where this technology was originally housed, but there is nothing in the specifications that insists on that. What I would imagine, because I think we ‑‑ if we just look at everything I have just described, but only from the perspective of MUD files, we have got a massive bootstrap problem we need to deal with. How do you get enough MUD files to be pragmatic? I can see this happening in phases, where it might start with an Open Source, crowdsourced community initiative that would need some sort of mediation, moderation and can almost run as an Open Source project, albeit there may be funding behind it, I don't know, but as things evolve ideally what you would want is the manufacturer to be the trusted source of that information, and then probably you need slightly more commercial aggregators in place to aggregate that. And the ideal solution would allow that transition from crowdsourced to industry moderated, manufacturer sourced information in a fluid way rather than it being a big bang because that's the classic network problem, we need to be creative about it.
SANDOCHE BALAKRICHENAN: Do we have any other questions for Nicholas? Okay. Thank you, a round of applause for Nicholas, please.
PETER STEINHAUSER: All right. Almost fell, but fortunately not. So, a quick RIPE NCC update: at RIPE 84 we had the request from the RIPE NCC asking what is the interest of the Working Group in trainings, so the consensus was more or less no trainings are needed, that's at least what we are seeing in the meeting minutes. RIPE NCC made a survey asking about topics people would like to have trainings on and what we can see here, well, we have 27.6% of people seeming to have a certain need on that, so, I would suggest on the mailing list, which is ‑‑ we start a small thread asking what topics people would like to know about. I think this could be very interesting for the Working Group itself and also it could give possible feedback to the NCC for trainings in case.
Well, and now we have reached the end of our session. Maybe some words from my side, also short feedback about today's session. So we were again talking a lot about security, onboarding, trust factors, etc., so I think as a Working Group of the RIPE community we, in the future, should also talk a little bit more about networking, and I hope that industry initiatives like ‑ Thread will bring us a little bit closer to that so I encourage everybody to also participate on the mailing list, we have been quite quiet there, so I think we all have some work lying ahead. And with that, and if Sandoche does not want to add anything so far but please ‑‑
SANDOCHE BALAKRICHENAN: Thank you, Peter, I don't have anything to add but maybe if you are not subscribed to the IoT Working Group mailing list, please subscribe. Thank you.
PETER STEINHAUSER: Thank you, everybody, and have a great rest of the meeting, thank you.
(Applause)
LIVE CAPTIONING BY AOIFE DOWNES, RPR
DUBLIN, IRELAND