M.C.: Our next session is presented by Tom Sego, CEO and co-founder at Blast Wave Inc. Please welcome Tom as he takes the stage to discuss "For OT/ICS, it's not CIA, but AAA."
Tom Sego:
Alright, thank you everybody. I’m taking the stage for the second time. I want to tell a story and you guys may be able to relate to this. In 2016, I received my monthly Wired magazine – hard copy – and, as I was always excited to read it, I came across an article by Kim Zetter talking about the Ukraine grid attack from Russia. How many of you guys read that article? Not many? Okay. Well, that started my cyber career. That article talking about how a quarter million people lost their power three days before Christmas was a wake-up call for me. It made me realize – from my decade of experience in factories at Caterpillar and Eli Lilly – just how vulnerable these OT environments were and I thought, “The world just went through a massive tectonic shift, and somebody needs to do something about this.” Little did I know that I was that person.
So, think about how devastating an attack in IT can be: a breach like the Target breach, the Home Depot breach, or Blue Cross, where data was leaked. That's not great, but if you look at a blackout, or at halting production, that's a very different level and it's a different scale. It's a different order of magnitude. So, if you go to the CISSP manual, they talk about the CIA Triad. Many people probably understand what that is. It's the purpose of information security: to protect data – confidentiality, integrity, availability – but it's really data focused and the vulnerabilities and threats they look at are through that same lens.
Now, when it comes to operational technology, it's all about availability. If you are not producing, you are not making money. It's that simple. If you're Mattel and you're making toys for Christmas today, and you lose an hour of uptime, you can't get that back. It's gone because you're running 24/7 anyway. It has a devastating effect. It's much more about the operation, and much less about the data. That has some interesting implications.
So the bottom line is: IT is much more focused on protecting data and preventing data breaches, whereas OT is all about preventing disruption, keeping things moving, keeping production on, keeping the plant or facility running. So, I'm going to talk about some qualitative differences between IT and OT, and I'm going to also cover some quantitative aspects – which I haven't really seen published anywhere and which I find extremely interesting. Then we're going to talk about some of the root causes that affect availability and then finally, what can we do about it?
So, I started my career at Caterpillar. The picture on the left is a Caterpillar plant. What's interesting is that, aside from the production facility, one of the most fascinating things about Caterpillar is that this is a union shop. You have the UAW butting heads with management sometimes and other times, they got along great. When they got along great, the UAW workers took tremendous pride in their level of output. They took pride in beating the Japanese, or in beating their competitors, in terms of productivity, cost, etc.
Now, when things didn't go right… what would a union do if they didn't like the wages or the conditions or the safety things? They'd strike, right? What is a strike? It's a disruption. The biggest tool in their toolbox was shutting down production. Stopping the ability of the company to continue to operate. They actually did something pretty clever, too. In 1991, they had what was called “the great slowdown” in which they actually ramped down production and they refused to cooperate with management. It's just a scaled down version of affecting availability.
On the right is a pharmaceutical plant. I worked at Eli Lilly for eight years and what's interesting when you look at these pictures, you see thousands of different devices. Every device is practically a snowflake – not really, but it's close. Every endpoint is different. Now, when you contrast that with an IT scenario, what really stands out is that IT is all about standardization, right? You've got the same PCs on your office desk. You've got the same email software. You've got the same SaaS applications. You got the same servers, the same NIC cards. This is the fundamental point of conflict. Because IT is all about standardization and OT has a tremendous amount of diversity of devices, protocols, etc.
We're going to get into that in a bit but first, we're going to talk about the people. When you look at who's in charge of IT – when you look at the IT function, actually stepping back to a corporate level – IT is a cost center. OT is a production center. IT is spending money. OT is making money. So it's kind of ironic that IT has a bigger budget, but we'll get into that later. Now, who's in charge of that? It's a CIO or a CISO, if you're talking about security. On the OT side, it's the plant manager. Think about the metrics that the CIO is typically graded on and that their bonuses depend on. It's things like: What was the average response time of the help desk? Or what is the cost per user for the whole SaaS app stack they have? On the OT side, it's OEE (Overall Equipment Effectiveness). It's one metric. It's looking at how much you can crank out of that plant that's high quality. So, very different metrics. Very different objectives. Very different people. Very different mindset.
Now, if you go down to the level of who actually does the work: on the IT side, you have, oftentimes, people who are certified. You've got your CISSPs (Certified Information Systems Security Professionals). You have your CCNA (Cisco Certified Network Associate), or some equivalent. There's usually more than one person in that situation. You go to OT, you've got what I call Bob. Everybody knows a Bob. If you've been to a plant, like a water treatment plant, you ask questions like, “Okay, now, so who manages the SCADA server?” “Oh, that's Bob.” “Who manages the network?” “Oh, that's Bob.” And I'm like, “So is Bob part of IT?” “No, no, no, no, no. Bob's been around for decades. Bob knows how this stuff works. So, we just let Bob take care of everything.” Bob typically comes from a process control background and it's crazy. Every plant I go to, there's always a Bob.
Now what's also interesting is, oftentimes there's an Alice. It's almost like marriage counseling because oftentimes, the IT side and the OT side hate each other. They don't trust each other. IT is always breaking their stuff. I could just see the scenario where you've got Bob and Alice on the couch together. “Now, Alice, you understand, when you're trying to shove these standards down Bob's throat, he feels like you don't appreciate him and you don't understand the OT environment.” And then you say, “...and Bob, you understand, when Alice is trying to standardize and shove these standards down your throat, it's not that she doesn't appreciate you. It's that Alice doesn't feel heard.” She doesn't feel like Bob cares about her. So you have this conflict between the two. I think that's another issue that gets in the way. Again, people. All institutions are made up of people and people are not perfect. There are flaws. That's a very important thing to get out of the way.
Now, you start talking about technical differences, these are going to be well understood. On the IT side, it's standardized – you have very few protocols. On the OT side, you have lots of protocols. Every vendor has their own protocol. Dozens and dozens of protocols you've got to deal with on the OT side. On the IT side, they're refreshing their PCs and other equipment every 3-5 years. On the OT side, maybe 30 to 50 years if you're lucky, sometimes never. So, what does that mean? Well, that means you don't have the ability to quickly adapt/change to the new environment. You can't deal with patches the way the IT side can – they have “Patch Tuesday”.
On the OT side, there's a study where they looked at all the CVEs (Common Vulnerabilities and Exposures), and 35% of the CVEs that were discovered in the first half of this year are what are called “Forever Day Vulnerabilities.” That means there's never going to be a patch. Maybe the vendor is out of business or that product has been EOL'd (End-of-Life) or it's not supported. But roughly one third of those CVEs are unpatchable, so you have to have a different approach to deal with an unpatchable system than a patchable system. Then the last thing is: When can you do maintenance? When can you do software updates? That kind of thing. Well, with IT, oftentimes they can do it in the middle of the night or on weekends. That's not going to fly with OT. It's 24/7. A Tesla plant, for example, has three hours of downtime every quarter. That's it. Think about the amount of planning that has to go in. So this is where Bob and Alice have a lot of work to do. Fundamentally, this is like night and day, black and white, apples and oranges. This is the reality, and if people are talking about IT/OT convergence, you've got to figure out how you deal with these qualitative factors.
From a quantitative side, it can get very interesting. The IBM Ponemon report is the gold standard for looking at the cost of a breach. For 2022, it's $4.35 million. In 2021, it was $4.24 million. That's pretty high. In the US, by the way, it's closer to nine million, but that's nothing compared to what it looks like on the OT side. On the OT side – and this is just one survey, I have other data that I did not pick – there was a gentleman from Honeywell who presented some data that supports the average downtime for a cyber event being between 24 and 26 days. I'm just being a little conservative here and saying it's around 5. So, say the average downtime of an attack is 5 days. Every plant knows its cost per hour when it is down, and the average here is about a quarter million, though it ranges very widely. It's up to two million in automotive. Siemens had a report on this. It can go much lower, depending on the size of your business. So, 5 days at a quarter million an hour amounts to about $30 million and that's just downtime. That doesn't include the indirect costs. Things like reputation damage, legal fees. It doesn't include the opportunity cost of people who could be working on things that are more productive. In my way of looking at this, this is 10x more consequential than an IT breach.
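The arithmetic behind the $30 million figure can be checked with a short sketch. It uses only the numbers quoted in the talk (5 days of downtime, about $250,000 per hour, and the $4.35 million average IT breach cost) and assumes a plant running 24 hours a day; the function and variable names are illustrative. Note that direct downtime alone works out to roughly 7 times the IT breach figure, before any of the indirect costs the speaker mentions are added.

```python
# Back-of-the-envelope downtime cost, using the figures quoted in the talk.
HOURS_PER_DAY = 24  # assumption: the plant runs around the clock


def downtime_cost(days_down: float, cost_per_hour: float) -> float:
    """Direct cost of lost production for a plant that runs 24/7."""
    return days_down * HOURS_PER_DAY * cost_per_hour


ot_cost = downtime_cost(days_down=5, cost_per_hour=250_000)
it_breach_cost = 4_350_000  # 2022 average cost of a breach quoted in the talk

ratio = ot_cost / it_breach_cost
print(f"OT downtime cost: ${ot_cost:,.0f}")
print(f"Ratio to average IT breach: {ratio:.1f}x")
```

Swapping in a plant's own cost per hour and expected recovery time gives a site-specific estimate.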
If you're an adversary and you have some nice kits that you can deploy, you've got two options: You can go to a kind of IT attack surface that's updated and maintained fairly frequently – latest versions of Windows, etc. – or you can go to an OT facility that has Windows 98 running on it. It has typically flat networks. They're using VPNs with over 600 CVEs. It's going to be a sitting duck. So, it's probably 10x easier to attack an OT environment than an IT environment. There's no such thing as an “air gap”, by the way. This is a side note: Christopher Krebs, I think, had to testify to Congress about the air gap concept when he was the head of CISA and, I think, they looked at 1,500 different facilities and – he said this under oath – zero of them were air gapped, actually. It's because the things that you may think are air gapped are not necessarily air gapped. The point is it's much easier to attack an OT environment than an IT environment and it's 10x more consequential. That's two orders of magnitude, if I'm an adversary, and it actually is borne out by the numbers. The ransom data that I've seen from Palo Alto Networks' report showed that the average ransom payment last year was $925,000. That's the industry average. For the OT side, it's over $2 million. If you look at Johnson Controls’ $51 million ransom, the numbers are getting big and there's always a delay factor. So, when I said I was reading that magazine, and seeing the future, I knew that this would change rapidly.
It's also interesting to point out that the IBM X-Force 2022 Report showed that, in 2019, manufacturing represented 6% of all attacks. Financial services was far and away #1. Then last year, manufacturing overtook financial services as the most attacked industry. It went from 6% to 25%. Now it's the #1 attacked industry. These are trailing indicators that are showing what I saw back in 2016/2017. It's very significant and humans are not great at doing things proactively.
Let's get to some of the root causes. So, phishing is the number one attack vector that drives these attacks. When I say phishing, I also mean credential theft and things like that. That represents about 60% of the ways that people get initial access in the kill chain. Then exploiting vulnerabilities or bugs obviously is a big one. Then flat networks. Now, this one's interesting because in OT, oftentimes, people do deploy these firewalls to try to segment their IT and OT networks. They also try to segment into critical zones and set up conduits. But what's also interesting is, again, you've got these humans involved here, and what happens when the firewall is causing problems? What do people do? Well, what we've seen them do is they'll introduce the any/any rule to the firewall. The record, by the way, that we've seen is 17 any/any rules on a single firewall. So, you get these flat networks because people are focused on availability.
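An audit for the any/any rules described above can be sketched in a few lines. This is a minimal illustration, not any vendor's tooling: the `Rule` structure, its field names, and the sample rules are all hypothetical, and a real audit would first parse the rule export format of the specific firewall in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical, simplified firewall rule (not a vendor format)."""
    action: str            # "permit" or "deny"
    source: str            # CIDR, host, or "any"
    destination: str       # CIDR, host, or "any"
    service: str = "any"   # e.g. "tcp/502", or "any"


def find_any_any(rules):
    """Return (position, rule) pairs for permit rules with source and destination 'any'."""
    return [
        (i, r)
        for i, r in enumerate(rules, start=1)
        if r.action == "permit" and r.source == "any" and r.destination == "any"
    ]


# Made-up rule set: one scoped Modbus rule, one any/any permit, one default deny.
rules = [
    Rule("permit", "10.1.0.0/24", "10.2.0.10", "tcp/502"),
    Rule("permit", "any", "any"),
    Rule("deny", "any", "any"),
]

flagged = find_any_any(rules)
for position, rule in flagged:
    print(f"rule {position}: permit any -> any (service {rule.service})")
```

Because most firewalls evaluate rules in order, a single any/any permit placed above the default deny is enough to flatten the network, which is why the position of each flagged rule is reported.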
The next area is Remote Access. I think a lot of people understand this is a major threat vector that's being exploited. It used to be part of the solution. Now, it's increasingly become part of the problem. Then finally, human error. This is a really tough one. How do you engineer out human error? Well, before I got into this, I played professional poker and one of the things about professional poker is, it's not a card-game played with people. It's a people-game played with cards. The way I make money is when my opponent makes mistakes. How do you make an opponent make mistakes? Force them to make a decision because every decision they make is an opportunity to make a mistake. I think the challenge with human error, starting from first principles, is to think, “Okay, how can we engineer out – to the maximum extent possible – decisions that humans really don't need to be making?” and I'll talk about usernames and passwords in a bit.
Let's move on to some solutions here. The first solution, in terms of how to improve availability, is around Mindset. This can get a little controversial depending on your experience, but the mindset that I kind of start with is the NIST framework. I think about identifying your assets, taking inventory, trying to put protections in place that give you defense in depth and then moving on to Detection, Remediation and then Backup and Recovery. But the mindset I'm talking about is that prevention is possible. Now, that's controversial because a lot of people think prevention is not possible. So, what do they jump to? They jump to some security control that has magic AI that is going to automatically detect any problems that you might have, and that's your solution. Well, I don't buy that. Hopefully, someday, AI can be a part of the solution. AI can help improve our ability to detect malicious patterns and things like that – that's great. It's not a silver bullet. When it comes to security, there is no silver bullet. There's just a lot of lead bullets. I do believe, though, that the prevention mindset is something that is really important to adopt because there are many things that we can do to help make your environment so much harder to attack that the bad guys and bad girls will give up or they'll at least move on to a softer target. Maybe they'll start robbing banks again or something. You're not going to get rid of the bad guys and girls but maybe you can make your area much more hardened and tougher to attack.
Zero Trust Architecture. Now, I know a lot of people swoon when they hear the Zero Trust marketing BS but I'm really referring to NIST 800-207. Specifically, I'm talking about the Software Defined Perimeter model that's discussed in NIST 800-207. How many people know what a Software Defined Perimeter is? I'll explain it then. Software Defined Perimeter is separating the control plane and the data plane like you do in a software defined network but it also enforces authentication prior to seeing anything on that network. So, if you're not an authenticated user, the packets are dropped. You can't run a network scan as an unauthenticated user and see anything. This was discovered in the military, they call it “Darkening Hosts” but, anyway, Software Defined Perimeter helps provide a level of enforcement, cloaking, and authentication requirement that's very helpful in making you much harder to attack. Software Defined Perimeter is something that I think would be interesting to investigate.
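The "authenticate before you can see anything" behavior described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the `Gateway` class, its methods, and the shared-token check are hypothetical, and a real Software Defined Perimeter uses a separate controller and cryptographic identity rather than a token lookup. The one property it demonstrates is that traffic from an unauthenticated source is dropped with no reply at all.

```python
import hmac
from typing import Optional


class Gateway:
    """Toy enforcement point: unauthenticated sources get no response at all."""

    def __init__(self, credentials: dict):
        self._credentials = dict(credentials)  # identity -> token (hypothetical scheme)
        self._authenticated = set()            # sources allowed onto the data plane

    def authenticate(self, source: str, identity: str, token: str) -> bool:
        """Control plane: verify identity before any data-plane access is granted."""
        expected = self._credentials.get(identity)
        if expected is not None and hmac.compare_digest(expected, token):
            self._authenticated.add(source)
            return True
        return False

    def handle_packet(self, source: str, payload: bytes) -> Optional[str]:
        """Data plane: silently drop anything from a source that has not authenticated."""
        if source not in self._authenticated:
            return None  # no reset, no error message: nothing for a scanner to observe
        return f"forwarded {len(payload)} bytes from {source}"


gateway = Gateway({"vendor-a": "example-token"})

# An unauthenticated scan gets nothing back.
scan_result = gateway.handle_packet("203.0.113.7", b"probe")

# After authenticating on the control plane, the same kind of request is forwarded.
gateway.authenticate("198.51.100.4", "vendor-a", "example-token")
ok_result = gateway.handle_packet("198.51.100.4", b"read")
print(scan_result, ok_result, sep="\n")
```

Returning nothing, rather than an explicit rejection, is the design choice that matters here: a rejection still tells a scanner that a host exists.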
Then Phishing-Resistant MFA. I hate passwords with a passion. When I started the company, I'm like, “We have to get rid of passwords. This is 2017. This is crazy.” It's something for me to remember. It's something that I will likely forget. It's something that I'm going to be forced to rotate to something I definitely will forget. The idea of passwords, to me, is antiquated. It's something that can be socially engineered out of me. So, the idea was: Can you move to a phishing-resistant MFA, which is both convenient and secure? Because if you make things both convenient and secure, you don't motivate human beings to find a workaround and that's where a lot of things [fail]. I can't count how many times I would see a little yellow sticky on an HMI or engineering workstation that had the username and password. What's that doing for you?
And then Segmentation. We talked a lot about segmentation and separating the business network from the production network, and that's a great place to start. But then there's also: how can you further subdivide your network in a way that doesn't just blow up your budget? How can you subdivide that OT network into critical zones and conduits? What's really cool about this is what happens if you can do this in a cost-effective way. Say you're a midstream oil and gas company and something gets compromised – and by the way, you will get compromised. That's gonna happen. But if you can contain the blast radius of that compromise through segmentation, only a compressor station goes down or, if you're in wastewater, only a lift station goes down. You contain that so only one thing needs to be remediated, and it's much easier to deal with. The impact on the disruption side is much lower.
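The blast radius idea can be made concrete by treating the network as a graph of which assets are allowed to initiate connections to which others, then asking what an attacker can reach from one compromised asset. The sketch below is a simplified model with made-up asset names; it considers only permitted connection initiation and ignores everything else an attacker might exploit.

```python
from collections import deque


def blast_radius(allowed: dict, start: str) -> set:
    """Assets reachable from `start` over allowed connections (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in allowed.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical midstream site: three compressor stations and a SCADA server.
assets = ["compressor-1", "compressor-2", "compressor-3", "scada"]

# Flat network: every asset may initiate a connection to every other asset.
flat = {a: [b for b in assets if b != a] for a in assets}

# Segmented network: only SCADA may initiate connections, one conduit per station.
segmented = {"scada": ["compressor-1", "compressor-2", "compressor-3"]}

print(sorted(blast_radius(flat, "compressor-1")))       # all four assets
print(sorted(blast_radius(segmented, "compressor-1")))  # only the compromised station
```

In the segmented case, a compromised compressor station has no permitted outbound path, so remediation is limited to that one station. A compromise of SCADA would still reach every station, which is why the most connected assets deserve the strongest protection.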
So, let me then move on to my last slide, which is: What is the government doing to help the OT folks? I'm not going to make any comments about the government, but I will say that the Cybersecurity and Infrastructure Security Agency (CISA) has come up with some pretty decent guidelines that I think are really helpful. If you're not aware of these, this is a great place to get familiar with them. There are really two main things that they do and then an implementation plan. Cyber Assessments is the first thing that they want you to do and I think that's a really good thing to do. The second thing is to have an Incident Response plan. That's very important. Again, the gentleman from Honeywell was talking about this. And then there's the implementation plan, which will contain four elements. One is continuous monitoring and visibility – super important. There are a lot of tools that can do that in OT environments – Nozomi, etc. The second thing that you want to have is a Secure Remote Access solution. Ideally, that's Zero Trust Network Access that allows vendors to only access their Siemens controller at this one facility. You can make that extremely granular so that people can't attach themselves to an HMI and then pivot into your network. That's really powerful. The third thing that you should do: Segment your network, we talked about that. The fourth thing is to have a way to deal with these unpatchable systems. This is the reality. 35% of the CVEs are unpatchable. Having a “patching cadence” is going to help with 65% of that but it's not going to address the whole thing. You need a plan that addresses those unpatchable systems so you're not a sitting duck.
Many of those things have been adopted by the TSA in their Security Directive Pipeline-2021-02D (SD-02D), which came out July 27th of this year. One of the things that's pretty interesting is that the military has created an entire office to achieve an objective of reaching the zero trust target level by 2027. The idea is that there are 91 activities that have been identified by this group of security professionals, and if you achieve these outcomes, you will then achieve that zero trust target level and this will make you much harder to attack. In fact, they're going to red team these things with the NSA and maybe other military folks. The thing that's really cool about it is, it's very simple. Their North Star describes all 91 things that they're trying to accomplish in a very simple sentence. Their North Star is to “stop, contain, frustrate, thwart, and limit access, lateral movement, and visibility of the adversary,” which is a really great way to make yourself much harder to attack.