George Burns, as he advanced in age, would quip, “Every day I get up and read the obituaries. If I don’t see my name, I make breakfast.”
From a technical perspective, life itself remains the single most fragile thing in the entire universe. Even the most minuscule difference in temperature, chemistry, pressure, or radiation can have a huge impact on our ability to continue exchanging oxygen for carbon dioxide. Much like Mr. Burns, I prefer this operation to the alternative.
Fragile devices fill the OT world.
Fragile, not physically, for they have ruggedly survived decades. Nor operationally, as they deliver almost epic levels of reliability. But fragile from a cybersecurity perspective, where the smallest unexpected blip on the wire can cause hard locks, abrupt crashes, or catastrophic downtime.
Many years ago, after leaving the financial world to join the electrical sector, I neglected to purchase a copy of Etiquette and Fine Manners Among Polite OT Society by Emily Post. I suffered, therefore, from many gaps in fundamental knowledge of “how things actually work,” along with ignorance of many other critical facts filed under the category of OT Personnel, Care and Feeding Thereof. For instance, I did not understand that I must wait patiently some distance away from the control desk until the operator called me over. Years of IT experience had conditioned me to believe a “quick setting change and I’ll be out of your hair” would work universally. System operators, however, have switching orders, unit ramps, coordinated events, and literal human lives on the line. They must not be distracted at a crucial moment by “the computer guy.”
One day, while I was still a blissful OT sophomore, it became apparent that our SCADA vendor might not know, comprehensively, which ports and services their system used to communicate. Shocker, I know. Deep IT instincts snapped into action. My freshly completed compliance program required a concrete ports-and-services mapping, after all, and nmap is free. So, let’s just see what’s out there…
Many readers, by this point, have already guessed the punch line.
A colleague and I marched into the server room with my laptop and boldly plugged in. Fortunately for my continued employment, we had decided to begin our scans on the Quality Assurance System (QAS) rather than on the production Energy Management System (EMS). And while it is no excuse, I had performed hundreds of nmap scans (safe flags) on bank networks all over the English-speaking world without so much as a dropped packet, so I stood unprepared for the next several minutes. A mere defenseless babe in the wilderness, I set ZenMap to its Quick scan profile.
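An aside for anyone who has never driven ZenMap: its Quick scan profile is shorthand for an aggressive nmap run, fast timing against roughly the hundred most common TCP ports, on every reachable address you point it at. The sketch below, which assumes the python-nmap wrapper and uses a made-up address range and port list, shows roughly what that profile amounts to, next to the far gentler sort of probe fragile OT gear deserves.

```python
# A rough sketch, not vendor guidance. Assumes nmap and the python-nmap
# wrapper are installed; the 10.1.2.0/24 range and port list are made up.
import nmap

scanner = nmap.PortScanner()

# ZenMap's "Quick scan" profile boils down to aggressive timing against the
# ~100 most common TCP ports -- every reachable interface gets probed.
QUICK_SCAN_ARGS = "-T4 -F"

# A more OT-friendly probe: plain connect scan, polite timing, a hard cap on
# packet rate, and only the ports your compliance mapping expects to be open.
GENTLE_ARGS = "-sT -T2 --max-rate 10 --max-retries 1"

scanner.scan(
    hosts="10.1.2.0/24",
    ports="102,502,2404",   # e.g. S7comm, Modbus/TCP, IEC 60870-5-104
    arguments=GENTLE_ARGS,
)

for host in scanner.all_hosts():
    for proto in scanner[host].all_protocols():
        print(host, proto, sorted(scanner[host][proto].keys()))
```

Of course, on that particular morning, gentle was not on the menu.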
My finger had barely released the Enter key before all the relays in the Cornet Serial Switch fired, with echoing clunks from the rack just beside my head. The QAS system one-line on the local monitor popped up a flashing “System in Invalid State” warning as both data acquisition (DAC) servers began fighting each other for the dubious privilege of becoming the passive node in their redundant pair.
The modem bank connections began dropping from the QAS front-end processor (FEP) servers one by one. We watched the drama unfold on the screen as one might gaze helplessly upon a slow-motion train wreck. Only seconds had elapsed, yet somehow the EMS Development Team Supervisor and the Manager of Operations had both covered the distance from their respective sides of the building in record time. The doors jerked open simultaneously on both ends of the server room, and both shouted in seemingly practiced harmony, “What did you do!”
On a positive note, the production system emerged unscathed.
While presenting my incident breakdown report in exhaustive detail to most of the C-suite in our boardroom, I had time to reflect. The OT systems that were now part of my life do not react well to cowboy scanning or the thoughtless whims of administrators. That meeting also cured me forever of using the term “failover” to describe the swapping of roles in an active-passive redundant pair.
The DAC servers had multiple network interface cards, one on each server dedicated to a heartbeat pulse between them for the purposes of “succeeding” over. It was this unprotected and unbuffered IP address that knocked the pins out from under the servers at the barest touch of my nmap scanner and caused the pair to go down. No, excuse me; I meant to say that it caused the servers to enter a “differently up” state.
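To make that failure mode concrete: the toy below is purely illustrative and is not the vendor’s code (the exact trigger inside their stack was never shared with us). It simply sketches the general pattern, a heartbeat listener on an unprotected interface that treats any unexpected traffic as evidence of a faulted peer or link and reacts by demoting its own node, on both nodes at once.

```python
# Purely illustrative toy -- NOT the vendor's implementation. It shows how a
# naive heartbeat listener on an unprotected, unfiltered interface can
# misread stray traffic (say, a scanner's probe) as a peer fault and demote
# its own node. The port and token are made up.
import socket

HEARTBEAT_PORT = 9000          # hypothetical heartbeat port
EXPECTED_TOKEN = b"ALIVE"      # what a healthy peer is supposed to send

def run_heartbeat_listener() -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # In a real rig this would bind only to the dedicated heartbeat NIC.
    sock.bind(("", HEARTBEAT_PORT))
    sock.settimeout(5.0)       # the peer must check in every few seconds
    role = "active"
    while True:
        try:
            data, peer = sock.recvfrom(1024)
        except socket.timeout:
            role = "active"    # silence: assume the peer is gone, carry on
            continue
        if data.startswith(EXPECTED_TOKEN):
            continue           # healthy heartbeat, nothing to do
        # Anything else arriving on this port is read as "peer or link in an
        # invalid state" -- and the naive reaction is to demote this node.
        # Run the same logic on both servers and they fight to go passive.
        role = "passive"
        print(f"unexpected traffic from {peer!r}; demoting node to {role}")

if __name__ == "__main__":
    run_heartbeat_listener()
```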
Before we left the boardroom, my CIO leaned over to me and quietly said, “Maybe don’t ping that port again.”
These flaws have long since been remediated by the vendor. Even then, we took steps to mitigate the risk by removing the heartbeat interfaces from the network and reconnecting them with a simple crossover cable. In retrospect, we had uncovered a flaw “in testing” before some chattering network card could trigger it in production… at three o’clock some Sunday morning.
And that’s a good thing. Painful, in my case, but good.
What about your OT system? How do you protect it from unwanted or unexpected east/west traffic? Can a simple ping sweep bring your network to its figurative knees? Had microsegmentation and zero trust been implemented on that network, not only would my scan have caused no problem, no system would have responded to it at all.
The mere fact that two systems connect to the same subnet should not authorize communication between them. We have seen too many examples of ransomware and other malware spreading through OT networks like wildfire, with unsurprising consequences. How does self-propagating malware spread? By exploiting vulnerabilities in nearby systems on the network. But what if, as far as the malware could tell, there were no systems nearby?
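What does “sharing a subnet is not authorization” look like in practice? Here is a minimal sketch of the allow-list idea; every address, flow, and port in it is hypothetical, and real microsegmentation is enforced in the network fabric or by an agent on the host, not in application code like this.

```python
# Minimal sketch of a default-deny, allow-list view of east/west traffic.
# Every host, address, and flow below is hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    src: str      # source address
    dst: str      # destination address
    port: int     # destination TCP port

# The only conversations this segment is ever supposed to carry.
ALLOWED_FLOWS = {
    Flow("10.1.2.10", "10.1.2.20", 502),    # EMS front end -> PLC, Modbus/TCP
    Flow("10.1.2.10", "10.1.2.21", 2404),   # EMS front end -> RTU, IEC 104
}

def is_permitted(flow: Flow) -> bool:
    """Default deny: sharing a subnet does not authorize a conversation."""
    return flow in ALLOWED_FLOWS

# A scanner on the same subnet gets nothing, because its flow is on no list.
print(is_permitted(Flow("10.1.2.99", "10.1.2.20", 502)))   # False
print(is_permitted(Flow("10.1.2.10", "10.1.2.20", 502)))   # True
```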
Modern OT networks need cloaking and microsegmentation to prevent more breaches like the Ukraine blackout and the Colonial Pipeline attack. OT must live by the mantra “never trust, always verify.”
Have your own cybersecurity stories? We’d love to hear them! Send them to me via frontlinetales@blastwave.com
Experience the simplicity of BlastShield to secure your OT network and legacy infrastructure.