As soon as the Oct. 4 Facebook mega outage took place, questions about the cause ran rampant. Was it a cyber crime or a technical glitch? Who was at fault?
The outage reportedly cost Facebook some $60 million to $100 million in revenue, and its stock fell 4.9% the same day. That amounts to $47.3 billion in lost market cap.
So what’s the difference between a cyber attack and a technical problem? The Facebook outage and other cyber attack examples help us find out.
Looks Like a Cyber Crime
Commenting on the outage, Santosh Janardhan, Facebook VP of Infrastructure, said, “The end result was that our DNS servers became unreachable even though they were still operational. This made it impossible for the rest of the internet to find our servers.”
This case wasn’t a cyber crime. During routine maintenance, engineers issued a command meant to assess the availability of Facebook’s global backbone capacity. Instead, the command took down every backbone connection, which disconnected all of Facebook’s data centers worldwide. In addition, a bug in the audit tool that should have caught the command kept it from being stopped.
The end result was a total break of Facebook server connections between its data centers and the internet.
Real Cyber Crime
On the surface, a DNS flood (a type of distributed denial-of-service, or DDoS, attack) might look just like Facebook’s technical error. DNS flood attacks often use the high-bandwidth connections of compromised Internet of Things (IoT) devices to send requests straight at DNS servers. The flood overwhelms the DNS provider’s services, which prevents real users from reaching the servers.
DDoS attacks generate massive internet traffic by using multiple compromised computer systems as sources of attack traffic. Some attackers use a mix of IoT devices and computers.
Still, overwhelming Facebook would require a digital tsunami of unheard-of proportions. Facebook’s bandwidth and interconnectivity are so vast that it can absorb most large-scale attacks. Most likely, Facebook has a very large and highly distributed DNS system. It can probably monitor, absorb and block DDoS attack traffic of nearly any size.
When Is it a Cyber Crime, and When Is it a Tech Glitch?
When DNS fails, traffic patterns offer the first clue. DDoS attacks come with huge traffic spikes, while an incident that isn’t a cyber crime would likely show normal traffic patterns.
Meanwhile, cyber threat intelligence can pick up on chatter about attacks of many kinds. Cyber threat intelligence analysis focuses on the triad of actors, intent and capability. It considers attacker tactics, techniques and procedures, motivations and access to the intended targets. In some cases, defenders may also use machine learning to monitor and predict threats.
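As a rough illustration of that first triage step, the sketch below compares the DNS query rate during an incident with a recent baseline. The function names, the sample numbers and the three-standard-deviation threshold are all assumptions made for this example. They do not describe how Facebook or any particular monitoring product works.

```python
from statistics import mean, stdev


def spike_score(baseline_qps, current_qps):
    """Return how many standard deviations current_qps sits above the baseline mean."""
    mu = mean(baseline_qps)
    # Fall back to 1.0 on a perfectly flat baseline to avoid dividing by zero.
    sigma = stdev(baseline_qps) or 1.0
    return (current_qps - mu) / sigma


def triage_dns_incident(baseline_qps, current_qps, threshold=3.0):
    """First-pass guess only: a spike hints at an attack, normal traffic at a fault."""
    # The threshold is an illustrative assumption; tune it to your own traffic.
    if spike_score(baseline_qps, current_qps) >= threshold:
        return "traffic spike: investigate possible DDoS"
    return "no spike: investigate configuration or hardware first"


if __name__ == "__main__":
    # Hypothetical queries-per-second samples taken before the incident.
    baseline = [1000, 1050, 980, 1020, 990]
    print(triage_dns_incident(baseline, 50000))
    print(triage_dns_incident(baseline, 1010))
```

A result like this only tells you where to look first. A misconfiguration can also trigger retry storms, so a spike alone is not proof of an attack.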
If something has damaged your systems, besides fixing the problem, it pays to deploy attack detection tactics. For instance, check your firewalls. Has something disabled their rules? Does your network seem slow? Malware might be siphoning bandwidth. Have user accounts suddenly accessed areas in your network where they are not allowed? Identity and access management (IAM) tools are very useful for detecting and preventing strange user behavior.
Modern IAM software can use machine learning and artificial intelligence to analyze parameters such as user, device, activity, environment and behavior. By assigning an adjustable risk score, it can determine whether or not to grant access.
Sadly, depending on the type of cyber crime, you may not discover it until threat actors demand a ransom, claim the attack or begin selling sensitive data on the darknet.
When it comes to mopping up the damage, each technical issue will have its work defined. For the recent Facebook outage, the company needed to figure out why a routine maintenance command took out the entire network. It also needed to examine the audit tool bug. Was that audit tool defective, or did an attacker disable it?
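To make the idea of an adjustable risk score concrete, here is a minimal sketch. It swaps the machine learning described above for fixed, hand-set weights, so it is a simplified stand-in and not how a real IAM product scores requests. The `AccessRequest` fields, the weights and the two cutoffs are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    user: str
    known_device: bool        # device previously seen for this user
    usual_location: bool      # request comes from a familiar location
    usual_hours: bool         # request falls within the user's normal hours
    sensitive_resource: bool  # target holds sensitive data


# Hypothetical weights; a real system would learn or tune these.
DEFAULT_WEIGHTS = {
    "unknown_device": 40,
    "unusual_location": 30,
    "unusual_hours": 10,
    "sensitive_resource": 20,
}


def risk_score(req, weights=DEFAULT_WEIGHTS):
    """Add up the weight of every risk signal present in the request."""
    score = 0
    if not req.known_device:
        score += weights["unknown_device"]
    if not req.usual_location:
        score += weights["unusual_location"]
    if not req.usual_hours:
        score += weights["unusual_hours"]
    if req.sensitive_resource:
        score += weights["sensitive_resource"]
    return score


def decide(req, weights=DEFAULT_WEIGHTS, challenge_at=30, deny_at=70):
    """Map the score to a decision; both cutoffs are adjustable assumptions."""
    score = risk_score(req, weights)
    if score >= deny_at:
        return "deny"
    if score >= challenge_at:
        return "challenge"  # for example, ask for a second authentication factor
    return "allow"


if __name__ == "__main__":
    print(decide(AccessRequest("alice", True, True, True, False)))
    print(decide(AccessRequest("alice", False, False, True, False)))
```

Raising or lowering `challenge_at` and `deny_at` is what makes the score adjustable: stricter cutoffs block more suspicious requests at the cost of more friction for legitimate users.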
Along more general lines, in the case of a cyber crime, you should be concerned about data theft, malware added to your systems, lingering back doors and web shells that could continue to provide access.
In the case of technical issues, think about physical failure, repeat failures and downstream damage to other systems.
Harden Against Both Problems
For technical glitches and cyber crime failures, you want the best incident response possible. Facebook’s Janardhan commented on this. He said:
“We’ve done extensive work hardening our systems to prevent unauthorized access, and … hardening slowed us down as we tried to recover from an outage caused not by malicious activity, but an error … I believe a tradeoff like this is worth it — greatly increased day-to-day security vs. a slower recovery from a hopefully rare event like this.”
He went on to emphasize the importance of strengthening testing, drills and overall IT resilience. For more specific issues, proactive offensive and defensive security measures could make all the difference, such as:
- Penetration testing – Directly seek out and identify real vulnerabilities in your networks and infrastructure
- Adversary simulation – Operators build their own attack tools to mimic real-world advanced attackers
- Vulnerability management – Scan and prioritize vulnerability data using an automated ranking engine
For any cyber crime response, especially where sensitive personally identifiable information is concerned, timely and accurate communication is critical. It helps preserve brand reputation, and regulatory agencies require full disclosure.
For example, according to GDPR: “In the case of a personal data breach, the controller shall without undue delay and, where feasible, not later than 72 hours after having become aware of it, notify the personal data breach to the supervisory authority….”
Note that in this massive outage, Facebook didn’t have to say anything. What happened was self-evident. Still, the company posted a statement the same day. It doesn’t pay to even speculate on the cause of a problem until your research is complete.
It’s critical to have a disaster response strategy and team in place at all times. They should practice their response so if an incident occurs, nobody moves without a pre-established plan and chain of command.
In most cases, it pays to talk openly about an incident as early as possible. Delays only make it look like you’re trying to cover something up. The post-incident messaging should be clear and concise. Tailor your messaging to satisfy regulators, respect the customer’s right to know and preserve brand integrity. Customers will also want to know whether you suffered a cyber crime or a technical glitch. Telling them up front will help secure their trust.
Freelance Technology Writer
Jonathan Reed is a freelance technology writer. For the last decade, he has written about a wide range of topics including cybersecurity, Industry 4.0, AI/ML...