At every healthcare security conference I’ve been to in the last few years, at least one speaker has had a slide with a few data breach figures designed to elicit a collective gasp from the audience. Being a security geek, a member of a decidedly antisocial vocation characterized by skepticism, suspicion, and conspiracy delusions, I was compelled by my particular brand of insanity to cozy up to the source data firsthand and get the most up-to-date figures available.

Fortunately, the Office for Civil Rights (OCR), an arm of the U.S. Department of Health and Human Services (HHS), tracks breaches of electronic protected health information (ePHI) that affect 500 or more patient records, as prescribed by HIPAA/HITECH. OCR provides an interactive online tool for examining ePHI exposure incidents, and the source data is available for download.

Massaging the Data

As with most data sets, there’s a bit of inconsistency and ambiguity in the systems, so instead of jumping right to the denouement, you need some background on how I teased the data into line. Perhaps it will help you if you decide to take a run at the data yourself, or at least it will explain why not everyone comes up with the same results given the same source data.

  • I’ve ignored the covered entity’s, well…entity. No need to insult the injured. But I did remove a few obvious duplicates: same covered entity (CE) name, same reporting date, same number of records. There were fewer than half a dozen of these.
  • OCR tracks Types of Breaches (e.g., theft, unauthorized access/disclosure, hacking/IT incident), as well as Location of Breached Information (e.g., laptop, paper, other portable electronic device), which I refer to as Media Containing ePHI for clarity. OCR provides these as two separate fields, each containing one or more entries separated by commas, which isn’t that useful for statistical analysis. I split each entry out into its own field. First, though, I had to resolve a number of misspellings and inconsistencies in entry names (e.g., Loss, Improper Disposal vs Loss/Improper Disposal). I also renamed some of the properties and aggregated others where it made sense. For example, I consolidated “Other (Backup Tapes)” and “Other (Backup Disks)” into simply Backup Media.
  • Some of the records, only 14% (ish), have associated comments, or a narrative that provides details about the incident. In some cases I changed the Type of Breach and/or Location of Breached Information based on the narrative.
  • If you add up all the Types of Breaches or Media Containing ePHI, they both exceed the total incidents. This is because many of the breaches have multiple classifications attached to them. Some seem to be catch-alls. For example, “Computer” appears to be a generic category for lost or stolen medium, but may be coupled with “Laptop” or “Network Server”. It’s unclear from those incidents without an accompanying narrative whether there were multiple systems involved—a laptop stolen as well as a network server hacked—or it’s multiple classifications for a single system.
  • The comments tell the story of many incidents involving theft of computers left in cars or brought home or to a remote office, like a business associate’s lab. These are not called out in the original classification so I created properties to capture these incidents.
  • The majority of incidents have only one date associated with them; however, later in the data set, there is a date range. To normalize the chronology, I used the later date, which is presumably the date of discovery, for grouping incidents by year.
  • The latest sane date in the data I used is October 12, 2013. Interestingly, the last record, for “Multiple Health Plans”, which I interpret as a generic identifier for multiple health plans (although there may be a healthcare payer with that name…), is dated in the future: December 7, 2013. I left this last record in the data set even though it’s not particularly significant, with only 1,368 breached records, categorized as both theft and loss of paper media.
  • The current year has not yet come to a close, so I projected the total number of incidents and records for 2013 based on the current run rate. The detailed results (the number of incidents of improper disposal, for example) were not extrapolated.
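
For anyone taking a run at the data, the cleanup steps above might be sketched roughly as follows in Python. This is an illustration, not the exact process used for this analysis: the row keys (`name`, `date`, `records`), the ` - ` date-range separator, the MM/DD/YYYY date format, and the contents of both lookup tables are assumptions to adjust against the actual OCR export.

```python
from datetime import date, datetime

# Assumed fix applied to the raw field before splitting (illustrative only).
RAW_FIXES = {"Loss/Improper Disposal": "Loss, Improper Disposal"}

# Assumed per-label renames applied after splitting (illustrative only).
LABEL_FIXES = {
    "Other (Backup Tapes)": "Backup Media",
    "Other (Backup Disks)": "Backup Media",
}


def split_labels(raw: str) -> list[str]:
    """Turn one comma-separated field into a list of unique, cleaned labels."""
    for wrong, right in RAW_FIXES.items():
        raw = raw.replace(wrong, right)
    labels: list[str] = []
    for part in raw.split(","):
        label = part.strip()
        label = LABEL_FIXES.get(label, label)
        if label and label not in labels:
            labels.append(label)
    return labels


def incident_date(raw: str) -> date:
    """Parse a single date, or return the later end of a 'start - end' range."""
    parts = [p.strip() for p in raw.split(" - ")]
    return max(datetime.strptime(p, "%m/%d/%Y").date() for p in parts)


def drop_duplicates(rows: list[dict]) -> list[dict]:
    """Keep the first row for each (entity name, date, record count) triple."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["name"].strip(), row["date"], row["records"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

Counting incidents per label after a split like this is what pushes the per-type totals above the number of incidents, since a single incident can carry several labels.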

The Results

  • 24 million (plus a bit): the number of ePHI records that have been or will be compromised between 2009 and the projected end of 2013
  • 730 incidents were reported in the same period (also projected for 2013)
  • The number of incidents dropped significantly from 2010 to 2011 and has been going down slowly since, except between 2011 and 2012, when it stayed relatively flat. However, the average number of records per incident has fluctuated significantly: about 40K records per incident in 2013 and 2011, 17K in 2012, and 25K in 2010.

  • Theft is by far the most common type of breach, including hospital and office burglaries and laptops stolen from offices and cars. Unauthorized access or disclosure comes in a distant second, with fewer than half as many incidents as theft.
  • Hacking doesn’t figure prominently in breach incidents, with fewer than 20 incidents per year, and only 10 in 2013 so far.
  • At least 14 incidents were related to employees and contractors leaving media containing ePHI in vehicles which were broken into. That figure is likely higher as it’s not a property tracked by OCR.
  • Postal mail was a prominent medium for inadvertently disclosing ePHI before 2012, but no incidents appear since. The incidents provide lessons in paying attention to detail. Some involved sending the wrong patient information to recipients or inserting a list of patients and associated private information into a group mailing, and in one case ePHI was printed on the external mailing label. Additionally, backup media was mailed in a few instances and either never reached its destination or was addressed to the wrong recipient.
  • On the positive side, loss of ePHI on portable devices has declined steadily since 2010 and currently stands at 8 incidents for 2013. Media classified simply as “Computer” has also declined steadily and currently stands at 0 for 2013, although that may be due to better classification, as “Computer” seems to be a catch-all.
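
The 2013 totals above are projections. One simple way to project a year-to-date count is to scale it linearly by the fraction of the year that has elapsed. The sketch below assumes that approach, which may not match the exact calculation behind the figures above; the function names and the sample count are made up for illustration.

```python
from datetime import date


def project_full_year(count_so_far: float, as_of: date) -> float:
    """Scale a year-to-date count to a full year at the current run rate."""
    days_elapsed = as_of.timetuple().tm_yday  # day of year, 1-based
    days_in_year = date(as_of.year, 12, 31).timetuple().tm_yday
    return count_so_far * days_in_year / days_elapsed


def records_per_incident(records: int, incidents: int) -> float:
    """Average number of breached records per incident."""
    return records / incidents


# Placeholder count, not a figure from the OCR data.
as_of = date(2013, 10, 12)  # the latest sane date in the data set
projected = project_full_year(100, as_of)
print(f"{projected:.0f} incidents projected for the full year")
```

A linear run rate ignores any seasonality in reporting, so it is best read as a rough estimate.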


After spending a few hours becoming intimate with the data, I’m left with the feeling that the healthcare industry is making progress in some areas, but is overall struggling to clot the wound bleeding patient data. If you’ve done your own analysis and come up with different results, I’d love to hear what they are and how you arrived at your conclusions.

Stay safe, my friends.
