At every healthcare security conference I’ve been to in the last few years, at least one speaker has a slide in their presentation deck with a few data breach figures designed to elicit a collective audience gasp. Being a security geek, a decidedly antisocial vocation characterized by skepticism, suspicion, and conspiracy delusions, I was compelled by my particular brand of insanity to cozy up to the source data firsthand, and to get the most up-to-date figures available.

Fortunately, the Office for Civil Rights (OCR), an arm of the U.S. Department of Health and Human Services (HHS), tracks healthcare data breaches of electronic protected health information (ePHI) affecting 500 or more patient records, as prescribed by HIPAA/HITECH. OCR provides an interactive online tool for examining ePHI exposure incidents, and the source data is available for download.

Massaging the Data

As with most data sets, there’s a bit of inconsistency and ambiguity in the systems, so instead of jumping right to the denouement, you need some background on how I teased the data into line. Perhaps it will help you if you decide to take a run at the data yourself, or at least it will explain why not everyone comes up with the same results given the same source data.

  • I’ve ignored the covered entity’s, well…entity. No need to insult the injured. But I did remove a few obvious duplicates: same covered entity (CE) name, same reporting date, same number of records. There were fewer than a half dozen of these.
  • The OCR tracks Types of Breaches (e.g., theft, unauthorized access/disclosure, hacking/IT incident), as well as Location of Breached Information (e.g., laptop, paper, other portable electronic device), which I refer to as Media Containing ePHI for clarity. The OCR provides these as two separate fields, each containing one or more entries separated by commas, which isn’t that useful for statistical analysis. I created individual fields for each entry and separated them. First, though, I had to resolve a number of misspellings and inconsistencies in entry names (e.g., “Loss, Improper Disposal” vs. “Loss/Improper Disposal”). I also renamed some of the properties and aggregated others where it made sense. For example, I consolidated “Other (Backup Tapes)” and “Other (Backup Disks)” into simply Backup Media.
  • Some of the records, only 14% (ish), have associated comments, or a narrative that provides details about the incident. In some cases I changed the Type of Breach and/or Location of Breached Information based on the narrative.
  • If you add up all the Types of Breaches or Media Containing ePHI, they both exceed the total incidents. This is because many of the breaches have multiple classifications attached to them. Some seem to be catch-alls. For example, “Computer” appears to be a generic category for lost or stolen medium, but may be coupled with “Laptop” or “Network Server”. It’s unclear from those incidents without an accompanying narrative whether there were multiple systems involved—a laptop stolen as well as a network server hacked—or it’s multiple classifications for a single system.
  • The comments tell the story of many incidents involving theft of computers left in cars or brought home or to a remote office, like a business associate’s lab. These are not called out in the original classification so I created properties to capture these incidents.
  • The majority of incidents have only one date associated with them; however, later in the data set, there is a date range. To normalize the chronology, I used the later date, which is presumably the date of discovery, for grouping incidents by year.
  • The latest sane date in the data I used is October 12, 2013. Interestingly, the last record, for “Multiple Health Plans”, which I interpret as a generic identifier for multiple health plans (although there may be a healthcare payer with that name…), is dated in the future: December 7, 2013. I left this last record in the data set even though it’s not particularly significant, with only 1,368 breached records, categorized as both theft and loss of paper media.
  • The current year has not yet come to a close. Consequently, I projected the total number of incidents and records based on the current run rate. However, the detailed results (number of incidents of improper disposal, for example) were not extrapolated.
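
For anyone taking a run at the data themselves, the cleanup steps above can be sketched roughly as follows. This is a minimal illustration, not the exact process I used: the field names (name, date, records, type, media), the sample rows, the alias tables, and the MM/DD/YYYY date format are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical alias tables; the real list of variants has to come from the data.
WHOLE_FIELD_FIXES = {
    "Loss/Improper Disposal": "Loss, Improper Disposal",
}
CANONICAL_NAMES = {
    "other (backup tapes)": "Backup Media",
    "other (backup disks)": "Backup Media",
}


def drop_duplicates(rows):
    """Keep the first row for each (name, date, records) combination."""
    seen = set()
    kept = []
    for row in rows:
        key = (row["name"].strip().lower(), row["date"].strip(), int(row["records"]))
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def split_field(raw):
    """Turn a comma-separated field into a list of canonical entries."""
    raw = WHOLE_FIELD_FIXES.get(raw.strip(), raw)
    entries = []
    for part in raw.split(","):
        name = part.strip()
        name = CANONICAL_NAMES.get(name.lower(), name)
        if name and name not in entries:
            entries.append(name)
    return entries


def later_date(raw):
    """Return the last date of 'MM/DD/YYYY' or 'MM/DD/YYYY - MM/DD/YYYY' (assumed format)."""
    last = raw.split("-")[-1].strip()
    return datetime.strptime(last, "%m/%d/%Y").date()


# Made-up sample rows, not real OCR records.
rows = [
    {"name": "Example Clinic", "date": "03/01/2011", "records": "1200",
     "type": "Theft, Loss", "media": "Laptop, Other (Backup Tapes)"},
    {"name": "Example Clinic", "date": "03/01/2011", "records": "1200",
     "type": "Theft, Loss", "media": "Laptop, Other (Backup Tapes)"},
    {"name": "Example Lab", "date": "09/27/2012 - 10/12/2013", "records": "800",
     "type": "Loss/Improper Disposal", "media": "Paper"},
]

cleaned = drop_duplicates(rows)
for row in cleaned:
    row["types"] = split_field(row["type"])
    row["media_list"] = split_field(row["media"])
    row["year"] = later_date(row["date"]).year
```

Whole-field fixes run before the comma split so that a legitimate slash, as in “Unauthorized Access/Disclosure”, is left alone. Because one incident can carry several entries, counting over the split lists will still add up to more than the number of incidents.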

The Results

  • 24 million (plus a bit): the number of ePHI records that have been/will be compromised between 2009 and the projected end of 2013
  • 730 incidents were reported in the same period (also projected for 2013)
  • The number of incidents dropped significantly from 2010 to 2011 and has been going down slowly since, except between 2011 and 2012, when it stayed relatively flat. However, the average number of records per incident has fluctuated significantly, with about 40K records per incident in 2013 and 2011, 17K in 2012, and 25K in 2010.

  • Theft is by far the most common type of breach, including hospital and office burglaries, and laptops stolen from offices and cars. Unauthorized access or disclosure comes in a distant second, with fewer than half as many incidents as theft.
  • Hacking doesn’t figure prominently in breach incidents, with fewer than 20 incidents per year, and only 10 in 2013 so far.
  • At least 14 incidents were related to employees and contractors leaving media containing ePHI in vehicles which were broken into. That figure is likely higher as it’s not a property tracked by OCR.
  • Similarly, postal mail was a prominent medium for inadvertently disclosing ePHI before 2012, but no incidents appear since. The incidents provide lessons in paying attention to detail. Some involved sending the wrong patient information to recipients or inserting a list of patients and associated private information into a group mailing, and in one case ePHI was printed on the external mailing label. Additionally, backup media was mailed in a few instances and never reached its destination or was addressed to the wrong recipient.
  • For positive trends, loss of ePHI on portable devices has declined steadily since 2010, and is currently at 8 for 2013. What’s being classified simply as ‘computer’ for media has also steadily declined, and is currently at 0 for 2013; however, that may be due to better classification, as ‘computer’ seems to be a catch-all.
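
The projected 2013 totals in these results rest on a simple run-rate extrapolation. A minimal sketch of that arithmetic is below; the function names are hypothetical, and counting elapsed days through a cutoff date (such as the October 12, 2013 date of the latest sane record) is an assumption about how one might do it, not a record of my exact spreadsheet.

```python
from datetime import date


def project_full_year(count_so_far, as_of):
    """Scale a year-to-date count to a full year at the current run rate."""
    start = date(as_of.year, 1, 1)
    end = date(as_of.year + 1, 1, 1)
    elapsed_days = (as_of - start).days + 1  # include the cutoff day itself
    days_in_year = (end - start).days
    return count_so_far * days_in_year / elapsed_days


def records_per_incident(records, incidents):
    """Average number of breached records per reported incident."""
    return records / incidents


# Hypothetical year-to-date count, for illustration only.
projected = project_full_year(100, date(2013, 10, 12))
overall_average = records_per_incident(24_000_000, 730)
```

With the headline figures above, roughly 24 million records over 730 incidents works out to about 33K records per incident across the whole period, which sits between the yearly averages quoted earlier.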

 

After spending a few hours becoming intimate with the data, I’m left with the feeling that the healthcare industry is making progress in some areas, but is overall struggling to clot the wound bleeding patient data. If you’ve done your own analysis and come up with different results, I’d love to hear what they are and how you arrived at your conclusions.

Stay safe, my friends.
