If you want to know how important spam filters are to your online experience, try turning them off for a day. You’ll quickly see why these tools we tend to take for granted are so essential. We may not know how spam filters work, but we’re grateful that they do.

Spam volumes have been dropping in recent years, but there’s still plenty of junk out there. According to Trend Micro’s Global Spam Map, volumes exceed 400 billion messages on some days, but we almost never see spam in our inbox. Why is that?

In the cat-and-mouse game of cybersecurity, spam is one area where the good guys have kept reasonably well ahead of the bad. And the outlook for the future is bright: Machine learning could take spam filtering to a new level.

There are many approaches to catching spam, but they all do basically the same thing: scan header information for evidence of malice, look up senders on blacklists of known spammers and filter content for patterns that point to junk mail. The first two tasks are mostly science — the third is art.

Deciphering Header Data

Header information is that long river of text at the top of an email that you thankfully never have to see. It looks like this:

Received: by with SMTP id p66csp1537538iof

X-Received: by with SMTP id p87mr2784731ioo.80.1477075567036

Fri, 21 Oct 2016 11:46:07 -0700 (PDT)

Buried beneath all that gobbledygook is important information: the IP address of every server that touched the email, date and time stamps, security signatures and other details you never need to read yourself but that help establish where the mail came from. Spam filters look for attempts to deceive the recipient (e.g., g00gle.com instead of google.com) and compare addresses to blacklists of known spammers to automatically filter out those that match.
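To make the header checks concrete, here is a minimal Python sketch using only the standard library. The sample message, the addresses in it, the KNOWN_DOMAINS set and the digit-substitution table are all made up for illustration; real filters use far richer rules than this.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Hypothetical message; the hosts and IPs are documentation-only examples.
RAW = """\
Received: from mail.example.net (mail.example.net [203.0.113.7])
\tby mx.example.org with ESMTP id abc123;
\tFri, 21 Oct 2016 11:46:07 -0700 (PDT)
Received: from sender.example.com (unknown [198.51.100.23])
\tby mail.example.net with SMTP id def456;
\tFri, 21 Oct 2016 11:46:05 -0700 (PDT)
From: "Support" <help@g00gle.com>
Subject: Account notice

Body text.
"""

IP_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

# Assumed digit-for-letter swaps and an assumed list of brands to protect.
LOOKALIKES = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s"})
KNOWN_DOMAINS = {"google.com", "paypal.com"}


def relay_ips(raw_message):
    """Collect the bracketed IPv4 addresses from every Received header."""
    msg = message_from_string(raw_message)
    ips = []
    for received in msg.get_all("Received", []):
        ips.extend(IP_IN_BRACKETS.findall(received))
    return ips


def looks_spoofed(from_header):
    """True if the sender domain imitates a known domain via digit swaps."""
    _, address = parseaddr(from_header)
    domain = address.rpartition("@")[2].lower()
    if domain in KNOWN_DOMAINS:
        return False  # the genuine domain, not an imitation
    return domain.translate(LOOKALIKES) in KNOWN_DOMAINS


print(relay_ips(RAW))
print(looks_spoofed('"Support" <help@g00gle.com>'))
```

The addresses returned by relay_ips are the ones a filter would then compare against blacklists.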


Blacklists are lists of known spammers collected by internet service providers (ISPs), email providers and server administrators. Anyone can create and publish a blacklist, but the most popular ones, such as SpamCop, Spamhaus and URIBL, have the most credibility. Publishers create these lists by monitoring spam reports from users. That’s why it’s important to label unwanted email as spam. When you do so, you’re helping to keep everyone’s mailbox pristine.
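Many blacklists are published as DNS-based lists (DNSBLs). The common convention is to reverse the octets of the IPv4 address, append the list's zone name and do an ordinary DNS lookup: an answer means the address is listed, and a "no such name" response means it is not. The sketch below assumes that convention; the zone name in the usage line is an example, and each list publishes its own zones, return codes and usage terms, which this sketch does not interpret.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up: reversed IPv4 octets plus the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Return True if the DNSBL zone returns any answer for this address.

    Needs network access. Some lists use special answers to signal errors
    or rate limits; a production check would inspect the returned address.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


print(dnsbl_query_name("203.0.113.7", "zen.spamhaus.org"))
```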

Smart spammers have ways of disguising header information to make their messages look genuine. Not all spammers are smart, however, so header analysis alone catches a lot of the most obvious spam. Even spammers who are good at cloaking information may overlook some telltale details. If delivery reporting is disabled, for example, it’s a sign that the sender is transmitting a large volume of mail and doesn’t want to be bothered with bounce messages. That’s a possible spammer.

There’s no one rule for how spam filters work. Each engine has its own quirks. Some frown on email sent from free services like Hotmail and Gmail, for example, or may downgrade messages addressed only to an email address without an accompanying name. Fortunately, email administrators can adjust most of these settings to their liking.
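One way to picture these adjustable quirks is as a set of weighted rules whose scores add up. The following Python sketch is an illustration only: the rule names, the weights and the list of free-mail domains are assumptions, not the settings of any real filter.

```python
from email.utils import parseaddr

# Hypothetical default weights; an administrator could override any of them.
DEFAULT_WEIGHTS = {"free_mail_sender": 1.0, "recipient_without_name": 0.5}
FREE_MAIL_DOMAINS = {"hotmail.com", "gmail.com"}


def header_score(from_header, to_header, weights=DEFAULT_WEIGHTS):
    """Add up the weights of the header rules that a message triggers."""
    score = 0.0
    _, sender = parseaddr(from_header)
    if sender.rpartition("@")[2].lower() in FREE_MAIL_DOMAINS:
        score += weights["free_mail_sender"]
    display_name, _ = parseaddr(to_header)
    if not display_name:  # addressed to a bare email address
        score += weights["recipient_without_name"]
    return score


# Default settings versus an administrator who does not penalize free mail.
print(header_score("Bob <bob@gmail.com>", "alice@example.org"))
lenient = {"free_mail_sender": 0.0, "recipient_without_name": 0.5}
print(header_score("Bob <bob@gmail.com>", "alice@example.org", lenient))
```

A message would be treated as spam once its total score crosses a threshold, which is also something an administrator would typically tune.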

Content Filters

The art of spam filtering comes into play when analyzing the contents of a message. This is where the best filters shine, but it’s also where legitimate messages can end up in spam purgatory.

Some content tactics are almost certain to land a message in the spam folder. Emails containing attached executable files or links to blacklisted websites are sure giveaways, as are those with common spam keywords. A few years ago, many spam filters flagged emails containing shortened links from services like bit.ly and 3.ly. With the profusion of short links spawned by Twitter, however, that tactic is less common today.
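The three giveaways named above can be sketched as simple checks. In this Python example the extension list, the blocked host and the keyword list are placeholders chosen for illustration; a real filter would draw them from maintained blacklists and much larger rule sets.

```python
import re
from urllib.parse import urlparse

# Placeholder rule data, assumed for this example.
EXECUTABLE_EXTENSIONS = (".exe", ".scr", ".bat", ".js")
BLOCKED_HOSTS = {"malware.example"}
SPAM_KEYWORDS = {"free money", "act now"}

URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def content_flags(body, attachment_names):
    """Return the names of the content rules that a message triggers."""
    flags = []
    if any(name.lower().endswith(EXECUTABLE_EXTENSIONS) for name in attachment_names):
        flags.append("executable_attachment")
    hosts = {urlparse(url).hostname for url in URL_PATTERN.findall(body)}
    if hosts & BLOCKED_HOSTS:
        flags.append("blacklisted_link")
    lowered = body.lower()
    if any(keyword in lowered for keyword in SPAM_KEYWORDS):
        flags.append("spam_keyword")
    return flags


print(content_flags("Act now! Visit http://malware.example/win", ["invoice.EXE"]))
print(content_flags("Lunch at noon?", ["menu.pdf"]))
```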

If those schemes are so easily detected, you might wonder why spammers continue to use them. Unfortunately, there are enough gullible people out there that even a very low hit rate can be profitable. High-volume spammers don’t expect more than about a 0.1 percent open rate, but that still translates to 1,000 people for every 1 million messages sent.
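The arithmetic is easy to check. This short Python sketch just restates the figures from the paragraph above; the function name is made up for the example.

```python
def expected_opens(messages_sent, open_rate_percent):
    """Number of recipients expected to open a message at a given open rate."""
    return round(messages_sent * open_rate_percent / 100)


# 0.1 percent of 1 million messages
print(expected_opens(1_000_000, 0.1))
```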

“When you get a reply, it’s 70 percent sure that you’ll get the money,” one spammer told the Los Angeles Times in a 2005 interview. Although much has changed since then, even a minuscule response rate can be profitable if the volumes are large enough, and spam is free and easy to send.

Machine Learning: Changing How Spam Filters Work

With the advent of powerful machine learning algorithms and big data economics, there’s potential to change how spam filters work.

Apache SpamAssassin is a widely used platform that incorporates advanced statistical techniques to score incoming messages. The same tactics that are applied to detecting fraudulent reviews on travel and e-commerce sites can work in spam analysis as well. When you mark a message as spam, it goes into a hopper with millions of messages that others have flagged. Algorithms churn through these messages to find similar characteristics, such as word proximity or misspellings, that show up frequently in spam.
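A classic statistical technique for learning from flagged messages is the naive Bayes classifier, which compares how often each word appears in known spam versus known legitimate mail ("ham"). The Python sketch below is a generic, simplified version with add-one smoothing. It is not SpamAssassin's actual implementation, and the class name, tokenizer and training messages are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Toy word-frequency classifier; train on both labels before use."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the words of one message that a user labeled spam or ham."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def _log_score(self, tokens, label):
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        vocabulary = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_messages = sum(self.message_counts.values())
        # Prior: the share of training messages that carry this label.
        score = math.log(self.message_counts[label] / total_messages)
        for token in tokens:
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[token] + 1) / (total_words + vocabulary))
        return score

    def is_spam(self, text):
        tokens = tokenize(text)
        return self._log_score(tokens, "spam") > self._log_score(tokens, "ham")


spam_filter = NaiveBayesFilter()
spam_filter.train("win free money now", "spam")
spam_filter.train("free prize claim now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch on monday?", "ham")
print(spam_filter.is_spam("claim your free money"))
print(spam_filter.is_spam("monday meeting agenda"))
```

Production systems train on millions of messages and use many more signals than single words, but the principle of learning from labeled examples is the same.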

Cloud computing is also changing the rules of spam filtering by making more powerful filters available to a broader audience at lower cost. Cloud services are increasingly displacing on-premises filters, bringing the benefits of economies of scale. Because cloud providers collect data from many sources, they can compile large databases for machine learning processing. The result should be better content filtering.

You can fine-tune your own spam settings by specifying senders or domains to exclude. Some email administrators even like to loosen controls to be sure legitimate messages don’t get caught. Either way, it’s a good idea to check your spam folder every few days to ensure messages you’ve been waiting for aren’t lurking there. Spam filters are pretty good these days, but nothing’s perfect.
