Reject Spam Rejection

The case for rejecting spam

The proponents of rejecting spam with an SMTP error code (5xx) while the spammer is still connected would have you believe that this is the best thing since sliced bread. Their argument goes something like this:

  1. It immediately tells the sender that you've rejected their email. When the sender is not a spammer, you want the sender to be notified somehow that their message didn't get through and why.
  2. Spammers sometimes use the rejections to clean up their address lists. This means you will receive less spam in the long run.
  3. It saves money.
  4. It helps you avoid legal issues relating to spam. There may be legal issues relating to financial scams, money laundering, creation of a hostile working environment, etc.
I'm not 100% opposed to rejecting spam, but the idea has fundamental flaws.

General reasons

  1. Bayes training data! All that spam is great training data for Bayesian filtering, and it's specific to a particular user. And if the spam is high-scoring, systems like SpamAssassin can automatically learn the message as spam and then better recognize similar but lower-scoring spam.
  2. Finally, SpamAssassin and the concept of spam filters get blamed for misconfigured setups that reject mail in this way. (Well, I think they are all misconfigured, but other misconfigurations can cause additional false positives.)
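
To make the auto-learning point concrete, here is a minimal sketch of the idea in Python. The thresholds and the function name are assumptions made up for illustration, and this is not SpamAssassin's actual code. It only shows that a message has to be kept and scored before it can feed the learner.

```python
# Hypothetical thresholds, chosen for illustration only.
SPAM_LEARN_THRESHOLD = 12.0  # at or above this score: train as spam
HAM_LEARN_THRESHOLD = 0.1    # at or below this score: train as ham


def autolearn_decision(score: float) -> str:
    """Return 'spam', 'ham', or 'skip' for an already-scored message."""
    if score >= SPAM_LEARN_THRESHOLD:
        return "spam"
    if score <= HAM_LEARN_THRESHOLD:
        return "ham"
    # Scores in the middle are too uncertain to train on automatically.
    return "skip"
```

High-scoring spam lands in the "spam" branch and becomes training data, which is what helps with the similar but lower-scoring messages later.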

SMTP 5xx - good idea with worse flaws

  1. SMTP bounce messages are sent by the sending mail server, not by yours.

    There is no guarantee a sender of a false positive will receive a bounce message and even less of a guarantee that they will understand what it means or why they received it, especially for less technical users.

  2. Relying on the remote server to send the reject message gives you no reliable opportunity to provide an explanation that users can follow, nor any way to provide a backdoor (like a password bypass, a web page, or other contact information).
  3. It relies on the faulty notions that spammers pay attention to error codes, that spammers clean their address lists, and that spammers compiling lists are the same people sending you spam.

    Rejection is the one use of spam filters that causes the most flak and negative PR. Based on the magnitude of the flak resulting from rejection vs. tagging or foldering, it's hard to imagine how the total cost is lower.
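
For readers who haven't looked at the wire format, here is a rough sketch of how a sending server sees your rejection. The function names are hypothetical and it only handles single-line replies. The point is that all you hand over is a three-digit code and a line of text, and what the human sender eventually sees is up to the remote server.

```python
from typing import Tuple


def parse_reply(line: str) -> Tuple[int, str]:
    """Split a single-line SMTP reply into (code, text).

    Multi-line replies (e.g. "550-...") are not handled in this sketch.
    """
    code_part, _, text = line.strip().partition(" ")
    return int(code_part), text


def classify_reply(code: int) -> str:
    """Classify an SMTP reply code by its first digit."""
    if not 200 <= code <= 599:
        raise ValueError(f"unexpected SMTP reply code: {code}")
    classes = {
        2: "success",
        3: "intermediate",
        4: "temporary failure",   # sender is expected to retry later
        5: "permanent failure",   # sender should give up and bounce
    }
    return classes[code // 100]
```

For example, `parse_reply("550 5.7.1 Message rejected as spam")` yields the code 550 and the text after it, and `classify_reply(550)` reports a permanent failure. Nothing in the reply forces the sending server to pass your text along in its bounce.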

Post-SMTP rejection - more worser

So, you want to send a friendly spam rejection that's easily readable. The Catch-22 is that you can make it readable, but you can't really be sure you're sending it to the right person.

In theory, a good way to reject spam is to accept the message at SMTP time and reject using a user-friendly customized reply with a full explanation and bypass mechanism of some sort. Why doesn't that work?

You have to be very careful about replying to forged From: addresses. The From: address is supplied by the sender, so it can't be trusted, and you'll just end up annoying innocent people. In theory, checking the sending IP against the domain, using SPF, or another sender verification method can fix this, but in practice almost everyone gets this wrong.
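
As a sketch of what "getting it right" would at least require, here is a conservative gate in Python. The function name is hypothetical, and it assumes some other component has already produced an SPF result for the envelope sender domain. It is an illustration under those assumptions, not a complete sender verification scheme.

```python
# SPF result names as defined by the SPF specification.
SPF_RESULTS = {
    "none", "neutral", "pass", "fail",
    "softfail", "temperror", "permerror",
}


def should_send_reply(spf_result: str) -> bool:
    """Allow a post-SMTP rejection notice only for a verified sender.

    Anything other than an explicit "pass" is treated as unverified,
    so no reply is sent and no innocent third party gets annoyed.
    """
    result = spf_result.strip().lower()
    if result not in SPF_RESULTS:
        raise ValueError(f"unknown SPF result: {spf_result!r}")
    return result == "pass"
```

Note that SPF says something about the envelope sender, not the From: header the user sees, so even a "pass" here is weaker evidence than it looks.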

I think most of the desire to reject at SMTP time comes from a gut reaction to spam. It's what I would call the "I won't even let it inside my network! Hehehe!" approach. However, SMTP was not designed for this. It's a misuse of the protocol with a seriously negative impact on usability. Maybe this can be addressed somehow in the future.

Better approaches

This also doesn't mean that all SMTP-based approaches to combating spam are invalid. Some SMTP-based approaches are okay, I think:

  1. teergrubing-style slowdowns on suspected spam (without rejection)
  2. local quarantine of questionable messages, retesting them later when additional data (Bayes, distributed systems, etc.) is available
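
The quarantine idea can be sketched roughly as follows. Everything here (the `Quarantined` class, `retest_due`, the threshold of 5.0, and the scoring callback) is a hypothetical illustration, not part of any real mail system. Messages that are due get rescored with whatever data is available now and are then either delivered or tagged as spam, never bounced.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Quarantined:
    message: str
    retest_at: float  # time (epoch seconds) at which to rescore


def retest_due(
    queue: List[Quarantined],
    now: float,
    score: Callable[[str], float],
    threshold: float = 5.0,  # hypothetical spam threshold
) -> Tuple[List[Quarantined], List[Quarantined], List[Quarantined]]:
    """Return (deliver, tag_as_spam, still_waiting) for the queue."""
    deliver, tagged, waiting = [], [], []
    for item in queue:
        if item.retest_at > now:
            waiting.append(item)   # not due yet, keep holding it
        elif score(item.message) >= threshold:
            tagged.append(item)    # still looks like spam: tag or folder it
        else:
            deliver.append(item)   # newer data cleared it: deliver normally
    return deliver, tagged, waiting
```

The scoring callback stands in for a rescan after Bayes or distributed checks have had time to catch up.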

Other flawed ideas

  1. Issuing a temporary failure via a 4xx code. This is flawed because you can't trust the sending MTA to resend.