UPDATED SUMMARY: Simple anti-spam system using open-source software and freely-available data

From: Rich Kulawiec <rsk_at_gsp.org>
Date: Fri Jul 23 2004 - 16:40:46 EDT

This is an update of:

	SUMMARY: Simple anti-spam system using open-source software and freely-available data
	http://www.sunmanagers.org/pipermail/sunmanagers/2003-August/024169.html

which you might want to browse through before reading this -- though
it's not really necessary, as this is a complete rewrite.

This is the approach that I use.  Let me emphasize "approach": I don't
do all these things on all mail servers, and I don't do them in the
exact same way, because every server/domain gets a different mix of
incoming spam.  It's always important to try to figure out what that
mix looks like and tailor the blocking to match it.  But most of this
will work most of the time for most people -- and in a lot of cases
it's turned out to be "good enough" that more work isn't necessary.
In others, it's been "good enough" that the additional work required
is made quite a bit easier by it.

So here goes.

I run sendmail and have had excellent results using a layered approach
to blocking spam.  The general idea is to use those measures which
are computationally cheapest first, in order to reduce the burden on
subsequent layers.  The approach I'm taking (outlined below) would also
work for other MTAs (e.g. postfix, exim) on other 'nix systems.

I don't do any kind of content analysis: I'm in agreement with Paul Vixie
on this one: either people share our values or they don't.  If they do,
then they don't allow spam to flow out of their networks (at any rate
beyond a trickle, which is probably inevitable).  If they don't, then
they're either actively supporting spammers or incidentally supporting
them through neglect and incompetence -- and the reason doesn't really
matter to me, my users, my systems or my networks.

More succinctly: systems and networks which emit spam are broken and
should either be repaired immediately or physically disconnected from
the Internet until they are.

More bluntly: I'm not going to waste my resources trying to sort out clean
water from sewage.  That responsibility rests with the people whose servers
and networks are spewing effluent through the pipes designated for water.

1. I use this:

	The Spamhaus Project: DROP (Don't Route Or Peer) List
	http://www.spamhaus.org/DROP/

at the firewall and router level, or in the sendmail 'access' file
when that's not possible.  These are networks which are 100%
controlled by spammers, so no good can come of accepting their traffic.

I've augmented this locally by a few particularly problematic networks;
for example, after reading these:

	Call for Internet Death Penalty: Burstnet/Hostnoc
	http://groups.google.com/groups?selm=20030708121252.GA14167%40example.com

	Call for Internet Death Penalty #2: Optigate/Optinrealbig
	http://groups.google.com/groups?selm=20040604204406.GA2771@example.com

	Call for Internet Death Penalty #3: Hopone/Superb
	http://groups.google.com/groups?selm=20040604204549.GA637@example.com

their network allocations are now a fixture in my deny lists.  It's up to
you, of course, but I see no reason to ever accept another packet from them.
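When the list has to go into the 'access' file rather than a firewall,
the CIDR blocks need converting: as far as I know, the access file takes
IP addresses as dotted prefixes on octet boundaries, not CIDR notation.
Here is a minimal Python sketch of that conversion.  The function name
is made up for illustration, the addresses are documentation ranges
rather than real DROP entries, and some setups may want a "Connect:"
tag in front of each prefix.

```python
import ipaddress


def access_entries(cidr, action="REJECT"):
    """Expand one CIDR block into octet-aligned prefixes for an access file."""
    net = ipaddress.ip_network(cidr, strict=False)
    # Round the prefix length up to the next octet boundary (8, 16, 24 or 32).
    target = max(8, -(-net.prefixlen // 8) * 8)
    octets = target // 8
    lines = []
    for sub in net.subnets(new_prefix=target):
        # Keep only the leading octets that the aligned prefix covers.
        prefix = ".".join(str(sub.network_address).split(".")[:octets])
        lines.append(f"{prefix}\t{action}")
    return lines


if __name__ == "__main__":
    # Hypothetical input: one block per line, as in a downloaded list.
    for block in ["192.0.2.0/24", "198.18.0.0/15"]:
        for line in access_entries(block):
            print(line)
```

A block that doesn't fall on an octet boundary expands into several
entries (a /19 becomes 32 entries), so very large allocations are
better handled at the router when that's possible.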

2. I have configured sendmail to reject all mail from domains which
don't resolve.  This also blocks mail from broken mail servers, but
since there's no way to tell them to fix their DNS...

Sendmail comes set up this way by default on most systems.

3. I have set up sendmail to issue a multi-line SMTP greeting banner.
This causes a surprising amount of the malware installed on hijacked Windows
systems to fail, as it's not set up to deal with that.  No doubt future
malware will cope with this, but for the last year it's been very useful.
Simple, easy, fast, and satisfying. ;-)
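For anyone unsure what "multi-line" means on the wire: SMTP puts a hyphen
after the reply code on every line of a reply except the last, which has
a space instead.  The Python sketch below just formats such a greeting
and shows the check a well-behaved client has to make before it starts
talking.  It is an illustration, not sendmail configuration; the function
names, host name and banner text are all made up.

```python
def format_greeting(lines, code=220):
    """Format an SMTP multi-line reply: 'code-' on every line but the last."""
    if not lines:
        raise ValueError("need at least one banner line")
    out = []
    for i, text in enumerate(lines):
        sep = " " if i == len(lines) - 1 else "-"
        out.append(f"{code}{sep}{text}\r\n")
    return "".join(out).encode("ascii")


def greeting_complete(data):
    """Return True once a final line (code followed by a space) has arrived."""
    for line in data.decode("ascii", "replace").split("\r\n"):
        # "220-..." means more lines follow; "220 ..." ends the greeting.
        if len(line) >= 4 and line[:3].isdigit() and line[3] == " ":
            return True
    return False


if __name__ == "__main__":
    banner = format_greeting(["mail.example.com ESMTP",
                              "No unsolicited bulk mail accepted"])
    print(banner.decode("ascii"), end="")
```

A client that sends its commands as soon as it sees the first "220"
line has not waited for the greeting to finish, which is presumably
where the malware trips up.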

4. I then use a very large list of domains, via the sendmail 'access'
file.  This is handy because the access file is hashed, thus lookups
are roughly O(1) no matter how large it becomes.  But it's also error-prone:
in fact, during the past two years, every time I've had a false positive
reported to me, this is where I've traced it to on all but two occasions.

But - considering that I'm using a list of about 128,000 domains and
have had less than a dozen false positives in two years, it seems like
a reasonable approach.  Doubly so because this step alone blocks from
30% to 40% of incoming spam with very little overhead.  Even more so
because it reduces the number of DNSBL queries (see step 8), which not
only reduces my outbound traffic, but also the load I impose on the
DNSBLs that I'm using.

Many domain lists are also available; here's a few of them:

	http://www.rhyolite.com/anti-spam/unwelcome.html
	http://www.river.com/ops/spam/bad-domains.txt
	http://www.spamblocked.com/killfile
	http://www.znet.com/blocked-domains.html
	http://www.cluelessmailers.org/listings/blacklistbydomain.html
	http://obob.manilasites.com/
	http://www.carl.net/spam/access.txt
	http://www.unixgirl.com/blockeddomains.html
	http://www.cart00ney.org/blocklist.txt
	http://abuse.easynet.nl/spamlist-usage.html

Note: if you use a large list of domains in the sendmail 'access' file,
you will want to RTFM on "makemap" and note the "-c" flag.  The speedup
in rebuilding the hash is quite significant.
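To illustrate why the size of the list doesn't matter much: a hashed
lookup costs about the same whether the table holds a hundred entries or
128,000, and the number of probes per message depends only on how many
labels the sender's domain has.  The Python sketch below models the idea
with a dict.  It is not sendmail's actual rule set (the real matching
rules are more involved), and the entries are invented examples.

```python
def lookup(access, sender_domain):
    """Return the action for a domain or any of its parents, else None."""
    labels = sender_domain.lower().rstrip(".").split(".")
    # Try the full name first, then each parent domain, ending at the TLD.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        action = access.get(candidate)  # one hash probe per candidate
        if action is not None:
            return action
    return None


# Hypothetical table: a domain entry, a subdomain entry and a TLD entry.
access = {
    "spammer.example": "REJECT",
    "dhcp.example.com": "REJECT",
    "cn": "REJECT",
}

if __name__ == "__main__":
    for name in ["mail.spammer.example", "host1.dhcp.example.com",
                 "www.example.com"]:
        print(name, lookup(access, name))
```

The same structure explains the false positives: one wrong entry high
up in a domain's hierarchy silently catches everything beneath it.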

5. I block all mail from certain TLDs on some mail servers because
the people using those servers don't expect to ever receive mail
from those places.  I don't like doing this, because it's such a drastic
measure, but it's too effective a technique not to use.  In particular,
I routinely block:

	.cn (China)
	.kr (Korea)
	.tw (Taiwan)

I'm about >this close< to adding .biz to that list.

Of course, if you actually expect to get non-spam mail from those TLDs,
you probably can't do this.  This is why I don't block .br, for example:
I have users who actually get non-spam mail from there.  But if you don't,
you might want to consider blocking it.

6. I use a few special-purpose rules in the sendmail access file to
take care of spam from hijacked CacheFlow servers, hijacked AOL
proxy servers, often-forged addresses, and so on.  Let me know if
you want them: they're pretty simple/short/easy.

7. I use ~150 subdomains (also in the sendmail access file) which
correspond to dynamically-allocated IP space, e.g. "dhcp.example.com".
I don't like doing this either, but it's also too effective not to use:
spam from hijacked PCs on cable/DSL connections is epidemic.  I have
been slowly expanding this because it seems to be filling in gaps that
the other measures are missing.

Note: in most cases, the users on such networks are contractually obligated
to use their ISP's designated outbound mail server(s).  So the only SMTP
traffic that this measure blocks is (a) spam from zombies, (b) spam from
the spammers' own systems, and (c) mail from people who are deliberately
violating their own ISP's TOS.  It's correct to say that (c) isn't
necessarily spam: but I'm not going to lose any sleep over blocking
it anyway.

8. I use multiple DNSBLs, each of which targets a slightly different
mix of spam.

For starters, I use

	cn-kr.blackholes.us
	tw.blackholes.us

for the same reason I block .cn, .kr and .tw -- see step 5 above.  Again,
this may not be a reasonable step for everyone, but check www.blackholes.us
for other available DNSBLs that might be.  They have quite a wide selection,
both by country and by ISP/host.  But locally, use of those two DNSBLs alone
nails about 30% of incoming spam.

I then use these DNSBLs (each listed with DNSBL name and web site):

	sbl-xbl.spamhaus.org		http://www.spamhaus.org/sbl/
					http://www.spamhaus.org/xbl/
	dnsbl.ahbl.org			http://www.ahbl.org/
	list.dsbl.org			http://dsbl.org/
	dnsbl.njabl.org			http://njabl.org/
	relays.ordb.org			http://ordb.org/
	l1.spews.dnsbl.sorbs.net	http://www.spews.org/

The Spamhaus SBL+XBL combined DNSBL is a must-have.  I have never had
a false positive with it.  And the relatively recent addition of the
XBL picks up millions of zombie Windows machines that are spewing spam.

The AHBL augments this nicely, and includes a RHSBL (right-hand-side BL)
which handles blocking by domain name.  If you don't want to do step 4,
this is a good substitute.

The DSBL, NJABL, and ORDB all pick up different combinations of open relays,
open proxies, hijacked systems, etc.

The SPEWS list -- despite what some of its less-informed critics have
said -- is very accurate and correctly targets the spam-supporting ISPs
and hosts who are directly responsible for much of the spam we all endure.

Other DNSBLs that I have either used or am considering using:

	Blitzed OPM	http://opm.blitzed.org/
	PDL		http://www.pan-am.ca/pdl
	Leadmon		http://www.leadmon.net/spamguard/
	SORBS		http://dnsbl.sorbs.net/
	FiveTen		http://www.five-ten-sg.com/blackhole.php

NOTE: You should probably not use any DNSBL until you've read its policies.

NOTE: If you intend to make heavy use of these DNSBLs, you should probably read
their web sites and see about doing zone transfers.

NOTE: I find it very useful to run a local copy of BIND in caching mode on
every mail server, since those servers often get repeatedly pummeled from the
same sets of addresses.  This not only enhances performance locally, but cuts
down on the load my servers impose on the DNSBLs.

NOTE: DNSBLs are invoked sequentially by sendmail, so it's a good idea to
put the one that blocks the most spam as seen by your servers first.  But
figuring out which that is can be quite an effort.   For most people,
the Spamhaus SBL+XBL DNSBL is a pretty good first guess, though.
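For those who haven't looked at how a DNSBL query works: the connecting
IPv4 address has its octets reversed, the DNSBL's zone is appended, and
the resulting name is looked up; an address record in the answer means
"listed".  The Python sketch below shows that, the sequential
first-match check, and one crude way to rank zones by hits over a sample
of addresses.  All function names are made up, the default resolver only
approximates what an MTA does, and it treats any lookup failure as "not
listed".

```python
import ipaddress
import socket
from collections import Counter


def dnsbl_name(ip, zone):
    """Build the name to query: reversed IPv4 octets plus the DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def system_resolver(name):
    """Return True if the name resolves, i.e. the address is listed."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        return False


def first_listing(ip, zones, resolve=system_resolver):
    """Query zones in order and stop at the first one listing the address."""
    for zone in zones:
        if resolve(dnsbl_name(ip, zone)):
            return zone
    return None


def rank_zones(ips, zones, resolve=system_resolver):
    """Query every zone for every sample address; most hits first."""
    hits = Counter({zone: 0 for zone in zones})
    for ip in ips:
        for zone in zones:
            if resolve(dnsbl_name(ip, zone)):
                hits[zone] += 1
    return [zone for zone, _ in hits.most_common()]
```

Ranking this way queries every zone for every address, so it's something
to run occasionally against a sample from the logs (and through a local
caching nameserver), not on live mail.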

9. I'm experimenting with using rbldnsd to run my own internal DNSBL --
replacing, in part, the sendmail 'access' file.

The upside of doing this is that rbldnsd stores information in a very
compact format with a low memory footprint; it's designed to serve DNSBLs,
not as a general purpose DNS server.  Another advantage is that keeping
the information in rbldnsd would allow it to be used by sendmail, postfix,
exim, whatever.  Yet another is that it can be queried easily (contrast
with the sendmail 'access' file).

The downside is that it's another process to run; it requires a different
format than sendmail (which means reworking scripts, etc.); and it's one
more step that could conceivably fail.  (Mitigating this is that sendmail
presumes a non-responding DNSBL means "not listed" and thus fails soft.)

It's not clear to me yet how this experiment will turn out, but the early
results are promising enough for me to suggest it to others as a possible
course of action.

10. My best estimate of the performance of all this is that the local
measures (1-7) block about half the spam that is blocked, and the
DNSBLs (8) block the other half of the spam that is blocked.  The blocking
rate itself appears to be somewhere around 93% to 97%: it varies as spammers
switch networks or domains, or activate new groups of zombies.

The false positive rate is about 1 per month; but I need to caveat that by
stating that unreported false positives may still be lurking.  (On the 
other hand: my users squawk pretty loud and fast when something goes wrong,
so I don't think there are many.)

NOTE: Assessing performance of anti-spam techniques requires measuring both
the FN (false negative: unblocked spam) and FP (false positive: blocked
non-spam) rates.
It's easy to drive either to 0; it's hard to do both at once.

NOTE: Everybody's incoming spam and non-spam mix is different.  The only way
to really figure out which of these steps will best minimize (FP, FN) is to
analyze the statistics.  But 1, 2, 3, and some of 8 are nearly always a
good first guess, and in some cases, they solve enough of the problem that
further analysis/measures aren't necessary.
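For anyone who wants to run the numbers on their own servers, here is a
small Python sketch of the arithmetic.  The function name is made up and
the counts are hypothetical, not my actual figures; it assumes you can
count (or estimate) blocked and delivered messages separately for spam
and non-spam.

```python
def rates(spam_blocked, spam_delivered, ham_blocked, ham_delivered):
    """Return (blocking_rate, fn_rate, fp_rate) as fractions of each class."""
    spam_total = spam_blocked + spam_delivered
    ham_total = ham_blocked + ham_delivered
    if spam_total == 0 or ham_total == 0:
        raise ValueError("need at least one spam and one non-spam message")
    blocking_rate = spam_blocked / spam_total
    fn_rate = spam_delivered / spam_total  # spam that got through
    fp_rate = ham_blocked / ham_total      # legitimate mail that was blocked
    return blocking_rate, fn_rate, fp_rate


if __name__ == "__main__":
    # Hypothetical month: 10,000 spam and 5,000 legitimate messages.
    print(rates(9500, 500, 1, 4999))
    # Blocking everything drives FN to 0 but FP to 1 ...
    print(rates(10000, 0, 5000, 0))
    # ... and blocking nothing does the reverse.
    print(rates(0, 10000, 0, 5000))
```

The last two calls are the degenerate cases: either number is trivially
driven to zero on its own, which is why both have to be reported
together.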

---Rsk