bogofilter and why HAM is so important

18 August 2008

Just in case anyone else suffers the same *X-Files* thing on their mail server...

A few hours ago I noticed that one of my mail servers (FreeBSD 6.x-stable + Sendmail + bogofilter + bogom) had turned into an open door for spam. According to the users, a lot of spam was getting into their mailboxes.

I checked and re-checked everything, and it all seemed fine. Sendmail was running, bogom was up too... strange. Then I noticed one thing in my logs:

Aug 18 16:12:51 prunus sm-mta[47093]: m7IECjAn047093: Milter insert (0): header: X-Bogosity: Ham, spamicity=0.000046
Aug 18 16:13:04 prunus sm-mta[47095]: m7IECrMe047095: Milter insert (0): header: X-Bogosity: Ham, spamicity=0.000002
Aug 18 16:13:05 prunus sm-mta[47106]: m7IECxi0047106: Milter insert (0): header: X-Bogosity: Ham, spamicity=0.000002
Aug 18 16:13:06 prunus sm-mta[47107]: m7IED0IH047107: Milter insert (0): header: X-Bogosity: Ham, spamicity=0.000000
Aug 18 16:13:18 prunus sm-mta[47123]: m7IEDBtR047123: Milter insert (0): header: X-Bogosity: Unsure, spamicity=0.520000
Aug 18 16:13:18 prunus sm-mta[47128]: m7IEDCt8047128: Milter insert (0): header: X-Bogosity: Ham, spamicity=0.000440
Aug 18 16:13:23 prunus sm-mta[47108]: m7IED6Gu047108: Milter insert (0): header: X-Bogosity: Unsure, spamicity=0.520000

Everything was being marked as Ham or Unsure! That explained why the spam was getting through; now it was time to find out what was causing it.
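Spotting this by eye works for a handful of lines, but a quick tally makes the pattern obvious. What follows is only a minimal sketch: tally_bogosity is a made-up helper name, and it assumes the log lines look exactly like the ones above.

```shell
#!/bin/sh
# Sketch: count X-Bogosity verdicts in sendmail log lines read from stdin.
# tally_bogosity is a hypothetical helper, not part of bogofilter or bogom.
tally_bogosity() {
    awk '
        # "X-Bogosity: " is 12 characters long; the verdict word follows it.
        match($0, /X-Bogosity: [A-Za-z]+/) {
            verdict = substr($0, RSTART + 12, RLENGTH - 12)
            count[verdict]++
        }
        END {
            printf "Ham=%d Unsure=%d Spam=%d\n",
                count["Ham"], count["Unsure"], count["Spam"]
        }'
}
```

It could be fed with something like `tally_bogosity < /var/log/maillog` (the log path is an assumption, adjust it to wherever your sendmail logs go). A Spam count of zero on a busy server is a strong hint that something is wrong with the filter.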

bogoutil helped me a lot in finding the problem: with it I was able to get the number of spam and ham messages known to the filter, just by issuing something like:

sudo bogoutil -w /var/spool/bogofilter/wordlist.db .MSG_COUNT

On the failing server I got:

                                 spam   good
.MSG_COUNT                       3457     0

While on another server I got:

                                 spam   good
.MSG_COUNT                        282   3337

As this last one didn't have the same problem as the failing one, and with my mind suddenly remembering Juanjo's recommendations, I realized that the problem was, in fact, that bogofilter had no idea what ham looks like on that server: its good count was 0 (if you are interested in how bogofilter works, take a look at the theory of operation section of its man page).
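A check like this is easy to automate. The following is a rough sketch, not something from my actual setup: check_msg_count and MIN_GOOD are hypothetical names, and the threshold of 1 is an arbitrary assumption that only catches a completely empty ham list.

```shell
#!/bin/sh
# Sketch: warn when bogofilter's wordlist knows no (or too little) ham.
# check_msg_count and MIN_GOOD are hypothetical, not part of bogofilter.
MIN_GOOD=${MIN_GOOD:-1}   # assumed threshold, tune it for your site

# Reads the output of `bogoutil -w <wordlist> .MSG_COUNT` on stdin.
# Returns 0 if the good count is at least MIN_GOOD, 1 if it is below,
# and 2 if no .MSG_COUNT line was found at all.
check_msg_count() {
    awk -v min="$MIN_GOOD" '
        $1 == ".MSG_COUNT" { found = 1; spam = $2; good = $3 }
        END {
            if (!found) { print "no .MSG_COUNT line found"; exit 2 }
            if (good + 0 < min + 0) {
                printf "WARNING: spam=%d good=%d\n", spam, good
                exit 1
            }
            printf "OK: spam=%d good=%d\n", spam, good
        }'
}
```

With the wordlist path used above it would be called as `bogoutil -w /var/spool/bogofilter/wordlist.db .MSG_COUNT | check_msg_count`, for instance from cron, so that an empty ham list gets noticed before the users do.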

Taking some mail from one of the users' INBOX and registering it as ham with bogofilter

sudo bogofilter -n < user_INBOX_dump

got things working again. Just in case, I checked the spam/ham counts again:

                                 spam   good
.MSG_COUNT                       3457     52

and the sendmail logs too:

Aug 18 17:53:42 prunus sm-mta[52981]: m7IFrYvj052981: Milter insert (0): header: X-Bogosity: Spam, spamicity=1.000000
Aug 18 17:53:51 prunus sm-mta[52988]: m7IFrjYr052988: Milter insert (0): header: X-Bogosity: Spam, spamicity=1.000000
Aug 18 17:53:54 prunus sm-mta[52990]: m7IFrlKV052990: Milter insert (0): header: X-Bogosity: Spam, spamicity=0.999887
Aug 18 17:53:57 prunus sm-mta[52989]: m7IFro2f052989: Milter insert (0): header: X-Bogosity: Spam, spamicity=1.000000
Aug 18 17:54:02 prunus sm-mta[53003]: m7IFrsDG053003: Milter insert (0): header: X-Bogosity: Spam, spamicity=1.000000
Aug 18 17:54:46 prunus sm-mta[53024]: m7IFscv1053024: Milter insert (0): header: X-Bogosity: Spam, spamicity=1.000000
Aug 18 17:54:58 prunus sm-mta[53031]: m7IFsqGG053031: Milter insert (0): header: X-Bogosity: Spam, spamicity=1.000000

Of course, I have no idea what happened to the ham that was supposed to be in the wordlist.db file. Now it's time to do some forensics and try to find out what happened here (but hey, at least the problem is solved!)

UPDATE: Found it! This morning, just after arriving at the office, I took a look at the .MSG_COUNT value on the faulty server, only to find:

                                 spam   good
.MSG_COUNT                       3460     49

The good count had decreased since yesterday (from 52 to 49), so it had to be related to the script that feeds bogofilter every night. Taking a look at the script, I saw where the problem was. It was calling bogofilter to add spam messages this way:

bogofilter -Ns -B inbox_dump

And, from the man page of bogofilter:

The -N option tells bogofilter to undo a prior registration of the same
message as non-spam. If a message was incorrectly entered as non-spam
by -n or -u and you want to remove it and enter it as spam, then use
-Ns. If -N is used for a message that wasn't registered as non-spam,
the counts will still be decremented.

which means my daily feeding script was decrementing the ham counts every time it registered spam. As the script runs every day (through a cron job) and a lot of spam is added each day, it was draining the ham list far too quickly. Removing the -N option when feeding the beast (that is, registering spam with -s alone) should solve the problem. (Or so I hope.)
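For what it's worth, here is a sketch of how the nightly feeding step could look with -N dropped and a small guard added. It is an illustration under assumptions, not my real script: the function names and the WORDLIST and SPAM_DUMP defaults are made up for the example, and only the bogofilter -s and -B options and the bogoutil call come from the commands shown above.

```shell
#!/bin/sh
# Sketch of a nightly spam-feeding step that checks the ham count afterwards.
# Helper names and default paths are assumptions for illustration only.
WORDLIST=${WORDLIST:-/var/spool/bogofilter/wordlist.db}
SPAM_DUMP=${SPAM_DUMP:-inbox_dump}

# Extract the "good" (ham) count from bogoutil's .MSG_COUNT output on stdin.
good_count() {
    awk '$1 == ".MSG_COUNT" { print $3 }'
}

# Succeeds only if the ham count did not shrink between two readings.
ham_preserved() {
    [ "$2" -ge "$1" ]
}

# Register the dump as spam and complain if the ham count went down.
feed_spam() {
    before=$(bogoutil -w "$WORDLIST" .MSG_COUNT | good_count)
    # -s registers the messages as spam; without -N the ham side is untouched.
    bogofilter -s -B "$SPAM_DUMP"
    after=$(bogoutil -w "$WORDLIST" .MSG_COUNT | good_count)
    ham_preserved "$before" "$after" || {
        echo "ham count dropped from $before to $after" >&2
        return 1
    }
}
```

The block only defines the functions; the cron job would call feed_spam. With the numbers from this post, a guard like ham_preserved would have flagged the drop from 52 to 49 on the very first night.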

Posted by wu at 19:48 | Comments (2) | Trackbacks (0)
Comments
Re: bogofilter and the importance of HAM

Here is my output:

# bogoutil -w wordlist.db .MSG_COUNT
spam good
.MSG_COUNT 322 2488

Maybe if you compact the database frequently, the old tokens are discarded in the process (I'm not sure about this).

Posted by: Juanjo at August 18, 2008 19:52
Re: bogofilter and the importance of HAM

Well, I haven't compacted the database (yet), but it seems everything was caused by my daily feeding script (take a look at the update to the post).

Anyway, thanks a lot for your comment!

Posted by: Wu at August 19, 2008 14:08