Discussion:
Postfix patch which adds URL-grep information for smtpd policy services
Heiko Wundram
2006-03-29 23:31:35 UTC
Hi all!

I've written a small patch against Postfix 2.2.9 (which is what I currently run
on the mail servers I administer) that adds URL grepping to data_cmd in
smtpd.c and passes the extracted information to a policy service registered
in smtpd_end_of_data_restrictions.

I've run the patch available from the URL below on a fairly high-traffic site
for the last 12 hours, and a slightly buggier earlier version (it didn't free
all of its resources) for longer than that, without any problems in mail
traffic. Together with a simple policy daemon that checks the URLs it is given
against surbl.org (and does some caching to improve response time), it lets me
filter out quite a lot of spam directly during the SMTP session, which is
cheap in terms of system resources, without passing the mail through a more
thorough examination by, for example, amavisd-new.
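For readers who want to build something similar, here is a minimal sketch of the SURBL-checking part of such a policy daemon. It is not the daemon described above; the zone name `multi.surbl.org`, the one-hour cache TTL, and the helper names `surbl_query_name`, `dns_listed` and `CachingChecker` are assumptions for illustration. It treats a hostname as listed when `<hostname>.<zone>` resolves, and caches answers so repeated hostnames don't cost another DNS round trip.

```python
import socket
import time
from typing import Callable, Dict, Tuple

# Assumed SURBL zone; check surbl.org for the current zone name and usage policy.
SURBL_ZONE = "multi.surbl.org"


def surbl_query_name(hostname: str, zone: str = SURBL_ZONE) -> str:
    """Map a hostname to the DNS name looked up in the SURBL zone.

    Simplification: real SURBL clients first reduce the hostname to its
    registered domain (and reverse IP addresses); this sketch does not."""
    return f"{hostname.strip('.').lower()}.{zone}"


def dns_listed(qname: str) -> bool:
    """A name that resolves is listed; any resolution failure means not listed."""
    try:
        socket.gethostbyname(qname)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


class CachingChecker:
    """Caches listed/not-listed answers per query name for `ttl` seconds."""

    def __init__(self, resolver: Callable[[str], bool] = dns_listed,
                 ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolver = resolver
        self._ttl = ttl
        self._clock = clock
        self._cache: Dict[str, Tuple[bool, float]] = {}  # qname -> (listed, expiry)

    def is_listed(self, hostname: str) -> bool:
        qname = surbl_query_name(hostname)
        now = self._clock()
        cached = self._cache.get(qname)
        if cached is not None and cached[1] > now:
            return cached[0]  # fresh cache hit: no DNS query
        listed = self._resolver(qname)
        self._cache[qname] = (listed, now + self._ttl)
        return listed
```

The resolver and clock are injectable so the caching logic can be exercised without network access; in production the defaults query DNS directly.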

The patch itself is pretty dumb; it uses a regular expression to extract
hostnames, and the RE is of the form:

(https?://|@)([A-Za-z0-9%._-]+)

where the second group contains the hostname that's passed to the policy
service. As the patch doesn't understand MIME or any other encoding, or
anything beyond the basic structure of an email, it can be fooled by
base64-encoded HTML attachments, but in the spam I receive this is the
exception, not the norm. An added benefit of this dumb approach is that the
patch also picks up the hostnames of all email addresses and message IDs
present in the header.
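To make the behaviour of that expression concrete, here is a small sketch applying it in Python. The expression is the one quoted above; the helper name `extract_hosts`, and the choice to lowercase and de-duplicate hostnames, are assumptions of this sketch rather than what the C patch does. It shows both points from the paragraph: hostnames after `@` in header addresses and message IDs are picked up, and base64 text yields nothing because `:` and `@` never occur in the base64 alphabet.

```python
import re
from typing import List

# The expression from the post; group 2 is the candidate hostname.
URL_HOST_RE = re.compile(r"(https?://|@)([A-Za-z0-9%._-]+)")


def extract_hosts(text: str) -> List[str]:
    """Return unique hostnames (lowercased) in order of first appearance."""
    seen = set()
    hosts = []
    for match in URL_HOST_RE.finditer(text):
        host = match.group(2).lower()
        if host not in seen:
            seen.add(host)
            hosts.append(host)
    return hosts


message = (
    "Message-ID: <abc@mail.example.org>\n"
    "\n"
    "Visit http://Spam.Example.com/buy or write to me@spam.example.com\n"
)
print(extract_hosts(message))  # ['mail.example.org', 'spam.example.com']
```

Note that the character class stops at `/`, `>` and whitespace, but not at a trailing `.`, so a URL at the end of a sentence ("see http://example.com.") yields `example.com.` with the dot attached.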

As this patch is pretty specific, I don't think it makes sense to include it
in the Postfix core, but maybe somebody else out there is interested in it.
I'll post the policy daemon I use (a simple Python script) to the same
Subversion repository that contains the patch sometime tomorrow. I'd be
grateful if you dropped me a note if this proves useful to you.

URL: http://svn.modelnine.org/svn/postpatches/postfix-hw1.patch

PS: The patch applies cleanly against 2.2.10-rc1 (AFAICT), and applies pretty
cleanly against the current 2.3 snapshot. There's one rejection (IIRC), but
that's not hard to fix.

--- Heiko Wundram.
Victor Duchovni
2006-03-29 23:39:05 UTC
Post by Heiko Wundram
The patch itself is pretty dumb; it uses a regular expression to filter
where the second group contains the hostname that's given to the policy
service. As the patch doesn't understand MIME or any other encoding, save the
structure of an email, it can be fooled by base64-encoded HTML-attachments to
a mail, but for the spam I receive, this is the exception, and not the norm.
An added benefit to this dumb approach is that the patch correctly gets all
hostnames for email addresses and message ids that are present in the header.
This is what pre-queue filters are for.
--
Viktor.

Disclaimer: off-list followups get on-list replies or get ignored.
Please do not ignore the "Reply-To" header.


If my response solves your problem, the best way to thank me is to not
send an "it worked, thanks" follow-up. If you must respond, please put
"It worked, thanks" in the "Subject" so I can delete these quickly.
Heiko Wundram
2006-03-29 23:41:53 UTC
Post by Victor Duchovni
This is what pre-queue filters are for.
The solution I presented is less general, but quite a bit faster... Speed is
my main concern, and that's why I modified the Postfix core to implement the
grepping. As I said, it's certainly not something you'd want in the core in
general, but maybe it's useful to someone besides myself. ;-)

--- Heiko.
Wietse Venema
2006-03-30 02:27:46 UTC
Post by Heiko Wundram
The patch itself is pretty dumb; it uses a regular expression to filter
Perhaps a header/body_check action such as

/(https?://|@)([A-Za-z0-9%._-]+)/ check_policy_service xx:yy:zz $1$2

would be more appropriate?

Wietse
Ralf Hildebrandt
2006-03-30 07:55:54 UTC
Post by Wietse Venema
Perhaps a header/body_check action such as
would be more appropriate?
Good idea!
--
Ralf Hildebrandt (***@charite.de) ***@charite.de
Postfix - Einrichtung, Betrieb und Wartung Tel. +49 (0)30-450 570-155
http://www.postfix-buch.com
Redmond WA -- Microsoft announced today that the official release date
for the new operating system "Windows 2000" will be delayed until the
second quarter of 1901. -- seen in Brian Hatch's sig
Heiko Wundram
2006-03-30 09:28:29 UTC
Post by Wietse Venema
Perhaps a header/body_check action such as
would be more appropriate?
I tried this, but I could not get it to pass along the full information that's
available to a policy service in smtpd_*_restrictions. I use much longer
greylisting times for mail that contains SURBL-listed URLs (for now, because
the organization I administer the mail server for can't legally bounce them),
so it was more convenient and appropriate to have the list of caught URLs in
smtpd_end_of_data_restrictions, together with all the other information the
policy service gets there.
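A rough sketch of the kind of policy service described here, assuming the Postfix policy delegation protocol (requests are `name=value` lines ended by an empty line; the reply is `action=...` followed by an empty line). Everything else is an assumption for illustration: the attribute `url_hosts` carrying comma-separated hostnames is hypothetical (the patch's actual attribute name and format aren't given here), as are the 5-minute and 4-hour delays, the `450` reply text, and the names `Greylist`, `serve` and `handle_text`. The greylisting key is the usual (client address, sender, recipient) triplet; at end of data, `recipient` may be empty for multi-recipient mail.

```python
import io
import time
from typing import Callable, Dict, Iterable, TextIO, Tuple


def read_request(stream: Iterable[str]) -> Dict[str, str]:
    """Read one policy request: name=value lines ended by an empty line.

    Returns {} at end of input."""
    attrs: Dict[str, str] = {}
    for line in stream:
        line = line.rstrip("\r\n")
        if line == "":
            break
        name, _, value = line.partition("=")
        attrs[name] = value
    return attrs


class Greylist:
    """Greylisting with a longer delay when any extracted host is listed."""

    def __init__(self, is_listed: Callable[[str], bool],
                 normal_delay: float = 300.0,
                 listed_delay: float = 4 * 3600.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._is_listed = is_listed
        self._normal_delay = normal_delay
        self._listed_delay = listed_delay
        self._clock = clock
        # (client_address, sender, recipient) -> time first seen
        self._first_seen: Dict[Tuple[str, str, str], float] = {}

    def decide(self, attrs: Dict[str, str]) -> str:
        key = (attrs.get("client_address", ""),
               attrs.get("sender", ""),
               attrs.get("recipient", ""))
        # HYPOTHETICAL attribute name/format for the hostnames the patch extracts.
        hosts = [h for h in attrs.get("url_hosts", "").split(",") if h]
        listed = any(self._is_listed(h) for h in hosts)
        delay = self._listed_delay if listed else self._normal_delay
        now = self._clock()
        first = self._first_seen.setdefault(key, now)
        if now - first < delay:
            return "action=450 Greylisted, please try again later\n\n"
        return "action=dunno\n\n"


def serve(stdin: TextIO, stdout: TextIO, greylist: Greylist) -> None:
    """Answer requests until end of input (e.g. when run under spawn(8))."""
    while True:
        attrs = read_request(stdin)
        if not attrs:
            break
        stdout.write(greylist.decide(attrs))
        stdout.flush()


def handle_text(text: str, greylist: Greylist) -> str:
    """Feed a request transcript through serve() and return the replies."""
    out = io.StringIO()
    serve(io.StringIO(text), out, greylist)
    return out.getvalue()
```

In a real daemon, `is_listed` would be a cached SURBL lookup and `serve` would be called with `sys.stdin` and `sys.stdout`; the in-memory greylist table is also only suitable for a sketch, since it is lost when the process exits.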

If I've overlooked something and the data is also available directly in a
header/body_check, please elaborate. I know that I could, with some magic,
infer the message ID (for example by looking at which smtpd instance is
connecting), attach the header/body_checks results to that message ID, and
then write an smtpd_end_of_data_restrictions policy to finish this off (it
gets the message ID too), but as I said, I could not get this to work properly.

Anyway, thanks for your reply!

--- Heiko.
Heiko Wundram
2006-03-30 12:29:22 UTC
I tried this, ...
I should add: I tried implementing something like this in the queue process
(which handles header/body_checks, doesn't it?) before putting it directly
into smtpd, which was by far easier to do in my case...

--- Heiko.
Greg Hackney
2006-03-30 16:30:34 UTC
I've been using body_checks such as:

if /http:/
/healthmedz\.com/ DISCARD
/heavyoemetal\.com/ DISCARD
/hourforflower\.com/ DISCARD
endif

built by hand from the top spam URLs listed at:
http://spamcop.net/w3m?action=inprogress&type=www
http://spamvertised.abusebutler.com/spamvertised.php?rep=last24

But it sure would be nice to have a fast, automated, and free subscription
service for this type of blocking.
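Part of the manual work can at least be scripted. The sketch below turns a list of domains (however obtained, e.g. copied from the pages above) into body_checks entries in the same `if /http:/ ... endif` form, escaping the dots so that `.` doesn't match any character. The function name `body_checks_rules`, the accepted domain syntax, and the sorting and de-duplication are assumptions of this sketch; it does not fetch or parse the listed web pages.

```python
import re
from typing import Iterable

# Accepts plain DNS names only, so nothing needs escaping except the dots.
VALID_DOMAIN = re.compile(r"[a-z0-9-]+(?:\.[a-z0-9-]+)+")


def body_checks_rules(domains: Iterable[str], action: str = "DISCARD") -> str:
    """Build an if/endif body_checks block with one rule per unique domain."""
    cleaned = set()
    for domain in domains:
        domain = domain.strip().lower().rstrip(".")
        if not domain:
            continue
        if not VALID_DOMAIN.fullmatch(domain):
            raise ValueError(f"not a plain domain name: {domain!r}")
        cleaned.add(domain)
    lines = ["if /http:/"]
    for domain in sorted(cleaned):
        # Escape dots so /healthmedz\.com/ matches only the literal domain.
        lines.append("/" + domain.replace(".", "\\.") + "/ " + action)
    lines.append("endif")
    return "\n".join(lines) + "\n"


print(body_checks_rules(["Healthmedz.com", "heavyoemetal.com", "healthmedz.com"]), end="")
# if /http:/
# /healthmedz\.com/ DISCARD
# /heavyoemetal\.com/ DISCARD
# endif
```

The output would still need to be written to the body_checks file and Postfix reloaded; validating input up front keeps a stray `/` or regex metacharacter from producing a broken pattern.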
--
Greg
Devdas Bhagat
2006-03-30 12:18:06 UTC
Post by Wietse Venema
Post by Heiko Wundram
The patch itself is pretty dumb; it uses a regular expression to filter
Perhaps a header/body_check action such as
would be more appropriate?
This is not a documented feature (and it doesn't work, as expected).
Would this be a proposed feature?

Devdas Bhagat
Wietse Venema
2006-03-30 12:23:33 UTC
Post by Devdas Bhagat
Post by Wietse Venema
Post by Heiko Wundram
The patch itself is pretty dumb; it uses a regular expression to filter
Perhaps a header/body_check action such as
would be more appropriate?
This is not a documented feature (and it doesn't work, as expected).
Would this be a proposed feature?
Indeed, proposed just like the smtpd patch that was posted here.

There are, however, some details that need to be settled, because
the check_policy_service feature currently implements only the
"check_access" query, while the above appears to require a different
query type, like "check_url".

Wietse
Heiko Wundram
2006-04-03 20:18:27 UTC
Post by Heiko Wundram
I'll post the policy daemon I use (a
simple Python script) some time tomorrow to the same subversion repository
that contains the patch.
I've posted the policy daemon I use to the public Subversion repository at the
URL I gave before. So far, the patch plus the policy daemon, which just queries
SURBL for the URLs extracted from the mail body, has caught about 60-70% of
spam right during the SMTP session, without delaying mail processing (except
for an initial SURBL query).

--- Heiko.
