PerlMonks  

email bouncebacks

by jalspach (Acolyte)
on Dec 03, 2003 at 10:34 UTC ( [id://311866]=perlquestion )

jalspach has asked for the wisdom of the Perl Monks concerning the following question:

I have been tasked with taking bounced messages and creating a spreadsheet with the rejected addresses and the reason they bounced (box full, bad address, etc.). Eventually I will search a database for the bad address to get the user's name and phone number (I will probably use something like WWW::Automate, since I only have web access to the data. However, I am not sure, since my client requires NTLM and I do not know how this will affect things).

For now, however, I am just trying to handle the first part: catch a bounced message and store the bad addresses (and, if possible, the reason for the bounce) in a spreadsheet.

This sounds like something that would be a module already, but I could not turn anything up. Bounced messages are almost formatted similarly enough that this would be easy...almost.

I will continue to meditate on this while awaiting word from the spires of the monastery.


Thank you for any help or direction.
James

Replies are listed 'Best First'.
•Re: email bouncebacks
by merlyn (Sage) on Dec 03, 2003 at 11:06 UTC
Re: email bouncebacks
by Art_XIV (Hermit) on Dec 03, 2003 at 14:43 UTC

    I faced a similar problem w/ a web site's mailing list for which I was responsible. It was sending out tens of thousands of emails weekly. One of the biggest hassles was distinguishing bounces caused by full mailboxes or downed servers from bounces for users who had cancelled their accounts or had them deactivated.

    This was on a Windoze environment with an Exchange server. I spent days trying to come up with an effective way to collect, parse and act upon the bounces when suddenly, in a zen-like moment of enlightenment, it occurred to me that I could just parse the web server's SMTP logs. About eight hours later (yeah, I'm slow), I had a Perl script that worked like a charm.

    I ended up doing an end-run around the problem of trying to distinguish temporary failures from more permanent ones, though. It was still pretty difficult to distinguish them even with the SMTP logs. My solution was to parse the logs for the previous four weeks, and if four failures in different weeks were found for a given email address, it was removed from the mailing list. This seemed to adequately cover temporary problems.
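
    A minimal sketch of the four-week rule described above. The post does not show the log format or the script, so the input here is assumed to be a list of hypothetical [address, week] pairs already pulled from the SMTP logs, and the function name is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: one [address, week_number] pair per failed delivery,
# as might be extracted from four weeks of SMTP logs (log parsing is left
# out because the post does not describe the log format).
sub addresses_to_remove {
    my ($failures, $threshold) = @_;
    my %weeks_seen;    # address => { week => 1, ... }
    for my $f (@$failures) {
        my ($addr, $week) = @$f;
        $weeks_seen{ lc $addr }{$week} = 1;
    }
    # Keep only addresses that failed in at least $threshold distinct weeks.
    my @hits = grep { scalar(keys %{ $weeks_seen{$_} }) >= $threshold }
               keys %weeks_seen;
    return sort @hits;
}

my @failures = (
    [ 'gone@example.com', 1 ], [ 'gone@example.com', 2 ],
    [ 'gone@example.com', 3 ], [ 'gone@example.com', 4 ],
    [ 'full@example.com', 2 ], [ 'full@example.com', 2 ],
);
print "$_\n" for addresses_to_remove(\@failures, 4);
```

    Counting distinct weeks rather than raw failures is what keeps a mailbox that was full for a few days from being dropped.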

    Hanlon's Razor - "Never attribute to malice that which can be adequately explained by stupidity"
Re: email bouncebacks
by Abigail-II (Bishop) on Dec 03, 2003 at 10:40 UTC
    Bounced messages are almost formatted similarly enough that this would be easy...almost.

    I wish that were true! That would simplify filtering out the bounces I get because virus-laden emails that fake my address as the return address are rejected by virus checkers. There doesn't seem to be much of a standard for the content of bounce messages.

    Abigail

Re: email bouncebacks
by McD (Chaplain) on Dec 03, 2003 at 16:11 UTC
    I've got some experience here, so let me throw out a few observations in addition to what everyone else has said.

    • You can create a "close" solution to your problem pretty easily, but the "perfect" solution is a lot of work. For example, you're right that bounces tend to be regularly formatted - but not always, and it's tough to catch all the edge conditions. Depending on your volume and how close you need to be to perfect, this may or may not be much of an issue.

    • Not every bounce message contains the original address that you sent to. It may have been changed en route via a forwarding mechanism, or it may simply be missing. Recognizing who such a bounce is for is tricky.

    • Not every bounce is accurate - I've seen "successful delivery" notices that actually indicate failure, for example. Also, a message that bounced today might deliver tomorrow, depending on the cause of failure.

    Good luck!

    Peace,
    -McD
Re: email bouncebacks
by zakzebrowski (Curate) on Dec 03, 2003 at 15:16 UTC
    Aside, with respect to spreadsheets:
    • Try CSV first. If you don't have any special characters, you can just print OUT "val1,val2,val3\n"; or use the CSV modules for formatting...
    • If you are doing Microsoft Excel, consider Spreadsheet::WriteExcel
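
    Bounce reasons often do contain commas or quotes, so here is a small sketch of CSV output that quotes such fields. It uses only core Perl; the function name and sample rows are made up for illustration, and a CPAN module such as Text::CSV would be the more usual choice.

```perl
use strict;
use warnings;

# Build one CSV line, quoting fields that contain a comma, quote, or newline
# and doubling embedded quotes (the common CSV convention).
sub csv_line {
    my @fields = @_;
    for my $f (@fields) {
        $f = '' unless defined $f;
        if ($f =~ /[",\r\n]/) {
            $f =~ s/"/""/g;
            $f = qq{"$f"};
        }
    }
    return join(',', @fields) . "\n";
}

# Hypothetical rows: bounced address and reason.
print csv_line('address', 'reason');
print csv_line('user@example.com', 'mailbox full, try later');
```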
    Cheers.


    ----
    Zak
Re: email bouncebacks
by sgifford (Prior) on Dec 03, 2003 at 21:30 UTC

    You want VERPs.

    The problem is there's no standard for parsing a bounce message (well, DSNs are a sort of standard, but nobody follows them). To make this work, you'll have to write a filter for every mail program in existence, to pull out the right data. This doesn't sound that bad, but there are more mail server software packages than you think...

    There is one thing that's standard, however---where the bounce messages go. Bounce messages are always sent to the envelope sender of the message. So if there was only some way to encode the recipient into the envelope sender, so you'd know who the bounce message was coming from...

    Oh, wait, that's right---That's what VERP does! :-)

    The idea is that you use a different envelope sender for each message you send out, and encode the recipient into that, along with whatever other information you'll find useful. For example, a message to me from your list might be from <jalspach-bouncemanager-sgifford=tir.com@jalspach.com>. If the message bounced, it would be sent to this user, and the mail server for jalspach.com could tell what address caused the bounce by looking at who the bounce was sent to.

    This requires that your mail software supports extension addresses. qmail supports these with dashes---I can configure my system so that sgifford-anything goes to my account. sendmail does the same thing but with plus signs---sgifford+anything.

    Once you've got that, you set up a program to handle mail for the account (jalspach-bouncemanager, in the above example), have it look at the envelope information, and write that information to a database or text file.
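
    As a rough sketch of the encoding and decoding steps, following the dash-separated example address above. The function names are hypothetical, and the scheme assumes the recipient address contains a single "@".

```perl
use strict;
use warnings;

# Encode a recipient into a per-message envelope sender:
# prefix, '-', then the recipient with '@' replaced by '='.
sub verp_encode {
    my ($prefix, $domain, $recipient) = @_;
    (my $encoded = $recipient) =~ s/\@/=/;
    return "$prefix-$encoded\@$domain";
}

# Recover the original recipient from the address a bounce was delivered to.
# Returns undef (in scalar context) if the address lacks the expected prefix.
sub verp_decode {
    my ($prefix, $domain, $bounce_to) = @_;
    return unless $bounce_to =~ /\A\Q$prefix\E-(.+)=([^=\@]+)\@\Q$domain\E\z/i;
    return "$1\@$2";
}

my $sender = verp_encode('jalspach-bouncemanager', 'jalspach.com', 'sgifford@tir.com');
print "$sender\n";
print verp_decode('jalspach-bouncemanager', 'jalspach.com', $sender), "\n";
```

    The decoder splits on the last "=" before the final "@", so a recipient whose local part itself contains "=" still round-trips.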

    Hope that helps!

      You're spot on with the VERP suggestion, but:

      (well, DSNs are a sort of standard, but nobody follows them)

      Actually, DSNs are overall very popular - just writing a simple DSN parser will recognize a large majority of "bounces".
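
      A simple DSN parser along those lines might look like the sketch below. It assumes the message/delivery-status part (RFC 3464) has already been pulled out of the MIME message as plain text, ignores folded header lines, and uses a made-up function name.

```perl
use strict;
use warnings;

# Pull per-recipient fields out of a message/delivery-status part.
sub parse_dsn_status {
    my ($text) = @_;
    my @recipients;
    # Field groups are separated by blank lines; the first is per-message.
    for my $group (split /\n\s*\n/, $text) {
        my %f;
        while ($group =~ /^([\w-]+):[ \t]*(.*?)[ \t\r]*$/mg) {
            $f{ lc $1 } = $2;
        }
        next unless exists $f{'final-recipient'};
        # Drop the address-type prefix, e.g. "rfc822;".
        (my $addr = $f{'final-recipient'}) =~ s/^[^;]*;\s*//;
        push @recipients, {
            address => $addr,
            action  => $f{action},
            status  => $f{status},
        };
    }
    return @recipients;
}

# Hypothetical sample data.
my $dsn = join "\n",
    'Reporting-MTA: dns; mail.example.org',
    '',
    'Final-Recipient: rfc822; nobody@example.com',
    'Action: failed',
    'Status: 5.1.1',
    '';
for my $r (parse_dsn_status($dsn)) {
    print "$r->{address}\t$r->{action}\t$r->{status}\n";
}
```

      In the Status field, a code starting with 5 marks a permanent failure and one starting with 4 a transient one, which helps with the "box full vs. bad address" distinction.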

      There is one thing that's standard, however---where the bounce messages go. Bounce messages are always sent to the envelope sender of the message.

      Not entirely true.

      Bounce messages are supposed to go to the envelope FROM address, according to the spec, but they don't always - sometimes, they go to the header From: address instead. It's not a lot - about 1% of bounces, in my experience - but it's some.

      Qmail rocks - but to catch ALL bounces, you also need to consider those that are directed to the header From: address as well.

      Peace,
      -McD
Re: email bouncebacks
by jalspach (Acolyte) on Dec 04, 2003 at 16:10 UTC

    I would like to thank you all very much for your input.


    I am in the unhappy position of having to run this report with both hands tied behind my back...or at least a hand and one foot. My only access to the mail is via Outlook, and my access to support data (for later pulling the phone numbers that correspond to the addresses) is via a web page. I will either work from a CSV dump of a bounce-back email folder or set up a rule to have Outlook send a copy of the bounce-back messages to my personal mail server.

    Our messages go out around the first of the month, so I will only have about a 1 week period of messages to work with at a time (unfortunately, I will have no way to wait for multiple bounces to help rule out full boxes).

    My current plan for a first run is to grep anything that looks like an e-mail address from the dump file into a new report.
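
    A first pass at that grep could look like the sketch below. The pattern is deliberately loose (it is not a full address validator), and the exclusion list is an assumption: a dump of bounces will also contain the sender's own addresses, which should not end up in the report.

```perl
use strict;
use warnings;

# Return each distinct address-like string in $text, lowercased, in order of
# first appearance, skipping any address listed in @exclude.
sub extract_addresses {
    my ($text, @exclude) = @_;
    my %skip;
    $skip{ lc $_ } = 1 for @exclude;
    my (%seen, @found);
    while ($text =~ /([\w.+-]+\@[\w-]+(?:\.[\w-]+)+)/g) {
        my $addr = lc $1;
        next if $skip{$addr} || $seen{$addr}++;
        push @found, $addr;
    }
    return @found;
}

# Hypothetical dump text and list address.
my $dump = 'Delivery to <Bob@Example.com> failed. Sent by list@jalspach.com.';
print "$_\n" for extract_addresses($dump, 'list@jalspach.com');
```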

    The next step will be to work on obtaining the phone info from the database and including that in the report.

    Lastly, I may add a search of the dump file for phrases that denote soft errors (box full or the like) and mark those addresses in the report. This way customer service can decide what to do with each one: wait until the next month or call the person right away.
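
    The phrase search might be sketched as follows. The phrase list is purely hypothetical; as the replies point out, bounce wording varies between mail servers, so it would need tuning against real messages.

```perl
use strict;
use warnings;

# Hypothetical soft-failure phrases; not taken from any particular server.
my @SOFT_PATTERNS = (
    qr/mailbox (?:is )?full/i,
    qr/over (?:the )?quota/i,
    qr/quota exceeded/i,
    qr/temporar(?:y|ily)/i,
);

# Return 'soft' if any soft-failure phrase appears, otherwise 'unknown'
# (not 'hard': the absence of a known phrase proves nothing).
sub classify_bounce {
    my ($text) = @_;
    for my $re (@SOFT_PATTERNS) {
        return 'soft' if $text =~ $re;
    }
    return 'unknown';
}

print classify_bounce('The recipient mailbox is full'), "\n";
print classify_bounce('User unknown'), "\n";
```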

    This will give the customer service department something that they can use at each step.

    This all sounds pretty straightforward. I may still be back for help, but I think you have all given me a great foundation to start with.

    James

Node Type: perlquestion [id://311866]
Approved by Corion
Front-paged by diotalevi