Tuesday, October 2, 2012

Script to extract email attachments from mbox / Thunderbird

Here's a Python script that extracts file attachments from emails
stored in an mbox file. Mbox is the format used by Thunderbird and
plenty of other mail clients. The script can be used to automatically
save email attachments to a specific folder. By checking each part's
content type you can customize what gets saved and where files are
saved. A settings file records how far the script has gotten, so
messages that were already processed are skipped on the next run.

import mailbox, pickle
mb = mailbox.mbox('/path/to/INBOX')

prefs_path = '/path/to/config/.save-attachments'
save_to = '/path/to/Downloads/attachments/'

try:
    with open(prefs_path, 'rb') as f:
        prefs = pickle.load(f)
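except (IOError, EOFError, pickle.UnpicklingError):
    # No usable state file yet: start from the first message.
    prefs = dict(start=0)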

def save_attachments(mid):
    msg = mb.get_message(mid)
    if msg.is_multipart():
        for part in msg.get_payload():
            if part.get_content_type() != 'application/octet-stream':
                continue
            filename = part.get_filename()
            if not filename:
                continue  # part carries no filename, nothing to save
            print('Saving %s' % filename)
            with open(save_to + filename, 'wb') as f:
                f.write(part.get_payload(decode=True))

for i in range(prefs['start'], 1000000):
    try:
        save_attachments(i)
    except KeyError:
        break
prefs['start'] = i

with open(prefs_path, 'wb') as f:
    pickle.dump(prefs, f)
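
The script above only keeps application/octet-stream parts. As a rough sketch of the "customize by content type" idea, the function below routes parts to different subfolders depending on their type. It assumes Python 3; the DESTINATIONS mapping, its folder names, and the function name save_by_content_type are made up for illustration and are not part of the original script.

```python
import os

# Hypothetical mapping from content type to a subfolder name.
DESTINATIONS = {
    'application/pdf': 'pdf',
    'image/jpeg': 'images',
    'image/png': 'images',
    'application/octet-stream': 'other',
}

def save_by_content_type(msg, base_dir, destinations=DESTINATIONS):
    """Save every named part of msg whose content type is in destinations.

    Returns the list of file paths that were written.
    """
    saved = []
    # walk() visits the message and all nested parts, so attachments
    # inside nested multipart containers are found as well.
    for part in msg.walk():
        subdir = destinations.get(part.get_content_type())
        filename = part.get_filename()
        if subdir is None or not filename:
            continue  # unlisted type, or a part without a filename
        target_dir = os.path.join(base_dir, subdir)
        os.makedirs(target_dir, exist_ok=True)
        # basename() drops directory components from the sender-supplied name.
        path = os.path.join(target_dir, os.path.basename(filename))
        with open(path, 'wb') as f:
            f.write(part.get_payload(decode=True))
        saved.append(path)
    return saved
```

Inside the original loop this could be called as save_by_content_type(mb.get_message(mid), save_to). Note that it still overwrites an existing file of the same name, just like the script above.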

2 comments:

  1. Hi. I changed the script a little bit.
     Now it works with UTF-8 filenames, and it does not overwrite files that have the same name.
     You run it from a folder containing a file called "all.mbox". That's it.
     It processes the whole mbox every time, to avoid confusing anyone.
     https://gist.github.com/georgy7/3a80bce2cd8bf2f9985c