Tuesday, October 2, 2012

Script to extract email attachments from mbox / Thunderbird

Here's a Python script that extracts file attachments from emails
stored in an mbox file. Mbox is the format used by Thunderbird and
plenty of other mail clients. The script can be used to automatically
save email attachments to a specific folder. Using the content
type, you can customize what gets saved and where files are
saved. A settings file records which messages have already been
processed.
import mailbox, pickle
mb = mailbox.mbox('/path/to/INBOX')

prefs_path = '/path/to/config/.save-attachments'
save_to = '/path/to/Downloads/attachments/'

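# Load the saved state; start from the first message if the settings
# file is missing or unreadable.
try:
    with open(prefs_path, 'rb') as f:
        prefs = pickle.load(f)
except (IOError, OSError, EOFError, pickle.UnpicklingError):
    prefs = dict(start=0)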

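def save_attachments(mid):
    msg = mb.get_message(mid)
    if msg.is_multipart():
        for part in msg.get_payload():
            # Only save generic binary attachments; adjust this test to
            # pick other content types.
            if part.get_content_type() != 'application/octet-stream':
                continue
            filename = part.get_filename()
            if not filename:
                continue  # no filename to save under
            print('Saving %s' % filename)
            with open(save_to + filename, 'wb') as f:
                f.write(part.get_payload(decode=True))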

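# Walk message keys from the saved position; get_message() raises
# KeyError at the first key past the end of the mailbox.
for i in range(prefs['start'], 1000000):
    try:
        save_attachments(i)
    except KeyError:
        break
# Remember where to resume on the next run.
prefs['start'] = i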

with open(prefs_path, 'wb') as f:
    pickle.dump(prefs, f)
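
To make the "customize by content type" idea concrete, here is a minimal sketch of one way to route attachments into different subfolders depending on their type. The `DESTINATIONS` mapping, the subfolder names and the `save_by_type` helper are hypothetical names made up for this illustration, not part of the script above, and the sketch assumes Python 3 (for `exist_ok`).

```python
import os

# Hypothetical mapping from content type to a subfolder of the target
# directory. Types that are not listed are skipped.
DESTINATIONS = {
    'application/pdf': 'pdf',
    'image/jpeg': 'images',
    'image/png': 'images',
}

def save_by_type(msg, base_dir, destinations=DESTINATIONS):
    """Save the attachments of msg whose content type is in destinations.

    Returns the list of file paths that were written.
    """
    saved = []
    # walk() visits every part, including nested multipart containers.
    for part in msg.walk():
        subdir = destinations.get(part.get_content_type())
        filename = part.get_filename()
        if subdir is None or not filename:
            continue
        target_dir = os.path.join(base_dir, subdir)
        os.makedirs(target_dir, exist_ok=True)
        # basename() drops directory components from the sender-supplied name.
        path = os.path.join(target_dir, os.path.basename(filename))
        with open(path, 'wb') as f:
            f.write(part.get_payload(decode=True))
        saved.append(path)
    return saved
```

With the script above, something like `save_by_type(mb.get_message(mid), save_to)` could stand in for the body of `save_attachments`. Files with the same name overwrite each other in this sketch, so a real version would probably want to handle collisions.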
