Get the Gmail attachment filename without downloading it

后端未结

关注

 4  1945

I\'m trying to get all the messages from a Gmail account that may contain some large attachments (about 30MB). I just need the names, not the whole files. I found a piece of

相关标签:

4条回答

醉话见心

2020-12-31 10:24
If you know something about the file name, you can use the X-GM-RAW gmail extensions for imap SEARCH command. These extensions let you use any gmail advanced search query to filter the messages. This way you can restrict the downloads to the matching messages, or exclude some messages you don't want.
```
mail.uid('search', None, 'X-GM-RAW', 
       'has:attachment filename:pdf in:inbox -label:parsed'))
```
The above search for messages with PDF attachments in INBOX not labeled "parsed".

Some pro tips:
- label the messages you have already parsed, so you don't need to fetch them again (the -label:parsed filter in the above example)
- always use the uid version instead of the standard sequential ids (you are already doing this)
- unfortunately MIME is messy: there are a lot of clients that do weird (or plain wrong) things. You could try to download and parse only the headers, but is it worth the trouble?
[edit]

If you label a message after parsing it, you can skip the messages you have parsed already. This should be reasonable enough to monitor your class mailbox.

Perhaps you live in a corner of the world where internet bandwidth is more expensive than programmer time; in this case, you can fetch only the headers and look for "Content-disposition" == "attachment; filename=somefilename.ext".
0 讨论(0)
发布评论:

提交评论
- 加载中...

长情又很酷

2020-12-31 10:25

Old question, but just wanted to share the solution to this I came up with today. Searches for all emails with attachments and outputs the uid, sender, subject, and a formatted list of attachments. Edited relevant code to show how to format BODYSTRUCTURE:

    data   = mailobj.uid('fetch', mail_uid, '(BODYSTRUCTURE)')[1]
    struct = data[0].split()        
    list   = []                     #holds list of attachment filenames

    for j, k in enumerate(struct):
        if k == '("FILENAME"':
            count = 1
            val = struct[j + count]
            while val[-3] != '"':
                count += 1
                val += " " + struct[j + count]
            list.append(val[1:-3])
        elif k == '"FILENAME"':
            count = 1
            val = struct[j + count]
            while val[-1] != '"':
                count += 1
                val += " " + struct[j + count]
            list.append(val[1:-1])

I've also published it on GitHub.

0 讨论(0)

失恋的感觉

2020-12-31 10:33

Rather than fetch RFC822, which is the full content, you could specify BODYSTRUCTURE.

The resulting data structure from imaplib is pretty confusing, but you should be able to find the filename, content-type and sizes of each part of the message without downloading the entire thing.

0 讨论(0)
发布评论:

提交评论
- 加载中...
悲哀的现实

2020-12-31 10:42

A FETCH of the RFC822 message data item is functionally equivalent to BODY[]. IMAP4 supports other message data items, listed in section 6.4.5 of RFC 3501.

Try requesting a different set of message data items to get just the information that you need. For example, you could try RFC822.HEADER or maybe BODY.PEEK[MIME].

0 讨论(0)
发布评论:

提交评论
- 加载中...