Fastest way to move outlook emails to postgresql db using Npgsql

混江龙づ霸主 提交于 2019-12-11 05:41:54

问题


I have around 200 000 emails in outlooks public folders.

Exporting to pst is a little bit fast but I don't know if psts are reliable. Also decoding it perfectly(with attachments) became a big headache. python lib doesn’t save attachments. Java lib not importing html body for all emails.

So, I thought of saving to postgresql db.

I am using Npgsql but it is slow(with the way I am using it).

My current slow way:

sql = "insert into tablename (a,b,c) values (:aa, :bb, :cc)";  
cmd = (sql, con) #I am skipping full commands as I am reading from mind, dont think I made mistake in my code.  

NpgsqlParameter pa = new NpgsqlParameter(":aa", sometype)
#same for other 3

pa.value = somevalue

cmd.Parameters.Add(pa)
#add others also

If I comment the db save part, it is really fast(like 1000 mails in 10 seconds), but with db part it is around 10 mails for 2 seconds.

I think reason is, I have 10 columns (htmlbody, body, headerstring etc) and I am creating new NpgsqlParameter for these 10 columns, setting value for it for every mail which is making it slow I feel.

I am unable to simply do like

"...values('"+htmlbody+"', '"+header_string+"'..

because of errors

syntax error at or near..so I started using npgsqlparameter.  

Also, I am saving below fields only in case parsing the header_string fails(its failing for some emails sadly particularly the ones having 'microsoft ..version 2' at the beginning of the headerstring). Are they enough to represent full emails just in case headerstring is useless?

    sender_entry_id 
    foldername text,
    subject text,

    headerstring text,
    num_of_attachments int,

    body text,
    body_html text,

    bcc text,
    tooo text,
    frm text,
    cc text,
    conversation_id text,
    conversation_index text,
    conversation_topic text,
    message_class text,
    senton timestamp

Also, I am getting this error

System.Runtime.InteropServices.COMException (0x80040304): The operation failed.

at Microsoft.Office.Interop.Outlook.PropertyAccessorClass.GetProperty(String SchemaName)

while saving some attachments as byte[] variable. Here is the code:

 const string PR_ATTACH_DATA_BIN =
"http://schemas.microsoft.com/mapi/proptag/0x37010102";

 byte[] attachmentData =
 (byte[])attachment.PropertyAccessor.GetProperty(
     PR_ATTACH_DATA_BIN);  #attachment is Outlook.Attachment

回答1:


The GetProperty method of the PropertyAccessor class can't read large PT_BINARY values (> 64kB). That's why you get an exception in the code.

You can use a low-level code (Extended MAPI) to open the attachment data as IStream:

IAttach::OpenProperty(PR_ATTACH_DATA_BIN, IID_IStream).



回答2:


You can save the attachment to a temporary file first (Attachment.SaveAsFile), then read its data.

As Eugene mentioned, you can access the attachment data as IStream using Extended MAPI, but it is not accessible in C#, only in C++ or Delphi.

If using Redemption is an option, you can use the AsText and AsArray properties exposed both by the RDOAttachment and Attachment object (returns by the Safe*Item objects).



来源:https://stackoverflow.com/questions/27763616/fastest-way-to-move-outlook-emails-to-postgresql-db-using-npgsql

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!