问题
I have around 200 000 emails in outlooks public folders.
Exporting to pst is a little bit fast but I don't know if psts are reliable. Also decoding it perfectly(with attachments) became a big headache. python lib doesn’t save attachments. Java lib not importing html body for all emails.
So, I thought of saving to postgresql db.
I am using Npgsql but it is slow(with the way I am using it).
My current slow way:
sql = "insert into tablename (a,b,c) values (:aa, :bb, :cc)";
cmd = (sql, con) #I am skipping full commands as I am reading from mind, dont think I made mistake in my code.
NpgsqlParameter pa = new NpgsqlParameter(":aa", sometype)
#same for other 3
pa.value = somevalue
cmd.Parameters.Add(pa)
#add others also
If I comment the db save part, it is really fast(like 1000 mails in 10 seconds), but with db part it is around 10 mails for 2 seconds.
I think reason is, I have 10 columns (htmlbody, body, headerstring etc) and I am creating new NpgsqlParameter for these 10 columns, setting value for it for every mail which is making it slow I feel.
I am unable to simply do like
"...values('"+htmlbody+"', '"+header_string+"'..
because of errors
syntax error at or near..so I started using npgsqlparameter.
Also, I am saving below fields only in case parsing the header_string fails(its failing for some emails sadly particularly the ones having 'microsoft ..version 2' at the beginning of the headerstring). Are they enough to represent full emails just in case headerstring is useless?
sender_entry_id
foldername text,
subject text,
headerstring text,
num_of_attachments int,
body text,
body_html text,
bcc text,
tooo text,
frm text,
cc text,
conversation_id text,
conversation_index text,
conversation_topic text,
message_class text,
senton timestamp
Also, I am getting this error
System.Runtime.InteropServices.COMException (0x80040304): The operation failed.
at Microsoft.Office.Interop.Outlook.PropertyAccessorClass.GetProperty(String SchemaName)
while saving some attachments as byte[] variable. Here is the code:
const string PR_ATTACH_DATA_BIN =
"http://schemas.microsoft.com/mapi/proptag/0x37010102";
byte[] attachmentData =
(byte[])attachment.PropertyAccessor.GetProperty(
PR_ATTACH_DATA_BIN); #attachment is Outlook.Attachment
回答1:
The GetProperty method of the PropertyAccessor class can't read large PT_BINARY values (> 64kB). That's why you get an exception in the code.
You can use a low-level code (Extended MAPI) to open the attachment data as IStream:
IAttach::OpenProperty(PR_ATTACH_DATA_BIN, IID_IStream).
回答2:
You can save the attachment to a temporary file first (Attachment.SaveAsFile), then read its data.
As Eugene mentioned, you can access the attachment data as IStream using Extended MAPI, but it is not accessible in C#, only in C++ or Delphi.
If using Redemption is an option, you can use the AsText and AsArray properties exposed both by the RDOAttachment and Attachment object (returns by the Safe*Item objects).
来源:https://stackoverflow.com/questions/27763616/fastest-way-to-move-outlook-emails-to-postgresql-db-using-npgsql