I am very new at C# and have written some fairly clunky code. I have been doing a lot of courses online, and many say that there are several ways to approach problems. Now I have m
Office Interop is fairly slow.
Open XML might have been faster, but the file is a .doc, so it probably won't be able to handle it.
But just like with Excel in this question, there is a way you can improve the performance: do not access each word in a Range by index, because AFAIK each access creates a separate Range instance wrapped in an RCW, and that is a primary candidate for a performance bottleneck in your application.
That means that your best bet to improve the performance is to load all the words (.Text) into some indexable collection of Strings before the actual processing, and only then use that collection to create the output.
How to do it in the fastest way? I am not exactly sure, but you can try either getting all the words from the _Document.Words enumerator (it may or may not be more performant, but at least you will be able to see how long it takes just to retrieve the required words):
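// requires: using System.Linq; using Microsoft.Office.Interop.Word;
var words = document.Words
    .Cast<Range>()        // Words is a non-generic COM collection, so cast each item to Range
    .Select(r => r.Text)  // copy the text out once, as a plain string
    .ToList();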
or you may try to use the Text of the _Document.Content range, though you would then have to separate the individual words yourself.
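For the second option, here is a minimal sketch of how the splitting could look once the whole text has been read in a single call. The WordSplitter class and its SplitWords method are hypothetical names made up for this illustration, not part of Interop. It simply splits on whitespace, which is an assumption about what counts as a "word": AFAIK the Words collection also returns punctuation and paragraph marks as separate items and keeps trailing spaces, so the results will not match it one-to-one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Office Interop): turns one big text blob
// into an indexable list of words without touching COM objects again.
public static class WordSplitter
{
    public static List<string> SplitWords(string text)
    {
        if (string.IsNullOrEmpty(text))
            return new List<string>();

        // A null separator array makes Split use every whitespace character,
        // which covers spaces, tabs and the '\r' that separates paragraphs.
        // RemoveEmptyEntries drops the empty strings produced by runs of whitespace.
        string[] parts = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return new List<string>(parts);
    }
}
```

You would call it as something like var words = WordSplitter.SplitWords(document.Content.Text); and then index into words as often as you need. Wrapping both variants in a System.Diagnostics.Stopwatch should show which one is actually faster for your document.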