Improving the performance of preprocessing a large set of documents
I am working on a plagiarism detection framework in Java. My document set contains about 100 documents, which I have to preprocess and store in a suitable data structure. My big question is how to process this large set of documents efficiently while avoiding bottlenecks. The main focus of my question is how to improve the preprocessing performance.

Thanks,
Nuwan

You're a bit lacking on specifics there. Appropriate optimizations are going to depend on things like the document format, the average document size, and how you are processing them.
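As a starting point, one common pattern is to read and tokenize the documents in parallel and collect the results into a concurrent map. The sketch below assumes plain-text UTF-8 files in a `documents` directory and a trivial lowercase-and-split tokenizer; the directory name, the `tokenize` helper, and the choice of `ConcurrentHashMap` are illustrative assumptions, not details from the original question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Preprocessor {

    public static void main(String[] args) throws Exception {
        Path docDir = Paths.get("documents"); // assumed location of the corpus

        // One worker per core: the job is mostly I/O and string processing,
        // so a fixed-size pool avoids oversubscribing the CPU.
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());

        // Document name -> token list, filled concurrently by the workers.
        Map<String, List<String>> corpus = new ConcurrentHashMap<>();

        List<Future<?>> tasks = new ArrayList<>();
        try (Stream<Path> files = Files.list(docDir)) {
            for (Path file : (Iterable<Path>) files::iterator) {
                tasks.add(pool.submit(() -> {
                    try {
                        String text = new String(
                                Files.readAllBytes(file), StandardCharsets.UTF_8);
                        corpus.put(file.getFileName().toString(), tokenize(text));
                    } catch (IOException e) {
                        System.err.println("Skipping " + file + ": " + e.getMessage());
                    }
                }));
            }
        }
        for (Future<?> t : tasks) {
            t.get(); // wait for every worker to finish
        }
        pool.shutdown();

        System.out.println("Preprocessed " + corpus.size() + " documents");
    }

    // Placeholder preprocessing: lowercase and split on non-letter runs.
    // A real plagiarism pipeline would add stop-word removal, stemming,
    // n-gram extraction, fingerprinting, etc.
    private static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase().split("[^\\p{L}]+"))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }
}
```

Note that with only about 100 documents the whole job may well be dominated by disk I/O, so it is worth timing a plain sequential loop first; parallelism only pays off once the per-document processing becomes CPU-bound.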