I have written the LINQ statement below, but it takes a huge amount of time to process because there are so many lines. My CPU has 8 cores, but only one of them is used because the code runs on a single thread.
There is room for performance improvement before resorting to AsParallel:
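// Every distinct lower-cased word across all lines; HashSet lookups are O(1) on average.
HashSet<string> lstAllLines = new HashSet<string>(
    File.ReadAllLines("AllLines.txt")
        .SelectMany(ls => ls.ToLowerInvariant().Split(' ')));
// Lower-cased banned words with duplicates removed.
List<string> lstBannedWords = File.ReadAllLines("allBaddWords.txt")
    .Select(s => s.ToLowerInvariant())
    .Distinct().ToList();
// lstBannedWords is already distinct, so no second Distinct() is needed here.
List<string> lstFoundBannedWords = lstBannedWords.Where(s => lstAllLines.Contains(s))
    .ToList();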
Since access to a HashSet is O(1) and lstBannedWords is the shorter list, you may not even need any parallelism (TotalSearchTime = lstBannedWords.Count * O(1)). Lastly, you always have the option of AsParallel.
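If the single-threaded version is still too slow, a minimal sketch of the AsParallel option might look like the following. The class and method names (BannedWordSearch, FindBannedWords) are made up for illustration, and the sketch assumes that splitting and lower-casing the lines is the expensive step, so only that part runs through PLINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BannedWordSearch
{
    // Hypothetical helper: returns the banned words that occur anywhere in the given lines.
    public static List<string> FindBannedWords(
        IEnumerable<string> lines, IEnumerable<string> bannedWords)
    {
        // Lower-case and split the lines in parallel, then collect the words into a set.
        // The HashSet itself is filled on the calling thread as the parallel results arrive.
        var allWords = new HashSet<string>(
            lines.AsParallel()
                 .SelectMany(l => l.ToLowerInvariant().Split(' ')));

        // The banned list is short, so a plain sequential lookup is enough here.
        return bannedWords
            .Select(w => w.ToLowerInvariant())
            .Distinct()
            .Where(w => allWords.Contains(w))
            .ToList();
    }
}
```

It could be called as BannedWordSearch.FindBannedWords(File.ReadLines("AllLines.txt"), File.ReadAllLines("allBaddWords.txt")). Whether this beats the sequential version depends on the input size, so it is worth measuring both.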
The following snippet performs that operation using the Task Parallel Library's Parallel.ForEach method. It takes each line in the 'all-lines' file, splits it on spaces, and then searches the line for banned words. Parallel.ForEach should use all available cores on your machine's processor. Hope this helps.
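System.Threading.Tasks.Parallel.ForEach(
    lstAllLines, // assumed here to hold the raw lines of the file, one string per line
    line =>
    {
        var wordsInLine = line.ToLowerInvariant().Split(' ');
        // Where (not All) selects the banned words that actually occur in this line.
        var bannedWords = lstBannedWords.Where(bannedWord => wordsInLine.Contains(bannedWord)).ToList();
        // TODO: Add the banned word(s) in the line to a master list of banned words found
        // (use a thread-safe collection, since this lambda runs on several threads at once).
    });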
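One way the TODO above might be filled in is sketched below. The names ParallelBannedWordScan and Scan are hypothetical, and using a ConcurrentDictionary as a thread-safe set is just one choice among several (a ConcurrentBag followed by Distinct would also work). The banned words go into a HashSet that is only read inside the loop, which is safe to do from several threads as long as nothing modifies it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelBannedWordScan
{
    // Hypothetical helper: scans the lines in parallel and returns the banned words found.
    public static List<string> Scan(
        IEnumerable<string> lines, IEnumerable<string> bannedWords)
    {
        // Read-only lookup set shared by all worker threads.
        var banned = new HashSet<string>(bannedWords.Select(w => w.ToLowerInvariant()));

        // ConcurrentDictionary used as a thread-safe set; the byte value is unused.
        var found = new ConcurrentDictionary<string, byte>();

        Parallel.ForEach(lines, line =>
        {
            foreach (var word in line.ToLowerInvariant().Split(' '))
            {
                if (banned.Contains(word))
                    found.TryAdd(word, 0); // no effect if the word is already recorded
            }
        });

        // Sort so the result does not depend on thread scheduling.
        return found.Keys.OrderBy(w => w, StringComparer.Ordinal).ToList();
    }
}
```

Looking each word up in the banned set keeps the work per line proportional to the number of words in the line, rather than to the number of banned words.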