For a small directory the code works fine, but when the files in the directory are big it gives this error message.
My code:
IEnumerable text
I would use Directory.EnumerateFiles and File.ReadLines, since they are less memory hungry. They work like a StreamReader and hand out items one at a time, whereas Directory.GetFiles and File.ReadAllLines read everything into memory first:
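using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Build the regex once, instead of once per line inside the loop.
Regex regex = new Regex(@"User:\s*(?<username>[^\s]+)");

var matchingLines = Directory.EnumerateFiles(@"C:\Users\karansha\Desktop\watson_query\", "*.*")
    .SelectMany(fn => File.ReadLines(fn))
    .Where(l => l.IndexOf("appGUID: null", StringComparison.InvariantCultureIgnoreCase) >= 0);

foreach (var line in matchingLines)
{
    // apply regex to line, and so on ...
}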
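Below is a self-contained sketch of the same streaming pattern that runs without the hard-coded watson_query folder. The helper FindMatchingLines, the temporary directory and the sample log lines are hypothetical, made up for illustration. The sketch assumes C# 9 or later for top-level statements. Only one file is open at a time, and only the current line is held in memory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: lazily yields every line containing `needle` (case-insensitive)
// from all files in `directory`. Files are opened one at a time and read line by line.
static IEnumerable<string> FindMatchingLines(string directory, string needle) =>
    Directory.EnumerateFiles(directory, "*.*")
        .SelectMany(fn => File.ReadLines(fn))
        .Where(l => l.IndexOf(needle, StringComparison.InvariantCultureIgnoreCase) >= 0);

// Build a small throwaway directory (made-up sample data) so the sketch runs anywhere.
string dir = Path.Combine(Path.GetTempPath(), "readlines-demo-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(dir);
File.WriteAllLines(Path.Combine(dir, "a.log"), new[] { "User: alice appGUID: null", "User: bob appGUID: 123" });
File.WriteAllLines(Path.Combine(dir, "b.log"), new[] { "User: carol APPGUID: NULL" });

// Enumerate the query directly; no intermediate list of lines is ever built.
int count = 0;
foreach (string line in FindMatchingLines(dir, "appGUID: null"))
{
    Console.WriteLine(line);
    count++;
}
Console.WriteLine($"matching lines: {count}");

// Enumeration has finished, so every file reader is already disposed.
Directory.Delete(dir, recursive: true);
```

File enumeration order is not guaranteed, so the two matching lines may print in either order. The count is the same either way.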
You also don't need to create a List<string> of all the lines again; just enumerate the query with foreach. (textLines.ToList() would create a third collection, which is also redundant.)
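The reason no list is needed is LINQ's deferred execution: defining the query reads nothing, and each line is pulled, tested and discarded only while you enumerate. This minimal sketch makes that visible with a counter. FakeReadLines is a hypothetical in-memory stand-in for File.ReadLines, so the example needs no files. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int linesRead = 0;

// Hypothetical stand-in for File.ReadLines: yields lines one at a time
// and counts how many have actually been produced.
IEnumerable<string> FakeReadLines()
{
    string[] lines = { "User: alice", "noise", "User: bob", "noise", "User: carol" };
    foreach (string l in lines)
    {
        linesRead++;
        yield return l;
    }
}

// Defining the query reads nothing yet (deferred execution).
IEnumerable<string> query = FakeReadLines().Where(l => l.StartsWith("User:", StringComparison.Ordinal));
int readBeforeEnumeration = linesRead;

// Enumerating with foreach pulls one line at a time; no List<string> is built.
int matches = 0;
foreach (string line in query)
{
    matches++;
}
int readAfterEnumeration = linesRead;

// Operators such as First() stop pulling as soon as they have their answer.
linesRead = 0;
string firstUser = query.First();
int readForFirst = linesRead;

Console.WriteLine($"before: {readBeforeEnumeration}, after: {readAfterEnumeration}, matches: {matches}, read for First(): {readForFirst}");
```

Calling ToList() on the query would instead force all five lines into a new collection before any of them is processed. With multi-gigabyte files, that is exactly the memory cost the streaming version avoids.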
Try the following code. It uses ReadLines, which doesn't load the entire file into memory but reads it line by line. It also uses a HashSet to store the unique results of matching the regular expression.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

Regex regex = new Regex(@"User:\s*(?<username>[^\s]+)");
// Stream every line of every file, skipping lines that contain "appGUID: null".
IEnumerable<string> textLines =
    Directory.GetFiles(@"C:\Users\karansha\Desktop\watson_query\", "*.*")
        .SelectMany(filePath => File.ReadLines(filePath))
        .Where(line => !line.Contains("appGUID: null"));
// The HashSet keeps each user name only once.
HashSet<string> users = new HashSet<string>(
    textLines.SelectMany(line => regex.Matches(line).Cast<Match>())
             .Select(match => match.Groups["username"].Value)
);
int numberOfUsers = users.Count(name => name.Length <= 10);
Console.WriteLine("Unique_Users_Express=" + numberOfUsers);
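To check the unique-user logic without the real log folder, here is a runnable sketch of the same query wrapped in a hypothetical helper, CountUniqueUsers, applied to a throwaway directory with made-up sample lines. It assumes C# 9 or later for top-level statements and keeps the answer's filters as-is: lines containing "appGUID: null" are skipped, and only names of at most 10 characters are counted.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper mirroring the query above: counts distinct user names
// (length <= 10) on lines that do NOT contain "appGUID: null".
static int CountUniqueUsers(string directory)
{
    var regex = new Regex(@"User:\s*(?<username>[^\s]+)");
    var users = new HashSet<string>(
        Directory.EnumerateFiles(directory, "*.*")
            .SelectMany(path => File.ReadLines(path))          // streamed, line by line
            .Where(line => !line.Contains("appGUID: null"))
            .SelectMany(line => regex.Matches(line).Cast<Match>())
            .Select(match => match.Groups["username"].Value)); // duplicates collapse here
    return users.Count(name => name.Length <= 10);
}

// Made-up sample data in a temporary directory.
string dir = Path.Combine(Path.GetTempPath(), "unique-users-demo-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(dir);
File.WriteAllLines(Path.Combine(dir, "1.log"), new[]
{
    "User: alice appGUID: 42",
    "User: bob appGUID: 7",
    "User: alice appGUID: 99",       // duplicate name, counted once
    "User: mallory appGUID: null",   // excluded by the appGUID filter
});
File.WriteAllLines(Path.Combine(dir, "2.log"), new[]
{
    "User: bob appGUID: 8",                // duplicate across files
    "User: averyverylongname appGUID: 1",  // longer than 10 characters, not counted
});

int numberOfUsers = CountUniqueUsers(dir);
Directory.Delete(dir, recursive: true);
Console.WriteLine("Unique_Users_Express=" + numberOfUsers); // Unique_Users_Express=2
```

Memory use is bounded by the number of distinct user names kept in the HashSet, not by the total size of the files.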