The example below throws an InvalidOperationException, "Collection was modified; enumeration operation may not execute.", when executing the code.
var urls
You can't, basically. What you really want here is a queue:
var urls = new Queue<string>();
urls.Enqueue("http://www.google.com");
while (urls.Count != 0)
{
    string url = urls.Dequeue();
    // Get all links from the url
    List<string> newUrls = GetLinks(url);
    foreach (string newUrl in newUrls)
    {
        urls.Enqueue(newUrl);
    }
}
It's slightly ugly due to there not being an AddRange method in Queue<T>, but I think it's basically what you want.
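If the missing AddRange bothers you, one option is a small extension method. This is only a sketch: EnqueueRange and QueueExtensions are names I made up for illustration, not something Queue<T> provides.

```csharp
using System.Collections.Generic;

public static class QueueExtensions
{
    // Hypothetical helper (not part of Queue<T>): enqueues every item, preserving order.
    public static void EnqueueRange<T>(this Queue<T> queue, IEnumerable<T> items)
    {
        foreach (T item in items)
        {
            queue.Enqueue(item);
        }
    }
}
```

With that in scope, the inner foreach in the loop could be written as urls.EnqueueRange(GetLinks(url));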
I would create two lists, add to the second, and then update the reference, like this:
var urls = new List<string>();
urls.Add("http://www.google.com");
// Copy after seeding urls so the starting URL is kept
var destUrls = new List<string>(urls);
foreach (string url in urls)
{
    // Get all links from the url
    List<string> newUrls = GetLinks(url);
    destUrls.AddRange(newUrls);
}
urls = destUrls;
You can probably also create a recursive function, like this (untested):
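IEnumerable<string> GetUrl(string url)
{
    // WHERE_I_GET_MY_URLS is a placeholder for the links found at url
    foreach (string ret_url in WHERE_I_GET_MY_URLS)
    {
        yield return ret_url;
        // Recurse into each discovered link; recursing into url itself would never terminate
        foreach (string u in GetUrl(ret_url))
            yield return u;
    }
}
List<string> MyEnumerateFunction()
{
    return new List<string>(GetUrl("http://www.google.com"));
}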
In this case, you will not have to create two lists, since GetUrl does all the work.
But I may have missed the point of your program.
It's hard to improve the code without knowing what GetLinks() does. In any event, this avoids recursion. The standard idiom is that you don't alter a collection while you're enumerating over it. The runtime could have allowed it, but doing so is a common source of errors, so it's better to create a new collection or control the iteration yourself.
public List<string> ExpandLinksOrSomething(List<string> urls)
{
    List<string> result = new List<string>();
    Queue<string> queue = new Queue<string>(urls);
    while (queue.Any())
    {
        string url = queue.Dequeue();
        result.Add(url);
        foreach (string newResult in GetLinks(url))
        {
            queue.Enqueue(newResult);
        }
    }
    return result;
}
The naive implementation assumes that GetLinks() will not return circular references, e.g. A returns B, and B returns A. This can be fixed by:
List<string> newItems = GetLinks(url).Except(result).ToList();
foreach (string newResult in newItems)
{
    queue.Enqueue(newResult);
}
* As others point out, using a dictionary may be more efficient, depending on how many items you process.
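To illustrate the efficiency point, here is a rough sketch that tracks visited URLs in a HashSet<string> (I've swapped that in for a dictionary, since only membership matters here). It also marks a URL as seen when it is enqueued, so a link that is still waiting in the queue is not added a second time. LinkCrawler, ExpandLinks, and the getLinks parameter are names I've assumed for the example; getLinks stands in for GetLinks().

```csharp
using System;
using System.Collections.Generic;

public static class LinkCrawler
{
    // getLinks is a stand-in for GetLinks(); it is passed in so the sketch is self-contained.
    public static List<string> ExpandLinks(
        IEnumerable<string> urls,
        Func<string, IEnumerable<string>> getLinks)
    {
        var result = new List<string>();
        var seen = new HashSet<string>();
        var queue = new Queue<string>();

        foreach (string url in urls)
        {
            // HashSet<T>.Add returns false if the item is already present.
            if (seen.Add(url))
            {
                queue.Enqueue(url);
            }
        }

        while (queue.Count > 0)
        {
            string url = queue.Dequeue();
            result.Add(url);
            foreach (string link in getLinks(url))
            {
                // Only enqueue links that have never been seen before.
                if (seen.Add(link))
                {
                    queue.Enqueue(link);
                }
            }
        }
        return result;
    }
}
```

Because each URL is enqueued at most once, the loop ends once every reachable URL has been visited, even if the links are circular.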
I find it strange that GetLinks() would return a value and then later resolve that to more URLs. Maybe all you want is a one-level expansion. If so, we can get rid of the Queue altogether.
public static List<string> StraightProcess(List<string> urls)
{
    List<string> result = new List<string>();
    foreach (string url in urls)
    {
        result.Add(url);
        result.AddRange(GetLinks(url));
    }
    return result;
}
I decided to rewrite it because while other answers used queues, it wasn't apparent that they didn't run forever.
Use ForEach with a lambda; it's more fun!
var urls = new List<string>();
var destUrls = new List<string>();
urls.Add("http://www.google.com");
urls.ForEach(i => destUrls.AddRange(GetLinks(i)));
urls.AddRange(destUrls);