I have what is essentially a jagged array of name value pairs - i need to generate a set of unique name values from this. the jagged array is approx 86,000 x 11 values. It d
How about:
Dictionary<NameValuePair,int> hs = new Dictionary<NameValuePair,int>();
foreach (i in jaggedArray)
{
foreach (j in i)
{
if (!hs.ContainsKey(j))
{
hs.Add(j, 0);
}
}
}
IEnumerable<NameValuePair> unique = hs.Keys;
of course, if you were using C# 3.0, .NET 3.5:
var hs = new HashSet<NameValuePair>();
hs.UnionWith(jaggedArray.SelectMany(item => item));
would do the trick.
Use KeyValuePair as a wrapper class and then create a dictionary with to create a set perhaps? Or implement your own wrapper that overrides the Equals and GetHashCode.
Dictionary<KeyValuePair, bool> mySet;
for(int i = 0; i < keys.length; ++i)
{
KeyValuePair kvp = new KeyValuePair(keys[i], values[i]);
mySet[kvp] = true;
}
I have it running in 0.34 seconds down from 9+ minutes
The problem is when comparing the KeyValuePair structs. I worked around it by writing a comparer object, and passing an instance of it to the Dictionary.
From what I can determine, the KeyValuePair.GetHashCode() returns the hashcode of it's Key
object (in this example the least unique object).
As the dictionary adds (and checks existence of) each item, it uses both Equals and GetHashCode functions, but has to rely on the Equals function when the hashcode is less unique.
By providing a more unique GetHashCode function, it excerises the Equals function far less often. I also optimised the Equals function to compare the more unique Values before the less unqiue Keys.
86,000 * 11 items with 10,000 unique properties runs in 0.34 seconds using the comparer object below (without the comparer object it takes 9 minutes 22 seconds)
Hope this helps :)
class StringPairComparer
: IEqualityComparer<KeyValuePair<string, string>>
{
public bool Equals(KeyValuePair<string, string> x, KeyValuePair<string, string> y)
{
return x.Value == y.Value && x.Key == y.Key;
}
public int GetHashCode(KeyValuePair<string, string> obj)
{
return (obj.Key + obj.Value).GetHashCode();
}
}
EDIT: If it was just one string (instead of a KeyValuePair, where string = Name+Value) it would be approx twice as fast. It's a nice intresting problem, and I have spent faaaaaar too much time on it (I learned quiet a bit though)
Instead of using a Dictionary
why not extend KeyedCollection<TKey, TItem>? According to the documentation:
Provides the abstract base class for a collection whose keys are embedded in the values.
You then need to override the protected TKey GetKeyForItem(TItem item) function. As it is an hybrid between IList<T> and IDictionary<TKey, TValue> I think it's likely to be quite fast.
Have you profiled your code? You are certain that the foreach loops are the bottleneck, and not retriever.GetVehicles()?
I did create a small test project where I fake the retriever and let it return 86.000 X 11 values. My first attempt ran at 5 seconds, creating the data included.
I used the same value for both the key and value where the first key was "0#0" and the last "85999#10".
Then I switched to guids. Same result.
Then I made the key longer, like this:
var s = Guid.NewGuid().ToString();
return s + s + s + s + s + s + s+ s + s + s;
Now it took almost 10 seconds.
Then I made the keys insanely long and got an out of memory exception. I don't have a swap file on my computer, so I got this exception immediately.
How long are your keys? Are your virtual memory consumption the reason for your poor performance?
if you don't need any specific correlation between each key/value pair and the unique values you're generating, you could just use a GUID? I'm assuming the problem is that your current 'Key' isn't unique in this jagged array.
Dictionary<System.Guid, KeyValuePair<string, string>> myDict
= new Dictionary<Guid, KeyValuePair<string, string>>();
foreach of your key values in their current format
myDict.Add(System.Guid.NewGuid(), new KeyValuePair<string, string>(yourKey, yourvalue))
Sounds like it would store what you need but I don't know how you would pull data back from this as there would be no semantic relationship between the generate Guid & what you originally had...
Can you provide any more info in your question?