I have been looking into the speeds of different Java collection types and have come across something weird. I am adding 1,000,000 objects from a static array to a different collection type and returning the time required. This part of the code works fine.
On further investigation I noticed that the TreeSet is not receiving all of the 1,000,000 objects, and is receiving a different number each time. Below is the method to transfer the objects from an array to the TreeSet:
public int treeSet(int num)
{
    Date before = new Date();
    for (int i = 0; i < num; i++)
    {
        treeSet.add(personsArray[i]);
    }
    Date after = new Date();
    return (int) (after.getTime() - before.getTime());
}
Below is the code which calls the treeSet() method and tests for its size.
System.out.println("\tTree set with 1,000,000 objects--" + t.treeSet(1000000));
System.out.println("Tree set contains " + t.treeSet.size() + " elements");
The output of this is:
Tree set with 1,000,000 objects--1192
Tree set contains 975741 elements
I'm hoping someone can explain to me why the TreeSet is not receiving all of the objects and why it is receiving inconsistent amounts.
You are almost certainly generating duplicate Person objects.
In your comment, you said each person is randomly generated from sex, first names and last names from a text file containing "hundreds" of names, and age. Let's say there are two possibilities for sex, 300 possibilities for each of first name and last name, and 100 possible values of age. That's a total of 18,000,000 possible unique people.
Let's further assume that equals() is implemented correctly on this object, that is, that it checks all of these fields correctly.
You're generating 1,000,000 people using random characteristics out of a space of 18,000,000 possibilities.
Intuitively, you might think there's a "minuscule" chance of duplicates, but the probability of there being duplicates is in fact about 1.0 minus epsilon. This is known as the Birthday Problem or sometimes the Birthday Paradox.
As given on that page, the probability of a collision occurring between any two choices is about
1 - ((d-1) / d) ^ (n(n-1)/2)
where d is the number of values in the domain, and n is the number of choices made. I'm not entirely sure, but with values of d = 18,000,000 and n = 1,000,000 I think this works out to be about 1.0 - 1E-323. (EDIT: The correct value is about 1.0 - 2.84E-12294. That's pretty darned close to one.)
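To check a number this small, the calculation has to be done in log space, because the probability of no duplicates underflows a double. The following is a minimal sketch, not a definitive implementation: it sums the logarithms of the exact product of (1 - k/d) for k = 0..n-1 rather than using the approximation quoted above. The class and method names are made up for illustration.

```java
public class BirthdayProbability {

    /**
     * Returns log10 of the probability that n uniform random draws from
     * d equally likely values are all distinct, i.e. log10 of the product
     * of (1 - k/d) for k = 0 .. n-1.
     */
    static double log10NoCollision(long n, long d) {
        double lnP = 0.0;
        for (long k = 0; k < n; k++) {
            // log1p(x) computes ln(1 + x) accurately for small x
            lnP += Math.log1p(-(double) k / d);
        }
        return lnP / Math.log(10.0);
    }

    public static void main(String[] args) {
        double log10P = log10NoCollision(1_000_000L, 18_000_000L);
        // A value near -12293.5 means P(no duplicates) is on the order of 1E-12294.
        System.out.printf("log10 P(no duplicates) = %.2f%n", log10P);
    }
}
```

With d = 18,000,000 and n = 1,000,000 the result is about -12293.5, so the chance of seeing no duplicates at all is on the order of 1E-12294.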
The expected number of collisions in such a choice is given by this formula:
n - d + d * ((d-1) / d) ^ n
If d = 18,000,000 and n = 1,000,000 then this works out to about 27,000. That is, on average you'd get 27,000 collisions. That's pretty close to the number of "missing" elements in your TreeSet, which is how collisions would manifest themselves. I admit I picked my numbers to come pretty close to what you're seeing, but my assumptions and the results are entirely plausible.
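The expected-collision formula can be checked both directly and by simulation. The sketch below assumes people are drawn uniformly from the d possibilities and models each person as a random integer in [0, d); the class name, method names, and seed are hypothetical and not part of the original code.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class ExpectedCollisions {

    /** Expected number of duplicate draws: n - d + d * ((d-1)/d)^n. */
    static double expectedCollisions(long n, long d) {
        return n - d + d * Math.pow((d - 1.0) / d, n);
    }

    /** Draws n random values in [0, d) and counts how many were already present. */
    static int simulateCollisions(int n, int d, long seed) {
        Random random = new Random(seed);
        Set<Integer> seen = new HashSet<>();
        int collisions = 0;
        for (int i = 0; i < n; i++) {
            // Set.add returns false when the value is already in the set
            if (!seen.add(random.nextInt(d))) {
                collisions++;
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int d = 18_000_000;
        System.out.printf("expected:  %.0f%n", expectedCollisions(n, d));
        System.out.println("simulated: " + simulateCollisions(n, d, 42L));
    }
}
```

Both figures should land at roughly 27,000, and the simulated count varies a little with the seed, which is the same run-to-run variation the question describes.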
You need to rethink the way you're generating the data you're storing into the set.
With a high level of confidence I can say you are adding duplicates to your TreeSet. If you don't believe me, just add numbers to your treeSet, making sure the numbers run from 1 to 1000000; then you'll see you get exactly what you expect.
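A minimal sketch of that sanity check, using a local TreeSet<Integer> instead of the treeSet field from the question (the class and method names are only for illustration):

```java
import java.util.TreeSet;

public class DistinctIntegersDemo {

    /** Adds the integers 1..count to a fresh TreeSet and returns its size. */
    static int sizeAfterAddingRange(int count) {
        TreeSet<Integer> set = new TreeSet<>();
        for (int i = 1; i <= count; i++) {
            set.add(i);
        }
        return set.size();
    }

    public static void main(String[] args) {
        // Every value is distinct, so nothing is dropped.
        System.out.println(sizeAfterAddingRange(1_000_000)); // prints 1000000
    }
}
```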
Once you have cleared your doubts, let's sort out your People class.
Add the following to your People class:
int id; // ensure that every People object you create has a different id, e.g. 1 to 10m

@Override
public boolean equals(Object o) {
    // guard against null and against objects of another class
    if (o == null || this.getClass() != o.getClass()) return false;
    return ((People) o).id == this.id;
}

@Override
public int hashCode() {
    return id;
}
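Now start adding things to your Set. :) Keep in mind that a TreeSet decides whether two elements are the same using compareTo() (or a Comparator) rather than equals() and hashCode(), so compareTo() should compare by id as well.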
NOTE: This code is just an example of a simple approach to making every People object distinct. It is fine for testing with a TreeSet and the like, but it is not recommended for real problems.
Make sure the compareTo() method on your People class is correctly implemented. The Comparable javadoc states the following:
It is strongly recommended (though not required) that natural orderings be consistent with equals. This is so because sorted sets (and sorted maps) without explicit comparators behave "strangely" when they are used with elements (or keys) whose natural ordering is inconsistent with equals. In particular, such a sorted set (or sorted map) violates the general contract for set (or map), which is defined in terms of the equals method.
For example, if one adds two keys a and b such that (!a.equals(b) && a.compareTo(b) == 0) to a sorted set that does not use an explicit comparator, the second add operation returns false (and the size of the sorted set does not increase) because a and b are equivalent from the sorted set's perspective.
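The behaviour described in the javadoc can be reproduced with a small sketch. The Person class below is a hypothetical stand-in, not the asker's actual class: its equals() checks both names, but its compareTo() deliberately looks at the last name only, so the two are inconsistent.

```java
import java.util.Objects;
import java.util.TreeSet;

public class InconsistentOrderingDemo {

    /** Hypothetical class whose natural ordering is inconsistent with equals. */
    static final class Person implements Comparable<Person> {
        final String firstName;
        final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            Person other = (Person) o;
            return firstName.equals(other.firstName) && lastName.equals(other.lastName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(firstName, lastName);
        }

        @Override
        public int compareTo(Person other) {
            // Only the last name is compared, so "Ann Smith" and "Bob Smith" tie.
            return lastName.compareTo(other.lastName);
        }
    }

    /** Adds all given people to a TreeSet and returns the resulting size. */
    static int sizeAfterAdding(Person... people) {
        TreeSet<Person> set = new TreeSet<>();
        for (Person p : people) {
            set.add(p);
        }
        return set.size();
    }

    public static void main(String[] args) {
        Person a = new Person("Ann", "Smith");
        Person b = new Person("Bob", "Smith");
        System.out.println(a.equals(b));           // prints false
        System.out.println(a.compareTo(b));        // prints 0
        System.out.println(sizeAfterAdding(a, b)); // prints 1
    }
}
```

If the real compareTo() ignores any field that distinguishes two people, the TreeSet silently drops such pairs in the same way, even when equals() would treat them as different.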
Source: https://stackoverflow.com/questions/29009067/treeset-not-adding-all-elements