TreeSet not adding all elements?

Posted by 橙三吉 on 2019-12-03 05:01:47

You are almost certainly generating duplicate Person objects.

In your comment, you said each person is randomly generated from sex, first names and last names from a text file containing "hundreds" of names, and age. Let's say there are two possibilities for sex, 300 possibilities for each of first name and last name, and 100 possible values of age. That's a total of 2 * 300 * 300 * 100 = 18,000,000 possible unique people.

Let's further assume that equals() is implemented correctly on this object, that is, that it checks all of these fields correctly.

You're generating 1,000,000 people using random characteristics out of a space of 18,000,000 possibilities, and expecting all of them to be unique.

Intuitively, you might think there's a "minuscule" chance of duplicates, but the probability of there being duplicates is in fact about 1.0 minus epsilon. This is known as the Birthday Problem or sometimes the Birthday Paradox.

As given on that page, the probability of at least one collision occurring between any two choices is about

1 - ((d-1) / d) ^ (n(n-1)/2)

where d is the number of values in the domain, and n is the number of choices made. With d = 18,000,000 and n = 1,000,000, I first estimated this at about 1.0 - 1E-323. (EDIT: The correct value is closer to one still: the gap is on the order of 10^-12,000. That's pretty darned close to one.)
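The gap from one is far too small for a double to represent, so evaluating the formula directly just returns 1.0. A minimal sketch that works in log space instead and reports the base-10 exponent of the no-collision probability (the class and method names are made up for illustration):

```java
final class BirthdayBound {
    // log10 of ((d-1)/d)^(n(n-1)/2), i.e. of the probability that NO
    // collision occurs, under the approximation quoted above.
    static double log10NoCollision(double d, double n) {
        double pairs = n * (n - 1) / 2.0;   // number of pairs of choices
        // log1p(-1/d) is ln((d-1)/d), computed without cancellation error
        return pairs * Math.log1p(-1.0 / d) / Math.log(10.0);
    }

    public static void main(String[] args) {
        double exponent = log10NoCollision(18_000_000, 1_000_000);
        System.out.printf("P(no collision) is about 10^%.1f%n", exponent);
    }
}
```

The collision probability is one minus that number. As a sanity check, the same method gives roughly a 50% chance of no collision for the classic d = 365, n = 23 case.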

The expected number of collisions in such a choice is given by this formula:

n - d + d * ((d-1) / d) ^ n

If d = 18,000,000 and n = 1,000,000 then this works out to about 27,000. That is, on average you'd get 27,000 collisions. That's pretty close to the number of "missing" elements in your TreeSet, which is how collisions would manifest themselves. I admit I picked my numbers to come pretty close to what you're seeing, but my assumptions and the results are entirely plausible.
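To check that figure, here is a rough sketch that evaluates the formula and also simulates the draw, standing in an int in [0, d) for each possible person. The class name, method names, and seed are arbitrary choices for illustration; a BitSet is used only to keep memory low.

```java
import java.util.BitSet;
import java.util.Random;

final class CollisionEstimate {
    // Expected number of collisions: n - d + d * ((d-1)/d)^n
    static double expectedCollisions(double d, double n) {
        return n - d + d * Math.pow((d - 1) / d, n);
    }

    // Draw n values uniformly from [0, d) and count the draws that hit a
    // value already seen. This equals n minus the number of distinct values,
    // which is how many elements would go "missing" from a set.
    static int simulatedCollisions(int d, int n, long seed) {
        Random random = new Random(seed);
        BitSet seen = new BitSet(d);
        int collisions = 0;
        for (int i = 0; i < n; i++) {
            int value = random.nextInt(d);
            if (seen.get(value)) {
                collisions++;
            } else {
                seen.set(value);
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        System.out.printf("expected:  %.0f%n", expectedCollisions(18_000_000, 1_000_000));
        System.out.println("simulated: " + simulatedCollisions(18_000_000, 1_000_000, 42L));
    }
}
```

Both numbers should land near 27,000, with the simulated count varying by a few hundred from seed to seed.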

You need to rethink the way you're generating the data you're storing into the set.

With a high level of confidence, I can say you are adding duplicates to your TreeSet. If you don't believe me, just add the numbers from 1 to 1,000,000 to a TreeSet and you'll see that you get exactly what you expect.

Once you have cleared your doubts, let's sort out your People class.
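A quick sketch of that experiment (the class and method names are placeholders):

```java
import java.util.TreeSet;

final class DistinctIntegers {
    // Add the integers 1..count to a TreeSet and report how many it holds.
    static int sizeAfterAdding(int count) {
        TreeSet<Integer> set = new TreeSet<>();
        for (int i = 1; i <= count; i++) {
            set.add(i);
        }
        return set.size();
    }

    public static void main(String[] args) {
        // Every value is distinct, so nothing is dropped.
        System.out.println(sizeAfterAdding(1_000_000)); // prints 1000000
    }
}
```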

Add the following to your People Class:

int id;    // ensure that every People object you create has a different id, e.g. 1 to 10m

@Override
public boolean equals(Object o) {
  if (this == o) return true;
  // reject null and objects of another class before casting
  if (o == null || getClass() != o.getClass()) return false;
  return ((People) o).id == this.id;
}

@Override
public int hashCode() {
  return id;
}

Now start adding things to your Set. :)

NOTE: This code is just an example of a simple way to make every People object distinct. It is fine for testing with TreeSet and the like, but it is not recommended for real problems.
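One caveat: a TreeSet decides whether two elements are the same through compareTo() (or a Comparator), not through equals() and hashCode(), so the id has to drive the ordering as well. A minimal sketch, assuming a hypothetical name field and constructor that are not part of the original class:

```java
import java.util.Set;
import java.util.TreeSet;

final class People implements Comparable<People> {
    final int id;       // unique per object
    final String name;  // hypothetical field, not used for identity

    People(int id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return ((People) o).id == id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }

    // Ordering by id keeps compareTo() consistent with equals().
    @Override
    public int compareTo(People other) {
        return Integer.compare(id, other.id);
    }

    // Add howMany objects that differ only by id and report the set size.
    static int distinctCount(int howMany) {
        Set<People> set = new TreeSet<>();
        for (int i = 1; i <= howMany; i++) {
            set.add(new People(i, "same name"));
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount(1_000_000)); // prints 1000000
    }
}
```

Even though every object carries the same name here, none are dropped, because the ids differ.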

Make sure the compareTo() method on your People class is correctly implemented. The Comparable javadoc states the following:

It is strongly recommended (though not required) that natural orderings be consistent with equals. This is so because sorted sets (and sorted maps) without explicit comparators behave "strangely" when they are used with elements (or keys) whose natural ordering is inconsistent with equals. In particular, such a sorted set (or sorted map) violates the general contract for set (or map), which is defined in terms of the equals method.

For example, if one adds two keys a and b such that (!a.equals(b) && a.compareTo(b) == 0) to a sorted set that does not use an explicit comparator, the second add operation returns false (and the size of the sorted set does not increase) because a and b are equivalent from the sorted set's perspective.
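The behaviour described in the javadoc can be reproduced with a small sketch. ByAgeOnly is a made-up class whose compareTo() looks only at age while equals() is the identity check inherited from Object; the second method shows one possible fix, a Comparator that breaks ties on name.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

final class InconsistentOrderingDemo {
    // Hypothetical element type: the natural ordering ignores the name.
    static final class ByAgeOnly implements Comparable<ByAgeOnly> {
        final String name;
        final int age;

        ByAgeOnly(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public int compareTo(ByAgeOnly other) {
            return Integer.compare(age, other.age);
        }
    }

    // Two different people of the same age: the set treats them as one.
    static boolean secondAddSucceeds() {
        Set<ByAgeOnly> set = new TreeSet<>();
        set.add(new ByAgeOnly("Alice", 30));
        return set.add(new ByAgeOnly("Bob", 30));
    }

    // Same data, but ties on age are broken by name.
    static boolean secondAddSucceedsWithTieBreak() {
        Comparator<ByAgeOnly> byAgeThenName =
                Comparator.comparingInt((ByAgeOnly p) -> p.age).thenComparing(p -> p.name);
        Set<ByAgeOnly> set = new TreeSet<>(byAgeThenName);
        set.add(new ByAgeOnly("Alice", 30));
        return set.add(new ByAgeOnly("Bob", 30));
    }

    public static void main(String[] args) {
        System.out.println(secondAddSucceeds());             // prints false
        System.out.println(secondAddSucceedsWithTieBreak()); // prints true
    }
}
```

If compareTo() on the People class skips a field that distinguishes two people, the TreeSet silently drops one of them in exactly this way.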
