Question
I've written a small Java program that tests the limit parameter with four different values (10, 100, 1000 and 10000) when querying a user's Facebook news feed through the Open Graph API with the RestFB client. As you'll see, it behaves strangely...
Scenario:
public static void main(String[] args) {
    // vars
    DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    FacebookClient client = new DefaultFacebookClient(accessToken);
    Connection<Post> home;
    List<Post> postList;
    Map<String, Post> postMap;
    int i;
    // limits to test
    String[] limits = {"10", "100", "1000", "10000"};
    for (String limit : limits) {
        // init list and map (looking for duplicate posts)
        postList = new LinkedList<Post>();
        postMap = new LinkedHashMap<String, Post>();
        // get news feed
        home = client.fetchConnection(id + "/home", Post.class, Parameter.with("limit", limit));
        // going through pages
        i = 1;
        for (List<Post> page : home) {
            for (Post post : page) {
                // store into list
                postList.add(post);
                // store into map (unique post id)
                postMap.put(post.getId(), post);
            }
            i++;
        }
        // sort posts by created time
        Collections.sort(postList, new Comparator<Post>() {
            @Override
            public int compare(Post post1, Post post2) {
                return post1.getCreatedTime().compareTo(post2.getCreatedTime());
            }
        });
        // log
        try {
            FileWriter out = new FileWriter("log/output.txt", true);
            out.write("LIMIT: " + limit + "\n");
            out.write("\tPAGES: " + (i - 1) + "\n");
            out.write("\tLIST SIZE: " + postList.size() + "\n");
            out.write("\tMAP SIZE: " + postMap.size() + "\n");
            out.write("\tOLDER POST: " + dateFormat.format(postList.get(0).getCreatedTime()) + "\n");
            out.write("\tYOUGNER POST: " + dateFormat.format(postList.get(postList.size() - 1).getCreatedTime()) + "\n");
            out.close();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
Output:
LIMIT: 10
    PAGES: 7
    LIST SIZE: 56
    MAP SIZE: 56
    OLDER POST: 2009-03-22 14:58:03
    YOUGNER POST: 2012-05-11 15:48:49
LIMIT: 100
    PAGES: 3
    LIST SIZE: 174
    MAP SIZE: 172
    OLDER POST: 2012-01-12 23:01:34
    YOUGNER POST: 2012-05-11 15:48:49
LIMIT: 1000
    PAGES: 2
    LIST SIZE: 294
    MAP SIZE: 292
    OLDER POST: 2009-03-22 14:58:03
    YOUGNER POST: 2012-05-11 15:48:49
LIMIT: 10000
    PAGES: 2
    LIST SIZE: 294
    MAP SIZE: 292
    OLDER POST: 2009-03-22 14:58:03
    YOUGNER POST: 2012-05-11 15:48:49
Interpretations and questions:
1. Obviously, you can't get every post a user has had on their news feed since the account was created. Is limit itself limited?
2. With a limit of 100, 1000 and 10000, each time there must have been two duplicated posts in the returned news feed (174 - 172 = 294 - 292 = 2). Why? I have never seen the same post twice on my personal news feed...
3. With (and only with) a limit of 100, the oldest post I get was created in 2012, whereas the other values of limit make the query return a post created in 2009. I can understand that a higher limit (1000 or 10000) retrieves older posts. But why does a limit of 10 retrieve an older post than a limit of 100?
4. Last but not least: I'm not getting the same number of posts. The higher the limit, the more posts are retrieved. I first thought that the only consequence of a smaller limit would be a larger number of pages (which is indeed the case), with the total number of posts unchanged. But the total does change. Why? That said, the number of posts seems to converge somewhere between a limit of 100 and 1000, since it is identical for a limit of 1000 and a limit of 10000.
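To find out which posts are counted twice, the IDs that repeat can be recorded instead of only comparing the list and map sizes. The following is a minimal sketch under stated assumptions: it works on plain strings (for example the values returned by post.getId() in the loop above), and the class and method names are hypothetical, not part of RestFB.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of RestFB.
class DuplicateIdFinder {

    /** Returns every ID that occurs more than once, in the order of its first repetition. */
    static List<String> findDuplicates(List<String> ids) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String id : ids) {
            // Set.add returns false when the ID was already present
            if (!seen.add(id)) {
                duplicates.add(id);
            }
        }
        return new ArrayList<>(duplicates);
    }
}
```

Collecting the IDs page by page and passing them to findDuplicates would show whether the two repeated posts sit on a page boundary, which would point at the paging rather than at the feed itself.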
PS: specifying a since and/or an until parameter in the query doesn't change anything.
Any answer/comment is welcome :)
Cheers.
Edit:
This is the best recall I have obtained (the largest number of posts retrieved):
LIMIT: 200
    PAGES: 3
    LIST SIZE: 391
    MAP SIZE: 389
    OLDER POST: 2012-01-27 14:17:16
    YOUGNER POST: 2012-05-11 16:52:38
Why 200? Is it specified anywhere in the documentation?
Answer 1:
It's not in the documentation, but I have tested the following for my own project.
Facebook's limit is capped at 500 posts. Even if you set a limit higher than 500, it will fetch at most 500 results. Try 500 (or more) and you will get the maximum number of posts.
You won't get exactly 500 posts every time, but generally more than 490. Some posts get filtered out for various reasons (privacy, blocked users, content not suitable for a specific region, and so on).
This answers your 1st and 4th questions.
For question no. 2: I don't work in Java, so I can't say whether there is a problem in your code or logic.
For question no. 3: God help Facebook!
Edit
For the 4th problem, you may be hitting the Graph API's queries-per-hour limit (Facebook uses it to prevent spamming; you can't query the APIs repeatedly in quick succession).
Also, this is why you do not get all the results returned by Facebook:
(if you specified a limit of “5” but the five posts returned are not visible to the viewer, you will get an empty result set.)
In addition to the limits mentioned in the documentation for each of the tables and connections listed above, it is helpful to know that the maximum number of results we will fetch before running the visibility checks is 5,000.
Reference: Paging with graph api and fql
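The quoted behaviour can be modelled with a small simulation: take at most limit items starting at an offset, then drop the ones the viewer cannot see. This is only a model of the description above, not Facebook's actual implementation; the class name, the method and the visibility predicate are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of "apply the limit first, run visibility checks afterwards".
class VisibilityPagingModel {

    /**
     * Takes at most 'limit' items starting at 'offset' (assumed non-negative),
     * then keeps only the items accepted by 'visible'.
     * The result can therefore be shorter than 'limit', or even empty.
     */
    static <T> List<T> fetchPage(List<T> all, int offset, int limit, Predicate<T> visible) {
        List<T> page = new ArrayList<>();
        int end = Math.min(all.size(), offset + limit);
        for (int i = offset; i < end; i++) {
            T item = all.get(i);
            if (visible.test(item)) {
                page.add(item);
            }
        }
        return page;
    }
}
```

In this model, a limit of 5 over ten posts of which only the last five are visible yields an empty first page, which matches the "empty result set" case in the quote. It would also explain why the total number of posts collected can depend on the limit used.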
Also, there is a limit on the number of results for each particular table. You can find the details on the respective FQL table pages.
For the stream table (the one for posts/feed):
Each query of the stream table is limited to the previous 30 days or 50 posts, whichever is greater, however you can use time-specific fields such as created_time along with FQL operators (such as < or >) to retrieve a much greater range of posts.
Reference: Fql stream table
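Following the stream table note, a long period can be split into shorter created_time ranges that are queried one after another. Here is a minimal sketch of computing such ranges. The class name and the window length are assumptions, and the code only produces the bounds as Unix timestamps in seconds; it does not issue any query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: computes time windows, does not talk to Facebook.
class TimeWindows {

    /**
     * Splits [startEpochSec, endEpochSec) into consecutive windows of at most
     * windowSec seconds. Each element is {since, until}; the last window may be shorter.
     */
    static List<long[]> split(long startEpochSec, long endEpochSec, long windowSec) {
        if (windowSec <= 0) {
            throw new IllegalArgumentException("windowSec must be positive");
        }
        List<long[]> windows = new ArrayList<>();
        for (long since = startEpochSec; since < endEpochSec; since += windowSec) {
            long until = Math.min(since + windowSec, endEpochSec);
            windows.add(new long[] {since, until});
        }
        return windows;
    }
}
```

Each pair could then serve as the lower and upper created_time bound (with the > and < operators mentioned above) of one query, so that no single query has to reach further back than its own window.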
Look here too: Facebook FQL stream limit?
Answer 2:
There is an ongoing bug in the Facebook Open Graph API's paging that has to do with the limit parameter. The higher the limit, the more pages of posts you get, as if a lower limit also culled a sample of the posts. The problem has come and gone ever since the post search function was down for a month in September.
A new bug has surfaced: at present a post search without an access_token and a small limit (like 12) will return few and sparsely populated results pages. The same search made with the access_token given in the API documentation example will give full pages of 12 results +/- and no skipping. I have no idea what kind of access_token they use, but no attempts on my part have duplicated their results. The post search without access token is more or less non-functional (again)!
Answer 3:
There could be some logic on Facebook's side to prevent data mining. Try adding a delay between page requests and see whether the results improve.
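A minimal sketch of that idea, assuming the pages come from an Iterable<List<T>> that loads each page lazily when the loop advances (the question's code iterates a RestFB Connection<Post> in exactly this way). The class name and the delay value are hypothetical, and whether a delay changes what Facebook returns is untested here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: walks through pages and pauses after each one.
class ThrottledPager {

    /** Collects all items from all pages, sleeping delayMillis after each page. */
    static <T> List<T> collect(Iterable<List<T>> pages, long delayMillis) {
        List<T> all = new ArrayList<>();
        for (List<T> page : pages) {
            all.addAll(page);
            try {
                // pause before the loop asks the iterator for the next page
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // restore the interrupt flag and stop paging early
                Thread.currentThread().interrupt();
                break;
            }
        }
        return all;
    }
}
```

With the question's variables this could be called as ThrottledPager.collect(home, 1000), replacing the nested loops that fill postList.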
Source: https://stackoverflow.com/questions/10553510/facebook-open-graph-api-weird-behavior-of-parameter-limit-while-getting-a-pagin