Facebook Open Graph API: weird behavior of the limit parameter while getting a user's paginated news feed


Question


I've written a little script in Java that tests the limit parameter with four different values (10, 100, 1000 and 10000) when querying a user's news feed on Facebook through the Open Graph API and the RestFB client. As you'll see, the behavior is strange...

Scenario:

// imports needed by the snippet (RestFB + standard library)
import com.restfb.Connection;
import com.restfb.DefaultFacebookClient;
import com.restfb.FacebookClient;
import com.restfb.Parameter;
import com.restfb.types.Post;

import java.io.FileWriter;
import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// accessToken and id are constants defined elsewhere in the class
public static void main(String[] args) {

    // vars
    DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    FacebookClient client = new DefaultFacebookClient(accessToken);
    Connection<Post> home;
    List<Post> postList;
    Map<String, Post> postMap;
    int i;

    // limits to test
    String[] limits = {"10", "100", "1000", "10000"};
    for (String limit : limits) {

        // init list and map (the map is used to spot duplicate posts)
        postList = new LinkedList<Post>();
        postMap = new LinkedHashMap<String, Post>();
        // get the news feed
        home = client.fetchConnection(id + "/home", Post.class, Parameter.with("limit", limit));

        // go through the pages
        i = 1;
        for (List<Post> page : home) {
            for (Post post : page) {
                // store into the list
                postList.add(post);
                // store into the map (keyed by unique post id)
                postMap.put(post.getId(), post);
            }
            i++;
        }

        // sort posts by creation time (oldest first)
        Collections.sort(postList, new Comparator<Post>() {
            @Override
            public int compare(Post post1, Post post2) {
                return post1.getCreatedTime().compareTo(post2.getCreatedTime());
            }
        });

        // log the results
        try {
            FileWriter out = new FileWriter("log/output.txt", true);
            out.write("LIMIT: " + limit + "\n");
            out.write("\tPAGES: " + (i - 1) + "\n");
            out.write("\tLIST SIZE: " + postList.size() + "\n");
            out.write("\tMAP SIZE: " + postMap.size() + "\n");
            out.write("\tOLDEST POST: " + dateFormat.format(postList.get(0).getCreatedTime()) + "\n");
            out.write("\tYOUNGEST POST: " + dateFormat.format(postList.get(postList.size() - 1).getCreatedTime()) + "\n");
            out.close();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }

    }

}

Output:

LIMIT: 10
    PAGES: 7
    LIST SIZE: 56
    MAP SIZE: 56
    OLDEST POST: 2009-03-22 14:58:03
    YOUNGEST POST: 2012-05-11 15:48:49
LIMIT: 100
    PAGES: 3
    LIST SIZE: 174
    MAP SIZE: 172
    OLDEST POST: 2012-01-12 23:01:34
    YOUNGEST POST: 2012-05-11 15:48:49
LIMIT: 1000
    PAGES: 2
    LIST SIZE: 294
    MAP SIZE: 292
    OLDEST POST: 2009-03-22 14:58:03
    YOUNGEST POST: 2012-05-11 15:48:49
LIMIT: 10000
    PAGES: 2
    LIST SIZE: 294
    MAP SIZE: 292
    OLDEST POST: 2009-03-22 14:58:03
    YOUNGEST POST: 2012-05-11 15:48:49

Interpretations and questions:

  1. Obviously, you can't get all the posts a user has ever had on his news feed since his account was created. Is limit itself limited?

  2. With a limit of 100, 1000 or 10000, I get exactly two duplicated posts each time within the whole returned news feed (174 − 172 = 294 − 292 = 2). Why? I have never seen the same post twice on my personal news feed...

  3. With (and only with) a limit of 100, the oldest post I get was created in 2012, whereas the other limit values retrieve a post created in 2009. I can understand that a higher limit (1000 or 10000) retrieves older posts, but why does a limit of 10 retrieve an older post than a limit of 100?

  4. Last but not least: I'm not getting the same number of posts. Obviously, the higher the limit, the more posts are retrieved. I first thought that the only consequence of a smaller limit would be a larger number of pages (which is indeed the case), and that the total number of retrieved posts would stay the same. But it doesn't. Why? That said, the number of posts seems to converge somewhere between a limit of 100 and a limit of 1000, since limits of 1000 and 10000 return the same count.

PS: specifying a since and/or an until parameter in the query doesn't change anything.
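For what it's worth, a minimal sketch of that variant with RestFB looks roughly like this (the Unix timestamps are placeholders, and the extra per-page logging is only there to see where pagination stops):

// same client, id and limit as in the script above; the timestamps are placeholders
Connection<Post> homeRange = client.fetchConnection(id + "/home", Post.class,
        Parameter.with("limit", limit),
        Parameter.with("since", 1325376000L),   // e.g. 2012-01-01
        Parameter.with("until", 1336694400L));  // e.g. 2012-05-11

// log how many posts each returned page actually contains
int pageIndex = 1;
for (List<Post> page : homeRange) {
    System.out.println("page " + pageIndex++ + ": " + page.size() + " posts");
}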

Any answer/comment is welcome :)

Cheers.

Edit:

This is my best recall (the limit value that retrieves the most posts):

LIMIT: 200
    PAGES: 3
    LIST SIZE: 391
    MAP SIZE: 389
    OLDEST POST: 2012-01-27 14:17:16
    YOUNGEST POST: 2012-05-11 16:52:38

Why 200? Is it specified anywhere in the documentation?


Answer 1:


It's not in the documentation, but I have personally tested the following for my project.

Facebook's limit is capped at 500 posts. No matter whether you set a limit higher than 500, it will fetch at most 500 results. Try 500 (or more) and you will get the maximum number of posts.

You won't get 500 posts every time, but you will generally get more than 490. Some posts get filtered out for various reasons (privacy settings, blocked users, content not suitable for a specific region, and so on).
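As a quick check with the asker's RestFB setup, a sketch like the one below should show the count plateauing a bit under 500 (it is simply the question's call with a different limit value):

// sketch: request the news feed with the maximum effective limit of 500
Connection<Post> feed = client.fetchConnection(id + "/home", Post.class,
        Parameter.with("limit", "500"));

int total = 0;
for (List<Post> page : feed) {
    total += page.size();
}
// expected to print somewhat under 500 (posts removed by visibility checks)
System.out.println("posts retrieved with limit=500: " + total);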

This answers your 1st and 4th questions.

For question no. 2, I do not work in Java, so I can't say whether there is a problem in your code/logic or what your code is actually doing.

For question no. 3, God help Facebook!

Edit

For the 4th problem, you may be hitting the queries-per-hour limit of the Graph API (Facebook uses it to prevent spamming; you can't query the APIs frequently in quick succession).

Also, this is why you do not get all of the results returned by Facebook:

(if you specified a limit of “5” but the five posts returned are not visible to the viewer, you will get an empty result set.)

In addition to the limits mentioned in the documentation for each of the tables and connections listed above, it is helpful to know that the maximum number of results we will fetch before running the visibility checks is 5,000.

Reference: Paging with the Graph API and FQL

Also, there is a limit on the number of results for a particular table. You can find the details on the respective FQL table pages.

For the stream table (the one for posts/feeds):

Each query of the stream table is limited to the previous 30 days or 50 posts, whichever is greater, however you can use time-specific fields such as created_time along with FQL operators (such as < or >) to retrieve a much greater range of posts.

Reference: FQL stream table

Look here too: Facebook FQL stream limit?
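To illustrate the created_time filtering mentioned above, a rough, untested sketch of an FQL query on the stream table sent through RestFB might look like this (the timestamp and the query itself are placeholders; com.restfb.json.JsonObject is used to hold the raw result):

// sketch only: stream table query with a created_time filter via the Graph API fql endpoint
String fql = "SELECT post_id, created_time, message FROM stream "
        + "WHERE source_id = me() AND created_time < 1325376000 "   // placeholder timestamp
        + "ORDER BY created_time DESC LIMIT 50";
JsonObject result = client.fetchObject("fql", JsonObject.class,
        Parameter.with("q", fql));
System.out.println(result);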




Answer 2:


There is an ongoing bug in Facebook's Open Graph API paging having to do with the limit parameter. The higher the limit, the more pages of posts you get, as if a lower limit also culls a sampling of posts. The problem has surfaced and retreated ever since the post search function was down for a month in September.

A new bug has surfaced: at present, a post search without an access_token and with a small limit (like 12) returns few and sparsely populated result pages. The same search made with the access_token given in the API documentation example gives full pages of roughly 12 results and no skipping. I have no idea what kind of access_token they use, but none of my attempts have duplicated their results. Post search without an access token is more or less non-functional (again)!
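For anyone who wants to reproduce the comparison, a minimal sketch of the two search calls with RestFB could look like this (the query string and the token are placeholders):

// sketch: the same post search with and without an access token
FacebookClient anonymous = new DefaultFacebookClient();                 // no token
FacebookClient authed = new DefaultFacebookClient("ACCESS_TOKEN");      // placeholder token

Connection<Post> sparseResults = anonymous.fetchConnection("search", Post.class,
        Parameter.with("q", "coffee"),        // placeholder query
        Parameter.with("type", "post"),
        Parameter.with("limit", 12));

Connection<Post> fullResults = authed.fetchConnection("search", Post.class,
        Parameter.with("q", "coffee"),
        Parameter.with("type", "post"),
        Parameter.with("limit", 12));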




Answer 3:


There could be some logic on Facebook's side to prevent data mining. Try adding some delay while going through the pages and see if the results improve.
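Something along these lines, a minimal sketch that simply sleeps between page fetches in the asker's loop (the one-second delay is an arbitrary value to tune):

// sketch: pause between pages to avoid hammering the API
Connection<Post> home = client.fetchConnection(id + "/home", Post.class,
        Parameter.with("limit", "100"));

for (List<Post> page : home) {
    // ... process the page as in the question ...
    try {
        Thread.sleep(1000);   // arbitrary 1 s delay between page requests
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
    }
}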



Source: https://stackoverflow.com/questions/10553510/facebook-open-graph-api-weird-behavior-of-parameter-limit-while-getting-a-pagin
