PHP forums - how to cope with unread discussions / topics / posts

后端 未结 7 968
伪装坚强ぢ
伪装坚强ぢ 2020-12-07 17:10

I know this question has been asked here a couple of times, but none of the answers had pleased me. This is because almost all of them involve a huge read / write process re

相关标签:
7条回答
  • 2020-12-07 17:22

    There isn't a lot of choices.

    1. mark every reader thread by each user.

      • Disadvantages: a lot of rows in very active forums
      • Advantages: Every user knows with post has read or not.
    2. mark every unread thread by each user.

      • Disadvantages: a lot of space with "unreaded" rows if there is inactivity of a lot of users
      • Solutions: add a lifetime timestamp and delete old records with a cron
      • Advantages: Every user knows with post has read or not.
    3. use timestamps to determine if show it as unread or not.

      • Disadvantages: The users don't know with are the real unread threads, the marks only show the "new trheads" since the last login
      • Advantage: Save space

    The other alternative is mixing solutions, that is,

    1 and 3) show thread as "unread" if they aren't older than X days and there isn't a row marked as readed for the user. The "read" rows can be deleted when they are X day older without affect anything.

    Advantages

    • less spaced used to determine unread threads

    Disadvantages

    • create a cron that keeps the system clean
    • Users don't know if they read threads olders than x days.

    Advantages

    • Every user knows which "new posts" has read or not.
    0 讨论(0)
  • 2020-12-07 17:22

    Almost any forum I know of will use some sort of reference timestamp to determine whether or not a thread/message should be regarded as "unread" or not. This timestamp usually is the date/time of the last action you performed on your previous visit to the forum.

    So you keep ie. a previous_last_action & last_action timestamp in your user table, last_action is updated on every user action, the previous_last_action column is set once to last_action when logging in (or upon creation of a new session - if you have "remember me" functionality). To determine if a thread/message is unread you would compare that thread/message creation (or update) timestamp with the value in previous_last_action for the user currently logged in.

    0 讨论(0)
  • 2020-12-07 17:29

    There is... another.

    Another way to store detailed read/unread data for a hierarchical forum structure ( board > section > thread, etc.). It does so without a) having to pre-populate read/unread information, and b) without having to ever store more than U*(M/2) rows in its worst case, where U is the number of users, and M is the total number of posts in the database (and usually much, much less than this)

    I researched this topic a while ago. I found that SMF/phpBB "cheat" a little bit in how they store user reading history. Their schema supports storage of either the last timestamps or message ID that was marked as read in a given board, forum, subforum, topic (or viewed directly by the browser), like so:

    [ user_id, board, last_msg_id, last_timestamp ]

    [ user_id, board, forum, last_msg_id, last_timestamp ]

    [ user_id, board, forum, subforum, last_msg_id, last_timestamp ]

    [ user_id, board, forum, subforum, topic, last_msg_id, last_timestamp ]

    This lets users mark specific boards, forums, topics, etc., as "read". It requires, however, either action on the part of the user (either by reading, or actively clicking "mark as read"), and in the case of phpBB, doesn't give you the granularity to say "I have seen this specific message, but not that specific message." You also get the situation where you read the last message in a topic first (viewing the latest activity in a thread), and you're immediately assumed to have read the rest of the thread.

    It works for SMF and phpBB to store things like this because it's rare that you're ever viewing just one post (default views are set up for 20+ posts in the last page of a topic). However, for more threaded forums (particularly forums where you're viewing messages one at a time), this is less than ideal. Users of this system will likely care a whole lot if they have read one message but not another, and might consider it cumbersome to only be able to mark an entire section as read, when really they just wanted a few marked as read.

    You store messages in tuples like this: [ user_id, lower_msg_id, upper_msg_id ]

    The user history log is maintained as the following:

    Upon page view, a function looks to see if user_id has a record where current_msg_id is between lower_msg_id and upper_msg_id. If it has, then this page is read, and no action needs taken. If it hasn't, then another query has to be issued, this time determining if current_msg_id is either one less than lower_msg_id (current_msg_id == lower_msg_id-1), or one more than upper_msg_id (current_msg_id == upper_msg_id +1). This is the case where we grow our "read" or "seen" boundary by 1. If we're one away from lower_msg_id or uppper_msg_id, then we grow the tuple by 1 in that direction. If we aren't growing our tuple range, then we insert a new tuple, [ user_id, current_msg_id, current_msg_id ].

    Corner case is when two tuple ranges approach each other. In this case, upon searching between the lower tuple boundary and the upper tuple boundary, merge the two boundaries by setting the upper boundary of the lower tuple to the upper boundary of the upper tuple, and delete the upper tuple.

    Code example in PHP:

    function seen_bounds( $usr_id, $msg_id ) {
    
        # mysql escape
        $usr_id = mres( $usr_id );
        $msg_id = mres( $msg_id );
    
        $seen_query = "
            SELECT
                msb.id,
                msb.lower_msg_id,
                msb.upper_msg_id
            FROM
                msgs_seen_bounds msb
            WHERE
                $msg_id BETWEEN msb.lower_msg_id AND msb.upper_msg_id AND
                msb.usr_id = $usr_id
            LIMIT 1;
        ";
    
        # See if this post already exists within a given
        # seen bound.
        $seen_row = query($seen_query, ROW);
    
        if($seen_row == 0) {
            # Has not been seen, try to detect if we're "near"
            # another bound (and we can grow that bound to include
            # this post).
            $lower_query = "
                SELECT
                    msb.id,
                    msb.lower_msg_id,
                    msb.upper_msg_id
                FROM
                    msgs_seen_bounds msb
                WHERE
                    msb.upper_msg_id = ($msg_id - 1) AND
                    msb.usr_id = $usr_id
                LIMIT 1;
            ";
    
            $upper_query = "
                SELECT
                    msb.id,
                    msb.lower_msg_id,
                    msb.upper_msg_id
                FROM
                    msgs_seen_bounds msb
                WHERE
                    msb.lower_msg_id = ($msg_id + 1) AND
                    msb.usr_id = $usr_id
                LIMIT 1;
            ";
    
            $lower = query($lower_query, ROW);
            $upper = query($upper_query, ROW);
    
            if( $lower == 0 && $upper == 0 ) {
                # No bounds exist for or near this. We'll insert a single-ID
                # bound
    
                $saw_query = "
                    INSERT INTO
                        msgs_seen_bounds
                    (usr_id, lower_msg_id, upper_msg_id)
                    VALUES
                    ($usr_id, $msg_id, $msg_id)
                    ;
                ";
    
                query($saw_query, NONE);
            } else {
                if( $lower != 0 && $upper != 0 ) {
                    # Found "near" bounds both on the upper
                    # and lower bounds.
    
                    $update_query = '
                        UPDATE msgs_seen_bounds
                        SET
                            upper_msg_id = ' . $upper['upper_msg_id'] . '
                        WHERE
                            msgs_seen_bounds.id = ' . $lower['id'] . '
                        ;
                    ';
    
                    $delete_query = '
                        DELETE FROM msgs_seen_bounds
                        WHERE
                            msgs_seen_bounds.id = ' . $upper['id'] . '
                        ;
                    ';
    
                    query($update_query, NONE);
                    query($delete_query, NONE);
                } else {
                    if( $lower != 0 ) {
                        # Only found lower bound, update accordingly.
                        $update_query = '
                            UPDATE msgs_seen_bounds
                            SET
                                upper_msg_id = ' . $msg_id . '
                            WHERE
                                msgs_seen_bounds.id = ' . $lower['id'] . '
                            ;
                        ';
    
                        query($update_query, NONE);
                    }
    
                    if( $upper != 0 ) {
                        # Only found upper bound, update accordingly.
                        $update_query = '
                            UPDATE msgs_seen_bounds
                            SET
                                lower_msg_id = ' . $msg_id . '
                            WHERE
                                msgs_seen_bounds.id = ' . $upper['id'] . '
                            ;
                        ';
    
                        query($update_query, NONE);
                    }
                }
            }
        } else {
            # Do nothing, already seen.
        }
    
    }
    

    Searching for unread posts is finding where current_msg_id does not exist between any lower_msg_id and upper_msg_id for a given user (a NOT EXISTS query in SQL terms). It's not the most efficient of queries when implementing in a relational database, but can be solved by aggressive indexing. For example, the following is a SQL query for counting unread posts for a given user, grouping by the discussion area ("item") that the posts are in:

    $count_unseen_query = "
        SELECT 
            msgs.item as id,
            count(1) as the_count
        FROM msgs
        WHERE
        msgs.usr != " . $usr_id . " AND
        msgs.state != 'deleted' AND
        NOT EXISTS (
           SELECT 1 
           FROM 
              msgs_seen_bounds msb
           WHERE 
              msgs.id BETWEEN msb.lower_msg_id AND msb.upper_msg_id
              AND msb.usr_id = " . $usr_id . "
        )
        GROUP BY msgs.item
        ;
    

    The more users read on the forum, the wider the bounds marked as read by each tuple, and the less tuples have to be stored. Users can get an accurate count of read vs. unread, and can pretty easily be aggregated to see read vs. unread in each forum, subforum, topic, etc.

    Given a small forum of about 2000+ posts, the following are the usage statistics regarding the number of tuples stored, sorted by number of times users have logged in (approximating user activity). The column "num_bounds" is the number of tuples required to store the user's "num_posts_read" viewing history.

    id  num_log_entries num_bounds num_posts_read num_posts
    479             584         11           2161       228
    118             461          6           2167       724
    487             119         34           2093       199
    499              97          6           2090       309
    476              71        139            481        82
    480              33         92            167        26
    486              33        256            757       154
    496              31        108            193        51
    490              31         80            179        61
    475              28        129            226        47
    491              22         22           1207        24
    502              20        100            232        65
    493              14         73            141         5
    489              14         12           1517        22
    498              10         72            132        17
    

    I haven't seen this particular implementation in any forum but my own custom one, and it's a small one at that. I'd be interested if anybody else has implemented, or seen this implemented elsewhere, particularly in a large and/or active forum.

    Regards,

    Kaiden

    0 讨论(0)
  • 2020-12-07 17:31

    Non exactly a PHP-answer, but here's how we do it in our asp.net-based forum (I'm affiliated with this product, disclosing that due to the rules)

    1. We use cookies, not the database.
      • Disadvantage of cookies - not "cross-device" (visiting from another computer shows everything as unread)
      • Advantage - no huge DB reads/writes. And tracking works for "guest" users also! This is awesome.
    2. We store a cookie with { topicID, lastReadMessageID } pairs for every topic the user visits.
    3. If the data for a particular topic is not found in the cookie we assume the topic is either:
      • fully unread (if the topic's last message is greater than MAX lastReadMessageID from (2)
      • fully read (if otherwise)

    This has some minor flaws, but it does the job.

    PS. Also, some might say that using cookies leaves garbage on the user's computer (I personally hate this), but we found out that an average user tracks about 20 topics tops, so it takes about 10 bytes per topic so it takes less than 200 bytes on the user's harddrive.

    0 讨论(0)
  • 2020-12-07 17:32

    I have read all the answers and I came with an idea that may be the best mix for this subject (no code though).
    This idea is a mix of all of your ideas and the little experience I have in programming
    Aprox 95% of the users (statistics got from a forum admin and his forum logs) read the forum's topics straight to the last post (or page) and don't go back, read the 1st pages' posts (or just the 1st post) and then go to the last page, or they read the whole thread from the beginning 'til the end and if they turn back they had already read that part. So a good solution would work like this:
    I think If we make a store, for each user, for each thread, the timestamp of the last post the user viewed (and, if applicable, the first post the user viewed even if that may not get to be useful) we could get somewhere with this. The system is pretty simple and is almost like phpbb's. It would also be useful to mark the last post we have seen to continue in that one later (instead of being forced to consider all that page as read). And, as each thread has its own id. There's no need to organize like phpbb does.

    0 讨论(0)
  • 2020-12-07 17:33

    Why are you concerned?

    I don't see an issue with any I/O for getting the unread threads. It doesn't have to be live. A 15 minute delay based on a cache value would work.

    So for unread threads you just

    Pseudo code..

    $result = SELECT id,viewcount from my_forum_threads
    
    $cache->setThreads($result['id'],$result['viewcount']);
    

    Then on a page load you just get the cache values rather than querying the database again. Its really not a big issue at all.

    The average page on my website takes 20 mysql queries. When I cache it is only two to four queries.

    0 讨论(0)
提交回复
热议问题