How do I implement threaded comments?

耗尽温柔 提交于 2019-12-02 14:18:42

Storing trees in a database is a subject which has many different solutions. It depends on if you want to retrieve a subhierarchy as well (so all children of item X) or if you just want to grab the entire set of hierarchies and build the tree in an O(n) way in memory using a dictionary.

Your table has the advantage that you can fetch all comments on a post in 1 go, by filtering on the parentpost. As you've defined the comment's parent in the textbook/naive way, you have to build the tree in memory (see below). If you want to obtain the tree from the DB, you need a different way to store a tree: See my description of a pre-calc based approach here: http://www.llblgen.com/tinyforum/GotoMessage.aspx?MessageID=17746&ThreadID=3208 or by using balanced trees described by CELKO here:

or yet another approach: http://www.sqlteam.com/article/more-trees-hierarchies-in-sql

If you fetch everything in a hierarchy in memory and build the tree there, it can be more efficient due to the fact that the query is pretty simple: select .. from Comment where ParentPost = @id ORDER BY ParentComment ASC

After that query, you build the tree in memory with just 1 dictionary which keeps track of the tuple CommentID - Comment. You now walk through the resultset and build the tree on the fly: every comment you run into, you can lookup its parentcomment in the dictionary and then store the comment currently processed also in that dictionary.

Couple things to also consider...

1) When you say "sort like reddit" based on rank or date, do you mean the top-level or the whole thing?

2) When you delete a node, what happens to the branches? Do you re-parent them? In my implementation, I'm thinking that the editors will decide--either hide the node and display it as "comment hidden" along with the visible children, hide the comment and it's children, or nuke the whole tree. Re-parenting should be easy (just set the chidren's parent to the deleted's parent), but it anything involving the whole tree seems to be tricky to implement in the database.

I've been looking at the ltree module for PostgreSQL. It should make database operations involving parts of the tree a bit faster. It basically lets you set up a field in the table that looks like:

ltreetest=# select path from test where path <@ 'Top.Science';
                path                
------------------------------------
 Top.Science
 Top.Science.Astronomy
 Top.Science.Astronomy.Astrophysics
 Top.Science.Astronomy.Cosmology

However, it doesn't ensure any kind of referential integrity on its own. In other words, you can have a records for "Top.Science.Astronomy" without having a record for "Top.Science" or "Top". But what it does let you do is stuff like:

-- hide the children of Top.Science
UPDATE test SET hide_me=true WHERE path @> 'Top.Science';

or

-- nuke the cosmology branch
DELETE FROM test WHERE path @> 'Top.Science.Cosmology';

If combined with the traditional "comment_id"/"parent_id" approach using stored procedures, I'm thinking you can get the best of both worlds. You can quickly traverse the comment tree in the database using your "path" and still ensure referential integrity via "comment_id"/"parent_id". I'm envisioning something like:

CREATE TABLE comments (
comment_id SERIAL PRIMARY KEY,
parent_comment_id int REFERENCES comments(comment_id) ON UPDATE CASCADE ON DELETE CASCADE,
thread_id int NOT NULL  REFERENCES threads(thread_id) ON UPDATE CASCADE ON DELETE CASCADE,
path ltree NOT NULL,
comment_body text NOT NULL,
hide boolean not null default false
);

The path string for a comment look like be

<thread_id>.<parent_id_#1>.<parent_id_#2>.<parent_id_#3>.<my_comment_id>

Thus a root comment of thread "102" with a comment_id of "1" would have a path of:

102.1

And a child whose comment_id is "3" would be:

102.1.3

A some children of "3" having id's of "31" and "54" would be:

102.1.3.31
102.1.3.54

To hide the node "3" and its kids, you'd issue this:

UPDATE comments SET hide=true WHERE path @> '102.1.3';

I dunno though--it might add needless overhead. Plus I don't know how well maintained ltree is.

Your current design is basically fine for small hierarchies (less than thousand items)

If you want to fetch on a certian level or depth, add a 'level' item to your structure and compute it as part of the save

If performance is an issue use a decent cache

I'd add the following new fields to the above tabel:

  • thread_id: identifier for all comments attached to a specific object

  • date: the comment date (allows fetching the comments in order)

  • rank: the comment rank (allows fetching the comment order by ranking)

Using these fields you'll be able to:

  1. fetch all comments in a thread in a single op
  2. order comments in a thread either by date or rank

Unfortunately if you want to preserve your queries DB close to SQL standard you'll have to recreate the tree in memory. Some DBs are offering special queries for hierarchical data (f.e. Oracle)

./alex

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!