Using Relational Algebra, how can I find duplicate rows in a tuple?

不打扰是莪最后的温柔 submitted on 2019-12-25 04:33:15

Question


I am completing a piece of homework and I've been really stuck for a week. I'm not asking for the answer to the question, just for how I'd go about doing it. Basically, I need to find duplicates in a single table (relation). For example, if each entry were a user ID and a hobby, how would I find all entries where the same user ID and hobby pair appears at least twice? So if I had the following table...

ID | Hobby
----------
1  | Swimming
2  | Running
3  | Football
1  | Swimming
3  | Football
3  | Football

How would I find the User IDs of the users with duplicate entries? (1 and 3)


Answer 1:


I was recently assigned a problem very similar to this for homework in a database theory course I'm currently taking. After thinking about it for several minutes, I have a solution! Here we go.

  1. Perform two identical projections on your table (I'll call them P1 and P2), keeping the table key (unique identifier) and the attribute that's believed to have multiple occurrences of the same value (attr). In the context of this post, the projected attributes would be ID and Hobby.
  2. Rename the columns of one of the projections. In other words, change the names of ID and Hobby, ideally to something similar. For our example, we'll rename P2's columns to ID2 and Hobby2.
  3. Critical step!: Perform a cross product between P1 and P2. This pairs each record with every record, including itself, which is what we want. I'll call this table C.
  4. Perform a selection on C with the criteria (specific to this problem) that ID = ID2 and Hobby = Hobby2. This will be table S.
  5. Perform a projection on S to clear out duplicates, which will leave a table that consists of unique records of paired ID and Hobby values. We'll call it P(S).
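  6. Apply the difference operator in the fashion of S - P(S). (Subtracting from C instead would leave behind every mismatched pair from the cross product, so the difference has to be taken against S.) This takes away one copy of each distinct record, which accounts for a record being compared with its 'counterpart', i.e. itself, leaving only records that are true duplicates. Note that this relies on bag (multiset) semantics, where the table and the difference operator keep duplicate rows; under strict set semantics a relation cannot contain duplicate rows in the first place.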
  7. Lastly, perform a projection on this resulting table, keeping only ID.

This should work for detecting duplicates of any other kind; simply change the criteria to fit the details of the problem at hand, from step 4 onward.
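
To check the steps on the sample data from the question, here is a minimal Python sketch. It is not relational algebra itself, only a simulation: it assumes bag (multiset) semantics, modelled with `collections.Counter`, and it takes the difference in step 6 as S - P(S). The function name `duplicate_ids` and the variable names are made up for illustration.

```python
from collections import Counter
from itertools import product

# Sample data from the question, kept as a bag (multiset) of (ID, Hobby) rows.
rows = [
    (1, "Swimming"),
    (2, "Running"),
    (3, "Football"),
    (1, "Swimming"),
    (3, "Football"),
    (3, "Football"),
]


def duplicate_ids(rows):
    # Steps 1-2: two identical projections; p2 plays the role of the
    # renamed copy with columns (ID2, Hobby2).
    p1 = list(rows)
    p2 = list(rows)

    # Step 3: cross product C, pairing every row with every row (itself included).
    # Each element is a 4-tuple (ID, Hobby, ID2, Hobby2) with a multiplicity.
    c = Counter(a + b for a, b in product(p1, p2))

    # Step 4: selection with ID = ID2 and Hobby = Hobby2.
    s = Counter({t: n for t, n in c.items() if t[0] == t[2] and t[1] == t[3]})

    # Step 5: duplicate-eliminating projection, one copy of each distinct row of S.
    p_s = Counter(set(s))

    # Step 6: bag difference S - P(S); Counter subtraction drops rows
    # whose count falls to zero, i.e. rows that occurred only once.
    diff = s - p_s

    # Step 7: project onto ID, eliminating duplicates.
    return sorted({t[0] for t in diff})


print(duplicate_ids(rows))  # [1, 3]
```

A row that occurs n times in the table shows up n * n times in S, so removing a single copy leaves nothing only when n is 1. That is why ID 2 drops out while IDs 1 and 3 remain.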



Source: https://stackoverflow.com/questions/19864120/using-relational-algebra-how-can-i-find-duplicate-rows-in-a-tuple
