self-join

How to implement self-join/cross-product with hadoop?

三世轮回 submitted on 2019-12-06 06:13:35
Question: It is a common task to evaluate pairs of items, for example in de-duplication, collaborative filtering, or finding similar items. This is basically a self-join or cross-product over the same source of data.

Answer 1: To do a self-join, you can follow the "reduce-side join" pattern. The mapper emits the join/foreign key as the key and the record as the value. So, let's say we wanted to do a self-join on "city" (the middle column) on the following data:

don,baltimore,12
jerry,boston,19
bob,baltimore,99
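The reduce-side pattern described in this answer can be sketched in plain Python. This is only a local simulation under stated assumptions, not Hadoop API code: the names `mapper`, `reducer`, and `run_self_join` are hypothetical, the dictionary grouping stands in for Hadoop's shuffle phase, and emitting each unordered pair once is an assumption, since the original answer is cut off before it shows the reducer.

```python
from collections import defaultdict
from itertools import combinations

# Sample records from the answer: name,city,value
RECORDS = [
    "don,baltimore,12",
    "jerry,boston,19",
    "bob,baltimore,99",
]


def mapper(line):
    """Emit (join key, record); the join key is the city (middle column)."""
    fields = line.split(",")
    yield fields[1], line


def reducer(key, records):
    """Pair every two distinct records that share the same join key."""
    # Assumption: each unordered pair is emitted once, in sorted order.
    for left, right in combinations(sorted(records), 2):
        yield key, left, right


def run_self_join(lines):
    """Run map, a simulated shuffle, and reduce over the input lines."""
    # Stand-in for the shuffle phase: group mapper output by key.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)

    pairs = []
    for key in sorted(groups):
        pairs.extend(reducer(key, groups[key]))
    return pairs


if __name__ == "__main__":
    for pair in run_self_join(RECORDS):
        print(pair)
```

With the three sample records, only the two baltimore rows share a key, so a single pair is produced; the boston row has no partner and yields nothing.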

Odd behaviour of data.table's update on non-equi self-join

强颜欢笑 submitted on 2019-12-06 04:30:48
Question: While preparing an answer to the question "dplyr or data.table to calculate time series aggregations in R", I noticed that I get different results depending on whether the table is updated in place or returned as a new object. I also get different results when I change the order of columns in the non-equi join conditions. Currently, I don't have an explanation for this; perhaps it is due to a major misunderstanding on my side or a simple coding error. Please note that this question is asking

Multiple Foreign keys to a single table and single key pointing to more than one table

妖精的绣舞 submitted on 2019-12-06 00:20:29
I need some suggestions from the database design experts here. I have around six foreign keys in a single table (defect) that all point to the primary key of the user table. It is like: defect (.....,assigned_to,created_by,updated_by,closed_by...) If I want to get information about a defect, I can make six joins. Is there a better way to do it? My second question: I have a states table that stores one of a user-defined set of values. I have a defect table and a task table, and I want both of these tables to share the common states table (New, In Progress, etc.). So I created: task (.....,state_id

LINQ: Self join query, how to accomplish this?

╄→尐↘猪︶ㄣ submitted on 2019-12-05 23:54:56
Can anyone help? I have 1 class that basically holds Members, and within that class is a List. The members I have are in a List also. So basically it goes like this: I have 2 members, and each member has a number of sessions. I want to return each member with only 1 Session. I have written a LINQ query, but of course it doesn't work. I think I need to do a self-join; any ideas? Basically, my error is that m doesn't exist in my subquery self-join.

var sessions = from m in this.members
               join s in ( from se in m.Sessions
                           group se by se.Name into g
                           select new {Name = g.Key, SessioEndTime = g.Max(a=>a

Improving performance with a Similarity Postgres fuzzy self join query

空扰寡人 submitted on 2019-12-05 23:26:48
I am trying to run a query that joins a table against itself and does fuzzy string comparison (using trigram comparisons) to find possible company name matches. My goal is to return records where the trigram similarity of one record's company name (ref_name field) matches another record's company name. Currently, I have my threshold set to 0.9, so it will only bring back matches that are very likely to contain a similar string. I know that self-joins can result in many comparisons by nature, but I want to optimize my query as best I can. I don't need results instantaneously, but currently

MYSQL: Avoiding cartesian product of repeating records when self-joining

独自空忆成欢 submitted on 2019-12-05 17:36:48
There are two tables: table A and table B. They have the same columns, and the data is practically identical. Both have auto-incremented IDs; the only difference between the two is that they have different IDs for the same records. Among the columns there is an IDENTIFIER column which is not unique, i.e. there are (very few) records with the same IDENTIFIER in both tables. Now, in order to find a correspondence between the IDs of table A and the IDs of table B, I have to join these two tables (for all purposes it's a self-join) on the IDENTIFIER column, something like: SELECT A.ID, B.ID

Self join to a table

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-05 10:33:29
I have a table like this:

Employee
==================
name    salary
==================
a       10000
b       20000
c       5000
d       40000

I want to get all the employees whose salary is greater than a's salary. I don't want to use any nested query or subquery. This was asked in an interview, and the hint was to use a self-join. I really can't figure out how to achieve it.

select e1.* from Employee e1, Employee e2 where e2.name = 'a' and e1.salary > e2.salary

Using a self-join:

select e1.* from Employee e1 join Employee e2 on e2.name = 'a' and e1.salary > e2.salary

Chinjoo

SELECT emp1.* FROM Employee emp1 JOIN Employee emp2 ON
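To see the self-join answer in action, here is a minimal sketch that runs it against the sample Employee table. It assumes an in-memory SQLite database rather than whatever engine the question had in mind; the helper names `build_demo_db` and `employees_paid_more_than` are hypothetical, and the query selects explicit columns with an ORDER BY only to make the output deterministic.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory Employee table with the rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (name TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO Employee (name, salary) VALUES (?, ?)",
        [("a", 10000), ("b", 20000), ("c", 5000), ("d", 40000)],
    )
    return conn


def employees_paid_more_than(conn, name):
    """Self-join: e2 is pinned to the reference employee, e1 ranges over all rows."""
    query = """
        SELECT e1.name, e1.salary
        FROM Employee e1
        JOIN Employee e2
          ON e2.name = ? AND e1.salary > e2.salary
        ORDER BY e1.salary
    """
    return conn.execute(query, (name,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(employees_paid_more_than(conn, "a"))
```

Because e2 matches exactly one row (the employee named a), the join behaves like a filter on e1 and no subquery is needed.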

Self-join on a table with ActiveRecord

孤人 submitted on 2019-12-05 08:39:34
I have an ActiveRecord model called Name which contains names in various Languages.

class Name < ActiveRecord::Base
  belongs_to :language
end

class Language < ActiveRecord::Base
  has_many :names
end

Finding names in one language is easy enough:

Language.find(1).names.find(whatever)

But I need to find matching pairs where both language 1 and language 2 have the same name. In SQL, this calls for a simple self-join:

SELECT n1.id, n2.id
FROM names AS n1, names AS n2
WHERE n1.language_id=1 AND n2.language_id=2
  AND n1.normalized=n2.normalized AND n1.id != n2.id;

How can I do a query like this with ActiveRecord?

Join table on itself - performance

谁说胖子不能爱 submitted on 2019-12-05 06:47:36
Question: I would like some help with the following join. I have one table (with about 20 million rows) that consists of:

MemberId (Primary Key) | Id (Primary Key) | TransactionDate | Balance

I would like to get the latest Balance for all the customers in one query. I know I could do something like this (I just wrote it from memory), but this way is terribly slow:

SELECT * FROM money WHERE money.Id = (SELECT MAX(Id) FROM money AS m WHERE m.MemberId = money.MemberId)

Are there any other (faster
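As a rough illustration of the problem, the sketch below runs the correlated subquery from the question next to one commonly used rewrite: a join against a derived table holding MAX(Id) per MemberId. The rewrite is not taken from the original thread, which is cut off here, and no claim is made about which form is faster; that depends on the database engine and its indexes. The sketch assumes an in-memory SQLite database, made-up sample rows, and, like the question, that a larger Id means a later transaction. The names `build_demo_db`, `CORRELATED`, and `JOINED` are hypothetical.

```python
import sqlite3

# The query shape from the question (explicit columns, ordered for stable output).
CORRELATED = """
    SELECT MemberId, Id, Balance
    FROM money
    WHERE money.Id = (SELECT MAX(Id) FROM money AS m
                      WHERE m.MemberId = money.MemberId)
    ORDER BY MemberId
"""

# Assumed alternative: join the table to its own per-member MAX(Id) summary.
JOINED = """
    SELECT money.MemberId, money.Id, money.Balance
    FROM money
    JOIN (SELECT MemberId, MAX(Id) AS MaxId
          FROM money
          GROUP BY MemberId) AS latest
      ON latest.MemberId = money.MemberId
     AND latest.MaxId = money.Id
    ORDER BY money.MemberId
"""


def build_demo_db():
    """Create a small money table; the rows are invented sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE money ("
        " MemberId INTEGER, Id INTEGER, TransactionDate TEXT, Balance INTEGER,"
        " PRIMARY KEY (MemberId, Id))"
    )
    conn.executemany(
        "INSERT INTO money VALUES (?, ?, ?, ?)",
        [
            (1, 1, "2019-01-01", 100),
            (1, 2, "2019-02-01", 150),
            (2, 3, "2019-01-15", 50),
            (2, 4, "2019-03-01", 75),
            (3, 5, "2019-02-10", 0),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(conn.execute(CORRELATED).fetchall())
    print(conn.execute(JOINED).fetchall())
```

Both queries return one row per member, so comparing their results on a sample like this is a quick way to check that a rewrite preserves the original semantics before timing it on real data.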

Selecting rows from a table that have the same value for one field

陌路散爱 submitted on 2019-12-04 22:02:23
I have a MySQL database with these two tables:

Tutor(tutorId, initials, lastName, email, phone, office)
Student(studentId, initials, lastName, email, tutorId)

What is the query to return the initials and last names of any students who share the same tutor? I tried

SELECT initials, lastName FROM Student WHERE tutorId = tutorId

but that just returns the names of all students, because each row's tutorId is compared with itself.

You'll have to join Student against itself:

SELECT s1.initials, s1.lastName
FROM Student s1, Student s2
WHERE s1.studentId <> s2.studentId /* Every student has the same tutor as himself */
  AND s1.tutorId = s2.tutorId

If you
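The self-join from this answer can be tried out with a minimal sketch like the one below. It assumes an in-memory SQLite database instead of MySQL and invented sample students; the helper names `build_demo_db` and `students_sharing_a_tutor` are hypothetical. One deliberate difference from the answer's query is the added DISTINCT: when three or more students share a tutor, the plain join returns each of them once per partner.

```python
import sqlite3


def build_demo_db():
    """Create a Student table with invented rows; three students share tutor 10."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Student ("
        " studentId INTEGER PRIMARY KEY, initials TEXT, lastName TEXT,"
        " email TEXT, tutorId INTEGER)"
    )
    conn.executemany(
        "INSERT INTO Student VALUES (?, ?, ?, ?, ?)",
        [
            (1, "A.B.", "Smith", None, 10),
            (2, "C.D.", "Jones", None, 10),
            (3, "E.F.", "Brown", None, 20),
            (4, "G.H.", "Khan", None, 10),
        ],
    )
    return conn


def students_sharing_a_tutor(conn):
    """Return (initials, lastName) for students whose tutor has another student."""
    query = """
        SELECT DISTINCT s1.studentId, s1.initials, s1.lastName
        FROM Student s1
        JOIN Student s2
          ON s1.tutorId = s2.tutorId
         AND s1.studentId <> s2.studentId  -- exclude the match with oneself
        ORDER BY s1.studentId
    """
    return [(initials, last) for _, initials, last in conn.execute(query)]


if __name__ == "__main__":
    print(students_sharing_a_tutor(build_demo_db()))
```

In the sample data, Brown is the only student of tutor 20 and is therefore excluded, while the three students of tutor 10 each appear once.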