self join vs inner join

那年仲夏 提交于 2019-12-23 06:41:13

问题


what is difference between self join and inner join


回答1:


I find it helpful to think of all of the tables in a SELECT statement as representing their own data sets.

Before you've applied any conditions you can think of each data set as being complete (the entire table, for instance).

A join is just one of several ways to begin refining those data sets to find the information that you really want.

Though a database schema may be designed with certain relationships in mind (Primary Key <-> Foreign Key) these relationships really only exist in the context of a particular query. The query writer can relate whatever they want to whatever they want. I'll give an example of this later...


An INNER JOIN relates two tables to each other. There are often multiple JOIN operations in one query to chain together multiple tables. It can get as complicated as it needs to. For a simple example, consider the following three tables...

STUDENT

| STUDENTID | LASTNAME | FIRSTNAME |
------------------------------------
      1     |  Smith   |   John
      2     |  Patel   |  Sanjay
      3     |   Lee    |  Kevin
      4     |  Jackson |  Steven

ENROLLMENT

| STUDENTID | CLASSID |
-----------------------
      2     |    3
      3     |    1
      4     |    2

CLASS

| CLASSID | COURSE | PROFESSOR |
--------------------------------
     1    | CS 101 |   Smith
     2    | CS 201 |  Ghandi
     3    | CS 301 |  McDavid
     4    | CS 401 |  Martinez

The STUDENT table and the CLASS table were designed to relate to each other through the ENROLLMENT table. This kind of table is called a Junction Table.

To write a query to display all students and the classes in which they are enrolled one would use two inner joins...

SELECT stud.LASTNAME, stud.FIRSTNAME, class.COURSE, class.PROFESSOR
FROM STUDENT stud
INNER JOIN ENROLLMENT enr
    ON stud.STUDENTID = enr.STUDENTID
INNER JOIN CLASS class
    ON class.CLASSID = enr.CLASSID;

Read the above closely and you should see what is happening. What you will get in return is the following data set...

 | LASTNAME | FIRSTNAME | COURSE | PROFESSOR |
 ---------------------------------------------
     Patel  |   Sanjay  | CS 301 |  McDavid
      Lee   |   Kevin   | CS 101 |   Smith
    Jackson |  Steven   | CS 201 |  Ghandi

Using the JOIN clauses we've limited the data sets of all three tables to only those that match each other. The "matches" are defined using the ON clauses. Note that if you ran this query you would not see the CLASSID 4 row from the CLASS table or the STUDENTID 1 row from the STUDENT table because those IDs don't exist in the matches (in this case the ENROLLMENT table). Look into "LEFT"/"RIGHT"/"FULL OUTER" JOINs for more reading on how to make that work a little differently.

Please note, per my comments on "relationships" earlier, there is no reason why you couldn't run a query relating the STUDENT table and the CLASS table directly on the LASTNAME and PROFESSOR columns. Those two columns match in data type and, well look at that! They even have a value in common! This would probably be a weird data set to get in return. My point is it can be done and you never know what needs you might have in the future for interesting connections in your data. Understand the design of the database but don't think of "relationships" as being rules that can't be ignored.

In the meantime... SELF JOINS!


Consider the following table...

PERSON

| PERSONID | FAMILYID |  NAME  |
--------------------------------
      1    |     1    |  John
      2    |     1    | Brynn
      3    |     2    | Arpan
      4    |     2    | Steve
      5    |     2    |  Tim
      6    |     3    | Becca

If you felt so inclined as to make a database of all the people you know and which ones are in the same family this might be what it looks like.

If you wanted to return one person, PERSONID 4, for instance, you would write...

SELECT * FROM PERSON WHERE PERSONID = 4;

You would learn that he is in the family with FAMILYID 2. Then to find all of the PERSONs in his family you would write...

SELECT * FROM PERSON WHERE FAMILYID = 2;

Done and done! SQL, of course, can accomplish this in one query using, you guessed it, a SELF JOIN.

What really triggers the need for a SELF JOIN here is that the table contains a unique column (PERSONID) and a column that serves as sort of a "Category" (FAMILYID). This concept is called Cardinality and in this case represents a one to many or 1:M relationship. There is only one of each PERSON but there are many PERSONs in a FAMILY.

So, what we want to return is all of the members of a family if one member of the family's PERSONID is known...

SELECT fam.*
FROM PERSON per
JOIN PERSON fam
    ON per.FamilyID = fam.FamilyID
WHERE per.PERSONID = 4;

Here's what you would get...

| PERSONID | FAMILYID |  NAME  |
--------------------------------
      3    |     2    | Arpan
      4    |     2    | Steve
      5    |     2    |  Tim

Let's note a couple of things. The words SELF JOIN don't occur anywhere. That's because a SELF JOIN is just a concept. The word JOIN in the query above could have been a LEFT JOIN instead and different things would have happened. The point of a SELF JOIN is that you are using the same table twice.

Consider my soapbox from before on data sets. Here we have started with the data set from the PERSON table twice. Neither instance of the data set affects the other one unless we say it does.

Let's start at the bottom of the query. The per data set is being limited to only those rows where PERSONID = 4. Knowing the table we know that will return exactly one row. The FAMILYID column in that row has a value of 2.

In the ON clause we are limiting the fam data set (which at this point is still the entire PERSON table) to only those rows where the value of FAMILYID matches one or more of the FAMILYIDs of the per data set. As we discussed we know the per data set only has one row, therefore one FAMILYID value. Therefore the fam data set now contains only rows where FAMILYID = 2.

Finally, at the top of the query we are SELECTing all of the rows in the fam data set.

Voila! Two queries in one.


In conclusion, an INNER JOIN is one of several kinds of JOIN operations. I would strongly suggest reading further into LEFT, RIGHT and FULL OUTER JOINs (which are, collectively, called OUTER JOINs). I personally missed a job opportunity for having a weak knowledge of OUTER JOINs once and won't let it happen again!

A SELF JOIN is simply any JOIN operation where you are relating a table to itself. The way you choose to JOIN that table to itself can use an INNER JOIN or an OUTER JOIN. Note that with a SELF JOIN, so as not to confuse your SQL engine you must use table aliases (fam and per from above. Make up whatever makes sense for your query) or there is no way to differentiate the different versions of the same table.

Now that you understand the difference open your mind nice and wide and realize that one single query could contain all different kinds of JOINs at once. It's just a matter of what data you want and how you have to twist and bend your query to get it. If you find yourself running one query and taking the result of that query and using it as the input of another query then you can probably use a JOIN to make it one query instead.

To play around with SQL try visiting W3Schools.com There is a locally stored database there with a bunch of tables that are designed to relate to each other in various ways and it's filled with data! You can CREATE, DROP, INSERT, UPDATE and SELECT all you want and return the database back to its default at any time. Try all sorts of SQL out to experiment with different tricks. I've learned a lot there, myself.

Sorry if this was a little wordy but I personally struggled with the concept of JOINs when I was starting to learn SQL and explaining a concept by using a bunch of other complex concepts bogged me down. Best to start at the bottom sometimes.

I hope it helps. If you can put JOINs in your back pocket you can work magic with SQL!

Happy querying!




回答2:


A self join joins a table to itself. The employee table might be joined to itself in order to show the manager name and the employee name in the same row.

An inner join joins any two tables and returns rows where the key exists in both tables. A self join can be an inner join (most joins are inner joins and most self joins are inner joins). An inner join can be a self join but most inner joins involve joining two different tables (generally a parent table and a child table).




回答3:


An inner join (sometimes called a simple join) is a join of two or more tables that returns only those rows that satisfy the join condition.

A self join is a join of a table to itself. This table appears twice in the FROM clause and is followed by table aliases that qualify column names in the join condition. To perform a self join, Oracle Database combines and returns rows of the table that satisfy the join condition.



来源:https://stackoverflow.com/questions/31925971/self-join-vs-inner-join

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!