NOT IN vs IN Do Not Return Complementary Results

灰色年华 2021-02-11 10:03

Hi, I am working through example #7 from the SQLZoo tutorial, SELECT within SELECT. In the following question

"Find each country that belongs to a continent where all p

4 answers
  • 2021-02-11 10:23

    Why use a subquery?

    Try using:

    SELECT name, continent, population FROM world 
    WHERE population > 25000000
    

    and/or

    SELECT name, continent, population FROM world 
    WHERE population <= 25000000
    

    The column in your condition, "population", is in the FROM table "world". There is no need for a subquery on the same table "world"; just use the "population" column directly in the WHERE clause.

    Or are you trying to do this:

    SELECT name, continent, population FROM world 
    WHERE continent NOT IN (
        SELECT continent FROM world
        GROUP BY continent 
        HAVING SUM(population) > 25000000)
    

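    Notice the SUM(), GROUP BY, and HAVING. This version filters on the total population of each continent, not on the population of each country.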
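
    To see how the grouped query differs from a per-country filter, here is a minimal sketch that runs both against an in-memory SQLite database through Python's built-in sqlite3 module. The continents, country names, and populations are made up for illustration, and SQLite is an assumption; the original exercise runs on SQLZoo's own engine.

```python
import sqlite3

# Hypothetical data: every Alpha country is small, but Alpha's total exceeds 25M.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE world (name TEXT, continent TEXT NOT NULL, population INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [
        ("A1", "Alpha", 10_000_000),
        ("A2", "Alpha", 10_000_000),
        ("A3", "Alpha", 10_000_000),
        ("B1", "Beta", 5_000_000),
        ("B2", "Beta", 5_000_000),
        ("G1", "Gamma", 30_000_000),
        ("G2", "Gamma", 1_000_000),
    ],
)


def names(sql):
    """Run a query and return the selected country names, sorted."""
    return sorted(row[0] for row in conn.execute(sql))


# Excludes continents whose TOTAL population is above 25M.
by_total = names("""
    SELECT name FROM world
    WHERE continent NOT IN (
        SELECT continent FROM world
        GROUP BY continent
        HAVING SUM(population) > 25000000)
""")

# Excludes continents that contain at least ONE country above 25M.
by_country = names("""
    SELECT name FROM world
    WHERE continent NOT IN (
        SELECT continent FROM world
        WHERE population > 25000000)
""")

print(by_total)    # ['B1', 'B2']
print(by_country)  # ['A1', 'A2', 'A3', 'B1', 'B2']
```

    With this data the SUM() version drops Alpha even though none of its countries is above 25000000, so the two queries answer different questions.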

  • 2021-02-11 10:27

    If I'm reading this correctly, the question asks you to list every country in a continent where every country has a population below 25000000, correct?

    If yes, look at your subquery:

    SELECT continent FROM world
    WHERE population > 25000000
    

    You are pulling every continent that has at least one country with a population over 25000000, so excluding those continents is why it works.

    Example: Continent Alpha has 5 countries. Four of them are small, but one of them, Country Charlie, has a population of 50000000.

    So your subquery will return Continent Alpha, because Country Charlie satisfies the constraint population > 25000000. This subquery finds everything that you don't want, which is why NOT IN works.

    On the other hand, consider the subquery with the comparison reversed:

    SELECT continent FROM world
    WHERE population < 25000000
    

    If ANY country is below 25000000, this subquery returns its continent, which is not what you want, because you want EVERY country to be below.

    Example: take Continent Alpha from before. Its four small countries are below 25000000, so the subquery returns Continent Alpha regardless of the fact that Country Charlie has 50000000.

    Obviously, this is not the best way to go about it, but this is why the first query worked and the second did not.
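
    The Continent Alpha example can be checked with a small script. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database, and every name and population except Alpha and Charlie (including the second continent, Beta) is invented for the test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE world (name TEXT, continent TEXT NOT NULL, population INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [
        ("A1", "Alpha", 1_000_000),
        ("A2", "Alpha", 2_000_000),
        ("A3", "Alpha", 3_000_000),
        ("A4", "Alpha", 4_000_000),
        ("Charlie", "Alpha", 50_000_000),  # the one large country in Alpha
        ("B1", "Beta", 2_000_000),         # Beta is a hypothetical all-small continent
        ("B2", "Beta", 3_000_000),
    ],
)


def names(sql):
    """Run a query and return the first column of every row, sorted."""
    return sorted(row[0] for row in conn.execute(sql))


# Continents with at least one large country, and with at least one small country.
has_big = names("SELECT DISTINCT continent FROM world WHERE population > 25000000")
has_small = names("SELECT DISTINCT continent FROM world WHERE population < 25000000")
print(has_big)    # ['Alpha']
print(has_small)  # ['Alpha', 'Beta']

not_in_big = names("""
    SELECT name FROM world
    WHERE continent NOT IN (SELECT continent FROM world WHERE population > 25000000)
""")
in_small = names("""
    SELECT name FROM world
    WHERE continent IN (SELECT continent FROM world WHERE population < 25000000)
""")
print(not_in_big)  # ['B1', 'B2']
print(in_small)    # ['A1', 'A2', 'A3', 'A4', 'B1', 'B2', 'Charlie']
```

    Alpha appears in both subquery results because it contains both kinds of country, so "NOT IN the big list" and "IN the small list" select different sets of rows.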

  • 2021-02-11 10:34

    Because every other continent has at least one country with a population of less than 25 million. That is what this says:

    SELECT name, continent, population FROM world 
    WHERE continent IN (
        SELECT continent FROM world
        WHERE population < 25000000)
    

    Translating it into words: from the list of all countries (in table world), find all countries whose continent has a country with a population of less than 25 million.

  • 2021-02-11 10:49

    Show the table declaration. It seems you use CONTINENT as the continent number. Then you should check that it is marked with the PRIMARY KEY and NOT NULL options. I really suspect you just forgot about the very special meaning NULL has in SQL.

    Here is an example on a Firebird 2.5.1 SQL server.

    CREATE TABLE WORLD (
        CONTINENT   INTEGER,
        NAME        VARCHAR(20),
        POPULATION  INTEGER
    );
    
    
    INSERT INTO WORLD (CONTINENT, NAME, POPULATION) VALUES (NULL, 'null-id', 100);
    INSERT INTO WORLD (CONTINENT, NAME, POPULATION) VALUES (1, 'normal 1', 10);
    INSERT INTO WORLD (CONTINENT, NAME, POPULATION) VALUES (2, 'normal 2', 200);
    INSERT INTO WORLD (CONTINENT, NAME, POPULATION) VALUES (3, 'null-pop', NULL);
    INSERT INTO WORLD (CONTINENT, NAME, POPULATION) VALUES (4, 'normal 4', 110);
    
    COMMIT WORK;
    

    Now let's try your queries and see whether the first row, the one where CONTINENT IS NULL, shows up anywhere:

    SELECT continent, population FROM world
    WHERE continent IN (
        SELECT continent FROM world
        WHERE population > 100)
    
    CONTINENT   POPULATION
    2           200
    4           110
    

    and then

    SELECT continent, population FROM world
    WHERE continent NOT IN (
        SELECT continent FROM world
        WHERE population > 100)
    
    CONTINENT   POPULATION
    1           10
    3           <NULL>
    

    By the logic of your query you treat CONTINENT as the row ID, so you should make it NOT NULL. Then there would be no row that is invisible to both the IN and the NOT IN condition.


    Now, let's rephrase this as a flat query:

    SELECT continent, population FROM world
        WHERE NOT (population > 100)
    
    CONTINENT   POPULATION
    <NULL>      100
    1           10
    
    SELECT continent, population FROM world
        WHERE population > 100
    
    CONTINENT   POPULATION
    2           200
    4           110
    

    This time the missing row is the one with NULL in the POPULATION column.
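
    The lost rows can be reproduced outside Firebird. The sketch below loads the same five rows into an in-memory SQLite database through Python's sqlite3 module; SQLite standing in for Firebird is an assumption, and the helper name `rows` is made up for this example. It first shows that comparisons against NULL evaluate to NULL rather than TRUE or FALSE, then computes which row each pair of "opposite" conditions fails to return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (continent INTEGER, name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [
        (None, "null-id", 100),
        (1, "normal 1", 10),
        (2, "normal 2", 200),
        (3, "null-pop", None),
        (4, "normal 4", 110),
    ],
)

# Each expression involves NULL, so each result is NULL (None in Python).
three_valued = conn.execute(
    "SELECT NULL IN (2, 4), NULL NOT IN (2, 4), NOT (NULL > 100)"
).fetchone()
print(three_valued)  # (None, None, None)


def rows(where):
    """Return the (continent, population) pairs matching a WHERE condition."""
    return set(conn.execute(f"SELECT continent, population FROM world WHERE {where}"))


everything = set(conn.execute("SELECT continent, population FROM world"))
sub = "(SELECT continent FROM world WHERE population > 100)"

# Rows returned by neither a condition nor its negation.
lost_by_in = everything - rows(f"continent IN {sub}") - rows(f"continent NOT IN {sub}")
lost_by_flat = everything - rows("population > 100") - rows("NOT (population > 100)")
print(lost_by_in)    # {(None, 100)}
print(lost_by_flat)  # {(3, None)}
```

    A WHERE clause keeps a row only when its condition is TRUE, so a row whose condition evaluates to NULL is dropped by the condition and by its negation alike.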


    Then FreshPrinceOfSO suggested using an EXISTS clause. While it may potentially end up with the slowest (least efficient) query plan, it at least masks the special meaning of the NULL value in SQL.

    SELECT continent, population FROM world w_ext
    WHERE EXISTS (
       SELECT continent FROM world w_int
       WHERE (w_int.population > 100) and (w_int.continent = w_ext.continent)
    )
    
    CONTINENT   POPULATION
    2           200
    4           110
    
    SELECT continent, population FROM world w_ext
    WHERE NOT EXISTS (
       SELECT continent FROM world w_int
       WHERE (w_int.population > 100) and (w_int.continent = w_ext.continent)
    )
    
    CONTINENT   POPULATION
    <NULL>      100
    1           10
    3           <NULL>
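
    Unlike the IN / NOT IN pair, EXISTS and NOT EXISTS split the table cleanly, because EXISTS itself only ever yields TRUE or FALSE. A minimal check, again assuming SQLite (via Python's sqlite3 module) in place of Firebird and using `SELECT 1` in the subquery, which is an equivalent simplification since EXISTS ignores the selected columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (continent INTEGER, name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [
        (None, "null-id", 100),
        (1, "normal 1", 10),
        (2, "normal 2", 200),
        (3, "null-pop", None),
        (4, "normal 4", 110),
    ],
)

# {neg} is filled with "" or "NOT" to build the two complementary queries.
template = """
    SELECT continent, population FROM world w_ext
    WHERE {neg} EXISTS (
        SELECT 1 FROM world w_int
        WHERE w_int.population > 100 AND w_int.continent = w_ext.continent)
    ORDER BY continent
"""

with_match = conn.execute(template.format(neg="")).fetchall()
without_match = conn.execute(template.format(neg="NOT")).fetchall()
total = conn.execute("SELECT COUNT(*) FROM world").fetchone()[0]

print(with_match)     # [(2, 200), (4, 110)]
print(without_match)  # [(None, 100), (1, 10), (3, None)]
print(len(with_match) + len(without_match) == total)  # True
```

    For the row with a NULL continent, the correlated comparison is NULL for every inner row, so the subquery is empty and NOT EXISTS is TRUE; that is why the row reappears here.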
    