Using same column multiple times in WHERE clause

醉话见心 2020-12-15 10:19

I have the following table structure.

USERS

PROPERTY_VALUE

PROPERTY_NAME

USER_

8 answers
  • 2020-12-15 10:37

    Assuming you want to select all the fields in the USERS table:

    SELECT u.* 
    FROM USERS u
    INNER JOIN 
    (
        SELECT USERS.id as user_id, COUNT(*) as matching_property_count
        FROM USERS
        INNER JOIN (
            SELECT m.user_id, n.name as property_name, v.value
            FROM PROPERTY_NAME n
            INNER JOIN PROPERTY_VALUE v ON n.id = v.property_name_id
            INNER JOIN USER_PROPERTY_MAP m ON m.property_value_id = v.id
            WHERE  (n.id = @property_id_1 AND v.value = @property_value_1) -- Property Condition 1
                OR (n.id = @property_id_2 AND v.value = @property_value_2) -- Property Condition 2
                OR (n.id = @property_id_3 AND v.value = @property_value_3) -- Property Condition 3
                OR (n.id = @property_id_N AND v.value = @property_value_N) -- Property Condition N
        ) USER_PROPERTIES ON USER_PROPERTIES.user_id = USERS.id
        GROUP BY USERS.id
        HAVING COUNT(*) = N     --N = the number of Property Condition in the WHERE clause
        -- Note : 
        -- Use HAVING COUNT(*) = N if property matches will be "MUST MATCH ALL"
        -- Use HAVING COUNT(*) > 0 if property matches will be "MUST MATCH AT LEAST ONE"
    ) USER_MATCHING_PROPERTY_COUNT ON u.id = USER_MATCHING_PROPERTY_COUNT.user_id
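
The difference between the two HAVING variants in the note can be tried out with a small sketch. It uses Python's built-in sqlite3 module with an in-memory database. The column layout (`property_value.id`, `property_name_id`, `value`, and `user_property_map.user_id`, `property_value_id`) is an assumption taken from the other answers, and the sample rows and the helper `matching_users` are hypothetical.

```python
import sqlite3

# Hypothetical sample data; the schema is assumed from the other answers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE property_value (id INTEGER PRIMARY KEY, property_name_id INTEGER, value TEXT);
CREATE TABLE user_property_map (user_id INTEGER, property_value_id INTEGER);
INSERT INTO property_value VALUES (1, 1, '101'), (2, 2, '102'), (3, 2, '103');
INSERT INTO user_property_map VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 2);
""")

def matching_users(conn, conditions, match_all=True):
    """Return user ids matching all (or at least one) of the (name_id, value) pairs."""
    # One OR branch per property condition, as in the query above.
    where = " OR ".join("(v.property_name_id = ? AND v.value = ?)" for _ in conditions)
    # COUNT(*) = N means "must match all"; COUNT(*) > 0 means "at least one".
    having = "COUNT(*) = ?" if match_all else "COUNT(*) > 0"
    sql = f"""
        SELECT m.user_id
        FROM property_value v
        JOIN user_property_map m ON m.property_value_id = v.id
        WHERE {where}
        GROUP BY m.user_id
        HAVING {having}
        ORDER BY m.user_id
    """
    params = [x for cond in conditions for x in cond]
    if match_all:
        params.append(len(conditions))
    return [row[0] for row in conn.execute(sql, params)]

all_match = matching_users(conn, [(1, "101"), (2, "102")], match_all=True)
any_match = matching_users(conn, [(1, "101"), (2, "102")], match_all=False)
print(all_match, any_match)
```

Only user 1 holds both properties in this sample data, while every user holds at least one. The count comparison is only reliable if the map table contains no duplicate (user, property value) rows.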
    
  • 2020-12-15 10:38

    If I understand your question correctly, I would do it like this.

    SELECT u.id, u.user_name, u.city FROM users u 
    WHERE (SELECT count(*) FROM property_value v, user_property_map m 
    WHERE m.user_id = u.id AND m.property_value_id = v.id AND v.value IN ('101', '102')) = 2
    

    This should return a list of users that have all the properties listed in the IN clause. The 2 represents the number of properties searched for.
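
To avoid hard-coding the 2, the list of values and the expected count can both be bound as parameters. The following is a minimal sketch using Python's sqlite3 module. The table layout is assumed from the query above, and the sample rows and the variable `wanted` are hypothetical.

```python
import sqlite3

# Hypothetical sample data; columns follow the query above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT, city TEXT);
CREATE TABLE property_value (id INTEGER PRIMARY KEY, property_name_id INTEGER, value TEXT);
CREATE TABLE user_property_map (user_id INTEGER, property_value_id INTEGER);
INSERT INTO users VALUES (1, 'user1', 'city1'), (2, 'user2', 'city2'), (3, 'user3', 'city1');
INSERT INTO property_value VALUES (1, 1, '101'), (2, 2, '102'), (3, 2, '103');
INSERT INTO user_property_map VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 2);
""")

wanted = ["101", "102"]                          # values searched for
placeholders = ", ".join("?" for _ in wanted)    # one "?" per value
sql = f"""
    SELECT u.id
    FROM users u
    WHERE (SELECT count(*)
           FROM property_value v
           JOIN user_property_map m ON m.property_value_id = v.id
           WHERE m.user_id = u.id AND v.value IN ({placeholders})) = ?
    ORDER BY u.id
"""
# The last parameter is the number of values, replacing the literal 2.
matching_ids = [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]
print(matching_ids)
```

Note that this filters on `value` alone, so it assumes a given value does not occur under more than one property name for the same user.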

  • 2020-12-15 10:40

    This is a case of relational-division. I added the tag.

    Indexes

    Assuming a PK or UNIQUE constraint on USER_PROPERTY_MAP(property_value_id, user_id) - columns in this order to make my queries fast. Related:

    • Is a composite index also good for queries on the first field?

    You should also have an index on PROPERTY_VALUE(value, property_name_id, id). Again, columns in this order. Add the last column id only if you get index-only scans out of it.

    For a given number of properties

    There are many ways to solve it. This should be one of the simplest and fastest for exactly two properties:

    SELECT u.*
    FROM   users             u
    JOIN   user_property_map up1 ON up1.user_id = u.id
    JOIN   user_property_map up2 USING (user_id)
    WHERE  up1.property_value_id =
          (SELECT id FROM property_value WHERE property_name_id = 1 AND value = '101')
    AND    up2.property_value_id =
          (SELECT id FROM property_value WHERE property_name_id = 2 AND value = '102')
    -- AND    u.user_name = 'user1'  -- more filters?
    -- AND    u.city = 'city1'
    

    Not visiting table PROPERTY_NAME, since you seem to have resolved property names to IDs already, according to your example query. Else you could add a join to PROPERTY_NAME in each subquery.
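
The two-property self-join can be tried against a toy data set. This is a sketch using Python's sqlite3 module rather than Postgres. The sample rows are hypothetical, and the second join is written with ON instead of USING only to keep it portable.

```python
import sqlite3

# Hypothetical sample data; the schema is assumed from the queries in this answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT, city TEXT);
CREATE TABLE property_value (id INTEGER PRIMARY KEY, property_name_id INTEGER, value TEXT);
CREATE TABLE user_property_map (user_id INTEGER, property_value_id INTEGER,
                                PRIMARY KEY (property_value_id, user_id));
INSERT INTO users VALUES (1, 'user1', 'city1'), (2, 'user2', 'city2'), (3, 'user3', 'city1');
INSERT INTO property_value VALUES (1, 1, '101'), (2, 2, '102'), (3, 2, '103');
INSERT INTO user_property_map VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 2);
""")

# One join to user_property_map per required property.
sql = """
    SELECT u.id
    FROM   users u
    JOIN   user_property_map up1 ON up1.user_id = u.id
    JOIN   user_property_map up2 ON up2.user_id = u.id
    WHERE  up1.property_value_id =
           (SELECT id FROM property_value WHERE property_name_id = ? AND value = ?)
    AND    up2.property_value_id =
           (SELECT id FROM property_value WHERE property_name_id = ? AND value = ?)
    ORDER  BY u.id
"""
two_property_ids = [row[0] for row in conn.execute(sql, (1, "101", 2, "102"))]
print(two_property_ids)
```

Each additional property needs one more join and one more subquery, which is why this form suits a fixed, known number of properties.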

    We have assembled an arsenal of techniques under this related question:

    • How to filter SQL results in a has-many-through relation

    For an unknown number of properties

    @Mike and @Valera have very useful queries in their respective answers. To make this even more dynamic:

    WITH input(property_name_id, value) AS (
          VALUES  -- provide n rows with input parameters here
            (1, '101')
          , (2, '102')
          -- more?
          ) 
    SELECT *
    FROM   users u
    JOIN  (
       SELECT up.user_id AS id
       FROM   input
       JOIN   property_value    pv USING (property_name_id, value)
       JOIN   user_property_map up ON up.property_value_id = pv.id
       GROUP  BY 1
       HAVING count(*) = (SELECT count(*) FROM input)
       ) sub USING (id);
    

    Only add / remove rows from the VALUES expression. Or remove the WITH clause and the JOIN for no property filters at all.
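
Adding or removing rows in the VALUES expression can also be done from application code. The sketch below builds the VALUES list from a Python list and binds every value as a parameter. It runs on SQLite through the sqlite3 module, so the joins use ON instead of USING. The helper `users_matching` and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data; the schema is assumed from the queries in this answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT, city TEXT);
CREATE TABLE property_value (id INTEGER PRIMARY KEY, property_name_id INTEGER, value TEXT);
CREATE TABLE user_property_map (user_id INTEGER, property_value_id INTEGER);
INSERT INTO users VALUES (1, 'user1', 'city1'), (2, 'user2', 'city2'), (3, 'user3', 'city1');
INSERT INTO property_value VALUES (1, 1, '101'), (2, 2, '102'), (3, 2, '103');
INSERT INTO user_property_map VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 2);
""")

def users_matching(conn, filters):
    """Return ids of users that have every (property_name_id, value) pair in filters."""
    if not filters:
        raise ValueError("at least one filter is required")
    rows = ", ".join("(?, ?)" for _ in filters)  # one VALUES row per filter
    sql = f"""
        WITH input(property_name_id, value) AS (VALUES {rows})
        SELECT u.id
        FROM users u
        JOIN (
            SELECT up.user_id AS id
            FROM input i
            JOIN property_value pv ON pv.property_name_id = i.property_name_id
                                  AND pv.value = i.value
            JOIN user_property_map up ON up.property_value_id = pv.id
            GROUP BY up.user_id
            HAVING count(*) = (SELECT count(*) FROM input)
        ) sub ON sub.id = u.id
        ORDER BY u.id
    """
    params = [x for pair in filters for x in pair]
    return [row[0] for row in conn.execute(sql, params)]

both = users_matching(conn, [(1, "101"), (2, "102")])
single = users_matching(conn, [(1, "101")])
print(both, single)
```

The query text changes only in the number of placeholder rows, so the filter values themselves never get concatenated into the SQL string.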

    The problem with this class of queries (counting all partial matches) is performance. My first query is less dynamic, but typically considerably faster. (Just test with EXPLAIN ANALYZE.) Especially for bigger tables and a growing number of properties.

    Best of both worlds?

    This solution with a recursive CTE should be a good compromise: fast and dynamic:

    WITH RECURSIVE input AS (
       SELECT count(*)     OVER () AS ct
            , row_number() OVER () AS rn
            , *
       FROM  (
          VALUES  -- provide n rows with input parameters here
            (1, '101')
          , (2, '102')
          -- more?
          ) i (property_name_id, value)
       )
     , rcte AS (
       SELECT i.ct, i.rn, up.user_id AS id
       FROM   input             i
       JOIN   property_value    pv USING (property_name_id, value)
       JOIN   user_property_map up ON up.property_value_id = pv.id
       WHERE  i.rn = 1
    
       UNION ALL
       SELECT i.ct, i.rn, up.user_id
       FROM   rcte              r
       JOIN   input             i ON i.rn = r.rn + 1
       JOIN   property_value    pv USING (property_name_id, value)
       JOIN   user_property_map up ON up.property_value_id = pv.id
                                  AND up.user_id = r.id
       )
    SELECT u.*
    FROM   rcte  r
    JOIN   users u USING (id)
    WHERE  r.ct = r.rn;          -- has all matches
    

    dbfiddle here

    The manual about recursive CTEs.

    The added complexity does not pay for small tables where the additional overhead outweighs any benefit or the difference is negligible to begin with. But it scales much better and is increasingly superior to "counting" techniques with growing tables and a growing number of property filters.

    Counting techniques have to visit all rows in user_property_map for all given property filters, while this query (as well as the 1st query) can eliminate irrelevant users early.

    Optimizing performance

    With current table statistics (reasonable settings, autovacuum running), Postgres has knowledge about "most common values" in each column and will reorder joins in the 1st query to evaluate the most selective property filters first (or at least not the least selective ones). Up to a certain limit: join_collapse_limit. Related:

    • Postgresql join_collapse_limit and time for query planning
    • Why does a slight change in the search term slow down the query so much?

    This "deus-ex-machina" intervention is not possible with the 3rd query (recursive CTE). To help performance (possibly a lot) you have to place more selective filters first yourself. But even with the worst-case ordering it will still outperform counting queries.

    Related:

    • Check statistics targets in PostgreSQL

    Much more gory details:

    • PostgreSQL partial index unused when created on a table with existing data

    More explanation in the manual:

    • Statistics Used by the Planner
  • 2020-12-15 10:41
    SELECT *
      FROM users u
     WHERE u.id IN(
             select m.user_id
               from property_value v
               join USER_PROPERTY_MAP m
                 on v.id=m.property_value_id 
              where (v.property_name_id, v.value) in( (1, '101'), (2, '102') )
              group by m.user_id
             having count(*)=2
          )
    

    OR

    SELECT u.id
      FROM users u
     INNER JOIN user_property_map upm ON u.id = upm.user_id
     INNER JOIN property_value pv ON upm.property_value_id = pv.id
     WHERE (pv.property_name_id=1 and pv.value='101')
        OR (pv.property_name_id=2 and pv.value='102')
     GROUP BY u.id
    HAVING count(*)=2
    

    The property_name table is not needed in the query if the property_name_id values are known.

  • 2020-12-15 10:43

    You are using the AND operator between the two conditions pn.id=1 and pn.id=2, so how would you get a result for a row that also has to match both of these:

    (SELECT id FROM property_value WHERE value like '101') and
    (SELECT id FROM property_value WHERE value like '102') 
    

    So, as the comments above suggest, use the OR operator.

    Update 1:

    SELECT * FROM users u
    INNER JOIN user_property_map upm ON u.id = upm.user_id
    INNER JOIN property_value pv ON upm.property_value_id = pv.id
    INNER JOIN property_name pn ON pv.property_name_id = pn.id
    WHERE pn.id in (1,2) AND pv.id IN (SELECT id FROM property_value WHERE value like '101' or value like '102');
    
  • 2020-12-15 10:44
    SELECT * FROM users u
    INNER JOIN user_property_map upm ON u.id = upm.user_id
    INNER JOIN property_value pv ON upm.property_value_id = pv.id
    INNER JOIN property_name pn ON pv.property_name_id = pn.id
    WHERE (pn.id = 1 AND pv.id IN (SELECT id FROM property_value WHERE value LIKE '101'))
    OR (pn.id = 2 AND pv.id IN (SELECT id FROM property_value WHERE value LIKE '102'))
    
    OR (...)
    OR (...)
    

    You can't use AND here, because id is never both 1 and 2 in the SAME ROW: the WHERE condition is evaluated for each row separately.

    If you run a simple test, like

    SELECT * FROM users where id=1 and id=2 
    

    you will get 0 results. To achieve that use

     id in (1,2) 
    

    or

     id=1 or id=2
    

    That query can be optimised more but this is a good start I hope.
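
The row-by-row behaviour described above is easy to check. Here is a small sketch with Python's sqlite3 module and a hypothetical three-row users table.

```python
import sqlite3

# Hypothetical three-row table, just to show how WHERE is applied per row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT);
INSERT INTO users VALUES (1, 'user1'), (2, 'user2'), (3, 'user3');
""")

# No single row has id = 1 and id = 2 at once, so AND returns nothing.
and_rows = conn.execute("SELECT id FROM users WHERE id = 1 AND id = 2").fetchall()

# OR and IN accept a row if it satisfies either alternative.
or_rows = conn.execute("SELECT id FROM users WHERE id = 1 OR id = 2 ORDER BY id").fetchall()
in_rows = conn.execute("SELECT id FROM users WHERE id IN (1, 2) ORDER BY id").fetchall()

print(and_rows, or_rows, in_rows)
```

Keep in mind that OR on its own returns users that match any of the conditions. Requiring all of them still takes a GROUP BY with a HAVING count, as in the other answers.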
