MySQL sum() for distinct rows

太阳男子 2020-11-28 04:43

I'm looking for help using sum() in my SQL query:

SELECT links.id, 
       count(DISTINCT stats.id) as clicks, 
       count(DISTINCT conversions.id) as con         


        
8 Answers
  • 2020-11-28 05:09

    I may be wrong, but from what I understand:

    • conversions.id is the primary key of your table conversions
    • stats.id is the primary key of your table stats

    Thus each conversions.id belongs to at most one links.id.

    Your request is a bit like taking the Cartesian product of two sets:

    [clicks]
    SELECT *
    FROM links 
    LEFT OUTER JOIN stats ON links.id = stats.parent_id 
    
    [conversions]
    SELECT *
    FROM links 
    LEFT OUTER JOIN conversions ON links.id = conversions.link_id 
    

    and for each link you get sizeof([clicks]) x sizeof([conversions]) rows.

    As you noted, the number of unique conversions in your request can be obtained via

    count(distinct conversions.id) = sizeof([conversions])
    

    The DISTINCT removes the duplicates that the [clicks] rows introduce into the Cartesian product,

    but clearly

    sum(conversions.value) = sum([conversions].value) * sizeof([clicks])
    

    In your case, since

    count(*) = sizeof([clicks]) x sizeof([conversions])
    count(*) = sizeof([clicks]) x count(distinct conversions.id)
    

    you have

    sizeof([clicks]) = count(*)/count(distinct conversions.id)
    

    so I would test your request with:

    SELECT links.id, 
       count(DISTINCT stats.id) as clicks, 
       count(DISTINCT conversions.id) as conversions, 
       sum(conversions.value)*count(DISTINCT conversions.id)/count(*) as conversion_value 
    FROM links 
    LEFT OUTER JOIN stats ON links.id = stats.parent_id 
    LEFT OUTER JOIN conversions ON links.id = conversions.link_id 
    GROUP BY links.id 
    ORDER BY links.created desc;
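
    To check the arithmetic above on concrete numbers, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. The schema and sample rows are made up for illustration (only the columns used in the query are modelled), and SQLite stands in for MySQL, so treat it as a sanity check rather than a guarantee about your data. The extra joined_rows and naive_sum columns are added only to expose the inflation.

```python
import sqlite3

# Hypothetical minimal schema: only the columns the query touches.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
    CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value REAL);

    INSERT INTO links VALUES (1, '2020-01-01');
    -- Link 1 has three clicks and two conversions worth 10 + 5 = 15.
    INSERT INTO stats (parent_id) VALUES (1), (1), (1);
    INSERT INTO conversions (link_id, value) VALUES (1, 10.0), (1, 5.0);
""")

row = conn.execute("""
    SELECT links.id,
           count(DISTINCT stats.id)       AS clicks,
           count(DISTINCT conversions.id) AS conversions,
           count(*)                       AS joined_rows,
           sum(conversions.value)         AS naive_sum,
           sum(conversions.value) * count(DISTINCT conversions.id) / count(*)
                                          AS conversion_value
    FROM links
    LEFT OUTER JOIN stats ON links.id = stats.parent_id
    LEFT OUTER JOIN conversions ON links.id = conversions.link_id
    GROUP BY links.id
    ORDER BY links.created DESC
""").fetchone()

link_id, clicks, conversions, joined_rows, naive_sum, conversion_value = row

# The join yields 3 x 2 = 6 rows, so the plain sum is inflated 3 times (45),
# while the rescaled expression recovers the true total (15).
print(row)
```

    With this sample data the join produces 6 rows, the plain sum comes out as 45, and the rescaled expression gives back 15.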
    

    Keep me posted! Jerome

  • 2020-11-28 05:14

    For an explanation of why you were seeing incorrect numbers, read this.

    I think that Jerome has a handle on what is causing your error. Bryson's query would work, though having that subquery in the SELECT could be inefficient.
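
    Bryson's query is not reproduced in this thread, so the following is only a guess at the general shape of a "subquery in the SELECT" approach, not his actual answer. It is a sketch using an in-memory SQLite database from Python, with a made-up schema and sample rows; the conversions are aggregated by correlated subqueries instead of being joined, so the clicks can no longer multiply them.

```python
import sqlite3

# Hypothetical minimal schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
    CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value REAL);

    INSERT INTO links VALUES (1, '2020-01-01'), (2, '2020-02-01');
    -- Link 1: three clicks, two conversions. Link 2: one click, no conversions.
    INSERT INTO stats (parent_id) VALUES (1), (1), (1), (2);
    INSERT INTO conversions (link_id, value) VALUES (1, 10.0), (1, 5.0);
""")

rows = conn.execute("""
    SELECT links.id,
           count(DISTINCT stats.id) AS clicks,
           -- Correlated subqueries: evaluated per link, independent of the stats join.
           (SELECT count(*) FROM conversions c
             WHERE c.link_id = links.id) AS conversions,
           (SELECT sum(c.value) FROM conversions c
             WHERE c.link_id = links.id) AS conversion_value
    FROM links
    LEFT OUTER JOIN stats ON links.id = stats.parent_id
    GROUP BY links.id
    ORDER BY links.created DESC
""").fetchall()

for r in rows:
    print(r)
```

    Each subquery is evaluated per link, which is where the efficiency concern comes from; whether it matters in practice depends on the indexes and the size of the tables.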
