PostgreSQL - Getting statistical data

北恋 2021-02-06 03:57

I need to collect some statistical information in my application. I have a table of users (tb_user). Every time a new user accesses the application, it adds a new record in this

1 answer
  • 2021-02-06 04:14

    You should look into aggregate functions (min, max, count, avg), which go hand in hand with GROUP BY. For date-based aggregations, date_trunc is also useful.

    For example, this will return the number of rows per day:

    SELECT date_trunc('day', date_time) AS day_start,
           COUNT(id) AS user_count FROM tb_user
        GROUP BY date_trunc('day', date_time);
    

    You can then do the daily average using something like this (with a CTE):

    WITH daily_count AS (SELECT date_trunc('day', date_time) AS day_start,
           COUNT(id) AS user_count FROM tb_user
        GROUP BY date_trunc('day', date_time))
    SELECT AVG(user_count) FROM daily_count;
    

    Use 'week' instead of 'day' for the weekly counts, and so on (see the date_trunc documentation).
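
    As a minimal sketch of the weekly variant, assuming the same tb_user table with id and date_time columns as in the queries above (the ORDER BY is an addition for readability), the per-week counts could look like this:

    ```sql
    -- Rows per week; date_trunc('week', ...) truncates to the Monday that starts the week
    SELECT date_trunc('week', date_time) AS week_start,
           COUNT(id) AS user_count
    FROM tb_user
    GROUP BY date_trunc('week', date_time)
    ORDER BY week_start;
    ```

    Weeks with no rows at all do not appear in the output, so an average taken over this result skips empty weeks.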

    EDIT: (Following comment: average up to and including 5 January 2012, i.e. before the 6th.)

    WITH daily_count AS (SELECT date_trunc('day', date_time) AS day_start,
           COUNT(id) AS user_count
        FROM tb_user
           WHERE date_time >= DATE('2012-01-01') AND date_time < DATE('2012-01-06') 
        GROUP BY date_trunc('day', date_time))
    SELECT SUM(user_count)/(DATE('2012-01-06') - DATE('2012-01-01')) FROM daily_count;
    

    What's above is over-complicated, in this case. This should give you the same result:

    -- Cast to numeric: COUNT returns bigint, so a plain / would truncate to an integer
    SELECT COUNT(id)::numeric/(DATE('2012-01-06') - DATE('2012-01-01'))
        FROM tb_user
           WHERE date_time >= DATE('2012-01-01') AND date_time < DATE('2012-01-06');
    

    EDIT 2: After your edit, I guess what you're after is just a single global average for the entire period of existence of your database, rather than groups by month/week/day.

    This should give you the average number of rows per day:

    WITH total_min_max AS (SELECT
            COUNT(id) AS total_visits,
            MIN(date_time) AS first_date_time,
            MAX(date_time) AS last_date_time
        FROM tb_user)
    -- Cast to numeric so the division keeps its fractional part
    SELECT total_visits::numeric/((last_date_time::date-first_date_time::date)+1) AS users_per_day
        FROM total_min_max;
    

    (I would replace last_date_time with NOW() to make the average over the time until now, rather than until the last visit, if there's no recent visit.)
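
    A sketch of that "until now" variant, under the same assumptions about tb_user. It uses CURRENT_DATE, which is equivalent to NOW()::date here; the CTE name total_min is only illustrative:

    ```sql
    WITH total_min AS (
        SELECT COUNT(id) AS total_visits,
               MIN(date_time) AS first_date_time
        FROM tb_user
    )
    -- Days elapsed from the first visit up to today, inclusive
    SELECT total_visits::numeric
               / ((CURRENT_DATE - first_date_time::date) + 1) AS users_per_day
    FROM total_min;
    ```

    If the table is empty, first_date_time is NULL and the query returns NULL rather than failing.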

    Then, for daily, weekly, and "monthly":

    WITH total_min_max AS (SELECT
            COUNT(id) AS total_visits,
            MIN(date_time) AS first_date_time,
            MAX(date_time) AS last_date_time
        FROM tb_user),
    daily_avg AS (
        -- Cast to numeric so the division keeps its fractional part
        SELECT total_visits::numeric/((last_date_time::date-first_date_time::date)+1) AS users_per_day
            FROM total_min_max)
    SELECT
             users_per_day,
             (users_per_day * 7) AS users_per_week,
             (users_per_day * 30) AS users_per_month
        FROM daily_avg;
    

    That said, be careful about the conclusions you draw from such statistics, especially if you want to see how the numbers change over time.

    I would also normalise the data per day rather than assuming 30 days in a month (or even per hour, because not all days have 24 hours). Say you have 10 visits per day in Jan 2011 and 10 visits per day in Feb 2011. That gives you 310 visits in Jan and 280 visits in Feb. If you don't pay attention, you could think you've had almost a 10% drop in the number of visitors and that something went wrong in Feb, when really this isn't the case.
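
    One way to do that per-day normalisation is sketched below, again assuming the tb_user table from above; the names monthly_count, visits and visits_per_day are only illustrative. Each month's total is divided by the actual number of days in that month:

    ```sql
    WITH monthly_count AS (
        SELECT date_trunc('month', date_time) AS month_start,
               COUNT(id) AS visits
        FROM tb_user
        GROUP BY date_trunc('month', date_time)
    )
    SELECT month_start,
           visits,
           -- Days in the month = first day of next month minus first day of this month
           visits::numeric
               / ((month_start + INTERVAL '1 month')::date - month_start::date) AS visits_per_day
    FROM monthly_count
    ORDER BY month_start;
    ```

    With the Jan/Feb example above, both months would come out at 10 visits per day, even though the raw monthly totals differ.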
