I have a a PostgreSQL database table with the following simplified structure:
Basica
Use the often overlooked built-in function width_bucket() in combination with your aggregation:
If your coordinates run from, say, 0 to 2000 and you want to consolidate everything within squares of 5 to single points, I would lay out a grid of 10 (5*2) like this:
SELECT device_id
, width_bucket(pos_x, 0, 2000, 2000/10) * 10 AS pos_x
, width_bucket(pos_y, 0, 2000, 2000/10) * 10 AS pos_y
, count(*) AS ct -- or any other aggregate
FROM tbl
GROUP BY 1,2,3
ORDER BY 1,2,3;
To minimize the error you could GROUP BY
the grid as demonstrated, but save actual average coordinates:
SELECT device_id
, avg(pos_x)::int AS pos_x -- save actual averages to minimize error
, avg(pos_y)::int AS pos_y -- cast if you need to
, count(*) AS ct -- or any other aggregate
FROM tbl
GROUP BY
device_id
, width_bucket(pos_x, 0, 2000, 2000/10) * 10 -- aggregate by grid
, width_bucket(pos_y, 0, 2000, 2000/10) * 10
ORDER BY 1,2,3;
sqlfiddle demonstrating both alongside.
Well, this particular case could be simpler:
...
GROUP BY
device_id
, (pos_x / 10) * 10 -- truncates last digit of an integer
, (pos_y / 10) * 10
...
But that's just because the demo grid size of 10
conveniently matches the decimal system. Try the same with a grid size of 17
or something ...
You can expand this approach to cover date
and timestamp
values by converting them to unix epoch (number of seconds since '1970-1-1') with extract().
SELECT extract(epoch FROM '2012-10-01 21:06:38+02'::timestamptz);
When you are done, convert the result back to timestamp with time zone
:
SELECT timestamptz 'epoch' + 1349118398 * interval '1s';
Or simply to_timestamp():
SELECT to_timestamp(1349118398);
select [some aggregates] group by (pos_x/5, pos_y/5);
Where instead of 5 you can have any number depending how much aggregation you need/