Question
I have experiments, features, and feature_values. Features have values in different experiments. So I have something like:
Experiments:
experiment_id, experiment_name
Features:
feature_id, feature_name
Feature_values:
experiment_id, feature_id, value
Let's say I have three experiments (exp1, exp2, exp3) and three features (feat1, feat2, feat3). I would like to have a SQL result that looks like:
feature_name | exp1 | exp2 | exp3
-------------+------+------+-----
feat1 | 100 | 150 | 110
feat2 | 200 | 250 | 210
feat3 | 300 | 350 | 310
How can I do this? Furthermore, a feature might not have a value in one of the experiments:
feature_name | exp1 | exp2 | exp3
-------------+------+------+-----
feat1 | 100 | 150 | 110
feat2 | 200 | | 210
feat3 | | 350 | 310
The SQL query should perform well. In the future there might be tens of millions of entries in the feature_values table. Or is there a better way to handle the data?
Answer 1:
I'm assuming here that (feature_id, experiment_id) is a unique key for Feature_values.
The standard SQL way to do this is to make n joins, one per experiment:
select
F.feature_name,
FV1.value as exp1,
FV2.value as exp2,
FV3.value as exp3
from Features as F
left outer join Feature_values as FV1 on FV1.feature_id = F.feature_id and FV1.experiment_id = 1
left outer join Feature_values as FV2 on FV2.feature_id = F.feature_id and FV2.experiment_id = 2
left outer join Feature_values as FV3 on FV3.feature_id = F.feature_id and FV3.experiment_id = 3
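To try the n-join query on the sample data from the question, here is a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database purely so the example is self-contained; that is an assumption for portability, not the asker's setup, though the same SELECT is plain SQL that PostgreSQL also accepts. The composite primary key encodes the uniqueness assumption stated above and gives each join an index to look up.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (assumption, for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Features (feature_id INTEGER PRIMARY KEY, feature_name TEXT);
    CREATE TABLE Feature_values (
        experiment_id INTEGER,
        feature_id    INTEGER,
        value         INTEGER,
        -- the uniqueness assumption from the answer; also serves as the lookup index
        PRIMARY KEY (feature_id, experiment_id)
    );
    INSERT INTO Features VALUES (1, 'feat1'), (2, 'feat2'), (3, 'feat3');
    -- columns are (experiment_id, feature_id, value); two combinations are missing
    INSERT INTO Feature_values VALUES
        (1, 1, 100), (2, 1, 150), (3, 1, 110),
        (1, 2, 200),              (3, 2, 210),
                     (2, 3, 350), (3, 3, 310);
""")

rows = conn.execute("""
    SELECT F.feature_name, FV1.value AS exp1, FV2.value AS exp2, FV3.value AS exp3
    FROM Features AS F
    LEFT OUTER JOIN Feature_values AS FV1
        ON FV1.feature_id = F.feature_id AND FV1.experiment_id = 1
    LEFT OUTER JOIN Feature_values AS FV2
        ON FV2.feature_id = F.feature_id AND FV2.experiment_id = 2
    LEFT OUTER JOIN Feature_values AS FV3
        ON FV3.feature_id = F.feature_id AND FV3.experiment_id = 3
    ORDER BY F.feature_name
""").fetchall()

for row in rows:
    print(row)  # a missing value comes back as None (SQL NULL)
```

Because every join is a left outer join, a feature without a value in some experiment still appears, with NULL in that column.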
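Or pivot the data like this (the max aggregate is not actually aggregating anything: given the unique key, each group has at most one non-null value per experiment, and max just picks it):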
select
F.feature_name,
max(case when E.experiment_name = 'exp1' then FV.value end) as exp1,
max(case when E.experiment_name = 'exp2' then FV.value end) as exp2,
max(case when E.experiment_name = 'exp3' then FV.value end) as exp3
from Features as F
left outer join Feature_values as FV on FV.feature_id = F.feature_id
left outer join Experiments as E on E.experiment_id = FV.experiment_id
group by F.feature_name
order by F.feature_name
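The pivot query hard-codes one column per experiment. When the experiments are not known in advance, one option is to generate the column list from the Experiments table first. The sketch below is an illustration under assumptions: `build_pivot_sql` is a hypothetical helper, the CASE expressions filter on experiment_id instead of joining Experiments by name, and sqlite3 is used only to keep the example runnable (its placeholder is `?`; a PostgreSQL driver such as psycopg uses `%s`).

```python
import sqlite3

def build_pivot_sql(experiments):
    """Build (sql, params) from (experiment_id, experiment_name) pairs.

    Hypothetical helper: one max(CASE ...) column per experiment.
    """
    select_list, params = ["F.feature_name"], []
    for exp_id, exp_name in experiments:
        # Quote the alias as an identifier; the id is passed as a bound parameter.
        alias = '"' + exp_name.replace('"', '""') + '"'
        select_list.append(
            f"max(CASE WHEN FV.experiment_id = ? THEN FV.value END) AS {alias}"
        )
        params.append(exp_id)
    column_sql = ", ".join(select_list)
    sql = (
        f"SELECT {column_sql} "
        "FROM Features AS F "
        "LEFT OUTER JOIN Feature_values AS FV ON FV.feature_id = F.feature_id "
        "GROUP BY F.feature_name "
        "ORDER BY F.feature_name"
    )
    return sql, params

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Experiments (experiment_id INTEGER PRIMARY KEY, experiment_name TEXT);
    CREATE TABLE Features (feature_id INTEGER PRIMARY KEY, feature_name TEXT);
    CREATE TABLE Feature_values (
        experiment_id INTEGER, feature_id INTEGER, value INTEGER,
        PRIMARY KEY (feature_id, experiment_id)
    );
    INSERT INTO Experiments VALUES (1, 'exp1'), (2, 'exp2'), (3, 'exp3');
    INSERT INTO Features VALUES (1, 'feat1'), (2, 'feat2'), (3, 'feat3');
    INSERT INTO Feature_values VALUES
        (1, 1, 100), (2, 1, 150), (3, 1, 110),
        (1, 2, 200),              (3, 2, 210),
                     (2, 3, 350), (3, 3, 310);
""")

experiments = conn.execute(
    "SELECT experiment_id, experiment_name FROM Experiments ORDER BY experiment_id"
).fetchall()
sql, params = build_pivot_sql(experiments)
cur = conn.execute(sql, params)
names = [d[0] for d in cur.description]  # result column names
rows = cur.fetchall()
print(names)
for row in rows:
    print(row)
```

The cost is an extra round trip to read the experiment list, and the result shape changes whenever an experiment is added, so callers have to read the column names from the cursor and not assume them.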
You can also consider using json (available in version 9.3) or hstore to collect all experiment values into one column.
Answer 2:
This is a common request. It's called a pivot or crosstab query. PostgreSQL doesn't have any nice built-in syntax for it, but you can use the crosstab function from the tablefunc module to do what you want.
For more information, search Stack Overflow for [postgresql] [pivot] or [postgresql] [crosstab].
Some relational database systems offer a nice way to do this with a built-in query, but as yet PostgreSQL does not.
Answer 3:
What you're attempting is a bit difficult since you are trying to present a set of tables as a single table and obviously, this involves some transformation and some assumptions.
Assuming that you know in advance that there are only three experiments and only three features, you could do something like the following:
SELECT
feature_id,
-- ELSE 0 reports a missing value as 0; drop the ELSE branch to get NULL instead
SUM(CASE WHEN experiment_id = 1 THEN value ELSE 0 END) AS Exp1Total,
SUM(CASE WHEN experiment_id = 2 THEN value ELSE 0 END) AS Exp2Total,
SUM(CASE WHEN experiment_id = 3 THEN value ELSE 0 END) AS Exp3Total
FROM
Feature_values
GROUP BY
feature_id
ORDER BY
feature_id
In this case, the result will contain the IDs of the experiments and the features rather than their names. To get the names, you will need to join on the Features table and also on the Experiments table. I've omitted this for clarity, since I think the most difficult part is the CASE logic.
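The omitted joins could look roughly like the sketch below, which also shows what the ELSE branch does to missing values. It is an illustration, not the answerer's query: the `pivot` helper is hypothetical, and sqlite3 is used only to keep the example runnable; the SQL itself is standard and should carry over to PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Experiments (experiment_id INTEGER PRIMARY KEY, experiment_name TEXT);
    CREATE TABLE Features (feature_id INTEGER PRIMARY KEY, feature_name TEXT);
    CREATE TABLE Feature_values (experiment_id INTEGER, feature_id INTEGER, value INTEGER);
    INSERT INTO Experiments VALUES (1, 'exp1'), (2, 'exp2'), (3, 'exp3');
    INSERT INTO Features VALUES (1, 'feat1'), (2, 'feat2'), (3, 'feat3');
    INSERT INTO Feature_values VALUES
        (1, 1, 100), (2, 1, 150), (3, 1, 110),
        (1, 2, 200),              (3, 2, 210),
                     (2, 3, 350), (3, 3, 310);
""")

def pivot(else_branch):
    """Run the CASE pivot by name; else_branch is 'ELSE 0' or '' (hypothetical helper)."""
    sql = f"""
        SELECT F.feature_name,
               SUM(CASE WHEN E.experiment_name = 'exp1' THEN FV.value {else_branch} END) AS exp1,
               SUM(CASE WHEN E.experiment_name = 'exp2' THEN FV.value {else_branch} END) AS exp2,
               SUM(CASE WHEN E.experiment_name = 'exp3' THEN FV.value {else_branch} END) AS exp3
        FROM Feature_values AS FV
        JOIN Features AS F ON F.feature_id = FV.feature_id
        JOIN Experiments AS E ON E.experiment_id = FV.experiment_id
        GROUP BY F.feature_name
        ORDER BY F.feature_name
    """
    return conn.execute(sql).fetchall()

with_zero = pivot("ELSE 0")  # missing values are reported as 0
with_null = pivot("")        # missing values stay NULL (None in Python)

print(with_zero)
print(with_null)
```

Leaving out the ELSE branch matches the empty cells the question asks for, since a 0 cannot be told apart from a measured value of zero. Note that the inner joins start from Feature_values, so a feature with no values at all would not appear in either result.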
Source: https://stackoverflow.com/questions/18463634/how-to-flatten-a-postgresql-result