I am querying the github public dataset on BigQuery. Currently, my best query for what I need looks like the following.
SELECT type, created_at, repository_name FROM [githubarchive:github.timeline]
WHERE
(created_at CONTAINS '2012-')
AND repository_owner="twitter"
ORDER BY created_at, repository_name;
This gives me all the events ("type") from the repository_owner twitter (or any other user) for all the repositories ("repository_name") that this user owns, but in a single column.
However, what I really want is to have all the events ("type") in columns, one column for each repository ("repository_name"), more or less like this:
bootstrap commons twui
WatchEvent PushEvent PushEvent
WatchEvent WatchEvent PushEvent
The timestamp ("created_at") is only relevant as an ordering mechanism. The columns does not need have to be equally long, and the events on a single row does not need to happening at the same time.
I will use this to put the events into the R package TraMineR to do sequence analysis.
How can I accomplish this?
does this work?
SELECT type, created_at, repository_name
FROM [githubarchive:github.timeline]
WHERE
(created_at CONTAINS '2012-')
AND repository_owner="twitter"
GROUP BY type,created_at, repository_name;
I'm not sure I understand exactly what you're hoping to accomplish, but it is possible to get columns via something like this:
SELECT type, bootstrap, commons, twui
FROM (
SELECT type,
SUM(IF(repository_name = 'bootstrap', 1, 0)) AS bootstrap,
SUM(IF(repository_name = 'commons', 1, 0)) AS commons,
SUM(IF(repository_name = 'twui', 1, 0)) AS twui
FROM [githubarchive:github.timeline]
WHERE created_at CONTAINS '2012-'
AND repository_owner = "twitter"
GROUP BY type
)
ORDER BY type
来源:https://stackoverflow.com/questions/11875929/how-to-get-several-columns-from-bigquery