Question
I can use the following query to generate the HLL sketches for the distinct counts:
SELECT category, count(distinct city), HLL_COUNT.INIT(city) FROM `table`
GROUP BY category
And I get something like this:
Normally I would use the HLL_COUNT.MERGE(...) function to get the total count, for example:
select 'all -- hll', HLL_COUNT.MERGE(x), null from (select category, count(distinct city), HLL_COUNT.INIT(city) x from `datadocs-163219.010ff92f6a62438aa47c10005fe98fc9.inv` group by category) _
For various reasons, I need to do the MERGE outside of SQL/BigQuery. Is there a library (ideally open source) that would let me do something like the following:
>>> hll_set
>>> {'CHAQMBgCIAuCBz8QFBgPIBQyN8hxlqEBvMMBnLMBgWnD5gTB3AH+ROgD/YMEpM8Jr70C6Q2LwwfZlQ3QMNu8AYDSBKf7AbOSqgE=', 'CHAQDhgCIAuCBxwQBxgPIBQyFP3PBMBtibMR3sgC77oViasKwfMF', 'CHAQJxgCIAuCBzIQEBgPIBQyKshxlqEBvMMBzfECh6gJxJABoNwF/rEGwf0PgYYFvOoFmzjJPZwg2y3nbw==', 'CHAQBBgCIAuCBw4QAhgPIBQyBpSJAfapKA==', 'CHAQBRgCIAuCBxEQAxgPIBQyCbaJBfqsH57tBw==', 'CHAQGBgCIAuCBykQDRgPIBQyId6SAtNvwJ0XgO8Ct/EFlvUOskG1E87ZA7/OApwg2y3nbw==', 'CHAQZhgCIAuCB2MQIxgPIBQyW5SJAcqJAbzDAcvcAoIV2xSMFsTyA42IAYkl+Wvj/AHqdJxRlEGbywG/WNjoAqS9BP3CAuPrBNSFAfdDt+YEoeIBr+ICmIYF6CL/MaLNAqKdA8k9rxntBrPVrAE=', 'CHAQEBgCIAuCByQQChgPIBQyHN6SAqjtArAJ/esCj9wSg+8KiVKNygHrpgXIogU=', 'CHAQpgkYAiALggfZAhChARgPIBQyzwKPBMwRkAzxP+wPogyqC8qJAeBo8BHsSOypAbAJriL+MYYR/1jnKqIyzR3wJIkI/QXkecNH7WCzQZgMuDvxFLh+xkboA7QB12akDhu5E+4+3KgBjAZ4nxLBRMw0xRWvIPZYszt+v1gnz2a0BZoF4wzQggHqOewsJeAxgguGErUCjGG3KuhKgUyfCtItkjOMZZwCpi3phgHlA+wRknEhwiq1Os4slgmhELEWl1f1rgH+B6e4AdCtAdkE4R7fK/gihHSRFqipAbYY9BmqP5oBgqsBvhrvEKGRAcpj7XHEVaAUrY8BylLRDgWn1wGpT6IS6irPHewb/AbKHqgQjQPyAeU82zuSHpgQ04UBzwqkFIADiBD4X6ABjBihFsIy6wmovgHNKssPsQOvGcADrQOQevMQvxKMBtANizqbP7l21+kB0UDxY92rVYCBMcD5H8CiEA=='}
>>> hll_merge_method(hll_set)
>>> 193
Is it possible to do this in any way using a library outside of BQ with the hash generated from it?
Answer 1:
That's a feature request you might already find in the issue tracker: the current hash is Google proprietary, but one day BigQuery could use an open one. Vote that request up.
- https://issuetracker.google.com/issues/62153424
There might be news soon, and subscribing to the issue will keep you updated.
2019 Update: Find the open source version of BigQuery's HyperLogLog++ at:
- https://github.com/google/zetasketch
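Whichever library ends up doing the merge, the sketches from the question first have to be turned back into raw bytes. The snippet below is a minimal sketch of that step only, assuming the strings in hll_set are the usual base64 text rendering of BigQuery BYTES values. The helper name decode_sketches is made up for illustration, and the merge itself is not shown, since that would be done by an HLL++ implementation such as ZetaSketch (a Java library).

```python
import base64


def decode_sketches(encoded_sketches):
    """Decode base64 text sketches into raw bytes.

    Assumption: each string is the base64 form of an HLL_COUNT.INIT
    result as exported by BigQuery. The helper name is hypothetical.
    """
    return [base64.b64decode(s, validate=True) for s in encoded_sketches]


# Two of the shorter sketches copied from the question.
hll_set = {
    'CHAQBBgCIAuCBw4QAhgPIBQyBpSJAfapKA==',
    'CHAQBRgCIAuCBxEQAxgPIBQyCbaJBfqsH57tBw==',
}

# Sort for a stable order, since sets are unordered.
raw_sketches = decode_sketches(sorted(hll_set))

for raw in raw_sketches:
    # Print the byte length and a hex dump of each serialized sketch.
    print(len(raw), raw.hex())
```

The resulting byte strings are what would be handed to the merging library, for example by writing them to files or passing them to a small Java program built on ZetaSketch.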
Source: https://stackoverflow.com/questions/56301007/using-hll-count-merge-outside-of-sql