Why does Flink SQL use a cardinality estimate of 100 rows for all tables?


Question


I wasn't sure why the logical plan wasn't correctly evaluated in this example.

I looked more deeply into the Flink code base and checked what happens when Calcite evaluates/estimates the number of rows for the query object. For some reason it always returns 100 for any table source.

In fact, during program plan creation, Flink invokes the VolcanoPlanner via TableEnvironment.runVolcanoPlanner for each rule it applies. The planner tries to optimize the plan and computes estimates by calling RelMetadataQuery.getRowCount.
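
For reference, this is roughly how that estimate can be queried from a plan node (a minimal sketch against the Calcite API; the helper method name is mine). For a base table that exposes no statistics, Calcite falls back to its built-in default of 100 rows:

    import org.apache.calcite.rel.RelNode;
    import org.apache.calcite.rel.metadata.RelMetadataQuery;

    // Minimal sketch: ask Calcite for the estimated row count of a plan node.
    // For a scan of a table that exposes no statistics, Calcite answers with
    // its built-in default of 100.0 rows.
    static double estimatedRowCount(RelNode node) {
        RelMetadataQuery mq = node.getCluster().getMetadataQuery();
        return mq.getRowCount(node);
    }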

I reproduced the problem with a failing test that asserts a row count of 0 for the relation table 'S', but it always returns 100.

Why is this happening? Does anyone have an answer to this issue?


Answer 1:


In the current version (1.7.1, Jan 2019), Flink's relational APIs (Table API and SQL) do not attempt to estimate the cardinality of base tables. Hence, Calcite uses its default value, which is 100.

This works fine for basic optimizations like filter and projection push-down and is currently sufficient because Flink does not (yet) reorder joins.

The only way to inject cardinality estimates for tables is via an ExternalCatalog.
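
As a note beyond the answer itself: at the Calcite level, the estimate changes as soon as the table exposes a Statistic. The following is only an illustrative sketch using Calcite's schema API, not Flink's ExternalCatalog (the class name, the single column, and the row count of 0 are made up), but it shows the kind of metadata such a catalog ultimately has to supply to the planner:

    import java.util.Collections;
    import org.apache.calcite.rel.type.RelDataType;
    import org.apache.calcite.rel.type.RelDataTypeFactory;
    import org.apache.calcite.schema.Statistic;
    import org.apache.calcite.schema.Statistics;
    import org.apache.calcite.schema.impl.AbstractTable;
    import org.apache.calcite.sql.type.SqlTypeName;

    // Illustrative Calcite table that reports its own cardinality. Once a table
    // returns a Statistic with a row count, RelMetadataQuery.getRowCount() uses
    // that value instead of the 100-row default.
    public class StatsTable extends AbstractTable {

        @Override
        public RelDataType getRowType(RelDataTypeFactory typeFactory) {
            // Single made-up BIGINT column, just to have a valid row type.
            return typeFactory.builder()
                .add("id", SqlTypeName.BIGINT)
                .build();
        }

        @Override
        public Statistic getStatistic() {
            // Report a row count of 0 (the value the failing test expected for 'S').
            return Statistics.of(0d, Collections.emptyList());
        }
    }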



Source: https://stackoverflow.com/questions/54101174/why-does-flink-sql-use-a-cardinality-estimate-of-100-rows-for-all-tables
