python multiline regex to get optional group in parentheses

独自空忆成欢 提交于 2019-12-24 04:32:10

问题


I am struggling to write a regex to basically get 4 groups for both tables

  1. Table Name e.g. table_v1
  2. Table columns in first () after table name
  3. Primary key values in ()
  4. Optionally Value in () if CLUSTERING ORDER is there

I tried this, mostly works except cannot get cluster order value .

EDIT: HERE IS A FAILING DEMO

re.compile("CREATE\s+TABLE\s+(?:[a-z][a-z0-9_]*).*?((?:[a-z][a-z0-9_"]*)).*?(\(.*?\)) WITH.*?(\(.*?\)).*?;").findall(string_below)

Here is the String trying to run above regex on.

CREATE TABLE abcdeg.table_v1 (
    "id" text,
    "obj" text,
    "version" bigint,
    output text,
    server text,
    PRIMARY KEY ("id", "obj", "version")
) WITH CLUSTERING ORDER BY ("id" ASC, "version" DESC)
    AND bloom_filter_fp_chance = 0.1
    AND comment = ''
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

CREATE TABLE abcdeg.result_v1 (
    "id" text,
    "obj" text,
    time int,
    PRIMARY KEY (("id", "obj"))
) WITH bloom_filter_fp_chance = 0.1
    AND comment = ''
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND speculative_retry = '99.0PERCENTILE';

CREATE TABLE abcdeg.result_v2 (
    "id" text PRIMARY KEY,
    "obj" text,
    time int
) WITH bloom_filter_fp_chance = 0.1
    AND comment = ''
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND speculative_retry = '99.0PERCENTILE';

回答1:


I can't test this in python, but it works in the demo. Captures table name into group 1, fields into group 2, primary keys (even with double parantheses) into group3, and if Clustering Order is present, into group 5

CREATE\sTABLE\s.+?\.(\w+)\s\(\n?(.+?)\n\s*PRIMARY\sKEY\s\(+([^)]*)\)+.*?(CLUSTERING\sORDER\sBY\s\(([^)]+)\)|;)

Demo




回答2:


I could write a regex that handles these two SQL-s properly. However, this solution is brittle and hard to understand. It's brittle because there can be other SQL constructs that we haven't considered yet. And it's hard to understand, because, you know, just look at this:

CREATE\s+TABLE[^.]*\.(\S*)\s*\((.*?)PRIMARY KEY \(?\(([^)]*)\)\)?\s*\)\s*(?:WITH CLUSTERING ORDER BY \(([^)]*)\))?

The trick was to say what you mean. Instead of lazy quantifiers I used negated character classes: \([^)]\). This way we can retrieve the contents of a parentheses effectively and without unexpected matches.

Demo here.

I suggest to take a different approach. You can use an SQL parser. SQLParse looks promising. I haven't worked with it though.



来源:https://stackoverflow.com/questions/49444285/python-multiline-regex-to-get-optional-group-in-parentheses

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!