How to find a set of columns that is a primary key candidate in a CSV file?

小蘑菇 2021-01-13 18:54

I have a CSV file (not normalized; this is an example, the real file has up to 100 columns):

   ID, CUST_NAME, CLIENT_NAME, PAYMENT_NUM, START_DATE, END_DATE
    1,     CUST1,         


        
2 answers
  •  广开言路
    2021-01-13 19:26

    pandas and itertools will give you what you're looking for.

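    import pandas
    from itertools import chain, combinations

    def key_options(items):
        # Yield every non-empty combination of column names, smallest first.
        return chain.from_iterable(combinations(items, r) for r in range(1, len(items) + 1))

    df = pandas.read_csv('test.csv')

    # Iterate over all combos of headings, excluding ID for brevity.
    for candidate in key_options(list(df)[1:]):
        # Pass a list: pandas may treat a tuple as a single column label.
        deduped = df.drop_duplicates(subset=list(candidate))

        # If no rows were dropped, the combination is unique for every row.
        if len(deduped.index) == len(df.index):
            print(','.join(candidate))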
    

    This gives you the output:

    PAYMENT_NUM, END_DATE
    CUST_NAME, CLIENT_NAME, END_DATE
    CUST_NAME, PAYMENT_NUM, END_DATE
    CLIENT_NAME, PAYMENT_NUM, END_DATE
    PAYMENT_NUM, START_DATE, END_DATE
    CUST_NAME, CLIENT_NAME, PAYMENT_NUM, END_DATE
    CUST_NAME, CLIENT_NAME, START_DATE, END_DATE
    CUST_NAME, PAYMENT_NUM, START_DATE, END_DATE
    CLIENT_NAME, PAYMENT_NUM, START_DATE, END_DATE
    CUST_NAME, CLIENT_NAME, PAYMENT_NUM, START_DATE, END_DATE
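
The list above includes every superset of a unique combination, and the number of combinations doubles with each extra column, which matters for a file with up to 100 columns. Below is a minimal sketch of one way to report only the minimal keys and to cap the search size. The function name `minimal_candidate_keys`, the `max_size` parameter, and the sample data are assumptions made for illustration; they are not part of the original answer or the asker's file.

```python
from itertools import combinations

import pandas as pd


def minimal_candidate_keys(df, columns=None, max_size=None):
    """Return column combinations that are unique for every row,
    skipping any combination that contains a smaller key already found."""
    columns = list(df.columns) if columns is None else list(columns)
    max_size = len(columns) if max_size is None else max_size
    keys = []
    # Smaller combinations are tried first, so any key found is minimal.
    for r in range(1, max_size + 1):
        for combo in combinations(columns, r):
            # A superset of a known key is unique too, but not minimal.
            if any(set(key) <= set(combo) for key in keys):
                continue
            # duplicated() flags rows whose values in `combo` repeat an earlier row.
            if not df.duplicated(subset=list(combo)).any():
                keys.append(combo)
    return keys


# Hypothetical sample data, not the asker's file.
sample = pd.DataFrame({
    "CUST_NAME": ["CUST1", "CUST1", "CUST2", "CUST2"],
    "PAYMENT_NUM": [1, 2, 1, 2],
    "END_DATE": ["2021-01-31", "2021-02-28", "2021-01-31", "2021-03-31"],
})

for key in minimal_candidate_keys(sample, max_size=3):
    print(", ".join(key))
```

On a wide file, a small `max_size` (2 or 3, say) keeps the run time manageable, at the cost of missing keys that need more columns than that.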
    
