I have a CSV file (not normalized; this is an example, the real file has up to 100 columns):
ID, CUST_NAME, CLIENT_NAME, PAYMENT_NUM, START_DATE, END_DATE
1, CUST1,
pandas and itertools will give you what you're looking for.
import pandas
from itertools import chain, combinations
def key_options(items):
    # every non-empty combination of items, shortest first
    return chain.from_iterable(combinations(items, r) for r in range(1, len(items) + 1))
df = pandas.read_csv('test.csv')
# iterate over all combos of headings, excluding ID for brevity
for candidate in key_options(list(df)[1:]):
    # pass a list so the tuple is never mistaken for a single column label
    deduped = df.drop_duplicates(list(candidate))
    # no rows dropped means this combination identifies every row uniquely
    if len(deduped.index) == len(df.index):
        print(','.join(candidate))
This gives you the output:
PAYMENT_NUM, END_DATE
CUST_NAME, CLIENT_NAME, END_DATE
CUST_NAME, PAYMENT_NUM, END_DATE
CLIENT_NAME, PAYMENT_NUM, END_DATE
PAYMENT_NUM, START_DATE, END_DATE
CUST_NAME, CLIENT_NAME, PAYMENT_NUM, END_DATE
CUST_NAME, CLIENT_NAME, START_DATE, END_DATE
CUST_NAME, PAYMENT_NUM, START_DATE, END_DATE
CLIENT_NAME, PAYMENT_NUM, START_DATE, END_DATE
CUST_NAME, CLIENT_NAME, PAYMENT_NUM, START_DATE, END_DATE
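Most of the lines above are supersets of PAYMENT_NUM, END_DATE, and with 100 columns the full set of combinations (2^100 - 1) is far too large to enumerate. A rough sketch of one way to cut this down, assuming you only care about minimal keys up to some size: skip any candidate that already contains a smaller key, and stop at a size cap. The function name minimal_keys, the max_size parameter, and the sample data are made up for illustration and are not from the question.

```python
from itertools import combinations

import pandas as pd


def minimal_keys(df, max_size=3):
    """Return column combinations that uniquely identify every row,
    skipping any combination that contains a smaller key already found."""
    found = []
    columns = list(df.columns)
    for r in range(1, max_size + 1):
        for candidate in combinations(columns, r):
            # A superset of a known key is also unique, so it adds nothing.
            if any(set(key) <= set(candidate) for key in found):
                continue
            # duplicated() flags rows that repeat an earlier row on these columns.
            if not df.duplicated(subset=list(candidate)).any():
                found.append(candidate)
    return found


# Hypothetical sample data, not the file from the question.
sample = pd.DataFrame({
    "CUST_NAME": ["A", "A", "B", "B"],
    "PAYMENT_NUM": [1, 2, 1, 2],
    "END_DATE": ["d1", "d1", "d1", "d2"],
})
print(minimal_keys(sample))
```

On the real file you would call it as minimal_keys(df.drop(columns=['ID'])) or similar. The cap matters: even at max_size=3, 100 columns means roughly 166,000 candidates, so raise it with care.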