dynamically create string from three data frames

Question

Dynamically create string from pandas column

I have three data frames like the ones below. The first is df (df1 in the code) and the second is anomalies (df2 in the code):

import pandas as pd

d = {'10028': [0], '1058': [25], '20120': [29], '20121': [22],'20122': [0], '20123': [0], '5043': [0], '5046': [0]}

df1 = pd.DataFrame(data=d)

Basically, anomalies is a mirror copy of df, except that its values are 0 or 1: 1 marks an anomaly and 0 a non-anomaly.

d = {'10028': [0], '1058': [1], '20120': [1], '20121': [0],'20122': [0], '20123': [0], '5043': [0], '5046': [0]}

df2 = pd.DataFrame(data=d)

And a third data frame like the one below:

d = {'10028': ['US,IN'], '1058': ['NA, JO, US'], '20120': [''], '20121': ['US,PK'],'20122': ['IN'], '20123': ['Us,LN'], '5043': ['AI,AL'], '5046': ['AA,AB']}

df3 = pd.DataFrame(data=d)

I am converting them into a specific format with the code below:

details = (
        '\n' + 'Metric Name' + '\t' + 'Count' + '\t' + 'Anomaly' + '\t' + 'Country' +
        '\n' + '10028:' + '\t'+ '\t' + str(df1.tail(1)['10028'][0]) + '\t' + str(df2['10028'][0]) + '\t'+ str(df3['10028'][0]) + 
        '\n' + '1058:' + '\t' + '\t' + str(df1.tail(1)['1058'][0]) + '\t' + str(df2['1058'][0]) + '\t'+ str(df3['1058'][0]) +
        '\n' + '20120:' + '\t' +'\t' + str(df1.tail(1)['20120'][0]) + '\t' + str(df2['20120'][0]) + '\t'+ str(df3['20120'][0]) +
        '\n' + '20121:' + '\t' + '\t' +str(round(df1.tail(1)['20121'][0], 2)) + '\t' + str(df2['20121'][0]) + '\t'+ str(df3['20121'][0]) +
        '\n' + '20122:' + '\t' + '\t' +str(round(df1.tail(1)['20122'][0], 2)) + '\t' + str(df2['20122'][0]) + '\t'+str(df3['20122'][0]) +
        '\n' + '20123:' + '\t' + '\t' +str(round(df1.tail(1)['20123'][0], 3)) + '\t' + str(df2['20123'][0]) + '\t'+str(df3['20123'][0]) +
        '\n' + '5043:' + '\t' + '\t' +str(round(df1.tail(1)['5043'][0], 3)) + '\t' + str(df2['5043'][0]) + '\t'+str(df3['5043'][0]) +
        '\n' + '5046:' + '\t' + '\t' +str(round(df1.tail(1)['5046'][0], 3)) + '\t' + str(df2['5046'][0]) + '\t'+str(df3['5046'][0]) +
        '\n\n' + 'message:' + '\t' +
        'Something wrong with the platform as there is a spike in [values where anomalies == 1].'
            )

The problem is that the column names change on every run. In this run they are '10028', '1058', '20120', '20121', '20122', '20123', '5043', '5046', but in the next run they may be '10029', '1038', '20121', '20122', '20123', '5083', '5946'.

How can I create details dynamically, depending on which columns are present in the data frame? I don't want to hard-code the column names, and in the message I want to include the names of the columns whose anomaly value is 1.

The column values will always be either 1 or 0 for df1 and df2, and for df3 either a list or blank.

Expected output:

For two data frames I have a working solution, shown below:

# first part of the string
s = '\n' + 'Metric Name' + '\t' + 'Count' + '\t' + 'Anomaly' 

# dynamically add the data
for idx, val in df1.iloc[-1].items():  # Series.iteritems() was removed in pandas 2.0
    s += f'\n{idx}\t{val}\t{df2[idx][0]}' 
# last part
s += ('\n\n' + 'message:' + '\t' +
      'Something wrong with the platform as there is a spike in [values where anomalies == 1].'
     )

If a matching value is not present, it should print null.
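The placeholder `[values where anomalies == 1]` in the message could be filled from the anomaly frame itself. The following is only a minimal sketch: the sample `df2` is shortened for illustration, and the fallback text for the no-anomaly case is an assumption, since the question does not say what should be printed then.

```python
import pandas as pd

# Shortened sample of the anomaly frame from the question
df2 = pd.DataFrame({'10028': [0], '1058': [1], '20120': [1], '20121': [0]})

# Take the last row and keep the column names whose flag is 1
flags = df2.iloc[-1]
anomalous = flags.index[flags == 1].tolist()

if anomalous:
    message = ('Something wrong with the platform as there is a spike in '
               + ', '.join(anomalous) + '.')
else:
    # Assumed fallback; not specified in the question
    message = 'No anomalies detected.'

print(message)
```

Because the names are read from `df2.columns` at run time, nothing has to change when the set of metrics differs between runs.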

Answer 1:

To obtain the expected result, you can do the following. The input data must be dictionaries as shown in the question; if it is not, please provide the real input data.

import pandas as pd

final_d = []
d = {'10028': 0, '1058': 25, '20120': 29, '20121': 22,'20122': 0, '20123': 0, '5043': 0, '5046': 0}
final_d.append(d)

d = {'10028': 0, '1058': 1, '20120': 1, '20121': 0,'20122': 0, '20123': 0, '5043': 0, '5046': 0, '91111':0}
final_d.append(d)

d = {'10028': ['US','IN'], '1058': ['NA', 'JO', 'US'], '20120': [''], '20121': ['US','PK'],'20122': ['IN'], '20123': ['Us','LN'], '5043': ['AI','AL'], '5046': ['AA','AB'], '00000':['kk','dd','ee']}
final_d.append(d)

# Now, we will merge the dictionaries on key
data = {}
for i, dt in enumerate(final_d):
    for k, v in dt.items():
        # each key gets one slot per dictionary; slots with no value stay ''
        row = data.setdefault(k, [''] * len(final_d))
        # lists (the country codes) are flattened to a comma-separated string
        row[i] = ','.join(v) if isinstance(v, list) else v
# every list already has len(final_d) entries, so no extra padding is needed

# Creating the base dataframe
df = pd.DataFrame.from_dict(data)

# Converting the column headers (metric names) into a row in the dataframe
df = pd.concat([pd.DataFrame.from_dict({k:[v] for k,v in zip(df.columns.tolist(), df.columns.tolist())}), df], ignore_index=True)

# removing column names
df.columns = [''] * len(df.columns)

# organising the dataframe according to your required output
result = df.T.reset_index(drop=True)

# Adding the column names as required
result.columns = ['Metric Name', 'Count', 'Anomaly', 'Country']

# Voila!
print(result.to_string(index=False))

The generated dataframe:

Metric Name Count Anomaly   Country
      10028     0       0     US,IN
       1058    25       1  NA,JO,US
      20120    29       1          
      20121    22       0     US,PK
      20122     0       0        IN
      20123     0       0     Us,LN
       5043     0       0     AI,AL
       5046     0       0     AA,AB
      91111             0          
      00000                kk,dd,ee
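If the input really is three DataFrames and the goal is the tab-separated string from the question rather than a printed table, the same alignment can be done with `pd.concat`. This is a sketch under assumptions: the sample frames are shortened, '91111' is deliberately missing from two of them, and the helper `show` is a hypothetical name introduced here to print "null" for missing values.

```python
import pandas as pd

# Shortened sample frames; '91111' exists only in df2 on purpose
df1 = pd.DataFrame({'10028': [0], '1058': [25], '20120': [29]})
df2 = pd.DataFrame({'10028': [0], '1058': [1], '20120': [1], '91111': [0]})
df3 = pd.DataFrame({'10028': ['US,IN'], '1058': ['NA, JO, US'], '20120': ['']})

# Last row of each frame as an object Series, so integers are not
# upcast to floats when missing entries are introduced
last_rows = [frame.iloc[-1].astype(object) for frame in (df1, df2, df3)]

# Align on the column names; a name absent from a frame becomes NaN
table = pd.concat(last_rows, axis=1, keys=['Count', 'Anomaly', 'Country'])


def show(value):
    """Hypothetical helper: render a missing value as 'null'."""
    return 'null' if pd.isna(value) else str(value)


# One tab-separated line per metric, in the layout used in the question
header = 'Metric Name\tCount\tAnomaly\tCountry'
rows = [
    f'{name}:\t\t{show(count)}\t{show(anomaly)}\t{show(country)}'
    for name, count, anomaly, country in table.itertuples(name=None)
]
details = '\n' + header + '\n' + '\n'.join(rows)
print(details)
```

An empty string, such as the country of '20120', is kept as blank, while a metric that is absent from a frame is printed as null.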

Source: https://stackoverflow.com/questions/66003733/dynamically-create-string-from-three-data-frames

Tags

python

python-3.x

dataframe

dictionary