Loop through multiple CSV files and run a script

I have a script that pulls in data from a CSV file, manipulates it, and creates an output Excel file. But it's a tedious process, as I need to do it for multiple f

4 Answers

轮回少年

2021-01-28 16:54

You can use Python's glob.glob() to get all of the CSV files from a given folder. For each filename that is returned, you could derive a suitable output filename. The file processing could be moved into a function as follows:

# Import libraries
import pandas as pd
import xlsxwriter
import glob
import os

def process_csv(input_filename, output_filename):
    # Get data
    df = pd.read_csv(input_filename)

    # Clean data
    cleanedData = df[['State','Campaigns','Type','Start date','Impressions','Clicks','Spend(INR)',
                    'Orders','Sales(INR)','NTB orders','NTB sales']]
    cleanedData = cleanedData[cleanedData['Impressions'] != 0].sort_values('Impressions', 
                                                                        ascending= False).reset_index()
    # 'number' selects all numeric columns (pd.np was removed in pandas 2.0)
    cleanedData.loc['Total'] = cleanedData.select_dtypes('number').sum()
    cleanedData['CTR(%)'] = (cleanedData['Clicks'] / 
                            cleanedData['Impressions']).astype(float).map("{:.2%}".format)
    cleanedData['CPC(INR)'] = (cleanedData['Spend(INR)'] / cleanedData['Clicks'])
    cleanedData['ACOS(%)'] = (cleanedData['Spend(INR)'] / 
                            cleanedData['Sales(INR)']).astype(float).map("{:.2%}".format)
    cleanedData['% of orders NTB'] = (cleanedData['NTB orders'] / 
                                    cleanedData['Orders']).astype(float).map("{:.2%}".format)
    cleanedData['% of sales NTB'] = (cleanedData['NTB sales'] / 
                                    cleanedData['Sales(INR)']).astype(float).map("{:.2%}".format)
    cleanedData = cleanedData[['State','Campaigns','Type','Start date','Impressions','Clicks','CTR(%)',
                            'Spend(INR)','CPC(INR)','Orders','Sales(INR)','ACOS(%)',
                            'NTB orders','% of orders NTB','NTB sales','% of sales NTB']]

    # Create summary
    summaryData = cleanedData.groupby(['Type'])[['Spend(INR)','Sales(INR)']].agg('sum')
    summaryData.loc['Overall Snapshot'] = summaryData.select_dtypes('number').sum()
    summaryData['ROI'] = summaryData['Sales(INR)'] / summaryData['Spend(INR)']

    # Push to Excel (the context manager saves and closes the file on exit;
    # writer.save() was removed in pandas 2.0)
    with pd.ExcelWriter(output_filename, engine='xlsxwriter') as writer:
        summaryData.to_excel(writer, sheet_name='Summary')
        cleanedData.to_excel(writer, sheet_name='Overall Report')

# Set system paths
INPUT_PATH = 'SystemPath//Downloads//'
OUTPUT_PATH = 'SystemPath//Downloads//Output//'

for csv_filename in glob.glob(os.path.join(INPUT_PATH, "*.csv")):
    name, ext = os.path.splitext(os.path.basename(csv_filename))
    # Create an output filename based on the input filename
    output_filename = os.path.join(OUTPUT_PATH, f"{name}Output.xlsx")
    process_csv(csv_filename, output_filename)

os.path.join() can be used as a safer way to join file paths together.
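
As a minimal sketch of the filename handling on its own, the helper below derives the output path for one CSV file. The name `build_output_path` and the `suffix` parameter are hypothetical and only for illustration; they are not part of the original script.

```python
import os

def build_output_path(csv_filename, output_dir, suffix="Output"):
    """Return the .xlsx path inside output_dir that corresponds to csv_filename."""
    # Drop the directory and the extension: 'Downloads/report.csv' -> 'report'
    name, _ext = os.path.splitext(os.path.basename(csv_filename))
    # os.path.join adds a separator only where one is needed
    return os.path.join(output_dir, f"{name}{suffix}.xlsx")

print(build_output_path(os.path.join("Downloads", "report.csv"), "Output"))
```

Because os.path.join() only adds a separator when one is missing, the result should be the same whether or not the output folder is written with a trailing slash.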


鱼传尺愫

2021-01-28 17:02

Try this:

import glob
import os
import pandas as pd

files = glob.glob(INPUT_PATH + "*.csv")

for file in files:
    # Get data
    df = pd.read_csv(file)

    # Clean data
    # your cleaning code

    # Push to Excel; os.path.basename works with both "/" and "\" separators
    output_name = os.path.basename(file).replace(".csv", "_OUTPUT.xlsx")
    writer = pd.ExcelWriter(OUTPUT_PATH + output_name, engine='xlsxwriter')


清酒与你

2021-01-28 17:11

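You can run the script inside a for loop:

import os

for file in os.listdir(INPUT_PATH):
    if file.endswith('.csv') or file.endswith('.CSV'):
        INPUT_FILE = INPUT_PATH + '/' + file
        # file[:-4] strips the 4-character ".csv" extension
        OUTPUT_FILE = INPUT_PATH + '/Outputs/' + file[:-4] + '.xlsx'
        # run the rest of the script here, using INPUT_FILE and OUTPUT_FILE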
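
The extension check can also be done case-insensitively with os.path.splitext(), which avoids slicing a fixed number of characters. This is only a sketch; the function name `xlsx_name_for` and the sample filenames are made up for illustration.

```python
import os

def xlsx_name_for(filename):
    """Return the .xlsx name for a CSV filename, or None for any other file."""
    stem, ext = os.path.splitext(filename)
    # Compare in lower case so '.csv', '.CSV' and '.Csv' all match
    if ext.lower() != ".csv":
        return None
    return stem + ".xlsx"

for name in ["jan.csv", "FEB.CSV", "notes.txt"]:
    print(name, "->", xlsx_name_for(name))
```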


悲&欢浪女

2021-01-28 17:15

Something like:

import os
import glob
import pandas as pd

os.chdir(r'path\to\folder')    # make the folder the working directory
filelist = glob.glob('*.csv')  # list all CSV files in that folder
for file in filelist:          # loop through the files
    df = pd.read_csv(file)     # pass any extra read_csv options here
    # Do something and create a final_df
    final_df = df              # placeholder: replace with your processed DataFrame
    final_df.to_excel(file[:-4] + '_output.xlsx', index=False)  # same name + "_output"
