I have a lot of .csv files with the following format.
338,800
338,550
339,670
340,600
327,500
301,430
299,350
284,339
284,338
283,335
283,330
283,310
282,310
282,300
282,300
283,290
For column 1, I want to read each row and compare it with the value in the previous row. If the current value is greater than or equal to the previous one, I keep comparing. If the current value is smaller than the previous one, I divide the current value by the previous value and proceed. For example, in the table above, the first smaller value in column 1 is 327 (because 327 is smaller than the previous value, 340), so we divide 327 by 340 and get 0.96. My Python script should exit right after it prints the category (A) as given below.
from __future__ import division
import csv

def category(val):
    if 0.8 < val <= 0.9:
        return "A"
    if abs(val - 0.7) < 1e-10:
        return "B"
    if 0.5 < val < 0.7:
        return "C"
    if abs(val - 0.5) < 1e-10:
        return "E"
    return "D"

with open("test.csv", "r") as csvfile:
    ff = csv.reader(csvfile)
    results = []
    previous_value = 0
    for col1, col2 in ff:
        if not col1.isdigit():
            continue
        value = int(col1)
        if value >= previous_value:
            previous_value = value
            continue
        else:
            result = int(col1) / int(previous_value)
            results.append(result)
            print(category(result))
            previous_value = value

    print(results)
    print(sum(results))
    print(category(sum(results) / len(results)))
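The rule described in the question (stop at the first drop in column 1 and report current divided by previous) can be sketched as a small function. This is only an illustration under assumptions: the name first_drop_ratio is hypothetical, the sample rows are the first six rows of the table above, and Python 3 division is assumed.

```python
import csv
import io

def first_drop_ratio(rows):
    """Return current/previous for the first row whose column-1 value is
    smaller than the row before it, or None if the column never drops."""
    previous = None
    for row in rows:
        if not row or not row[0].strip().isdigit():
            continue  # skip blank lines and header rows
        value = int(row[0])
        if previous is not None and value < previous:
            return value / previous  # stop at the first drop
        previous = value
    return None

# First six rows of the example table, read from memory instead of a file.
sample = io.StringIO("338,800\n338,550\n339,670\n340,600\n327,500\n301,430\n")
ratio = first_drop_ratio(csv.reader(sample))
print(round(ratio, 2))  # 327 / 340 -> 0.96
```

Returning from the function is what makes the scan stop at the first drop; the loop in the script above keeps going and collects every drop instead.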
Finally, I want to run my script for all the .csv files I have in the current directory and build a confusion matrix like the one described below. Let's say A1.csv, A2.csv and A3.csv are supposed (or predicted) to print A; B1.csv, B2.csv and B3.csv are supposed (or predicted) to print B; C1.csv, C2.csv and C3.csv are supposed (or predicted) to print C, and so on. How can we automatically create a confusion matrix from multiple .csv files using Python?
The colored blocks of the matrix (row labels) show the counts of A (count of true values for A), B (count of true values for B), C (count of true values for C), and so on, as produced by the control logic of our function category() given above. The column labels are the categories from the if-else logic (A, B, C, D and E).
Add a function get_predict(filename):
def get_predict(filename):
    if 'Alex' in filename:
        return 'Alexander'
    else:
        return filename[0]
Reading n files, compute the confusion matrix using pandas crosstab:
import csv
import os
import pandas as pd

def get_category(filepath):
    def category(val):
        print('predict({}; abs({})'.format(val, abs(val)))
        if 0.8 < val <= 0.9:
            return "A"
        if abs(val - 0.7) < 1e-10:
            return "B"
        if 0.5 < val < 0.7:
            return "C"
        if abs(val - 0.5) < 1e-10:
            return "E"
        return "D"

    with open(filepath, "r") as csvfile:
        ff = csv.reader(csvfile)
        results = []
        previous_value = 0
        for col1, col2 in ff:
            value = int(col1)
            if value >= previous_value:
                previous_value = value
            else:
                # Column 1 dropped: record current / previous
                results.append(value / previous_value)
                previous_value = value
    return category(sum(results) / len(results))

matrix = {'actual': [], 'predict': []}
path = 'test/confusion'
for filename in os.listdir(path):
    # The first char in the filename is the predict key
    matrix['predict'].append(filename[0])
    matrix['actual'].append(get_category(os.path.join(path, filename)))

df = pd.crosstab(pd.Series(matrix['actual'], name='Actual'),
                 pd.Series(matrix['predict'], name='Predicted'))
print(df)
Output: (Reading "A.csv, B.csv, C.csv" with the given example Data three times)
Predicted  A  B  C
Actual
A          3  0  0
B          0  3  0
C          0  0  3
Tested with Python:3.4.2 - pandas:0.19.2
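pandas crosstab only lists the labels that actually occur, so a run that never produces D or E gives a smaller table than the five categories from category(). A minimal sketch of one way to force all five rows and columns, assuming a reasonably recent pandas; the two lists are made-up stand-ins for the results collected from the files, and LABELS is a hypothetical name:

```python
import pandas as pd

LABELS = ["A", "B", "C", "D", "E"]  # every label category() can return

# Made-up results standing in for the two lists collected from the files.
actual = ["A", "A", "D", "B", "C", "C"]
predicted = ["A", "A", "A", "B", "C", "B"]

table = pd.crosstab(pd.Series(actual, name="Actual"),
                    pd.Series(predicted, name="Predicted"))
# reindex adds the rows/columns that never occurred and fills them with 0
table = table.reindex(index=LABELS, columns=LABELS, fill_value=0)
print(table)
```

With a fixed label order, tables from different runs line up cell by cell and can be compared or summed directly.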
Using Scikit-Learn is the best option in your case, as it provides a confusion_matrix function. Here is an approach you can easily extend.
from sklearn.metrics import confusion_matrix

# Read your csv files (this assumes each line holds a single integer label)
with open('A1.csv', 'r') as readFile:
    true_values = [int(ff) for ff in readFile]
with open('B1.csv', 'r') as readFile:
    predictions = [int(ff) for ff in readFile]

# Produce the confusion matrix
confusionMatrix = confusion_matrix(true_values, predictions)
print(confusionMatrix)
This is the output you would expect.
[[0 2]
[0 2]]
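confusion_matrix also accepts string labels, which is closer to the letter categories in the question. A hedged sketch, assuming one expected letter (from the file name) and one computed letter (from category()) per file; the two lists below are invented for illustration, and the labels argument fixes the row and column order to A through E:

```python
from sklearn.metrics import confusion_matrix

LABELS = ["A", "B", "C", "D", "E"]

# Invented per-file results: expected letter from the file name,
# computed letter from category().
expected = ["A", "A", "A", "B", "B", "C"]
computed = ["A", "A", "D", "B", "C", "C"]

# Rows follow the first argument, columns the second, both in LABELS order.
cm = confusion_matrix(expected, computed, labels=LABELS)
print(cm)
```

Here cm[0, 3] counts the files expected to be A that came out as D.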
Source: https://stackoverflow.com/questions/44215561/creating-confusion-matrix-from-multiple-csv-files