Question
This is an example of my data in MySQL. I use the flaskext.mysql library and Python 3.
RT NK NB SU SK P TNI IK IB TARGET
84876 902 1192 2098 3623 169 39 133 1063 94095
79194 902 1050 2109 3606 153 39 133 806 87992
75836 902 1060 1905 3166 161 39 133 785 83987
75571 902 112 1878 3190 158 39 133 635 82618
83797 1156 134 1900 3518 218 39 133 709 91604
91648 1291 127 2225 3596 249 39 133 659 99967
The min-max formula is:
(data-min)/(max-min)*0.8+0.1
I found this code, which normalizes data read from a CSV file:
import pandas as pd
df = pd.read_csv("dataset.csv")
norm = (df - df.min()) / (df.max() - df.min() )*0.8 + 0.1
I know how to compute it by hand, like this:
(first data of RT - min column RT data) / (max column RT- min column RT) * 0.8 + 0.1
The same goes for the next column:
(first data of NK - min column NK data) / (max column NK- min column NK) * 0.8 + 0.1
Please help me: how do I normalize the data in a database table called "dataset" and insert the result into another table called "normalize"?
Answer 1:
Here is a SQL query that should get you started (assuming you want to calculate it per column):
create table normalize as
select
(RT - min(RT)over()) / (max(RT)over() - min(RT)over()) * 0.8 + 0.1 as RT_norm
from test;
I tested this query in sqlite3, not MySQL. It isn't necessarily optimal, but it follows the formula intuitively. Note that the over() clause turns the min / max aggregate functions into window functions: they still look at the whole column, but the result is repeated on each row instead of collapsing the rows into one.
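To make the window-function behaviour concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made for illustration; in MySQL, window functions are only available from version 8.0). The three RT values are taken from the question's sample data.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("create table test (RT integer)")
conn.executemany(
    "insert into test values (?)",
    [(84876,), (79194,), (75836,)],
)

# min()/max() with over () see the whole column, but every row is kept,
# so the column-wide min and max are repeated next to each RT value.
rows = conn.execute(
    "select RT, min(RT) over () as rt_min, max(RT) over () as rt_max from test"
).fetchall()

for row in rows:
    print(row)
```

Without over(), the same min / max calls would collapse the table to a single row, and the per-row subtraction in the formula would not be possible in one pass.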
Todo
You would still need to:
- send the MySQL query via Python
- repeat the same code for each column
- give each column a name
- assign the resulting table to a schema (most likely)
- handle division by zero in case a column's max and min are equal
Source: https://stackoverflow.com/questions/55084336/how-normalize-data-mining-min-max-from-mysql-in-python
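The steps in the list above could be sketched roughly as follows. This is only an illustration under some assumptions: build_normalize_sql is a hypothetical helper, not part of any library; the source table is assumed to be called dataset, as in the question; and the sketch is run against an in-memory SQLite database rather than MySQL. It uses nullif so that a constant column yields NULL instead of a division-by-zero error, and multiplies by 1.0 because SQLite would otherwise perform integer division.

```python
import sqlite3

# Column names from the question's sample data.
COLUMNS = ["RT", "NK", "NB", "SU", "SK", "P", "TNI", "IK", "IB", "TARGET"]


def build_normalize_sql(columns, source="dataset", target="normalize"):
    """Hypothetical helper: build one CREATE TABLE ... AS SELECT statement
    that applies (x - min) / (max - min) * 0.8 + 0.1 to every column.

    Table and column names are interpolated directly into the SQL text,
    so they must come from trusted code, never from user input.
    """
    parts = []
    for col in columns:
        lo = f"min({col}) over ()"
        hi = f"max({col}) over ()"
        # nullif(..., 0) turns a zero range into NULL, so a constant
        # column produces NULL instead of a division by zero.
        parts.append(
            f"({col} - {lo}) * 1.0 / nullif({hi} - {lo}, 0) * 0.8 + 0.1 "
            f"as {col}_norm"
        )
    return f"create table {target} as select " + ", ".join(parts) + f" from {source}"


# Demo against SQLite with the six rows from the question.
DATA = [
    (84876, 902, 1192, 2098, 3623, 169, 39, 133, 1063, 94095),
    (79194, 902, 1050, 2109, 3606, 153, 39, 133, 806, 87992),
    (75836, 902, 1060, 1905, 3166, 161, 39, 133, 785, 83987),
    (75571, 902, 112, 1878, 3190, 158, 39, 133, 635, 82618),
    (83797, 1156, 134, 1900, 3518, 218, 39, 133, 709, 91604),
    (91648, 1291, 127, 2225, 3596, 249, 39, 133, 659, 99967),
]

conn = sqlite3.connect(":memory:")
conn.execute(f"create table dataset ({', '.join(c + ' integer' for c in COLUMNS)})")
conn.executemany(
    f"insert into dataset values ({', '.join('?' for _ in COLUMNS)})", DATA
)

conn.execute(build_normalize_sql(COLUMNS))
result = conn.execute("select RT_norm, TNI_norm from normalize").fetchall()
for row in result:
    print(row)
```

With a MySQL connection, the string returned by build_normalize_sql could in principle be passed to a cursor's execute() in the same way, but that has not been tested here. In this sample, TNI and IK are constant, so their normalized columns come out as NULL; whether NULL, 0.1, or some other value is the right choice for such columns depends on how the normalized table is used afterwards.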