问题
I'm dealing with a csv table converted from a pdf with Adobe acrobat pro. For some reason the software creates recurrent error each 117 rows. It "duplicates & concatenates the numbers" e.g. a row
7307 1 87.1
is transformed into something of this sort:
73077307 11 87187.1
how I can "correct" these rows with python? I would need to split the float in the middle and erase the first half.
I have read several threads about truncation but most of them need to split floats by the decimal point or deal just with integers. The data type would be float64 because I'm using the pandas read_csv function to read the csv.
df = pd.read_csv('path/file.csv',sep=';',index_col='Rang', na_values=['NA'])
df.dropna(how="all", inplace=True) # drop empty rows (an additional issue)
df[(df.index >10000)]
EDIT1: Code added, I thought I could identify the wrong ones because I have 1 row per hour of the year. Any line with index bigger than 365*24=8760 is wrong. But I see now that is not sufficient. One could loop the dataframe and if the index of row(i+1)-index of row (i) is greater than one then it needs correction. But I'm a begginer in python.. I'm not sure how to write that, but that's kind of a different problem.
I'm using python version 2.7.8 pandas v. 0.14.1
many thanks!
回答1:
Grab each space-delimited word into a list as a string. For each item in that list, check if len of word is even or odd. If even, replace word with right half of word. If odd (because of '.' on right side) grab right half rounding up (so 5 right-most chars in 9 char word, for example). Add conversion to float64 when replacing each word.
来源:https://stackoverflow.com/questions/28444525/split-float-numbers-in-half-with-python