Question
I have some data in Notepad that is a mess. There is basically no space between the columns, which hold different data. I do know the character positions of each field. For example, columns 1-2 are X, columns 7-10 are Y, and so on.
How can I organize this? Can it be done in R? What is the best way to do this?
Answer 1:
read.fwf (see ?read.fwf) may be a good bet here, since it reads fixed-width files.
Set the path to the file:
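# Use forward slashes (or doubled backslashes): a single backslash starts an escape sequence in an R string.
temp <- "/pathto/file.txt"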
Then set the widths of the variables within the file, as demonstrated below.
#1-2 = x, 3-10=y
widths <- c(2,8)
Then set the names of the columns.
cols <- c("X","Y")
Finally, import the data into a new variable in your session:
dataset <- read.fwf(temp, widths = widths, header = FALSE, col.names = cols)
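The question mentions a gap between the fields (columns 1-2 are X, columns 7-10 are Y). A minimal sketch of how that could be handled, assuming columns 3-6 are simply unwanted: read.fwf treats a negative width as "skip this many characters". The sample lines and the temporary file below are made up for illustration.

```r
# Write a small fixed-width sample to a temporary file (made-up data).
# Assumed layout: cols 1-2 = X, cols 3-6 = unused, cols 7-10 = Y.
tmp <- tempfile(fileext = ".txt")
writeLines(c("12abcd3456",
             "98wxyz7654"), tmp)

# A negative width tells read.fwf to skip that many characters.
widths <- c(2, -4, 4)
cols <- c("X", "Y")

dataset <- read.fwf(tmp, widths = widths, header = FALSE, col.names = cols)
print(dataset)
```

Only the kept fields need a name in col.names, so skipped ranges do not appear in the result.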
Answer 2:
Something I've done in the past to handle that kind of mess is to import it into Excel as fixed-width text, then save it as a CSV.
Just a suggestion. If it's a one-off project then that should be fine, with no coding at all. But if it's a repeat offender, you might look at regular expressions.
For example, ^(.{6})(.{7})(.{2})(.{5})$ matches a line of 4 fields that are 6, 7, 2 and 5 characters wide, in that order.
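As a rough sketch of how that pattern could be applied in R, the capture groups can be pulled out with regexec and regmatches. The input lines and the column names f1 to f4 are invented for illustration, and every line is assumed to be exactly 20 characters wide.

```r
# Made-up lines, each exactly 6 + 7 + 2 + 5 = 20 characters wide.
lines <- c("AAAAAABBBBBBBCCDDDDD",
           "123456abcdefgXY98765")

pattern <- "^(.{6})(.{7})(.{2})(.{5})$"

# regexec returns the full match followed by one entry per capture group.
matches <- regmatches(lines, regexec(pattern, lines))

# Drop the full match (first element) and bind the groups into a data frame.
fields <- do.call(rbind, lapply(matches, function(m) m[-1]))
parsed <- as.data.frame(fields, stringsAsFactors = FALSE)
names(parsed) <- c("f1", "f2", "f3", "f4")  # hypothetical column names
print(parsed)
```

All fields come back as character strings, so numeric columns would still need converting, for example with as.numeric.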
Source: https://stackoverflow.com/questions/11571148/organizing-messy-notepad-data