Question
I'm trying to create a table from a comma-separated CSV file. I know that not all rows have the same number of elements, so I would write some code to eliminate those rows. The problem is that some rows contain numbers in the thousands that include another comma, and I can't split those rows properly. Here's my code:
pURL <- "http://financials.morningstar.com/ajax/exportKR2CSV.html?&callback=?&t=EI&region=FRA&order=asc"
res <- read.table(pURL, header=T, sep='\t', dec = '.', stringsAsFactors=F)
x <- unlist( lapply(keyRatios, function(u) strsplit(u,split='\n')) [[1]] )
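For the plan described in the question (dropping rows with the wrong number of elements), splitting each line on every comma miscounts any row that contains a quoted number such as "4,190". Base R's count.fields() counts fields while respecting quotes. A minimal sketch, assuming the file has already been read into a character vector; the sample lines are hypothetical and only imitate the layout of the export, reusing values from the output in the answer:

```r
# Hypothetical lines imitating the export: a title line, a header row of
# dates, a row with quoted thousands, a plain numeric row, a section title
lines <- c(
  "Growth Profitability and Financial Ratios for Example SA",
  ",2011-12,2012-12,TTM",
  "Revenue EUR Mil,\"4,190\",\"4,989\",\"5,034\"",
  "Gross Margin %,55.4,55.8,56.1",
  "Key Ratios -> Profitability"
)

# Naive split: the comma inside "4,190" is treated as a separator
naive <- lengths(strsplit(lines, split = ","))

# Quote-aware count: text between double quotes stays one field
con <- textConnection(lines)
quoted <- count.fields(con, sep = ",", quote = "\"", blank.lines.skip = FALSE)
close(con)

# Keep only the rows that have the most common number of fields
target <- as.integer(names(which.max(table(quoted))))
kept <- lines[quoted == target]
```

Here naive is 1 4 7 4 1 while quoted is 1 4 4 4 1, so only the header and the two data rows survive the filter.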
Answer 1:
You need to make use of the quote = argument of either read.table or read.delim...
res <- read.delim( pURL, header=F, sep=',', dec = '.', stringsAsFactors=F , quote = "\"" , fill = TRUE , skip = 2 )
The separator is "," not "\t". Numbers in the thousands (here, thousands of millions of EUR) are always quoted in this file, so quote = "\"" makes R ignore the commas inside the quotes. You also want to skip the first two lines, and use fill = TRUE to pad the uneven lines with blank fields.
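To check the effect of quote, fill and skip without depending on the Morningstar URL, a few lines can be passed through the text argument of read.table, which read.delim forwards. The excerpt below is hypothetical: two preamble lines, the date header, two data rows and a short section title, built from values that appear in the answer's output.

```r
# Hypothetical excerpt imitating the export: two preamble lines to skip,
# a header row of dates, quoted thousands, and a short section-title row
lines <- c(
  "Growth Profitability and Financial Ratios for Example SA",
  "Financials",
  ",2011-12,2012-12,TTM",
  "Revenue EUR Mil,\"4,190\",\"4,989\",\"5,034\"",
  "Gross Margin %,55.4,55.8,56.1",
  "Key Ratios -> Profitability"
)

res <- read.delim(text = lines, header = FALSE, sep = ",", dec = ".",
                  quote = "\"",  # commas inside "..." are not separators
                  fill = TRUE,   # pad the short title row with blank fields
                  skip = 2,      # drop the two preamble lines
                  stringsAsFactors = FALSE)
print(res)
```

Each quoted number such as "4,190" stays in a single cell, and the one-field title row is padded rather than breaking the read.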
head( res )
# 2003-12 2004-12 2005-12 2006-12 2007-12 2008-12 2009-12 2010-12 2011-12 2012-12 TTM
#2 Revenue EUR Mil 2,116 2,260 2,424 2,690 2,908 3,074 3,268 3,892 4,190 4,989 5,034
#3 Gross Margin % 60.6 60.3 57.3 58.2 57.6 56.9 56.1 55.5 55.4 55.8 56.1
#4 Operating Income EUR Mil 365 404 394 460 505 515 555 618 683 832 841
#5 Operating Margin % 17.2 17.9 16.2 17.1 17.4 16.7 17.0 15.9 16.3 16.7 16.7
#6 Net Income EUR Mil 200 227 289 331 371 389 402 472 518 584 594
#7 Earnings Per Share EUR 3.90 4.30 5.44 6.22 3.48 3.62 3.78 4.36 4.82 2.77 2.80
I set the column names of res afterwards like this...
names( res ) <- res[1,]; res <- res[-1,]
This gives cleaner output, with the dates as the column names.
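The header promotion can be tried on a small data frame. This sketch rebuilds a hypothetical two-row excerpt with read.csv (which uses sep = "," and quote = "\"" by default), promotes the first row to column names, and then, as an optional extra step that is not part of the original answer, strips the thousands separators so the value columns become numeric.

```r
# Hypothetical excerpt: header row of dates plus two data rows
lines <- c(
  ",2011-12,2012-12,TTM",
  "Revenue EUR Mil,\"4,190\",\"4,989\",\"5,034\"",
  "Gross Margin %,55.4,55.8,56.1"
)
res <- read.csv(text = lines, header = FALSE, stringsAsFactors = FALSE)

# Promote the first row to column names, then drop that row
names(res) <- as.character(unlist(res[1, ]))
res <- res[-1, ]

# Optional: remove the thousands separators and convert the value columns
num <- res
num[-1] <- lapply(num[-1], function(col) as.numeric(gsub(",", "", col, fixed = TRUE)))
```

Reading with header = TRUE instead would pass the names through make.names(), turning "2011-12" into "X2011.12", which is one reason to promote the row by hand.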
Source: https://stackoverflow.com/questions/19400282/read-table-with-comma-separated-values-and-also-commas-inside-each-element