Question
I have the following code:
raw_test <- fread("avito_test.tsv", nrows = intNrows, skip = intSkip)
It produces the following error:
Error in fread("avito_test.tsv", nrows = intNrows, skip = intSkip, autostart = (intSkip + :
Expected sep (',') but new line, EOF (or other non printing character) ends field 14 on line 1003 when detecting types: 10066652 ТранÑпорт Ðвтомобили Ñ Ð¿Ñ€Ð¾Ð±ÐµÐ³Ð¾Ð¼ Nissan R Nessa, 1998 Ð¢Ð°Ñ€Ð°Ð½Ñ‚Ð°Ñ Ð² отличном ÑоÑтоÑнии. на прошлой неделе возили на тех. ОбÑлуживание. Ð’ дорожных неприÑтноÑÑ‚ÑÑ… не был учаÑтником. Детали кузова без коцок и терок. ПредназначалаÑÑŒ Ð´Ð»Ñ Ð¿Ð¾ÐµÐ·Ð´Ð¾Ðº на природу, Отдам только в добрые руки. Ð’ Ñалон не поÑтавлю не звоните "{""Марка"":""Nissan"", ""Модель"":""R Nessa"", ""Год выпуÑка"":""1998"", ""Пробег"":""180 000 - 189 999"", ""Тип кузова"":""МинивÑн"", ""Цвет"":""Оранжевый"", ""Объём двигателÑ"":""2.4"", ""Коробка передач"":""МеханичеÑкаÑ
I have tried changing it to this:
raw_test <- fread("avito_test.tsv", nrows = intNrows, skip = intSkip, autostart = (intSkip + 2))
This is based on what I read in a similar question, "skip and autostart in fread".
However, it produces an error similar to the one above.
How can I skip the first 1000 rows and read the next 1000? My expected output is 1000 rows in total: the second thousand rows of my TSV file, with the first thousand skipped.
Note: Reading the file with raw_test <- fread("avito_test.tsv", nrows = 1000, skip = -1)
works well for getting me only the first thousand, but I am trying to get only the second thousand.
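To make the expected output concrete, here is a minimal sketch of the behavior I am after, written with base R's read.delim rather than fread, on a small synthetic tab-separated file. The file, its two columns, and the names chunk_size and n_skip are made up for illustration and stand in for the real data, intNrows, and intSkip. Note that skip counts physical lines, so I am assuming no field contains an embedded line break; that assumption may not hold for the real file.

```r
# Synthetic stand-in for the real file: a header plus 30 tab-separated rows.
# (Hypothetical data; the real avito_test.tsv has different columns.)
tmp <- tempfile(fileext = ".tsv")
demo <- data.frame(id = 1:30, title = paste0("item", 1:30))
write.table(demo, tmp, sep = "\t", row.names = FALSE, quote = FALSE)

chunk_size <- 10  # plays the role of intNrows
n_skip <- 10      # plays the role of intSkip (data rows to skip)

# Read the header line on its own so the column names survive the skip.
col_names <- names(read.delim(tmp, nrows = 1))

# Skip the header line plus n_skip data rows, then read the next chunk.
second_chunk <- read.delim(tmp, header = FALSE, skip = n_skip + 1,
                           nrows = chunk_size, col.names = col_names,
                           stringsAsFactors = FALSE)

second_chunk$id  # 11 through 20
```

With the real file, the equivalent result would be rows 1001 to 2000 with the original column names.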
Edit: The data is publicly available at http://www.kaggle.com/c/avito-prohibited-content/data
Edit: Environment and package info:
> packageVersion("data.table")
[1] ‘1.9.3’
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Source: https://stackoverflow.com/questions/24759346/fread-skip-and-autostart-issue