Read only the n-th column of a headerless text file with R and sqldf

Question

I have a problem similar to this question: selecting every Nth column in using SQLDF or read.csv.sql

I want to read some columns of large files (tables of 150 rows and more than 500,000 columns, space-separated, filled with numeric data), and I only have a 32-bit system available. The files have no header, so the code in the thread above didn't work, and I decided to write a new post.
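A rough estimate shows why reading the whole table at once is a problem here. It assumes every value is stored as an 8-byte double (R's numeric type) and ignores any overhead from parsing, so the real footprint while reading would be larger:

```r
# Rough memory estimate for holding the full table in R.
# Assumption: 8 bytes per numeric (double) value, no parsing overhead.
n_rows <- 150
n_cols <- 500000   # lower bound; the question says "> 500,000"
bytes <- n_rows * n_cols * 8
bytes / 2^20       # about 572 MiB for the data alone
```

By the same arithmetic, keeping only four columns needs 150 * 4 * 8 = 4,800 bytes, which is why reading just the wanted columns matters so much on a 32-bit system.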

Do you have an idea to solve this problem?

I thought about something like the following, but any solution using fread or read.table is also fine:

library(sqldf)
MyConnection <- file("path/file.txt")
df <- sqldf("select V1, V100, V1000, V235612 from MyConnection",  # headerless columns are named V1, V2, ...
            file.format = list(header = FALSE, sep = " "))
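Since fread and read.table answers are also welcome, here is a minimal base-R sketch of the read.table route. It is a swapped-in technique, not the sqldf approach from the question: read.table skips any column whose colClasses entry is "NULL", so only the requested columns are ever stored. The helper name `read_cols` is hypothetical. This route also sidesteps a limit the sqldf idea runs into: SQLite, which sqldf uses by default, allows at most 2000 columns per table in default builds, so importing all 500,000 fields as separate columns is unlikely to work.

```r
# Hypothetical helper (not part of sqldf or the original answer): read only the
# columns listed in `cols` from a headerless, space-separated numeric file.
read_cols <- function(path, cols, sep = " ") {
  # Count the fields on the first line to learn how many columns there are
  n_fields <- length(scan(path, what = "", sep = sep, nlines = 1, quiet = TRUE))
  # "NULL" makes read.table skip a column entirely; keep only `cols`
  classes <- rep("NULL", n_fields)
  classes[cols] <- "numeric"
  read.table(path, header = FALSE, sep = sep, colClasses = classes)
}

# Usage with the column numbers from the question ("path/file.txt" is a placeholder):
# df <- read_cols("path/file.txt", c(1, 100, 1000, 235612))
```

With the data.table package, `fread("path/file.txt", header = FALSE, sep = " ", select = c(1, 100, 1000, 235612))` should give the same columns without building the colClasses vector by hand.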

Answer 1:

If the columns are fixed width, you can use substr to pick out each one by its start position and length (in SQLite, which sqldf runs the query in, the third argument of substr is a length, not an end position):

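library(sqldf)

# Small headerless, fixed-width sample file
x <- tempfile()
cat("12345", "67890", "09876", "54321", sep = "\n", file = x)

myfile <- file(x)

# Each line is read as the single field V1; substr(V1, start, length) slices it
sqldf("select substr(V1, 1, 1) var1, substr(V1, 3, 5) var2 from myfile")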
#   var1 var2
# 1    1  345
# 2    6  890
# 3    9   76
# 4    5  321

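Note that the third row lost its leading zero ("09876" gives var1 = 9 and var2 = 76), apparently because the field was stored as the number 9876 before substr was applied. See this blog post for more examples. If you know each column's starting position and width, the "select" statement can easily be constructed with paste.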
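A minimal sketch of building that "select" statement with paste, assuming the whole line is read into the single field V1 as in the example above. The helper name `build_substr_query` and its arguments are hypothetical; the widths are passed as the length argument of SQLite's substr.

```r
# Hypothetical helper: build an sqldf query that slices fixed-width fields
# out of V1 using SQLite's substr(X, start, length).
build_substr_query <- function(starts, widths,
                               col_names = paste0("var", seq_along(starts)),
                               table = "myfile") {
  stopifnot(length(widths) == length(starts), length(col_names) == length(starts))
  fields <- paste0("substr(V1, ", starts, ", ", widths, ") ", col_names)
  paste0("select ", paste(fields, collapse = ", "), " from ", table)
}

q <- build_substr_query(c(1, 3), c(1, 3))
# q is "select substr(V1, 1, 1) var1, substr(V1, 3, 3) var2 from myfile"
```

With `myfile` defined as in the answer, `sqldf(q)` should return the same two columns, since a width of 3 starting at position 3 covers the rest of each 5-character line.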

Source: https://stackoverflow.com/questions/19706927/read-only-n-th-column-of-a-text-file-which-has-no-header-with-r-and-sqldf

Tags: sql, bigdata, read.table, sqldf
