I've already looked at 'Pass R variable to RODBC's sqlQuery with multiple entries?' but can't seem to get it to work. I'm trying to run sqlQuery() from R against a SQL Server 2008 R2 db. I want to pull a sample from a large db based on row numbers. First I created a vector of random numbers:
sampRowNum <- sample(seq(1,100000,1), 5000)
Then I try to use those numbers in a query using:
query1 <- sqlQuery(channel, paste("select *
FROM db where row_id in (", sampRowNum,")", sep=""))
I get just the results from the db where the row_id is equal to the first number in sampRowNum. Any suggestions?
You're not pasting your query together correctly.
If you run the paste statement in isolation, you'll see that you get a character vector of length 5000 (one query string per row number), so sqlQuery is only executing the first one of those, corresponding to the first element in sampRowNum.
What you want to do is something more like this:
paste("select * FROM db where row_id in (",
paste(sampRowNum,collapse = ","),")", sep="")
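To make the difference concrete, here is a small sketch that compares the two paste calls. The three-element vector ids is only a stand-in for sampRowNum, and nothing is sent to a database:

```r
ids <- c(3, 7, 42)  # small stand-in for sampRowNum (assumption for illustration)

# Without collapse: paste() is vectorised, so it builds one query string per id
broken <- paste("select * FROM db where row_id in (", ids, ")", sep = "")
length(broken)

# With collapse: the ids are joined into one string first, giving a single query
fixed <- paste("select * FROM db where row_id in (",
               paste(ids, collapse = ","), ")", sep = "")
length(fixed)
fixed
```

The first call returns three strings and the second returns one, which is why only the collapsed version behaves as a single IN query.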
Just as an added note (and since I've had to do stuff like this a lot...), constructing SQL queries with an IN clause with strings is a bit more of a nuisance, since you have to tack on all the single quotes:
vec <- letters[1:5]
paste("SELECT * FROM db WHERE col IN ('",
paste(vec,collapse = "','"),"')",sep = "")
[1] "SELECT * FROM db WHERE col IN ('a','b','c','d','e')"
If you do this a lot, you'll end up writing a little function that does that pasting of character vectors for you.
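One possible shape for such a helper is sketched below. The name sql_in and its behaviour are hypothetical, not part of RODBC. It assumes numeric values should be written without scientific notation and that embedded single quotes in strings should be doubled, which is the usual SQL escaping but not a complete defence against injection:

```r
# Hypothetical helper: turn an R vector into the parenthesised list for an IN clause
sql_in <- function(x) {
  stopifnot(length(x) > 0, !anyNA(x))
  if (is.numeric(x)) {
    # format() keeps 100000 as "100000" instead of "1e+05"
    items <- format(x, scientific = FALSE, trim = TRUE)
  } else {
    # double any embedded single quotes, then wrap each value in single quotes
    items <- paste0("'", gsub("'", "''", as.character(x), fixed = TRUE), "'")
  }
  paste0("(", paste(items, collapse = ","), ")")
}

paste0("SELECT * FROM db WHERE col IN ", sql_in(letters[1:5]))
paste0("select * FROM db where row_id in ", sql_in(c(3, 7, 42)))
```

The resulting string can then be passed to sqlQuery(channel, ...) in the same way as the hand-pasted versions above.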
As always, this kind of SQL string manipulation is Not Good if you are dealing with user inputs (e.g. in a web app), due to SQL injection attacks. In my particular situation this isn't much of a concern, but in general people will prefer parametrized queries if you don't have much control over the input values.
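For comparison, a parametrized version might look roughly like the sketch below. As far as I know, sqlQuery only accepts a finished query string, so this swaps in the DBI interface instead of RODBC. It assumes the DBI and RSQLite packages are installed, and it uses an in-memory SQLite table as a stand-in for the real db. Only the placeholder string is built by pasting, and the values are passed separately:

```r
vec <- c("a", "b", "c")

# One "?" placeholder per value; the values themselves never enter the SQL text
placeholders <- paste(rep("?", length(vec)), collapse = ",")
sql <- paste0("SELECT * FROM db WHERE col IN (", placeholders, ")")
sql

# Assumption: DBI and RSQLite are available; the block is skipped otherwise
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  # illustrative table standing in for the real db
  DBI::dbWriteTable(con, "db",
                    data.frame(col = letters[1:5], stringsAsFactors = FALSE))
  # values are bound to the placeholders by the driver, not pasted in
  res <- DBI::dbGetQuery(con, sql, params = as.list(vec))
  DBI::dbDisconnect(con)
  res
}
```

The number of placeholders still has to match the number of values, so the query text changes with the length of vec, but the values are never interpreted as SQL.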
Source: https://stackoverflow.com/questions/8639221/r-rodbc-putting-list-of-numbers-into-an-in-statement