I have lot\'s of data to insert (SET \\ INCR) to redis DB, so I\'m looking for pipeline \\ mass insertion through node.js.
I couldn\'t find any good exa
You might want to look at batch()
too. The reason why it'd be slower with multi()
is because it's transactional. If something failed, nothing would be executed. That may be what you want, but you do have a choice for speed here.
The redis-stream package doesn't seem to make use of Redis' mass insert functionality so it's also slower than the mass insert Redis' site goes on to talk about with redis-cli
.
Another idea would be to use redis-cli and give it a file to stream from, which this NPM package does: https://github.com/almeida/redis-mass
Not keen on writing to a file on disk first? This repo: https://github.com/eugeneiiim/node-redis-pipe/blob/master/example.js
...also streams to Redis, but without writing to file. It streams to a spawned process and flushes the buffer every so often.
On Redis' site under mass insert (http://redis.io/topics/mass-insert) you can see a little Ruby example. The repo above basically ported that to Node.js and then streamed it directly to that redis-cli
process that was spawned.
So in Node.js, we have:
var redisPipe = spawn('redis-cli', ['--pipe']);
spawn()
returns a reference to a child process that you can pipe to with stdin
. For example: redisPipe.stdin.write()
.
You can just keep writing to a buffer, streaming that to the child process, and then clearing it every so often. This then won't fill it up and will therefore be a bit better on memory than perhaps the node_redis
package (that literally says in its docs that data is held in memory) though I haven't looked into it that deeply so I don't know what the memory footprint ends up being. It could be doing the same thing.
Of course keep in mind that if something goes wrong, it all fails. That's what tools like fluentd were created for (and that's yet another option: http://www.fluentd.org/plugins/all - it has several Redis plugins)...But again, it means you're backing data on disk somewhere to some degree. I've personally used Embulk to do this too (which required a file on disk), but it did not support mass inserts, so it was slow. It took nearly 2 hours for 30,000 records.
One benefit to a streaming approach (not backed by disk) is if you're doing a huge insert from another data source. Assuming that data source returns a lot of data and your server doesn't have the hard disk space to support all of it - you can stream it instead. Again, you risk failures.
I find myself in this position as I'm building a Docker image that will run on a server with not enough disk space to accommodate large data sets. Of course it's a lot easier if you can fit everything on the server's hard disk...But if you can't, streaming to redis-cli
may be your only option.
If you are really pushing a lot of data around on a regular basis, I would probably recommend fluentd to be honest. It comes with many great features for ensuring your data makes it to where it's going and if something fails, it can resume.
One problem with all of these Node.js approaches is that if something fails, you either lose it all or have to insert it all over again.