save IP address in mongoDB

。_饼干妹妹 提交于 2019-12-28 16:51:30

问题


Currently in order to save an IP address I am converting it to number and store it in the collection. Basically I am doing this for logging purposes. This means that I care to store information as fast as possible and with smallest amount of space.

I will be rarely using it for querying.

My ideas that

  • Storing as strings is for sure inefficient.
  • Storing as 4 digits will be slower and will take more space.

Nonetheless I think that this is an adequate method, but is there a better one for my purpose?


回答1:


Definitely save IP addresses as numbers, if you don't mind the extra bit of work that it takes, especially if you need to do queries on the addresses and you have large tables/collections.

Here's why:

Storage

  • An IPv4 address is 4 bytes if stored as unsigned integer.
  • An IPv4 address varies between 10 bytes and 18 bytes when written out as a string in dotted octed form. (Let's assume the average is 14 bytes.)

That is 7-15 bytes for the characters, plus 2-3 bytes if you're using a variable length string type, which varies based on the database you're using. If you have a fixed length string representation available, then you must use a 15-character fixed width field.

Disk storage is cheap, so that's not a factor in most use cases. Memory, however, is not as cheap, and if you have a large table/collection and you want to do fast queries, then you need an index. The 2-3x storage penalty of string encoding drastically reduces the amount of records you can index while still keeping the index resident in memory.

  • An IPv6 address is 16 bytes if stored as an unsigned integer. (Likely as multiple 4 or 8 byte integers, depending on your platform.)
  • An IPv6 address ranges from 6 bytes to 42 bytes when encoded as a string in abbreviated hex notation.

On the low end, a loop back address (::1) is 3 bytes plus the variable length string overhead. On the high end, an address like 2002:4559:1FE2:1FE2:4559:1FE2:4559:1FE2 uses 39 bytes plus the variable length string overhead.

Unlike with IPv4, it's not safe to assume the average IPv6 string length will be mean of 6 and 42, because the number of addresses with a significant number of consecutive zeroes is a very small fraction of the overall IPv6 address space. Only some special addresses, like loopback and autoconf addresses, are likely to be compressible in this way.

Again, this is a storage penalty of >2x for string encoding versus integer encoding.

Network Math

Do you think routers store IP addresses as strings? Of course they don't.

If you need to do network math on IP addresses, the string representation is a hassle. E.g. if you want to write a query that searches for all addresses on a specific subnet ("return all records with an IP address in 10.7.200.104/27", you can easily do this by masking an integer address with an integer subnet mask. (Mongo doesn't support this particular query, but most RDBMS do.) If you store addresses as strings, then your query will need to convert each row to an integer, then mask it, which is several orders of magnitude slower. (Bitwise masking for an IPv4 address can be done in a few CPU cycles using 2 registers. Converting a string to an integer requires looping over the string.)

Similarly, range queries ("return all records all records between 192.168.1.50 and 192.168.50.100") with integer addresses will be able to use indexes, whereas range queries on string addresses will not.

The Bottom Line

It takes a little bit more work, but not much (there are a million aton() and ntoa() functions out there), but if you're building something serious and solid and you want to future-proof it against future requirements and the possibility of a large dataset, you should store IP addresses as integers, not strings.

If you're doing something quick and dirty and don't mind the possibility of remodeling in the future, then use strings.

For the OP's purpose, if you are optimizing for speed and space and you don't think you want to query it often, then why use a database at all? Just print IP addresses to a file. That would be faster and more storage efficient than storing it in a database (with associated API and storage overhead).




回答2:


An efficient way to save a ip address as a int. If you want to tag a ip with cidr filter, a demo here:

> db.getCollection('iptag').insert({tags: ['office'], hostmin: 2886991873, hostmax: 2887057406, cidr: '172.20.0.0/16'})
> db.getCollection('iptag').insert({tags: ['server'], hostmin: 173867009, hostmax: 173932542, cidr: '10.93.0.0/16'})
> db.getCollection('iptag').insert({tags: ['server'], hostmin: 173932545, hostmax: 173998078, cidr: '10.94.0.0/16'})

Create tags index.

> db.getCollection('iptag').ensureIndex(tags: 1)

Filter ip with cidr range. ip2int('10.94.25.32') == 173938976.

> db.getCollection('iptag').find({hostmin: {$lte: 173938976}, hostmax: {$gte: 173938976}})



回答3:


An IPv4 is four bytes, so you can store it into a 32-bit integer (BSON type 16).

See http://docs.mongodb.org/manual/reference/bson-types




回答4:


Simplest way for IPv4 is to convert to int using the interesting Maths provided here.

I use the following function (js) to convert before matching with db

ipv4Number: function (ip) {
    iparray = ip.split(".");
    ipnumber = parseInt(iparray[3]) +
        parseInt(iparray[2]) * 256 +
        parseInt(iparray[1]) * Math.pow(256, 2) +
        parseInt(iparray[0]) * Math.pow(256, 3);
    if (parseInt(ipnumber) > 0)return ipnumber;
    return 0;
}


来源:https://stackoverflow.com/questions/15591234/save-ip-address-in-mongodb

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!