We've got a healthy debate going on in the office this week. We're creating a DB to store proxy information. For the most part we have the schema worked out, except for how
I would suggest looking at what type of queries you will be running to decide which format you adopt.
Only if you need to pull out or compare individual octets would you have to consider splitting them up into separate fields.
Otherwise, store it as a 4-byte unsigned integer. That also has the bonus of allowing you to use the MySQL built-in INET_ATON() and INET_NTOA() functions.
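To make the integer form concrete, here is a rough Python sketch of the same conversion those functions perform, plus how individual octets can still be recovered from the single integer if you ever need them. The helper names (ipv4_to_int, int_to_ipv4, octets) are made up for illustration and are not MySQL or library APIs. Python's ipaddress module is also stricter than INET_ATON(): it rejects short forms such as '127.1'.

```python
import ipaddress

def ipv4_to_int(dotted: str) -> int:
    """Dotted-quad IPv4 string -> unsigned 32-bit integer (hypothetical helper)."""
    return int(ipaddress.IPv4Address(dotted))

def int_to_ipv4(value: int) -> str:
    """Unsigned 32-bit integer -> dotted-quad IPv4 string (hypothetical helper)."""
    return str(ipaddress.IPv4Address(value))

def octets(value: int) -> tuple:
    """Split the integer into its four octets, most significant first."""
    return tuple((value >> shift) & 0xFF for shift in (24, 16, 8, 0))

print(ipv4_to_int("192.168.0.1"))   # 3232235521
print(int_to_ipv4(3232235521))      # 192.168.0.1
print(octets(3232235521))           # (192, 168, 0, 1)
```

So the single-column format does not lock you out of octet-level work; it just moves it into a shift and a mask.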
Storage:
If you are only going to support IPv4 addresses then your datatype in MySQL can be an INT UNSIGNED, which only uses 4 bytes of storage.
To store the individual octets you would only need TINYINT UNSIGNED columns, not SMALLINTs; each one uses 1 byte of storage.
Both methods would use similar storage, with perhaps slightly more overhead for the separate fields.
Performance:
Using a single field will yield much better performance: it's a single comparison instead of 4. You mentioned that you will only run queries against the whole IP address, so there should be no need to keep the octets separate. Using the INET_* functions of MySQL will do the conversion between the text and integer representations once for the comparison.
For both IPv4 and IPv6 compatibility, use VARBINARY(16). IPv4 addresses always fit in BINARY(4) and IPv6 addresses always fit in BINARY(16), so VARBINARY(16) seems like the most efficient way to support both. To convert from the normal readable format to binary, use INET6_ATON('127.0.0.1'); to reverse that, use INET6_NTOA(binary).
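If the conversion happens in application code instead of in SQL, the same 4-byte or 16-byte value can be produced there. A minimal Python sketch, assuming the result is bound to a VARBINARY(16) column as a parameter; the function names ip_to_binary and binary_to_ip are hypothetical:

```python
import ipaddress

def ip_to_binary(text: str) -> bytes:
    """Readable IPv4/IPv6 string -> packed bytes (4 for IPv4, 16 for IPv6)."""
    return ipaddress.ip_address(text).packed

def binary_to_ip(raw: bytes) -> str:
    """Packed 4-byte or 16-byte value -> readable address string."""
    return str(ipaddress.ip_address(raw))

print(len(ip_to_binary("127.0.0.1")))     # 4
print(len(ip_to_binary("2001:db8::1")))   # 16
print(binary_to_ip(ip_to_binary("2001:db8::1")))  # 2001:db8::1
```

The length of the stored value tells the two address families apart, so no extra "type" column is needed.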
A BIGINT is 8 bytes in MySQL.
To store IPv4 addresses, an INT UNSIGNED is enough, which I think is what you should use.
I can't imagine a scenario where 4 octets would gain more performance than a single INT, and the latter is much more convenient.
Also note that if you are going to issue queries like this:
SELECT *
FROM ips
WHERE ? BETWEEN start_ip AND end_ip
where start_ip and end_ip are columns in your table, the performance will be poor.
These queries are used to find out if a given IP is within a subnet range (usually to ban it).
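For context, here is a rough Python sketch of where the start_ip and end_ip values for a subnet could come from, and of the same containment test done in application code. It only illustrates the integer-range idea, not the indexing problem; the function names subnet_bounds and ip_in_range are hypothetical.

```python
import ipaddress

def subnet_bounds(cidr: str) -> tuple:
    """CIDR block -> (start_ip, end_ip) as unsigned 32-bit integers."""
    net = ipaddress.IPv4Network(cidr, strict=True)
    return int(net.network_address), int(net.broadcast_address)

def ip_in_range(ip: str, start_ip: int, end_ip: int) -> bool:
    """Same test as `? BETWEEN start_ip AND end_ip`, done on integers."""
    return start_ip <= int(ipaddress.IPv4Address(ip)) <= end_ip

start_ip, end_ip = subnet_bounds("192.168.1.0/24")
print(start_ip, end_ip)                               # 3232235776 3232236031
print(ip_in_range("192.168.1.77", start_ip, end_ip))  # True
print(ip_in_range("192.168.2.1", start_ip, end_ip))   # False
```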
To make these queries efficient, you should store the whole range as a LineString object with a SPATIAL index on it, and query like this:
SELECT *
FROM ips
WHERE MBRContains(?, ip_range)
See this entry in my blog for more detail on how to do it:
Efficient conversion of IP to int and int to IP (could be useful to you), in Perl:
use strict;
use warnings;

# Convert a dotted-quad IPv4 string to its 32-bit integer value.
sub ip2dec {
    my @octs = split /\./, shift;
    return ($octs[0] << 24) + ($octs[1] << 16) + ($octs[2] << 8) + $octs[3];
}

# Convert a 32-bit integer back to a dotted-quad IPv4 string.
sub dec2ip {
    my $number = shift;
    # Shift each octet down to the low byte and mask it off.
    return join '.', map { ($number >> $_) & 0xFF } (24, 16, 8, 0);
}
Use PostgreSQL, there's a native data type for that.
More seriously, I would fall into the "one 32-bit integer" camp. An IP address only makes sense when all four octets are considered together, so there's no reason to store the octets in separate columns in the database. Would you store a phone number using three (or more) different fields?
Having separate fields doesn't sound particularly sensible to me, much like splitting a zipcode or a phone number into sections.
It might be useful if you wanted specific info on the sections, but I see no real reason not to use a 32-bit int.