GeoIP table join with table of IP's in MySQL

前端 未结 4 1736
终归单人心
终归单人心 2021-01-02 23:38

I am having a issue finding a fast way of joining the tables looking like that:

mysql> explain geo_ip;
+--------------+------------------+------+-----+---         


        
相关标签:
4条回答
  • 2021-01-02 23:40

    Can't comment yet, but user1281376's answers is wrong and doesn't work. the reason you only use the first octet is because you aren't going to match all ip ranges otherwise. there's plenty of ranges that span multiple second octets which user1281376s changed query isn't going to match. And yes, this actually happens if you use the Maxmind GeoIp data.

    with aleksis suggestion you can do a simple comparison on the fîrst octet, thus reducing the matching set.

    0 讨论(0)
  • 2021-01-02 23:41

    I found a easy way. I noticed that all first ip in the group % 256 = 0, so we can add a ip_index table

    CREATE TABLE `t_map_geo_range` (
      `_ip` int(10) unsigned NOT NULL,
      `_ipStart` int(10) unsigned NOT NULL,
      PRIMARY KEY (`_ip`)
    ) ENGINE=MyISAM
    

    How to fill the index table

    FOR_EACH(Every row of ip_geo)
    {
        FOR(Every ip FROM ipGroupStart/256 to ipGroupEnd/256)
        {
            INSERT INTO ip_geo_index(ip, ipGroupStart);
        }
    }
    

    How to use:

    SELECT * FROM YOUR_TABLE AS A
    LEFT JOIN ip_geo_index AS B ON B._ip = A._ip DIV 256
    LEFT JOIN ip_geo AS C ON C.ipStart = B.ipStart;
    

    More than 1000 times Faster.

    0 讨论(0)
  • 2021-01-02 23:54

    This approach has some scalability issues (should you choose to move to, say, city-specific geoip data), but for the given size of data, it will provide considerable optimization.

    The problem you are facing is effectively that MySQL does not optimize range-based queries very well. Ideally you want to do an exact ("=") look-up on an index rather than "greater than", so we'll need to build an index like that from the data you have available. This way MySQL will have much fewer rows to evaluate while looking for a match.

    To do this, I suggest that you create a look-up table that indexes the geolocation table based on the first octet (=1 from 1.2.3.4) of the IP addresses. The idea is that for each look-up you have to do, you can ignore all geolocation IPs which do not begin with the same octet than the IP you are looking for.

    CREATE TABLE `ip_geolocation_lookup` (
      `first_octet` int(10) unsigned NOT NULL DEFAULT '0',
      `ip_numeric_start` int(10) unsigned NOT NULL DEFAULT '0',
      `ip_numeric_end` int(10) unsigned NOT NULL DEFAULT '0',
      KEY `first_octet` (`first_octet`,`ip_numeric_start`,`ip_numeric_end`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
    

    Next, we need to take the data available in your geolocation table and produce data that covers all (first) octets the geolocation row covers: If you have an entry with ip_start = '5.3.0.0' and ip_end = '8.16.0.0', the lookup table will need rows for octets 5, 6, 7, and 8. So...

    ip_geolocation
    |ip_start       |ip_end          |ip_numeric_start|ip_numeric_end|
    |72.255.119.248 |74.3.127.255    |1224701944      |1241743359    |
    

    Should convert to:

    ip_geolocation_lookup
    |first_octet|ip_numeric_start|ip_numeric_end|
    |72         |1224701944      |1241743359    |
    |73         |1224701944      |1241743359    |
    |74         |1224701944      |1241743359    |
    

    Since someone here requested for a native MySQL solution, here's a stored procedure that will generate that data for you:

    DROP PROCEDURE IF EXISTS recalculate_ip_geolocation_lookup;
    
    CREATE PROCEDURE recalculate_ip_geolocation_lookup()
    BEGIN
        DECLARE i INT DEFAULT 0;
    
        DELETE FROM ip_geolocation_lookup;
    
        WHILE i < 256 DO
           INSERT INTO ip_geolocation_lookup (first_octet, ip_numeric_start, ip_numeric_end) 
                    SELECT  i, ip_numeric_start, ip_numeric_end FROM ip_geolocation WHERE 
                    ( ip_numeric_start & 0xFF000000 ) >> 24 <= i AND 
                    ( ip_numeric_end & 0xFF000000 ) >> 24 >= i;
    
           SET i = i + 1;
        END WHILE;
    END;
    

    And then you will need to populate the table by calling that stored procedure:

    CALL recalculate_ip_geolocation_lookup();
    

    At this point you may delete the procedure you just created -- it is no longer needed, unless you want to recalculate the look-up table.

    After the look-up table is in place, all you have to do is integrate it into your queries and make sure you're querying by the first octet. Your query to the look-up table will satisfy two conditions:

    1. Find all rows which match the first octet of your IP address
    2. Of that subset: Find the row which has the the range that matches your IP address

    Because the step two is carried out on a subset of data, it is considerably faster than doing the range tests on the entire data. This is the key to this optimization strategy.

    There are various ways for figuring out what the first octet of an IP address is; I used ( r.ip_numeric & 0xFF000000 ) >> 24 since my source IPs are in numeric form:

    SELECT 
        r.*, 
        g.country_code
    FROM 
        ip_geolocation g,
        ip_geolocation_lookup l,
        ip_random r
    WHERE 
        l.first_octet = ( r.ip_numeric & 0xFF000000 ) >> 24 AND 
        l.ip_numeric_start <= r.ip_numeric AND      
        l.ip_numeric_end >= r.ip_numeric AND 
        g.ip_numeric_start = l.ip_numeric_start;
    

    Now, admittedly I did get a little lazy in the end: You could easily get rid of ip_geolocation table altogether if you made the ip_geolocation_lookup table also contain the country data. I'm guessing dropping one table from this query would make it a bit faster.

    And, finally, here are the two other tables I used in this response for reference, since they differ from your tables. I'm certain they are compatible, though.

    # This table contains the original geolocation data
    
    CREATE TABLE `ip_geolocation` (
      `ip_start` varchar(16) NOT NULL DEFAULT '',
      `ip_end` varchar(16) NOT NULL DEFAULT '',
      `ip_numeric_start` int(10) unsigned NOT NULL DEFAULT '0',
      `ip_numeric_end` int(10) unsigned NOT NULL DEFAULT '0',
      `country_code` varchar(3) NOT NULL DEFAULT '',
      `country_name` varchar(64) NOT NULL DEFAULT '',
      PRIMARY KEY (`ip_numeric_start`),
      KEY `country_code` (`country_code`),
      KEY `ip_start` (`ip_start`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
    
    
    # This table simply holds random IP data that can be used for testing
    
    CREATE TABLE `ip_random` (
      `ip` varchar(16) NOT NULL DEFAULT '',
      `ip_numeric` int(10) unsigned NOT NULL DEFAULT '0',
      PRIMARY KEY (`ip`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
    
    0 讨论(0)
  • 2021-01-03 00:05

    Just wanted to give back to the community:

    Here's an even better and optimized way building on Aleksi's solution:

    DROP PROCEDURE IF EXISTS recalculate_ip_geolocation_lookup;
    
    DELIMITER ;;
    CREATE PROCEDURE recalculate_ip_geolocation_lookup()
    BEGIN
      DECLARE i INT DEFAULT 0;
    DROP TABLE `ip_geolocation_lookup`;
    
    CREATE TABLE `ip_geolocation_lookup` (
      `first_octet` smallint(5) unsigned NOT NULL DEFAULT '0',
      `startIpNum` int(10) unsigned NOT NULL DEFAULT '0',
      `endIpNum` int(10) unsigned NOT NULL DEFAULT '0',
      `locId` int(11) NOT NULL,
      PRIMARY KEY (`first_octet`,`startIpNum`,`endIpNum`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
    
    INSERT IGNORE INTO ip_geolocation_lookup
    SELECT  startIpNum DIV 1048576 as first_octet, startIpNum, endIpNum, locId
    FROM ip_geolocation;
    
    INSERT IGNORE INTO ip_geolocation_lookup
    SELECT  endIpNum DIV 1048576 as first_octet, startIpNum, endIpNum, locId
    FROM ip_geolocation;
    
      WHILE i < 1048576 DO
        INSERT IGNORE INTO ip_geolocation_lookup
            SELECT i, startIpNum, endIpNum, locId 
            FROM ip_geolocation_lookup 
            WHERE first_octet = i-1
            AND endIpNum DIV 1048576 > i;
       SET i = i + 1;
      END WHILE;
    END;;
    DELIMITER ;
    
    CALL recalculate_ip_geolocation_lookup();
    

    It builds way faster than his solution and drills down more easily because we're not just taking the first 8, but the first 20 bits. Join performance: 100000 rows in 158ms. You might have to rename the table and field names to your version.

    Query by using

    SELECT ip, kl.*
    FROM random_ips ki
    JOIN `ip_geolocation_lookup` kb ON (ki.`ip` DIV 1048576 = kb.`first_octet` AND ki.`ip` >= kb.`startIpNum` AND ki.`ip` <= kb.`endIpNum`)
    JOIN ip_maxmind_locations kl ON kb.`locId` = kl.`locId`;
    
    0 讨论(0)
提交回复
热议问题