All posts by Markus Popp

Locate IP addresses

It’s very easy to log the visitor’s IP address in a web application. In PHP, for example, a simple $_SERVER[‘REMOTE_ADDR’] returns the desired value.

But the IP address alone doesn’t tell us much about the person, sometimes it’s not even a person, but a search robot, who just entered the website. More interesting would be, where he (the person or the machine) lives.

I’d like to show you how MySQL 5 can help you to easily query the country from some source data that can be downloaded for free from MaxMind at http://www.maxmind.com/download/geoip/database/GeoIPCountryCSV.zip.

First I’d recommand you to create a new database, if you have the privilege to do so. I have called my database geoips. Download the file mentioned above and extract the file GeoIPCountryWhois.csv from this zip file. The file looks like this:

"2.6.190.56","2.6.190.63","33996344","33996351","GB","United Kingdom"
"3.0.0.0","4.17.135.31","50331648","68257567","US","United States"
"4.17.135.32","4.17.135.63","68257568","68257599","CA","Canada"
"4.17.135.64","4.17.142.255","68257600","68259583","US","United States"
...

First we have to get this data into a new table, which I call ip_ranges. We create it in the geoips database using this CREATE command:

CREATE TABLE ip_ranges (
   ip1_temp char(16), 
   ip2_temp char(16), 
   ip_from int unsigned NOT NULL PRIMARY KEY, 
   ip_to int unsigned NOT NULL UNIQUE, 
   code char(2) NOT NULL, 
   country varchar(100) NOT NULL, 
   INDEX (code)) ENGINE = InnoDB;

I didn’t take much care about the ip1_temp and ip2_temp columns, because we only need them temporarily. We will make our searches based on the values in the third and forth column of the data in the csv file. Later we will put the code and country pairs into a separate table to normalize the data. Because we will put a foreign key on the code column, we put an index on it and we use the InnoDB storage engine. This isn’t a must, but it’s a good way to also show you how to accomplish this task ;-).

To import the csv file into the table we just created we have two possibilities, either via a SQL command or via the mysqlimport program that’s included in MySQL. I’ll show you both ways. You need the FILE privilege for both variants.

Open a console window and change to the directory where the GeoIPCountryWhois.csv file resides. Start your mysql client program with the command

mysql -h [host] -u [username] -p[password] geoips

Then execute this SQL statement:

LOAD DATA LOCAL INFILE 'GeoIPCountryWhois.csv' 
   INTO TABLE ip_ranges 
   FIELDS TERMINATED BY ',' 
   ENCLOSED BY '"' 
   LINES TERMINATED BY '\n';

Instead of starting your mysql client program, you can also use the mysqlimport program. Therefore, the name of the input file must be equal to the table name, so we rename the file GeoIPCountryWhois.csv to ip_ranges.csv.

move GeoIPCountryWhois.csv ip_ranges.csv (in Windows)
mv GeoIPCountryWhois.csv ip_ranges.csv (in Linux)

Then we can use this command (please write it in one line) to import the csv file:

mysqlimport -h [host] -u [username] -p[password] --local 
--fields-terminated-by="," --fields-enclosed-by="\"" 
--lines-terminated-by="\n" geoips ip_ranges.csv

You should now have somthing like 74,000 records in your ip_ranges table.

Change back to your mysql client and the geoips database. We can now remove the columns ip1_temp and ip2_temp from the ip_ranges table, because we won’t need them for searching. If you want to leave them to verify the results, that’s no problem either.

ALTER TABLE ip_ranges
   DROP ip1_temp,
   DROP ip2_temp;

Now we have an interesting task ahead of us. We want to move a list of country codes and country names into a separate table called ip_countries. That’s very easy to do with only one command.

CREATE TABLE ip_countries
   SELECT DISTINCT code, country
      FROM ip_ranges
      ORDER BY code;

We can now

* add a primary key to the code column of the ip_countries table
* remove the country column from the ip_ranges table
* add a foreign key constraint to the code column of the ip_ranges table, which refers to the code column of the ip_countries table.

We need two more SQL commands to do this:

ALTER TABLE ip_countries ADD PRIMARY KEY (code);
ALTER TABLE ip_ranges 
   DROP country, 
   ADD FOREIGN KEY (code) 
      REFERENCES ip_countries(code);

Now we have all the data we need to start writing a User Defined Function which queries the country based on an IP address that we pass as parameter. Of course we can’t pass the IP address as such a number like we have the values in the ip_from and ip_to columns of the ip_ranges table. So we need to calculate the corresponding number from the IP address.

The page http://www.maxmind.com/app/csv tells us how this value can be calculated. We split up the 4 parts of our IP address and calculate it like this (ip1 = 1st part, ip2 = 2nd part, ip3 = 3rd part, ip4 = 4th part):

value = 16777216 x ip1 + 65536 x ip2 + 256 x ip3 + ip4

I use my current IP address 62.46.14.132 to try this out and get the value 1,043,205,764.

As soon we have this value, we can query the country like this:

SELECT b.country
   FROM ip_ranges a INNER JOIN ip_countries b
   ON a.code = b.code
   WHERE ip_from = 1043205764

The result is Austria, which is the place where I live.

Even though we query the data from two tables with one storing more than 74,000 records, the query only takes about 0.0025 seconds on my machine. That’s because of the indexes we’ve set. Prefixing the keyword EXPLAIN to the last query shows us that MySQL was able to query the data very efficiently.

mysql> EXPLAIN SELECT b.country
    -> FROM ip_ranges a INNER JOIN ip_countries b
    -> ON a.code = b.code
    -> WHERE ip_from  ip_to >= 1043205764 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: range
possible_keys: PRIMARY,ip_to,code
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 2322
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 6
          ref: geoips.a.code
         rows: 1
        Extra:
2 rows in set (0.00 sec)

Finally, we write the User Defined Function, which puts this all together:

DELIMITER $$

DROP FUNCTION IF EXISTS `geoips`.`getCountry`$$
CREATE FUNCTION `geoips`.`getCountry` (pIp CHAR(16)) 
RETURNS VARCHAR(100)
BEGIN
  DECLARE _ip1, _ip2, _ip3, _ip4, _ip_value INT UNSIGNED;
  DECLARE _country VARCHAR(100);

  SELECT substring_index(pIp, '.', 1),
     substring_index(pIp, '.', -3),
     substring_index(pIp, '.', -2),
     substring_index(pIp, '.', -1) INTO _ip1, _ip2, _ip3, _ip4;

  SELECT 16777216 * _ip1 + 65536 * _ip2 + 256 * _ip3 + _ip4 
    INTO _ip_value;

  SELECT b.country
    FROM ip_ranges a INNER JOIN ip_countries b
    ON a.code = b.code
    WHERE ip_from = _ip_value
    INTO _country;

  RETURN _country;

END$$

DELIMITER ;

I used the substring_index function to extract the 4 parts of the IP address. With ‘help substring_index’ we can ask the mysql client to give us an explanation of this function:

mysql> help substring_index;
Name: 'SUBSTRING_INDEX'
Description:
   SUBSTRING_INDEX(str,delim,count)
Returns the substring from string str before count
occurrences of the delimiter delim.
If count is positive, everything to the left of the final 
delimiter (counting from the left) is returned.
If count is negative, everything to the right of the final 
delimiter (counting from the right) is returned.
Examples:
mysql> SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2);
        -> 'www.mysql'
mysql> SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2);
        -> 'mysql.com'

Finally, a little test whether the getCountry UDF really works:

mysql> SELECT getCountry('62.46.14.132');
+----------------------------+
| getCountry('62.46.14.132') |
+----------------------------+
| Austria                    |
+----------------------------+
1 row in set (0.00 sec)

Great, it does – and that very quickly :-)!

Menu management with stored procedures (without Nested Set Model)

I’ve tried to create a database driven menu management. The menu items are stored in the database with an ID (menuitem) and a mother ID (motherid) which refers to its upper menu item or to 0, if it is a main menu item. Further we store the description, that’s the text that should be displayed in the menu and a filename, which we can use to store, where we want to link the menu item to. Finally we have a column itemsorder where we store, in which order the menu items should be displayed.

This table can store more than one separate menus, so we use the first column mainmenu to indicate, which menu we want to display.

DROP TABLE IF EXISTS `menu`;
CREATE TABLE `menu` (
`mainmenu` int(10) unsigned NOT NULL,
`menuitem` int(10) unsigned NOT NULL,
`motherid` int(10) unsigned NOT NULL,
`description` varchar(100) NOT NULL,
`filename` varchar(100) NOT NULL,
`itemsorder` int(10) unsigned NOT NULL,
PRIMARY KEY  (`mainmenu`,`menuitem`),
KEY `motherid` (`motherid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

INSERT INTO `menu` VALUES (1, 1, 0, 'Menu 1', 'menu1', 10);
INSERT INTO `menu` VALUES (1, 2, 0, 'Menu 2', 'menu2', 20);
INSERT INTO `menu` VALUES (1, 3, 2, 'Menu 2/1', 'menu2_1', 30);
INSERT INTO `menu` VALUES (1, 4, 2, 'Menu 2/2', 'menu2_2', 40);
INSERT INTO `menu` VALUES (1, 5, 4, 'Menu 2/2/1', 'menu2_2_1', 50);
INSERT INTO `menu` VALUES (1, 6, 4, 'Menu 2/2/2', 'menu2_2_2', 60);
INSERT INTO `menu` VALUES (1, 7, 6, 'Menu 2/2/2/1', 'menu2_2_2_1', 70);
INSERT INTO `menu` VALUES (1, 8, 4, 'Menu 2/2/3', 'menu2_2_3', 80);
INSERT INTO `menu` VALUES (1, 9, 2, 'Menu 2/3', 'menu2_3', 90);
INSERT INTO `menu` VALUES (1, 10, 2, 'Menu 2/4', 'menu2_4', 100);
INSERT INTO `menu` VALUES (1, 11, 10, 'Menu 2/4/1', 'menu2_4_1', 110);
INSERT INTO `menu` VALUES (1, 12, 11, 'Menu 2/4/1/1', 'menu2_4_1_1', 120);
INSERT INTO `menu` VALUES (1, 13, 11, 'Menu 2/4/1/2', 'menu2_4_1_2', 130);
INSERT INTO `menu` VALUES (1, 14, 13, 'Menu 2/4/1/2/1', 'menu2_4_1_2_1', 140);
INSERT INTO `menu` VALUES (1, 15, 11, 'Menu 2/4/1/3', 'menu2_4_1_3', 150);
INSERT INTO `menu` VALUES (1, 16, 10, 'Menu 2/4/2', 'menu2_4_2', 160);
INSERT INTO `menu` VALUES (1, 17, 2, 'Menu 2/5', 'menu2_5', 170);
INSERT INTO `menu` VALUES (1, 18, 0, 'Menu 3', 'menu3', 180);
INSERT INTO `menu` VALUES (2, 1, 0, 'Menu 1', 'menu1', 10);
INSERT INTO `menu` VALUES (2, 2, 1, 'Menu 1/1', 'menu1_1', 20);
INSERT INTO `menu` VALUES (2, 3, 2, 'Menu 1/1/1', 'menu1_1_1', 30);
INSERT INTO `menu` VALUES (2, 4, 1, 'Menu 1/2', 'menu1_2', 40);
INSERT INTO `menu` VALUES (2, 5, 4, 'Menu 1/2/1', 'menu1_2_1', 50);
INSERT INTO `menu` VALUES (2, 6, 4, 'Menu 1/2/2', 'menu1_2_2', 60);
INSERT INTO `menu` VALUES (2, 7, 1, 'Menu 1/3', 'menu1_3', 70);
INSERT INTO `menu` VALUES (2, 8, 0, 'Menu 2', 'menu2', 80);
INSERT INTO `menu` VALUES (2, 9, 0, 'Menu 3', 'menu3', 90);
INSERT INTO `menu` VALUES (2, 10, 9, 'Menu 3/1', 'menu3_1', 100);

Up to now, MySQL users had to do quite a lot of work on the client side to get all the information that’s necessary to build the menu. We need to know at which level a menu item resides and which menu items should be displayed and which should not. This example should only make the top level items, the items at the same level as the active item and the items right below the active menu item visible.

One choice would be to store redundant data, but if we want to have good database design, we don’t get along without doing some extra work. I’ve tried to get this done by a stored procedure. It was not easy, because there are still some limits on SPs that require some tricky parts to pass by some troubles. Some of them were solved by temporary tables.

Here’s the code of the stored procedure called getMenu. It takes 2 parameters, first which menu should be used (column mainmenu) and the second, which menu item should be active. If the second parameter is set to 0, the complete menu will be displayed.

DELIMITER $$

DROP PROCEDURE IF EXISTS getMenu$$
CREATE PROCEDURE getMenu(IN p_menu INT, IN p_active INT)
BEGIN
 declare activeMotherId INT;
 declare countrec INT;
 declare counter INT default 1;

 drop temporary table if exists t1;

 create temporary table t1
   select menuitem,
       0 as level,
       'n' as display,
       '-' as haschildren,
       motherid,
       description,
       filename,
       itemsorder
     from menu
     where mainmenu = p_menu
     order by itemsorder;

 update t1 set level=1, display='y' where motherid = 0;

 update t1 set display='y' where menuitem = p_active;

 drop temporary table if exists t2;

 create temporary table t2 select * from t1;
 update t1 set haschildren='+' where (select count(*)
   from t2 where t1.menuitem=t2.motherid) > 0;

 drop temporary table if exists t2;

 set countrec = countrec - (select count(*)
   from t1 where motherid = 0);

 select count(*) from t1 into countrec;

 while countrec > 0 do
   drop temporary table if exists t2;

   create temporary table t2
     select menuitem from t1 where level = counter;

   update t1 set level = (counter + 1)
     where motherid in (select menuitem from t2);

   drop temporary table if exists t2;

   select count(*) from t1 where level=0 into countrec;

   set counter = counter + 1;
 end while;

 if p_active = 0 then
   update t1 set display='y';
 else

   select motherid from t1
     where menuitem = p_active into activeMotherId;

   update t1 set display='y'
     where motherid = activeMotherId or motherId = p_active;

   while activeMotherId != 0 do
     update t1 set display='y'
       where motherid = activeMotherId;

     select motherid
       from t1
       where menuitem = activeMotherId
       into activeMotherId;
   end while;
 end if;
 select * from t1;
 drop temporary table t1;
END$$

DELIMITER ;

First I move the relevant data to a temporary table called t1. This temporary table has 3 additional fields: level (which indicates the level of the menu item), display (‘y'(es)/’n'(o) – whether the menu item should be displayed or not) and haschildren (‘+’ – has child items, ‘-‘ – has no child items).

The following are some quite complex processes to get the right data into these new fields. Therefore I had to create 2 more temporary tables to store some data for a short amount of time. At the end, the result is being displayed with all data that’s required for comfortable processing in a PHP script like this:

<?php
ini_set('display_errors', 1);

$mysqli = 
   new mysqli('localhost', 'username', 'password', 'dbname');

if (mysqli_connect_error())
die ("Connection error: ".mysqli_connect_errno());

$activeId = isset($_GET['activeid']) ? $_GET['activeid'] : 1;

$sql = "call getMenu(1, ".$activeId.")";

$mysqli->multi_query($sql);

$result = $mysqli->use_result();

echo "<table>\n";

while ($data = $result->fetch_array(MYSQLI_ASSOC))
{
if ($data['display'] == 'y')
   {
   echo "<tr>\n";    
   echo "<td>(".$data['menuitem'].")</td>\n<td>".
      $data['haschildren']."</td>\n<td>";
 
   for ($i = 0; $i < $data['level']; $i++)
      echo "   ";
 
   if ($data['menuitem'] == $activeId)
      echo "<b>";
   
   echo "<a href='sp_menutest.php?activeid=".$data['menuitem']."'>".
      $data['description']."</a>";
 
   if ($data['menuitem'] == $activeId)
      echo "</b>";
 
   echo "</td>\n";
   echo "</tr>\n";
   }
}
echo "</table>\n";

$result = $mysqli->next_result();
$result = null;

$mysqli->close();
?>

Enjoy! Further examples using the Nested Set Model will follow.

INSERT … ON DUPLICATE KEY UPDATE

At the moment, I’m preparing for the core certification exam. Although I thought that I knew almost everything about the new features in MySQL 4.1, I sometimes find some fantastic “goodies” while I’m learning for the exam. One of them is the INSERT … ON DUPLICATE KEY UPDATE command.

This command is ideal for logging. You can insert into a table, but do an update, if the row (based on the primary key) already exists. An example might explain this best:

mysql> CREATE TABLE logdata (
    ->    id INT UNSIGNED NOT NULL,
    ->    count INT NOT NULL,
    ->    PRIMARY KEY (id)) ENGINE=MyISAM;
Query OK, 0 rows affected (0.03 sec)

mysql> INSERT INTO logdata (id, count)
    ->    VALUES (1, 1)
    ->    ON DUPLICATE KEY UPDATE
    ->    count = count + 1;
Query OK, 1 row affected (0.02 sec)

mysql> INSERT INTO logdata (id, count)
    ->    VALUES (2, 1)
    ->    ON DUPLICATE KEY UPDATE
    ->    count = count + 1;
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO logdata (id, count)
    ->    VALUES (1, 1)
    ->    ON DUPLICATE KEY UPDATE
    ->    count = count + 1;
Query OK, 2 rows affected (0.00 sec)

mysql> SELECT * FROM logdata;
+----+-------+
| id | count |
+----+-------+
|  1 |     2 |
|  2 |     1 |
+----+-------+
2 rows in set (0.00 sec)

mysql>

First, the key values 1 and 2 have been inserted. Then we inserted the value 1 again, the command executed an UPDATE instead incrementing the count value to 2.