IF you are new to mariadb or mysql, you might want to take a sneak peek at the Introduction To MariaDB, I covered a couple of interesting concept in that post.
Also, I wrote a guide on:
If you don't know how to create a database, tables, and alter tables, take a look at the above guides. In this guide, we would talk about indexing in mariadb, plus a way to manipulate or rename indexes.
Indexes are like a table of contents that is used by mariadb to locate data quickly and efficiently. A table without an index is like searching for a single image in a list of thousands of images, yes, you can still search it, but it would be slow as the rows of data are searched sequentially, but when you used an index, you can map all the images to a unique index number, so, instead of searching the images sequentially, mariadb would search for the index number instead and link it up the actual image.
So, what differences does it make?
- An index is smaller and is structured to be traversed quickly, it can be searched rapidly
- It can speed up your SELECT query
- Unique indexes guarantee uniquely identifiable records in the database.
- Suppose you ever need to remove a duplicate image, how would you identify which one to remove and which to keep? Index to your rescue
Note that an index is not the same as a column, but they are similar, let's understand the stricture of the following index:
SHOW INDEX FROM marn_melodic.cmusic;
We are showing the index from the cmusic table in the marn_melodic database, the result is as follows:
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| cmusic | 0 | PRIMARY | 1 | music_id | A | 7 | NULL | NULL | | BTREE | | |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
The output above shows there is an index in the table cmusic, the index is not the music_id but the data from which the index is drawn.
Yes, the name of the column and name of the index are the same and the index is attached to the column but they are not the same.
To get a better idea of what I am saying, here is a table I created in the previous guide:
SELECT * FROM marn_melodic.cmusic;
+----------+--------+-------+-----------------+-------------------------+----------+-------------+--------+----------+
| music_id | status | mood | artiste | title | genre_id | description | writer | producer |
+----------+--------+-------+-----------------+-------------------------+----------+-------------+--------+----------+
| 1 | 0 | Dance | Avicii | Silhouette | NULL | NULL | NULL | NULL |
| 2 | 1 | Dance | Beautiful Nubia | Higher Steps | NULL | NULL | NULL | NULL |
| 3 | 0 | Sad | Beautiful Nubia | Tables Turn | NULL | NULL | NULL | NULL |
| 4 | 0 | Dance | Fela | Observation Is No Crime | NULL | NULL | NULL | NULL |
| 5 | 1 | Dance | Avicii | Levels | NULL | NULL | NULL | NULL |
| 6 | 0 | Dance | Fela | Colonial Mentality | NULL | NULL | NULL | NULL |
| 7 | 1 | Sad | Asa | Jailer | NULL | NULL | NULL | NULL |
+----------+--------+-------+-----------------+-------------------------+----------+-------------+--------+----------+
Suppose that users of the cmusic table want to search based on the artise name. Without an index, mariadb will search the artiste name column sequentially which would be slow when you have a large data set.
Consider we want to only return 'Fela', we can do:
SELECT * FROM marn_melodic.cmusic WHERE artiste = 'Fela';
+----------+--------+-------+---------+-------------------------+----------+-------------+--------+----------+
| music_id | status | mood | artiste | title | genre_id | description | writer | producer |
+----------+--------+-------+---------+-------------------------+----------+-------------+--------+----------+
| 4 | 0 | Dance | Fela | Observation Is No Crime | NULL | NULL | NULL | NULL |
| 6 | 0 | Dance | Fela | Colonial Mentality | NULL | NULL | NULL | NULL |
+----------+--------+-------+---------+-------------------------+----------+-------------+--------+----------+
If you do not have any index, MariaDB must read all the records in the cmusic table and compare the artiste field with the string “Fela” to see whether they match. Clearly, that’s not efficient. As the number of records increases, so does the effort necessary to find a given record.
A good way to simulate this is by using the EXPLAIN statement coupled with the SELECT statement, it will return information on how the SELECT statement searches the table and on what basis.
The good thing about this is that, it will explain what the server did when executing the SELECT statement, it won’t return any rows from the table per se, but information on how the index would be used had it been you executed only the SELECT statement
Run the following against any table:
EXPLAIN SELECT * FROM marn_melodic.cmusic WHERE artiste = 'Fela' \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: cmusic
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 7
Extra: Using where
The EXPLAIN statement describes what would have happened had it been you entered the SELECT statement. The possible_keys field would show the keys that the SELECT statement could have used.
In this case, there is no index related to the artiste column, it showed NULL has it didn't use any key.
If you have thousands of data set, you should incorporate index real quick, but it doesn't make much difference if you have few data, however, it is a good idea to get it up right off the bat.
Without any indexes, MariaDB must examine every row in a table, you can imagine how slow this would be if you have a large set of data, it won't only be time-consuming, it would also use a lot of disks I/O. Indexes exist so that the database can quickly eliminate possible rows from the result set when executing a query.
Aside from creating a single index, you can create an index that is based on the artiste name and the name of the song, this would improve performance for a time when the table will have many records, let's create an index that combines the two-column:
ALTER TABLE marn_melodic.cmusic ADD INDEX song_names (artiste, title);
If you run this and you have a similar table has mine, you might get the following error:
Warning (Code 1071): Specified key was too long; max key length is 767 bytes
Let's understand where the warning is coming from, first let's see our table structure:
SHOW CREATE TABLE marn_melodic.cmusic \G
*************************** 1. row ***************************
Table: cmusic
Create Table: CREATE TABLE `cmusic` (
`music_id` int(11) NOT NULL AUTO_INCREMENT,
`status` tinyint(1) DEFAULT '1',
`mood` enum('Dance','Sad','Energetic') DEFAULT NULL,
`artiste` varchar(255) DEFAULT NULL,
`title` varchar(255) DEFAULT NULL,
`genre_id` int(11) DEFAULT NULL,
`description` text,
`writer` varchar(150) DEFAULT NULL,
`producer` varchar(100) DEFAULT NULL,
PRIMARY KEY (`music_id`)
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=utf8mb4
First thing first, we are using CHARSET or Character Set - utf8mb4
- a character set defines how and which characters are stored to support a particular language or languages.
A utf8mb4
is a variable-length encoder that stores characters in 1 - 4 bytes. Encode is a way of converting one thing to the other, Words and sentences in text are created from characters. Examples of characters include the English letter A
or a dollar sign $
.
Characters that are needed for a specific purpose are grouped into a character set. Each character is associated with a number, called a code point.
The characters are stored in the computer as one or more bytes, in the case of utf8mb4, it can use up to 4 bytes to store just a character, that is the point I am trying to make. The maximum index length that an InnoDB Engine
can have is 767 bytes (at least, in my current version of mariadb), now I am trying to add an index using:
ALTER TABLE marn_melodic.cmusic ADD INDEX song_names (artiste, title);
artiste
and title
both have a maximum length of 255 characters, if you recall I said, utf8mb4 can use up to 4 bytes per character, now multiple 255 x 4 = 1020
. You can see why it returns the warning, it is over the limit, for utf8mb4 191 characters is the maximum index size you can specify per column, why? because 191 x 5 = 764
, which is below the 767 bytes mark.
In short, mariadb would go ahead and add the index for you, but would reduce the character, check using limit:
SHOW CREATE TABLE marn_melodic.cmusic \G
*************************** 1. row ***************************
Table: cmusic
Create Table: CREATE TABLE `cmusic` (
`music_id` int(11) NOT NULL AUTO_INCREMENT,
`status` tinyint(1) DEFAULT '1',
`mood` enum('Dance','Sad','Energetic') DEFAULT NULL,
`artiste` varchar(255) DEFAULT NULL,
`title` varchar(255) DEFAULT NULL,
`genre_id` int(11) DEFAULT NULL,
`description` text,
`writer` varchar(150) DEFAULT NULL,
`producer` varchar(100) DEFAULT NULL,
PRIMARY KEY (`music_id`),
KEY `song_names` (`artiste`(191),`title`(191))
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=utf8mb4
Again, for utf8mb4 191 characters is the maximum index size you can specify per column, which is why we had the error in the first place. There are a couple of ways we can fix the error, the first way is increasing the limit of the maximum index length, you can do that by changing your mariadb config like so:
innodb_file_format = Barracuda
innodb_file_per_table = on
innodb_default_row_format = dynamic
innodb_large_prefix = 1
innodb_file_format_max = Barracuda
On Linux, add it in /etc/mysql/mariadb.conf.d/50-server.cnf under the [mariadb] section header.
This would change the limit to 3072 bytes.
Lastly, run the queries one at a time in your mariadb client:
SET GLOBAL innodb_file_format = Barracuda;
SET GLOBAL innodb_file_per_table = on;
SET GLOBAL innodb_default_row_format = dynamic;
SET GLOBAL innodb_large_prefix = 1;
SET GLOBAL innodb_file_format_max = Barracuda;
The second way is reducing the length of the column character, since we are using utf8mb4, make sure they are not over 191, e.g varchar(191).
The third way, and which is my preferred way is using partial indexes, mariadb gives you a lot of control over how much space is used by the indexes.
First DROP the index using:
DROP INDEX song_names ON marn_melodic.cmusic;
If for example, our cmusic table contains 50 million rows, adding an index on just the artiste field will require more space as the row grows. If the average artiste field is 15 bytes long, it would require around 750MB of space for the data portion of the index, and this is just an assumption, it might be way larger than this if you have a couple of more indexes. The row pointers would always add a couple of bytes per record.
Since we are trying to index the artiste column and the title column, we can index only the first 4 bytes of each column instead of indexing the entire artiste and title column:
ALTER TABLE marn_melodic.cmusic ADD INDEX song_names (artiste(4), title(4));
This will reduce the space requirements for the data portion of the index. The con of this is that mariadb can’t eliminate quite as many rows using this index, but that is only true if, for instance, you are using one index.
if you are using a multi-column index as in the above example, having the artiste field and the title field index together means that MariaDB can eliminate rows based on both fields, thereby greatly reducing the number of rows it must consider.
For example, if we have the following query:
EXPLAIN SELECT * FROM marn_melodic.cmusic WHERE artiste = 'Beautiful Nubia' AND title = 'Higher Steps';
+----------+--------+-------+-----------------+--------------+----------+-------------+--------+----------+
| music_id | status | mood | artiste | title | genre_id | description | writer | producer |
+----------+--------+-------+-----------------+--------------+----------+-------------+--------+----------+
| 2 | 1 | Dance | Beautiful Nubia | Higher Steps | NULL | NULL | NULL | NULL |
+----------+--------+-------+-----------------+--------------+----------+-------------+--------+----------+
It would take only the first 4 portions on each field as that is what we specified when building the index, so, it would store 'Beaut", and "High". There are fewer chances you would have both combos in other fields, for this reason, it would be able to return the result faster even if you have millions of rows. This is a great gem when optimizing your indexes.
Now, if you run the SHOW CREATE TABLE statement, you would have:
SHOW CREATE TABLE marn_melodic.cmusic;
*************************** 1. row ***************************
Table: cmusic
Create Table: CREATE TABLE `cmusic` (
`music_id` int(11) NOT NULL AUTO_INCREMENT,
`status` tinyint(1) DEFAULT '1',
`mood` enum('Dance','Sad','Energetic') DEFAULT NULL,
`artiste` varchar(255) DEFAULT NULL,
`title` varchar(255) DEFAULT NULL,
`genre_id` int(11) DEFAULT NULL,
`description` text,
`writer` varchar(150) DEFAULT NULL,
`producer` varchar(100) DEFAULT NULL,
PRIMARY KEY (`music_id`),
KEY `song_names` (`artiste`(4),`title`(4))
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=utf8mb4
The results show a new KEY after the list of columns. The key, or index, is called song_names
and is based on the values of the two columns listed in parentheses (artiste and title).
Let's see more information about this new index:
SHOW INDEX FROM marn_melodic.cmusic WHERE Key_name = 'song_names' \G
*************************** 1. row ***************************
Table: cmusic
Non_unique: 1
Key_name: song_names
Seq_in_index: 1
Column_name: artiste
Collation: A
Cardinality: 7
Sub_part: 4
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
*************************** 2. row ***************************
Table: cmusic
Non_unique: 1
Key_name: song_names
Seq_in_index: 2
Column_name: title
Collation: A
Cardinality: 7
Sub_part: 4
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
The results show two rows with information on the columns that were used to create the index. You don't have to understand the info listed there, the point here is the index is based on two different columns (artiste and title), and they both have the same key (song_names).
If you actually want to see if the SELECT statement is using an index, you can run the EXPLAIN...SELECT statement to see if it is using an index or not:
EXPLAIN SELECT * FROM marn_melodic.cmusic WHERE artiste = 'Beautiful Nubia' AND title = 'Higher Steps';
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: cmusic
type: ref
possible_keys: song_names
key: song_names
key_len: 38
ref: const,const
rows: 1
Extra: Using where
As shown in the results, the possible_keys field indicates that the song_names key could be used. If there were more than one possible key that could be used, the line would list them there. The line after the possible_keys (keys) shows that the song_names index was actually used.
Now, when a user wants to search the table based on the artiste name or the combo of the artiste and the title, MariaDB will use the song_names index that we created, and not search the artiste or title column sequentially. That will make for a better and optimize search.
Renaming a Column That Has an Index
To rename a column has an index you do:
ALTER TABLE tablename
DROP PRIMARY KEY,
CHANGE old_columnname new_columnname INT PRIMARY KEY AUTO INCREMENT;
the index must be dropped before the column with which it’s associated can be renamed. You won;t lose any data in the column, only the index will be deleted, and that will be recreated with the new easily by mariadb.
Note: You don’t have to give the name of the associated column when dropping a PRIMARY KEY. There is and can be only one primary key.
The difference between an efficient and inefficient database is how you think of the structure, not consuming unnecessary resources, your indexing strategy, and the likes.