Lesson 15 of the SQL Optimization Course: Index Design (MySQL Index Structures)

For relational databases, the design of tables and the way SQL is written are particularly crucial. It wouldn’t be an exaggeration to say that they account for 90% of performance. So this series targets these two knowledge areas specifically and analyzes them in detail, layer by layer.
This series uses plain language and a large number of examples to explain the subtleties.
We will use MySQL as the demonstration database.
In the previous article (Episode 15: Index Design (B+ Tree Organization)), we discussed why databases predominantly use B+ trees for index storage: they are well-suited for disk storage, leverage the properties of balanced multi-way trees, support disk pre-reading, and efficiently handle equality, range, and sequential scans. This article focuses on the index organization methods of MySQL’s two commonly used engines: MyISAM and InnoDB. Understanding these storage mechanisms is crucial for database optimization.
Clustered Index: With this type of index, the physical order of records in a table matches the logical order of the index. Since a table can only be stored in one physical order, a table can have at most one clustered index. Because the leaf level of a clustered index holds the rows themselves, a lookup by the clustering key reaches the row without a further step, which generally makes it faster than a lookup through a non-clustered index.
In MySQL, only InnoDB tables support clustered indexes. An InnoDB table is itself stored as a clustered index, which is why it is often called an Index-Organized Table (IOT). Non-leaf nodes are ordered by the primary key, and leaf nodes contain the primary key and the corresponding row data. As a result, scanning an InnoDB table in primary key order is very fast.
Non-Clustered Index: Also called a Secondary Index. Non-leaf nodes are ordered by the index key values, while leaf nodes contain the index key values and a reference to the row: the corresponding primary key value in InnoDB, or a pointer to the physical row in engines such as MyISAM. In InnoDB tables, every index apart from the primary key is a secondary index. Engines like MyISAM and MEMORY use only non-clustered indexes. In simple terms, the index and the row data are stored separately. A table can have multiple secondary indexes.
MyISAM tables are typical examples of tables where data and indexes are stored separately, with no fundamental difference between the primary key and secondary indexes. For instance, in a MyISAM table, the primary key and unique indexes are essentially the same.
The figure for the MyISAM primary key index illustrates an order-3 B+ tree where non-leaf nodes are stored in order based on the primary key values, and leaf nodes are similarly ordered by the primary key values and contain pointers to the physical data rows on disk.
The figure for the age column index shows a B+ tree that is also of order 3. Non-leaf nodes are stored in order based on the age column values, and leaf nodes store the age values and pointers to the physical data rows on disk.
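The two MyISAM index trees described above can be approximated with a toy Python model. This is only a sketch under stated assumptions: sorted lists stand in for the B+ tree leaf levels, list positions stand in for physical row pointers, and the sample rows and the names `data_file`, `pk_index`, `age_index`, and `lookup` are made up for illustration.

```python
from bisect import bisect_left

# Hypothetical sample rows, stored in insertion order like a MyISAM data file.
# The list position plays the role of the physical row pointer.
data_file = [
    {"id": 10003, "age": 31},
    {"id": 10001, "age": 45},
    {"id": 10005, "age": 23},
    {"id": 10002, "age": 28},
]

# Both indexes have the same shape: sorted (key, row_pointer) pairs.
pk_index = sorted((row["id"], ptr) for ptr, row in enumerate(data_file))
age_index = sorted((row["age"], ptr) for ptr, row in enumerate(data_file))


def lookup(index, key):
    """Return all rows whose indexed column equals key."""
    rows = []
    pos = bisect_left(index, (key, -1))  # first entry with this key
    while pos < len(index) and index[pos][0] == key:
        rows.append(data_file[index[pos][1]])  # follow the pointer to the row
        pos += 1
    return rows


print(lookup(pk_index, 10005))
print(lookup(age_index, 23))
```

Both calls go through the same code path, which mirrors the point that a MyISAM primary key is structurally no different from any other index.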
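From the above two figures, it is evident that the biggest drawback of MyISAM tables is that the data rows themselves are stored in no particular key order. The index entries are sorted, but the rows they point to are not, so a query that reads the rows directly returns them in storage order, and ordered output requires an additional sorting step.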
Let’s demonstrate this with a simple example. A plain SELECT statement on a MyISAM table (SQL 1) outputs records in an unordered manner. To output them in ID order, we need a second statement (SQL 2) with an explicit ORDER BY clause.
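The difference between the unordered and the ordered output can be mimicked in a few lines of Python. This is a sketch, not MySQL itself: the list `data_file` and its rows are hypothetical, `scan` plays the role of a query without ORDER BY, and `scan_ordered_by_id` plays the role of the same query with ORDER BY on the ID.

```python
# Hypothetical rows in the order they were written to the data file.
data_file = [
    {"id": 10003, "age": 31},
    {"id": 10001, "age": 45},
    {"id": 10005, "age": 23},
    {"id": 10002, "age": 28},
]


def scan(rows):
    """Like a SELECT with no ORDER BY: return IDs in storage order."""
    return [row["id"] for row in rows]


def scan_ordered_by_id(rows):
    """Like adding ORDER BY id: an explicit sort on the key."""
    return sorted(row["id"] for row in rows)


print(scan(data_file))                # [10003, 10001, 10005, 10002]
print(scan_ordered_by_id(data_file))  # [10001, 10002, 10003, 10005]
```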
Next, let’s examine the composition of primary key indexes and secondary indexes in InnoDB tables.
InnoDB tables are index-organized tables, meaning the index is the data. The following diagram illustrates the data rows of table T1 in a clustered index format. Non-leaf nodes store the primary key values, leaf nodes store the primary key values and the corresponding data rows, and each page includes pointers to the previous and next pages.
Unlike MyISAM, InnoDB tables have their own page management system with a default page size of 16KB. MyISAM tables rely on the file system for data management, which typically has a default block size of 4KB, and MyISAM blocks are also 4KB. MyISAM tables do not have their own crash recovery mechanism and depend entirely on the file system.
The design of InnoDB tables has two main advantages:
Rows are stored in primary key order: when table t1 is queried, the data is directly retrieved in primary key ID order.
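The primary key ordering of a clustered index can be sketched with a small Python model. It rests on several assumptions: `PAGE_CAPACITY` is a made-up page size of two rows, only the forward page pointer is modeled, and the sample rows are hypothetical. It shows rows inserted out of order coming back in key order when the leaf pages are walked.

```python
from bisect import insort

PAGE_CAPACITY = 2  # hypothetical; real InnoDB pages are 16KB by default


def build_leaf_pages(rows):
    """Keep rows sorted by primary key and chain them into leaf pages."""
    ordered = []
    for row in rows:
        insort(ordered, (row["id"], row["age"]))  # placement follows the key
    pages = []
    for start in range(0, len(ordered), PAGE_CAPACITY):
        pages.append({"rows": ordered[start:start + PAGE_CAPACITY], "next": None})
    for left, right in zip(pages, pages[1:]):
        left["next"] = right  # leaf pages are linked in key order
    return pages[0] if pages else None


def scan_leaves(first_page):
    """Walk the leaf chain, as a full scan of the clustered index would."""
    ids = []
    page = first_page
    while page is not None:
        ids.extend(pk for pk, _age in page["rows"])
        page = page["next"]
    return ids


inserted = [
    {"id": 10003, "age": 31},
    {"id": 10001, "age": 45},
    {"id": 10005, "age": 23},
    {"id": 10002, "age": 28},
]
print(scan_leaves(build_leaf_pages(inserted)))  # [10001, 10002, 10003, 10005]
```

No sort step appears in `scan_leaves`; the order comes from where the rows were placed when they were stored.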
Now, let’s look at the secondary indexes in InnoDB tables, as shown in the following diagram:
In InnoDB secondary indexes, non-leaf nodes store the index field values (e.g., the age field in the above example), and leaf nodes contain the index field values and the corresponding primary key values.
The advantages of this design are:
However, there are also disadvantages:
Secondary indexes are larger in size: Since they store primary key values, this can be particularly problematic if the primary key is poorly designed (e.g., using UUIDs as primary keys). The next article will discuss how to design effective primary keys.
Secondary index lookups require two index tree searches: First, the secondary index leaf node is searched to find the filtered row’s primary key value. Then, the clustered index is searched using this primary key value to retrieve the corresponding row.
For example, consider an SQL statement that retrieves the records where age is 23, of the form SELECT * FROM t1 WHERE age = 23.
This operation is effectively split into two steps:
First, search the secondary index on the age field to find the corresponding primary key value, which here is ID = 10005.
Then, retrieve the required data row from the clustered index using the primary key ID = 10005.
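The two steps above can be sketched in Python as follows. This is a simplified model, not InnoDB's implementation: a dict stands in for the clustered index, a sorted list of (age, primary key) pairs stands in for the secondary index, and every row except the one with ID 10005 and age 23 is made-up sample data.

```python
from bisect import bisect_left

# Clustered index: primary key -> full row (hypothetical sample data).
clustered = {
    10001: {"id": 10001, "age": 45},
    10002: {"id": 10002, "age": 28},
    10003: {"id": 10003, "age": 31},
    10005: {"id": 10005, "age": 23},
}

# Secondary index on age: sorted (age, primary_key) pairs, no row data.
age_index = sorted((row["age"], pk) for pk, row in clustered.items())


def primary_keys_for_age(age):
    """Step 1: search the secondary index for matching primary keys."""
    keys = []
    pos = bisect_left(age_index, (age,))  # (age,) sorts before (age, any_pk)
    while pos < len(age_index) and age_index[pos][0] == age:
        keys.append(age_index[pos][1])
        pos += 1
    return keys


def rows_for_age(age):
    """Step 2: look each primary key up in the clustered index."""
    return [clustered[pk] for pk in primary_keys_for_age(age)]


print(primary_keys_for_age(23))  # [10005]
print(rows_for_age(23))          # [{'id': 10005, 'age': 23}]
```

A query that only needs the primary key could stop after step 1, since the secondary index already holds that value.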
Although this requires an additional lookup, MySQL optimizes the process by preheating data, that is, loading it into memory ahead of time (see the MySQL manual for details).
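Returning to the first disadvantage listed above, a rough calculation shows how the primary key type affects secondary index size. The figures are assumptions chosen for illustration: a 4-byte INT for age, an 8-byte BIGINT primary key, a UUID stored as 36 single-byte characters, and one million rows. Record headers, page overhead, and non-leaf nodes are ignored, so these are lower bounds rather than real on-disk sizes.

```python
AGE_BYTES = 4      # assumed: age stored as INT
BIGINT_BYTES = 8   # assumed: integer primary key stored as BIGINT
UUID_BYTES = 36    # assumed: UUID stored as 36 single-byte characters


def leaf_payload_bytes(row_count, key_bytes, pk_bytes):
    """Bytes of (index key, primary key) pairs across all leaf entries."""
    return row_count * (key_bytes + pk_bytes)


rows = 1_000_000
with_bigint = leaf_payload_bytes(rows, AGE_BYTES, BIGINT_BYTES)
with_uuid = leaf_payload_bytes(rows, AGE_BYTES, UUID_BYTES)
print(with_bigint)  # 12000000
print(with_uuid)    # 40000000
```

Under these assumptions the same index on age carries more than three times as much payload when the primary key is a UUID string, and the cost repeats for every secondary index on the table.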
This concludes the content of this article. To summarize, we have explored the index organization methods and their respective advantages and disadvantages in MySQL’s two common engines: MyISAM and InnoDB. Feel free to leave comments or questions, and in the next article, we will discuss how to design effective primary keys in MySQL.
👋 See you in the next lesson.