
Deep dive into the smallest storage unit of the InnoDB storage engine: MySQL indexing is actually not difficult

2022-01-26 21:53:31 A sharer who loves Java

Indexing is an essential skill for every engineer, and understanding how indexes work is crucial for writing high-quality SQL. Today we will build up the idea of an index from 0 to 1. By the end, I believe you will have a much deeper understanding of the "page", the smallest storage unit of MySQL's InnoDB storage engine.

Starting from real requirements

Suppose we have the following user table:

CREATE TABLE `user` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(32) DEFAULT NULL COMMENT 'full name',
  `age` tinyint(3) unsigned DEFAULT NULL COMMENT 'age',
  `height` int(11) DEFAULT NULL COMMENT 'height',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='user table';

As you can see, the storage engine is InnoDB. Let's first look at the SQL statements commonly run against this table; after all, technology exists to serve business needs:

1. select * from user where id = xxx
2. select * from user order by id asc/desc
3. select * from user where age = xxx
4. select age from user where age = xxx
5. select age from user order by age asc/desc

Since we want to query, let's insert some data first; after all, there is nothing to query otherwise:

insert into user (`name`, `age`, `height`) values ('Zhang San', 20, 170);
insert into user (`name`, `age`, `height`) values ('Li Si', 21, 171);
insert into user (`name`, `age`, `height`) values ('Wang Wu', 22, 172);
insert into user (`name`, `age`, `height`) values ('Zhao Liu', 23, 173);
insert into user (`name`, `age`, `height`) values ('Qian Qi', 24, 174);

After the inserts, the table contains the following data:

[figure: the five inserted records with their auto-generated ids]

Did you notice that we did not specify an id when inserting, yet InnoDB assigned one to each record anyway? This id is auto-incremented: each newly inserted record gets an id one greater than the previous record's. Why auto-increment? Mainly for convenient querying: the records are linked into a list in ascending id order, so every lookup of id = xxx starts from id = 1 and walks the list forward.

[figure: records linked in ascending id order as a singly linked list]

Now suppose we execute the following SQL statement. How will MySQL carry out the query?

select * from user where id = 3

The page

As mentioned above, the lookup starts from the smallest record (id = 1) and reads records one at a time, comparing each record's id with the target value; after reading three records in a row it finds the record with id = 3. Note what each read involves: the record stored on disk must first be loaded into memory before its id can be compared, and each load from disk is one IO. In other words, this lookup costs three IOs. That is fine for a handful of records, but if many records have to be compared it becomes a serious performance problem: wouldn't querying id = 100 cost 100 IOs? Since the bottleneck is IO, how do we improve it? Simple: our current design reads only one record per IO, so change it to read 100 or more records per IO. The idea behind this is the principle of locality: when a piece of data is used, adjacent data is likely to be used soon, so we simply load the neighboring data along with it. (If you start reading from id = 1, you will probably need the records after it, so just load id = 1 through id = 100 all at once.)

Of course, reading more records per IO is not always better: loading lots of irrelevant data into memory just to find one record wastes resources. So a compromise is adopted: each IO reads 16 KB of data. Assuming those 16 KB hold 100 records, querying the record with id = 100 now takes a single IO (reading the records id = 1 through id = 100) instead of 100, a 100x performance improvement.
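
To make the arithmetic concrete, here is a small sketch; the 16 KB page size comes from the article, while the figure of 100 records per page is an illustrative assumption:

```python
RECORDS_PER_PAGE = 100  # assume 100 records fit in one 16 KB page

def ios_one_record_per_io(target_id):
    # reading one record per IO: scanning from id = 1 costs target_id IOs
    return target_id

def ios_one_page_per_io(target_id):
    # reading a whole page per IO: ceil(target_id / RECORDS_PER_PAGE) IOs
    return -(-target_id // RECORDS_PER_PAGE)

print(ios_one_record_per_io(100))  # 100 IOs without paging
print(ios_one_page_per_io(100))    # 1 IO with a 16 KB page
```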

[figure: 100 records grouped into one 16 KB page]

We call this 16 KB collection of records a page.

Page directory

Each IO now reads one page, and the records in the page are then searched in memory. Searching in memory is indeed much faster than on disk, but we are still not satisfied: to find the record with id = 100, we first compare against id = 1, then id = 2, ..., up to id = 100, which takes 100 comparisons. Can it be faster?

Binary search can help. First check id = (1+100)/2 = 50; since 50 < 100, continue searching in the records 50~100, then in 75~100, and so on; after about 7 steps you find the record with id = 100, a big improvement over 100 comparisons. But now there is a problem: to reach the record with id = 50 in the first step, we would have to traverse 50 records starting from id = 1. Can we locate the record with id = 50 directly? If not, even starting the first probe from id = 30 or 40 would be fine.
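
The comparison counts above can be checked with a short sketch; binary_search_steps is a hypothetical helper that simply counts probes over the ids 1..100:

```python
def binary_search_steps(target, lo=1, hi=100):
    # count how many id comparisons binary search needs in [lo, hi]
    steps = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1
        if mid == target:
            return steps
        if mid < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps

print(binary_search_steps(100))  # 7 probes instead of 100 comparisons
print(binary_search_steps(50))   # 1: the first midpoint hits directly
```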

What data structure meets this need? Remember the skip list: every n elements, one is promoted to form a first-level index; every 2*n elements, one forms a second-level index; and so on.

[figure: a skip list with first-level and second-level indexes over a linked list]

As the figure shows, take a single-level index as an example: when searching, we first search the index, locate a position there, and then continue in the linked list. For instance, to find the number 7: without the skip list we search the linked list directly, which takes 7 comparisons; with the skip list we search the first-level index first and need only 3 comparisons, 4 fewer. So we can borrow the skip-list idea to reduce the number of comparisons. The concrete scheme is as follows: every 4 records form a group, and each group gets a slot; the slot records only the largest id in its group and how many records the group contains.

[figure: records grouped four at a time, each group summarized by a slot]

Now suppose we want to locate the record with id = 9. How? Simple: first determine which slot the record is in, then traverse the records in that slot.

  1. Locate the slot: take the ids held by the smallest and largest slots (4 and 12 respectively) and binary-search between them. The midpoint is (4+12)/2 = 8; 8 is less than 9, and slot 2's largest id is 12, so we know the record with id = 9 is in slot 2.
  2. Traverse the records in slot 2: here comes a question. The records form a singly linked list, and each slot points only to the largest id in its group, so how do we reach the first record of the slot? Simple: start from slot 1's record; the element after it is exactly the first record of slot 2. Walking forward from there, the first record of slot 2 turns out to be the one we are looking for, with id = 9.
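
The two steps above can be sketched as follows; this is a simplified model (the group size of 4, the slot list, and ids 1..12 mirror the example, but real InnoDB slots differ in layout):

```python
# Each slot records the largest id of its group of (up to) 4 records.
# The records themselves form an ascending list of ids.
records = list(range(1, 13))  # ids 1..12
slots = [4, 8, 12]            # largest id of each 4-record group

def locate(target):
    # Step 1: binary-search the slots for the first slot whose max id >= target
    lo, hi = 0, len(slots) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if slots[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    slot = lo
    # Step 2: walk the group; its first record follows the previous
    # slot's largest record (or is the list head for slot 0)
    start = 0 if slot == 0 else records.index(slots[slot - 1]) + 1
    for i in range(start, len(records)):
        if records[i] == target:
            return slot, records[i]
    return slot, None

print(locate(9))  # id 9 is found in slot 2
```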

As you can see, this scheme locates a record within a page quickly. MySQL keeps 1~8 records per slot, so once the slot is located, the remaining comparisons are no problem. Of course, one page holds only a limited number of records; when a page fills up, another page must be opened to hold new records, and pages are connected by a linked list. But look at the picture below: why connect them with a doubly linked list? Don't forget the queries we listed at the beginning, order by id asc and order by id desc. In other words, records must support traversal in both directions, and that is why a doubly linked list is used.

[figure: full pages connected to each other by a doubly linked list]

The birth of the B+ tree

Now a new question arises: when there are many pages, how do we locate a record? If it happens to be in the first few pages, traversing them is still fast, but finding a record such as id = 1,000,000 could mean traversing 10,000 pages one by one (assuming 100 records per page), which is clearly unacceptable. How do we improve this? The page directory we built earlier points the way: within a page, we created a directory over the records so we could locate the slot first and then search; across pages, we can likewise locate the page first. In other words, we can also create a directory for pages, where each entry records a page number and the smallest record on that page. This directory, too, is stored in the form of pages. To distinguish them, we call the pages that make up this directory directory pages, and the pages holding complete records data pages.

[figure: a directory page whose entries point to the data pages below it]

Voice-over: directory pages have the same internal structure as data pages, slots included; they are omitted from the figure only for clarity. Apart from what they record, directory pages and data pages are structurally identical.

Finding the record with id = xxx is now very simple: go to the directory page to locate the data page, then search within that page. Since both directory pages and data pages contain slots, locating the page number in the directory page and locating the record in the data page are both very fast.
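
A minimal sketch of this two-level lookup, with made-up page numbers and rows purely for illustration:

```python
# directory page: list of (smallest id on page, page number)
directory = [(1, 10), (101, 11), (201, 12)]
# data pages keyed by page number; each maps id -> row
data_pages = {
    10: {i: f"row-{i}" for i in range(1, 101)},
    11: {i: f"row-{i}" for i in range(101, 201)},
    12: {i: f"row-{i}" for i in range(201, 301)},
}

def find(target_id):
    # 1) pick the last directory entry whose smallest id <= target
    page_no = None
    for smallest, no in directory:
        if smallest <= target_id:
            page_no = no
        else:
            break
    # 2) search inside that data page (real pages use slots + binary search)
    return data_pages[page_no].get(target_id)

print(find(150))  # row-150
```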

Of course, as pages increase, the directory page accumulates more and more entries and will eventually fill up, at which point another directory page must be created. Now the question is: how do we locate which directory page holds the id we are looking for? Easy: create yet another directory page for the directory pages, as follows.

[figure: a directory page built on top of directory pages, forming a tree]

What does the structure above remind you of? That's right, it's a B+ tree! Now I believe you have seen how a B+ tree evolves and understood its principle. This B+ tree has three levels: we call the topmost directory page the root node, and the bottom pages that store complete records the leaf nodes.

Now let's see how to find the record with id = 55. First the root node is loaded; it shows the record should be under page 30, so page 30 is loaded; page 30 shows the record should be in page 4, so page 4 is loaded into memory and searched in turn. In total this takes 3 IOs (a B+ tree with N levels takes N IOs). Once read, a page is cached in memory, and a later hit on that page is served from memory directly. Someone might ask: if the B+ tree has many levels, doesn't that mean many IOs? Let's do a simple calculation. Suppose a data page can hold 100 records and a directory page can hold 1000 entries (directory pages store only the primary key, not complete rows, so they hold far more entries). Then:

  • With only 1 level, the B+ tree is a single node holding user records and can store at most 100 records.
  • With 2 levels, it can store at most 1000 × 100 = 100,000 records.
  • With 3 levels, it can store at most 1000 × 1000 × 100 = 100,000,000 records.
  • With 4 levels, it can store at most 1000 × 1000 × 1000 × 100 = 100,000,000,000 records!

So in general a B+ tree of 3~4 levels is enough to meet our requirements, and since each page read is cached in memory afterwards (and, of course, evicted according to some replacement algorithm), overall a 3~4 level B+ tree fully satisfies our needs.
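
The capacity figures in the list above can be verified directly; 100 records per data page and 1000 entries per directory page are the article's assumed numbers:

```python
DATA_PER_PAGE = 100   # records per leaf (data) page
DIR_PER_PAGE = 1000   # entries per directory page

def max_records(levels):
    # leaves hold the records; every directory level above
    # multiplies the fan-out by DIR_PER_PAGE
    return DATA_PER_PAGE * DIR_PER_PAGE ** (levels - 1)

for n in range(1, 5):
    print(n, max_records(n))
```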

Clustered and non-clustered indexes

You have probably noticed that the B+ tree above is the index on id, i.e. the primary key. Its leaf nodes store the complete records. An index whose leaves store complete records is called a clustered index; as long as a primary key is defined, the primary key index is the clustered index.

Then what do indexes created on non-primary-key columns look like? The non-leaf nodes have exactly the same form, but the leaf nodes differ somewhat: a non-primary-key index stores the index column and the primary key value in its leaves. For example, suppose we index the age column; its index tree looks like this:

[figure: the secondary index tree built on the age column]

As you can see, non-leaf nodes store age value + page number, while leaf nodes store age value + primary key value. You may then wonder how the following SQL retrieves the complete record:

select * from user where age = xxx

The first step is clear to everyone: the SQL above hits the index on the age column and finds the matching records on the leaf nodes (if any). But those leaf records contain only the age and id columns, while select * asks for all of user's columns. What to do? The answer: take each id and look up its complete record in the clustered index. This is what we call going back to the table. Too many back-to-table lookups clearly cause performance problems, because the ids may be scattered across different pages, which means reading different pages from disk into memory; those pages are probably not adjacent, which means a lot of random IO, and that seriously hurts performance. With this, a high-frequency interview question should not be hard to understand: why can a query hit an index and yet still result in a full table scan? One reason is that although the index is hit, the records found on the leaf nodes would require a large number of back-to-table lookups, which leads the optimizer to conclude that a full table scan would actually be faster.
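
Here is a toy model of the back-to-table lookup; both "indexes" are plain dicts, which is of course a drastic simplification of real B+ trees:

```python
# clustered index: primary key -> complete row
clustered = {
    1: {"id": 1, "name": "Zhang San", "age": 20, "height": 170},
    2: {"id": 2, "name": "Li Si",     "age": 21, "height": 171},
    3: {"id": 3, "name": "Wang Wu",   "age": 20, "height": 172},
}
# secondary index on age: leaf entries hold only (age, id)
secondary_age = {20: [1, 3], 21: [2]}

def select_star_where_age(age):
    # hit the secondary index, then go back to the clustered index
    # for each id to fetch the complete row ("back to the table")
    return [clustered[pk] for pk in secondary_age.get(age, [])]

print(select_star_where_age(20))  # two complete rows via two clustered lookups
```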

Someone might ask: why don't secondary indexes store complete records? To save space, of course. Complete rows take a lot of space, and if every added index stored the complete records as well, the data redundancy would be enormous.

How can back-to-table lookups be avoided? Covering indexes. If SQL of the following form meets your needs, it is the recommended form:

select age from user where age = xxx
select age,id from user where age = xxx

It is not hard to see what these SQL statements have in common: the selected columns (age, and also id) are the index columns themselves. Once the matching records are found on the leaf nodes based on age, they already contain these columns, so no back-to-table lookup is needed, which improves performance.
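
Continuing the same toy model, a covering query can be answered entirely from the secondary index leaves (again, a dict stands in for the index):

```python
# secondary index leaf entries store (age, id): enough to answer
# `select age, id from user where age = ?` with no further lookup
secondary_age = {20: [1, 3], 21: [2]}

def select_age_id_where_age(age):
    # everything requested is already in the index leaf: no back-to-table
    return [(age, pk) for pk in secondary_age.get(age, [])]

print(select_age_id_where_age(20))  # [(20, 1), (20, 3)]
```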

Disk read-ahead

Next let's discuss a problem that confuses many people online. We know the operating system manages memory in pages; in Linux, the default page size is 4 KB. Whether loading data from disk into memory or writing memory back to disk, the operating system operates in units of pages. Even if you write only one byte to an empty file, the operating system still allocates it a full page (4 KB).

[figure: two bytes written to disk, yet a full 4 KB page is allocated]

As the figure shows, only two bytes were written to disk, yet the operating system still allocates a full page (4 KB) for them.

InnoDB also stores and reads data in units of pages, and InnoDB's default page size is 16 KB. So many people online ask: does this mean it takes 4 IOs to read one InnoDB page? No, it takes just one. Why? This requires a little understanding of how disks read data.

Disk structure

First, let's look at the physical structure of the disk

 thorough InnoDB Analysis of the minimum storage unit of the storage engine MySQL, In fact, indexing is not difficult

A hard disk's main parts are the platters, the actuator arm, the read/write heads, and the spindle. Data is written mainly to the platters. Each platter is divided into a number of sectors, and data is written and read at sector granularity. In addition, taking the platter's center as the origin, the platter is divided into a number of concentric circles; each such circle is a track.

So how is data read and written? There are three main steps:

  1. Seek: since data is stored in sectors, we first need to reach the right sector, which requires moving the head to the track containing that sector. The time this takes is called the seek time; the average seek time is generally 3-15 ms.
  2. Rotational delay: once the head reaches the right track, it is usually not yet aligned with the sector holding our data, so we must wait for the platter to rotate until that sector passes under the head. The rotational delay depends on the spindle speed and is usually expressed as half the time of one full revolution. For example, a 7200 rpm disk has an average rotational delay of about 60 × 1000 / 7200 / 2 = 4.17 ms, while a 15000 rpm disk averages 2 ms.
  3. Data transfer: after the first two steps, the head finally starts reading or writing data. Current IDE/ATA interfaces reach 133 MB/s and SATA II reaches 300 MB/s; the transfer time is usually far smaller than the time consumed by the first two steps and can be neglected.
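
The rotational-delay figures in step 2 can be checked with simple arithmetic:

```python
def avg_rotational_delay_ms(rpm):
    # one revolution takes 60 * 1000 / rpm milliseconds;
    # on average we wait half a revolution
    return 60 * 1000 / rpm / 2

print(round(avg_rotational_delay_ms(7200), 2))  # ~4.17 ms
print(avg_rotational_delay_ms(15000))           # 2.0 ms
```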

Note that neglecting the transfer time has a precondition: the sectors being read must be contiguous and adjacent, i.e. what we usually call sequential IO. Sequential disk IO can rival or even exceed random memory access in read/write speed, so this part of the time can be ignored. (One important reason Kafka performs so well is its use of sequential disk reads and writes.) But if the data to be read is scattered across different sectors, it becomes random IO, which undoubtedly adds seek time and rotational delay on every access, and performance suffers badly. (A typical case is the back-to-table lookup mentioned above, where large numbers of ids scattered across different pages cause a lot of random IO.)

[figure: ACM Queue chart comparing sequential disk IO with random memory IO]

As the figure shows (a performance comparison chart from ACM Queue), sequential disk IO (Sequential Disk) can be faster than random access to memory (Random memory).

Now, why does reading one InnoDB page count as a single IO? You have probably guessed: the page is allocated contiguously, which means its sectors are adjacent, so reading it is sequential IO.

The operating system manages memory in pages and can load several pages at a time. InnoDB's page size, 16 KB, is exactly 4 times the operating system page (4 KB), so the OS can be instructed to read 4 consecutive operating system pages, i.e. 16 KB, starting at the read address. This is what we call disk read-ahead. By now it should not be hard to see why reading one InnoDB page is actually one IO rather than four.

Summary

Having read this article, you should understand where indexes come from, and you should also know a good deal about how pages and disk read-ahead improve performance. MySQL's actual page structure differs slightly from the structure we derived here, but that does not affect the overall understanding.

Copyright notice
Author: A sharer who loves Java. Please include the original link when reprinting. Thank you.
https://en.cdmana.com/2022/01/202201262153292845.html
