
[Redis Core Principles Series] (1) How the LFU-based hot key discovery mechanism works, and the principle behind the SCAN command

2022-01-26 23:56:43 InfoQ

Introduction

Access hotspots are inevitable in any business, and Redis is no exception. However, finding the hot keys has long troubled users. Redis 4.0 brings us many new features, among them an LFU-based hot key discovery mechanism.

Least Frequently Used

Least Frequently Used, abbreviated LFU, is a new memory eviction strategy introduced in Redis 4.0. From the name it is easy to associate LFU with key access frequency. However, the initial 4.0 release only used it for memory eviction and did not expose the recorded access frequency in a useful way. After some rework, Redis 4.0.3 officially added an LFU-based hot key discovery mechanism.
The LFU algorithm
Every object in Redis has 24 bits of space to record its LRU/LFU information:

typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits access time). */
    int refcount;
    void *ptr;
} robj;

When these 24 bits are used for LFU, they are split into two parts:

  • The high 16 bits record the access time (in minutes), playing the LRU role.
  • The low 8 bits record the access frequency, referred to below as the counter, playing the LFU role.
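
As a small illustration (my own sketch, not Redis source), the snippet below packs and unpacks such a 24-bit field; the very same shifts and masks appear in the real LFUDecrAndReturn and updateLFU functions quoted later in this article:

#include <stdio.h>

int main(void) {
    unsigned long now_minutes = 12345; /* access time in minutes (the real
                                        * code truncates this to 16 bits) */
    unsigned long counter = 5;         /* LFU_INIT_VAL, the initial counter */

    /* Pack: high 16 bits = access time, low 8 bits = counter. */
    unsigned long lru = (now_minutes << 8) | counter;

    /* Unpack, exactly as LFUDecrAndReturn does. */
    printf("ldt=%lu counter=%lu\n", lru >> 8, lru & 255);
    return 0;
}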
counter: a probability-based logarithmic counter
The maximum value of an 8-bit field is 255. Are 8 bits really enough to record access frequency?

For the counter, Redis uses a trick: instead of a simple linear counter, it is implemented as a probability-based logarithmic counter, with the following algorithm:

uint8_t LFULogIncr(uint8_t counter) {
    if (counter == 255) return 255;
    double r = (double)rand()/RAND_MAX;
    double baseval = counter - LFU_INIT_VAL;
    if (baseval < 0) baseval = 0;
    double p = 1.0/(baseval*server.lfu_log_factor+1);
    if (r < p) counter++;
    return counter;
}
The corresponding probability formula, i.e. the chance that the counter is incremented on a given access, is:
1/((counter-LFU_INIT_VAL)*server.lfu_log_factor+1)

where LFU_INIT_VAL is 5. Looking at the probability curve gives a more intuitive picture; take the default server.lfu_log_factor=10 as an example:

(figure: counter increment probability as a function of the current counter value)
As the curve shows, the larger the counter, the smaller the probability of it being incremented, so 8 bits is indeed enough to record very high access frequencies. The following table (reproduced from the comments in redis.conf) shows the correspondence between different probability factors server.lfu_log_factor and the counter values reached after a given number of accesses:

+--------+------------+------------+------------+------------+------------+
| factor | 100 hits   | 1000 hits  | 100K hits  | 1M hits    | 10M hits   |
+--------+------------+------------+------------+------------+------------+
| 0      | 104        | 255        | 255        | 255        | 255        |
| 1      | 18         | 49         | 255        | 255        | 255        |
| 10     | 10         | 18         | 142        | 255        | 255        |
| 100    | 8          | 11         | 49         | 143        | 255        |
+--------+------------+------------+------------+------------+------------+
In other words, with the default server.lfu_log_factor of 10, the 8-bit counter only saturates at 255 after roughly 1 million accesses.
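
If you want to verify these numbers yourself, here is a small standalone simulation (a sketch that re-implements LFULogIncr outside of Redis, with LFU_INIT_VAL and the log factor hard-coded to the defaults); its output should land near the table values above, give or take the randomness:

#include <stdio.h>
#include <stdlib.h>

#define LFU_INIT_VAL 5

/* Same logic as Redis's LFULogIncr, with server.lfu_log_factor as a parameter. */
static unsigned char lfu_log_incr(unsigned char counter, int lfu_log_factor) {
    if (counter == 255) return 255;
    double r = (double)rand() / RAND_MAX;
    double baseval = counter - LFU_INIT_VAL;
    if (baseval < 0) baseval = 0;
    double p = 1.0 / (baseval * lfu_log_factor + 1);
    if (r < p) counter++;
    return counter;
}

int main(void) {
    long hits[] = {100, 1000, 100000, 1000000};
    for (int i = 0; i < 4; i++) {
        unsigned char counter = LFU_INIT_VAL;
        for (long n = 0; n < hits[i]; n++)
            counter = lfu_log_incr(counter, 10); /* default factor */
        printf("%8ld hits -> counter %d\n", hits[i], counter);
    }
    return 0;
}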
The counter decay factor
From the growth function LFULogIncr we can see that as a key's traffic grows, its counter eventually converges to 255. This raises a problem: if the counter only ever grows and never decays, it cannot distinguish today's hot keys from yesterday's.

To solve this problem, Redis provides a decay factor, server.lfu_decay_time, whose unit is minutes. The calculation is simple: if a key goes unaccessed for a while, its counter is reduced, and the size of the reduction is controlled by the decay factor:

unsigned long LFUDecrAndReturn(robj *o) {
    unsigned long ldt = o->lru >> 8;
    unsigned long counter = o->lru & 255;
    unsigned long num_periods = server.lfu_decay_time ?
                                LFUTimeElapsed(ldt) / server.lfu_decay_time : 0;
    if (num_periods)
        counter = (num_periods > counter) ? 0 : counter - num_periods;
    return counter;
}

With the default value of 1, a key that has not been accessed for N minutes has its counter reduced by N; for example, a key whose counter is 100 and which goes unvisited for 10 minutes decays to a counter of 90.

Both the probability factor and the decay factor are configurable. The Redis defaults are recommended:

lfu-log-factor 10
lfu-decay-time 1
Hot key discovery
Having introduced the LFU algorithm, we can now turn to the hot key discovery mechanism we actually care about.

The core idea is that on every read or write access to a key, the 24-bit LFU field holding the access time and the counter is updated, so that every key carries a current LFU value:

void updateLFU(robj *val) {
    unsigned long counter = LFUDecrAndReturn(val);
    counter = LFULogIncr(counter);
    val->lru = (LFUGetTimeInMinutes()<<8) | counter;
}

So how does a user read the access frequency? Redis provides the OBJECT FREQ subcommand to fetch the LFU information. Note that the maxmemory policy must be set to allkeys-lfu or volatile-lfu first; otherwise an error is returned:

127.0.0.1:6379> config get maxmemory-policy
1) "maxmemory-policy"
2) "noeviction"
127.0.0.1:6379> object freq counter:000000006889
(error) ERR An LFU maxmemory policy is not selected, access frequency not tracked. Please note that when switching between policies at runtime LRU and LFU data will take some time to adjust.

127.0.0.1:6379> config set maxmemory-policy allkeys-lfu
OK
127.0.0.1:6379> object freq counter:000000006889
(integer) 3

This immediately suggests a simple algorithm: use the SCAN command to traverse all keys, fetch each key's access frequency via OBJECT FREQ, and sort; the keys at the top are the hot keys. A sketch of this approach follows.
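
To make that concrete, below is a minimal sketch of the approach using the hiredis client library (an assumption on my part; the article itself only uses redis-cli). Error handling, deduplication of SCAN results, and the final sort are omitted for brevity:

#include <stdio.h>
#include <string.h>
#include <hiredis/hiredis.h>

int main(void) {
    redisContext *c = redisConnect("127.0.0.1", 6379);
    if (c == NULL || c->err) { fprintf(stderr, "connect failed\n"); return 1; }

    char cursor[32] = "0";
    do {
        /* COUNT bounds the number of buckets visited per call,
         * not the number of keys returned. */
        redisReply *r = redisCommand(c, "SCAN %s COUNT 100", cursor);
        snprintf(cursor, sizeof(cursor), "%s", r->element[0]->str);

        redisReply *keys = r->element[1];
        for (size_t i = 0; i < keys->elements; i++) {
            /* Requires an LFU maxmemory policy, as explained above. */
            redisReply *freq = redisCommand(c, "OBJECT FREQ %s",
                                            keys->element[i]->str);
            if (freq->type == REDIS_REPLY_INTEGER)
                printf("%lld\t%s\n", freq->integer, keys->element[i]->str);
            freeReplyObject(freq);
        }
        freeReplyObject(r);
    } while (strcmp(cursor, "0") != 0);

    redisFree(c);
    return 0;
}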

Redis 4.0.3 also adds a hot key discovery feature to redis-cli itself: simply run redis-cli with the --hotkeys option. An example:

$./redis-cli --hotkeys

# Scanning the entire keyspace to find hot keys as well as
# average sizes per key type. You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).

[00.00%] Hot key 'counter:000000000002' found so far with counter 87
[00.00%] Hot key 'key:000000000001' found so far with counter 254
[00.00%] Hot key 'mylist' found so far with counter 107
[00.00%] Hot key 'key:000000000000' found so far with counter 254
[45.45%] Hot key 'counter:000000000001' found so far with counter 87
[45.45%] Hot key 'key:000000000002' found so far with counter 254
[45.45%] Hot key 'myset' found so far with counter 64
[45.45%] Hot key 'counter:000000000000' found so far with counter 93

-------- summary -------

Sampled 22 keys in the keyspace!
hot key found with counter: 254 keyname: key:000000000001
hot key found with counter: 254 keyname: key:000000000000
hot key found with counter: 254 keyname: key:000000000002
hot key found with counter: 107 keyname: mylist
hot key found with counter: 93 keyname: counter:000000000000
hot key found with counter: 87 keyname: counter:000000000002
hot key found with counter: 87 keyname: counter:000000000001
hot key found with counter: 64 keyname: myset

As you can see, the keys at the top of the summary are the hot keys.


An introduction to the SCAN operation

As anyone familiar with Redis knows, it is single-threaded. Commands whose time complexity is O(N) must therefore be used with great care: one careless call can block the process and make Redis stall.

Sometimes we need to operate on all keys that meet some condition, for example deleting every key that starts with test_. So how do we get those keys? Before Redis 2.8 we could use the KEYS command with a glob pattern to fetch the keys we need, but this command has two drawbacks:
  • There is no limit: we can only fetch all matching keys at once, and if there are millions of results, what awaits you is an "endless" wall of output.
  • KEYS is a full traversal with O(N) time complexity. As just mentioned, such a command can easily stall the Redis service, so we should avoid it in production.

So how do we choose between meeting the requirement and risking a Redis stall? Facing this dilemma, Redis 2.8 gave us a solution: the SCAN command.

Advantages of the SCAN operation

Compared with the KEYS command, SCAN has two obvious advantages:

  • Although SCAN's time complexity is also O(N) overall, the work is spread across many small calls, so it does not block the thread for long.
  • SCAN provides a COUNT parameter that lets you bound how much is scanned (and thus, roughly, how much is returned) on each call.

These two advantages solve the problems above. SCAN is not perfect, however: the results it returns may contain duplicates, so the client must deduplicate them itself.

To summarize, the SCAN command has the following characteristics:

  • The returned results may contain duplicates and the client needs to deduplicate them; this is very important.
  • If data is modified during the traversal, it is undefined whether the modified data will be visited.
  • A single empty result does not mean the traversal has ended; the traversal is only finished when the returned cursor value is zero.

SCAN usage example

// The first argument is the cursor, the second the match pattern, the third the count.
// Note: COUNT limits the number of dictionary buckets the server traverses per call,
// not the number of keys returned.

127.0.0.1:6379> scan 0 match key99* count 1000
1) "13976"
2)  1) "key9911"
    2) "key9974"
    3) "key9994"
    4) "key9910"
    5) "key9907"
    6) "key9989"
    7) "key9971"
    8) "key99"
    9) "key9966"
   10) "key992"
   11) "key9903"
   12) "key9905"
127.0.0.1:6379> scan 13976 match key99* count 1000
1) "1996"
2)  1) "key9982"
    2) "key9997"
    3) "key9963"
    4) "key996"
    5) "key9912"
    6) "key9999"
    7) "key9921"
    8) "key994"
    9) "key9956"
   10) "key9919"
127.0.0.1:6379> scan 1996 match key99* count 1000
1) "12594"
2)  1) "key9939"
    2) "key9941"
    3) "key9967"
    4) "key9938"
    5) "key9906"
    6) "key999"
    7) "key9909"
    8) "key9933"
    9) "key9992"
......
127.0.0.1:6379> scan 11687 match key99* count 1000
1) "0"
2)  1) "key9969"
    2) "key998"
    3) "key9986"
    4) "key9968"
    5) "key9965"
    6) "key9990"
    7) "key9915"
    8) "key9928"
    9) "key9908"
   10) "key9929"
   11) "key9944"...

Next, let's look at how SCAN works, mainly from the perspective of the underlying data structure and the source code.

Redis's underlying structure

Redis uses a hash table as the underlying implementation of its keyspace, because it is efficient and simple to implement. Speaking of hash tables, many Java programmers' first thought is HashMap, and indeed the storage structure of Redis keys is similar to HashMap: an array whose elements each anchor a linked list. The size of the first-dimension array is always 2^n (n >= 0), and every expansion doubles its length.

The SCAN command traverses this one-dimensional array. The cursor value returned by each call is simply an index into the array. The COUNT parameter indicates how many array elements to traverse, and the call returns the qualifying keys attached under those elements. Because the linked list under each element has a different length, the number of results returned by each call varies.
The traversal order of SCAN
We can observe the traversal order of the SCAN command with a small example.

127.0.0.1:6379> keys *
1) "db_number"
2) "key1"
3) "myKey"
127.0.0.1:6379> scan 0 MATCH * COUNT 1
1) "2"
2) 1) "db_number"
127.0.0.1:6379> scan 2 MATCH * COUNT 1
1) "1"
2) 1) "myKey"
127.0.0.1:6379> scan 1 MATCH * COUNT 1
1) "3"
2) 1) "key1"
127.0.0.1:6379> scan 3 MATCH * COUNT 1
1) "0"
2) (empty list or set)

Our Redis instance holds 3 keys, and each call traverses only one element of the one-dimensional array. As shown above, the traversal order of the SCAN command is

0->2->1->3

This order looks strange at first, but it becomes easier to understand once converted to binary:

00->10->01->11

We find that each step in this sequence adds 1 at the high bit. Ordinary binary addition works from right to left with carries; this sequence instead adds and carries from left to right. This is confirmed in the Redis source code.

In the dictScan function of the dict.c file, the cursor is processed as follows:

v = rev(v);
v++;
v = rev(v);

That is: reverse the bits of the cursor, add one, then reverse them back again, which is exactly the "add 1 at the high bit" operation just described.
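
To see the ordering emerge, here is a tiny standalone sketch (my own illustration, not Redis source; the real rev() in Redis reverses a full machine word and dictScan handles masking differently). It applies the reverse-add-reverse step to a 2-bit cursor, matching the 4-bucket example above:

#include <stdio.h>

/* Reverse only the low `bits` bits of v (enough for this illustration). */
static unsigned rev_bits(unsigned v, int bits) {
    unsigned r = 0;
    for (int i = 0; i < bits; i++) {
        r = (r << 1) | (v & 1);
        v >>= 1;
    }
    return r;
}

int main(void) {
    int bits = 2;    /* table size 4, as in the example above */
    unsigned v = 0;
    do {
        printf("%u ", v);
        v = rev_bits(v, bits);
        v++;
        v = rev_bits(v, bits);
    } while (v != 0);
    printf("\n");    /* prints: 0 2 1 3 */
    return 0;
}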

You may wonder why this order is used for traversal instead of the ordinary 0, 1, 2, ... order. The reason is that the traversal must cope with the dictionary expanding or shrinking along the way (one has to admire how thoroughly the developers considered this).

Let's first see how the traversal proceeds if an expansion happens during a SCAN. Suppose the original array has 4 elements, so the index has two bits; it is then expanded to 3 bits and a rehash takes place.

All elements originally attached under bucket xx are redistributed to buckets 0xx and 1xx. If the dict rehashes just as we are about to traverse bucket 10, SCAN resumes from bucket 010, and buckets 000 and 100 (the elements originally attached under 00) are not traversed again.
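
You can convince yourself of this redistribution by computing bucket indexes directly; the sketch below (my own illustration with made-up hash values) masks the same hashes with a 2-bit and then a 3-bit sizemask, just as the dict does before and after the expansion:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t hashes[] = {0xA2, 0x5E, 0x13, 0xC7}; /* pretend hash values */
    for (int i = 0; i < 4; i++) {
        unsigned before = (unsigned)(hashes[i] & 0x3); /* 4 buckets: 2-bit index */
        unsigned after  = (unsigned)(hashes[i] & 0x7); /* 8 buckets: 3-bit index */
        printf("hash 0x%02llx: bucket %u -> bucket %u\n",
               (unsigned long long)hashes[i], before, after);
    }
    return 0;
}

The two keys in old bucket 10 land in 010 and 110, and those in old bucket 11 land in 011 and 111: the old index survives as the low bits of the new one, with one more hash bit exposed on top.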

Now consider shrinking. Suppose the dict shrinks from 3 bits back to 2 bits just as we are about to traverse bucket 110; SCAN then resumes from bucket 10. The elements attached under 010 will be iterated again, while the elements before 010 will not. So during a shrink, some duplicate results may still appear.
Redis's rehash
Rehashing is a complicated process. In order not to block the Redis process, Redis adopts a progressive (incremental) rehash mechanism.

/* Dictionary */
typedef struct dict {
    // type-specific functions
    dictType *type;
    // private data
    void *privdata;
    // hash tables
    dictht ht[2];
    // rehash index
    // -1 when rehash is not in progress
    int rehashidx; /* rehashing not in progress if rehashidx == -1 */
    // number of safe iterators currently running
    int iterators; /* number of iterators currently running */
} dict;
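
For reference, each of the two hash tables in ht[2] is a dictht; its definition (as it appears in dict.h of the same Redis era) is useful context for the dictRehash code below:

typedef struct dictht {
    dictEntry **table;      /* the bucket array (our "one-dimensional array") */
    unsigned long size;     /* number of buckets, always a power of two */
    unsigned long sizemask; /* size - 1, masks a hash into a bucket index */
    unsigned long used;     /* number of entries currently stored */
} dictht;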

In the Redis dictionary structure there are two hash tables, a new one and an old one. During a rehash, Redis migrates the elements of the old table to the new table step by step. Let's look at the source code of the dict rehash operation:

/* Performs N steps of incremental rehashing. Returns 1 if there are still
 * keys to move from the old to the new hash table, otherwise 0 is returned.
 *
 * Note that a rehashing step consists in moving a bucket (that may have more
 * than one key as we use chaining) from the old to the new hash table, however
 * since part of the hash table may be composed of empty spaces, it is not
 * guaranteed that this function will rehash even a single bucket, since it
 * will visit at max N*10 empty buckets in total, otherwise the amount of
 * work it does would be unbound and the function may block for a long time. */
int dictRehash(dict *d, int n) {
    int empty_visits = n*10; /* Max number of empty buckets to visit. */
    if (!dictIsRehashing(d)) return 0;

    while(n-- && d->ht[0].used != 0) {
        dictEntry *de, *nextde;

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        while(d->ht[0].table[d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        de = d->ht[0].table[d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        while(de) {
            uint64_t h;

            nextde = de->next;
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    /* Check if we already rehashed the whole table... */
    if (d->ht[0].used == 0) {
        zfree(d->ht[0].table);
        d->ht[0] = d->ht[1];
        _dictReset(&d->ht[1]);
        d->rehashidx = -1;
        return 0;
    }

    /* More to rehash... */
    return 1;
}

From the comments we can see that the rehash process migrates data with the bucket as its basic unit; the so-called bucket is just an element of the one-dimensional array mentioned earlier, so one linked list is migrated at a time. Step by step:

  • First, check whether a rehash is actually in progress; if not, return immediately.
  • Then perform up to n steps of rehashing, each step also checking that the old table still holds elements, for safety.
  • Before migrating a bucket, assert that the bucket index to migrate is within bounds.
  • Skip empty buckets; the empty_visits variable caps the number of empty buckets that may be visited, which keeps the function from blocking Redis for too long.
  • Next comes the migration itself: every element of the current bucket is rehashed into the new table, and the element counts of both tables are updated.
  • After each migrated bucket, the old table's slot is pointed at NULL.
  • Finally, check whether the whole migration is complete; if so, free the old table and reset the rehash index, otherwise tell the caller that data remains to be migrated.
  • Because Redis rehashes progressively, SCAN scans both the old and the new table when a rehash is in progress and returns the combined results to the client.

Copyright notice
Author: InfoQ. Please include the original link when reprinting, thank you.
https://en.cdmana.com/2022/01/202201262356372077.html
