
"Diminishing marginal benefit" theory in Computer Science

2022-01-27 04:59:34 Cold spring HQ


Diminishing marginal returns is not just an economics term. The same phenomenon shows up in the field of computer hardware.

In economics, diminishing marginal returns comes from human psychology: when you consume something for the first time, the stimulus is strong and your satisfaction is high; but as you keep consuming the same thing, that is, as the same stimulus repeats over and over, your excitement or satisfaction inevitably declines.

In computer science, whether on a single machine or in a distributed cluster, there is a similar problem: the larger the system, the higher the cost of maintaining it.

In economics, the cause of this effect is human psychology. In computer science, the cause is a set of bottlenecks.

"Continuing to consume the same thing" in economics means adding more of an item. The counterpart in computer science is scaling up the system.

System scalability

Shared architectures

When load increases and more processing power is needed, the simplest approach is to buy a more powerful machine (sometimes called vertical scaling, or scaling up). Many CPUs, memory chips, and disks are managed by a single operating system, and a fast internal bus lets every CPU access all of the memory and disks. In such a shared-memory architecture, the whole collection of components can be treated as one big machine.

The problem with a shared-memory architecture is that cost grows faster than linearly: if you double the number of CPUs in a machine, double the RAM, and double the disk capacity, the total cost more than doubles. And because of various performance bottlenecks, a machine with twice the hardware specs may still not be able to handle twice the load.

A shared-memory architecture offers only limited fault tolerance. High-end servers can hot-swap many components (disks, memory modules, even CPUs can be replaced without shutting the machine down), but the machine is obviously still confined to a single geographic location and cannot provide fault tolerance across regions.
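The claim that doubling the hardware does not double the throughput can be sketched with Amdahl's law, assuming (the fraction here is invented for illustration) that some share of the workload cannot be parallelized:

```python
def speedup(n_cpus, parallel_fraction):
    """Amdahl's law: overall speedup is capped by the serial fraction."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cpus)

# With 90% of the work parallelizable, doubling CPUs from 8 to 16
# improves throughput by only ~36%, not 100%:
print(round(speedup(8, 0.9), 2))   # 4.71
print(round(speedup(16, 0.9), 2))  # 6.4
```

The serial 10% of the work is a fixed toll every configuration pays, so each additional CPU buys less than the one before it: diminishing marginal returns in its purest hardware form.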

The other approach is the shared-disk architecture: multiple servers, each with its own CPUs and memory, store their data on a shared disk array, with the servers and the array typically connected by a fast network. This architecture suits some data-warehouse workloads, but its ability to scale further is usually limited by resource contention and locking overhead.

Shared-nothing architectures

By comparison, the shared-nothing architecture (also called horizontal scaling, or scaling out) has attracted a lot of attention. In this architecture, each machine or virtual machine running the database software is called a node. Every node uses its own local CPUs, memory, and disks independently. All coordination and communication between nodes runs over a conventional network (Ethernet), and the core logic is implemented mainly in software.

Shared-nothing systems require no specialized hardware, so they are cost-effective. They can distribute data across multiple geographic regions, reducing access latency for users and continuing to operate even if an entire data center is lost in a disaster. With cloud virtual machines, even a small company without Google-scale resources can easily run a cross-region distributed architecture.
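As a minimal sketch of how a shared-nothing system spreads data across independent nodes (the node names and the partitioning scheme here are hypothetical, not from any particular database), keys can be routed to nodes by hashing:

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names

def node_for(key: str) -> str:
    """Route a key to a node by hashing it: each node owns a disjoint
    slice of the data, sharing neither memory nor disk with the others."""
    digest = hashlib.sha1(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Every client computes the same mapping independently, so no shared
# component is needed to locate a key.
print(node_for("user:42"))
```

One caveat worth noting: naive hash-modulo routing reshuffles almost every key when the node list changes, which is why real systems tend to use a fixed number of partitions or consistent hashing instead.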

Shared vs. shared-nothing

When many nodes work on the same task at the same time, they always have to share something in order to coordinate. What is shared can be the memory or disks discussed above, or it can be the network. But by convention, an architecture that shares only the network is called "shared-nothing".

Although the distributed shared-nothing architecture has many advantages, it also adds complexity for applications and sometimes even limits the data models that are practical to use. In extreme cases, a simple single-threaded program can outperform a cluster with more than 100 CPU cores. On the other hand, shared-nothing systems can also achieve very strong performance.

Understanding distributed systems and CAP from another perspective

References:

The essence of distributed computing

Distributed systems were born from the contradiction between people's ever-growing demand for performance and the backward x86 architecture.

People tried to use networks and large numbers of cheap PCs, plus a fierce round of mathematical operations, to build a machine that is macroscopically faster and can carry a higher load, replacing expensive minicomputers and mainframes.

Single server vs. distributed computing: the problem of system scale

The design of distributed systems escaped the expensive high-end server, but it did not escape the von Neumann architecture. The bottlenecks of a single machine still appear in distributed systems. Specifically:

  • A single machine communicates over a bus, and the bus becomes the bottleneck on data-transfer rate.
  • Network-based distributed computing essentially treats the network as a bus, so it still cannot escape the data-transfer bottleneck of inter-node communication and coordination.
    • Each machine is equivalent to an arithmetic logic unit plus memory.
    • The master node is equivalent to the control unit plus the input/output devices.

The bottleneck of distributed computing

Whatever the topology, the traffic of the whole system eventually lands on some specific resource. That resource may of course be a group of machines, but a serious problem remains: the larger the system, the more performance it loses behind the scenes. Because inter-node communication and coordination is involved, getting a huge number of nodes to work together means that passing commands and data around consumes a large share of the running time.
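A toy model (all the numbers are invented) of why per-node usefulness shrinks as a cluster grows: if every pair of nodes must exchange coordination traffic, coordination cost grows quadratically while raw compute grows only linearly.

```python
def useful_fraction(n_nodes, coord_cost=0.0002):
    """Toy model: each of the n*(n-1)/2 node pairs adds a fixed
    coordination overhead; whatever time is left does useful work."""
    overhead = coord_cost * n_nodes * (n_nodes - 1) / 2
    return max(0.0, 1.0 - overhead)

for n in (10, 100, 500):
    # effective capacity = nodes * fraction of time doing useful work
    print(n, round(n * useful_fraction(n), 1))
```

Under this (deliberately pessimistic) model, effective capacity first rises with cluster size, then peaks, then collapses as coordination eats everything: the marginal return of each added node eventually goes negative.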

The performance problems of distributed systems show up in many forms, but at bottom they are the contradiction between people's ever-growing demand for performance and data consistency. Once strong consistency (Consistency) is required, there must be a bottleneck that limits performance, and that bottleneck is the speed at which information can travel (which caps Availability).
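A minimal sketch (the latency distribution is invented) of why strong consistency costs performance: a strongly consistent write must wait for acknowledgements from a majority of replicas, so its latency is set by the slower replicas rather than the fastest one.

```python
import random

random.seed(0)

def write_latency(n_replicas, quorum):
    """Latency of one write: wait until `quorum` replicas have acked.
    Per-replica latencies are drawn from a toy distribution (ms)."""
    acks = sorted(random.gauss(10, 4) for _ in range(n_replicas))
    return acks[quorum - 1]  # time when the quorum-th ack arrives

# Eventually consistent: ack after the first replica responds.
# Strongly consistent: ack only after a majority (3 of 5) respond.
fast = sum(write_latency(5, 1) for _ in range(1000)) / 1000
safe = sum(write_latency(5, 3) for _ in range(1000)) / 1000
print(round(fast, 1), round(safe, 1))
```

The gap between the two averages is exactly the price of waiting for agreement, and it widens as replicas get slower or farther apart.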

So, where exactly is the bottleneck on the speed of information transmission?

Thus, when applications scaled up from single machines to distributed systems, the basic contradiction of data-intensive application design changed from "the contradiction between people's ever-growing demand for performance and the backward x86 architecture" into "the contradiction between people's ever-growing demand for performance and data consistency".

But (as the CAP theorem tells us) this new contradiction has no solution. Why is it unsolvable? In my personal view, the bottleneck on information transmission is determined by the state of human hardware manufacturing; one level deeper, it is determined by the von Neumann architecture; and at the very bottom, it is determined by the logical model of the Turing machine. But the Turing machine is the theoretical foundation of the computer's feasibility, so let's blame the entropy-increasing universe instead: since our universe is one in which entropy increases, the problem cannot be solved. Why does maintenance cost rise with scale? Universe, you're a grown-up now; it's about time you learned to become an entropy-decreasing universe (just kidding).

Copyright notice
Author: Cold spring HQ. Please include a link to the original when reprinting. Thank you.
