本文是我在上UCSD的 CSE 120: Principles of Operating Systems (Winter 2020) 整理的笔记,这一课主要介绍了分布式系统中最基础的一些概念的方法。
Basic
-
What is a distributed system?
-
Cooperating processes in a computer network
-
Degree of integration
- Loose: Internet applications, email, web browsing
- Medium: remote execution, remote file systems
- Tight: process migration, distributed file systems
-
Advantage
- Speed: parallelism, less contention
- Reliability: redundancy, fault tolerance, “NSPF”
- Scalability: incremental growth, economy of scale
- Geographic distribution: low latency, reliability
-
Disadvantages
-
Fundamental problems of decentralized control
- State uncertainty: no shared memory or clock
- Action uncertainty: matually conflicting decisions
-
Distributed algorithms are complex
-
-
-
Is distribution better?
-
Single fast server with single queue
-
Multiple slower servers with separate queues
- Typically better than the first one
-
Multiple slower servers, single queue
- Better than the first two
-
Little’s Law:
$N = \lambda W$
- Use to calculate processing time (assuming stable)
-
Models used in cooperating system
-
The Client/Server Model
-
Client
- Short-lived process that makes requests
- “User-side” of application
-
Server
- Exports well-defined requests/response interface
- Long-lived process that waits for requests
- Upon receiving request, carries it out (may spawn)
-
-
Peer-to-Peer
-
A peer talks directly with another peer
- No intermediary (e.g., central servel) involved
- Symmetric (unlike asymmetric client/server)
-
In actuality, may be dynamic client/server
- A requests file from B; A acts as client, B as server
- C can now request file from A; A acts as server
-
Distributed Algorithms
-
Event ordering
-
Order events given no shared clock/memory
-
Happened-before relations:
$\rightarrow$
- A, B events in same process and A before B:
$A\rightarrow B$
- A is a send event, B is a receive event:
$A\rightarrow B$
- If
$A\rightarrow B$
and$B\rightarrow C$
, then$A\rightarrow C$
- A, B events in same process and A before B:
-
Implementation
- Timestamp all events based on local clock
- Upon receiving a message, advance local clock
- Resolve ties by ordering machines
-
Example
-
Example 1 (When timing conflicting, follow rules above):
-
Example 2:
-
-
-
Leader election
-
Many distributed algorithms rely on leader
-
Need to determine if leader exists; if not elect
-
Bully algorithm (elect leader L)
- Every process is numbered (priority): P1, P2, …
$P_j$
sends request to L, no reply; tries to elect itself$P_j$
sends “Can I be leader?” to all$P_{k>j}$
- No replies,
$P_j$
sends to all$P_{i<j}$
, “I am leader”, done - If some
$P_{k>j}$
replies,$P_j$
let$P_k$
try to elect itself - If no message from
$P_k$
,$P_j$
tries to elect itself again
-
-
Mutual exclusion
-
Centralized approach
- Single process acts as coordinator server
- Request, reply (to allow entrance), release
-
Distributed approach
- Process sends time-stamped request to all others
- Waits until it receives replies from all (ok to other)
- Enter critical section (may get requests, defers)
- Upon exiting, responds (to release) to all deferred
- Use timestamps to order “simultaneous” requests
-