Cache Coherence (3) - Coherence Protocols

Cache Coherence States | Snooping Protocol | Directory Protocol | MOESI/MSI

Finite State Machine

AKA coherence controller.

Cache Coherence Controller

Coherence Protocols

Specifying Protocols

  • States: for example, not readable or writable (N), read-only (RO), and read-write (RW). These are the states of a block in the cache.
  • Events: for example, a load, a store, or an incoming coherence request to obtain read-write state.
  • Transitions: the action taken, and the next state entered, for each state/event pair.
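
| State | Load | Store | Incoming coherence request to obtain read-write state |
| --- | --- | --- | --- |
| N | Issue request to get RO | Issue request to get RW | NA |
| RO | Give data | Issue request to get RW | NA /N |
| RW | Give data | Write data | Send block to requestor /N |

NA means no action; an entry of the form "action /X" means take the action and move to state X.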
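
The table above can be read as a lookup from (state, event) to (action, next state). The following is a minimal sketch of that idea, not a real controller: the names `TABLE` and `step` and the event string `incoming_rw_request` are made up for illustration, and a `None` next state is an assumption meaning "the table lists no state change".

```python
# Hypothetical encoding of the N / RO / RW table.
# Each cell is (action, next_state); next_state None means "stay in the same state".
NO_ACTION = "no action"

TABLE = {
    "N": {
        "load": ("issue request to get RO", None),
        "store": ("issue request to get RW", None),
        "incoming_rw_request": (NO_ACTION, None),
    },
    "RO": {
        "load": ("give data", None),
        "store": ("issue request to get RW", None),
        "incoming_rw_request": (NO_ACTION, "N"),
    },
    "RW": {
        "load": ("give data", None),
        "store": ("write data", None),
        "incoming_rw_request": ("send block to requestor", "N"),
    },
}


def step(state, event):
    """Look up one table cell and return (action, next_state)."""
    action, next_state = TABLE[state][event]
    return action, (state if next_state is None else next_state)
```

For example, `step("RW", "incoming_rw_request")` gives up the block and drops to N, while `step("RO", "load")` is a hit that leaves the state unchanged.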

States

Implemented states should capture the following properties of a block:

  • Validity: a valid block holds the most up-to-date value. It can be read, but it can be written only if it is also exclusive.
  • Dirtiness: a dirty block holds the most up-to-date value, but that value has not yet been written back to the lower-level memory.
  • Exclusivity: no other private cache has a copy of the block.
  • Ownership: the owner is responsible for responding to coherence requests for that block.

MOESI/MSI

  • Modified: Valid, exclusive, owned, and maybe dirty. Can read and write.
  • Owned: Valid, owned, and maybe dirty. Read-only. Other cores may also hold read-only copies, but they are not owners.
  • Exclusive: Valid, exclusive, and clean. Read-only.
  • Shared: Valid and clean. Read-only.
  • Invalid: Invalid.
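
One way to keep the five states straight is to write each one down as a combination of the four properties. The sketch below does that under stated assumptions: `StateProps`, `MOESI`, `MSI`, `can_read`, and `can_write` are illustrative names, and E is marked as not owned only because the list above does not call it owned (protocols differ on this point).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateProps:
    valid: bool
    exclusive: bool
    owned: bool
    may_be_dirty: bool
    permission: str  # "read-write", "read-only", or "none"


# Property values follow the bullet list; "owned=False" for E is an assumption.
MOESI = {
    "M": StateProps(valid=True, exclusive=True, owned=True, may_be_dirty=True, permission="read-write"),
    "O": StateProps(valid=True, exclusive=False, owned=True, may_be_dirty=True, permission="read-only"),
    "E": StateProps(valid=True, exclusive=True, owned=False, may_be_dirty=False, permission="read-only"),
    "S": StateProps(valid=True, exclusive=False, owned=False, may_be_dirty=False, permission="read-only"),
    "I": StateProps(valid=False, exclusive=False, owned=False, may_be_dirty=False, permission="none"),
}

# MSI is the subset of MOESI without the O and E states.
MSI = {name: MOESI[name] for name in "MSI"}


def can_read(state):
    return MOESI[state].permission != "none"


def can_write(state):
    return MOESI[state].permission == "read-write"
```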

MOESI/MSI States

Transient States

States that sit between two stable states, e.g., $IV^D$ (in Invalid, going to Valid, waiting for DataResp).

Transactions

| Transaction | Goal of Requestor |
| --- | --- |
| GetShared (GetS) | Obtain block in Shared (read-only) state |
| GetModified (GetM) | Obtain block in Modified (read-write) state |
| Upgrade (Upg) | Upgrade block state from read-only (Shared or Owned) to read-write (Modified); Upg (unlike GetM) does not require data to be sent to requestor |
| PutShared (PutS) | Evict block in Shared state* |
| PutExclusive (PutE) | Evict block in Exclusive state* |
| PutOwned (PutO) | Evict block in Owned state |
| PutModified (PutM) | Evict block in Modified state |

*Some protocols do not require a coherence transaction to evict a Shared block and/or an Exclusive block (i.e., the PutS and/or PutE are “silent”).

| Event | Response of (Typical) Cache Controller |
| --- | --- |
| Load | If cache hit, respond with data from cache; else initiate GetS transaction |
| Store | If cache hit in state E or M, write data into cache; else initiate GetM or Upg transaction |
| Atomic read-modify-write | If cache hit in state E or M, atomically execute RMW semantics; else GetM or Upg transaction |
| Instruction fetch | If cache hit (in I-cache), respond with instruction from cache; else initiate GetS transaction |
| Read-only prefetch | If cache hit, ignore; else may optionally initiate GetS transaction* |
| Read-Write prefetch | If cache hit in state M, ignore; else may optionally initiate GetM or Upg transaction* |
| Replacement | Depending on state of block, initiate PutS, PutE, PutO, or PutM transaction |

*A cache controller may choose to ignore a prefetch request from the core.
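
The event table can be condensed into a small decision function. This is a rough sketch rather than a complete controller: `respond` and its event strings are hypothetical, prefetches are left out, and picking Upg whenever a read-only copy (S or O) is already present is one reasonable reading of "GetM or Upg", not the only one.

```python
def respond(event, state):
    """Return "hit" or the transaction a typical controller would initiate.

    state is the block's stable MOESI state: "M", "O", "E", "S", or "I".
    """
    if event in ("load", "ifetch"):
        # Any valid copy satisfies a read.
        return "hit" if state != "I" else "GetS"
    if event in ("store", "rmw"):
        if state in ("E", "M"):
            return "hit"
        # Assumption: use Upg when a read-only copy is held, since no data is needed.
        return "Upg" if state in ("S", "O") else "GetM"
    if event == "replacement":
        # Evicting an Invalid block needs no transaction (returns None).
        return {"S": "PutS", "E": "PutE", "O": "PutO", "M": "PutM"}.get(state)
    raise ValueError(f"unknown event: {event}")
```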

Major Protocol Design Options

Directory vs. Snooping

  • Directory protocol: a directory tracks the state of each block. A cache controller sends its request to the home (directory) of that block.
  • Snooping protocol: requests are broadcast to all coherence controllers, typically over a shared bus, and are assumed to arrive in a total order.

Invalidate vs. Update

  • Invalidate protocol: a write invalidates the copies of the block in other caches, so other cores can no longer read a stale value.
  • Update protocol: a write updates all copies of the block in other caches.
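
The difference between the two policies shows up in what happens to the other caches' copies on a write. Below is a toy model under simplifying assumptions: a single block, each cache represented by one value (`None` meaning no valid copy), and a made-up function name `write`.

```python
def write(caches, writer, value, policy):
    """Apply one write to a single block.

    caches maps core id -> cached value, or None when the core has no valid copy.
    policy is "invalidate" or "update".
    """
    if policy not in ("invalidate", "update"):
        raise ValueError(f"unknown policy: {policy}")
    result = {}
    for core, cached in caches.items():
        if core == writer:
            result[core] = value       # the writer always ends up with the new value
        elif cached is None:
            result[core] = None        # no copy to invalidate or update
        elif policy == "update":
            result[core] = value       # push the new value to the existing copy
        else:
            result[core] = None        # invalidate the stale copy
    return result
```

With `caches = {0: 7, 1: 7, 2: None}` and core 0 writing 9, the invalidate policy leaves only core 0 with a copy, while the update policy leaves cores 0 and 1 both holding 9.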

Snooping Protocol

Requests are usually broadcast over a bus, although a bus is not strictly required.

MSI: Transitions between stable states at cache controller

SIMPLE SNOOPING SYSTEM MODEL: ATOMIC REQUESTS, ATOMIC TRANSACTIONS

Simple snooping at the cache controller. A cell labeled “(A)” marks a transition that is impossible because transactions are atomic on the bus:

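Load, Store, and Replacement are processor core events. The remaining columns are bus events, split between the controller's own transactions and other cores' transactions (Other-GetS, Other-GetM, Other-PutM).

| State | Load | Store | Replacement | Own-GetS | Own-GetM | Own-PutM | Data | Other-GetS | Other-GetM | Other-PutM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| I | Issue GetS /$IS^D$ | Issue GetM /$IM^D$ | | | | | | | | |
| $IS^D$ | Stall Load | Stall Store | Stall Evict | | | | Copy data into cache, load hit /S | (A) | (A) | (A) |
| $IM^D$ | Stall Load | Stall Store | Stall Evict | | | | Copy data into cache, store hit /M | (A) | (A) | (A) |
| S | Load hit | Issue GetM /$SM^D$ | - /I | | | | | | - /I | |
| $SM^D$ | Load hit | Stall Store | Stall Evict | | | | Copy data into cache, store hit /M | (A) | (A) | (A) |
| M | Load hit | Store hit | Issue PutM, send Data to memory /I | | | | | Send Data to req and memory /S | Send Data to req /I | |
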
  • Atomic requests: a request is ordered in the same cycle it is issued, so no other core's request can be ordered ahead of it while this cache controller seeks to upgrade permission (I->S, S->M, or I->M). The controller can therefore transition immediately to state $IS^D$, $IM^D$, or $SM^D$.
  • Atomic transactions: no subsequent request for a block appears on the bus until the current transaction for that block completes. Each request is always paired with its response.
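
Because both requests and transactions are atomic in this model, a simulator can run each transaction to completion before the next one starts, so the transient states never need to be stored. The sketch below does exactly that for a single block. The class `AtomicBusMSI` and its methods are illustrative names, and data movement is only noted in comments, not modeled.

```python
class AtomicBusMSI:
    """Single-block MSI snooping model with atomic requests and transactions."""

    def __init__(self, n_cores):
        self.state = ["I"] * n_cores
        self.log = []  # (requestor, transaction) in bus order

    def _broadcast(self, req, kind):
        self.log.append((req, kind))
        for core, s in enumerate(self.state):
            if core == req:
                continue
            if kind == "GetS" and s == "M":
                self.state[core] = "S"   # M: send Data to req and memory /S
            elif kind == "GetM" and s in ("S", "M"):
                self.state[core] = "I"   # S: - /I, M: send Data to req /I

    def load(self, core):
        if self.state[core] == "I":
            self._broadcast(core, "GetS")
            self.state[core] = "S"       # I -> IS^D -> S once Data arrives
        # S and M: load hit, no bus traffic

    def store(self, core):
        if self.state[core] != "M":
            self._broadcast(core, "GetM")
            self.state[core] = "M"       # I -> IM^D -> M or S -> SM^D -> M

    def evict(self, core):
        if self.state[core] == "M":
            self._broadcast(core, "PutM")  # send Data to memory
        self.state[core] = "I"           # eviction from S is silent
```

A short usage example: after `load(0)`, `load(1)`, `store(0)`, the states are `["M", "I"]`; a following `load(1)` forces core 0 back to S.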

BASELINE SNOOPING SYSTEM MODEL: NON-ATOMIC REQUESTS, ATOMIC TRANSACTIONS

MESI

  • Rely on atomic transactions.
  • Adds the E state: a GetS is answered with E instead of S when no other cache holds the block.
  • The upgrade from E to M is silent, so no transaction is needed. In the common read-then-write case this eliminates half of the transactions.
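
The saving from the E state can be checked by counting bus transactions for the same access sequence with and without E. This is a simplified single-block sketch with atomic transactions; `run`, its arguments, and the `use_e` switch are hypothetical, and evictions are not modeled.

```python
def run(ops, n_cores, use_e):
    """Replay (core, op) pairs on one block; return (final states, bus transactions).

    op is "load" or "store". use_e=False behaves like MSI, use_e=True like MESI.
    """
    state = ["I"] * n_cores
    bus_transactions = 0
    for core, op in ops:
        others = [c for c in range(n_cores) if c != core and state[c] != "I"]
        if op == "load" and state[core] == "I":
            bus_transactions += 1        # GetS
            for c in others:
                state[c] = "S"           # any M/E holder downgrades to S
            state[core] = "E" if use_e and not others else "S"
        elif op == "store" and state[core] == "E":
            state[core] = "M"            # silent upgrade, no transaction
        elif op == "store" and state[core] != "M":
            bus_transactions += 1        # GetM
            for c in others:
                state[c] = "I"
            state[core] = "M"
    return state, bus_transactions
```

For a load followed by a store from the same core, the MSI variant needs two transactions (GetS, then GetM) and the MESI variant needs one.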

MESI at cache controller

Note: when the figure says “mem in I”, it refers to the state recorded at the lower-level memory (i.e., LLC or DRAM): the block is in I there, meaning that no private cache has a copy of it. It does not refer to the state of the block in a private cache.

MOSI

  • Rely on atomic transactions.
  • Eliminates the extra data message that updates the LLC when a cache receives a GetS request in the M/E state.
  • Using the dirty bit, eliminates unnecessary writes to the LLC (if the block is written again before it is written back to the LLC).
  • The owner reduces access latency. Without an owner, even if a core's private cache holds a copy in S, it cannot forward that copy to another core, because no cache knows whether it is the one that should respond. Only the designated owner forwards the copy.

MOSI at cache controller

Non-atomic bus

  • A request can be sent without waiting for the previous response.
  • Pipelined bus: responses return in the same order as the requests.
  • Split-transaction bus: responses may return in any order, depending on latency.
Atomic bus:
Address Bus:  Request 1 ----------- Request 2 ------------ Request 3
Data Bus:               Response 1 ----------- Response 2 ----------- Response 3

Pipelined (non-atomic) bus:
Address Bus:  Request 1 Request 2 Request 3
Data Bus:               Response 1 Response 2  Response 3

Split transaction (non-atomic) bus:
Address Bus:  Request 1 Request 2 Request 3
Data Bus:               Response 2 Response 3  Response 1
  • Non-atomic system model:
    • FIFO queues can be used to buffer messages.
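
The response orders in the diagrams above can be reproduced with a few lines of arithmetic. The sketch assumes request i (numbered from 1) is issued at time i - 1 and completes after its own latency; `response_order` and the latency values are invented for illustration.

```python
def response_order(latencies, bus):
    """Return request numbers (1-based) in the order their responses appear.

    latencies[i] is the service time of request i + 1, issued at time i.
    bus is "pipelined" or "split".
    """
    n = len(latencies)
    if bus == "pipelined":
        # Responses are forced back into request order.
        return [i + 1 for i in range(n)]
    if bus == "split":
        # Responses appear as they complete; ties keep request order.
        done = sorted(range(n), key=lambda i: (i + latencies[i], i))
        return [i + 1 for i in done]
    raise ValueError(f"unknown bus type: {bus}")
```

With latencies `[5, 1, 2]`, the pipelined bus returns responses 1, 2, 3, while the split-transaction bus returns 2, 3, 1, matching the last diagram.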

Directory Protocol

  • Directory: maintains a global view of the coherence state of each block.
  • Forwarding: the directory controller can forward a request to the owner of the block.

A directory entry can be like (N: number of nodes):

| state | owner | sharer list (one-hot bit vector) |
| --- | --- | --- |
| 2-bit | $\log_2 N$-bit | $N$-bit |
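
To get a feel for the storage cost of this layout, the entry size can be computed directly from the three field widths. The helper name `entry_bits` is made up, and the owner field is rounded up to a whole number of bits when N is not a power of two.

```python
def entry_bits(n_nodes):
    """Bits per directory entry: 2 (state) + ceil(log2 N) (owner) + N (sharer vector)."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    owner_bits = (n_nodes - 1).bit_length()  # integer form of ceil(log2(n_nodes))
    return 2 + owner_bits + n_nodes
```

For 64 nodes this gives 2 + 6 + 64 = 72 bits per block, and the sharer vector dominates as N grows.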

MSI Directory Protocol

Avoiding Deadlocks

  • Deadlock: event A causes event B, and both require resources to be allocated. Deadlock can occur when neither resource becomes available until one of the events completes. For example, a GetS can cause a Fwd-GetS, which uses the same resources.
    Full
    | Queue | ------> C1 ----->
       ^                       |
       |                       |
       |                       |
       |                       |
       -----C2<--------------| Queue |
                                Full

To avoid deadlock, we can use separate networks for different classes of messages, e.g., one for requests and one for responses. This removes the dependence between messages of different classes.
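
The effect of separating message classes can be seen with bounded queues. The toy model below is only a sketch of the buffering argument, not of a real interconnect: `BoundedNetwork`, `try_send`, and the two class names are assumptions made for the example.

```python
from collections import deque


class BoundedNetwork:
    """Bounded buffering, either shared by all messages or split per message class."""

    def __init__(self, capacity, separate_classes):
        self.capacity = capacity
        self.separate = separate_classes
        if separate_classes:
            self.queues = {"request": deque(), "response": deque()}
        else:
            self.queues = {"all": deque()}

    def _queue(self, msg_class):
        return self.queues[msg_class if self.separate else "all"]

    def try_send(self, msg_class, msg):
        """Enqueue msg if its queue has room; return False when the sender must wait."""
        q = self._queue(msg_class)
        if len(q) >= self.capacity:
            return False
        q.append(msg)
        return True
```

With a shared queue of capacity 2 that is full of requests, a response cannot be enqueued, and if those requests are themselves waiting on that response nothing can make progress. With one queue per class, the response still gets through.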

Detailed Protocol Specification

MSI Directory Protocol - directory controller

| State | GetS | GetM | PutS-NotLast | PutS-Last | PutM+data from Owner | PutM+data from Non-Owner | Data |
| --- | --- | --- | --- | --- | --- | --- | --- |
| I | Send data to Req, add Req to Sharers /S | Send data to Req, set Owner to Req /M | Send Put-Ack to Req | Send Put-Ack to Req | | Send Put-Ack to Req | |
| S | Send data to Req, add Req to Sharers | Send data to Req, send Inv to Sharers, clear Sharers, set Owner to Req /M | Remove Req from Sharers, send Put-Ack to Req | Remove Req from Sharers, send Put-Ack to Req /I | | Remove Req from Sharers, send Put-Ack to Req | |
| M | Send Fwd-GetS to Owner, add Req and Owner to Sharers, clear Owner /$S^D$ | Send Fwd-GetM to Owner, set Owner to Req | Send Put-Ack to Req | Send Put-Ack to Req | Copy data to memory, clear Owner, send Put-Ack to Req /I | Send Put-Ack to Req | |
| $S^D$ | Stall | Stall | Remove Req from Sharers, send Put-Ack to Req | Remove Req from Sharers, send Put-Ack to Req | | Remove Req from Sharers, send Put-Ack to Req | Copy data to memory /S |
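
The directory controller table translates fairly directly into code. The following is a minimal sketch for one block, with several assumptions called out: `DirectoryEntry` and `handle` are invented names, the transient state is spelled `"SD"`, a stall is reported as a `("Stall", req)` pseudo-action instead of a network message, the memory copy is not modeled, and Inv is sent only to sharers other than the requestor.

```python
class DirectoryEntry:
    """Directory-side MSI state for one block: state, owner, and sharer set."""

    def __init__(self):
        self.state, self.owner, self.sharers = "I", None, set()

    def handle(self, msg, req=None):
        """Apply one incoming message; return a list of (message, destination)."""
        s = self.state
        if msg in ("GetS", "GetM") and s == "SD":
            return [("Stall", req)]                  # pseudo-action, retried later
        if msg == "GetS":
            if s == "M":
                out = [("Fwd-GetS", self.owner)]
                self.sharers |= {req, self.owner}
                self.owner, self.state = None, "SD"
                return out
            self.sharers.add(req)                    # from I or S
            self.state = "S"
            return [("Data", req)]
        if msg == "GetM":
            if s == "M":
                out = [("Fwd-GetM", self.owner)]
                self.owner = req
                return out
            # Assumption: the requestor itself is not sent an Inv.
            out = [("Data", req)] + [("Inv", c) for c in sorted(self.sharers - {req})]
            self.sharers.clear()
            self.owner, self.state = req, "M"
            return out
        if msg == "PutS":
            last = self.sharers == {req}             # PutS-Last vs PutS-NotLast
            self.sharers.discard(req)
            if s == "S" and last:
                self.state = "I"
            return [("Put-Ack", req)]
        if msg == "PutM":
            if s == "M" and req == self.owner:
                self.owner, self.state = None, "I"   # data would be copied to memory
            else:
                self.sharers.discard(req)            # PutM from a non-owner
            return [("Put-Ack", req)]
        if msg == "Data" and s == "SD":
            self.state = "S"                         # data would be copied to memory
            return []
        raise ValueError(f"unexpected {msg} in state {s}")
```

A short walk-through: two GetS requests leave the entry in S with two sharers; a GetM from a third node invalidates both and moves to M; a later GetS is forwarded to the owner and parks the entry in the transient state until the owner's Data arrives.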

References

Sorin, D. J., Hill, M. D., Wood, D. A., & Nagarajan, V. (2020). A primer on memory consistency and cache coherence (2nd ed., pp. 91-191). Springer Nature. https://doi.org/10.1007/978-3-031-01764-3