Reference: https://slack.engineering/real-time-messaging
Overview
Slack delivers millions of messages every day, across millions of channels, in real time to users all around the world.
Server overview
Channel Servers (CS):
Each CS is stateful and in-memory, and holds some amount of channel history.
Each CS is mapped to a subset of channels by consistent hashing.
Consistent Hash-Ring Managers (CHARM):
Define and manage the consistent hash ring for CSs.
Replace unhealthy CSs very quickly (a new CS is ready in under 20 seconds).
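To make the channel-to-CS mapping concrete, here is a minimal sketch of a consistent hash ring with virtual nodes. It is an illustration only, not Slack's or CHARM's actual implementation: the `HashRing` class, the `cs-N` server names, the channel IDs, and the choice of 64 virtual nodes per server are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is randomized per process.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Toy consistent hash ring with virtual nodes (hypothetical, for illustration)."""

    def __init__(self, servers, vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.vnodes):
            point = _hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def remove(self, server):
        for i in range(self.vnodes):
            point = _hash(f"{server}#{i}")
            self._points.remove(point)
            del self._owner[point]

    def lookup(self, channel_id):
        # First ring position clockwise from the channel's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(channel_id)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["cs-1", "cs-2", "cs-3"])
channels = [f"C{n:04d}" for n in range(1000)]
before = {c: ring.lookup(c) for c in channels}

# Take an unhealthy server out of the ring: only its channels are remapped.
ring.remove("cs-2")
after_removal = {c: ring.lookup(c) for c in channels}
moved_on_removal = [c for c in channels if before[c] != after_removal[c]]

# Bring in a replacement: channels that move now all land on the new server.
ring.add("cs-4")
after_replacement = {c: ring.lookup(c) for c in channels}
moved_on_add = [c for c in channels if after_removal[c] != after_replacement[c]]
```

The point of the design is that membership changes disturb only a fraction of the channels: removing `cs-2` remaps only the channels it owned, and adding `cs-4` moves channels only onto `cs-4`, so the remaining servers keep their in-memory state.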
Admin Servers (AS):
Stateless and in-memory.
Interface between the Webapp backend and the CSs.
Gateway Servers (GS):
Are stateful.
Hold user information and WebSocket channel subscriptions.
Are deployed across multiple geographical regions, so Slack clients can quickly connect to the nearest GS host.
Presence Servers (PS):
Are in-memory.
Keep track of which users are online.
Slack client setup
Each Slack client maintains a persistent WebSocket connection to Slack’s servers to receive real-time events.
Step 1: On boot up, the client fetches the user token and WebSocket connection setup information from the Webapp backend.
Step 2: The Slack client initiates a WebSocket connection to Envoy in the nearest edge region. Envoy is an open-source edge and service proxy designed for cloud-native applications. At Slack, Envoy is used for:
Load-balancing
TLS termination
Step 3: The Gateway Server (GS) fetches the user’s information, including all of the user’s channels, from Webapp and sends the first message to the client.
Step 4: The GS then asynchronously subscribes to every Channel Server that holds one of those channels, locating each CS through consistent hashing.
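Step 4 can be sketched as follows: the GS groups the user's channels by owning CS and then issues the subscriptions concurrently. This is only a sketch under stated assumptions. The function names, the `owners` dictionary standing in for the consistent hash ring, and the use of `asyncio` are all hypothetical; the source does not describe how the GS implements this.

```python
import asyncio
from collections import defaultdict


def group_by_server(channel_ids, lookup):
    """Group a user's channels by the Channel Server that owns each one."""
    groups = defaultdict(list)
    for channel_id in channel_ids:
        groups[lookup(channel_id)].append(channel_id)
    return dict(groups)


async def subscribe(server, channel_ids, log):
    # Stand-in for a network call from the GS to one Channel Server.
    await asyncio.sleep(0)
    log.append((server, tuple(channel_ids)))


async def subscribe_all(user_channels, lookup):
    """Subscribe to every owning Channel Server concurrently."""
    log = []
    groups = group_by_server(user_channels, lookup)
    await asyncio.gather(*(subscribe(cs, chans, log) for cs, chans in groups.items()))
    return sorted(log)


# Hypothetical static mapping standing in for the consistent hash ring.
owners = {"C1": "cs-1", "C2": "cs-2", "C3": "cs-1"}
subscriptions = asyncio.run(subscribe_all(["C1", "C2", "C3"], owners.__getitem__))
```

Grouping first means the GS contacts each CS once per connection rather than once per channel, which matters when a user belongs to many channels that hash to the same server.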
Send a message to a million clients in real time
Step 1: The client sends a message to the Webapp API via HTTP POST.
Step 2: Webapp stores the message in the database, which uses Vitess for sharding.
Step 3: Webapp then sends that message to Admin Server (AS).
Step 4: The AS looks at the channel ID in the message, finds the owning CS through the consistent hash ring, and routes the message to that CS.
Step 5: CS sends out the message to every GS across the world that is subscribed to that channel.
Step 6: Each GS that receives the message sends it to every connected client subscribed to that channel ID.
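The two-level fan-out in steps 2 through 6 can be sketched in a few lines. This is a toy in-process model, not Slack's implementation: the class and function names, the client names, and the `route` callable (which stands in for the AS and its hash ring lookup) are assumptions made for illustration.

```python
from collections import defaultdict


class GatewayServer:
    def __init__(self, name):
        self.name = name
        self.sockets = defaultdict(set)  # channel ID -> connected client IDs
        self.delivered = []              # (client ID, text) pairs, for inspection

    def connect(self, client_id, channel_id, channel_server):
        self.sockets[channel_id].add(client_id)
        channel_server.subscribe(channel_id, self)

    def deliver(self, channel_id, text):
        # Step 6: push to every connected client subscribed to the channel.
        for client_id in sorted(self.sockets[channel_id]):
            self.delivered.append((client_id, text))


class ChannelServer:
    def __init__(self):
        self.subscribers = defaultdict(set)  # channel ID -> subscribed gateways

    def subscribe(self, channel_id, gateway):
        self.subscribers[channel_id].add(gateway)

    def publish(self, channel_id, text):
        # Step 5: send to every GS subscribed to the channel.
        for gateway in self.subscribers[channel_id]:
            gateway.deliver(channel_id, text)


def post_message(db, route, channel_id, text):
    db.append((channel_id, text))                # Step 2: persist the message first
    route(channel_id).publish(channel_id, text)  # Steps 3-4: route to the owning CS


cs = ChannelServer()
gs_us, gs_eu = GatewayServer("us"), GatewayServer("eu")
gs_us.connect("alice", "C1", cs)
gs_us.connect("bob", "C1", cs)
gs_eu.connect("carol", "C1", cs)
gs_eu.connect("dave", "C2", cs)

db = []
post_message(db, lambda channel_id: cs, "C1", "hello")
```

In this model the CS sends one copy per subscribed GS, and each GS multiplies it out to its local sockets, so `dave`, who is only in `C2`, receives nothing.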
Send Events
Examples of events:
a user adds a reaction to a message
a bookmark is added
a member joins a channel
a user is typing
etc.
Slack clients A and B are in the same edge region, and client C is in a different region.
Step 1: Slack client A is typing in a channel, and the other users in the channel, B and C, need to be notified. Client A sends this event over its WebSocket to its GS.
Step 2: GS looks at the channel ID in the message and routes it to the appropriate CS based on a consistent hash ring.
Step 3: The CS then sends the event to every GS across the world that is subscribed to this channel.
Step 4: Each GS, on receiving the event, broadcasts it to all client WebSockets subscribed to this channel.