# cloud/cloud_worker — Message Queue Processor

The background worker that drains the `BigscreenCloud` and `WebsocketInbox` SQS queues and applies their effects to Redis. It has no HTTP endpoints, no public-facing surface — it's a polling loop. Most of its work is reacting to media-server lifecycle events (register, heartbeat, room created, user connected) and websocket-server events (active-connection updates, removed connections).

**Entry:** [cloud/cloud_worker/cloud_worker_v2.ts](../../cloud/cloud_worker/cloud_worker_v2.ts).
**Package:** `bigscreen_cloud_worker`.
**Depends on:** [`@bigscreen/lib`](../libraries/lib.md), [`@bigscreen/cloud`](../libraries/cloud-handlers.md).

> There's an in-flight refactoring proposal for this service at [../cloud_worker_refactoring_proposal.md](../cloud_worker_refactoring_proposal.md) — read it before making non-trivial changes here.

## Message Flow

```mermaid
flowchart TB
    subgraph Senders
        WSS[cloud/ws_server]
        MS[media_server_next]
        CAPI[cloud/cloud_api]
    end

    BCQ[[SQS BigscreenCloud]]
    WIN[[SQS WebsocketInbox]]

    WSS --> BCQ
    MS --> BCQ
    CAPI --> BCQ
    WSS --> WIN

    subgraph Worker["cloud/cloud_worker (polling loop, 100 ms)"]
        POLL[receiveMessages]
        DISPATCH{switch command}
        POLL --> DISPATCH
    end

    BCQ --> POLL
    WIN --> POLL

    DISPATCH -->|RegisterMediaServer| H1[Cloud.registerMediaServer]
    DISPATCH -->|Heartbeat| H2[Cloud.onMediaServerHeartbeat]
    DISPATCH -->|UserConnected| H3[Cloud.onUserConnectedToRoom]
    DISPATCH -->|ScreenConnected| H4[Cloud.onScreenConnected]
    DISPATCH -->|RemoveConnection| H5[Cloud.removeBigscreenUser]
    DISPATCH -->|"...(+ more)"| HX[other Cloud.* handlers]

    H1 --> REDIS[(Redis)]
    H2 --> REDIS
    H3 --> REDIS
    H4 --> REDIS
    H5 --> REDIS

    subgraph Periodic["Periodic tasks"]
        T1[removeOldMediaServers<br/>every 20 s]
        T2[checkRoomsIntegrity<br/>every 60 s, dev only]
    end

    T1 --> REDIS
    T2 --> REDIS
```

## Inbound Commands

Driven by the `BigscreenCloudCommands` and `MediaServerCommands` enums in [cloud/src/MessageQueue.ts](../../cloud/src/MessageQueue.ts). The dispatcher is a `switch` at [cloud_worker_v2.ts:17-29](../../cloud/cloud_worker/cloud_worker_v2.ts). The current live set of handled cases is a subset of the enum — see the table:

| Command | Source | Effect |
|---------|--------|--------|
| `RegisterMediaServer` | media_server_next startup | `Cloud.registerMediaServer()` — Redis `mediaServers:<id>` |
| `DeregisterMediaServer` | media_server_next shutdown | Remove from registry |
| `Heartbeat` (media-server) | media_server_next periodic | Update `lastHeartbeatAt` in registry |
| `RoomCreated` | media_server_next | Mark Redis room as ready |
| `RoomDeleted` | media_server_next | Clean up Redis room + users |
| `UserConnected` | media_server_next | Mark user as in room |
| `UserDisconnected` | media_server_next | Clear user → room mapping |
| `ScreenConnected` | media_server_next | Mark screen share active |
| `ScreenDisconnected` | media_server_next | Clear screen share |
| `RequestFailed` | media_server_next | Record failure + unwind state |
| `RemoveConnection` | ws_server | Expire the user in Redis |
| `MessageReceived` | ws_server | (inbox msg — currently no-op) |
| `UpdateActiveConnections` | ws_server | Update per-server user count |

## Outbound Commands

These are the messages the cloud **sends** (mostly by `cloud_api`, not by the worker) to media servers:

| Command | Recipient | Purpose |
|---------|-----------|---------|
| `MediaServerRegistered` | media_server_next | Acknowledge registration |
| `Heartbeat` (cloud → ms) | media_server_next | Keep-alive ack |
| `RoomCreationRequest` | media_server_next | Provision a new mediasoup router |
| `RoomDeletionRequest` | media_server_next | Tear down a router |
| `UserConnectionRequest` | media_server_next | Admit a user |
| `UserDisconnectionRequest` | media_server_next | Remove a user |
| `ScreenConnectionRequest` | media_server_next | Enable screen share |
| `ScreenDisconnectionRequest` | media_server_next | Disable screen share |
| `MediaServerDeregistered` | media_server_next | Shutdown ack |

## Startup

```mermaid
sequenceDiagram
    autonumber
    participant P as Node process
    participant MQ as MessageQueue
    participant C as Cloud
    participant POLL_LOOP as polling loop

    P->>MQ: initQueue(BigscreenCloud)
    P->>MQ: initQueue(WebsocketInbox)
    P->>C: Cloud.initialize({ watchExpired: true })
    P->>C: restoreExistingBigscreenUsers()<br/>(recover from restart)
    P->>POLL_LOOP: setInterval(receiveMessages, 100)
    P->>C: setInterval(removeOldMediaServers, 20000)

    alt NETWORK_NAME == "dev"
        P->>C: setInterval(checkRoomsIntegrity, 60000)
    end
```

## Periodic Housekeeping

- **Every 20 s** — `Cloud.removeOldMediaServers()` sweeps the Redis media-server registry for entries whose `lastHeartbeatAt` is older than the expiry window and evicts them. This is what actually triggers room re-homing when a media server dies.
- **Every 60 s (dev only)** — `Cloud.checkRoomsIntegrity()` walks room state looking for orphans. Dev-only because the walk is O(N) in room count.

## Why in-process workers exist *too*

`cloud_api` runs an in-process media-server worker for latency-sensitive paths (heartbeat ingestion). `cloud_worker` handles the rest out-of-process. The refactoring proposal discusses whether this split is worth keeping — check the proposal before adding a new command handler.

## Further reading

- What `Cloud.*` functions actually mutate → [libraries/cloud-handlers.md](../libraries/cloud-handlers.md)
- Full media-server heartbeat flow → [data-flows.md#3-media-server-heartbeat](../data-flows.md#3-media-server-heartbeat)
- Room creation end-to-end → [data-flows.md#2-room-creation](../data-flows.md#2-room-creation)
