Introduction to WebSocket Server Development


Why WebSocket?

This summer I worked on an interesting project: a WebSocket server. I believe WebSocket is going to be very popular in the near future. With WebSocket, we can build interactive web-based games, stream dynamic media, or even bridge existing network protocols. These things are very hard to do with HTTP and AJAX alone. If you want to know more about the benefits of WebSocket, you can read this article. Here I mainly focus on the implementation of a WebSocket server. My WebSocket server is based on RFC 6455.

Opening Handshake

Once a connection to the server has been established, the client MUST send an opening handshake to the server.

        GET /chat HTTP/1.1
        Host: server.example.com
        Upgrade: websocket
        Connection: Upgrade
        Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
        Origin: http://example.com
        Sec-WebSocket-Version: 13

The handshake consists of the following parts:

  1. An HTTP/1.1 or higher GET request.
  2. A Host header field containing the server's authority.
  3. An Upgrade header field containing the value “websocket”, treated as an ASCII case-insensitive value.
  4. A Connection header field that includes the token “Upgrade”, treated as an ASCII case-insensitive value.
  5. A Sec-WebSocket-Key header field whose value MUST be a nonce consisting of a randomly selected 16-byte value that has been base64-encoded. The nonce MUST be selected randomly for each connection.
  6. Optionally, an Origin header field. This header field is sent by all browser clients. A connection attempt lacking this header field SHOULD NOT be interpreted as coming from a browser client.
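
As an illustration of how a server might check the requirements listed above, here is a minimal sketch in Python. The function names `parse_request` and `is_valid_opening_handshake` are hypothetical, and the sketch assumes the whole request head has already been read from the socket. The Origin header is optional, so it is not checked.

```python
import base64


def parse_request(raw: bytes):
    """Split a raw HTTP request head into (method, target, version, headers)."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


def is_valid_opening_handshake(raw: bytes) -> bool:
    """Return True if the request satisfies requirements 1 to 5."""
    try:
        method, _target, version, headers = parse_request(raw)
        # 1. GET request, HTTP/1.1 or higher.
        if method != "GET" or not version.startswith("HTTP/"):
            return False
        major, _, minor = version[5:].partition(".")
        if (int(major), int(minor)) < (1, 1):
            return False
        # 5. The key must decode to exactly 16 bytes.
        key = base64.b64decode(headers.get("sec-websocket-key", ""), validate=True)
    except ValueError:  # malformed request line, version, or base64
        return False
    # 2. Host header present.
    if "host" not in headers:
        return False
    # 3. Upgrade: websocket (case-insensitive).
    if headers.get("upgrade", "").lower() != "websocket":
        return False
    # 4. Connection includes the token "Upgrade" (case-insensitive).
    tokens = [t.strip().lower() for t in headers.get("connection", "").split(",")]
    if "upgrade" not in tokens:
        return False
    return len(key) == 16
```

Splitting the Connection value on commas matters because browsers may send several tokens in that header, for example `keep-alive, Upgrade`.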

Once the client's opening handshake has been sent, the client MUST wait for a response from the server before sending any further data. The client MUST validate the server's response as follows:

  1. If the response lacks an Upgrade header field or the Upgrade header field contains a value that is not an ASCII case-insensitive match for the value “websocket”, the client MUST fail the WebSocket connection.
  2. If the response lacks a Connection header field or the Connection header field doesn't contain a token that is an ASCII case-insensitive match for the value “Upgrade”, the client MUST fail the WebSocket connection.
  3. If the response lacks a Sec-WebSocket-Accept header field or the Sec-WebSocket-Accept contains a value other than the base64-encoded SHA-1 of the concatenation of the Sec-WebSocket-Key with the string “258EAFA5-E914-47DA-95CA-C5AB0DC85B11” but ignoring any leading and trailing whitespace, the client MUST fail the WebSocket connection.
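
The three client-side checks above could be sketched as follows. This is only an illustration: `server_response_ok` and `expected_accept` are hypothetical names, and the response headers are assumed to have been parsed already into a dict with lower-cased names.

```python
import base64
import hashlib

GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(key: str) -> str:
    """base64(SHA-1(key + GUID)), ignoring whitespace around the key."""
    digest = hashlib.sha1((key.strip() + GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def server_response_ok(headers: dict, key: str) -> bool:
    """Apply checks 1 to 3; False means the client must fail the connection."""
    if headers.get("upgrade", "").strip().lower() != "websocket":
        return False
    tokens = [t.strip().lower() for t in headers.get("connection", "").split(",")]
    if "upgrade" not in tokens:
        return False
    return headers.get("sec-websocket-accept", "").strip() == expected_accept(key)
```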

If the server chooses to accept the incoming connection, it MUST reply with a valid HTTP response indicating the following.

        HTTP/1.1 101 Switching Protocols
        Upgrade: websocket
        Connection: Upgrade
        Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

  1. The first line is an HTTP Status-Line, with the status code 101.
  2. The Connection and Upgrade header fields complete the HTTP Upgrade.
  3. To prove that the handshake was received, the server has to combine two pieces of information to form a response. The first is the value of the Sec-WebSocket-Key header field in the client handshake. The second is the Globally Unique Identifier (GUID) “258EAFA5-E914-47DA-95CA-C5AB0DC85B11” in string form. The server concatenates the key with the GUID, takes the SHA-1 hash of the result, and base64-encodes the hash. For the example key above, this gives the value “s3pPLMBiTxaQ9kYGzzhZRbK+xOo=”.
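
The server side of the handshake can be sketched in a few lines of Python using only the standard library. The function names `accept_value` and `build_handshake_response` are made up for this example, and the sketch assumes the client's key has already been extracted from the request.

```python
import base64
import hashlib

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_value(sec_websocket_key: str) -> str:
    """Concatenate key and GUID, SHA-1 hash the result, then base64-encode it."""
    sha1 = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(sha1).decode("ascii")


def build_handshake_response(sec_websocket_key: str) -> bytes:
    """Build the 101 response; the two empty strings yield the final blank line."""
    lines = [
        "HTTP/1.1 101 Switching Protocols",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Accept: {accept_value(sec_websocket_key)}",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


print(accept_value("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Note that the hash input is the base64 text of the key exactly as the client sent it, not the decoded 16 bytes.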

This completes the server's handshake. If the server finishes these steps without aborting the WebSocket handshake, it considers the WebSocket connection to be established and in the OPEN state. At this point, the server may begin sending (and receiving) data.

Closing Handshake

Either peer can send a control frame to begin the closing handshake. The Close frame contains an opcode of 0x8. Upon receiving such a frame, the other peer sends a Close frame in response, if it hasn't already sent one. Close frames sent from client to server must be masked. Upon receiving this response, the peer that started the handshake closes the connection.

After sending a control frame indicating the connection should be closed, a peer does not send any further data; after receiving a control frame indicating the connection should be closed, a peer discards any further data received.
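
To make the closing rules concrete, here is a small sketch of how a server might track the closing handshake. The class `CloseState` and the function `build_close_frame` are hypothetical. The 2-byte status code in the Close payload (1000, normal closure) comes from RFC 6455 and is not discussed in this article. The frame is unmasked because it is sent by the server.

```python
OPCODE_CLOSE = 0x8


def build_close_frame(code: int = 1000, reason: str = "") -> bytes:
    """Build an unmasked Close frame, as a server would send it."""
    payload = code.to_bytes(2, "big") + reason.encode("utf-8")
    if len(payload) > 125:
        raise ValueError("control frame payload must not exceed 125 bytes")
    # First byte: FIN bit set plus the Close opcode; second byte: payload length.
    return bytes([0x80 | OPCODE_CLOSE, len(payload)]) + payload


class CloseState:
    """Tracks which side has already sent a Close frame."""

    def __init__(self):
        self.close_sent = False
        self.close_received = False

    def start_close(self) -> bytes:
        """We begin the closing handshake; returns the frame to send."""
        self.close_sent = True
        return build_close_frame()

    def on_close_received(self) -> bytes:
        """Peer sent Close; reply with our own unless we already sent one."""
        self.close_received = True
        if self.close_sent:
            return b""
        self.close_sent = True
        return build_close_frame()

    def may_send_data(self) -> bool:
        """No further data may be sent after our Close frame."""
        return not self.close_sent

    def should_discard_incoming(self) -> bool:
        """Data arriving after the peer's Close frame is discarded."""
        return self.close_received
```

Once both `close_sent` and `close_received` are true, the closing handshake is complete and the underlying connection can be closed.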

Data Framing

A high-level overview of the framing is given in the following figure.

  • FIN: When set, indicates that this is the final fragment in a message.
  • RSV1, RSV2, RSV3: MUST be 0 unless an extension is negotiated that defines meanings for non-zero values.
  • Opcode:
    – %x1 denotes a text frame
    – %x2 denotes a binary frame
    – %x8 denotes a connection close
  • Mask: Defines whether the “Payload data” is masked. All frames sent from client to server have this bit set to 1.
  • Payload length: The length of the “Payload data” in bytes. If the value is 0-125, that is the payload length. If 126, the following 2 bytes interpreted as a 16-bit unsigned integer are the payload length. If 127, the following 8 bytes interpreted as a 64-bit unsigned integer (the most significant bit MUST be 0) are the payload length. The payload length is the length of the “Extension data” plus the length of the “Application data”. This implementation does not use any extensions, so we assume the length of the “Extension data” is zero, in which case the payload length is the length of the “Application data”.
  • Masking-key: All frames sent from the client to the server are masked by a 32-bit value that is contained within the frame. The masking key is chosen at random by the client. When preparing a masked frame, the client MUST pick a fresh masking key from the set of allowed 32-bit values. To convert masked data into unmasked data, or vice versa, the following algorithm is applied. Octet i of the transformed data (“transformed-octet-i”) is the XOR of octet i of the original data (“original-octet-i”) with the octet at index i modulo 4 of the masking key (“masking-key-octet-j”):
           j                   = i MOD 4
           transformed-octet-i = original-octet-i XOR masking-key-octet-j

  • Payload data: The “Payload data” is defined as “Extension data” concatenated with “Application data”. It is important to note that the representation of this data is binary, not ASCII characters.
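
The field descriptions above translate fairly directly into code. The following Python sketch parses a single complete frame and builds an unmasked frame for the server to send. The names `unmask`, `parse_frame` and `build_frame` are made up for this example. It assumes no extension has been negotiated, so any non-zero RSV bit is treated as an error, and it does not reassemble fragmented messages.

```python
def unmask(data: bytes, key: bytes) -> bytes:
    """XOR octet i of data with octet (i mod 4) of the 4-byte masking key."""
    return bytes(b ^ key[i % 4] for i, b in enumerate(data))


def parse_frame(buf: bytes):
    """Parse one frame; return (fin, opcode, payload, bytes_consumed)."""
    if len(buf) < 2:
        raise ValueError("incomplete frame header")
    fin = bool(buf[0] & 0x80)
    if buf[0] & 0x70:
        raise ValueError("RSV bits set but no extension negotiated")
    opcode = buf[0] & 0x0F
    masked = bool(buf[1] & 0x80)
    length = buf[1] & 0x7F
    pos = 2
    # 126 and 127 announce a 2-byte or 8-byte extended length field.
    ext = {126: 2, 127: 8}.get(length, 0)
    if len(buf) < pos + ext + (4 if masked else 0):
        raise ValueError("incomplete frame header")
    if ext:
        length = int.from_bytes(buf[pos:pos + ext], "big")
        pos += ext
        if ext == 8 and length >> 63:
            raise ValueError("most significant bit of 64-bit length must be 0")
    key = buf[pos:pos + 4] if masked else b""
    pos += len(key)
    if len(buf) < pos + length:
        raise ValueError("incomplete payload")
    payload = buf[pos:pos + length]
    if masked:
        payload = unmask(payload, key)
    return fin, opcode, payload, pos + length


def build_frame(opcode: int, payload: bytes, fin: bool = True) -> bytes:
    """Build an unmasked frame (server to client)."""
    first = (0x80 if fin else 0) | opcode
    n = len(payload)
    if n <= 125:
        header = bytes([first, n])
    elif n <= 0xFFFF:
        header = bytes([first, 126]) + n.to_bytes(2, "big")
    else:
        header = bytes([first, 127]) + n.to_bytes(8, "big")
    return header + payload


# Masked text frame containing "Hello" (example bytes from RFC 6455).
print(parse_frame(bytes.fromhex("818537fa213d7f9f4d5158")))
# (True, 1, b'Hello', 11)
```

Because masking is a plain XOR, the same `unmask` function also masks data: applying it twice with the same key returns the original bytes.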

Screenshot