A Primer on TCP

Transmission Control Protocol (TCP) is one of the backbones of the internet, and of networking in general for application-level software. It’s important to know how it enables programs to talk to each other, and how exactly you write software to utilize it. This post discusses how TCP passes data between machines, and how the OS and applications make effective use of it. It does not cover the inner workings of TCP implementations, the three-way handshake, how flow control works, and so on.

The stack

TCP sits within the common networking stack, alongside these layers:

  1. Physical (Ethernet cables, WiFi wireless specifications)
  2. Data Link (Ethernet 802.3, WiFi 802.11)
  3. Network (IP)
  4. Transport (TCP, UDP)
  5. Application (HTTP, FTP)

It’s important to know how the Physical and Data Link layers operate, because they define how everything up the stack operates as well. The key detail is that data is transmitted in discrete groups of bits, not as one continuous stream.

Physical

At the Physical Layer, there actually is no grouping of bits, as far as I can recall. The Physical Layer is mainly concerned with how to get bits from one end to the other. In most implementations this involves copper wires, with details on how best to shield them and what bit rates the hosts can synchronize on.

Now, it’s entirely possible to connect two computers with only a few wires, add just one more layer on top of this for a protocol of sorts, and have the machines talk to each other. They could use a constant stream of bits for this communication, and it could work fine. But the restriction in this situation is that the connection can only join those two machines, no more, and can only serve a single purpose. Adding another machine to the physical layer, or switching between sending information for one application vs. another, brings up two important questions: how do you determine which bits are for which machine, and which bits are for which application?

+---------+          +---------+
| Machine |----------| Machine |
|         |   Wire   |         |
+---------+          +---------+

Figure a) A physical network with two machines

To answer the first question, we have the Data Link Layer. It’s common for hosts to connect to more than one other physical host using the same connector. Think of Ethernet hubs or switches, or WiFi connections between devices. They use the same physical medium, such as a series of copper wires or a specific band of the electromagnetic spectrum, and it’s typical for any transmitted bits to be received by all hosts connected and listening. Thus, the Data Link Layer groups bits into frames of data, prefixed with a header that tells all of the hosts which host the frame is destined for (via the MAC address). Now that single stream of bits is separated into chunks of bounded size, with extra information wrapped around each one.

What does this mean? Well, we no longer have a simple stream of information to use, but it opens up the possibility of more advanced networking capabilities. We can now direct information from one host machine to another over a shared physical medium. This doesn’t quite add up to the entire Internet yet, though.

+---------+               +---------+
| Machine |---------------| Machine |
|         |       |       |         |
+---------+       | Wire  +---------+
                  |
                  |
            +---------+
            | Machine |
            |         |
            +---------+

Figure b) A physical network with more than two machines

+----------------------------+
| Header (22 bytes)          |
|----------------------------|
| Frame data (~1500 bytes)   |
|                            |
|----------------------------|
| Trailer (4 bytes)          |
+----------------------------+

Figure c) A rough Ethernet frame
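
As a quick illustration, here is a minimal sketch in Python that pulls the addressing fields out of a raw frame. (It is illustrative only; on the wire, the MAC header proper is 14 bytes: destination, source, and EtherType. The 22-byte header in the figure also counts the preamble and start-of-frame delimiter.)

import struct

def parse_ethernet(frame: bytes):
    # Destination MAC (6 bytes), source MAC (6 bytes), EtherType (2 bytes).
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    mac = lambda raw: ":".join(f"{b:02x}" for b in raw)
    # An EtherType of 0x0800 means the payload is an IPv4 packet.
    return mac(dst), mac(src), ethertype, frame[14:]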

Network

The next step is to be able to direct information not just from one host machine to another over a single physical medium, but across several physical mediums. After all, we can’t hook every computer in the world up to one giant Ethernet switch (for a lot of reasons we can’t really get into). Instead, we separate different groups of physical machines into their own networks, and this is where routers come in. A router can connect one group of machines on an Ethernet switch to another, and relay messages back and forth as needed.

This is the job of the Network Layer. We want to send a data link frame to the router, and have the router send it on to a host on the other network it’s connected to. The frame already has the header information for the router to pick it up, but now we need more header information so that the router knows where to forward it. This is the IP packet, whose header carries source and destination IP addresses. We can now connect several networks together, using data link frames to get data from one physical machine to another, and network packets to get data from a machine on one network to a machine on another.

+---------+          +-----------+
| Router  |----------| Machine   |
|         |   Wire   | Network 1 |
+---------+          +-----------+
     |
     | Wire
     |
+-----------+
| Machine   |
| Network 2 |
+-----------+

Figure d) Two networks, connected via a router

+-----------------------------+
| Eth Header (22 bytes)       |
|-----------------------------|
| IP Header (~20 bytes)       |
|-----------------------------|
| Packet data (~1480 bytes)   |
|                             |
|-----------------------------|
| Eth Trailer (4 bytes)       |
+-----------------------------+

Figure e) An Ethernet frame with an IP packet as the data
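
A similar sketch for the IP header: a router only needs a few fields to decide where to forward a packet. (Again illustrative Python, assuming IPv4.)

import socket

def parse_ipv4(packet: bytes):
    version = packet[0] >> 4               # 4 for IPv4
    header_len = (packet[0] & 0x0F) * 4    # header length in bytes
    src = socket.inet_ntoa(packet[12:16])  # source address, bytes 12-15
    dst = socket.inet_ntoa(packet[16:20])  # destination address, bytes 16-19
    return version, src, dst, packet[header_len:]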

Transport

Now that we can do that, the next step is getting information from one application on a machine to an application on another machine. The third identifying mechanism, after the MAC address and the IP address, is the port (such as 80 for HTTP, or 21 for FTP). To identify which application a group of data bits is destined for at this layer, we add yet another header containing the port (and other information), forming the Transport Layer’s segment (for TCP; for UDP, a datagram).

+-----------------------------+
| Eth Header (22 bytes)       |
|-----------------------------|
| IP Header (~20 bytes)       |
|-----------------------------|
| TCP Header (20-60 bytes)    |
|-----------------------------|
| Segment data (~1420 bytes)  |
|                             |
|-----------------------------|
| Eth Trailer (4 bytes)       |
+-----------------------------+

Figure f) An Ethernet frame, with an IP packet as the data, with a TCP segment as the IP packet's data
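
From a program’s point of view, ports appear the moment you open a connection. A minimal sketch with Python’s standard socket module (the host name and addresses below are just examples):

import socket

# A TCP connection is identified by the four-tuple
# (local IP, local port, remote IP, remote port).
# The OS picks an ephemeral local port for us automatically.
with socket.create_connection(("example.com", 80)) as conn:
    print(conn.getsockname())  # e.g. ('192.0.2.10', 54321) -- our side
    print(conn.getpeername())  # e.g. ('93.184.216.34', 80) -- the server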

Well, that was a nice recap of my CompSci classes, but what does this mean?

It means that, for all intents and purposes, data is sent between computers in frames/packets/segments, not as a true stream. What’s more, IP packets may be dropped, or may arrive at the receiving computer in a different order than they were sent. So not only is our original, simple stream of bits broken up, it might also not arrive at all, or arrive in scrambled chunks. Putting all of that information back together is quite difficult, and this is where TCP shines.

TCP handles the hard work of re-sending IP packets until they finally arrive, putting everything back into the right order once they do, and then sending that data up to the application. In other words, TCP restores the stream-like behavior of the data we send back and forth. It has other goodies, like minimizing congestion on the network and setting up the initial connection, but this article isn’t concerned with those.

The implementation of this behavior relies heavily on buffers of data. If TCP receives segment #2 before segment #1, it needs to hold onto #2 until #1 arrives. Once it has both, it can copy the raw data of the segments, in the correct order, into a data buffer for the application. If these buffers are full when more segments come in, the segments may be dropped, and TCP handles that cleanly as well. When new data arrives in the application’s data buffer, the kernel can alert the application; the app reads the data into its own buffers, that space is freed, and TCP can copy more segments’ data into the buffer. This buffer acts like a queue, or a stream of data, for the application.
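
To make the reordering concrete, here is a toy sketch in Python. It is purely illustrative: real TCP sequence numbers count bytes rather than whole segments, and all of this bookkeeping lives inside the kernel.

def reassemble(segments):
    # Yield segment payloads in order, buffering out-of-order arrivals.
    pending = {}   # segments that arrived early, keyed by sequence number
    expected = 0   # the next sequence number we can deliver
    for seq, payload in segments:
        pending[seq] = payload
        # Deliver as many consecutive in-order segments as we now hold.
        while expected in pending:
            yield pending.pop(expected)
            expected += 1

# Segments 0..2 arriving out of order still come out in order:
print(list(reassemble([(1, b"wor"), (0, b"hello, "), (2, b"ld!")])))
# [b'hello, ', b'wor', b'ld!']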

The buffers discussed tend to be circular buffers, I would imagine.

Implications

Buffers are usually of a fixed size. If an application is sending a 10MB file’s worth of data to another application, the following happens:

  1. Assume the connection has been established and there has been no activity for a while, and that all buffers are pretty much empty.
  2. Application A streams data from a 10MB chunk of memory to the TCP connection’s data buffer for TCP to send.
  3. The stream works (roughly) by calling a write function with a pointer to where the data begins in memory and the number of bytes it wants to send over (see the sketch after figure g).
    1. The buffer may only be 1KB. The function responds saying 1KB of information has been copied over.
    2. Later, once the buffer has some more space, the application attempts to send the data again, this time pointing 1KB into the 10MB memory region, with a remaining length of 10MB - 1KB.
  4. TCP is alerted to the new data to be sent, and takes a smaller chunk of data from the first buffer and puts it into a segment. Due to size restrictions based on the MTU of the link layer and the header sizes of the network and transport layers, the amount of data sent at any one time may be quite small compared to the buffer: with a 1500-byte MTU, a 20-byte IP header, and a 20-byte TCP header, at most 1460 bytes of data fit in one segment.
  5. TCP sends out the segment, holding onto a copy until it knows that the receiving end has correctly received it. Once that acknowledgement arrives, it can drop the copy and delete that data from the application’s data buffer.
  6. On the receiving end, TCP stores the incoming segment, acknowledges it, and puts the data into Application B’s data buffer, then drops the segment from memory.
  7. This continues, but let’s assume that Application B is taking a while to read the data. After a while, Application B’s data buffer becomes full, which means the receiving TCP’s segment buffer fills up, then the sending TCP’s segment buffer, and then Application A’s data buffer.
  8. Application B reads a small chunk of the TCP data buffer.
  9. TCP is alerted of this, fills the buffer with the next segment’s worth of data, and tells the sending end that it can receive more data; the sending TCP sends the next segment in its buffer, freeing space in Application A’s data buffer, and Application A can stream more data into it.
  10. Repeat until all data has been sent.

+-------------+   +--------+   +---------+   +---------+   +---------+   +--------+   +-------------+
| Application |   | Data   |   | Segment |   | Network |   | Segment |   | Data   |   | Application |
| A           |-->| Buffer |-->| Buffer  |-->|         |-->| Buffer  |-->| Buffer |-->| B           |
+-------------+   +--------+   +---------+   +---------+   +---------+   +--------+   +-------------+

Figure g) The flow of data
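
Here is the partial-write behavior from step 3 as a minimal sketch in Python, using the standard socket API; send() returns how many bytes the kernel actually accepted:

import socket

def send_all(sock: socket.socket, data: bytes) -> None:
    # Keep calling send() until every byte has been handed to the kernel.
    sent = 0
    while sent < len(data):
        # send() may accept only part of the data when the kernel's send
        # buffer is nearly full; it returns the number of bytes it took.
        sent += sock.send(data[sent:])

(Python’s standard library ships socket.sendall(), which performs exactly this loop.)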

I’m sure I messed up some information, and left out tons of critical points as well, but this gives you a rough picture of the role buffers play in TCP, which must ensure that segments arrive in the correct order and that no data is lost.

The application

With this information in mind, we can now focus on what all of this means for an application’s use of a TCP connection.

Hiding away all of the implementation details: once an application has a new connection to another application over TCP, it is granted a send buffer and a receive buffer, usually fixed-size circular buffers.

A circular buffer, as a refresher, is a fixed-size region of memory with a pointer to the beginning of the queued data and a pointer to its end. Writing into the buffer means starting at the end pointer and writing data from that point in memory onward, possibly wrapping around from the end of the memory region back to its beginning, up to the start pointer. Reading data means collecting data from the start pointer, possibly wrapping around, up to the end pointer.

+-----------------------------------+
|G|H| | | | | | | | | | |A|B|C|D|E|F|
+-----------------------------------+
     ^                   ^
     |                   |
     End                 Start

Figure h) A circular buffer
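
As a rough illustration of those pointer mechanics, here is a toy ring buffer in Python (purely illustrative; not how any particular kernel implements its buffers):

class RingBuffer:
    def __init__(self, capacity: int):
        self.buf = bytearray(capacity)
        self.start = 0  # where the next read begins
        self.size = 0   # number of bytes currently queued

    def write(self, data: bytes) -> int:
        # Copy as much of data as fits; report how many bytes were taken.
        n = min(len(data), len(self.buf) - self.size)
        end = (self.start + self.size) % len(self.buf)
        for i in range(n):  # may wrap around the end of the region
            self.buf[(end + i) % len(self.buf)] = data[i]
        self.size += n
        return n

    def read(self, limit: int) -> bytes:
        # Remove and return up to limit queued bytes.
        n = min(limit, self.size)
        out = bytes(self.buf[(self.start + i) % len(self.buf)] for i in range(n))
        self.start = (self.start + n) % len(self.buf)
        self.size -= n
        return out

rb = RingBuffer(4)
print(rb.write(b"hello"))  # 4 -- only four bytes fit
print(rb.read(2))          # b'he'
print(rb.write(b"o"))      # 1 -- the read freed up space
print(rb.read(10))         # b'llo'

Note that write() reporting fewer bytes than offered is exactly the behavior the application sees below.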

From the application’s perspective, it has a buffer it can write to for sending data, and a buffer it can read from for receiving data. When writing data, it asks to write as much information as it can, but is told that only so many bytes were accepted. In a naive implementation, this may be a spinning/blocking while loop that repeatedly attempts to write all of the information (e.g. one entire HTTP request) until it has all been put into the buffer. The same works for receiving: keep reading until the entire HTTP response, both the header and the Content-Length’s worth of body, has been received. Applications may use a much larger buffer that grows to store the entire request before processing it (with exceptions).

===Server Response===
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf8
Content-Length: 39

<html><body>Hello, world!</body></html>

===Server Data Buffer===
</body></html>

===Server TCP Segment Buffer===
Segment 1: Hello, 
Segment 2: world!

===Client TCP Segment Buffer===
Segment 1: Content-Length: 39\n
Segment 2: <html><body>

===Client Data Buffer===
Content-Type: text/html; charset=utf8

===Client HTTP Response (so far)===
HTTP/1.1 200 OK

Figure i) An HTTP response in transit (broken up neatly by line for easy reading)

# Naive blocking read: spin until the whole response has arrived.
# (Assumes total_response_length is already known, e.g. from the
# Content-Length header; parse_http stands in for a real HTTP parser.)
buffer = bytearray()
while len(buffer) < total_response_length:
    # recv() returns whatever TCP has queued, possibly less than asked for.
    data = connection.recv(total_response_length - len(buffer))
    buffer.extend(data)
response = parse_http(buffer)

Figure j) A blocking receiving network connection

But this naive implementation locks up that thread of execution, perhaps attempting to write data to a full buffer and being told 0 bytes were written thousands or millions of times, before TCP can finish sending a new segment out, get an acknowledgement that it was received, drop the segment from its buffer, and free up some space in the application’s data buffer.

This is where non-blocking IO comes into play, which will be the subject of a future article.