The Internet: Under the Hood, Part 1
Background and Overview
"Any sufficiently advanced technology is indistinguishable from magic." - Arthur C. Clarke
My first memories of the internet include things like AOL Instant Messenger, the video game StarCraft, and dial-up modems, which is to say that I've been using the internet for a while. All this time I've known that it was not in fact magic powering the things we do online every day, but I didn't possess an understanding of how the internet works beyond a surface level. Until recently, that is.
With this series of posts, I want to share what I have learned about how the internet works from the perspective of a budding web developer. My focus will be on building a conceptual model of what actually happens when applications communicate with each other over networks. Some topics will be explored in more depth (HTTP) than others (Ethernet) because a developer's influence is generally limited to how they use the higher-level protocols. Below is a brief overview of the internet and a preview of what's to come in later posts.
At a fundamental level, a network consists of two or more devices that are connected in such a way that they can exchange data with each other. The simplest example is two devices connected by a physical cable. Multiple computers in a home or office environment can communicate with each other through a network bridging device such as a hub or switch, forming a local area network (LAN). The critical concept here is that the scope of communication is limited to devices connected to that switch or hub, which imposes some geographic limitations (e.g., the range of your home Wi-Fi signal).
But we all know that the internet allows us to easily communicate with people on the other side of the planet. How does that happen? Routers are network devices that can route network traffic to other networks. In the context of a LAN, a router effectively acts as a gateway into and out of the network.
Cables + Rules = Internet?
The internet is essentially a network of networks, made up of both physical infrastructure (the cables, switches, and routers discussed above) and the protocols that govern the exchange or transmission of data. To use an analogy, infrastructure is like the muscles in your body, handling the physical transmission of data in the form of electrical signals, light, or radio waves. The protocols are like your brain, in that they instruct the infrastructure on where and how to transmit that data. If we're missing one or the other, the data isn't getting very far.
To stretch this body analogy even further, the variety of communication conducted via the internet is so vast and complex that a single brain is not up to the task of specifying the rules for every type of message. Instead, we have multiple protocols to address the different aspects of network communication. We also have multiple protocols that address the same aspect of network communication in different ways, for different use cases.
You've probably heard of a number of these protocols: HTTP, FTP, IP, DNS, etc. The acronym soup can be dizzying, but each of these protocols has a specific scope, or task, in the overall context of networking.
Note: The internet != the web. Rather, the web is a service that can be accessed via the internet. Look for more on this in a later post on HTTP.
A Layered System
Understanding where each protocol fits into the overall scheme of networking can be daunting at first. Here, the Internet Protocol Suite, or "TCP/IP" for short, can be helpful in forming a mental model. From Wikipedia:
"The Internet protocol suite provides end-to-end data communication specifying how data should be packetized, addressed, transmitted, routed, and received. This functionality is organized into four abstraction layers, which classify all related protocols according to each protocol's scope of networking."
- Application Layer: processes create user data and communicate this data to other applications (HTTP)
- Transport Layer: provides a channel for the communication needs of applications (TCP/UDP)
- Internet Layer: exchanges data across network boundaries (IP)
- Link Layer: defines the networking methods within the scope of the local network (Ethernet)
Rather than laying out hard and fast rules, the TCP/IP model is useful for gaining a broad-brush view of how a communication system works as a whole, and for modularizing different levels of responsibility within that system.
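To make the four layers a little more concrete, here is a minimal sketch that writes them down as a lookup table. The names `LAYERS` and `layer_of` are my own, purely for illustration, and the table only includes the protocols mentioned in this post. It is a study aid, not a complete or official classification.

```python
# Illustrative only: the four TCP/IP layers, top to bottom, with the
# example protocols mentioned in this post. LAYERS and layer_of are
# hypothetical names, not part of any library.
LAYERS = [
    ("Application", ["HTTP", "FTP", "DNS"]),
    ("Transport", ["TCP", "UDP"]),
    ("Internet", ["IP"]),
    ("Link", ["Ethernet"]),
]


def layer_of(protocol):
    """Return the layer name for a protocol, or None if it is not in the table."""
    wanted = protocol.upper()
    for layer_name, protocols in LAYERS:
        if wanted in (p.upper() for p in protocols):
            return layer_name
    return None


for name in ("HTTP", "UDP", "IP", "Ethernet"):
    print(name, "->", layer_of(name))
```

Looking up a protocol that isn't in the table simply returns `None`, which fits the idea that the model is a guide rather than a strict rulebook.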
PDUs and Encapsulation
A Protocol Data Unit, or "PDU" for short, is a block of data transferred over a network. Each protocol refers to the PDU in its domain by a slightly different name. For example, IP transfers packets while TCP transfers segments. But the different names are not important at this point. In all cases, the concept is fairly straightforward: a PDU, regardless of which protocol we're working with, should contain at least a header and a data payload. The header contains protocol-specific metadata about the PDU. The data payload contains the data that we want to transport over the network.
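The header-plus-payload idea can be sketched in a few lines of Python. The header layout below (source port, destination port, payload length) is a toy format I made up for illustration. It is not the real header of TCP, UDP, or any other protocol, and `build_pdu` and `parse_pdu` are hypothetical helper names.

```python
import struct

# Toy header layout, assumed for illustration only (not a real protocol):
# three unsigned 16-bit integers in network byte order:
# source port, destination port, payload length.
HEADER_FORMAT = "!HHH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 6 bytes


def build_pdu(src_port, dst_port, payload):
    """Prepend the toy header (metadata) to the payload (the data we want to send)."""
    header = struct.pack(HEADER_FORMAT, src_port, dst_port, len(payload))
    return header + payload


def parse_pdu(pdu):
    """Split a PDU back into its header fields and its payload."""
    src_port, dst_port, length = struct.unpack(HEADER_FORMAT, pdu[:HEADER_SIZE])
    payload = pdu[HEADER_SIZE:HEADER_SIZE + length]
    header = {"src_port": src_port, "dst_port": dst_port, "length": length}
    return header, payload


pdu = build_pdu(49152, 80, b"hello")
header, payload = parse_pdu(pdu)
print(header)
print(payload)
```

The receiver only needs to know the header format to find where the metadata ends and the payload begins. Real protocols carry far more metadata, but the shape is the same.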
At each level of our layered system, the data payload of a PDU consists of the entire PDU from the layer above. For example, the PDU generated at the transport layer becomes the data payload of the internet layer PDU. This is called encapsulation. The major benefit of this approach is that a protocol does not have to understand how the protocols at other layers are implemented; rather, each lower-level protocol simply provides a service to the protocol in the layer above.
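Here is a rough sketch of encapsulation in Python. The bracketed headers like `[TCP]` are placeholder byte strings standing in for real headers, and `wrap` and `unwrap` are hypothetical helpers of my own. Real headers are binary structures with many fields, so treat this purely as a picture of the nesting.

```python
# Placeholder headers for illustration only; real protocol headers are
# binary structures, not readable tags like these.
TCP_HEADER = b"[TCP]"
IP_HEADER = b"[IP]"
ETH_HEADER = b"[ETH]"


def wrap(header, payload):
    """Encapsulate: the whole PDU from the layer above becomes the payload."""
    return header + payload


def unwrap(header, pdu):
    """Decapsulate: strip this layer's header and hand the payload upward."""
    if not pdu.startswith(header):
        raise ValueError("unexpected header")
    return pdu[len(header):]


# Sending side: work down the layers.
app_data = b"GET / HTTP/1.1\r\n\r\n"       # application layer data
segment = wrap(TCP_HEADER, app_data)       # transport layer PDU
packet = wrap(IP_HEADER, segment)          # internet layer PDU
frame = wrap(ETH_HEADER, packet)           # link layer PDU
print(frame)

# Receiving side: work back up the layers.
received = unwrap(TCP_HEADER, unwrap(IP_HEADER, unwrap(ETH_HEADER, frame)))
print(received == app_data)
```

Notice that `wrap` never looks inside the payload it is given. Each layer treats whatever comes from above as opaque bytes, which is exactly why one layer doesn't need to know how another is implemented.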
At this point, we've built the scaffolding for our networking mental model. In future posts, I plan to do a deeper dive on each layer of the TCP/IP model and the most relevant protocols therein. Thanks for reading!