Circuit switching vs. packet switching

Packet-switched networks typically use a strategy called store-and-forward. As the name suggests, each node in a store-and-forward network first receives a complete packet over some link, stores the packet in its internal memory, and then forwards the complete packet to the next node. In contrast, a circuit-switched network first establishes a dedicated circuit across a sequence of links and then allows the source node to send a stream of bits across this circuit to a destination node. The major reason for using packet switching rather than circuit switching in a computer network is efficiency, discussed in the next subsection.

Gateway/Router

Given the enormous diversity of network types, we also need a way to interconnect disparate networks and links (i.e., deal with heterogeneity). Devices that perform this task, once called gateways, are now mostly known as routers, or alternatively, Layer 3 (L3) switches. The protocol that was invented to deal with interconnection of disparate network types, the Internet Protocol (IP), is the topic of our second section.

Datagrams vs Virtual Circuit Switching vs Source Routing https://book.systemsapproach.org/internetworking/switching.html

  • Datagrams: The idea behind datagrams is incredibly simple: You just include in every packet enough information to enable any switch to decide how to get it to its destination. That is, every packet contains the complete destination address. Consider the example network illustrated in Figure 57, in which the hosts have addresses A, B, C, and so on. To decide how to forward a packet, a switch consults a forwarding table (sometimes called a routing table), an example of which is depicted in Table 5. This particular table shows the forwarding information that switch 2 needs to forward datagrams in the example network. It is pretty easy to figure out such a table when you have a complete map of a simple network like that depicted here; we could imagine a network operator configuring the tables statically. It is a lot harder to create the forwarding tables in large, complex networks with dynamically changing topologies and multiple paths between destinations. That harder problem is known as routing and is the topic of a later section. We can think of routing as a process that takes place in the background so that, when a data packet turns up, we will have the right information in the forwarding table to be able to forward, or switch, the packet. (A minimal forwarding-table sketch in code follows this list.)

  • Virtual Circuit: A second technique for packet switching uses the concept of a virtual circuit (VC). This approach, which is also referred to as a connection-oriented model, requires setting up a virtual connection from the source host to the destination host before any data is sent. To understand how this works, consider Figure 58, where host A again wants to send packets to host B. We can think of this as a two-stage process. The first stage is “connection setup.” The second is data transfer. We consider each in turn. (Used in VPNs and ATM.)

  • Source Routing: A third approach to switching that uses neither virtual circuits nor conventional datagrams is known as source routing. The name derives from the fact that all the information about network topology that is required to switch a packet across the network is provided by the source host.
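
A minimal sketch of the datagram model from the first bullet above, assuming a hypothetical switch whose forwarding table maps destination addresses to output ports (the addresses and port numbers are invented, not those of Figure 57 or Table 5):

```python
# Minimal sketch of datagram ("connectionless") forwarding.
# The table entries below are hypothetical, not those of Figure 57 / Table 5.
FORWARDING_TABLE = {
    "A": 3,  # destination address -> output port number
    "B": 0,
    "C": 3,
    "D": 1,
}

def forward(packet: dict) -> int:
    """Every datagram carries the complete destination address,
    so the switch can look it up independently for each packet."""
    dest = packet["dest"]
    try:
        return FORWARDING_TABLE[dest]
    except KeyError:
        raise ValueError(f"no route to host {dest!r}")

# Each packet is self-contained; no connection setup is needed.
print(forward({"dest": "C", "payload": b"hello"}))   # -> port 3
```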

Ethernets

To begin, suppose you have a pair of Ethernets that you want to interconnect. One approach you might try is to put a repeater between them. This would not be a workable solution, however, if doing so exceeded the physical limitations of the Ethernet. (Recall that no more than two repeaters between any pair of hosts and no more than a total of 2500 m in length are allowed.) An alternative would be to put a node with a pair of Ethernet adaptors between the two Ethernets and have the node forward frames from one Ethernet to the other. This node would differ from a repeater, which operates on bits, not frames, and just blindly copies the bits received on one interface to another. Instead, this node would fully implement the Ethernet’s collision detection and media access protocols on each interface. Hence, the length and number-of-host restrictions of the Ethernet, which are all about managing collisions, would not apply to the combined pair of Ethernets connected in this way. This device operates in promiscuous mode, accepting all frames transmitted on either of the Ethernets, and forwarding them to the other.
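
A rough sketch of what such a forwarding node does in software; the Interface class and its methods are hypothetical stand-ins for real Ethernet adaptors, not an actual driver API:

```python
# Sketch of a two-port Ethernet bridge operating in promiscuous mode.
# Interface objects and their recv()/send() methods are hypothetical.
class Interface:
    """Stand-in for an Ethernet adaptor; a real one would run the full
    media-access and collision-detection protocol on its own segment."""
    def __init__(self, name):
        self.name = name
        self.inbox = []          # frames "received" on this segment

    def recv(self):
        return self.inbox.pop(0) if self.inbox else None

    def send(self, frame):
        print(f"{self.name}: forwarding frame to {frame['dst']}")

def bridge_loop_once(eth0: Interface, eth1: Interface) -> None:
    # Accept every frame seen on either segment and copy it, as a complete
    # frame, onto the other segment. Because each segment runs its own MAC
    # protocol, collisions on one side never propagate to the other.
    for src, dst in ((eth0, eth1), (eth1, eth0)):
        frame = src.recv()
        if frame is not None:
            dst.send(frame)

eth0, eth1 = Interface("eth0"), Interface("eth1")
eth0.inbox.append({"dst": "aa:bb:cc:dd:ee:ff", "payload": b"hi"})
bridge_loop_once(eth0, eth1)
```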

VLANs:

One limitation of switches is that they do not scale. It is not realistic to connect more than a few switches, where in practice few typically means “tens of.” One reason for this is that the spanning tree algorithm scales linearly; that is, there is no provision for imposing a hierarchy on the set of switches. A second reason is that switches forward all broadcast frames. While it is reasonable for all hosts within a limited setting (say, a department) to see each other’s broadcast messages, it is unlikely that all the hosts in a larger environment (say, a large company or university) would want to have to be bothered by each other’s broadcast messages. Said another way, broadcast does not scale, and as a consequence L2-based networks do not scale.

Internet (IP)

Datagram Delivery

IP address format:

How does a router deliver packets based on the IP address?

Subnetting and Classless Addressing

We have now seen the basic mechanisms that IP provides for dealing with both heterogeneity and scale. On the issue of heterogeneity, IP begins by defining a best-effort service model that makes minimal assumptions about the underlying networks; most notably, this service model is based on unreliable datagrams. IP then makes two important additions to this starting point: (1) a common packet format (fragmentation/reassembly is the mechanism that makes this format work over networks with different MTUs) and (2) a global address space for identifying all hosts (ARP is the mechanism that makes this global address space work over networks with different physical addressing schemes). On the issue of scale, IP uses hierarchical aggregation to reduce the amount of information needed to forward packets. Specifically, IP addresses are partitioned into network and host components, with packets first routed toward the destination network and then delivered to the correct host on that network.
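
A small sketch of that prefix-based (network-level) forwarding, using Python's standard ipaddress module; the prefixes and next-hop names below are invented examples:

```python
import ipaddress

# Hierarchical aggregation: routers keep one entry per network prefix,
# not one per host. Prefixes and next hops below are invented examples.
ROUTES = [
    (ipaddress.ip_network("10.1.0.0/16"), "R2"),
    (ipaddress.ip_network("10.1.4.0/24"), "R3"),   # more specific subnet
    (ipaddress.ip_network("0.0.0.0/0"),   "R1"),   # default route
]

def lookup(dst: str) -> str:
    """Longest-prefix match: the most specific matching network wins."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    net, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop

print(lookup("10.1.4.7"))    # -> R3 (falls in the /24, the longest match)
print(lookup("10.1.9.9"))    # -> R2 (only the /16 matches)
print(lookup("192.0.2.1"))   # -> R1 (default)
```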

Does every host have a public IP address?
Address Translation (ARP)

The next problem is how to get a datagram to a particular host or router on a given network. The main issue is that IP datagrams contain IP addresses, but the physical interface hardware on the host or router to which you want to send the datagram only understands the addressing scheme of that particular network. Thus, we need to translate the IP address to a link-level address that makes sense on this network (e.g., a 48-bit Ethernet address). We can then encapsulate the IP datagram inside a frame that contains that link-level address and send it either to the ultimate destination or to a router that promises to forward the datagram toward the ultimate destination.
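
A minimal sketch of that translate-then-encapsulate step, with an invented ARP cache and a dict standing in for the Ethernet frame (a real host would broadcast an ARP request on a cache miss):

```python
# Sketch of the translation step ARP performs before a datagram is framed.
# The cache contents and frame layout here are illustrative, not real ARP.
ARP_CACHE = {
    "10.0.0.5": "02:42:ac:11:00:05",   # IP address -> 48-bit Ethernet address
    "10.0.0.1": "02:42:ac:11:00:01",   # default router on this network
}

def encapsulate(ip_datagram: bytes, next_hop_ip: str, src_mac: str) -> dict:
    """Resolve the next hop's link-level address and wrap the datagram in a
    frame addressed to it. A real host would broadcast an ARP request if
    the mapping were not already cached."""
    dst_mac = ARP_CACHE.get(next_hop_ip)
    if dst_mac is None:
        raise LookupError(f"would broadcast ARP request for {next_hop_ip}")
    return {"dst": dst_mac, "src": src_mac, "type": 0x0800, "payload": ip_datagram}

frame = encapsulate(b"...IP header + data...", "10.0.0.1", "02:42:ac:11:00:09")
print(frame["dst"])   # link-level address of the next hop, not of the final destination
```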

DHCP

DHCP lets a host dynamically obtain an IP address when it boots.

ICMP

ICMP defines a collection of error messages that are sent back to the source host at the IP layer.

VPN

A VPN provides more controlled connectivity than the Internet's default any-to-any reachability.

One way to implement a VPN is with virtual circuits; another is with an IP tunnel. We can think of an IP tunnel as a virtual point-to-point link between a pair of nodes that are actually separated by an arbitrary number of networks. The virtual link is created within the router at the entrance to the tunnel by providing it with the IP address of the router at the far end of the tunnel. Whenever the router at the entrance of the tunnel wants to send a packet over this virtual link, it encapsulates the packet inside an IP datagram. The destination address in the IP header is the address of the router at the far end of the tunnel, while the source address is that of the encapsulating router. (It simply wraps the packet inside an IP header destined for the far end of the tunnel and transmits it into the internetwork.) SSH tunnel: https://www.youtube.com/watch?v=N8f5zv9UUMI

Forwarding table at the router that sits at the entrance of the tunnel, where the tunnel appears as a virtual interface:

NetworkNum   NextHop
1            Interface 0
2            Virtual interface 0
Default      Interface 1
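
As a rough sketch of the forwarding decision at the tunnel entrance, the following assumes hypothetical addresses, network numbers, and a dict-based packet format (none of these come from the book's figure):

```python
# Sketch of forwarding at the tunnel-entrance router from the table above.
# Network numbers, addresses, and the dict-based "headers" are illustrative.
TUNNEL_FAR_END = "18.5.0.1"            # IP address of the router at the far end

FORWARDING = {1: "interface0", 2: "virtual_interface0", "default": "interface1"}

def forward(packet: dict) -> dict:
    out = FORWARDING.get(packet["dst_net"], FORWARDING["default"])
    if out == "virtual_interface0":
        # Sending on the virtual link means encapsulation: wrap the original
        # packet inside a new IP datagram addressed to the far end of the tunnel.
        return {"src": "12.0.0.1", "dst": TUNNEL_FAR_END, "payload": packet}
    return packet                        # sent unmodified on a physical interface

inner = {"dst_net": 2, "dst": "10.2.0.3", "payload": b"data"}
outer = forward(inner)
print(outer["dst"])   # 18.5.0.1 -> intermediate routers see only the outer header
```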

You might wonder why anyone would want to go to all the trouble of creating a tunnel and changing the encapsulation of a packet as it goes across an internetwork. One reason is security. Supplemented with encryption, a tunnel can become a very private sort of link across a public network. Another reason may be that R1 and R2 have some capabilities that are not widely available in the intervening networks, such as multicast routing. By connecting these routers with a tunnel, we can build a virtual network in which all the routers with this capability appear to be directly connected. A third reason to build tunnels is to carry packets from protocols other than IP across an IP network. As long as the routers at either end of the tunnel know how to handle these other protocols, the IP tunnel looks to them like a point-to-point link over which they can send non-IP packets. Tunnels also provide a mechanism by which we can force a packet to be delivered to a particular place even if its original header—the one that gets encapsulated inside the tunnel header—might suggest that it should go somewhere else. Thus, we see that tunneling is a powerful and quite general technique for building virtual links across internetworks. So general, in fact, that the technique recurses, with the most common use case being to tunnel IP over IP.

https://security.stackexchange.com/questions/237967/what-is-the-privacy-benefit-of-vpns

VPN to mitigate DDoS attacks: https://www.security.org/vpn/ddos/

Routing:

Intradomain: (small network)

Given the cost of each link, find the shortest path between any two nodes.

Distance-Vector (RIP)

A distributed Bellman-Ford algorithm: nodes exchange distance vectors with their neighbors until they converge on shortest paths. https://book.systemsapproach.org/internetworking/routing.html#distance-vector-rip
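
A minimal sketch of the per-node update rule, assuming each node keeps a distance vector (destination -> cost) and a next-hop table; node names and link costs below are invented:

```python
# Sketch of the distance-vector (Bellman-Ford) update a node performs when a
# neighbor advertises its vector. Node names and costs are made-up examples.
INF = float("inf")

def dv_update(my_vector, next_hop, neighbor, cost_to_neighbor, neighbor_vector):
    """For each destination d: if going via this neighbor is cheaper than what
    we currently believe, adopt cost(neighbor) + neighbor's cost to d."""
    changed = False
    for dest, their_cost in neighbor_vector.items():
        candidate = cost_to_neighbor + their_cost
        if candidate < my_vector.get(dest, INF):
            my_vector[dest] = candidate
            next_hop[dest] = neighbor
            changed = True
    return changed           # if True, we re-advertise our vector to neighbors

# Node A's current view, and a vector just received from neighbor B (one hop, cost 1).
vector, next_hop = {"A": 0, "B": 1}, {"B": "B"}
dv_update(vector, next_hop, "B", 1, {"A": 1, "B": 0, "C": 2, "D": 5})
print(vector)     # {'A': 0, 'B': 1, 'C': 3, 'D': 6}
print(next_hop)   # {'B': 'B', 'C': 'B', 'D': 'B'}
```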

Link-State (OSPF)

A protocol in which each node floods its link-state information to all other nodes, so every node can build a complete map of the network topology and then calculate shortest paths (e.g., with Dijkstra's algorithm).
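
Once a node has the full topology, it can compute shortest paths with Dijkstra's algorithm; a small sketch over an invented graph:

```python
import heapq

# Sketch of the shortest-path (Dijkstra) computation every link-state router
# runs over the flooded topology. The graph below is an invented example.
GRAPH = {
    "A": {"B": 5, "C": 3},
    "B": {"A": 5, "C": 2, "D": 3},
    "C": {"A": 3, "B": 2, "D": 7},
    "D": {"B": 3, "C": 7},
}

def dijkstra(source):
    dist, done = {source: 0}, set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, cost in GRAPH[u].items():
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(heap, (dist[v], v))
    return dist

print(dijkstra("A"))   # {'A': 0, 'B': 5, 'C': 3, 'D': 8}
```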

Metrics:

How to measure link cost. One variant (the revised ARPANET metric): Delay = (DepartTime - ArrivalTime) + TransmissionTime + Latency, where (DepartTime - ArrivalTime) captures the time the packet spent queued in the router, TransmissionTime depends on link bandwidth and packet size, and Latency is the link's propagation delay.
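
A trivial sketch of computing that per-packet cost (the sample times are invented, in milliseconds):

```python
# Per-packet link cost in the ARPANET-style metric quoted above.
def link_delay(arrival_time, depart_time, transmission_time, latency):
    # (depart - arrival) is the time the packet spent queued in the router,
    # transmission_time is size/bandwidth, latency is the propagation delay.
    return (depart_time - arrival_time) + transmission_time + latency

print(link_delay(arrival_time=10.0, depart_time=14.5,
                 transmission_time=0.8, latency=2.0))   # 7.3 ms
```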

Interdomain: (Scaling to Billions)

Even though each router does not need to know about all the hosts connected to the Internet, in the model described so far it does need to know about all the networks connected to the Internet.

ABR: area border router;

BGP: Let’s return to the real question: How does all this help us to build scalable networks? First, the number of nodes participating in BGP is on the order of the number of autonomous systems, which is much smaller than the number of networks. Second, finding a good interdomain route is only a matter of finding a path to the right border router, of which there are only a few per AS. Thus, we have neatly subdivided the routing problem into manageable parts, once again using a new level of hierarchy to increase scalability. The complexity of interdomain routing is now on the order of the number of autonomous systems, and the complexity of intradomain routing is on the order of the number of networks in a single AS.

iBGP and eBGP work together to propagate border-router information both across ASes and within an AS.

NAT: https://www.comptia.org/content/guides/what-is-network-address-translation

Several internal addresses can be NATed to only one or a few external addresses by using a feature called Port Address Translation (PAT) which is also referred to as “overload”, a subset of NAT functionality.

PAT uses unique source port numbers on the Inside Global IP address to distinguish between translations. Because the port number is encoded in 16 bits, the total number could theoretically be as high as 65,536 per IP address.

PAT will attempt to preserve the original source port. If that port is already allocated, PAT will attempt to find the first available port number, starting from the beginning of the appropriate port group (0-511, 512-1023, or 1024-65535).

If there is still no port available from the appropriate group and more than one IP address is configured, PAT will move to the next IP address and try to allocate the original source port again. This continues until it runs out of available ports and IP addresses.

Example: see the PAT translation sketch below.
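
A minimal sketch of the PAT behavior described above, assuming a single invented outside address and an in-memory translation table (not any vendor's actual implementation):

```python
# Sketch of PAT ("NAT overload"): many inside addresses share one outside
# address, distinguished by translated source ports. Addresses are examples.
OUTSIDE_ADDR = "203.0.113.10"
translations = {}          # (inside_ip, inside_port) -> outside source port
ports_in_use = set()

def translate(inside_ip, inside_port):
    key = (inside_ip, inside_port)
    if key in translations:
        return OUTSIDE_ADDR, translations[key]
    # Try to preserve the original source port; otherwise take the first
    # free port from the same group (0-511, 512-1023, 1024-65535).
    groups = [(0, 512), (512, 1024), (1024, 65536)]
    start, end = next(g for g in groups if g[0] <= inside_port < g[1])
    for port in [inside_port] + list(range(start, end)):
        if port not in ports_in_use:
            ports_in_use.add(port)
            translations[key] = port
            return OUTSIDE_ADDR, port
    raise RuntimeError("port group exhausted; a real PAT would try the next outside IP")

print(translate("192.168.1.5", 40000))   # ('203.0.113.10', 40000)
print(translate("192.168.1.6", 40000))   # port 40000 taken -> ('203.0.113.10', 1024)
```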

E2E protocols:

TCP & UDP

Flow control involves preventing senders from over-running the capacity of receivers. Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded.

RPC

DNS:

https://book.systemsapproach.org/applications/infrastructure.html

9.1 Traditional Applications.

SMTP

Encoding: email is encoded in ASCII. The problem is that, for some data types (e.g., a JPEG image), any given 8-bit byte might contain one of 256 different values, and only a subset of these values are valid ASCII characters. It is important that email messages contain only ASCII, because they might pass through a number of intermediate systems (gateways, as described below) that assume all email is ASCII and would corrupt the message if it contained non-ASCII characters. To address this issue, MIME uses a straightforward encoding of binary data into the ASCII character set, called base64. The idea is to map every three bytes of the original binary data into four ASCII characters. This is done by grouping the binary data into 24-bit units and breaking each such unit into four 6-bit pieces. Each 6-bit piece maps onto one of 64 valid ASCII characters; for example, 0 maps onto A, 1 maps onto B, and so on. If you look at a message that has been encoded using the base64 encoding scheme, you’ll notice only the 52 uppercase and lowercase letters, the 10 digits 0 through 9, and the special characters + and /. These 64 characters make up the base64 alphabet.
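
A quick way to see the 3-bytes-to-4-characters mapping, using Python's standard base64 module and then reproducing the 24-bit/6-bit split by hand (the three sample bytes are arbitrary):

```python
import base64

data = bytes([0x4A, 0x50, 0x47])             # three arbitrary binary bytes (b'JPG')
print(base64.b64encode(data))                # b'SlBH' -> four ASCII characters

# The same mapping done by hand: 3 bytes = 24 bits, split into four 6-bit pieces.
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")
bits = int.from_bytes(data, "big")           # one 24-bit unit
pieces = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
print("".join(ALPHABET[p] for p in pieces))  # SlBH, matching the library
```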

9.4 Overlay Networks

In the last few years, however, the distinction between packet forwarding and application processing has become less clear.

New applications are being distributed across the Internet, and in many cases these applications make their own forwarding decisions.

An overlay is a logical network implemented on top of some underlying network.

Why VPN:

VPNs, as described in Section 3.3, were one early success for virtual networking. They allowed carriers to present corporate customers with the illusion that they had their own private network, even though in reality they were sharing underlying links and switches with many other users

More generally, there are many situations where more controlled connectivity is required than the Internet's default any-to-any reachability; the virtual private network (VPN) is an important example.

P2P network

So we are back to the original question: What’s interesting about peer-to-peer networks? One answer is that both the process of locating an object of interest and the process of downloading that object onto your local machine happen without your having to contact a centralized authority, and at the same time the system is able to scale to millions of nodes. A peer-to-peer system that can accomplish these two tasks in a decentralized manner turns out to be an overlay network, where the nodes are those hosts that are willing to share objects of interest (e.g., music and other assorted files), and the links (tunnels) connecting these nodes represent the sequence of machines that you have to visit to track down the object you want. This description will become clearer after we look at two examples.

Chord

UIUC cloud computing: https://www.coursera.org/learn/cloud-computing/home/week/3

Content Distribution Networks

We have already seen how HTTP running over TCP allows web browsers to retrieve pages from web servers. However, anyone who has waited an eternity for a Web page to return knows that the system is far from perfect. Considering that the backbone of the Internet is now constructed from 40-Gbps links, it’s not obvious why this should happen. It is generally agreed that when it comes to downloading Web pages there are four potential bottlenecks in the system:

  • The first mile. The Internet may have high-capacity links in it, but that doesn’t help you download a Web page any faster when you’re connected by a 1.5Mbps DSL line or a poorly performing wireless link.

  • The last mile. The link that connects the server to the Internet can be overloaded by too many requests, even if the aggregate bandwidth of that link is quite high.

  • The server itself. A server has a finite amount of resources (CPU, memory, disk bandwidth, etc.) and can be overloaded by too many concurrent requests.

  • Peering points. The handful of ISPs that collectively implement the backbone of the Internet may internally have high-bandwidth pipes, but they have little motivation to provide high-capacity connectivity to their peers. If you are connected to ISP A and the server is connected to ISP B, then the page you request may get dropped at the point where A and B peer with each other.

The idea of a CDN is to geographically distribute a collection of server surrogates that cache pages normally maintained in some set of backend servers. Thus, rather than having millions of users wait forever to contact www.cnn.com when a big news story breaks—such a situation is known as a flash crowd—it is possible to spread this load across many servers. Moreover, rather than having to traverse multiple ISPs to reach www.cnn.com, if these surrogate servers happen to be spread across all the backbone ISPs, then it should be possible to reach one without having to cross a peering point. Clearly, maintaining thousands of surrogate servers all over the Internet is too expensive for any one site that wants to provide better access to its Web pages. Commercial CDNs provide this service for many sites, thereby amortizing the cost across many customers.

Although we call them surrogate servers, in fact, they can just as correctly be viewed as caches. If they don’t have a page that has been requested by a client, they ask the backend server for it. In practice, however, the backend servers proactively replicate their data across the surrogates rather than wait for surrogates to request it on demand. It’s also the case that only static pages, as opposed to dynamic content, are distributed across the surrogates. Clients have to go to the backend server for any content that either changes frequently (e.g., sports scores and stock quotes) or is produced as the result of some computation (e.g., a database query).

Mechanisms

  • First, redirection could be implemented by augmenting DNS to return different server addresses to clients.
  • When returning an embedded link, the server can rewrite the URL, thereby effectively pointing the client at the most appropriate server for that specific object.
  • Another possibility is to use the HTTP redirect feature: the client sends a request message to a server, which responds with a new (better) server that the client should contact for the page (the redirecting node could be a proxy coordinated with the client or with the DNS server), as sketched below.
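
A minimal sketch of the HTTP-redirect mechanism using only the Python standard library; the surrogate name and port are invented:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Sketch of HTTP-based redirection: the contacted server answers with a 302
# pointing the client at a (hypothetically) better-placed surrogate.
SURROGATE = "http://surrogate-1.example"     # invented surrogate name

class Redirector(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(302)                          # temporary redirect
        self.send_header("Location", SURROGATE + self.path)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), Redirector).serve_forever()
```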

Policies

  • Network proximity: choose a surrogate that is close to the client in the network (e.g., low round-trip time), so response time is good.
  • Locality: requests for the same URL are forwarded to the same servers, which improves the hit rate at the caches.

An alternative is to use the same consistent hashing algorithm discussed in the previous section. Specifically, each redirector first hashes every server into the unit circle. Then, for each URL that arrives, the redirector also hashes the URL to a value on the unit circle, and the URL is assigned to the server that lies closest on the circle to its hash value. If a node fails in this scheme, its load shifts to its neighbors (on the unit circle), so the addition or removal of a server only causes local changes in request assignments. Note that unlike the peer-to-peer case, where a message is routed from one node to another in order to find the server whose ID is closest to the object's, each redirector knows how the set of servers map onto the unit circle, so they can each, independently, select the “nearest” one.
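
A small sketch of that assignment, hashing invented server names and a URL onto the circle (MD5 and the 2^32 modulus are arbitrary choices for illustration):

```python
import hashlib

# Sketch of the consistent-hashing assignment described above. Server names
# and the choice of MD5 as the hash are assumptions for illustration.
CIRCLE = 2**32

def point(name):
    """Hash a server name or URL onto the unit circle (here, 0..2^32-1)."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % CIRCLE

def assign(url, servers):
    """Each redirector knows all the servers' positions, so it can
    independently pick the server closest to the URL's hash value."""
    u = point(url)
    def distance(server):
        s = point(server)
        return min((s - u) % CIRCLE, (u - s) % CIRCLE)   # wrap-around distance
    return min(servers, key=distance)

servers = ["surrogate-1.example", "surrogate-2.example", "surrogate-3.example"]
print(assign("http://example.com/index.html", servers))
# If one surrogate is removed, only the URLs that hashed nearest to it move:
print(assign("http://example.com/index.html", servers[:-1]))
```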

Amazon https://aws.amazon.com/cloudfront/