How Do Routers Work, Really? | Hacker News


Just watched the whole video, amazing, nostalgic but also subtly wrong in a number of annoying ways!

For someone with only a passing understanding of router innards, what should I watch out for from this talk to avoid coming away with an incorrect understanding of how things work?

In my experience programmers are very friendly and kind and are always eager to help everybody understand what they do understand about programming.

In contrast, people from the “networking” world often look with disdain on people who don’t understand what they do and want to prevent them from learning. They love to just say what is wrong and never point to what would be right and why. Most of the time they will also just keep saying you must hire someone to do the job instead of learning.

That is my experience on Networking Stack Exchange, on ##networking channel on Freenode and also the impression I have from a friend that deals with networking, although I try to not talk about it with him for the reasons above.

I’ve been working alongside network engineers for thirty years in a variety of ISP, IXP, RIR, corporate, carrier, DC and public cloud environments, and do not recognise the people you are describing. These colleagues run the usual gamut of human personalities, but invariably the most respected and senior contributors are those that enable others through sharing their knowledge and experience. They were never anything but helpful and patient even when I was just getting started and full of basic questions about BGP and mixing up my fibre modes.

However, I have also contributed to Stack Overflow and managed IRC channels and servers. The negative traits you’ve described do correlate to the hostile attitudes endemic within many StackExchange and IRC communities. They are not correlated to my workplace experience of network engineers.

There’s definitely a personality type that is attracted to networking and security who is motivated by control. Usually they end up in management roles. I’ve run into my share of people like this, more so than with software people.

On the whole, network engineers are a cool bunch though. They’re often called in to make stuff work without any real background or understanding of wtf is going on in advance. As a profession, they don’t get the respect they are do.

People do all kinds of things for different reasons, and when talking about large groups — network engineers, security professionals, &c — you really can’t boil down the group and distill the traits of the individual.

Early in my career, when I did a mixture of systems administration and security, my mentor on both of those things was a super-chill, skinny-as-they-come mega-pothead. Exact opposite of a control freak.

Dude was wicked smart, though, and the security mindset that he helped me build has paid dividends over the years.

Personally, I went into management precisely because I worked for a few “control freak” types, and felt that I had a sort of moral duty to build teams free of that sort of environment, even if it meant that I had to swap my text editor for a calendar.

I know many other managers with a similar backstory. None of us want to be the PHB.

As an aside, if anyone reading this is looking at management: look to nudge, rather than control. We learn by making mistakes, and sometimes, you actually do need a report — or maybe even the entire team! — to make a mess and clean it up, because the process of doing so will make them stronger, and will benefit your organization in the medium-to-long term.

As with all things, there are trade-offs to be made and balances to be struck! But one of the biggest mistakes I see new managers make is investing the bulk of their energy in preventing mistakes, instead of building a team that can recover-and-adapt quickly.

(Also, a nit, which you might not have noticed: “respect they are due”)

To counter your anecdote with my own, I’ve been working for 10 years on a team which has seen Network Engineers, Systems Engineers and Software Engineers come and go, and I’ve seen three(!) very arrogant Software Engineers who as it turned out didn’t know what they were doing. But the same goes for other disciplines, we’ve had a straight up antisocial Network Engineer who only worked from home and never answered his phone (he did excellent diagrams though). We’ve had an arrogant Systems Engineer that refused to document anything. These people were fired, but my point is that you should blame the person rather than the discipline.

You’ve described a typical personality on StackExchange – nothing to do with networking people.

That typical personality doesn’t show up at all in Stack Overflow, Server Fault, Databases, WebApps, Bitcoin, Mathematics and other StackExchange communities.

Or if they show up, it’s counterbalanced by a giant number of nice programmers willing to help. While on the Networking StackExchange they are the only ones.

>In my experience programmers are very friendly and kind and are always eager to help everybody understand what they do understand about programming.

You haven’t met enough programmers, then. 🙂

“accidents happen [in LAN]”, “at least the router is exact (for the most part)”

What does this mean?

Then towards the end… “the packet is recycled”. What?

I don’t know about packet recycling, but at least with the ‘for the most part’, packet collision and packet loss used to be a lot more common for some reason. Nowadays the only times I see them on local networks is when cables get badly kinked or terminations are poorly done.

I watched this decades ago and forgot just enough about it that I couldn’t find it again recently when I tried. Thank you

Haha thanks for sharing. Interesting how much emphasis there is on “the ping of death” compared to literally any other exploit. Does anyone know if this was really such a big problem when this video came out?

What I remember is that the ping of death was extremely surprising in terms of the number of OSes affected, the ease of exploiting it, and the super-noticeable consequence of instantly crashing the target machine. And it came out at a time when there wasn’t as much vulnerability research and very few extensively cross-platform vulnerabilities.

Also, with the ping of death, the only way to use it was to very noticeably crash systems — not to secretly build a botnet or something, as might have been done with RCE vulnerabilities.

It was popular for booting people off IRC, but there were other exploits around the same era that did the same such as land and teardrop.

It wasn’t super notable. What was more horrific was the number of Windows machines that had TCP ports for various Windows services open to the internet, which led not only to crashes but to remote compromise and rootkit/botnet stuff. That went on for years and only got mitigated by people deploying routers with firewall/NAT functionality.

I do remember hearing about it causing issues here and there in the 90s/early 00s, but rarely. Never hear about it anymore.

But I do remember AppleTalk causing issues more frequently on a network I helped manage that had radio studios with two Macs per studio, but mostly Windows PCs through the rest of the building.

That place also had a Macintosh 512K running its phone system until around 2010!

>If that is the case, my condolences.

As a software engineer working on IOS-XR, that gave me a chuckle :p

In the case of enterprise- and SP-grade routers, the data-plane – i.e., where the actual forwarding and lookups take place – runs entirely on a dedicated network processor (NP), mainly for performance reasons. Information on the NP is populated by the router’s operating system in response to user configuration, network topology changes, or protocol state updates. On the other hand, the control plane runs mainly on the CPU(s). This is required so that the protocols running on the router OS (e.g., BGP) can receive and send out updates based on their state machines.
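A rough sketch of that split, for anyone who thinks better in code. This is not how IOS-XR or any real NP is programmed: the `ControlPlane` and `DataPlane` classes, their method names, and the "lowest metric wins" rule are all made up for illustration. The point is only that the per-packet path consults a table it never writes to, and the control plane writes to a table it never forwards with.

```python
import ipaddress


class DataPlane:
    """Stands in for the NP: holds only the forwarding table (FIB)."""

    def __init__(self):
        self.fib = {}  # ip_network -> (next_hop, out_port)

    def program(self, prefix, next_hop, out_port):
        # Called by the control plane only, never by the packet path.
        self.fib[ipaddress.ip_network(prefix)] = (next_hop, out_port)

    def forward(self, dst):
        # Per-packet work: longest-prefix match and nothing else.
        addr = ipaddress.ip_address(dst)
        matches = [net for net in self.fib if addr in net]
        if not matches:
            return None  # no route: drop
        best = max(matches, key=lambda net: net.prefixlen)
        return self.fib[best]


class ControlPlane:
    """Stands in for the router OS on the CPU: hears updates, picks routes."""

    def __init__(self, data_plane):
        self.data_plane = data_plane
        self.rib = {}  # prefix -> list of (metric, next_hop, out_port)

    def on_update(self, prefix, metric, next_hop, out_port):
        # Hypothetical handler for a protocol or config update.
        self.rib.setdefault(prefix, []).append((metric, next_hop, out_port))
        _, best_hop, best_port = min(self.rib[prefix])  # lowest metric wins
        self.data_plane.program(prefix, best_hop, best_port)
```

Feeding `on_update` a better route for an existing prefix reprograms the `DataPlane` entry; `forward` itself never changes.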

>As a software engineer working on IOS-XR, that gave me a chuckle :p

Good good 😀

Thanks for the clear data plane / control plane explanation, that’s a good way to summarise the distinction. May I link to it from the article?

I think the simplest way for people familiar with PCs to visualize it is with the FirePOWER devices. Network cards plugged into some slot have embedded chips which can be programmed to, say, filter specific kinds of traffic, or pass it on to the host CPU for more advanced logic, while the machine’s central CPU runs a web interface, manages local databases, downloads updates, manages clusters, records metrics, etc. And either can even be hot-pluggable, interchangeable blades in a larger machine chassis.

Protocol-wise, isn’t it common now for the NP on higher end stuff to handle L4 and higher protocols? Or are those still largely managed by the CPU?

Yeah, NPs can handle L4 protocols, but I believe it’s usually a hybrid approach where the logic is split between CPU and NP.

NPs are generally ASICs so it depends on how flexible the code needs to be that is being executed. If it gets outside of the parameters of what the ASICs can handle it can severely limit performance.

An interesting side effect is that a lot of the time the tools running on the main CPU don’t have visibility into what is happening on the ASICs, as the code doesn’t have hooks into the data path at all: it compiles the code and sends it down but doesn’t participate much after it starts executing.

>Note that the next hop’s IP address is in the router’s memory only: it does not appear in the packet at any time.

This clears some points that always puzzled me:

If the gateway is identified by an IP address, but the destination host is also an IP address, which address exactly is put into the packet? And how can a packet be routed if the gateway’s IP is itself part of the subnet that’s supposed to be routed to it. (E.g. 192.168.0.0/24 with default gateway 192.168.0.1)

So the answer is, if I send the packet to host 1.1.1.1 but the routing table has 2.2.2.2 as the next hop, the packet will have 1.1.1.1 as the destination in the IP part but the MAC of 2.2.2.2 as destination of the Ethernet part (or equivalent). It doesn’t matter which subnet the next hop’s IP is in, as the routing table isn’t consulted for it anyway – it’s only used in ARP)
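Here is that answer as a minimal sketch, using the 192.168.0.0/24 network with gateway 192.168.0.1 from the question. The `ROUTES` table, the `ARP_CACHE` contents, the MAC addresses and the `build_frame` helper are all invented for illustration; a real stack would send an ARP request on a cache miss instead of raising `KeyError`.

```python
import ipaddress

# Hypothetical routing table: (prefix, next-hop IP, or None if directly connected)
ROUTES = [
    (ipaddress.ip_network("192.168.0.0/24"), None),
    (ipaddress.ip_network("0.0.0.0/0"), ipaddress.ip_address("192.168.0.1")),
]

# Hypothetical ARP cache: IP -> MAC
ARP_CACHE = {
    ipaddress.ip_address("192.168.0.1"): "aa:aa:aa:aa:aa:01",
    ipaddress.ip_address("192.168.0.7"): "aa:aa:aa:aa:aa:07",
}


def build_frame(dst_ip):
    dst = ipaddress.ip_address(dst_ip)
    # Longest-prefix match on the *final* destination.
    _, next_hop = max(
        ((prefix, hop) for prefix, hop in ROUTES if dst in prefix),
        key=lambda route: route[0].prefixlen,
    )
    # On-link destinations are resolved directly; otherwise resolve the next hop.
    # The next hop's IP is only ever used as an ARP lookup key.
    arp_target = dst if next_hop is None else next_hop
    return {
        "eth_dst": ARP_CACHE[arp_target],  # MAC of the next hop
        "ip_dst": str(dst),                # final destination, unchanged
    }
```

`build_frame("1.1.1.1")` produces a frame addressed to the gateway’s MAC with 1.1.1.1 still in the IP header, and 192.168.0.1 appears nowhere in the result.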

This leaves the question, why the indirection and why the mucking around with ARP and IPs that are never used as the destination to anything?

Couldn’t you simply put the next hop’s MAC address (instead of IP address) into the routing table and be able to route packets just as well, with a lot less complexity?

To give a simplified but largely accurate summation: IP and Ethernet were each designed in different time periods and largely without knowledge of the other. Ethernet was historically used in such a fashion that multiple hosts (more than 2) occupied the same collision domain, that is, they were physically connected to the same cable, or through hubs that repeated frames to all interfaces without routing. This means that Ethernet required an addressing scheme so that hosts on the same media knew which frames were for them (higher-level protocols at the time did not necessarily handle this).

Ethernet’s addressing scheme was not designed to accommodate large hierarchical networks and so is unsuitable for the IP use case, but more importantly, IP was designed completely separately from Ethernet, and was not used primarily with Ethernet until later, so IP could not “assume” that the layer below it handled addressing (typically there was either no layer below [point-to-point] or only a very simple one).

The result is that Ethernet and IP duplicate functionality to some extent. It is theoretically possible, although not common, to build a network which uses only layer 3 routing without any reliance on Ethernet addressing. A significant reason this is rare, arguably the most significant reason, is that IP is now carried over Ethernet a significant majority of the time and L2 Ethernet devices (like switches) require the use of Ethernet addressing for the network to function. You usually see “pure IP” in virtual networking environments where the IP is encapsulated in, well, more IP, but even then Ethernet frames are sometimes used because, well, just like network hardware, operating system network stacks generally expect them (examine, e.g., the Linux bridge implementation). It is completely possible to build network stacks and network appliances which do not require the use of Ethernet but it is expensive and there’s not much of a motivation to do so, and you’d run into issues with any kind of equipment not so designed.

Addressing is not the only duplicate functionality between Ethernet and IP, and it’s one of the less significant ones since Ethernet addressing does provide utility even if not strictly required. Ethernet frames are checksummed, and IP headers are also checksummed, even though the Ethernet checksum is already over them. The IP header checksum exists because IP was historically carried over lower layers that did not provide integrity checking. This is basically pure wasted space in typical networks, so IPv6 drops the header checksum to remove the overhead.
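For the curious, the IPv4 header checksum is small enough to sketch. This follows the one’s-complement sum described in RFC 1071; the function name `ipv4_header_checksum` and the sample header bytes are just for illustration, and the caller is assumed to have zeroed the checksum field before computing.

```python
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, then inverted."""
    if len(header) % 2:
        header += b"\x00"  # pad to a whole number of 16-bit words
    words = struct.unpack(f"!{len(header) // 2}H", header)
    total = sum(words)
    while total >> 16:
        # Fold any carry out of the low 16 bits back in.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# A 20-byte sample header with the checksum field (bytes 10-11) set to zero.
sample = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = ipv4_header_checksum(sample)
```

Running the same function over a header that already contains a correct checksum gives 0, which is how a receiver verifies it. Since the TTL changes at every hop, every IPv4 router has to update this field too, which is part of the overhead IPv6 avoids.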

In general, though, network protocols tend to make more sense when you have some awareness of the history of their development, as when you try to view the modern internet as an elegant, monolithic design as some authors attempt, a lot of things won’t make sense because they simply are that way for historic reasons. Ethernet and IP were each designed in the ’70s, but separately, and their use has accumulated significant cruft since then, including some radical changes in the ways that they were used (for example the transition of Ethernet from shared media to point-to-point, which occurred de facto earlier but became largely formalized with the introduction of GbE which prohibits more than two hosts in a collision domain, and of course ironically the introduction of multiple hosts in a collision domain as an even larger issue with wireless protocols, which requires additional handling below, or actually in lieu of, the ethernet layer, 802.11 being a replacement for ethernet that happens to behave similarly in many ways for compatibility).

Finally, the OSI model is something that tends to add complexity and confusion to these discussions, which is why I doggedly discourage its use in teaching. The OSI Model describes the OSI protocols, which were contemporary competitors to the TCP/IP protocols. Arguably, one of the reasons that the OSI protocols fell out of use (in favor of IP) is exactly because they assumed seven layers, and each was fairly complex. Some OSI protocols are still in use, for example IS-IS (OSI layer 2) in the telecom industry and some backbone IP transit, but in niches and generally being replaced with IP. IP is intentionally simpler, and can be fully described using four layers, what’s usually referred to as the TCP/IP model.

The OSI layers do not map 1:1 to the TCP/IP layers, even if you simply ignore the ones that map more poorly as instructors often do. Even worse, many instructors and textbook authors feel such a strong compulsion to map modern networks to the obsolete OSI model that they cram application-layer protocols into OSI layers 5 and 6 in order to have examples of them. I have seen cases as extreme as an instructor claiming that HTTP cookies represent the session layer. This kind of thing is nonsense and hinders understanding rather than contributing to it. If the OSI model is taught (not a bad idea at all as students should realize that TCP/IP is merely the popular way, and certainly not the only way), it should be taught specifically by contrasting it to the different TCP/IP model. Unfortunately few instructors and website authors today seem to even be aware that the OSI protocol stack existed separately from IP.

And, if you are wondering, yes, Ethernet can be used in a switched network completely independently from IP (although not really in a routed network unless you are generous about how you define routing). This was more common decades ago, the only equipment I have ever personally encountered that used bare Ethernet was a very outdated CNC setup.

Yes, that essay is outstanding! I largely left out mention of IPv6 because it’s a whole different can of worms, but as that article presents, it aims to make the situation radically simpler but in practice, well, doesn’t. Cue the XKCD about making a new standard.

A bit ago I touched on various competitors to IP on my blog-thing (https://computer.rip/) but I need to find time to give the topic a more thorough treatment. As with a lot of fields, you can probably learn more about what really matters in networking by studying the protocols that didn’t make it than by studying the ones that did. It’s hard for most people that entered the computing field in the last couple of decades to imagine IP and TCP/UDP not being the clearly correct design, but in the ’80s to early ’90s the expansion of microcomputers was accompanied by a flourishing of network protocols for use with them. There are multiple reasons that TCP/IP over Ethernet eventually became dominant but in the end it’s mostly happenstance, it’s pretty easy to imagine XNS becoming the norm if ARPANET had gone a little differently. Imagine the problems we’d be talking about today in that parallel universe, XNSv6 adoption is such a mess.

I’m honestly a bit sad to see the “all-IP” trend working its way through the telecom industry. It’s reducing use of protocols like MPLS that I think are very cool. But now software-defined networking brings a whole new world of strange network technologies that we’ll find ill-advised in fifty years.

Besides the choice between using IP or “bare” Ethernet, there are alternatives to IP as the layer on top of Ethernet that are used in routed networks. Two of the more-common examples historically are Novell Netware (IPX/SPX) and DECnet.

Another historic alternative was VINES IP which was used by Banyan Vines systems. Like IPX/SPX it was inspired by XNS.

What makes it particularly interesting is that Vines was based upon AT&T UNIX System V, which means it was a widely deployed commercial Unix implementation which did not use TCP/IP for its network stack.

Beautiful rant.

Request. Do TLS next (if it’s in your wheelhouse). I’ve been looking for a good summary of ECC and selected curves in tls 1.2

I don’t know, it’s hard to get that far with TLS because you get mired down pointing out all the problems and failed potential solutions in the CA infrastructure first. 😉

>It doesn’t matter which subnet the next hop’s IP is in, as the routing table isn’t consulted for it anyway – it’s only used in ARP)

You can only ARP for hosts on the same subnet as you, terrible hacks excluded.

>This leaves the question, why the indirection and why the mucking around with ARP and IPs that are never used as the destination to anything?

Because it was designed in layers so that different layers could be replaced. We didn’t know we’d end up with mostly only IP and Ethernet in LANs back then.

>Couldn’t you simply put the next hop’s MAC address (instead of IP address) into the routing table and be able to route packets just as well, with a lot less complexity?

It could have been done in any number of ways. It’s not that much complexity though, and it would bake Ethernet MACs into everything IP, even in the cases where it’s not needed.

Fiddling with ARP comes up more often than you’d think, especially as a quick, easy way to handle HA.

IP addresses sharing a route have a common prefix. This is not true of MAC addresses. They are allocated essentially randomly. If you wanted to route solely using MAC addresses, every router in the world would need a lookup table containing every MAC address, and route aggregation would be impossible.
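A quick illustration of the aggregation point, using Python’s standard `ipaddress` module. The four subnets are arbitrary examples, assumed to sit behind the same next hop.

```python
import ipaddress

# Four adjacent /26 networks that all sit behind the same next hop...
subnets = [ipaddress.ip_network(f"10.1.2.{start}/26") for start in (0, 64, 128, 192)]

# ...collapse into a single route, because they share a common prefix.
aggregated = list(ipaddress.collapse_addresses(subnets))
```

`aggregated` ends up holding the single network 10.1.2.0/24, so one table entry covers all 256 addresses. There is no equivalent trick for MAC addresses: the leading bits identify the manufacturer, not where the device is plugged in, so every host would need its own exact-match entry.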

That’s not /the/ reason why a MAC address is involved. It’s because that’s the address for a physical device at a lower layer in the stack. As others mention, IP is media-independent, it cannot depend on a lower tier addressing scheme without becoming fused to that medium

In an alternative universe where Novell continued to dominate networking, we’d be talking about how IPX uses the MAC directly to ID the host and had a separate network ID to uniquely identify the LAN the host is connected to.

It is actually a pretty reasonable way of integrating hardware MACs directly into the internetworking stack.

>Couldn’t you simply put the next hop’s MAC address (instead of IP address) into the routing table and be able to route packets just as well, with a lot less complexity?

One reason why using an IP is still important is the IP can move to a different router, so the MAC for that IP can change. Eg if a hardware swapout was performed, or the network admin manually moved the IP, or some HA system that dynamically moves IPs to other routers (and isn’t VRRP, which uses a virtual MAC).

Usability: it’s a lot easier imo to read a routing table with IP next hop than MAC as you don’t have to remember what MAC every machine is. The IP also conveys visually which port the traffic is (probably) going out. Eg
Port 1 – 192.168.1.0/24
Port 2 – 192.168.2.0/24

If my next hop for 1.1.1.1 is via 192.168.2.254 I know immediately it’s going out port 2. If it was a MAC I’d have no clue unless I memorised all MACs in my networks.
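The lookup I’m doing in my head there is roughly the following. This is a sketch only: the `PORTS` table mirrors the two-port example above and `egress_port` is a made-up helper name.

```python
import ipaddress

# The two connected networks from the example above.
PORTS = {
    "Port 1": ipaddress.ip_network("192.168.1.0/24"),
    "Port 2": ipaddress.ip_network("192.168.2.0/24"),
}


def egress_port(next_hop):
    """Return the port whose connected subnet contains the next hop."""
    hop = ipaddress.ip_address(next_hop)
    for port, network in PORTS.items():
        if hop in network:
            return port
    return None  # next hop is not on any connected subnet
```

`egress_port("192.168.2.254")` returns "Port 2". A bare MAC address carries no such hint, so the same question would need a separate lookup table.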

The reason for that is because IP is not ‘integrated’ with layer-2 tech like Ethernet. In fact, for a very long time Ethernet was only really used on local networks. Point-to-Point Protocol (PPP) [1] is a completely separate data link layer technology with no real concept of MAC addresses, because there can only be two devices on the bus.

Most of the very expensive ‘multilayer’ switches [2] do a form of this where they associate a next-hop IP with a MAC address entry and store that in the TCAM or data layer. It’s not used as much because Cisco has a ton of patents on this type of technology, and also because general purpose hardware has gotten quick enough that it’s not as important as it was ~15 years ago…

[1]https://en.wikipedia.org/wiki/Point-to-Point_Protocol

[2]https://en.wikipedia.org/wiki/Multilayer_switch#Layer-3_swit…

>Couldn’t you simply put the next hop’s MAC address (instead of IP address) into the routing table and be able to route packets just as well, with a lot less complexity?

This is exactly what Cisco Express Forwarding (and similar layer 3 switching technology) does. The adjacency table keeps all of the layer 2 information to be used for fast routing of packets. This was implemented on the CPU back in the day, but now usually done in the switching ASICs.

However, you still need layer 3 next-hop information in the routing table (and dynamic routing protocols). The reason being 1. ethernet is one of many layer 2 technologies that IP supports and 2. MAC addresses can change for a particular IP address due to various reasons including hardware replacement and HA.
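To show why both tables are kept, here is a toy version of the idea. It is not Cisco’s actual data structure: the `fib` and `adjacency` dictionaries, the `lookup` and `on_arp_change` helpers, and the addresses are all assumptions for illustration.

```python
import ipaddress

# FIB: prefix -> next-hop IP. Many prefixes can share one next hop.
fib = {
    ipaddress.ip_network("0.0.0.0/0"): "192.168.2.254",
    ipaddress.ip_network("10.0.0.0/8"): "192.168.2.254",
}

# Adjacency table: next-hop IP -> pre-resolved layer 2 rewrite information.
adjacency = {
    "192.168.2.254": {"port": "Port 2", "dst_mac": "aa:aa:aa:aa:aa:fe"},
}


def lookup(dst):
    """Longest-prefix match in the FIB, then fetch the layer 2 rewrite."""
    addr = ipaddress.ip_address(dst)
    best = max((net for net in fib if addr in net), key=lambda net: net.prefixlen)
    return adjacency[fib[best]]


def on_arp_change(next_hop, new_mac):
    # Hardware swap or HA failover: one adjacency update fixes every
    # route that points at this next hop; the FIB itself is untouched.
    adjacency[next_hop]["dst_mac"] = new_mac
```

If the routes stored MACs directly, `on_arp_change` would have to walk and rewrite every matching prefix instead of one adjacency entry.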

You can have network segments which do not use Ethernet and therefore have no MAC addresses, but still use IP addressing and need to be routable. It doesn’t make sense to tie the next hop in a table to MAC addresses, which are an implementation detail on a lower layer. A good, popular example of this that you can test yourself without obscure hardware is WireGuard.

A lot of protocols don’t end up using Ethernet as the physical layer, even ones you still use today.

Qemu (and I think Docker too?) use SLIRP internally for access between VMs which is ultimately an IP layer bridge.

On the WAN side (at least at one point, I could be out of date here) they didn’t use Ethernet, but instead IP layer routing as well, on top of stuff like PPP and SONET.

>Couldn’t you simply put the next hop’s MAC address (instead of IP address) into the routing table and be able to route packets just as well, with a lot less complexity?

Several others have already answered your question — the key points being “the OSI model” (e.g., layer 2 vs. layer 3) and the multitude of other layer 2 protocols which don’t use MAC addresses — so I’ll mention one other important detail.

Although the Ethernet protocol itself has been around for ~40 years now, for the majority of that time it mostly only existed “in the LAN”.

In fact, when it comes to “on the WAN”, Ethernet is still a relative newcomer. Before ~15 years or so ago, pretty much no one was using Ethernet “on the WAN” — instead, it was X.25 and frame relay and HDLC and PPP and ATM and POS on analog “leased lines” and ISDN and DS-{1,3}s and OC-{3,12,48,192}s.

Along came MPLS, MetroE, EoMPLS, Carrier Ethernet, etc., and soon enough everyone was “tunneling” Ethernet between sites but we were still mostly using those “legacy” protocols “on the WAN”.

Over time, technology advanced to the point that “native” Ethernet eventually became feasible “on the WAN” — in no small part because 1) Ethernet speeds kept increasing by an order of magnitude (!) every few years, 2) standardizing on Ethernet everywhere drove the costs down, and 3) Ethernet was “easy” (compared to all of those “WAN” protocols we were using up until this point) — everybody already “knew” Ethernet because, by this time, everybody had been using it in their LANs for a decade or more!

Although ATM and SONET (at least) are still around in (some parts of) some service provider networks, they are now the exception and Ethernet — to butcher a phrase — “has eaten the world” but, as I mentioned, Ethernet “on the WAN” is still a relatively new thing.

So, I’ll offer an alternative answer to your question:

>Couldn’t you simply put the next hop’s MAC address (instead of IP address) into the routing table and be able to route packets just as well, with a lot less complexity?

Sure, if you had done it about 30 years earlier!

Historically, some links didn’t have MAC addresses and different link types have different address types so it’s easier for the routing protocols to work in terms of IP addresses.

>Couldn’t you simply put the next hop’s MAC address (instead of IP address) into the routing table and be able to route packets just as well, with a lot less complexity?

No, because a MAC address only makes sense for Ethernet-like layer 2 protocols, and IP can run over any number of layer 2 protocols, including point-to-point protocols.

If you put the next hop’s MAC address in the routing table and the device failed and needed to be replaced, all the routing tables would need to be rewritten, because MACs are supposed to be unique. You couldn’t just take a spare device, configure it accordingly and be done with it.

IPv6 commonly does that. Your next hop is installed as a link-local fe80 entry, which is derived from the MAC address. Not exactly what you’re after, but it removes the need for IP numbering.
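For reference, the classic derivation (modified EUI-64) looks roughly like this. The `link_local_from_mac` function name and the sample MAC are illustrative. Treat this as one possible derivation, not the only one: many current systems generate randomized or otherwise opaque interface identifiers instead.

```python
import ipaddress


def link_local_from_mac(mac: str) -> ipaddress.IPv6Address:
    """Modified EUI-64: flip the universal/local bit, insert ff:fe in the middle."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("expected a 48-bit MAC address")
    interface_id = bytes([raw[0] ^ 0x02]) + raw[1:3] + b"\xff\xfe" + raw[3:6]
    # fe80::/64 prefix (fe80 followed by 48 zero bits), then the 64-bit interface ID.
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + interface_id)


address = link_local_from_mac("00:11:22:33:44:55")
```

Here `address` comes out as fe80::211:22ff:fe33:4455, with the original MAC still visible around the inserted ff:fe.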

Hi, I’m the author. Uh hi w00t how why what’s it doing here?! 😀

I promise to make it better and actually finish it now! Check back in a day or two I guess? Also I should post the code I promised. Hello from the ADHD squirrel!

I would suggest expanding your terminology section. I know almost nothing about routers and I’m lost in the first sentence of the High Level Overview section.

 "A switch (or an L2 switch :-) ) is an L2-only thing."

I don’t know what L2 means. I suspect a definition of the various levels would expand the audience for this post.

It’s important to keep layering in mind when talking to people outside the IETF, but the IETF itself is not impressed:

https://en.wikipedia.org/wiki/Internet_protocol_suite#Compar…

>The IETF protocol development effort is not concerned with strict layering. Some of its protocols may not fit cleanly into the OSI model, although RFCs sometimes refer to it and often use the old OSI layer numbers. The IETF has repeatedly stated that Internet protocol and architecture development is not intended to be OSI-compliant. RFC 3439, referring to the Internet architecture, contains a section entitled: “Layering Considered Harmful”.

Anyway: People sometimes like to pretend that OSI is a model and TCP/IP implements the model, forgetting that OSI is/was a protocol stack and TCP/IP has no interest in being “compliant” with any other protocol stack to the extent it mimics its layering architecture.

This is one of those cases where both sides have some insight depending on viewpoint. The OSI model is like every other model. It isn’t reality (at least in TCP/IP) but instead is a helpful abstraction esp. around troubleshooting and understanding networking concepts. There comes a point where the model breaks down but that doesn’t mean it’s an unhelpful model just that it isn’t a complete picture. I try and work networking problems through the OSI layer model but am aware when things don’t really fit well into it (MPLS, MSS, ARP, Layer 5-7).

I agree with you, except that the use of the OSI model seems to be distorting history: TCP/IP went up against OSI and won, even though OSI was favored, because TCP/IP could get working systems faster. That’s a lesson which should be learned, but it gets obscured if you think that TCP/IP implemented OSI and there never was a competition.

Plus, the OSI model is rather complicated; there’s a “TCP/IP Model” with four layers which is a lot simpler:

https://www.geeksforgeeks.org/tcp-ip-model/

>Process/Application Layer

>Host-to-Host/Transport Layer

>Internet Layer

>Network Access/Link Layer

(This seems to be the RFC 1122 model, BTW.)

RFC 1122 and RFC 871 each have models, too.

RFC 871 has:

>Application/Process

>Host-to-host

>Network interface

https://en.wikipedia.org/wiki/Internet_protocol_suite

It’s just part of the lingo, a tool to communicate. The TCP/IP model ignores the physical layer making it a less useful tool.

For me the OSI tends to come up at work to talk about scope or areas of control. People will say “that happens in layer 3” (for instance) as shorthand, not as a referent that corresponds to any actual thing.

I don’t think the post is meant to be a beginners level introduction to networking, the author writes:

>This is the inside view of how exactly a router operates. You only need to know this if you are poking inside a router implementation. If that is the case, my condolences.

If you’re poking inside a router implementation, it seems fair to expect that you have a basic understanding of OSI networking layers.

Reading the replies, I somewhat doubt whether you still know what L2 means. The danger of being a nerd is sometimes you say a lot of words but they don’t mean anything.

Ethernet. L2 means Ethernet (or WiFi). Ethernet is the envelope we put Internet traffic in (L3) and the layers above that are about nailing down how exactly a conversation is managed. Sometimes people get upset about what constitutes Layers 5-7, especially since that Tim Berners-Lee joker ruined all the pretty pictures with HTTP. So mostly we only talk about 2,3,4 and 7, in the same way you don’t bring up religion or politics at a family reunion.

“Tim Berners-Lee joker ruined all the pretty pictures with HTTP”

This is the first time I am reading this, I interpret this to mean HTTP is badly designed and Tim Berners-Lee caused it. Need more…

Except now we have session and application protocols built on top of HTTP using additional software, which according to OSI would be additional layers. You can in a lot of cases use the standard to achieve this, but frequently enough we don’t.

I think you need to know your audience and cater to them, trying to explain everything just ends in a book. L2 is especially googleable.

This is a good point. You have to have some assumptions of what your audience brings.

I’m aware there are levels of information in an IP packet, but I don’t know them offhand. If I have to google something on the first sentence in a high level overview, then I’m likely not going to read the piece and the author has lost me as a reader. Maybe I’m not the target audience, though I was interested. I’m providing that as feedback for the original author since the piece mentions that it’s still a work in progress.

To be fair, L2 could be Layer 2 or Level 2 (cache) and it might be a crapshoot what you get. You might get confused trying to answer your own questions.

Discoverability lives in the space between overexplaining and underexplaining.

In a networking discussion, L2 always means Layer 2. If the subject of caching came up the author would say “I’m talking about L2 cache here.”

It’s like TTL. It means one thing in a networking context but something totally different in a digital logic context.

But granted, somebody with no networking background wouldn’t necessarily know that.

One can just add switch, router, network, etc to the query until it works. Supposedly they’ll all work. Weak google fu means no info today, and if OP and the author are not the same person, then the latter may not even have a clue that it was posted on hn, where such high standards apply. If someone brought an electronics forum wiki post, should one expect every TLA¹ to be explained there too?

¹ Three Letter Acronym/Abbreviation

Surely they would be aware the audience would know what ethernet is. To me, L2 refers to the level 2 cpu cache.

The IP stack has the concept of layers, which function as abstractions that hide the implementation of lower layers from the upper layers. Layer 2 (L2) is the physical link layer – it only cares about getting a packet between two devices. Layer 3 (L3) is where IP addresses live. As the article describes a router has functionality to send a packet towards its final destination as well as get it between ports.

>The IP stack has the concept of layers, which function as abstractions that hide the implementation of lower layers from the upper layers

Correction: the network stack has layers, where IP is one of them, near the top.

Which is why most software targets IP. It’s a good abstraction and it’s portable.

GP may be referring to the “TCP/IP model” which does indeed define the layers used in common parlance. This model has 4 layers in contrast to the OSI model’s 7 layers. The TCP/IP model is closer to how most real life network stack implementations are defined.

Arguably even this layering system is too rigid for reality but it’s a decent model. See RFC 3439 section 3.

This refers to Layer 2 in the OSI model of the network stack. See https://en.wikipedia.org/wiki/OSI_model

1. physical layer, 2. data link, 3. network, 4. transport, 5. session, 6. presentation, 7. application layer.

So, many switches are layer 2, but layer 3 switches are often referred to as switching routers. This can cause two different switches to act differently from each other in certain network environments. It isn’t that one switch “doesn’t work” but that it isn’t a router.

A router is nominally a L3 device, though most actually are L1-7. To work, you need L1 & L2, but in today’s world, there are applications and interfaces that move the router across L1-7, though not to the same depth as purpose built application devices for example. Topping this off, some routers will switch and some will not. It’s the same wide-world of words that we see across the whole computer industry.

The OSI model differs from the TCP model of networking, even though both use numbered layers.

You may want to read about the OSI 7-layer model. The L1, L2, L3, L4 and L7 concepts derive from that model.
L1 is the physical access. It is the cable, the fiber or the WiFi itself.
L2 is the data link. We use Ethernet for IP networks. The device that mainly handles communication at this layer is called a switch.
L3 is the network layer. In an IP network it handles routing between IP networks. The device is usually called a router.

Some devices can do L2 and L3 at the same time. That’s why another term came up: L2-only switch.

And so on; you can read more at [1].

[1] https://en.m.wikipedia.org/wiki/OSI_model
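For the L2 side, a rough Python sketch of what an L2-only switch does with each frame may help. It assumes a single LAN with no VLANs, no aging of table entries and no spanning tree; the class and method names are illustrative, not taken from any real switch software.

```python
from typing import Dict, List

BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy L2 switch: learns which port each source MAC was seen on."""

    def __init__(self, ports: List[int]) -> None:
        self.ports = ports
        self.mac_table: Dict[str, int] = {}  # MAC address -> port number

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> List[int]:
        """Return the list of ports the frame should be sent out of."""
        # Learn: the sender is reachable through the port the frame arrived on.
        self.mac_table[src_mac] = in_port

        out_port = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            # Broadcast or unknown destination: flood to every other port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination sits on the segment the frame came from: drop it.
            return []
        return [out_port]


if __name__ == "__main__":
    sw = LearningSwitch(ports=[1, 2, 3])
    print(sw.receive(1, "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb"))  # unknown, flooded
    print(sw.receive(2, "bb:bb:bb:bb:bb:bb", "aa:aa:aa:aa:aa:aa"))  # learned, one port
```

Note that nothing here looks at an IP header. That is the job of the L3 device.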

Maybe a mention of other, non-Ethernet, links. Serial PPP? Frame Relay? I realize these are mostly historical curiosities these days, but it might help to reinforce the differences between L2 and L3.

When I first started working with routers, over 25 years ago, it was all Ethernet LAN to serial WAN, usually point-to-point T1 or frame relay. One site had a dual T1, load balanced on both ports of a Cisco 2501. Fun times.

I learned a lot about networking when setting up servers in racks. Had to deal with issues arising from terrible UIs on a lot of the routers out there, so I just kept digging deeper and deeper into how it all works. Also, if more are looking into how packets are actually routed, look into BGP, and how CDNs work. Great stuff.

I would start with how internal routing works before starting on WAN routing.

I’d look at the Cisco Press and CCNA training materials.

I believe this piece does a good job with forwarding, but would be improved by a discussion of termination.

Routing is only triggered when the packet is L2 terminated: the destination MAC of the packet is one of the router’s own MACs.

If the packet’s destination MAC does not belong to the router, it doesn’t matter what is in its IP header, it will be switched in the LAN it came in on.

This design also generalizes nicely to the case when the destination IP of a routed packet is one of the router’s IPs.
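As a sketch of that decision, assuming unicast traffic only (broadcast, multicast and ARP are left out) and with made-up names for everything, the check could look like this in Python:

```python
from enum import Enum
from typing import Set


class Action(Enum):
    SWITCH = "switch within the LAN the frame came in on"
    ROUTE = "route based on the IP header"
    LOCAL = "deliver to the router's own stack"


def classify(dst_mac: str, dst_ip: str,
             router_macs: Set[str], router_ips: Set[str]) -> Action:
    """Decide what to do with an incoming unicast frame."""
    if dst_mac not in router_macs:
        # Not L2 terminated here: the IP header is never consulted.
        return Action.SWITCH
    if dst_ip in router_ips:
        # L2 terminated and L3 terminated: the packet is for the router itself.
        return Action.LOCAL
    # L2 terminated only: look up the destination IP and forward.
    return Action.ROUTE


if __name__ == "__main__":
    macs = {"02:00:00:00:00:01"}
    ips = {"192.0.2.1"}
    print(classify("02:00:00:00:00:99", "203.0.113.5", macs, ips))
    print(classify("02:00:00:00:00:01", "203.0.113.5", macs, ips))
    print(classify("02:00:00:00:00:01", "192.0.2.1", macs, ips))
```

The same "is this address mine?" test is applied once per layer, which is why the design generalizes so cleanly.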

Good point. Incorporating that would require more brain than I have right now (bad timezone :D), but you’re right, I completely left that out. May I update the article with a link to this comment?

I teach a computer networking class with a lab using Linux Switch Appliance (LISA) and the Quagga router (based on Zebra) on an embedded computer with an x86 CPU and multi-port Ethernet. The embedded router needs to dual-boot for each function because LISA is based on a custom Linux kernel, while Quagga runs on a normal/vanilla kernel.

I am looking for a “layer 3 switch” that has switching and routing functionality without rebooting. If anyone knows of a software-based open source solution for this, it would be very helpful. A Cisco IOS-like command interface is preferred but not mandatory.

The article explains router internals based on P4. Perhaps I should try to use P4 for the above-mentioned requirements?

For labbing with quagga you can get pretty far with Linux containers to emulate multiple routers on a single host. (I’ve used both lxc and docker to manage containers.) You can create virtual ethernet device pairs (ip link add veth0 type veth peer name veth1), and drop either end into running containers (ip link set veth0 netns.) Make sure to turn on the IP forwarding sysctls inside the containers and Linux will behave quite nicely as a virtual router.
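A small Python sketch of that wiring, with one substitution stated plainly: it uses named network namespaces created with ip netns instead of lxc or docker containers, and the namespace names r1 and r2 are hypothetical. It only builds and prints the command lines, since actually running them needs root on a Linux host.

```python
from typing import List


def veth_lab_commands(ns_a: str, ns_b: str) -> List[List[str]]:
    """Return the ip/sysctl commands that link two namespaces with a veth pair."""
    cmds: List[List[str]] = [
        ["ip", "netns", "add", ns_a],
        ["ip", "netns", "add", ns_b],
        # Create both ends of the virtual cable in the host namespace.
        ["ip", "link", "add", "veth0", "type", "veth", "peer", "name", "veth1"],
        # Move one end into each namespace.
        ["ip", "link", "set", "veth0", "netns", ns_a],
        ["ip", "link", "set", "veth1", "netns", ns_b],
    ]
    for ns, dev in ((ns_a, "veth0"), (ns_b, "veth1")):
        cmds.append(["ip", "netns", "exec", ns, "ip", "link", "set", dev, "up"])
        # Without this the kernel will not forward packets between interfaces.
        cmds.append(["ip", "netns", "exec", ns, "sysctl", "-w", "net.ipv4.ip_forward=1"])
    return cmds


if __name__ == "__main__":
    for cmd in veth_lab_commands("r1", "r2"):
        print(" ".join(cmd))
```

Addresses and routing daemons still have to be configured inside each namespace; this covers only the link and the forwarding switch.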

Also, consider upgrading to the more active fork called Free Range Routing.

Use GNS3 and run actual vendor virtual images if you want the actual vendor interface; it’s made for this scenario.

VyOS supports bridging and routing although the config is more like a Linux host and unlike a real Cisco/Arista switch.

The Vyatta/VyOS/EdgeOS CLI took heavy inspiration from Juniper’s JunOS, so saying the config is unlike a “real” switch is factually incorrect.

It’s still a little odd, but as somebody quite comfortable with JunOS (I run Juniper switches in my homelab) it’s pretty easy to pick up any of the Vyatta forks and hit the ground running.

>“It needs to be routed: the router, based on L3 information, decides where it needs to go, in L3 speak – it will decide which host to send it to, but not how. This corresponds to the routing table (or FIB).”

This is not correct. The FIB (forwarding information base) is concerned with layer 2. The RIB (routing information base) determines the next hop. The RIB is what is used to populate entries in the FIB with the correct outgoing interface. These two terms are basic router terms. It was kind of surprising to see this statement in a post titled “How Do Routers Work, Really?”
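A toy Python model of that relationship, under simplifying assumptions: the RIB is a flat list of candidate routes, the best candidate per prefix is chosen by a single numeric preference (lower wins), and the FIB is a plain dictionary searched linearly rather than a trie or TCAM. All names and values are illustrative.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Route:
    prefix: ipaddress.IPv4Network
    next_hop: str
    out_interface: str
    preference: int  # lower is better, an assumption of this sketch


def build_fib(rib: List[Route]) -> Dict[ipaddress.IPv4Network, Route]:
    """Populate the FIB with the single best RIB route for each prefix."""
    fib: Dict[ipaddress.IPv4Network, Route] = {}
    for route in rib:
        best = fib.get(route.prefix)
        if best is None or route.preference < best.preference:
            fib[route.prefix] = route
    return fib


def lookup(fib: Dict[ipaddress.IPv4Network, Route], destination: str) -> Optional[Route]:
    """Longest-prefix match: the most specific matching entry wins."""
    addr = ipaddress.ip_address(destination)
    matches = [route for prefix, route in fib.items() if addr in prefix]
    return max(matches, key=lambda r: r.prefix.prefixlen, default=None)


if __name__ == "__main__":
    rib = [
        Route(ipaddress.ip_network("10.0.0.0/8"), "192.0.2.1", "eth0", 110),
        Route(ipaddress.ip_network("10.0.0.0/8"), "192.0.2.5", "eth1", 20),
        Route(ipaddress.ip_network("10.1.0.0/16"), "192.0.2.9", "eth2", 110),
        Route(ipaddress.ip_network("0.0.0.0/0"), "192.0.2.254", "eth3", 1),
    ]
    fib = build_fib(rib)
    for dst in ("10.1.2.3", "10.2.0.1", "8.8.8.8"):
        print(dst, "->", lookup(fib, dst).out_interface)
```

The per-packet path only ever touches the FIB; the RIB changes when routing protocols or configuration change.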

You’re right, I noticed it about an hour ago — no idea what was going on in my head then :-/ Fixed already. Thank you!

This is great, if for no other reason than that in section 1 it explains the difference between a switch and a router (which took me a decade? to really understand). I really wish someone could have laid it out clearly for me.
