Hacker News

A most elegant TCP hole punching algorithm

195 points by Uptrenda 17 hours ago | 78 comments

abcd_f 3 hours ago

Claimed elegance is based on a very bold assumption that the NAT device preserves the source port of outbound connection.

Hardly the case in even half of typical deployment cases.

taftster 3 hours ago

I like your comment, but it seems the author acknowledged this as a caveat to the algorithm.

>Many home routers try to preserve the source port in external mappings. This is a property called “equal delta mapping” – it won’t work on all routers but for our algorithm we’re sacrificing coverage for simplicity.

So to what percentage is this coverage sacrificed exactly? No idea. Not as useful if the percentage is high, as you are implying.

lxgr 12 hours ago

Does TCP hole punching actually work with common CPEs and CG-NATs?

I don’t think I’ve ever seen it done successfully and have often wondered if it’s for a lack of use cases or due to its bad success rate and complexity compared to UDP hole punching.

That said, I really wish there was a standardized way to do it. Some sort of explicit (or at least implicit but unambiguous) indicator to all firewalls that a connection from a given host/port pair is desired for the next few seconds. Basically a lightweight, in-band port mapping protocol.

It could have well been an official recommendation to facilitate TCP hole punching, but I guess it’s too late now, as firewall behaviors have had decades to evolve into different directions.

aboardRat4 5 hours ago

The standard way to do it is called ipv6. Implementing it is probably easier than any of those RFCs

patrakov 4 hours ago

No, it isn't. Many middleboxes (including OpenWrt by default) drop unsolicited inbound TCP connections even on IPv6, and therefore the same hole-punching algorithm is needed. The hole being punched is in the stateful firewall's connection tracker, not in the NAT. Basically, both parties need to convince their router that it is an outgoing connection initiated by them, not a prohibited-by-policy incoming connection.

IgorPartola 3 hours ago

There are two separate problems with IPv4 and only one applies to IPv6. Allowing incoming connections through a restrictive firewall is applicable to both. Address mangling via NAT applies only to one. Note also that in the IPv4 world you might be behind more than one layer of NAT which will make everything infinitely worse.

Honestly ISPs really missed an opportunity to essentially provide IPv6-only as a service and add an IPv4 compatibility layer to that (IPv6 already has a mechanism built in for this but grandma’s old laptop might not fully support it so you might need a local router provided by the ISP to give you native local IPv4 that allows you to access the internet) instead of CGNAT. But they chose to go with duct tape, spit, paper clips, and hope instead of investing in the correct solution. Shame on them and too bad for us.

patrakov 2 hours ago

Exactly. And look, the linked Python script only solves one problem: making both firewalls believe that the party behind them is the one who initiated the connection. Address/port mangling is not addressed at all, both public addresses need to be provided externally.

And it's simply not true that there is no NAT in the wild with IPv6: every OPNsense installation with two uplinks and the need for anything better than an "arbitrary and uncontrollable" choice of the correct uplink for each outbound connection needs network prefix translation, as the residential dual-homing story for IPv6 is vaporware otherwise. NPT is used not for address space conservation, but to defer the decision about the correct source address to the router that has the knowledge of the correct policy.

And in this sense, IPv6 is worse than IPv4: there are too many people assuming no firewall and no NAT for IPv6, and designing their applications based on these almost-working (but de-facto now broken) premises. The correct premises are the same as for IPv4.

ectospheno 48 minutes ago

IPv6 nat is a thing that exists and is used. IPv6 purists like to imagine it doesn't exist which is cute.

gzread 3 hours ago

All you should need is for both sides to connect to each other. Side A connecting to side B opens a hole in side A's firewall and is blocked by side B, then B connects to A, opening B's firewall and going through the already open hole in A's firewall.

It might work better with UDP but I don't think those firewall boxes tear down the mapping immediately on getting an RST - they wait until it times out.
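
A minimal sketch of what "both sides connect to each other" could look like in Python, assuming each side already knows the other's public IP and port, and that the NAT keeps the source port. The function name `punch` and its `attempts`/`delay` parameters are made up for illustration; this is not the script from the article.

```python
import socket
import time

def punch(local_port, peer_ip, peer_port, attempts=20, delay=0.25):
    """Keep calling connect() from a fixed local port until it succeeds.

    Both peers run this towards each other at roughly the same time.
    Returns a connected socket, or None if every attempt failed.
    """
    for _ in range(attempts):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Allow rebinding the same local port on every retry.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.settimeout(delay)
        try:
            s.bind(("", local_port))
            s.connect((peer_ip, peer_port))
            s.settimeout(None)  # back to blocking mode for normal use
            return s
        except OSError:
            # Refused, reset, or timed out: the outbound SYN may still have
            # opened the local firewall, so close and try again.
            s.close()
            time.sleep(delay)
    return None
```

Neither side calls `listen()`; whether the crossed SYNs actually get through depends entirely on the firewalls in between.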

ignoramous 7 hours ago

> really wish there was a standardized way to do it. Some sort of explicit (or at least implicit but unambiguous) indicator to all firewalls that a connection from a given host/port pair is desired for the next few seconds

NAT Behavioural Requirements for Unicast UDP, https://datatracker.ietf.org/doc/html/rfc4787

NAT Behavioural Requirements for TCP, https://datatracker.ietf.org/doc/html/rfc5382

lxgr 7 hours ago

> NAT Behavioural Requirements for TCP

TIL, thank you! I've been looking for this for quite a while after hearing it indirectly referenced recently, but only found host-side specifications for TCP simultaneous open.

Do you happen to know if common firewalls and NATs support it? If they do, I really wonder why TCP hole punching isn't more common.

athrowaway3z 13 hours ago

- you know each other's IPs (or have a way to signal it)

- can't decide on a port in the same message

- don't suffer from NAT port randomization

I'm not saying it will never happen, but the Venn diagram of this being the minimum complexity solution just doesn't seem very large?

Arch485 5 hours ago

I think many people know how to google "what is my IP" and send that to a friend, but don't necessarily know what a port is.

NAT randomization, I don't know. Depends on your setup, I guess.

EnigmaCurry 15 hours ago

> Many home routers try to preserve the source port in external mappings. This is a property called “equal delta mapping” – it won’t work on all routers but for our algorithm we’re sacrificing coverage for simplicity.

It is precisely this point that has flummoxed me when connecting my p2p wireguard config[1] with a friend that uses a pfsense router, no matter what we tried, pfsense always chooses a random source port.

But in the simple case this blog outlines, if both ends use the same source port, this method punches through 2 firewalls effortlessly:

[1] https://blog.rymcg.tech/blog/linux/wireguard_p2p/

hdgvhicv 10 hours ago

In my experience, Cisco ASA does source port persistence by default (when it can’t do it then it falls back to random), FortiGates can do it (in various ways depending on version, although fallback method in the map-ports doesn’t work), Juniper SRXs can’t, unless you guarantee a 1:1 map.

jonathanlydall 14 hours ago

Does your friend setting up port forwarding on their pfSense not help in your scenario?

EnigmaCurry 14 hours ago

Yes, that solves it completely. But the exercise we were trying to do was to do it without that.

hdgvhicv 10 hours ago

You’re getting into birthday paradox territory, throw a few hundred packets in each direction and one will get through

This has a good diagram to understand the options

https://rajsinghtech.github.io/claude-diagrams/diagrams/netw...
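
Rough numbers for the birthday argument, as a sketch under a simplified model that I am assuming here: one side holds `k` distinct open ports chosen at random, the other side sends probes to `k` distinct random ports, and the port space is `n = 65536`. The function name is hypothetical.

```python
def hit_probability(k, n=65536):
    """Chance that at least one of k distinct random probes lands on
    one of k randomly chosen open ports out of n possible ports."""
    miss = 1.0
    for i in range(k):
        # Probability that probe i misses, given all earlier probes missed.
        miss *= (n - k - i) / (n - i)
    return 1.0 - miss

print(round(hit_probability(256), 2))
```

With 256 ports on each side this model gives a hit chance of roughly 0.63, and it climbs quickly as `k` grows. Real NATs do not necessarily pick ports uniformly, so treat this as an order-of-magnitude estimate.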

patjensen 3 hours ago

This is easily solved in your source NAT configuration on pfSense. It's a single checkbox to not randomize ports on outbound flows. This will enable full cone NAT.

You can scope it to just your IPsec service, or whatever it is you're hosting, or you can enable full cone for the whole subnet.

It is not DNAT, nor is it port forwarding. If you host a SIP proxy, SBC or peer to peer gaming, it will enable these use cases as well.

https://docs.netgate.com/pfsense/en/latest/nat/outbound.html

getcrunk 12 hours ago

[flagged]

craftkiller 8 hours ago

This is against the HN guidelines:

> Don't post generated comments or AI-edited comments. HN is for conversation between humans.

https://news.ycombinator.com/newsguidelines.html

Boltgolt 10 hours ago

We can all run this through our LLM of choice, why post this?

lxgr 12 hours ago

Did you validate this solution yourself?

getcrunk 11 hours ago

No, hence the all caps ai disclaimer. But seems plausible

nneonneo 11 hours ago

Lord, we're how many years into using LLMs, and people still don't understand that their whole shtick is to produce the most plausible output - not the most correct output?

The most plausible output might be correct, or it might be utter bullshit hallucinations that only sound correct; the only way to tell is to actually try it or cross-reference primary sources. Unless you do, the AI answer is worthless.

The reason why they're getting so good at code now is that they can check their output by running and testing it; if you're just prompting questions into a chatbot and then copying their output verbatim to a comment, you're not adding any meaningful value.

anovikov 11 hours ago

Exactly! This is what LLMs do: they bullshit you by coming across as extremely knowledgeable, but as soon as you understand 5% of the topic you realise you've been blatantly lied to.

lxgr 10 hours ago

Even if you get 70% blatant lies and 30% helpful ideas, if you can cheaply distinguish the two due to domain expertise, is that not still an extremely useful tool?

But to the point of this thread: If you can't validate their output at all, why would you choose to share it? This was even recently added to this site's guidelines, I believe.

subscribed 5 hours ago

You didn't even provide the exact model you pulled that out of!

"Seems plausible".... Can you please read up about the ways LLMs generate their output?

lxgr 10 hours ago

But then why make this comment at all, even despite the disclaimer? Anyone can prompt an LLM. What's your contribution to the conversation?

To be clear, I use LLMs to gut check ideas all the time, but the absolute minimum required to share their output, in my view, is verification (can you vouch for the generated answer based on your experience or understanding), curation (does this output add anything interesting to the conversation people couldn't have trivially prompted themselves and are missing in their comments), and adding a disclaimer if you're at all unsure about either (thanks for doing that).

But you can't skip any of these, or you're just spreading slop.

sholladay 13 hours ago

This is a great algorithm!

In this era where AI is eating away at how deterministic computers are, I really appreciate reading about an elegant solution to a real problem using deterministic logic.

CamelCaseCondo 11 hours ago

We still live in an age of deterministic computers. It’s the software that’s become fuzzy. (And since we’re on the subject: there’s no AI)

sholladay 8 hours ago

Yes, but a computer is just a paperweight without its software. Also, increasingly the hardware is being specifically designed and optimized for that non-deterministic software. The experience of using computers is changing and we’re still in the early days of that shift.

Of course there’s still plenty of deterministic software you can run… for now.

ufocia 8 hours ago

I can almost guarantee that all of AI runs on deterministic hardware and software. AI is just (near?) the top of the stack. There is no reason, and probably never will be, to have a purely heuristic computer. Deterministic systems are way simpler and cheaper for handling very routine, well-defined tasks. Even AI authors code behind the scenes to process data files deterministically.

mycall 5 hours ago

data = code in the AI age. Fuzzy data = fuzzy code.

Now combining AI with deterministic tool calling brings the best of both worlds.

wolttam 6 hours ago

> there’s no AI

This is a theistic statement at this point, no?

CharlesW 2 hours ago

Many people conflate "AI" and "AGI". It's disappointing to find people who don't know the difference on HN, though.

jcalvinowens 15 hours ago

If you're asking "where is the listener", you don't need one: https://datatracker.ietf.org/doc/html/rfc9293#simul_connect

cperciva 13 hours ago

RFCs may say that simultaneous connect must be allowed, but that doesn't mean that firewalls can't block it. Plenty of setups block incoming SYN,!ACK packets, and if both sides do that, the connection is never getting established.

jcalvinowens 6 hours ago

In my experience most consumer routers are dumber than you're assuming they are, and will DNAT any inbound TCP packet that matches the 4-tuple after seeing the initial outbound SYN, including an inbound SYN. But yes, it doesn't work everywhere.

I wrote a little paper on this technique in school and did some practical tests; at the time I was actually unable to find an example of a consumer-grade router that it didn't work on! But my resources were rather limited, they certainly do exist.

huhtenberg 5 hours ago

> Plenty of setups block incoming SYN,!ACK packets

Even in the presence of a conntrack entry created by an earlier outbound SYN,!ACK ?

Got a source?

cperciva 4 hours ago

I've seen plenty of firewall rulesets over the past 25 years which only consult state after doing some initial stateless inspection.

I don't have a convenient source though.

huhtenberg 4 hours ago

Sanity checks, sure, but SYN,!ACK packets cannot be rejected before the conntrack for obvious reasons.

> Plenty of setups block incoming SYN,!ACK packets

Nowhere close to being "plenty". It's doable, but this is extremely niche.

jcalvinowens 8 minutes ago

It's not uncommon with routable internal networks to only drop inbound SYN,!ACK to disallow inbound connections while permitting outbound ones, since it doesn't require connection tracking (which can be resource intensive).

I can't really imagine why you would do it for NAT'd v4 since you can't avoid the connection tracking overhead, but you certainly could, and I don't doubt OP has run into it in the wild. I've seen much weirder firewall rules :)

cperciva 3 hours ago

> for obvious reasons

What are the obvious reasons? If you're protecting a client system, you don't want to allow in any bare SYNs. (And for that matter, if you're protecting a server, you probably want to discard ill-targeted bare SYNs without consulting conntrack anyway, just as a matter of avoiding extra CPU work.)

gzread 3 hours ago

Does this mean that establishing a new connection with a SYN,ACK bypasses some firewalls? I expect at least one OS out there ignores the extraneous ACK flag and proceeds to establish a new connection.

huhtenberg 34 minutes ago

Why would it mean that?

All inbound packets are matched against existing sessions. In this case none will turn up, so the packet will go through the "new session" flow and be subject to the same filtering as a bare SYN. Look up how connection tracking works, e.g. in the Linux kernel, it's rather simple and logical.
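
A toy model of that flow, purely for illustration. The class and method names are invented, and this is not how Linux netfilter is implemented (real connection tracking also validates TCP state and flags). It only shows the lookup order: match inbound packets against existing sessions first, otherwise fall through to the new-session policy.

```python
class ToyConntrack:
    """Simplified stateful filter: outbound packets create state,
    inbound packets must match that state or face the default policy."""

    def __init__(self):
        self.sessions = set()

    def outbound(self, src, sport, dst, dport):
        # An outbound packet records the 4-tuple as an existing session.
        self.sessions.add((src, sport, dst, dport))
        return True

    def inbound(self, src, sport, dst, dport, flags):
        # Look up the reversed 4-tuple among existing sessions.
        if (dst, dport, src, sport) in self.sessions:
            return True
        # No session found: "new session" path. This toy policy drops all
        # unsolicited inbound traffic, so the flags make no difference here.
        return False


fw = ToyConntrack()
print(fw.inbound("198.51.100.9", 4000, "192.0.2.1", 4000, "SYN,ACK"))
fw.outbound("192.0.2.1", 4000, "198.51.100.9", 4000)
print(fw.inbound("198.51.100.9", 4000, "192.0.2.1", 4000, "SYN"))
```

The first inbound packet is dropped because no session exists, whatever its flags. After the outbound packet, an inbound SYN on the same 4-tuple is accepted, which is the lenient behaviour hole punching relies on.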

jder 4 hours ago

I don’t think the bucket-choosing algorithm works? The two hosts can be just on opposite sides of a bucket edge. For example if one host sees t=61 and another sees t=62, they will get different buckets despite being less than 20 seconds apart. You’ve got to check adjacent buckets within your error tolerance, not expand the bucket windows in size based on it.
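
A sketch of the adjacent-bucket check, with made-up names and a hypothetical 10-second window (the article's actual window and port derivation may differ). Each host lists every bucket that a peer within `tolerance` seconds could have computed, instead of widening the bucket itself.

```python
def candidate_buckets(t, window, tolerance):
    """Return every bucket index that a peer whose clock is within
    `tolerance` seconds of `t` might have computed as its own bucket."""
    lo = int((t - tolerance) // window)
    hi = int((t + tolerance) // window)
    return list(range(lo, hi + 1))


# Two hosts on opposite sides of a bucket edge (window=10 is an assumption).
a, b = 59, 61
print(a // 10, b // 10)
print(b // 10 in candidate_buckets(a, window=10, tolerance=20))
```

The two hosts compute different buckets of their own, but each host's bucket appears in the other's candidate list, so trying all candidates gives them something in common.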

melson 6 hours ago

I made a udp Windows wintun based p2p vpn tunnel https://github.com/mascarenhasmelson/Windows-P2P-UDP

ata-sesli 8 hours ago

The timestamp bucket idea for generating shared port candidates is clever.

Do you find this works reliably outside routers that preserve source ports? My understanding was that TCP punching tends to depend heavily on NAT behavior.

enoint 8 hours ago

Looks like a typo in the degraded timestamp “bucket”. That “window” value should be based on the min threshold.

Veserv 13 hours ago

Needing to punch holes in NAT is one of the most idiotic own-goals in the entire field of networking.

NAT is effectively your router doing DHCP with a 17-bit suffix (16-bit port + 1 bit for UDP vs TCP) to each of your applications and then not telling you the address it gave you or how long it is good for (which is what a regular DHCP lease does). This is in addition to it, most likely, already doing regular DHCP and allocating you an IP address that it does tell you about, but which is basically worthless since routing to just that prefix without the hidden suffix goes into a black hole.

If you could just ask your router for a lease on a chunk of IP+NAT addresses that you could allocate to your applications and rotate them as they expire, you would not need this horrifying mess.

The router would just need to maintain the last-leg routing table (what a concept, a router doing routing with routing tables) just like it already does DHCP.

The applications would have short-term stable addresses that they could just tell their peers and just directly tell the router/firewall to block anybody except the desired peer short-term address.

lxgr 12 hours ago

> If you could just ask your router for a lease on a chunk of IP+NAT addresses

The “just” is doing a lot of lifting there. I’m glad the various port mapping protocols didn’t really take off and it looks like IPv6 is going to actually make it instead. Much less complexity in most parts of the stack and network.

Veserv 11 hours ago

It is always a mystery how people just randomly misinterpret what I write. At literally no point did I mention port mapping.

I am pointing out how the problem NAT “solves” is just dynamic address configuration. They have implemented a N+K bit address where the N-bit prefix is routed and allocated using IP and the low K-bits are routed and allocated like a custom fever dream.

You can just do it all the same way instead of doing it differently and worse for the low bits.

To be clear, the router should rewrite zero bits in the packet under the scheme I am describing just like how routers have no need to rewrite any bits when routing to a specific globally-routable IP address.

You get a lease for a /N+K address. /N routes to your router which routes the last K bits just like normal as if it had a /N-M to a /N route. This is a generic description of homogenous hierarchical routing.

lxgr 10 hours ago

If I understand it correctly, you're suggesting formalizing a way to make parts of the (host-specific) port canonically part of the network-wide address, no?

This still sounds like a very bad mixing of layers, even if done in a perfectly standardized and uniform way.

> It is always a mystery how people just randomly misinterpret what I write.

If this is intended literally and not as a general complaint: My main problem of understanding your suggestion is that I don't know what you mean by "IP+NAT address". NAT is a translation scheme, not an address.

Maybe it would be clearer if you could provide an example?

Veserv 2 hours ago

I did provide an example:

> You get a lease for a /N+K address. /N routes to your router which routes the last K bits just like normal as if it had a /N-M to a /N route.

> This still sounds like a very bad mixing of layers, even if done in a perfectly standardized and uniform way.

No, I am describing a generalization of IP to arbitrary concatenated routing prefixes.

NAT has the same problems as if we lived in an alternate world where we decomposed IPv4 into 4 8-bit layers and then used a different protocol for each layer. That is obviously stupid because the subdivision of a /8 into /16s and a /16 into /24s is fractally similar. You can just use the same protocol 4 times. Or even better, use one protocol (i.e. IP) that just handles arbitrary subdivision.

In the IPv4 (no NAT) world your application has a 49-bit address. Your router is running a DHCPv4 server and allocates your computer a /32 and your computer is “running” a DHCPvPort server that allocates a 17-bit prefix to your applications.

In the IPv4+NAT world your application has a 49-bit address. Your router is “running” a DHCPv4+Port server and allocates your applications a /49, but only tells them their /32 and then rewrites the packets because the applications do not know their address because the stupid router did not tell them.

In a good world your application has a 49-bit address. Your router is “running” a DHCPv4+Port server and allocates your applications a /49 and tells them their /32 prefix and 17-bit segment. No packet rewriting is necessary.

Your router could also choose to allocate your computer a /32 subnet and leave DHCPvPort to your computer. Or it could give your computer a /31 if you have 8 interfaces. Or a /34 as a /32 subnet with 2-bit port prefix. Each node routes as much or as little routing prefix as it understands/cares about.

This is a generalization of IP that can handle arbitrary-length, arbitrarily-concatenated routing in a completely uniform manner and all the pieces are basically already there, just over-specialized.
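
To make the 49-bit address concrete, here is a small sketch that packs an IPv4 address, a protocol bit and a port into one integer. The bit layout (32-bit IP on top, then 1 protocol bit, then the 16-bit port) and the names `pack49`/`unpack49` are assumptions for illustration; no existing protocol defines this format.

```python
import ipaddress

PROTO_BITS = {"tcp": 0, "udp": 1}

def pack49(ip, proto, port):
    """Concatenate a 32-bit IPv4 address, 1 protocol bit and a 16-bit port."""
    return (int(ipaddress.IPv4Address(ip)) << 17) | (PROTO_BITS[proto] << 16) | port

def unpack49(addr):
    """Split a 49-bit address back into (ip, proto, port)."""
    ip = str(ipaddress.IPv4Address(addr >> 17))
    proto = "udp" if (addr >> 16) & 1 else "tcp"
    return ip, proto, addr & 0xFFFF


addr = pack49("203.0.113.7", "tcp", 443)
print(unpack49(addr))
```

In this picture the /32 prefix is just the top 32 bits of the same number, and the remaining 17 bits are the segment that NAT currently allocates without telling the application.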

gzread 3 hours ago

The original SOCKS proxy specification was something like this. You'd LD_PRELOAD a library that would make the application think it was running directly on the proxy server, and it supported both connecting outbound and listening.

enoint 8 hours ago

I didn’t see it as mysterious. 25 years ago, the problem as stated went through lots of consensus to become IPv6. It took a few years for SLAAC to emerge. But we don’t need it to be homogeneous; the router advertises different feature levels via ICMPv6.

GoblinSlayer 10 hours ago

NAT allocates ports. If you reserve a port, that's old good port forwarding.

hrmtst93837 10 hours ago

Assuming IPv6 kills NAT is optimistic, plenty of orgs still stack private addressing and firewalls on top.

lxgr 10 hours ago

Firewalls aren't nearly as bad as NAT.

hdgvhicv 9 hours ago

Basically the same thing. If you legitimately need to establish a connection then put a firewall rule in, whether that needs nat or pat is a function of your available addresses.

If you are trying to work around your firewall because it isn’t yours, that’s not a legitimate use.

gzread 3 hours ago

P2P traffic is illegitimate according to you? Like Skype calls? You think Skype should not exist? (Well it doesn't exist any more, but whatever replaced it)

lxgr 9 hours ago

Love it when random people tell me whether my use case is legitimate or not without apparently even knowing it exists!

Take mobile data connections, for example: Most people don't want to pay for metered (by the byte) inbound traffic they didn't ask for that also drains their battery, but do want to be able to establish P2P connections for lower latency VoIP etc.

This is a firewall that's definitionally "not theirs", but that still also serves their interests, yet usually doesn't offer any user-accessible management interface.

So may I please traverse this firewall now, or is my use case still illegitimate?

hdgvhicv 8 hours ago

If you are trying to break through a firewall you don’t own then that’s not legitimate.

If you are buying firewall as a service then request a user interface or change your service provider.

lxgr 7 hours ago

Are you even acknowledging my example? Where does it exist in your bimodal model of reality of "my firewall" and "somebody else's firewall"?

What provider would you suggest somebody wanting to make VoIP calls on their smartphone switch to that allows port forwarding of the kind you describe? And which popular VoIP app would support statically forwarded ports like that?

ufocia 8 hours ago

You're assuming that the firewall was configured correctly or that the firewall admin is cooperative. That's a big ask.

On the other hand, there is plenty of badly written networked software. I bet most of the networked software developers have no idea how to correctly plumb their software. They just open whatever connection, e.g. sockets, their OS provides and just run with it without care of the underlying layers. The OSI model theory in fact encourages this ignorance.

eptcyka 13 hours ago

Why not use plain IPv6 instead?

TuxPowered 8 hours ago

Even with IPv6 you still might have stateful firewalls allowing only for outbound connection at both ends (e.g. a CPE a.k.a. “WiFi router”) and to establish communication you’d need to punch a hole in those firewalls.

brewmarche 5 hours ago

That’s true we won’t get rid of hole-punching with IPv6. But at least it will get rid of TURN.

gzread 3 hours ago

The hole punching is so much simpler because you don't need to guess your own address and port - you just know it

cbdevidal 12 hours ago

V6 adoption has reached 46.82%[1]. So it is increasingly viable for this.

[1] https://www.google.com/intl/en/ipv6/statistics.html

jeroenhd 5 hours ago

If only router manufacturers could be trusted to implement UPnP safely, then none of this bullshit would be necessary.

At least with IPv6 this crap becomes a little easier because you no longer have randomized source ports (which this article just ignores because some devices indeed maintain the same source port) and the IP address contains all the routing information you need. A simple simultaneous open is all you need.

gzread 3 hours ago

If you use UDP transport you don't even need to try to make it simultaneous.

takipsizad 13 hours ago

it's already been done, ISPs just don't properly implement it (NAT-PMP and its relatives)

littlestymaar 11 hours ago

Hole punching is doing exactly what you describe, just in a non-standardized way.

We could have a standard for doing that directly at the NAT box level instead of relying on a third party STUN server, it simply didn't happen (and in fairness, the benefits would be quite minimal).

sylware 8 hours ago

Dudes: IPv6, please, come on, meh.

ufocia 8 hours ago

Meh. "It is assumed another process will coordinate the running of this tool." Coordination is the crux of the problem for fast convergence. Otherwise you're stuck with an infinity cubed, hypercubed, or worse problem.

andrewmcwatters 25 minutes ago

[dead]

elophanto_agent 11 hours ago

[flagged]

mudkipdev 11 hours ago

This is an AI slop bot

vntok 9 hours ago

That's fine, it's pretty good slop and from the comments history even entertaining at times.

> my grandmother had a cookie jar collection and I always thought it was weird until I realized she was basically running a primitive NFT gallery except the tokens were actually useful because they contained cookies