NEX Notifications and the server instability situation

You’ve probably noticed the increased number of updates from us focused on addressing connection stability issues. Unfortunately, it turns out that the servers can cause 118- errors, even though these are peer-to-peer errors. This happens specifically through the incorrect delivery of NEX Notifications: an old bug in our libraries resurfaced and prevented that delivery from working.

NEX Notification Events are used when the server needs to tell the client something without the client asking first. You can see on that wiki page that these are things like “new participant joined”, “host changed”, etc. Most NEX calls are client-to-server request-responses, but these go from server to client, the opposite direction. This means the server must decide on its own which client IP/port/PRUDP stream to use when sending (as opposed to replying to a request, where the reply goes out on the same stream the request came in on).
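To make the difference in direction concrete, here is a minimal sketch in Python. It is not our server code: `Endpoint`, `reply_target`, and `notification_target` are made-up names for illustration, and the lookup table is just an assumption about how a server might remember where clients are.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical stand-in for a client's IP, port, and PRUDP stream."""
    ip: str
    port: int
    stream_id: int


def reply_target(request_source: Endpoint) -> Endpoint:
    # A response needs no lookup: it goes back out the way the request came in.
    return request_source


def notification_target(known: Dict[int, Endpoint], pid: int) -> Optional[Endpoint]:
    # A notification has no incoming request to mirror, so the server must
    # rely on whatever it has recorded about the recipient.
    return known.get(pid)


source = Endpoint("203.0.113.7", 50000, 1)
known_endpoints = {1234: source}

reply = reply_target(source)
push = notification_target(known_endpoints, 1234)
missing = notification_target(known_endpoints, 9999)
```

The reply path can never pick the wrong destination, while the notification path is only as good as the table behind it.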

Until somewhat recently, matchmaking code used the connection ID to track who was in a match. These IDs have the relevant stream information baked in. However, the recent matchmaking rewrite changed this, since that new code uses the PID (account ID) rather than a specific connection.

The matchmaking rewrite reaps many benefits from doing it this way: users can reconnect, tooling and logging are more sensible, multiple players can share one connection (as in MK8), and gatherings can persist across server restarts. The connection ID is ultimately super ephemeral and can change easily, mucking up matchmaking.
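As a rough illustration of why tracking by PID is nicer, the toy Python below compares the two approaches when a player reconnects. The IDs and the `still_in_match` helper are invented for the example and do not reflect the real data structures.

```python
# Toy model: a gathering is just a set of participant keys.
# All IDs here are invented for illustration.
OLD_CONNECTION_ID = 101   # connection the player joined with
NEW_CONNECTION_ID = 102   # connection after the player reconnects
PLAYER_PID = 1234         # account ID, unchanged by reconnecting

gathering_by_connection = {OLD_CONNECTION_ID}
gathering_by_pid = {PLAYER_PID}


def still_in_match(gathering: set, key: int) -> bool:
    # The player is only recognised if their current key is in the gathering.
    return key in gathering


# After the reconnect, the player presents a new connection ID but the same PID.
recognised_by_connection = still_in_match(gathering_by_connection, NEW_CONNECTION_ID)
recognised_by_pid = still_in_match(gathering_by_pid, PLAYER_PID)
```

Keyed by connection ID, the reconnected player looks like a stranger; keyed by PID, they are still in the match.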

Unfortunately, the server now has to find the connection for a given account, which it never had to do before. We aren’t entirely clear on why, but there’s an old bug where sometimes a client disconnects and the server doesn’t realise that happened. As you can tell from that issue, this was seen as a nuisance but not a critical flaw. Now that the server has to look for all the connections on an account, though, a major issue emerges: the server could pick old connection info to send your NEX Notification Events to. Your console, likely listening on an entirely different port by now, will never receive that.
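Here is a small Python sketch of that failure mode, under simplifying assumptions. `Connection`, `ConnectionTable`, and `find_by_pid` are hypothetical names, the ports are made up, and "pick the first match" is only a stand-in for however the server ends up choosing; the point is just that a missed disconnect leaves a stale entry that can be chosen.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Connection:
    connection_id: int
    pid: int
    port: int


class ConnectionTable:
    """Hypothetical record of the connections the server believes are alive."""

    def __init__(self) -> None:
        self._by_id: Dict[int, Connection] = {}

    def connect(self, conn: Connection) -> None:
        self._by_id[conn.connection_id] = conn

    def disconnect(self, connection_id: int) -> None:
        self._by_id.pop(connection_id, None)

    def find_by_pid(self, pid: int) -> List[Connection]:
        # Returns every connection recorded for the account, oldest first.
        return [c for c in self._by_id.values() if c.pid == pid]


table = ConnectionTable()
table.connect(Connection(connection_id=1, pid=1234, port=50000))

# The console drops here, but the server misses it: disconnect(1) never runs.
table.connect(Connection(connection_id=2, pid=1234, port=50123))

candidates = table.find_by_pid(1234)
chosen = candidates[0]            # stand-in for "the server picks one"
console_listening_port = 50123    # where the console actually is now
delivered = chosen.port == console_listening_port
```

With the stale entry still in the table, the notification goes to port 50000 and the console never sees it.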

Did I mention that some games (Minecraft) use NEX Notifications to trigger NAT holepunching, and will reliably fail to connect p2p if they aren’t delivered?

So this is what we’ve been wrestling with lately, why matchmaking rewrite-related work has paused (Splatfests etc.), and why we’re rebooting and changing server versions around to try and minimise the damage. Rebooting clears out all the stale connection data, though something about Minecraft seems to generate stale connections at an unprecedented rate, so it only helps for a short time.

Just going to link “Updates on current Minecraft Wii U server issues” and “Quick Splatoon stability update” here as the relevant announcements about the workarounds.

Feel free, though it isn’t so much of an official announcement as me just venting a bit, hah.