Plex Outage
Incident Report for Quick Think of Something Witty
Postmortem

Yesterday, in preparation for upgrading 4 servers to 2.5GbE networking, I tested one of the adapters on one of my non-storage k8s workers, trainman. Because I didn't have an interface defined in netplan for the new adapter, this somehow caused all devices to lose connectivity. I fixed this quickly and restored the node to service, but that kicked off a recovery process for my Longhorn volumes.
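
For reference, a minimal netplan sketch of the sort of definition that was missing; the interface name, file name, and DHCP choice here are assumptions for illustration, not my actual config:

    # Hypothetical /etc/netplan/60-2p5gbe.yaml on trainman (interface name assumed)
    sudo tee /etc/netplan/60-2p5gbe.yaml <<'EOF'
    network:
      version: 2
      ethernets:
        enp2s0:            # the new 2.5GbE adapter (name is a guess)
          dhcp4: true      # or a static address matching the node's existing scheme
          optional: true   # don't block boot waiting on an unplugged cable
    EOF
    # "netplan try" applies the config and rolls back automatically if connectivity is lost
    sudo netplan try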

Today I installed the adapters (without network cables attached, since I don't have the 2.5GbE switch yet) and configured the 3 storage k8s workers, but noticed that the unRAID host didn't show a new network interface. A little research showed that drivers for the adapter shipped in a more recent unRAID release, so I posted a maintenance window and upgraded. Post-upgrade, everything seemed to work, so I ended the maintenance. As it turns out, trainman was not in a good state (and it was running my ingress), and the unRAID host (simulacrum) switched its MAC address to the new adapter's, which changed its DHCP-assigned IP address.
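
For anyone hitting something similar, these are the generic Linux checks I'd use to tell "card not detected" apart from "no driver for it" (standard tooling, nothing unRAID-specific):

    # The PCI device should show up here even if no driver is loaded
    lspci | grep -i ethernet

    # Shows which kernel module (if any) is bound to each network controller
    lspci -k | grep -iA3 ethernet

    # Interfaces the kernel has actually created
    ip link show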

I updated the DHCP reservation for simulacrum to use the new adapter's MAC address and restarted services. Then I drained trainman and restarted the kubelet. There were still lingering network issues (the ingress couldn't reach pods on the class C subnet), so I rebooted trainman, which corrected the issue.
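
The node recovery was the usual drain/restart sequence; a rough sketch with standard kubectl and systemctl commands rather than an exact transcript:

    # Move workloads off the node before touching it
    kubectl drain trainman --ignore-daemonsets

    # On trainman itself: restart the kubelet
    sudo systemctl restart kubelet

    # After the reboot that finally cleared the network issue, let pods schedule again
    kubectl uncordon trainman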

After doing this, all services were restored, but Longhorn is still recovering one 450GB volume. Longhorn is also showing some odd behavior with some of the mounts, so when I do the 2.5GbE upgrade I will spin down all k8s services with volumes and reboot the workers to ensure there are no lingering Docker issues.
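
The upgrade-day plan will look roughly like the sketch below; the workload and node names are placeholders, since the point is just to scale everything that mounts a Longhorn volume to zero before each worker reboots:

    # Scale down workloads that mount Longhorn volumes (names are hypothetical)
    kubectl -n apps scale deployment/example-app --replicas=0

    # One storage worker at a time: drain, reboot, and let it take pods again
    kubectl drain storage-worker-1 --ignore-daemonsets
    ssh storage-worker-1 sudo reboot
    kubectl uncordon storage-worker-1

    # Once Longhorn shows the volumes healthy, bring workloads back
    kubectl -n apps scale deployment/example-app --replicas=1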

Posted Mar 21, 2021 - 21:02 EDT

Resolved
The issue is resolved.
Posted Mar 21, 2021 - 20:40 EDT
Monitoring
I fixed the issue and services seem stable.
Posted Mar 21, 2021 - 19:12 EDT
Investigating
unRAID (which hosts Plex) is down. Backup Plex should be working.
Posted Mar 21, 2021 - 18:59 EDT
This incident affected: Plex (Primary Plex, Secondary Plex) and Homelab.