Rebooting my router did not fix my DNS problems. I have discovered the cause of the problem, however.
Please write in your guesses. First correct answer wins a prize.
@ryanc Broken cable?
@ryanc cloudflare
@ryanc my second guess is upstream dns tampering, which you would detect and thus ignore, so no queries were valid, resulting in no host records.
@schtaks no
@kajer no
@kajer no
@FritzAdalis @kajer i mean, that depends on semantics, but i would say no
@aschmitz no
@ryanc BGP.
@ryanc VPN or some other routing tunnel accidentally left on?
I'll go the simple route and guess a corrupt DNS cache...?
@ryanc idk how but MTU would be very funny
prior service config change was committed but the service had not been restarted
@antsu no
@ryanc systemd-resolved
@distrowatch no, no
@ryanc I'd probably go with loose/unplugged cable.
Other common issues include Internet/DNS bill not paid, and typo in a settings field.
@nyanbinary no
@munin no
@overflo no
@ryanc resolv.conf fubar?
@dontreportme no, but that's extremely cursed
@ryanc
NIC power save
@ryanc You didn't plug the fibre into the router
@foobarsoft no
@ryanc ISP DNS was broken?
- systemd-resolved
- your router gets default resolver from upstream and that was missing/wrong/pointing to broken resolver, borking DNS for local network getting DNS to use from DHCP lease
- something in /etc/hosts
- not putting something in /etc/hosts
@ryanc iptables
@paul_ipv6 none of the above
@swatters no
@ryanc client was doing DNS over HTTPS?
@atax1a no, and I've migrated to nftables anyway
@jwildeboer no
@ryanc Firewall.
@1mik no
@ryanc ISP gives back a DNS Server that is not working.
@ryanc My guess is Google’s nameservers caching incorrect DNS for much, much longer than the published TTL. That, at least, is what I’ve been dealing with.
Thanks, Google :P
@aadeacon no
@ryanc Not an IT person so I will go with A CAT.
and this is why DNS is such a solid source of employment. :D
@Wolkensteine no; it's a supermicro running debian
@ryanc
Maybe the DHCP server of your router was configured incorrectly, so clients got bad stuff, which would have been discovered when rebooting probably since a device trying to reconnect probably would then throw errors? Or wait — no, that can't be right? That wouldn't have caused you DNS problems instantly … Mhhhhh
What kinda router are you using?
@jackeric arguably not
@ryanc is it DNS
@arielmt no
@ryanc Assigned IPv6-only DNS servers on an IPv4-only WAN?
@bobkmertz no
@ryanc
Cable plugged in to LAN vs WAN?
@ryanc your machine was not connected to that router at all but using a different one you forgot about.
@poleguy no
@ryanc
Modem related: unpowered or "release of magic smoke"
@foobarsoft no, but what the actual fuck
@ryanc Dang. I’ve seen the other guesses, it seems like this is likely to be something incredibly stupid.
I’ll give it one last insane try.
DNS server set to EBCDIC not ASCII?
I assume that’s impossible but oh well.
@ryanc dnsmasq cache/connection limit exhausted
@1mik no
@eyes4abbot will post the answer tomorrow
@ryanc I've no guesses but am SO curious for the answer...👀
@tb1402 no
@ryanc Some UDP only related problem. Detected after the reboot, as DHCP now also didn't work.
@mdione @dontreportme this was affecting all computers on my network at once
@ryanc @dontreportme well, my funputer starts to get disconnections from the WiFi every night and it seems it's related to this.
@murph no
@ryanc Wrong subnet?
@gantua none of the above, probably lennart's fault somehow anyway
@gantua none of the above, see previous toot about blaming lennart for things completely unrelated to systemd
@kaoudis no
@ryanc ISP-level issues?
@nieldk no
@gantua none of those
@ryanc IP conflict
@make no
@ryanc
The thing you did just before it Broke?
@ryanc Your DNS config is not working because some container process keeps overwriting the file?
@pa3weg no
@ryanc incorrect MTU size?
@waynedixon no
@gantua no
@baardhaveland none of these
Some "it's too stupid to check" issues I've experienced:
- Someone plugged a loose ethernet cable back into one of the switches, but the other end of said cable was already connected to the network
- Someone decided to make their own WiFi (with hookers and blackjack), but it started handing out new IPs to everyone
- Someone visited the switch room, and turned off the lights when they left, despite huge warning signs
- Logging all traffic caused router to overheat after 2-3 days
@ryanc @Sempf I still feel like some variety of ground squirrel *should* be the answer (but how that would only affect DNS idk) 😆
@ryanc @kajer
Hmm, that probably rules out EDNS0 or forgetting to roll the zone timestamp.
@ryanc lunar disturbances
The list is long.
I'll go with resetting a circuit breaker.
@ryanc duplex mismatch
@ryanc ntp
@ryanc after scanning the thread; filesystem quotas?
@morb no
@atax1a not that I can tell
@bertkoor no
@ryanc
An unrelated service / cron job on the same machine taking 100% cpu
@kajer no
@ryanc something causing intermittent broadcast storms? something stealing the MAC addr or IP addr of your DNS resolver? something making dnsmasq hang, maybe it was trying to write to something very slow, eg. logs on a broken disk?
@gdupont no, but A+ for creativity
@ryanc
This thread is soooo good. I need to make my attempts (as most, based on past experiences):
- power fluctuation that dips just enough to make some device unreliable, but not enough to trigger any shutdown or power fallback
- if this is related to outdoor devices: rabbit partially ate cable or ant nest into a box that makes it just too hot to work well (but again not enough to shut it down)
@liquidlight no, but lol
@ryanc ghost ate mac address and replaced it with B00:B00:B00 to scare people
@b3lt3r no
@ryanc zero trust mesh with local DNS overriding
@generalx no, but I have been screwed by checksum offloading before
@ryanc
My guesses:
1. TCP
2. Checksum offloading
3. Ignored transaction ID
@fanf I'm going to give it to you for this, it was dnsmasq calling fsync(2) on the dhcp leases file, which happens fairly frequently because I've got about 80 DHCP clients on my home network.
I have resolved it by LD_PRELOADing libeatmydata.
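For anyone who wants to check whether fsync is slow on their own box, here is a rough sketch that times a few fsync(2) calls on a scratch file. The helper name `time_fsync` and the choice of directory are my assumptions, not anything from dnsmasq; point it at whatever directory holds your leases file.

```python
import os
import tempfile
import time


def time_fsync(directory, payload=b"lease\n", rounds=5):
    """Write a small payload and time each os.fsync() call, in seconds.

    `directory` is an assumption: use the directory that holds the
    dnsmasq leases file so the same filesystem is measured.
    """
    timings = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(rounds):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # blocks until the data reaches stable storage
            timings.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return timings


if __name__ == "__main__":
    for seconds in time_fsync(tempfile.gettempdir()):
        print(f"fsync took {seconds * 1000:.2f} ms")
```

If the numbers come back in the hundreds of milliseconds or worse while the disk is busy, that would be consistent with what I saw, though this does not prove dnsmasq is the process being stalled.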
@fanf no
@ryanc something else on the dnsmasq server causing the whole machine to choke, eg by eating all its ram? disk full causing indigestion? something killing dnsmasq and dns works again after it autorestarts?
@aris no
@ryanc@infosec.exchange DHCP configuration?
@oheso arguably no
@ryanc Cause of DNS problem: DNS
@mdione @fanf I didn't dig into the code, but I believe it is single threaded, and fsync is always blocking by design even on an otherwise non-blocking file descriptor.
But. In common configurations, fsync also has to wait on any other writes that were pending on the same filesystem.
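A toy model of what I believe is going on, assuming one thread handles both DHCP and DNS in arrival order. This is not dnsmasq's code; the function name, the event format and the cost numbers are all made up for illustration, and it uses a fake clock instead of real I/O.

```python
def simulate(events, fsync_cost, dns_cost=0.001):
    """Return the latency of each DNS event in a single-threaded server.

    events: list of (arrival_time_seconds, kind), kind is "dhcp" or "dns".
    fsync_cost: assumed seconds a DHCP lease write spends blocked in fsync.
    dns_cost: assumed seconds needed to answer one DNS query.
    """
    clock = 0.0
    latencies = []
    for arrival, kind in sorted(events):
        # The single thread cannot start an event before it arrives,
        # or before the previous event has finished.
        clock = max(clock, arrival)
        if kind == "dhcp":
            clock += fsync_cost  # nothing else runs while fsync blocks
        else:
            clock += dns_cost
            latencies.append(clock - arrival)
    return latencies


if __name__ == "__main__":
    events = [(0.0, "dhcp"), (0.1, "dns")]
    print(simulate(events, fsync_cost=2.0))  # DNS query waits behind the fsync
    print(simulate(events, fsync_cost=0.0))  # fsync made free, eatmydata style
```

With a 2 second fsync the DNS query that arrived 0.1 s later waits about 1.9 s, which is long enough for many stub resolvers to give up and retry.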
@ryanc @fanf so, dnsmasq has a single thread (async?) for dns and dhcp requests, and if dhcp are lagging, they block dns?
@nCrazed all DNS lookups were failing on every device, so no.
@ryanc Were you checking the wrong domain?
@morb not rats, answer has been posted
@ryanc rats
@toadjaune @fanf strace
How did you end up diagnosing this ? :O
ah, dnsmasq... the "gift" that keeps on giving. the project hasn't been given consistent love from maintainers for reasons and router vendors are infamous for shipping really bad versions and not updating...
@zymurgic @fanf It would only make the problem less frequent, and even with a 24 hour lease time, that would be 160 renewals per day.
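The 160 figure works out if you assume each client renews at half the lease time (the usual DHCP T1 default) and that every renewal triggers a lease file write. A quick sketch of the arithmetic; the function name and the `t1_fraction` parameter are mine:

```python
def renewals_per_day(clients, lease_hours, t1_fraction=0.5):
    """Estimate DHCP renewals per day across all clients.

    Assumes every client renews at t1_fraction of the lease time
    (0.5 is the common default) and ignores reboots and roaming.
    """
    renew_interval_hours = lease_hours * t1_fraction
    return clients * (24 / renew_interval_hours)


if __name__ == "__main__":
    print(renewals_per_day(80, 24))  # 160.0
    print(renewals_per_day(80, 1))   # 3840.0 with a 1 hour lease
```

So a longer lease cuts the number of fsync calls, but every one of them can still stall DNS.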
@ryanc @fanf how about just extending the lease duration? Less frequent DHCP queries, hence less load on dnsmasq.
It was a trick answer.
For various definitions of circuit.