Notices by Zimmie (bob_zim@infosec.exchange)

Zimmie (bob_zim@infosec.exchange)'s status on Friday, 13-Jun-2025 16:22:59 JST Zimmie
in reply to
- Kelly Shortridge
@shortridge Some time later, I was no longer working tech support. I got hired to do network and firewall stuff for a fairly large company. At one point, they decided to relocate the office where a lot of the operations and monitoring staff worked. They moved the whole application monitoring team to the new building with the unproven infrastructure first, because some people in charge made very bad decisions.
The monitoring team gets to the new building, and they can’t access any of their monitoring systems. Clearly a problem with the new office, right? They go through a few environments to get to their monitoring systems, so I log in to the remote access VPN for the first one and confirm the first firewall they hit sees their traffic and isn’t dropping it.
I go to log in to the remote access VPN for the second environment, where the monitoring systems actually live. I’m able to start the connection, but it never prompts me for my credentials, and the tunnel never comes up. Huh. That’s weird.
Well, I’ll just get in through the DR version of the second environment. Connection works and it prompts me for my credentials, but it rejects them. I try again, in case I made a mistake entering the passphrase for my key, but it’s still rejected. Huh. That’s weird.
I eventually find a working way in. I’m able to ping all the relevant systems, I’m able to make TCP connections via telnet, but trying to actually use a service like SSH or MSRDP just hangs. But wait! I can connect to my firewalls via SSH! So what’s common among the broken systems?
All the broken systems are VMs. I start testing connections to other things which I know are VMs. They all behave the same. Ping works, TCP connections work, but data over the connections gets no response.
I bring in the virtualization team. Some of us drive in to the datacenter hosting the VMs giving us trouble. Someone quickly realizes the single SAN hosting all of the VMs’ drives was up, but wasn’t responding to storage requests. Effectively the drive had been pulled out of every single VM. Now we have an explanation for why all the VMs seem to be broken.
With most operating systems, the network stack is wired in RAM and can’t be swapped out. The network stack handles responding to pings and opening TCP connections on listening ports. Once a TCP connection is opened, it requests a copy of the listening service from storage to handle the connection. With storage no longer responding, the network stack never gets the copy of the service to handle the connection, so data doesn’t work.
Why couldn’t I connect to the second VPN endpoint? Well, some people in charge made very bad decisions. They had decided that since VMs are the future, the VPN endpoints in that facility should be moved from dedicated hardware to VMs stored on the SAN. They hadn’t gotten to the first VPN endpoint yet, but that environment wasn’t allowed to connect in to the second environment.
Okay, but I could connect to the other site’s VPN endpoint, and the other site didn’t have any problems. Why didn’t it accept my credentials? Well, some people in charge made very bad decisions (you may be noticing a theme!). All authentication was run through some VMs which were stored on the SAN. The VPN boxes in the working location were set to monitor the health of the authentication boxes in the failed location by pinging them. As long as they responded to ping, they were good, so the VPN boxes wouldn’t fail over to using their local authentication boxes. And a computer with its drive pulled can still respond to ping with just the network stack in RAM.
Once we realized what was going on, we physically connected to the WAN routers and added routes to prevent the two sites from reaching each other’s authentication boxes. Presto! We could now log in via the DR environment as normal. The other infrastructure teams were then able to start digging into their parts.
But why is the SAN unresponsive? Turns out this particular SAN vendor had an option for what to do under certain failure conditions: it could fail read-only or fail completely silent. This one was set to fail silent, and it had filled up.
I wasn’t directly involved in fixing the SAN. I know the manager over the SAN team had been sounding the alarm for months before it filled. I also know there were multiple levels of bad configuration, such as more space offered by LUNs than the SAN could physically provide.
Big takeaways:
1. Make sure your access to fix a system doesn’t depend on that system. It’s really easy to accidentally introduce dependency cycles, and it takes constant work to avoid them.
2. Superficial tests like whether you can ping something can’t detect some pretty major failures. Tests which exercise the actual service are more likely to notice the problem.
3. When something is critical to an environment, maybe have more than one of them? The SAN had internal redundancy to deal with faulty drives and so on, but all the storage was in one giant pool. Multiple SAN systems can provide a bulkhead such that breaking one would not break all VMs.
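A minimal sketch of takeaway 2, assuming a service which speaks first after the connection opens (SSH sends a version banner, for example). The function names and timeouts are illustrative and are not from the environment described above. The shallow check only proves the kernel's network stack is alive; the deeper check also requires the service itself to produce some bytes.

```python
import socket


def tcp_connects(host: str, port: int, timeout: float = 2.0) -> bool:
    """Shallow check: does the TCP handshake complete?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def service_answers(host: str, port: int, timeout: float = 2.0) -> bool:
    """Deeper check: does the service send any data after the handshake?

    Assumes a server-speaks-first protocol. A request/response protocol
    would need to send a probe request before reading.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            return len(conn.recv(64)) > 0
    except OSError:  # covers refused connections and read timeouts
        return False
```

A machine in the state described above would pass `tcp_connects` and fail `service_answers`, so a failover decision based on the second check would have moved to the local authentication boxes.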

In conversation about 21 days ago from infosec.exchange permalink
Zimmie (bob_zim@infosec.exchange)'s status on Friday, 13-Jun-2025 08:07:26 JST Zimmie
in reply to
- Kelly Shortridge
@shortridge While working tech support, I got a call on a Monday. Some VPNs which had been working on Friday were no longer working. After a little digging, we found the negotiation was failing due to a certificate validation failure.
The certificate validation was failing because the system couldn’t check the certificate revocation list (CRL).
The system couldn’t check the CRL because it was too big. The software doing the validation only allocated 512kB to store the CRL, and it was bigger than that. This is from a private certificate authority, though, and 512kB is a *LOT* of revoked certificates. Shouldn’t be possible for this environment to hit within a human lifespan.
Turns out the CRL was nearly a megabyte! What gives? We check the certificate authority, and it’s revoking and reissuing every single certificate it has signed once per second.
The revocations say all the certificates (including the certificate authority’s) are expired. We check the expiration date of the certificate authority, and it’s set to some time in 1910. What? It was around here I started to suspect what had happened.
The certificate authority isn’t valid before some time in 2037. It was waking up every second, seeing the current date was after the expiration date and reissuing everything. But time is linear, so it doesn’t make sense to reissue an expired certificate with an earlier not-valid-before date, so it reissued all the certs with the same dates and went to sleep. One second later, it woke up and did the whole process over again. But why the clearly invalid dates on the CA?
The CA operation log was packed with revocations and reissues, but I eventually found the reissues which changed the validity dates of the CA’s certificate. Sure enough, it reissued itself in 2037 and the expiration date was set to 2037 plus ten years, which fell victim to the 2038 limitation. But it’s not 2037, so why did the system think it was?
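The wraparound can be sketched with a little arithmetic. This assumes the CA stored dates as signed 32-bit counts of seconds since 1970, which the story implies but does not confirm; the issue date of 1 June 2037 and the 365-day years are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def wrap_int32(value: int) -> int:
    """Reduce an integer to the range of a signed 32-bit two's complement value."""
    return (value + 2**31) % 2**32 - 2**31


def from_unix(seconds: int) -> datetime:
    """Convert seconds since 1970 to a UTC datetime (negative values allowed)."""
    return EPOCH + timedelta(seconds=seconds)


# Hypothetical reissue date inside 2037, expressed as Unix seconds.
issued = int((datetime(2037, 6, 1, tzinfo=timezone.utc) - EPOCH).total_seconds())
ten_years = 10 * 365 * 24 * 60 * 60

# The sum no longer fits in 31 bits, so a 32-bit signed field goes negative.
expires = wrap_int32(issued + ten_years)

print(from_unix(issued).year)   # 2037
print(from_unix(expires).year)  # 1911
```

The result is a certificate whose not-valid-after date lands more than a century before its not-valid-before date, in the same ballpark as the date seen on the CA.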
The OS running the CA was set to sync with NTP every 120 seconds, and it used a really bad NTP client which blindly set the time to whatever the NTP server gave it. No sanity checking, no drifting. Just get the time, set the time. OS logs showed most of the time, the clock adjustment was a fraction of a second. Then some time on Saturday, there was an adjustment of tens of thousands of seconds forward. The next adjustment was hundreds of thousands of seconds forward. Tens of millions of seconds forward. Eventually it hit billions of seconds backwards, taking the system clock back to 1904 or so. The NTP server was racing forward through the 32-bit timestamp space.
At some point, the NTP server handed out a date in 2037 which was after the CA’s expiration. It reissued itself as I described above, and a date math bug resulted in a cert which expired before it was valid. So now we have an explanation for the CRL being so huge. On to the NTP server!
Turns out they had an NTP “appliance” with a radio clock (e.g., a CDMA radio or GPS receiver). Whoever built it had done so in a really questionable way. It seems it had a faulty internal clock which was very fast. If it lost upstream time for a while, then reacquired it after the internal clock had accumulated a whole extra second, the server didn’t let itself step backwards or extend the duration of a second. The math it used to correct its internal clock somehow resulted in dramatically shortening the duration of a second until it wrapped in 2038 and eventually ended up at the correct time.
Ultimately found three issues:
• An OS with an overly simplistic NTP client
• A certificate authority with a bad date math system
• An NTP server with design issues and bad hardware
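For the first issue, a minimal sketch of the kind of sanity check the client lacked. The function name and the 1000-second threshold are assumptions for illustration (the reference ntpd uses a similar "panic threshold" of 1000 seconds by default); a real client would also slew small offsets gradually rather than stepping.

```python
MAX_STEP_SECONDS = 1000.0  # illustrative threshold, not from the story


def next_clock(local: float, server: float,
               max_step: float = MAX_STEP_SECONDS) -> float:
    """Return the time the local clock should take after one NTP poll.

    Offsets larger than max_step are treated as a broken server and
    ignored, instead of being applied blindly.
    """
    offset = server - local
    if abs(offset) > max_step:
        return local  # implausible jump: keep the local clock
    return server     # small correction: accept the server's time


now = 1_700_000_000.0
print(next_clock(now, now + 0.25))        # fraction of a second: accepted
print(next_clock(now, now + 50_000.0))    # tens of thousands of seconds: ignored
```

With a check like this, the first bad adjustment on Saturday would have been rejected and the CA would never have seen a date in 2037.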
Edit: The popularity of this story has me thinking about it some more.
The 2038 problem happens because when the first bit of a 32-bit value is 1 and you use it as a signed integer, it’s interpreted as a negative number in two’s complement representation. But C has no protection from treating the same value as signed in some contexts and unsigned in others. If you start with a signed 32-bit integer with the value -1, it is represented in memory as 0xFFFFFFFF. If you then use it as an unsigned integer, it becomes the value 4,294,967,295.
I bet the NTP box subtracted the internal clock’s seconds from the radio clock’s seconds as signed integers (getting -1 seconds), then treated it as an unsigned integer when figuring out how to adjust the tick rate. It suddenly thought the clock was four billion seconds behind, so it really had to sprint forward to catch up!
In my experience, the most baffling behavior is almost always caused by very small mistakes. This small mistake would explain the behavior.
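The signed/unsigned mix-up is easy to reproduce. This sketch uses Python's `struct` module to reinterpret the same four bytes both ways; the clock values are made up, and the appliance's real code is unknown.

```python
import struct


def as_unsigned32(value: int) -> int:
    """Reinterpret the bytes of a signed 32-bit integer as unsigned."""
    return struct.unpack("<I", struct.pack("<i", value))[0]


radio_seconds = 1_000_000      # hypothetical upstream time
internal_seconds = 1_000_001   # internal clock is one second fast

offset = radio_seconds - internal_seconds  # -1 as a signed value
print(hex(as_unsigned32(offset)))          # 0xffffffff
print(as_unsigned32(offset))               # 4294967295
```

A one-second correction backwards turns into a correction of roughly 4.29 billion seconds forwards, which matches the clock racing through the whole 32-bit timestamp space.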
In conversation about 22 days ago from infosec.exchange permalink
Zimmie (bob_zim@infosec.exchange)'s status on Sunday, 23-Mar-2025 03:08:30 JST Zimmie
in reply to
@horse @SrRochardBunson @YvanDaSilva Holding the sleep/wake button and either volume button for a few seconds also locks the phone. This may work better for people who have trouble hitting a button several times quickly.
If Siri is enabled, saying “Hey Siri, whose iPhone is this?” also locks the phone.

In conversation about 3 months ago from infosec.exchange permalink
Zimmie (bob_zim@infosec.exchange)'s status on Saturday, 22-Feb-2025 10:57:30 JST Zimmie
in reply to
- Sylvhem
- beatrix bitrot
@Sylvhem @bea the THERAC-25 was a radiation therapy machine. Sloppy concurrency programming led to race conditions which allowed operator error to put the machine into a dangerous state. On earlier versions, hardware interlocks prevented it from firing in this state, but the hardware safeties were replaced with software to save money. Several people got massive overdoses, and a few died.

In conversation about 4 months ago from gnusocial.jp permalink
Zimmie (bob_zim@infosec.exchange)'s status on Friday, 17-Jan-2025 09:07:27 JST Zimmie
in reply to
- Chris Hallbeck
@Chrishallbeck
In conversation about 6 months ago from infosec.exchange permalink
Attachments
1. A three panel comic from PoorlyDrawnLines First panel: “The goat, he screams like a man” Second panel shows a goat screaming “I am so tired” Third panel zooms out. The goat continues “I am so tired all the time”
  https://media.infosec.exchange/infosec.exchange/media_attachments/files/113/840/027/515/673/218/original/a82eca3e18612aae.jpeg
Zimmie (bob_zim@infosec.exchange)'s status on Saturday, 11-Jan-2025 21:17:29 JST Zimmie
- ?????
@alice Yeah, that sounds like a Bob thing to do.

In conversation about 6 months ago from infosec.exchange permalink
Zimmie (bob_zim@infosec.exchange)'s status on Saturday, 28-Dec-2024 00:00:30 JST Zimmie
in reply to
- Baldur Bjarnason
- Rich Felker
@dalias @baldur Sure, but my point is LLMs are statistical grammar. They get syntax right almost all the time, but they don’t make any attempt at semantics.

In conversation about 6 months ago from infosec.exchange permalink
Zimmie (bob_zim@infosec.exchange)'s status on Friday, 27-Dec-2024 23:49:29 JST Zimmie
in reply to
- Baldur Bjarnason
- Rich Felker
@dalias @baldur We’ve known what is currently sold as “AI” is a dead end since at least 1956 with Chomsky’s paper Three Models for the Description of Language.
I can’t imagine how frustrating it must be for him having published on the topic for twice as long as a lot of the proponents of LLMs as AI have been alive.

In conversation about 6 months ago from infosec.exchange permalink
Zimmie (bob_zim@infosec.exchange)'s status on Thursday, 05-Dec-2024 11:53:48 JST Zimmie
in reply to
- evacide
- Gilbert Pilz
@gpilz @evacide Vaccine mandates are hard to justify under a framework of absolute bodily autonomy, but the others are easy.
Drug prohibitions should be lifted. Drug abuse is a public health problem and should be handled in that framework.
Conscription should not only be stopped, it should be explicitly prohibited. That “selective service” lasted so long should be seen as a national embarrassment.
Circumcision of infants should be illegal. It’s not the parents’ call. If an adult wants to be circumcised for religious reasons, that’s their decision to make.

In conversation about 7 months ago from infosec.exchange permalink
Zimmie (bob_zim@infosec.exchange)'s status on Thursday, 05-Dec-2024 05:13:34 JST Zimmie
@evacide @Blort @pluralistic We simply need tech companies to invent a new number you can only use if you believe in truth, justice, and the American way!

In conversation about 7 months ago from infosec.exchange permalink
Zimmie (bob_zim@infosec.exchange)'s status on Wednesday, 04-Dec-2024 07:20:21 JST Zimmie
in reply to
- David August ❌👑
- ArchaeoIain
@ArchaeoIain @davidaugust > Miscarriages of justice can be dealt with by the courts.
Which courts? Specifically which ones? And where do we go when those fail us? And where do we go when *that* court fails us? It’s not practical to have an infinite series of courts for appeals, so it has to end somewhere. What do you do when the final court is hopelessly corrupt?
Consider the crime of “felony murder”, which is also known as “not murdering anybody at all”. The fact anybody is in prison over this is inherently a miscarriage of justice, yet it’s very rarely fixed by the courts.
Edit: looks like the equivalent legal concept in Australia is “constructive murder”. In the US, if you are involved in any way with a felony (even an unwitting accessory) and someone dies (regardless of who or of circumstances), you can be charged with murder. Of course, if you actually kill someone, they charge you with real murder, not with “felony murder”. Stealing as little as $200 is a felony in various states.
Pardon power is good, and isn’t used nearly often enough.

In conversation about 7 months ago from infosec.exchange permalink
Zimmie (bob_zim@infosec.exchange)'s status on Tuesday, 19-Nov-2024 13:21:04 JST Zimmie
in reply to
- Hisham
- Michael Lucas :flan_set_fire:
@hisham_hm @mwl People really take the wrong thing away from the trolley problem. It isn’t directly about what you personally would or should do. Instead, it’s like an axis of comparison for ethical frameworks. It’s one of the extremes where differences (and sometimes similarities) between them become more apparent.
Like how Schrödinger’s cat isn’t saying the cat is both alive and dead, it’s taking a model we have for quantum effects and showing how, when taken to extremes, it produces results which are patently absurd.

In conversation about 8 months ago from infosec.exchange permalink
Zimmie (bob_zim@infosec.exchange)'s status on Tuesday, 19-Nov-2024 10:09:56 JST Zimmie
in reply to
@cR0w @jornane @The_Turtle_Moves @dalias I share this every time “user education” is brought up as a solution to phishing. It’s the first two lines of an email sent by the security team at my company at the time.
In conversation about 8 months ago from infosec.exchange permalink
Attachments
1. A screenshot of the first two lines of an email. We Need to Just Stop Clicking on Everything! View in your browser to display images. The phrase “View in your browser” is a link, which we are presumably expected to click.
  https://media.infosec.exchange/infosec.exchange/media_attachments/files/113/506/230/462/216/282/original/594140e7e54058cf.jpeg
Zimmie (bob_zim@infosec.exchange)'s status on Sunday, 13-Oct-2024 07:52:44 JST Zimmie
in reply to
- Ryan Castellucci :nonbinary_flag:
@ryanc A fabric which stretches in one direction is said to have “two-way stretch”. A “two-way mirror” is only a mirror from one direction. Almost every instance of a term with “way” in it is mind-numbingly wrong.

In conversation about 9 months ago from infosec.exchange permalink
Zimmie (bob_zim@infosec.exchange)'s status on Saturday, 12-Oct-2024 04:34:28 JST Zimmie
in reply to
@ryanc @kajer @davidmc @zesty In that case, if the fuse blows, there’s current.

In conversation about 9 months ago from infosec.exchange permalink
Zimmie (bob_zim@infosec.exchange)'s status on Saturday, 12-Oct-2024 03:38:45 JST Zimmie
in reply to
@ryanc @kajer @davidmc @zesty It probably does, just not labeled that way. Current mode is low impedance. The downside is if your multimeter isn’t fused, measuring wall current with current mode will probably show ~15A for about five milliseconds, then your multimeter melts and/or explodes.
This is the only real downside to the demise of incandescent bulbs. Loose sockets are cheap. You could stick one plus a switch on a board, hook the mystery wire to the switch, the other end of the switch to one terminal on the light socket, and neutral (or earth, if neutral isn’t available) to the other terminal of the socket.
If the bulb doesn’t light, there’s voltage on the wire, but not much current. If the bulb lights, there’s current.

In conversation about 9 months ago from infosec.exchange permalink
Zimmie (bob_zim@infosec.exchange)'s status on Wednesday, 25-Sep-2024 20:27:51 JST Zimmie
in reply to
- Lukasz Olejnik
- DelegateVoid
@delegatevoid @LukaszOlejnik Upper limits on passphrase length are mostly about closing a possible resource exhaustion vector on the authenticating system. If you hash it all down to 64 bytes, there’s no point dealing with passphrases longer than 128 characters. Further characters don’t add any further entropy, but if you have no upper bound, some knucklehead is going to make your server hash the entirety of War and Peace over and over.
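A minimal sketch of that bound, assuming PBKDF2 with SHA-512 (which produces a 64-byte digest) as the hash. The function name, salt handling, and iteration count are illustrative choices; only the 128-character limit comes from the post.

```python
import hashlib

MAX_PASSPHRASE_CHARS = 128  # limit suggested in the post


def hash_passphrase(passphrase: str, salt: bytes,
                    iterations: int = 100_000) -> bytes:
    """Hash a passphrase down to 64 bytes, refusing oversized input.

    The length check runs before any hashing, so an attacker cannot make
    the server spend CPU time on megabytes of text per login attempt.
    """
    if len(passphrase) > MAX_PASSPHRASE_CHARS:
        raise ValueError("passphrase exceeds maximum length")
    return hashlib.pbkdf2_hmac(
        "sha512", passphrase.encode("utf-8"), salt, iterations
    )


digest = hash_passphrase("correct horse battery staple", b"per-user-salt")
print(len(digest))  # 64
```

Rejecting long input up front is cheaper than truncating it silently, and it avoids two different passphrases which share a 128-character prefix both being accepted.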

In conversation about 9 months ago from infosec.exchange permalink
Zimmie (bob_zim@infosec.exchange)'s status on Monday, 16-Sep-2024 16:28:47 JST Zimmie
@lmorchard @WhiteCatTamer @nex @alexhammy It would be really challenging. For example, I have no idea how you would make the word “house” sound blue.

In conversation about 10 months ago from infosec.exchange permalink
Zimmie (bob_zim@infosec.exchange)'s status on Wednesday, 11-Sep-2024 08:06:07 JST Zimmie
in reply to
@Di4na @clacke @makdaam @hendric That doesn’t seem at all the case to me. The Therac-25 report had quite a few big lessons.
• Data races can exist anywhere shared mutable state exists. This was poorly understood at the time. Language people have taken this to heart with copy-on-write data structures, static analysis for control flow, and more recently with proof-based data access validation as seen in Swift 6. This kind of issue is why those capabilities exist, and why you shouldn’t just turn them off to silence warnings.
• Software interlocks are strictly worse than hardware interlocks. They have more opportunities to fail in non-obvious ways.
• Safety-critical software has become a much more formalized discipline, finally matching the rigor of real engineering. For example, techniques were developed to prove a given program is free of bugs by proving it exactly matches the behaviors defined by its formal specification (no undefined behaviors, and no missing behaviors).
• Reported issues should be treated as real until you can prove what happened. Part of the reason the Therac-25 hurt so many people is the company brushed off the early issue reports.
A lot of the company-culture problems the incidents exposed are still major issues today. The company thought their software was perfect, and they didn’t include it in their analysis of potential failure modes. They didn’t have any independent review of their code. They shipped straight to production (the hardware and software were never tested together outside customer installations). They didn’t document error codes and didn’t differentiate between minor errors and safety-critical errors.

In conversation about 10 months ago from gnusocial.jp permalink
Zimmie (bob_zim@infosec.exchange)'s status on Sunday, 01-Sep-2024 23:57:19 JST Zimmie
in reply to
- Wary Jerry
- SunTzuCyber
@jerry @SunTzuCyber
In conversation about 10 months ago from infosec.exchange permalink
Attachments
1. A post from “sea shanty stan account” (anarchoshanties.bsky.social) which reads: The Art of War is so funny when you realize it’s basically a very frustrated Sun Tzu writing The Absolute Dipshit Entitled Brat Silver Spoon Nepo Baby’s Guide to Not Immediately Fucking Up A War, which also explains why CEOs like it so much
  https://media.infosec.exchange/infosec.exchange/media_attachments/files/113/057/934/680/096/985/original/a3d276d495ccd4a7.jpeg
