So, I'm facing a weird issue with my #fedora workstation. The filesystem goes "read-only" at random times and I don't know why. I have to reboot to fix it. Do you have any idea what causes this and how to fix it?
@javalps Wow, that's crazy. I'm really glad you found what's triggering the problem, and that it wasn't a drive failure. Seems odd that a piece of software would be able to do that to your filesystem — maybe it's calling some function in the operating system that has a problem and isn't used much by other software 🤔 It might be worth flagging it with Fedora as an issue, too (not sure how you'd do that, but there must be a way). @abhijith@fossunleashed@llutz@zenbrowser
But the moment I opened Zen Browser, the filesystem suddenly went read-only. Coincidence? I think not. This also happened when I was offline. I opened Zen and it showed me the webpage, all good. I reloaded it and it showed me the classic "something unexpected happened" message. Still all good. But when I closed the #browser, the FS suddenly went read-only.
I think I can now pinpoint why the filesystem goes read-only, and it's (probably) neither the FS itself nor the NVMe drive. And definitely not the RAM.
The problem is a single app, or at least that's what I found, and it's the @zenbrowser browser.
So this morning I opened the laptop with #wifi turned off and checked the system. It was going alright: the filesystem was behaving normally. I also double-checked it using `mount | grep "btrfs"` and `fastfetch`. To check the nvme drive, I used `sudo smartctl --xall /dev/nvme0n1p3` plus the diagnostics tool in the BIOS menu.
To check the memory I used `sudo memtester 1024 5`. And EVERYTHING was fine. Even when I turned on the #wifi nothing changed. I opened the #gnomesoftware app, I also opened #firefox to browse #youtube and log in to this instance. Everything was fine.
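A quick way to catch the switch as it happens, instead of only noticing after a reboot, is to read the root mount's options directly. This is a minimal bash sketch, assuming the standard /proc/mounts layout (device, mount point, type, options, ...); `mount_mode` is a hypothetical helper name, not an existing tool.

```bash
#!/usr/bin/env bash
# mount_mode (hypothetical helper): print "ro" or "rw" for the mount point
# given as $1, reading /proc/mounts-style lines on stdin.
# Exits with status 1 if the mount point is not listed.
mount_mode() {
  awk -v target="$1" '
    # Field 2 is the mount point, field 4 the comma-separated options.
    $2 == target {
      mode = "rw"
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) {
        if (opts[i] == "ro") mode = "ro"
      }
      found = 1   # keep scanning: the last matching entry is the one in effect
    }
    END {
      if (found) print mode
      else exit 1
    }'
}
```

Usage: `mount_mode / < /proc/mounts`. If it prints `ro`, `sudo journalctl -k -b | grep -i btrfs` shows this boot's kernel messages for btrfs (after a reboot, `-b -1` looks at the previous boot instead). Btrfs normally logs the error that made it switch to read-only, which should say a lot more than the symptom alone.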
@javalps@abhijith@fossunleashed ... quite clearly indicated. Nothing you've written here in the SMART test output screams disk failure to me (the test would fail rather than pass, for starters). I'm not a disk expert, though. You could try the extended test, but some hard drives abort them (Western Digital, I think).
In short, I've never had a failing disk look in SMART like there's nothing wrong with it — with the symptoms you're describing a SMART failure should be obvious.
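If it helps to see which lines matter: on NVMe, the fields I'd expect to show a problem first are Critical Warning and Media and Data Integrity Errors. That choice of fields is my assumption, not an official pass/fail rule (smartctl's own overall-health line is the closest thing to one). A small bash sketch that reads `smartctl -a` output on stdin and flags those two; `nvme_health_check` is a hypothetical name.

```bash
#!/usr/bin/env bash
# nvme_health_check (hypothetical helper): scan smartctl NVMe output on stdin.
# Prints "OK" if nothing is flagged (exit 0), prints each flagged field (exit 1),
# or complains if neither field was found at all (exit 2).
nvme_health_check() {
  awk -F: '
    /^Critical Warning/ {
      seen = 1
      v = $2; gsub(/[[:space:]]/, "", v)
      # 0x00 means the drive reports no warning bits set.
      if (v != "0x00") { print "critical warning: " v; bad = 1 }
    }
    /^Media and Data Integrity Errors/ {
      seen = 1
      v = $2; gsub(/[[:space:],]/, "", v)   # drop padding and thousands separators
      if (v + 0 > 0) { print "media/integrity errors: " v; bad = 1 }
    }
    END {
      if (!seen) { print "no NVMe health fields found"; exit 2 }
      if (!bad) print "OK"
      exit (bad ? 1 : 0)
    }'
}
```

Usage, with `/dev/nvme0` as an assumed device name: `sudo smartctl -a /dev/nvme0 | nvme_health_check`.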
(2/2)
Data Units Read: 5,746,241 [2.94 TB]
Data Units Written: 9,246,806 [4.73 TB]
Host Read Commands: 86,356,172
Host Write Commands: 202,068,936
Controller Busy Time: 412
Power Cycles: 422
Power On Hours: 2,470
Unsafe Shutdowns: 8
Media and Data Integrity Errors: 0
Error Information Log Entries: 1
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 37 Celsius
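The bracketed TB figures come straight from the raw counts. As far as I know, NVMe reports Data Units in thousands of 512-byte blocks (so one unit is 512,000 bytes), and smartctl prints decimal terabytes. A tiny bash sketch that reproduces the numbers above; `data_units_to_tb` is a hypothetical name.

```bash
#!/usr/bin/env bash
# data_units_to_tb (hypothetical helper): convert an NVMe "Data Units" count,
# e.g. "5,746,241", to decimal terabytes.
data_units_to_tb() {
  local units="${1//,/}"                  # strip thousands separators
  local bytes=$(( units * 512 * 1000 ))   # 1 data unit = 1000 x 512-byte blocks
  # 1 TB = 10^12 bytes; LC_ALL=C keeps "." as the decimal point.
  LC_ALL=C awk -v b="$bytes" 'BEGIN { printf "%.2f TB\n", b / 1e12 }'
}

data_units_to_tb 5,746,241   # prints 2.94 TB (Data Units Read)
data_units_to_tb 9,246,806   # prints 4.73 TB (Data Units Written)
```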
@javalps@abhijith@fossunleashed It sounds like you've done the test so you don't need to do it through the BIOS as well (it would be the same). With SMART you can do a short test or an extended test. There's also data you can pull from the drive (like the number of times it has ever failed reads in its lifetime, stuff like that). When I had a problem like this recently I ended up using a GUI tool that highlighted the problems, and before that when I used a BIOS test myself the failure was...
@javalps If the BIOS doesn't have it, you could try installing smartmontools or similar (while the disk is working) and running them. I think you can run them off a USB stick too if needed.
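For the short vs extended SMART tests, a minimal sketch of the smartctl calls. Assumptions: smartmontools 7.3 or newer (I believe that's when smartctl gained NVMe self-test support), a drive that supports self-tests, and `/dev/nvme0` as the device name. The `run`/`DRY_RUN` wrapper and the function names are hypothetical, there only so you can print the commands before running them for real.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: with DRY_RUN set, print the command instead of running it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Start a SMART self-test on DEVICE ($1). TYPE ($2) is "short" or "long" (extended).
smart_selftest() {
  local device="$1" type="${2:-short}"
  case "$type" in
    short|long) ;;
    *) echo "unknown test type: $type" >&2; return 2 ;;
  esac
  # The drive runs the test itself; smartctl returns right after starting it.
  run sudo smartctl -t "$type" "$device"
}

# Show the self-test log (progress and results of finished tests) for DEVICE ($1).
smart_results() {
  run sudo smartctl -l selftest "$1"
}
```

Usage: `smart_selftest /dev/nvme0 short`, wait a few minutes, then `smart_results /dev/nvme0`. Prefix either call with `DRY_RUN=1` to see the exact command first.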
@javalps See if you can check online which keypress will open up the BIOS menu on your workstation (it's different by manufacturer — sometimes a notice is shown during boot telling you which key to press, but it's best to check to be sure). Once you reboot the machine and use the keypress to get into the BIOS menu, you'll be able to navigate the options (via the keyboard). The exact menu structure varies. Look for something to do with disks or devices, or SMART in particular.
It's possible the corruption was caused by some kind of physical failure even if it doesn't show up in SMART reports. You can run `btrfs scrub start /` and `btrfs scrub status /` regularly to monitor data integrity in the future (and keep important backups up to date!). Or, if you can, add another drive to your (new) btrfs filesystem and set up RAID1. Btrfs can then automatically repair corrupted data using healthy copies on the other drive, so you'd be less likely to lose data if your current drive fails. If the drive is encrypted, using the DUP profile on a single drive instead of RAID1 on two might also protect against some minor failures (encryption should prevent the hardware from detecting and deduplicating identical data).
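A rough sketch of what those options look like with btrfs-progs. Assumptions: the filesystem is mounted at `/`, and `/dev/sdb` stands in for a hypothetical second drive; the `run`/`DRY_RUN` wrapper and the function names are made up for illustration. `btrfs device add` makes the new drive part of the filesystem, so don't point it at a disk holding anything you need.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: with DRY_RUN set, print the command instead of running it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Scrub the filesystem at MOUNT ($1, default /). -B stays in the foreground
# and prints the statistics when the scrub finishes.
scrub_now() {
  run sudo btrfs scrub start -B "${1:-/}"
}

# Add NEW_DEVICE ($1) to the filesystem at MOUNT ($2, default /), then convert
# data (-d) and metadata (-m) to RAID1 so every block has a copy on each drive.
add_raid1_mirror() {
  local new_device="$1" mount="${2:-/}"
  run sudo btrfs device add "$new_device" "$mount"
  run sudo btrfs balance start -dconvert=raid1 -mconvert=raid1 "$mount"
}

# Single-drive alternative: keep two copies of data and metadata on one device.
convert_to_dup() {
  run sudo btrfs balance start -dconvert=dup -mconvert=dup "${1:-/}"
}
```

With `DRY_RUN=1` set, `add_raid1_mirror /dev/sdb /` only prints the two commands. A balance rewrites every chunk, so on a well-filled drive it can take a while; `sudo btrfs balance status /` shows its progress.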