Big Storage on #FreeBSD question: suppose you have an eight-lane LSI/Avago/Broadcom SAS card (mpr(4)) and a WD Ultrastar Data60 enclosure. If you connect both miniSAS connectors on the HBA to the same IOM on the enclosure, does the HBA actually treat it as a single eight-lane link, or is it functionally indistinguishable from connecting each four-lane cable to a different IOM? (We have a decent number of these; despite some peccadilloes, it's been our preferred JBOD enclosure for some time.)
-
Garrett Wollman (wollman@mastodon.social)'s status on Wednesday, 18-Jun-2025 13:03:02 JST Garrett Wollman
-
feld (feld@friedcheese.us)'s status on Wednesday, 18-Jun-2025 13:02:59 JST feld
@wollman I have an enclosure that could be multipathed, but I'm super lazy so I haven't. I may play with this as well ... will send you a message if anything comes of it
-
Garrett Wollman (wollman@mastodon.social)'s status on Wednesday, 18-Jun-2025 13:03:00 JST Garrett Wollman
This should then be enough information to balance utilization across all available PHYs. gmultipath(8) isn't smart enough to do this on its own; it needs to be told explicitly (and probably again at every reboot). Can plz hav "smart" multipathing?
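A minimal sketch of what "balancing" could mean here, assuming the topology has already been mapped. It only computes a plan: each multipath device picks whichever of its candidate paths currently carries the fewest devices. The slot names, provider names, and link labels are all hypothetical, and nothing in it talks to gmultipath(8), which would still have to be told the result explicitly.

```python
from collections import Counter

def plan_active_paths(candidates):
    """Greedy plan: for each device, pick the path on the least-loaded link.

    candidates maps a (hypothetical) multipath name to a list of
    (provider, link_id) pairs, one pair per available path.
    """
    load = Counter()
    plan = {}
    for name in sorted(candidates):
        # Ties are broken by link_id so the result is deterministic.
        provider, link = min(candidates[name], key=lambda p: (load[p[1]], p[1]))
        plan[name] = provider
        load[link] += 1
    return plan, load

# Hypothetical example: four disks, each reachable over link "A" or link "B".
example = {
    "slot0": [("da0", "A"), ("da60", "B")],
    "slot1": [("da1", "A"), ("da61", "B")],
    "slot2": [("da2", "A"), ("da62", "B")],
    "slot3": [("da3", "A"), ("da63", "B")],
}
plan, load = plan_active_paths(example)
print(plan)
print(dict(load))
```

A greedy pass like this treats every disk as equal load, which is an assumption; a real tool would probably want to weight links by lane count as well.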
-
Garrett Wollman (wollman@mastodon.social)'s status on Wednesday, 18-Jun-2025 13:03:00 JST Garrett Wollman
I spent a bit of time trying to write a simple Perl script to parse the output of `camcontrol smpphylist` and ran into a wall: there is no way to get the *local* WWPN of the expander. You can see all the WWPNs on the devices connected to it (which might include other expanders) but given two phylists from expanders that are physically connected to each other, there's no way to tell (except by heuristic matching) that 0x5000ccab054d0b3d and 0x5000ccab054d0b7d are in fact wired together.
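One possible form of the heuristic matching mentioned above, as a sketch only: treat two SAS addresses as likely belonging to the same box if they agree on everything except their low-order bits. The choice of 8 low bits is an assumption made for this illustration, not something the protocol or the vendor documentation guarantees, and it can produce false matches.

```python
def probably_same_box(a: int, b: int, low_bits: int = 8) -> bool:
    """Guess that two SAS addresses are related if only their low bits differ.

    low_bits is an assumed, tunable width, not a documented property.
    """
    return (a >> low_bits) == (b >> low_bits)

# The two addresses quoted in the post above.
x = 0x5000ccab054d0b3d
y = 0x5000ccab054d0b7d

print(probably_same_box(x, y))
print(hex(x ^ y))  # which bits actually differ
```

For this pair the addresses differ in a single bit, which is what makes a prefix match tempting, but it remains a guess about how the vendor allocates addresses.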
-
Garrett Wollman (wollman@mastodon.social)'s status on Wednesday, 18-Jun-2025 13:03:00 JST Garrett Wollman
That information *does* exist, in the protocol, at least for drives: `smartctl -l sasphy` on a drive will tell you the WWPN of the drive and of the expander it's attached to. But this is now getting a lot more involved than I expected it to be.
It would be great if someone smarter than me who actually knew the protocol could implement a tool that would explore the SAS topology and output it as a graph in a standard format.
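As a rough illustration of the output side of such a tool (not the hard part, which is discovering the links), here is a sketch that writes an undirected edge list as Graphviz DOT. DOT is just one candidate for "a standard format", and the edge list is supplied by hand here; the only edge shown is the pair of addresses from the earlier post.

```python
def to_dot(edges):
    """Render undirected (addr_a, addr_b) pairs of SAS addresses as DOT text."""
    # Normalize each pair so the same link seen from both ends appears once.
    unique = sorted({(min(a, b), max(a, b)) for a, b in edges})
    lines = ["graph sas {"]
    for a, b in unique:
        lines.append(f'    "{a:#018x}" -- "{b:#018x}";')
    lines.append("}")
    return "\n".join(lines)

edges = [
    (0x5000ccab054d0b3d, 0x5000ccab054d0b7d),
    (0x5000ccab054d0b7d, 0x5000ccab054d0b3d),  # same link, seen from the other end
]
print(to_dot(edges))
```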
-
Garrett Wollman (wollman@mastodon.social)'s status on Wednesday, 18-Jun-2025 13:03:01 JST Garrett Wollman
Ok, so I've figured out a bit more about how to figure out the topology of these things under #FreeBSD, using everyone's favorite utility, camcontrol(8)!
The key is that there are six SAS expanders in these boxes: each IOM has one on the host side and two on the drive side -- but the enclosure service processor is connected to the host-side expander. So, if you do `camcontrol smpphylist sesX` you'll see the SAS neighbors of all 48 ports of the host-side expander of that IOM.
-
Garrett Wollman (wollman@mastodon.social)'s status on Wednesday, 18-Jun-2025 13:03:01 JST Garrett Wollman
And if you look at the WWNs, you can clearly see ports 0-23 are connected to the host ports on the back panel -- right now the one I'm looking at only has four lanes wired. Then ports 24-43 are wired to the two drive-side expanders, ports 44-46 are the cross-connect to the other IOM, port 47 is unconnected, and port 48 is the service processor.
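The port layout described above can be written down as a small lookup, which is handy when annotating a phy list. This is a sketch that encodes only the ranges given in the post for this particular enclosure; the function name and role strings are made up for illustration.

```python
from collections import Counter

def phy_role(phy: int) -> str:
    """Map a host-side expander port number to its role, per the post above."""
    if 0 <= phy <= 23:
        return "host"
    if 24 <= phy <= 43:
        return "drive-side expander"
    if 44 <= phy <= 46:
        return "cross-connect"
    if phy == 47:
        return "unconnected"
    if phy == 48:
        return "service processor"
    raise ValueError(f"port {phy} is outside the range described")

# Tally how many ports fall into each role.
counts = Counter(phy_role(p) for p in range(49))
print(dict(counts))
```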
-
Garrett Wollman (wollman@mastodon.social)'s status on Wednesday, 18-Jun-2025 13:03:01 JST Garrett Wollman
`camcontrol smpphylist` conveniently gives the #FreeBSD device names associated with WWNs it knows about, so for example if you run that command against a drive, it will show you the device names of all the other drives on that same expander. If I look at, say, `da5` on one server, it tells me that it's on the expander that is only wired to nine drives, which happened to end up being assigned `da1` through `da10` except `da2`.
-
Garrett Wollman (wollman@mastodon.social)'s status on Wednesday, 18-Jun-2025 13:03:02 JST Garrett Wollman
I spent a while reading the manual the other night, particularly staring at the block diagram. Each IOM has 24 host-facing lanes, 20 drive-facing lanes, 3 lanes of cross-connect to the other IOM, and 1 lane for the enclosure service processor. The drive-facing lanes are then split evenly, 10 lanes each to 2 expanders -- one of which services only 9 drives and the other of which services the remaining 51. And they don't tell you which slots are the undersubscribed ones. WTF?
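To put numbers on how lopsided that is, here is the arithmetic from the paragraph above as a short sketch. All figures come from the post; "drives per lane" is used only as a crude oversubscription measure and ignores actual link and drive speeds.

```python
# Lane counts per IOM, as read off the block diagram in the post.
HOST_LANES = 24
DRIVE_LANES = 20
CROSS_LANES = 3
SEP_LANES = 1          # enclosure service processor

TOTAL_DRIVES = 60
SMALL_GROUP = 9                            # drives behind the lightly loaded expander
LARGE_GROUP = TOTAL_DRIVES - SMALL_GROUP   # drives behind the other one

total_lanes = HOST_LANES + DRIVE_LANES + CROSS_LANES + SEP_LANES
lanes_per_expander = DRIVE_LANES // 2      # split evenly between two expanders

print(total_lanes)                         # 48
print(SMALL_GROUP / lanes_per_expander)    # 0.9 drives per lane
print(LARGE_GROUP / lanes_per_expander)    # 5.1 drives per lane
```

So one drive-side expander has more lanes than drives, while the other carries a bit over five drives per lane.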
-
Garrett Wollman (wollman@mastodon.social)'s status on Wednesday, 18-Jun-2025 13:03:02 JST Garrett Wollman
My guess is that the Data60 shares its IOM design with a larger enclosure -- I think they have a 90-drive one? -- where both expanders are equally, or more nearly equally, oversubscribed. But anyway, this suggests that there's not much point in connecting more than 10 lanes per host.
This also raises the question of whether #FreeBSD can distinguish between the short path, HBA to IOM to disk, and the longer and lower-bandwidth path, HBA to IOM-A to IOM-B to disk.
-
Garrett Wollman (wollman@mastodon.social)'s status on Wednesday, 18-Jun-2025 15:16:11 JST Garrett Wollman
@feld We've always set up gmultipath just because it was the easiest way to label the drives by physical location. But having read up on the weird-ass way these expanders are set up, I definitely want to do more work to make sure we're not accessing the drives over a congested link!