When Snow Crash was written, the VR we had was MUDs and MOOs. People were drafting standards for describing 3D worlds, like VRML, but the people themselves were in text.
These text systems, when coupled with a text-to-speech converter, are accessible to blind people and potentially to illiterate people. They also require far less bandwidth for people on slow connections.
The physical realities of permacomputing are fantasy, but the design goal of longevity still requires some low-tech solutions. MOOs are suitably low tech.
But what about artists? MOOs allow writers to shine, but what about composers?
Music streaming demands relatively high throughput, but MIDI data doesn't. Around 2004, I wrote a MOO client in SuperCollider. Objects, including players, could optionally specify arrays of notes, based on the notation used by Nokia ringtones. With this system, I could hear when my friends had logged in, because it would play their leitmotif.
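To make the idea concrete, here is a minimal sketch in Python rather than the original SuperCollider client. It parses a melody written in a simplified Nokia-ringtone (RTTTL-style) notation into MIDI note numbers and durations, the kind of tiny payload a client could play back as a login leitmotif. The notation subset, the parse_melody and on_login names, and the LEITMOTIFS table are all assumptions for illustration, not the original code.

```python
# A minimal sketch, not the original SuperCollider client.
# Parses a simplified Nokia-style melody such as "8e5,8g5,4c6" into
# (MIDI note, duration-in-seconds) pairs that a client could play back
# when the matching player logs in. The notation subset and the
# on_login hook are hypothetical, chosen only to illustrate the idea.
import re

NOTE_OFFSETS = {"c": 0, "c#": 1, "d": 2, "d#": 3, "e": 4, "f": 5,
                "f#": 6, "g": 7, "g#": 8, "a": 9, "a#": 10, "b": 11}

def parse_melody(melody, bpm=120, default_duration=4, default_octave=5):
    """Turn '8e5,8g5,4c6,4p' into a list of (midi_note_or_None, seconds)."""
    beat = 60.0 / bpm  # length of one quarter note in seconds
    events = []
    for token in melody.lower().split(","):
        m = re.fullmatch(r"(\d*)([a-gp]#?)(\d*)", token.strip())
        if not m:
            raise ValueError(f"bad token: {token!r}")
        dur, pitch, octave = m.groups()
        length = 4.0 / int(dur or default_duration) * beat
        if pitch == "p":  # 'p' is a rest in ringtone notation
            events.append((None, length))
        else:
            octave = int(octave or default_octave)
            midi = 12 * (octave + 1) + NOTE_OFFSETS[pitch]
            events.append((midi, length))
    return events

# Hypothetical leitmotifs attached to player objects.
LEITMOTIFS = {"lem": "8e5,8g5,4c6", "wizard": "4c5,8p,4g4"}

def on_login(player, play_note):
    """Play a player's leitmotif, if any, via a play_note(midi, secs) callback."""
    melody = LEITMOTIFS.get(player)
    if not melody:
        return
    for midi, seconds in parse_melody(melody):
        if midi is not None:  # a real client would also honour rests as silence
            play_note(midi, seconds)

if __name__ == "__main__":
    on_login("lem", lambda midi, secs: print(f"note {midi} for {secs:.2f}s"))
```

The point of the format is the same as the ringtone one: a leitmotif is a few dozen bytes of text, so it costs almost nothing even on the slowest connection.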
How might a hybrid system work? What if objects were allowed small audio files or small image files? What if some rooms were all text, but others looked like hubs? How could this system retain accessibility while allowing expression across multiple modalities?
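One way to picture such a hybrid object is below: the text description stays the canonical, screen-reader-friendly form, and small optional attachments layer other modalities on top. This is a speculative sketch; the field names, the size cap, and the example rooms are all invented here to illustrate the questions above, not a description of any existing MOO.

```python
# A sketch of a hybrid object record: text remains the canonical,
# always-accessible form, while small optional attachments add other
# modalities. Field names and the size cap are invented for this sketch.
from dataclasses import dataclass
from typing import Optional

MAX_ATTACHMENT_BYTES = 32 * 1024  # keep attachments tiny for slow links

@dataclass
class MOOObject:
    name: str
    description: str                 # required: always readable and speakable
    leitmotif: Optional[str] = None  # ringtone-style note string
    audio: Optional[bytes] = None    # short sound clip, if any
    image: Optional[bytes] = None    # small picture, if any

    def __post_init__(self):
        for blob in (self.audio, self.image):
            if blob is not None and len(blob) > MAX_ATTACHMENT_BYTES:
                raise ValueError("attachment exceeds the size budget")

# A text-only room and a richer "hub" room both degrade gracefully to
# their descriptions for text and text-to-speech clients.
library = MOOObject("library", "Dusty shelves sag under old manuals.")
hub = MOOObject("hub", "A bright atrium buzzing with travellers.",
                leitmotif="8c5,8e5,8g5",
                image=b"\x89PNG...")  # placeholder bytes for a small image
```

Because every object still has to carry a text description, a text-only or text-to-speech client loses nothing; the extra media is strictly additive.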