> And promptly performance has just dropped by a magnitude or more.
This is false. It's just the same (false) claim repeated since the 80s, and I'm not sure it was true then.
In any case, processing data in bulk has moved on since then: even if you take two orders of magnitude off the single-stream performance, being able to reliably and cost-effectively split the data into chunks gets you better performance overall, because if you actually want performance, you will need to parallelize across cores within a machine and across multiple machines on the network. You can do this with plain text in a straightforward way; it's not quite so easy with a prefab data structure.
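A minimal sketch of the chunk-and-parallelize point, in shell (the file, the chunk size, and the field being summed are all made up):

```sh
split -l 1000000 big.tsv chunk.    # plain text: any line boundary is a valid split point
printf '%s\0' chunk.* | xargs -0 -P"$(nproc)" -n1 \
    awk -F'\t' '{ s += $5 } END { print s }' |
  awk '{ total += $1 } END { print total }'   # reduce the per-chunk sums
```

Swap the xargs for a loop over ssh and the same shape runs across machines instead of cores.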
HTTP is about as inefficient as a protocol gets: how many webservers' big bottleneck is dealing with the serialization of the requests and responses? That part's negligible: the performance focus is on the easiest way to poll a large number of file descriptors.
Massive ETL pipelines work like that, too. In fact, the "E" part is mostly concerned with turning it into a text stream so that the various "T" stages can be swapped in and out and very easily parallelized (or even fork in different directions so the pipeline is more like a tree). Even trivial stuff like the all-causes mortality database, right, it's just a gig and a half of fixed-length records. In the 1970s, computer time cost more money than programmer time and people still used plain text: there's got to be a reason, and because Unix hadn't eaten the world yet, dogma doesn't account for it. It's still more than fast enough.
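Fixed-length records are the easy case for plain text: you don't even parse, you slice. A sketch (the offsets are invented, not the real mortality-file layout):

```sh
awk '{
    cause = substr($0, 146, 4)    # ICD cause code at a hypothetical offset
    deaths[cause]++
} END {
    for (c in deaths) print deaths[c], c
}' mort.txt | sort -rn | head
```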
> Compiling various languages to Common Lisp (or some other host language) and running programs in its environment allows for safe near zero-cost IPC (through the use of object capabilities) with no serialization overhead.
> 9p has a lot of such overhead.
You put these sentences right next to each other.
You absolutely do have serialization overhead even if you are using Common Lisp to talk to Common Lisp. The cross-platform solution for this is the Protocol Buffers shit: really unpleasant, high programmer overhead, and near-zero gains for the machine. It doesn't get much faster than "bytestream" if you want performance (the ethernet chipset will hand it to the kernel, the kernel will place it into the buffer, and the next syscall will get that buffer either copied or remapped into the address space of the calling process). Plain text is pretty easy to convert to a bytestream, because the overhead is zero: plain text is a bytestream.
The illustration from the blog post is treating the log files as TSVs. I've got about a dozen awk scripts monitoring FSE's logs; if I fire up top right now to look at them, they are all using less CPU than top is. They're using less CPU than all of the "postgres: pleroma fse [local] idle" processes. The awk scripts do all of this string-mangling and real-time numerical analysis and print out all of these statistics, and I just leave them running in screen and never turn them off because they use no resources; they are all I/O-bound (and, since the logs are all being tailed anyway, the I/O is mostly reading from cache and writing to stdout), and not just I/O-bound but the processing overhead fits inside the roundoff error: that's the best-case scenario for this kind of software. If, instead of `tail -f`, they get fed the entire log file, the CPU spike is barely noticeable: they spend more time waiting for read() or write() than they do calculating a rolling average and stddev and then deciding if the current event they are observing is aberrant, and they spend more time doing that than parsing the input.
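Roughly the shape of those monitors, if you want it (the real scripts aren't public, and field 3 as the numeric column is an invention):

```sh
tail -f access.log | awk '{
    x = $3 + 0; n++               # field 3 as the measured value: assumed
    d = x - mean
    mean += d / n                 # Welford's running mean
    m2 += d * (x - mean)          # and running sum of squared deviations
    if (n > 30) {
        sd = sqrt(m2 / (n - 1))
        if (sd > 0 && x > mean + 3 * sd) printf "aberrant: %s\n", $0
    }
    fflush()                      # so alerts show up while tailing
}'
```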
I already was. I mean, this is the kind of shit I do all day every day, for money, for fun, for nothing.
Hamming again: "The purpose of computing is insight, not numbers."
There are one-off scripts you write to answer a question (bashing them into a REPL or actually putting them into a file, whichever), and then there are pipelines.
> It is a *lot* more common for serialization pipelines to become a bottleneck in local data processing, particularly when the storage backing is sufficiently fast to not be the bottleneck.
I have never seen serialization become the bottleneck in my life, except in the rare case where the only processing something is doing is translating between two different methods of serialization. The processing you're doing has to be trivial and serialization has to be incredibly expensive for serialization to be the bottleneck. I have not ever seen a case where serialization was slower than the disk or the network.
FediList is moving all of this data around (~100k reqs/hour) and a lot of it goes through *Ruby* and serialization/deserialization is still not the bottleneck.
> The point where you have to use a program instead of a pipeline to get something barely complex to complete within a reasonable amount of time comes pretty quickly.
I shove a 20GB log file through awk and don't have this problem. Fancy NVMe storage, ARM CPUs (and gcc still sucks for optimizing ARM), and *still* everything is I/O-bound.
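Easy to check for yourself: time the bare read against the read-plus-parse, and if the two numbers are close, the parsing is hiding inside the I/O (the file and the summed field are placeholders):

```sh
time cat big.log > /dev/null
time awk -F'\t' '{ s += $4 } END { print s }' big.log
```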
> Communication efficiency matters and serialized text pipes are not particularly efficient
You keep making these statements without qualifying or quantifying them, and they are broad, vague, useless.
> There is a reason many messaging libraries use shared memory as a transport when possible instead of local sockets.
There's a reason I usually dismiss this kind of statement out of hand.
There are a few reasons to use shared memory instead of pipes: the primary reason is superstition. Joe Armstrong noted that copying is faster than locking for most of the things he was doing, but that even when he benchmarked it, people would still tell him he was wrong. Superstition.
cmpxchg is faster than a syscall, sure, if you only look at it in isolation, but it depends on the shape of your workload. Odds are better that the syscall overhead is buried because either your pipeline is I/O-bound or one of the processes is more expensive than the others and the others are waiting with full buffers (and completely idle because they're not even scheduled, rather than busy-waiting polling some shared memory) while the expensive process grinds (and because it is eating the data and spitting it back out, it doesn't have to wait to acquire a lock, nor get involuntarily preempted so that the other processes can cmpxchg and nanosleep).
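If you suspect the pipe itself, measure it; the second dd prints the throughput on exit, and on an ordinary Linux box it comes out in GB/s, which is why the syscall overhead stays buried under real work:

```sh
dd if=/dev/zero bs=1M count=4096 | dd of=/dev/null bs=1M
```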
But, look, you are certain of it: build it and get back to me.
> Only if you have not designed the compiler and library to provide an interoperable type usable from both Common Lisp and the hosted language natively without needing further transformations.
The perpetual lament: "this machine was written to run C and Unix, Lisp would be faster if the CPU and the OS and the compiler were designed from the ground up to be faster for Lisp". It's always asserted and never backed up with a working CPU design. We've got FPGAs all over now, you can do this at home, implement type-tagging in the MMU, go nuts. Build it and get back to me.
But if we narrow it just to talk about a binary serialization scheme, then you're moving the overhead to the barriers where the data enters and exits the system (and that's very generously assuming that you've got zero overhead on things like "pointers" so just a memcopy and you can use the data-structure as-is; I'll believe in that serialization format when I see it).
@p@scathach@jeffcliff@sicp@Suiseiseki
> HTTP is about as inefficient as a protocol gets: how many webservers' big bottleneck is dealing with the serialization of the requests and responses? That part's negligible: the performance focus is on the easiest way to poll a large number of file descriptors.
Think of tools on one's own system instead. It is a *lot* more common for serialization pipelines to become a bottleneck in local data processing, particularly when the storage backing is sufficiently fast to not be the bottleneck.
The point where you have to use a program instead of a pipeline to get something barely complex to complete within a reasonable amount of time comes pretty quickly.
Communication efficiency matters and serialized text pipes are not particularly efficient (both from serialization and various Linux-related slowdowns). There is a reason many messaging libraries use shared memory as a transport when possible instead of local sockets.
> You absolutely do have serialization overhead even if you are using Common Lisp to talk to Common Lisp.
Only if you have not designed the compiler and library to provide an interoperable type usable from both Common Lisp and the hosted language natively without needing further transformations.
There is some impedance mismatch that may be unavoidable when the languages differ sufficiently, yes.
Observe how protobuf has high impedance mismatch with *everything* instead (it all needs binary serialization). That's not fixing the problem, because that isn't the problem it is intended to fix in the first place.
> (the ethernet chipset will hand it to the kernel, the kernel will place it into the buffer, and the next syscall will get that buffer either copied or remapped into the address space of the calling process)
Most affordable ethernet chipsets are not capable of RDMA. So yes, that is how it goes and that's what one is limited to as a result.
> Plain text is pretty easy to convert to a bytestream, because the overhead is zero: plain text is a bytestream.
Unless you're using stringly-typed languages on both ends, there's overhead converting it to and from its computation-usable form. Or are you exclusively processing strings?
Even so, with enough fields there is some overhead in needing to traverse the string for separators instead of having an indexed structure.
> using less CPU
You acknowledged that this can just as well be indicative of IO or syscall bottlenecking. And yes, the latter is a thing I've observed.
> they spend more time waiting for read() or write() than they do calculating a rolling average and stddev and then deciding if the current event they are observing is aberrant, and they spend more time doing that than parsing the input.
And if one is unsatisfied with the runtime, the optimization/refactoring targets to consider first would be the IO, the decision computation, and the parsing (in that order). The easiest to fix with money would most likely be the IO, and then the parsing, unless one did something wrong and trivially fixable in the decision-making.
Any slowdown including that initial delay is undesirable, but that initial one can't really be eliminated. That's no reason to keep the others.
> I shove a 20GB log file through awk and don't have this problem. Fancy NVMe storage, ARM CPUs (and gcc still sucks for optimizing ARM), and *still* everything is I/O-bound.
You're using a single awk program instead of a bunch of programs with pipes & buffers between each (and no real need for serialization either).
Obviously it'll work fine. Especially since awk was designed for the specific kind of text processing you're doing.
Split out every one of the tasks into a different program and pipe them, you'll get the kind of result I'm talking about.
> The perpetual lament: "this machine was written to run C and Unix, Lisp would be faster if the CPU and the OS and the compiler were designed from the ground up to be faster for Lisp". It's always asserted and never backed up with a working CPU design. We've got FPGAs all over now, you can do this at home, implement type-tagging in the MMU, go nuts. Build it and get back to me.
There's no need for that.
>> We've got half of those. I think you could assemble a plausible Lisp workstation if you build out the Lisp environment on top of the rest. That is, you do it like OpenStep/GNUStep, or like Arcan.
That provides the same kind of environment I'm suggesting.
Medley Interlisp would be another historical example of that overlay environment concept.
> Any slowdown including that initial delay is undesirable, but that initial one can't really be eliminated. That's no reason to keep the others.
If the goal is throughput, and if we continue assuming your premise holds, there is no point optimizing anything except the bottleneck while the bottleneck exists.
Lisp environments tend to prize power over speed, anyway. There's a threshold where conciseness or expressiveness or power is worth more than speed, or why would anyone use Lisp to begin with? And at some point, you start having to trade a disproportionate amount of power or expressiveness to gain just a little more speed: all the way at one end of the spectrum we have ASICs, where you get no power or expressiveness because their function is etched into the chip.
I haven't done any Project Euler problems in a while, but my usual approach is to read the problem, bash out a naive solution in a REPL, and then let that run while I get pencil and paper and come up with a real solution. Sometimes the first program is done before I've written the second one, sometimes the process of writing the second one makes it obvious that the first one is not going to finish within the lifetime of the universe.
But same quote from Hamming: "The purpose of computing is insight, not numbers." If you're doing things interactively, you don't look at just the runtime of the script: you add the time it took to write the script to the runtime of the script, and for the majority of the questions you have, the writing takes longer than the running.
So, a shell script. You have a lot of fast programs, you can use pipes to connect them together, you can use awk or a shell or some other scripting language to make little intermediary tweaks, and then you can just dump that into a file and you've taught the computer something, you've increased the shared vocabulary. That's as powerful as it gets: you teach the machine a word, you build concepts into higher-level concepts. It's not as fast as an ASIC but that hardly matters because the power is worth it, and the loose coupling means you can give the machine a definition of the same word in a faster language if you need to, but you rarely need to. You don't need to speak the protocols for Redis or Postgres because you can call `psql` or `redis-cli`: you can incorporate any other machine speaking any other protocol into your system. `curl` and `wget`. So now you've got unlimited conciseness and expressiveness, the entire internet is your computer, and the cost is you speak plain text: I'll take the trade-off, I like plain text anyway.
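Concretely, teaching the machine a word looks like this; the posts table and its columns are invented, the shape is the point:

```sh
#!/bin/sh
# topposters: hypothetical example. psql -A -t emits bare |-separated
# tuples, awk does the formatting; default to the top 10.
psql -A -t -c "
  SELECT author, count(*) FROM posts
  GROUP BY author ORDER BY count(*) DESC LIMIT ${1:-10};
" | awk -F'|' '{ printf "%6d  %s\n", $2, $1 }'
```

Drop that in ~/bin and `topposters` is a word now, composable with everything else.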
> You're using a single awk program instead of a bunch of programs with pipes & buffers between each (and no real need for serialization either).
That's not the case, and it's picking one sentence where it sounded like it might be the case, ignoring all of the other examples where it was definitely not.
Sometimes it's a single awk program. Usually it's not: off the top of my head, a pretty common pattern is `awk|sort|awk` where the first awk is a small transformation to make the data sortable and the second is for doing the order-dependent computation. Sometimes it is `grep|awk`, because the filter might be easier to write in grep or because the awk actually is expensive. Pretty frequently there is a zcat or something near the beginning of the pipeline. It could be `(zcat $list_of_files;tail -f $other_file)|grep|awk|grep`. You do `for box in $boxes; do ssh -C -q $box grep something /some/log/file; done | mawk`, you've got the filter distributed to the CPUs closest to the disks (as Armstrong noted, it's easier to send the data to the program than the other way around), it's sequential but easy to write, it's trivial to parallelize if it actually does end up useful to parallelize, either in the shell directly with redirects and `&` and `wait` or with xargs.
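The `awk|sort|awk` shape, concretely (assuming an Apache-style access log, path in field 7 and status in field 9):

```sh
awk '{ print $7 "\t" $9 }' access.log |   # first awk: make it sortable
  sort |                                  # group identical paths together
  awk -F'\t' '
    $1 != path { if (n) print n, path; path = $1; n = 0 }
    $2 >= 500  { n++ }                    # second awk: count 5xx per path
    END        { if (n) print n, path }' |
  sort -rn | head                         # worst offenders first
```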
Everything anyone does with a computer is filter, map, sort, reduce. You deal with a million records each on a dozen machines like it's nothing nowadays: it's just a gig, it fits in memory, answer in under a second.
> Especially since awk was designed for the specific kind of text processing you're doing.
That's not an accident. It's also not an accident that programs are expressed in plain text: we use words, the machine has been made to understand plain text, it's easy to read, write, generate. It's ridiculously portable: you can talk to a machine from the 1960s if you speak ASCII. Unconventional things become trivial when you can manipulate programs, I'm sure a Lisp fan knows that. (I mean, consider the implications of `xargs -0 -P$whatever -n1 sh -c`, or, you know, makefiles.) Programs are text and we have a wealth of tools for manipulating text.
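To spell out that xargs trick (the log directory is made up): eight parallel workers, one file each, the worker body itself a little program expressed as text.

```sh
find /var/log/app -name '*.gz' -print0 |
  xargs -0 -P8 -n1 sh -c 'zcat "$1" | grep -c ERROR' _   # _ fills argv[0]
```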
> Split out every one of the tasks into a different program and pipe them, you'll get the kind of result I'm talking about.
I'm interested in what workload makes it reasonable to write a pathological pipeline like that, which you also do on your desktop machine, and where performance matters.
> That provides the same kind of environment I'm suggesting.
Seeing an environment like that is delightful. It would delight me. Complaints about plain text and speculation that pipes are slow, that's a pain. But building it and getting results, that's delightful, everyone is delighted.
> Medley Interlisp would be another historical example of that overlay environment concept.
I think the distinction between an "overlay environment" and a "hosted environment" and a VM stopped mattering. Whatever I'm looking at on my screen, that's the interface; whatever it can talk to, that's the environment; and whatever machines it can reach are the system. People get hung up.
So a closed environment is a little stultifying, but you've probably got something really cool if you build your Lisp system such that it's good for talking to the rest of the world pretty easily (plain text, HTML over HTTP, JSON over HTTP, and that's 90% of anything anyone needs to deal with nowadays; let it read a pipe and translate it into whatever internal format you have in mind and you might have something really great).
@p@Suiseiseki@jeffcliff@scathach@lispi314
> but you've probably got something really cool if you build your Lisp system such that it's good for talking to the rest of the world pretty easily (plain text, HTML over HTTP, JSON over HTTP
JSON is pretty much the same as S-expressions, so it's pretty easy to just read them in as those and deal with them as you normally would. I've written websocket code for emacs that fires off event handlers, and it just applies it as normal function arguments.
> I've written websocket code for emacs that fires off event handlers, and it just applies it as normal function arguments.
Ha, wow.
This is kind of the thing I like about acme, you know, you don't have to write it in $language (acme doesn't even have one), you just write it in whatever and type `win`, so the shoutbox client, I can just run that in acme.
> One thing Acme did is get me to realize I hated syntax highlighting.
Oh, yeah, I got really excited about color in ls and in top but when I'm trying to read code, it fucks with me. My only concession to it was, when I was still using vim, TODO/FIXME/HACK/XXX were colored, and eventually I had it make comments bold.
(The bold comments were due to an incident with a bad codebase: the initial author had decided to write his own preprocessor so that he could do "literate JS" and typeset the codebase using LaTeX--all of that and several printouts of the codebase were stacked around the office, but the code didn't work--so I had it make comments bold, because the delimiters for the "literate" sections didn't jump off the screen and the bastard loved writing 20-page comments that also included alternative implementations, so you might not discover until it was too late that you were debugging the imagination-games version of the function and not the one that was actually being executed. I probably would have throttled the guy if he still worked there when I showed up.)
@p@Suiseiseki@jeffcliff@scathach@lispi314
> when I'm trying to read code, it fucks with me.
Same. Don't mind it for writing notes and stuff but when I'm dealing with code I don't like having all of the same words grab my attention.