huh, systemd-run is really neat, you can do stuff like:
$ systemd-run --user -S -p MemoryHigh=1000M -p MemoryMax=1100M
and get a shell inside which you can't use more than around 1G of RAM (but can still use swap).
Notices by Jann Horn (jann@infosec.exchange)
Jann Horn (jann@infosec.exchange)'s status on Thursday, 11-Sep-2025 04:19:48 JST Jann Horn
Jann Horn (jann@infosec.exchange)'s status on Monday, 25-Aug-2025 02:35:54 JST Jann Horn
@whitequark what would bypassing ELF loading mean? pretty much the only elf loading the kernel does for a static binary is to map its memory ranges into an address space and then run it starting at the entry point...
Jann Horn (jann@infosec.exchange)'s status on Monday, 25-Aug-2025 02:35:53 JST Jann Horn
@whitequark ah yes the vdso section is just a VMA with a custom page fault handler that inserts PTEs pointing to an in-kernel buffer on demand (and vvar is basically like that, too).
but ELF loading in the kernel isn't really all that complicated either, you basically go through an array of "please map this range to this location"...
Jann Horn (jann@infosec.exchange)'s status on Monday, 25-Aug-2025 02:27:54 JST Jann Horn
@whitequark why is the posix compatible application making direct system calls
Jann Horn (jann@infosec.exchange)'s status on Wednesday, 20-Aug-2025 07:51:46 JST Jann Horn
@vbabka also note that the "user" and "sys" times together are way lower than the "real" time
Jann Horn (jann@infosec.exchange)'s status on Wednesday, 20-Aug-2025 07:51:46 JST Jann Horn
@vbabka oh no it's much worse (in terms of wall clock time) than just mutex contention, and doing the join before the open_sockets() call in main() would help somewhat but not all that much (because the open_sockets() call in the other thread would still be slow). and it's not the network subsystem's fault 😆
Jann Horn (jann@infosec.exchange)'s status on Wednesday, 20-Aug-2025 07:51:45 JST Jann Horn
@vegard @vbabka that's not it. I'm pretty sure if these threads were runnable they would run, I tested this on a pretty much idle system
Jann Horn (jann@infosec.exchange)'s status on Wednesday, 20-Aug-2025 07:51:44 JST Jann Horn
@vbabka @vegard nope! here's another example with the same performance issue:
user@debian12:~/test$ cat slow2.c
#include <pthread.h>
#include <unistd.h>
#include <err.h>
#include <sys/socket.h>
static void open_fds(void) {
    for (int i = 0; i < 256; i++) {
        int fd = dup(0);
        if (fd == -1)
            err(1, "dup");
    }
}
static void *thread_fn(void *dummy) {
    open_fds();
    return NULL;
}
int main(void) {
    pthread_t thread;
    if (pthread_create(&thread, NULL, thread_fn, NULL))
        errx(1, "pthread_create");
    open_fds();
    if (pthread_join(thread, NULL))
        errx(1, "pthread_join");
    return 0;
}
user@debian12:~/test$ gcc -O2 -o slow2 slow2.c -Wall
user@debian12:~/test$ time ./slow2

real    0m0.048s
user    0m0.001s
sys     0m0.000s
user@debian12:~/test$
Jann Horn (jann@infosec.exchange)'s status on Tuesday, 19-Aug-2025 17:41:09 JST Jann Horn
Jann Horn (jann@infosec.exchange)'s status on Tuesday, 19-Aug-2025 17:39:32 JST Jann Horn
@rgo closing the sockets would be one way to avoid the performance hit, yes; but can you also avoid the performance hit while opening that many sockets and keeping them open?
(sorry, I guess it's not a great example)
Jann Horn (jann@infosec.exchange)'s status on Tuesday, 19-Aug-2025 17:38:42 JST Jann Horn
Linux kernel quiz: Why is this program so slow, taking around 50ms to run? What line do you have to add to make it run in ~3ms instead, without interfering with what this program does?
user@debian12:~/test$ cat > slow.c
#include <pthread.h>
#include <unistd.h>
#include <err.h>
#include <sys/socket.h>
static void open_sockets(void) {
    for (int i = 0; i < 256; i++) {
        int sock = socket(AF_INET, SOCK_STREAM, 0);
        if (sock == -1)
            err(1, "socket");
    }
}
static void *thread_fn(void *dummy) {
    open_sockets();
    return NULL;
}
int main(void) {
    pthread_t thread;
    if (pthread_create(&thread, NULL, thread_fn, NULL))
        errx(1, "pthread_create");
    open_sockets();
    if (pthread_join(thread, NULL))
        errx(1, "pthread_join");
    return 0;
}
user@debian12:~/test$ gcc -O2 -o slow slow.c -Wall
user@debian12:~/test$ time ./slow

real    0m0.041s
user    0m0.003s
sys     0m0.000s
user@debian12:~/test$ time ./slow

real    0m0.053s
user    0m0.003s
sys     0m0.000s
user@debian12:~/test$
Jann Horn (jann@infosec.exchange)'s status on Thursday, 14-Aug-2025 04:24:16 JST Jann Horn
@jmorris @brauner Christian is joking about how I only learned about this feature because I looked at a patch that intended to use MSG_OOB as part of the new core dumping mechanism
Jann Horn (jann@infosec.exchange)'s status on Tuesday, 12-Aug-2025 16:10:49 JST Jann Horn
I landed an LLVM change today that plumbs LLVM's existing !heapallocsite metadata into DWARF: https://github.com/llvm/llvm-project/commit/3f0c180ca07faf536d2ae0d69ec044fcd5a78716
This associates allocator call sites (in particular calls to C++ new) with DWARF type information; see the corresponding DWARF standard enhancement proposal, which has landed in the DWARF 6 Working Draft, and Microsoft's prior work that this is based on.
If you have C++ code that allocates heap objects with operator new and use a memory allocator that records the addresses from which it is called, this can be used by debugging/profiling tools to determine the types of heap allocations at runtime.
(LLVM does not yet support this for C-style malloc() calls, though.)
Jann Horn (jann@infosec.exchange)'s status on Tuesday, 12-Aug-2025 15:51:10 JST Jann Horn
I found a Linux kernel security bug (in AF_UNIX) and decided to write a kernel exploit for it that can go straight from "attacker can run arbitrary native code in a seccomp-sandboxed Chrome renderer" to kernel compromise:
https://googleprojectzero.blogspot.com/2025/08/from-chrome-renderer-code-exec-to-kernel.html
This post includes fun things like:
- a nice semi-arbitrary read primitive combined with an annoying write primitive
- slowing down usercopy without FUSE or userfaultfd
- CONFIG_RANDOMIZE_KSTACK_OFFSET as an exploitation aid
- a rarely-used kernel feature that Chrome doesn't need but is reachable in the Chrome sandbox
- sched_getcpu() usable inside Chrome renderers despite getcpu being blocked by seccomp (thanks to vDSO)
Jann Horn (jann@infosec.exchange)'s status on Tuesday, 12-Aug-2025 15:51:09 JST Jann Horn
@brauner new feature work is one of the best ways to find bugs in existing code, I think :D
Jann Horn (jann@infosec.exchange)'s status on Thursday, 03-Jul-2025 08:49:50 JST Jann Horn
@dysfun @whitequark why is this a problem? as long as you have virtual memory, you pay something like up to 4093 bytes of data memory more than you need, plus some inefficiency of TLBs and page tables?
Jann Horn (jann@infosec.exchange)'s status on Thursday, 03-Jul-2025 07:49:55 JST Jann Horn
@whitequark compiler function attribute that teaches the compiler to lazily copy the stack after setjmp() has been called: you basically have one active stack pointer and one stack pointer for saving old stack contents, and every callee of such a function, immediately after the call returns, backs up the current state of the now-active frame into memory referenced by the setjmp buffer
Jann Horn (jann@infosec.exchange)'s status on Friday, 02-May-2025 02:13:43 JST Jann Horn
Pretty far up the list of things I find terrible about Linux development is how emailed patches often come with no machine-readable indication of which commit they should apply to (one that is actually available in git) - so it takes some manual effort to figure out how to apply them locally so that I can look at the entire codebase with the patches applied.
Looking at a complex patch series with just 3 lines of context would be a really bad idea...
Jann Horn (jann@infosec.exchange)'s status on Thursday, 01-May-2025 21:27:23 JST Jann Horn
@whitequark is the GPU part of a CPU that enforces a combined TDP limit or are those things completely separate?
Jann Horn (jann@infosec.exchange)'s status on Thursday, 01-May-2025 21:23:37 JST Jann Horn
@whitequark does one of the two implementations involve more thread switches, where you have lots of "thread A wakes thread B, then thread A goes to sleep"? AFAIK the kernel already can't handle those particularly well (https://youtu.be/KXuZi9aeGTw?t=611), and I imagine adding more noise to the scheduler could make things worse?