26:45 - I actually prefer the cute term "funclet" for these, but it was observed during rehearsals (quite correctly) that the code I wrote in the demangler calls them "partial functions", and since presumably people have been seeing that term in stack traces for the last four years, perhaps we ought to take this opportunity to explain to people what it actually means.
34:00 - The initial slab of a task tends to be a bit smaller than this because the task object is colocated with it. And subsequent slabs can be larger if the task needs to allocate more than the default slab size at once. The current task allocator does not hand memory back to the system until the task is complete, so reaching a high-water mark and then suspending is somewhat punished; probably we ought to at least allow unused slabs to be reclaimed if memory is running low.
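A rough C sketch of the slab behavior described here, not the Swift runtime's actual allocator: the slab size, the header size, and every name below are hypothetical. It only models three points: the first slab has less usable space because the task object shares it, an oversized request gets an oversized slab, and nothing goes back to the system until teardown.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sizes, chosen only for illustration. */
enum { SLAB_SIZE = 1024, TASK_HEADER_SIZE = 128 };

typedef struct Slab {
    struct Slab *next;      /* previously allocated slab */
    size_t capacity;        /* usable bytes in storage[] */
    size_t used;            /* bytes handed out so far */
    max_align_t storage[];  /* payload, aligned for any object */
} Slab;

typedef struct {
    Slab *current;          /* slab that new allocations come from */
    size_t slab_count;
} TaskAllocator;

/* Round a request up to the maximum fundamental alignment. */
static size_t round_up(size_t n) {
    size_t a = _Alignof(max_align_t);
    return (n + a - 1) / a * a;
}

static Slab *slab_create(size_t capacity, Slab *next) {
    Slab *s = malloc(sizeof(Slab) + capacity);
    if (s != NULL) {
        s->next = next;
        s->capacity = capacity;
        s->used = 0;
    }
    return s;
}

/* The first slab is smaller: the task object is assumed to share it. */
static int task_allocator_init(TaskAllocator *a) {
    a->current = slab_create(SLAB_SIZE - TASK_HEADER_SIZE, NULL);
    a->slab_count = (a->current != NULL) ? 1 : 0;
    return a->current != NULL;
}

static void *task_alloc(TaskAllocator *a, size_t size) {
    size = round_up(size);
    Slab *s = a->current;
    if (s->capacity - s->used < size) {
        /* A request larger than the default gets a larger slab. */
        size_t capacity = size > SLAB_SIZE ? size : SLAB_SIZE;
        s = slab_create(capacity, a->current);
        if (s == NULL) return NULL;
        a->current = s;
        a->slab_count++;
    }
    void *p = (char *)s->storage + s->used;
    s->used += size;
    return p;
}

/* Memory returns to the system only here, when the task is done. */
static void task_allocator_destroy(TaskAllocator *a) {
    while (a->current != NULL) {
        Slab *next = a->current->next;
        free(a->current);
        a->current = next;
    }
    a->slab_count = 0;
}
```

The sketch leaves out individual deallocation entirely, so it cannot show slabs being reused; it is only meant to make the high-water-mark problem visible.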
30:00 - Swift does a lot of extra optimizations and analysis with local vars to try to make this little bit of semantics zero-cost when you don't need it. For example, I don't talk about dynamic exclusivity enforcement in this talk, but we potentially have to use it with local vars if they escape, and yet we really don't want to pay for it if they don't. We also recognize when a reference capture can be a value capture because the value isn't modified after the capture point.
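To make the last point concrete, here is a hedged C sketch of the two closure-context shapes; the type and function names are hypothetical and this is not the compiler's real representation. A reference capture stores a pointer to the variable's box, while a value capture stores a copy. The two only behave the same if nothing writes to the variable after the capture point, which is the condition the optimization has to prove.

```c
/* Hypothetical box for a captured local variable. */
typedef struct { int value; } IntBox;

/* Reference capture: the context points at the variable's storage. */
typedef struct { IntBox *counter; } ByRefContext;

/* Value capture: the context holds its own copy. */
typedef struct { int counter; } ByValueContext;

static int read_by_ref(const ByRefContext *c) { return c->counter->value; }
static int read_by_value(const ByValueContext *c) { return c->counter; }
```

If the variable is modified after the closure is formed, `read_by_ref` sees the new value and `read_by_value` does not, so the rewrite is only legal when no such modification exists.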
32:00 - Protocol types benefit from the same class-bound optimization as generic layout does; if we have `protocol DataModel : AnyObject`, then the struct would be more like:
That is, we know that the representation is a single class reference, and we know that we can recover its dynamic type from that class reference and don't need to store it separately.
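As a rough illustration of the difference, here is a C sketch of the two layouts, assuming `DataModel` needs exactly one witness table. The struct and field names are hypothetical; the general shape (a three-word inline value buffer, a type metadata pointer, then witness table pointers) follows the documented layout of opaque existential containers.

```c
#include <stddef.h>

/* Opaque runtime types; only pointers to them are needed here. */
typedef struct TypeMetadata TypeMetadata;
typedef struct WitnessTable WitnessTable;
typedef struct HeapObject HeapObject;

/* `any DataModel` with no class bound: inline buffer + type + witnesses. */
typedef struct {
    void *value_buffer[3];
    const TypeMetadata *type;
    const WitnessTable *data_model_witnesses;
} AnyDataModel;

/* With `protocol DataModel : AnyObject`: the object reference stands in
   for both the value buffer and the separately stored dynamic type. */
typedef struct {
    HeapObject *object;
    const WitnessTable *data_model_witnesses;
} AnyClassBoundDataModel;
```

On a 64-bit target that is five pointers (40 bytes) against two pointers (16 bytes).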
31:15 - The witness functions are also passed a pointer to the witness table, which enables default implementations to be placed directly in the witness table without being specialized. (This is required in some cases with library evolution.) But that ended up being too busy for such a minor detail on a slide.
Of course, in actual C code, they also require calling-convention attributes like the closure example above.
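A minimal C sketch of why passing the witness table matters, using a hypothetical `Shape` protocol with one requirement and one defaulted requirement. The names and the plain C calling convention are assumptions for illustration; real Swift witness functions use Swift's own convention. The default implementation is written once and never specialized: it reaches the conforming type's other witnesses through the table it is handed.

```c
#include <stddef.h>

typedef struct TypeMetadata TypeMetadata;          /* opaque */
typedef struct ShapeWitnessTable ShapeWitnessTable;

/* Every witness receives self, the Self type, and the witness table. */
struct ShapeWitnessTable {
    double (*area)(const void *self, const TypeMetadata *Self,
                   const ShapeWitnessTable *wt);
    double (*doubled_area)(const void *self, const TypeMetadata *Self,
                           const ShapeWitnessTable *wt);
};

/* Unspecialized default: works for any conformance via the table. */
static double shape_default_doubled_area(const void *self,
                                         const TypeMetadata *Self,
                                         const ShapeWitnessTable *wt) {
    return 2.0 * wt->area(self, Self, wt);
}

/* A hypothetical conforming type that only implements `area`. */
typedef struct { double side; } Square;

static double square_area(const void *self, const TypeMetadata *Self,
                          const ShapeWitnessTable *wt) {
    (void)Self;
    (void)wt;
    const Square *sq = self;
    return sq->side * sq->side;
}

/* The default goes straight into the table next to the real witness. */
static const ShapeWitnessTable square_shape_witnesses = {
    square_area,
    shape_default_doubled_area,
};
```

Calling `square_shape_witnesses.doubled_area(&s, NULL, &square_shape_witnesses)` dispatches to the shared default, which then calls `square_area` through the table.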
Overall, I'm quite happy with the talk, and I think I did a good job covering some fairly advanced topics in a fairly approachable way. I had a couple different audiences in mind here: both experienced Swift programmers and longtime C programmers who were relatively new to Swift. I hope it's useful to folks.
Note that, if the protocol type only has ObjC protocol conformances (which all imply a class bound), there are no witness table pointers, and so this devolves to simply the class reference. We rely on this in bridging to ObjC, but it also just falls out of the layout rules.
0:30 - The C blocks extension is probably the most significant exception to this — __block variables can get allocated on the heap. But even that is largely only triggered by function calls in C and non-ARC ObjC, although we're currently reviewing a patch to fix a self-reference-during-initialization bug that will require eager heap allocation.
Hey everybody get a load of this guy with crazy eyebrows talking about the implementation of a bunch of Swift language features: https://developer.apple.com/wwdc24/10217
3:45 - I wish I'd had more time to talk about optimization throughout this talk, but it would've made it even drier and denser. I'll try to elaborate on this stuff in these annotations.
2:30 - I really do want to emphasize that you should always start looking at performance with a top-down, high-level investigation. A lot of people think microbenchmarks are meaningful! They often really aren't! Even when they mean something, it's usually not what the author was trying to evaluate.
6:10 - I technically do say this right in the talk, but I feel a little guilty about how I'm conflating two slightly different points here. Typically, static dispatch is necessary in order to do interprocedural optimization (although it is possible to do speculative compile-time IPO). Static dispatch is not *sufficient*, though: if the compiler can't see the definition of the function it's calling, IPO is still impossible.
5:25 - This applies to a certain extent even when the argument has to be passed on the stack: processors have to retire memory accesses in their correct order for the memory model, but they often can still do out-of-order forwarding through memory, especially for stack-based patterns. Modern processors really are incredible; I could talk about this stuff for hours.
7:40 - In this assembly snippet, you can also see the code that sets up the frame header by saving and restoring the caller's frame pointer (x29) and the return address (x30). These are 8-byte registers; 208 - 2*8 == 192, so you can see that these go at the very top of the frame.
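The arithmetic can be checked with a small C sketch that models the 208-byte frame as a struct. The struct and field names are hypothetical, and treating everything below the header as one byte array is a simplification.

```c
#include <stddef.h>
#include <stdint.h>

enum { FRAME_SIZE = 208 };

/* Offsets are relative to SP after the frame has been allocated. */
typedef struct {
    unsigned char locals[FRAME_SIZE - 2 * sizeof(uint64_t)]; /* 192 bytes */
    uint64_t saved_fp;  /* caller's x29, at offset 192 */
    uint64_t saved_lr;  /* return address x30, at offset 200 */
} Frame;
```

Here `offsetof(Frame, saved_fp)` is 192 and `sizeof(Frame)` is 208, matching the numbers in the assembly.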
The most important and obvious form of IPO is inlining, which exists in some form in basically every programming language. Swift also does generic specialization (either implicitly as part of inlining or as a separate operation, as shown later in the talk) and ownership-convention optimization, as well as a few other things like specializing the callee for constant arguments (including constant function arguments for higher-order functions).
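A small C sketch of the constant-function-argument case; the function names are hypothetical, and the "after" version is written by hand to show what an optimizer could produce, not what any particular compiler emits. Because the caller passes a known function and the definitions are visible, the higher-order call can be specialized and then inlined away.

```c
/* A higher-order function: calls f indirectly, twice. */
static int apply_twice(int (*f)(int), int x) {
    return f(f(x));
}

static int add_three(int x) {
    return x + 3;
}

/* As written: the function argument is a constant at this call site. */
int caller(int x) {
    return apply_twice(add_three, x);
}

/* Roughly what remains after specializing apply_twice for add_three
   and inlining both calls. */
int caller_after_ipo(int x) {
    return x + 6;
}
```

If `add_three` were only declared here and defined in another translation unit, the calls would still be statically dispatched, but the second form could not be derived.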
The compiler doesn't have to pop the frame with an add; if the function allocates dynamic amounts of memory, the compiler will generally save SP and restore it afterwards. But I believe the sub/add pattern is micro-architecturally favored for things like the OOO forwarding I mention above, so the compiler emits it when it can.
8:00 - This is slightly a lie in that the compiler doesn't always put separate variables in non-overlapping locations within the frame, the way a Swift or C struct would. Compilers do stack-slot coloring to reuse stack positions for variables whose lifetimes don't overlap. Values can also move within the stack if the compiler thinks that's a good idea. But understanding it as a C struct is a reasonable first approximation.
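One way to picture stack-slot coloring while staying inside the struct analogy is a C union. This is a sketch with hypothetical names, assuming `b` and `c` are never live at the same time; a real compiler assigns slots during code generation and does not build anything like this type.

```c
/* First approximation: every variable gets its own slot. */
typedef struct {
    long a;
    long b;
    long c;
} NaiveFrame;

/* After coloring: b and c have disjoint lifetimes, so they share a slot. */
typedef struct {
    long a;
    union {
        long b;  /* live early in the function */
        long c;  /* live only after b is dead */
    } shared;
} ColoredFrame;
```

The colored frame is one slot smaller, which is the whole point of the optimization.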
11:55 - Apple hardware is quite ARM64-dominant, so that's what I'm using as a base assumption throughout this talk, but of course the size and alignment of this type are target-dependent.
Language designer and implementer at Apple: Swift, Objective-C, C++, C. Erstwhile editor of the Itanium C++ ABI. Focus on security mitigations and low-level language implementation.