Hey everybody, get a load of this guy with crazy eyebrows talking about the implementation of a bunch of Swift language features: https://developer.apple.com/wwdc24/10217
We've been kicking around an idea to add a similar `indirect` feature for structs, so that you can very easily adopt copy-on-write on some or all of a struct's members. If we had that already, it would've made a really nice narrative for the upcoming section about getting value semantics with out-of-line storage! Alas.
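For anyone who hasn't written it by hand, here's a minimal sketch of the copy-on-write boilerplate that such a feature could automate (all of these names are illustrative):

```swift
// Hand-rolled copy-on-write: the struct's contents live out-of-line in a
// class instance, and mutations copy that storage only when it's shared.
final class Storage {
    var values: [Double] = []
}

struct Model {
    private var storage = Storage()

    var values: [Double] {
        get { storage.values }
        set {
            // Copy the heap storage only if another Model value shares it.
            if !isKnownUniquelyReferenced(&storage) {
                let copy = Storage()
                copy.values = storage.values
                storage = copy
            }
            storage.values = newValue
        }
    }
}
```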
@rjmccall I’ve always had the intuitive feeling that indirect is a bit of a hack, and should either not be present or be generalized. This idea is intriguing!
13:05 - This is a *huge* simplification to the point of being a lie. Array's single stored property is another struct, which itself has another struct as its single stored property. Eventually, on Apple platforms, that bottoms out as a Builtin.BridgeObject, which is essentially an optimized sum of either an ObjC or a Swift object reference. So it is a pointer that's reference-counted at the end of the day, but there's a lot of extra stuff to it that I had to leave out.
13:32 - I had to cut this for time, but of course indirect enum cases don't use inline storage. Instead, indirect case payloads are stored in a case-specific heap allocation. Swift doesn't currently support mutating enum cases, but when we do, we expect that the language will implicitly use copy-on-write on these.
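For reference, per-case `indirect` is what makes directly recursive enums representable at all:

```swift
enum Tree {
    case leaf(Int)
    // The payload of an indirect case is stored out-of-line in its own
    // heap allocation rather than inline in the enum's representation.
    indirect case node(Tree, Tree)
}
```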
11:20 - There's a whole theory waiting to be written about representations of values, but you should not try to find out more by googling "representation theory"
11:55 - Apple hardware is quite ARM64-dominant, so that's what I'm using as a base assumption throughout this talk, but of course the size and alignment of this type are target-dependent.
8:00 - This is slightly a lie in that the compiler doesn't always put separate variables in non-overlapping locations within the frame, the way a Swift or C struct would. Compilers do stack-slot coloring to reuse stack positions for variables whose lifetimes don't overlap. Values can also move within the stack if the compiler thinks that's a good idea. But understanding it as a C struct is a reasonable first approximation.
The compiler doesn't have to pop the frame with an add — if the function allocates dynamic amounts of memory, the compiler will generally save SP on entry and restore it on exit instead. But I believe the sub/add pattern is micro-architecturally favored for things like the out-of-order forwarding I mention above, so the compiler emits it when it can.
The most important and obvious form of IPO is inlining, which exists in some form in basically every programming language. Swift also does generic specialization (either implicitly as part of inlining or as a separate operation, as shown later in the talk) and ownership-convention optimization, as well as a few other things like specializing the callee for constant arguments (including constant function arguments for higher-order functions).
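As a quick illustration of generic specialization (a sketch, not a promise about what the optimizer does with this exact code):

```swift
func sum<T: AdditiveArithmetic>(_ values: [T]) -> T {
    var total = T.zero
    for value in values {
        total += value
    }
    return total
}

// If the optimizer can see both this call and the definition of sum, it
// can emit a specialized sum<Int> that adds machine integers directly
// instead of working through T's protocol witness table.
let total = sum([1, 2, 3])
```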
7:40 - In this assembly snippet, you can also see the code that sets up the frame header by saving and restoring the caller's frame pointer (x29) and the return address (x30). These are 8-byte registers; 208 - 2*8 == 192, so you can see that these go at the very top of the frame.
5:25 - This applies to a certain extent even when the argument has to be passed on the stack — processors have to retire memory accesses in their correct order for the memory model, but they can often still do out-of-order forwarding through memory, especially for stack-based patterns. Modern processors really are incredible; I could talk about this stuff for hours.
6:10 - I technically do say this right in the talk, but I feel a little guilty about how I'm conflating two slightly different points here. Typically, static dispatch is necessary in order to do interprocedural optimization (although it is possible to do speculative compile-time IPO). Static dispatch is not *sufficient*, though — if the compiler can't see the definition of the function it's calling, IPO is still impossible.
2:30 - I really do want to emphasize that you should always start looking at performance with a top-down, high-level investigation. A lot of people think microbenchmarks are meaningful! They often really aren't! Even when they mean something, it's usually not what the author was trying to evaluate.
3:45 - I wish I'd had more time to talk about optimization throughout this talk, but it would've made it even drier and denser. I'll try to elaborate on this stuff in these annotations.
0:30 - The C blocks extension is probably the most significant exception to this — __block variables can get allocated on the heap. But even that is largely only triggered by function calls in C and non-ARC ObjC, although we're currently reviewing a patch to fix a self-reference-during-initialization bug that will require eager heap allocation.
@rjmccall Thanks for the talk! I think I •finally• understand what you all mean when you talk about async functions having “stack discipline.” In the past, I’d imagined that meant that variables whose lifetimes span suspension points somehow lived in the normal C stack, which seemed…nonsensical. But this “heap allocation but with stack-shaped guarantees” thing makes good sense! Nice talk.
Note that, if the protocol type only has ObjC protocol conformances (which all imply a class bound), there are no witness table pointers, and so this devolves to simply the class reference. We rely on this in bridging to ObjC, but it also just falls out of the layout rules.
Overall, I'm quite happy with the talk, and I think I did a good job covering some fairly advanced topics in a fairly approachable way. I had a couple different audiences in mind here: both experienced Swift programmers and longtime C programmers who were relatively new to Swift. I hope it's useful to folks.
31:15 - The witness functions are also passed a pointer to the witness table, which enables default implementations to be placed directly in the witness table without being specialized. (This is required in some cases with library evolution.) But that ended up being too busy for such a minor detail on a slide.
Of course, in actual C code, they also require calling-convention attributes like the closure example above.
32:00 - Protocol types benefit from the same class-bound optimization as generic layout does; if we have `protocol DataModel : AnyObject`, then the struct would be more like:
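```c
// Reconstructed sketch in the style of the other C examples in the talk;
// the names are illustrative.
struct AnyDataModel {
  HeapObject *object;                  // the value is just a class reference
  DataModelWitnessTable *witnessTable; // the conformance to DataModel
};
```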
That is, we know that the representation is a single class reference, and we know that we can recover its dynamic type from that class reference and don't need to store it separately.
30:00 - Swift does a lot of extra optimizations and analysis with local vars to try to make this little bit of semantics zero-cost when you don't need it. For example, I don't talk about dynamic exclusivity enforcement in this talk, but we potentially have to use it with local vars if they escape, and yet we really don't want to pay for it if they don't. We also recognize when a reference capture can be a value capture because the value isn't modified after the capture point.
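An illustrative sketch of that last point:

```swift
func makeLabelProvider() -> () -> String {
    var count = 0
    count += 10
    // `count` is a var, so this is nominally a reference capture, but it
    // is never modified after the capture point, so Swift can capture it
    // by value and avoid heap-allocating a box for it.
    let provider = { "count: \(count)" }
    return provider
}
```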
34:00 - The initial slab of a task tends to be a bit smaller than this because the task object is colocated with it. And subsequent slabs can be larger if the task needs to allocate more than the default slab size at once. The current task allocator does not hand memory back to the system until the task is complete, so reaching a high-water mark and then suspending is somewhat punished; probably we ought to at least allow unused slabs to be reclaimed if memory is running low.
26:45 - I actually prefer the cute term "funclet" for these, but it was observed during rehearsals (quite correctly) that the code I wrote in the demangler calls them "partial functions", and since presumably people have been seeing that term in stack traces for the last four years, perhaps we ought to take this opportunity to explain to people what it actually means.
25:15 - In practice, most async functions will also have additional implicit potential suspension points related to scheduling: whenever you enter the function (either in the prologue or after returning from a call), the function will potentially suspend in order to make sure it's running on the right executor. The optimizer will remove these suspensions if the function does nothing of significance before it reaches a different suspension point, such as a call or return.
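A sketch of where those implicit suspension points can show up (`fetchCount` and `render` are hypothetical helpers):

```swift
func fetchCount() async -> Int { 42 }    // hypothetical helper
@MainActor func render(_ count: Int) {}  // hypothetical helper

@MainActor
func refresh() async {
    // Entering refresh() can suspend to hop onto the main actor.
    let count = await fetchCount()  // explicit potential suspension point
    // After fetchCount() returns, refresh() can suspend again to hop
    // back onto the main actor before running this line.
    render(count)
}
```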
22:00 - The fact that Swift already supports the complexity of dynamically-sized types for all these abstraction reasons actually means we're also well set up to support them for other reasons. For example, people have been talking about how to support fixed-size arrays in Swift for a while; if we add that feature, I think we could relatively easily go further and support non-constant bounds, and we wouldn't have to restrict where they appear the way that e.g. C99 does with VLAs.
22:50 - As mentioned above, the special case here is that the first stored property of a struct always has an offset of zero, even if it's dynamically-sized.
There's another special case we *could* do: because Swift caps type alignment to 16 bytes, if the previous stored property happens to end at an offset that's a multiple of 16, we ought to know that the next property always starts there without any alignment padding. But I don't believe we currently take advantage of this.
Fundamentally, Swift is making a usability decision with a performance trade-off — by default, we assume it's better to implicitly copy something than to force the programmer to prove that it isn't simultaneously accessed. Again, you can certainly argue that that's the wrong decision to make.
I think Swift's decision here is the right one for most code, but not having some of these features in place already does make the story feel a little incomplete.
17:45 - `print` is actually a really bad example here. The current `print` cannot take a value like this without copying it because it actually takes arguments of type `Any`, and constructing an `Any` requires consuming a value representation. This is one of several tragic things about the current definition of `print` that we'd like to fix.
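Concretely (the signature here is simplified, but this is the gist):

```swift
// print is declared roughly as:
//   func print(_ items: Any..., separator: String = " ", terminator: String = "\n")
let numbers = [1, 2, 3]
// Wrapping `numbers` in an `Any` consumes a value representation, so the
// call has to copy (retain) the array even though print only reads it.
print(numbers)
```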
18:25 - The most obvious missing language feature here is a `borrow` operator, which I expect to be a relatively straightforward addition to the language. But we also ought to be able to make stronger guarantees about automatically borrowing in many more situations, like when you pass the value of a local variable that definitely is not being simultaneously mutated or consumed.
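For the current state of things: SE-0377's `borrowing` parameter modifier already lets you request this convention at a function boundary; the missing piece is an expression-level operator. A sketch of the existing modifier:

```swift
// With `borrowing`, the caller lends `values` to the function: no copy
// or consume happens at the call site. (For copyable types the compiler
// can still insert copies inside the function where it needs them.)
func sumOfSquares(_ values: borrowing [Int]) -> Int {
    var total = 0
    for value in values {
        total += value * value
    }
    return total
}
```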
When the property is actually implemented as a stored property, this coroutine simply directly yields the normal storage, and so no copy is required. Only when the property is implemented as a computed property does the coroutine use the getter-temporary-setter pattern.
Swift does have to use getters and setters directly for mutations of properties that it has to access through an ObjC-style interface, since that interface does not include this `_modify` coroutine.
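You can see the shape of this coroutine today in the underscored (and not officially supported) `_modify` accessor, which the standard library uses heavily. An illustrative sketch:

```swift
struct Buffer {
    private var elements: [Int] = []

    var values: [Int] {
        get { elements }
        _modify {
            // Yield direct access to the storage; the mutation happens in
            // place, and the coroutine resumes here to end the access.
            yield &elements
        }
    }
}

var buffer = Buffer()
buffer.values.append(42)  // mutates `elements` in place, with no temporary copy
```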
In fact, this is not true. Swift performs these mutations by calling a coroutine that yields access to mutable storage. The mutation is done directly to that storage, and then the coroutine is resumed so it can clean up the access.
People sometimes describe this overall model as if the value is copied in and then copied back out, and I get why they think of it that way. If you want to understand the performance, though, you need to understand it in this somewhat more precise way.
Swift sometimes needs to access a stored property abstractly. For example, a public property of a type from a module built with library evolution (like of Apple's OS) generally can't be assumed to be a stored property by clients outside of the defining module. You might expect that this means that Swift implicitly generates a getter and setter for the property, which would mean that mutations of it would involve an implicit copy, as described above.
15:25 - Another major example of a context that always consumes a value is a return (or a throw). This includes a getter, of course, which is really just a function that returns a value. If you read a property that has a getter that just returns the value of a different property, there is necessarily a copy there. The optimizer may be able to eliminate that copy, of course, if it can do IPO with the getter.
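For example:

```swift
struct Wrapper {
    private var elements: [Int] = []
    var values: [Int] {
        // A getter is just a function that returns a value, so reading
        // `values` necessarily copies (retains) `elements` unless the
        // optimizer can see through the getter and eliminate the copy.
        return elements
    }
}
```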
17:10 - Swift supports passing computed storage as an inout argument, and this works by calling the getter, putting that value into temporary storage, doing the stored-storage ownership dance described here, and then passing the new value of the temporary back to the setter.
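An illustrative sketch:

```swift
struct Thermostat {
    var kelvin: Double
    var celsius: Double {  // computed storage
        get { kelvin - 273.15 }
        set { kelvin = newValue + 273.15 }
    }
}

func addDegree(_ temperature: inout Double) {
    temperature += 1
}

var thermostat = Thermostat(kelvin: 300)
// Swift calls the celsius getter, stores the result in a temporary,
// passes the temporary inout to addDegree, and then passes the updated
// value back to the celsius setter.
addDegree(&thermostat.celsius)
```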
15:00 - In some ways this presentation of ownership is backwards from how you should usually think of it — you want to think semantically about how you're using values, not what particular contexts require. But this reverse thinking is necessary in order to understand when and why Swift inserts copies.
Essentially, the copies emerge as necessary in order to fulfill the high-level semantics without imposing a Rust-like requirement to always be explicit about ownership when writing code. You can certainly argue about whether this is the right thing to do! I think it is, but it's definitely a trade-off point in the design. Regardless, this is how it works.
Swift in practice always uses in-order layout for structs and tuples: the first stored property/element goes at offset 0, and then N+1 goes at offset (offset of N + size (not stride) of N + any alignment padding for N+1). This is guaranteed for tuples, but not for structs, where we reserve the right to play bit-packing tricks in the future.
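A worked example of that rule (assuming ARM64-ish sizes and alignments):

```swift
struct Example {
    var a: Bool   // offset 0, size 1
    var b: Int32  // offset 4: 0 + 1, rounded up to b's 4-byte alignment
    var c: Int8   // offset 8: 4 + 4, no padding needed
}
// MemoryLayout<Example>.size == 9 (the struct ends after c), and
// MemoryLayout<Example>.stride == 12 (size rounded up to the alignment of 4)
```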