@SuperDicq
Thanks, I feel 30 IQ points dumber now.
I get that the point was that the compiler can't optimize by moving instructions around in the code, but I still didn't get 80% of that text.
@LukeAlmighty @SuperDicq
I understand the integer overflow: if a + b is greater than MAX_INT, that's an issue.
Not sure how this affects prefetching, but then I'm not an expert.
The rest is Chinese to me.
@wolf480pl@mstdn.io @coded_artist@gameliberty.club @LukeAlmighty@gameliberty.club Thank you for the explanation, it actually makes a lot of sense. I feel like I understood most of what you said but I couldn't quite put it into words.
Also, I recommend using an instance that isn't Mastodon next time, to get around the hardcoded 500-character limit, so everything can actually be just one post.
@coded_artist @LukeAlmighty @SuperDicq
Also, a modern compiler like a recent version of LLVM or GCC with optimizations enabled will see right through your arithmetic tricks and turn them into a temporary variable (which is then held in a CPU register and never reaches main memory).
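A minimal C sketch of the two swaps being compared, assuming unsigned 32-bit values; the function names are hypothetical. With optimizations enabled, GCC and clang generally keep `tmp` in a register rather than in memory. How far they simplify the XOR version depends on the compiler and the surrounding code, so the reliable way to check is to look at the generated assembly (e.g. on Compiler Explorer).

```c
#include <stdint.h>

/* Swap through a temporary. With optimizations on, tmp normally
   lives in a register, so it costs nothing extra. */
static void swap_tmp(uint32_t *a, uint32_t *b) {
    uint32_t tmp = *a;
    *a = *b;
    *b = tmp;
}

/* The XOR "trick". Pitfall: if a and b point to the same object,
   the first line zeroes it, and it stays zero. The temp version
   has no such problem. */
static void swap_xor(uint32_t *a, uint32_t *b) {
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}
```

Besides not being faster, the XOR version is also less correct: `swap_xor(&z, &z)` leaves `z` as 0.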
@coded_artist @LukeAlmighty @SuperDicq
Also, instruction-level parallelism (ILP) is the property of a compiled program which says how many of its instructions can be executed in parallel if you have an infinitely parallel CPU. If every instruction in your program uses the result from the instruction immediately before it, your ILP is 1, and you cannot take advantage of a CPU that can do multiple things at once.
8/
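To make ILP concrete, here is a small C sketch (hypothetical function names) of the same sum computed with one dependency chain and with two independent ones. Unsigned addition wraps, so both versions return exactly the same result; the difference is only in how many additions a superscalar CPU can have in flight at once. Note that an optimizing compiler may unroll or vectorize the serial loop on its own, so this shows the idea rather than a guaranteed speedup.

```c
#include <stddef.h>
#include <stdint.h>

/* ILP of roughly 1 for the additions: every s += v[i]
   needs the previous value of s. */
static uint64_t sum_serial(const uint64_t *v, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Two independent chains: s0 and s1 never depend on each other,
   so the CPU can execute additions into both in parallel. */
static uint64_t sum_two_chains(const uint64_t *v, size_t n) {
    uint64_t s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    if (i < n)          /* odd length: one element left over */
        s0 += v[i];
    return s0 + s1;
}
```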
@coded_artist @LukeAlmighty @SuperDicq
Modern CPUs are also superscalar (which means they execute multiple instructions in parallel) and out-of-order (which means they can reorder instructions that don't depend on each other's results or side effects).
A serial dependency between two instructions means they have to be executed in the same order they appear in the code, because one depends on the other.
6/
@coded_artist @LukeAlmighty @SuperDicq
If you use arithmetic tricks to swap two values, anything that later uses either of the two values depends on those swap instructions, which in turn depend on both the instructions that calculated a and the instructions that calculated b.
That may prevent the CPU from doing things in parallel or reordering instructions, which, again, makes your code slow.
7/
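A sketch of the add/subtract swap in C, showing the serial chain described above; the function name is hypothetical. It uses unsigned arithmetic on purpose: signed overflow in C is undefined behaviour (the "a + b greater than MAX_INT" issue), while unsigned arithmetic wraps modulo 2^32, so the trick still round-trips even when the sum overflows.

```c
#include <stdint.h>

/* Each line needs the result of the line before it, so the three
   operations form a serial dependency chain that also waits on
   whatever computed the old a and b. */
static void swap_addsub(uint32_t *a, uint32_t *b) {
    *a = *a + *b;   /* a = A + B (mod 2^32): needs old a and old b */
    *b = *a - *b;   /* b = (A + B) - B = A: needs the line above   */
    *a = *a - *b;   /* a = (A + B) - A = B: needs the line above   */
}
```

Compare this with a temp-variable swap, where no arithmetic is needed at all and the compiler can often just rename registers.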
@coded_artist @LukeAlmighty @SuperDicq
Branch misprediction is when the CPU guesses incorrectly, and it costs a lot of time.
If a value used inside an if condition is known some time before the execution reaches the if, that might make branch prediction easier for a sufficiently smart CPU.
But when you obscure which value is which by using arithmetic to swap two variables, the CPU might not be able to see through that, resulting in frequent mispredictions, and making your code slow.
5/
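A rough C illustration of a branch whose direction depends on data, with hypothetical function names. In the first version the `if` is a conditional jump the CPU has to predict: on random input it is mispredicted often, on sorted input the pattern is easy to learn. The second version turns the comparison into a 0 or 1 that is simply added, so there is no data-dependent jump to mispredict. Compilers may convert either form into the other (or vectorize it), so treat this as a sketch of the concept rather than a benchmark.

```c
#include <stddef.h>

/* Branchy: the CPU predicts whether v[i] > threshold each time. */
static size_t count_big_branchy(const int *v, size_t n, int threshold) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > threshold)
            count++;
    }
    return count;
}

/* Branchless: the comparison evaluates to 0 or 1 and is added
   directly, so the loop body has no conditional jump. */
static size_t count_big_branchless(const int *v, size_t n, int threshold) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)(v[i] > threshold);
    return count;
}
```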
@coded_artist @LukeAlmighty @SuperDicq
So they have to guess which side of an if is going to be executed, start processing instructions from that side of a branch, and if it later turns out they guessed wrong, they need to undo all of that and start again with the correct side of a branch.
This is called speculative execution.
Branch prediction is the part where the CPU guesses which way a branch is going to go.
4/
@coded_artist @LukeAlmighty @SuperDicq
The SSA (static single assignment) form is a type of internal representation of a program inside a compiler, in which instructions do not have a destination operand. Instead, every instruction creates a new variable with its result, and other instructions can use that as their source operands, but they cannot modify that variable.
This means there are lots of temporary variables, so LLVM must be very good at optimizing them out.
That, in turn, means adding a temp var in your code is cheap.
2/
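To picture SSA, here is the same small function written twice in C, with hypothetical names: once as ordinary code that reassigns `x`, and once the way an SSA-based compiler treats it, where each value is defined exactly once and never modified. This is plain C imitating SSA, not actual LLVM IR. In that view, a temp-variable swap like `tmp = a; a = b; b = tmp;` is just a renaming of values and needs no instructions at all.

```c
/* Ordinary C: x is overwritten twice. */
static int f(int a, int b) {
    int x = a + b;
    x = x * 2;
    x = x - a;
    return x;
}

/* SSA style: every "version" of x gets its own name and is
   assigned exactly once (const enforces that here). */
static int f_ssa(int a, int b) {
    const int x1 = a + b;
    const int x2 = x1 * 2;
    const int x3 = x2 - a;
    return x3;
}
```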
@coded_artist @LukeAlmighty @SuperDicq
ARM and x86 are CPU architectures, i.e. sets of instructions that a CPU can execute.
Most CPUs implementing those architectures have a pipeline - that is, they start processing the next instruction before they're done with the previous one, kinda like an assembly line.
This is great when they know what instruction is going to be executed next. But when there is a branch in the code (e.g. an if statement), they don't know ahead of time.
3/
@coded_artist @LukeAlmighty @SuperDicq
LLVM is the reusable compiler mid-end - the programming-language-independent part that does optimizations. It is used e.g. by clang, which is a C compiler built on top of LLVM.
LLVM is famous for popularizing the SSA form.
1/
@wolf480pl @LukeAlmighty @SuperDicq I would appreciate that, if you have the time.
@coded_artist @LukeAlmighty @SuperDicq
Up to "and creating serial dependencies" I understand everything (can explain if you want).
But the rest is "I know some of these words" for me.
@SuperDicq @LukeAlmighty @coded_artist
I actually kinda like splitting things into multiple posts, but if I had a larger character limit, I would split it less, or in a more organized way.
@wolf480pl@mstdn.io @LukeAlmighty@gameliberty.club @coded_artist@gameliberty.club The 500-character limit of Mastodon is very arbitrary. Other fedi software doesn't do this. There's no reason to. It isn't the 1980s. Our storage space isn't so limited that goddamn text is a problem.
GNU social JP is a social network, courtesy of GNU social JP管理人. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.
All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.