@Moon Your wish is my command.
>your machine isn't a pdp-11
This is true for several reasons: the CPU architecture is different, you're not allowed to program the microcode (a free microcode was developed for the PDP-11, for example), it's not feasible to write anything but small functions in straight machine code the way it could be done on the PDP-11, and machine code instructions are rarely executed directly on the metal anymore.
On AMD64 CPUs, when machine code instructions reach the instruction decoder they are usually converted into micro-ops instead of being executed directly, and those micro-ops are then passed to the execution engine, which does lots of dark magic like instruction reordering, simultaneous execution and speculative execution. As a result, even if you write in assembly, the operations that actually get executed are very different (in most cases the observable behavior is the same, but not always - there are bugs in the execution engine, for example).
>C isn't close to hardware
C isn't close to the hardware simply because the language allows you to construct structures and mechanisms that have no equivalent machine code instructions - but the compiler is very good at generating all the instructions required to carry out your will.
With gcc at high optimization levels, the output rarely looks more than vaguely similar to the input - gcc's optimizer is good enough to detect when it's more performant to roll or unroll loops, to add padding that makes a function larger but faster, when to turn division or modulo into cheaper multiplication-and-shift sequences, when to use the fastest equivalent among addition, LEA or MUL instructions for a multiplication, how to optimize alignment, and much, much more.
Pretty much, with C, gcc can do whatever it damn well pleases as long as the observable result is the same as what you wrote, or in the case of undefined behavior, it can do anything it wants, including making demons fly out of your nose.
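As a rough illustration of the division trick (a sketch: the exact instructions depend on the gcc version, target and flags, and the function names here are made up for the example), compile the following with gcc -O2 -S and look at the generated .s file:

#include <stdio.h>

unsigned div7(unsigned x)
{
    /* At -O2 gcc typically emits no DIV instruction here: it multiplies by a
       precomputed "magic" reciprocal constant and shifts, because a widening
       MUL plus a shift is far cheaper than an integer DIV. */
    return x / 7;
}

int mul9(int x)
{
    /* Often becomes a single LEA computing x + x*8 instead of an IMUL. */
    return x * 9;
}

int main(void)
{
    printf("%u %d\n", div7(100), mul9(5)); /* prints 14 45 */
    return 0;
}

None of that is visible in the C source; the compiler decides it all on its own.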
>pointers and structs go directly to memory down to the bit level
That isn't the case anymore sadly.
Memory Management Units, address remapping, caching and the buffering of RAM I/O mean that memory isn't controllable down to the bit level anymore.
For example, if the CPU detects that a chunk of memory is being used a lot, that memory may be loaded into the cache and further reads and writes will be mapped to those bytes in the cache - without the original memory address or pointer addresses changing on the programming side at all.
At a later time the CPU may flush the cache and write the dirty cache lines back to main memory, replacing the old contents there (and yes, a lot of very confusing bugs result from reading stale cache or stale RAM).
Sadly, with DRAM, memory isn't bit- or even byte-addressable anymore: reads and writes go through an I/O buffer and are handled as blocks (although it's possible to send a block that only changes one bit). SRAM is still bit-addressable, but it's expensive to get a large amount of it, and depending on the implementation the interface may only address bytes anyway.
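One cheap way to see that block granularity from userspace (a sketch assuming a 64-byte cache line, which is typical but not guaranteed; the buffer size, the POSIX clock_gettime timing and the function name are all just choices for the example):

#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SIZE (64u * 1024u * 1024u) /* 64 MiB buffer */

static double touch_every(unsigned char *buf, size_t stride)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < SIZE; i += stride)
        buf[i]++; /* touch one byte per step */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    unsigned char *buf = malloc(SIZE);
    if (!buf)
        return 1;
    /* The dense pass touches every byte; the sparse pass touches one byte per
       assumed cache line, yet each touch still drags a whole line (and a whole
       DRAM burst) across the bus, so its time per touched byte is far worse. */
    printf("stride 1:  %.3f s for %u touches\n", touch_every(buf, 1), SIZE);
    printf("stride 64: %.3f s for %u touches\n", touch_every(buf, 64), SIZE / 64u);
    free(buf);
    return 0;
}

The program only ever asks for single bytes; the hardware moves blocks regardless.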
RAM I/O is handled by a memory controller built into the CPU, and don't forget that DRAM is pretty much a bunch of leaky buckets that must be refreshed many times a second (a full refresh pass roughly 16 times a second is typical for DDR2, if I remember correctly); thankfully most memory controllers have a single command to trigger a refresh.
Additionally, any time DRAM is read, the charge in the cells being read is lost, so the row is written back afterwards to avoid data loss.
One of the reasons C can be very fast is that pointers and structs map directly to the virtual address space and memory map with minimal overhead, so it's possible to arrange structs in alignment-efficient ways, arrange loops in cache-efficient ways and do pointer arithmetic to quickly calculate things like the length of data to copy;
memcpy(sneed, feed, chuck-feed);
would very efficiently map to approximately this (ignoring the caller-saved registers to keep things simple):
mov sneed(%rip), %rdi   # 0th argument: destination
mov feed(%rip), %rsi    # 1st argument: source
mov chuck(%rip), %rdx
sub %rsi, %rdx          # 2nd argument: chuck - feed
call memcpy
Meanwhile, good luck turning a method call into four or five instructions in your typical OOP language.
Pretty much, C maps very well to existing abstractions that in turn map to the hardware, so it runs very fast despite no longer being close to the hardware.
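To make the earlier point about alignment-efficient struct layout concrete (sizes and padding here assume a typical x86-64 / LP64 ABI; the struct and field names are invented for the example):

#include <stdio.h>

/* Same three fields, different order: the first layout needs 24 bytes
   (7 bytes of padding after 'a' and 7 more after 'c' to keep 8-byte
   alignment), while the reordered one fits in 16. */
struct padded   { char a; double b; char c; };
struct repacked { double b; char a; char c; };

int main(void)
{
    printf("padded:   %zu bytes\n", sizeof(struct padded));
    printf("repacked: %zu bytes\n", sizeof(struct repacked));
    return 0;
}

The compiler won't reorder the fields for you; the layout you declare is the layout you get, which is exactly what makes this kind of hand tuning possible.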
>if you use a compiler like Tiny C Compiler without optimizations stuff maps close to cpu instructions
Even TCC does some optimizations for divisions and the like, meaning there is no fully direct mapping to CPU instructions, although the output is pretty similar to what you wrote.
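As one concrete example of why the "obvious" instruction isn't always allowed (this is pure C semantics, so it applies to any conforming compiler; whether a given TCC version special-cases it is an assumption I'm not making):

#include <stdio.h>

int main(void)
{
    int x = -7;
    /* C mandates truncation toward zero, so -7 / 2 must be -3, while an
       arithmetic right shift - the obvious single-instruction version -
       gives -4 on the usual two's-complement targets, so any compiler that
       wants to use a shift for signed division has to emit a fix-up for
       negative values. */
    printf("-7 / 2  = %d\n", x / 2);  /* -3 */
    printf("-7 >> 1 = %d\n", x >> 1); /* -4 on typical targets */
    return 0;
}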
I could keep going, but it'll probably be better for me to answer specific wonderings.