Glenn Enright wrote:
On Monday 15 May 2006 9:58 am, Perry Lorier wrote:
Generally larger alignment is faster. Also, as you start getting closer
to the hardware you start getting stricter requirements for alignment.
Early DMA controllers for instance had to be "page" aligned (16 bytes
IIRC). I have no idea if this is still true today.
The downside of large alignment is that it uses more memory, and if you
end up having to touch that alignment padding then you end up wasting
resources. The trick is trading off this "wasted" space against the
speedup you get from the alignment.
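
A minimal sketch of that trade-off in C, assuming a 64-byte cache line
(the kernel's L1_CACHE_BYTES depends on the configured CPU, so 64 is
only an illustrative value). The struct names and padding_bytes() are
made up for this example and do not come from the kernel:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumed cache line size, for illustration only. */
#define CACHE_LINE 64

/* Alignment values must be powers of two. */
static_assert((CACHE_LINE & (CACHE_LINE - 1)) == 0,
              "CACHE_LINE must be a power of two");

/* Naturally aligned: takes only as much room as an int needs. */
struct packed_counter {
    int value;
};

/* Cache-line aligned: the compiler pads the struct out to CACHE_LINE
   bytes so that each element of an array starts on its own line. */
struct aligned_counter {
    alignas(CACHE_LINE) int value;
};

/* Extra bytes spent on padding for an array of n aligned counters,
   compared with an array of n packed counters. */
size_t padding_bytes(size_t n)
{
    return n * (sizeof(struct aligned_counter) -
                sizeof(struct packed_counter));
}
```

On a platform with a 4-byte int, each aligned counter carries 60 bytes
of padding, so padding_bytes(1000) comes to 60000 bytes. Whether that
cost is worth it is exactly the thing profiling has to answer.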
Right, pretty much matches what I was thinking. I will look at dma stuff next.
I should probably start doing some profiling to see what the objective
results are, rather than subjective ;-p. might look into ck series kernels to
Unfortunately most of my assembly language coding was done in the dark
ages in the 16-bit DOS days. (Segmentation, 640k and TSRs, oh my!), and
I've really not bothered about what's happening under the hood since then.
Goodness knows how APIC/IOMMU/ACPI work :)
My understanding is that P4 chips have quite bad latency for some common
io instructions, which is where AMD makes ground. For example more here...
Intel chips have always been slow at various things, they've always just
done better with clock cycles. A memorable example of this was the LOOP
opcode on Pentium chips. It was very slow on Intel machines, so
programs used it as a timing delay. On AMD chips it was extremely fast
(1 cycle IIRC), so those timing loops effectively became no-ops, causing
lots of programs to fail in amusing and/or spectacular fashion.
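
A rough sketch of why that kind of delay is fragile, written in C rather
than with the LOOP opcode itself. It assumes a POSIX system that provides
clock_gettime() with CLOCK_MONOTONIC; the function names are invented
for illustration:

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Fragile: how long this takes depends entirely on how quickly the CPU
   gets through the loop, so a faster chip shortens the delay. */
void delay_fixed_iterations(unsigned long iterations)
{
    volatile unsigned long i; /* volatile keeps the loop from being optimized away */
    for (i = 0; i < iterations; i++) {
        /* deliberately empty */
    }
}

/* Current monotonic time in nanoseconds. */
uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc; /* silences the unused warning when NDEBUG is defined */
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Sturdier: spin until the clock reports that at least ns nanoseconds
   have passed, regardless of how fast each iteration runs. */
void delay_ns(uint64_t ns)
{
    uint64_t start = now_ns();
    while (now_ns() - start < ns) {
        /* busy-wait */
    }
}
```

The first function behaves like the old LOOP-based delays: its duration
shrinks as the instruction gets cheaper. The second ties the delay to a
clock, so it waits at least as long as requested on any CPU speed.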
I traced the movsl mask back to L1_CACHE_BYTES, which
What exactly are you trying to discover here? Are you trying to figure
out how to deglitch some audio?
Not really, just using that as an example (although ac97 drivers do still have
a bad io related bug).
Only one? Miracle!
I'm hacking round in the i386 arch trying to learn more about how io is
handled and increasing my knowledge of system
programming at the same time. Been attempting to absorb Intel docs on this.
Kinda hobby type thing.
Ah, I remember the good ol' days of doing this myself :) Although
documentation wasn't quite as free flowing as it is these days.
So far, recent test builds have shown about a 5% decrease in core code
size (subtle bugs aside) using gcc 3.4.6, just by manually optimizing
kernel code for the P4 2.6 (stepping 9) which I'm running.
Building with '-march=pentium4' has worked nicely so far.
Nice. I guess an important lesson here is that when compiling a kernel,
you should compile it for your CPU; it'll be spiffier!
Also the MB (Abit IS7) appears to have really good IO
which fascinates me :). Learning what kernel devs do when things
break has been fun. I realise that newer 64bit offerings do many
things differently, but this is what I have to play with for now.
Yeah, I've not looked that closely at the 64bit stuff other than the
more, bigger, registers.
I realise that implicit in this statement I assume they do