New blog post: "Why is Rosetta 2 fast?"
https://dougallj.wordpress.com/2022/11/09/why-is-rosetta-2-fast/
@dougall I wonder how hard it is to do proper inter-instruction optimization while still retaining enough bookkeeping to handle jumps and interrupts.
@dougall Could you expand a bit on this, from your Rosetta 2 post? "The Apple M1 has an undocumented extension that, when enabled, ensures instructions like ADDS, SUBS and CMP compute PF and AF and store them as bits 26 and 27 of NZCV respectively, providing accurate emulation with no performance penalty." I see PF has to do with data parity and AF is sometimes used with writes to devices (serial port, etc.), but I'm not following what you're conveying here. Thanks.
@dougall I see you mention Windows on ARM's emulator. Did you look at another high-performance emulator like FEX?
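On the PF/AF question above: in x86, PF (parity flag) is set when the low byte of a result has an even number of 1 bits, and AF (adjust flag) is the carry out of bit 3, historically used for BCD arithmetic. ARM's standard NZCV flags have no equivalent, so an emulator would otherwise need extra instructions after each flag-setting operation. The sketch below only models the x86 semantics for an ADD in software; it is not how the undocumented M1 extension is implemented, and the function names `x86_add_pf_af` and `pack_pf_af` are hypothetical. The bit positions (PF at 26, AF at 27) are taken from the quoted passage.

```python
def x86_add_pf_af(a: int, b: int, width: int = 32) -> tuple[int, int, int]:
    """Return (result, PF, AF) for an x86-style ADD of two unsigned values."""
    mask = (1 << width) - 1
    result = (a + b) & mask
    # PF: 1 when the low 8 bits of the result contain an even number of 1 bits.
    pf = 1 if bin(result & 0xFF).count("1") % 2 == 0 else 0
    # AF: 1 when the addition carried out of bit 3 into bit 4.
    af = 1 if ((a ^ b ^ result) & 0x10) else 0
    return result, pf, af


def pack_pf_af(pf: int, af: int) -> int:
    """Place PF and AF at the bit positions named in the quoted passage
    (PF at bit 26, AF at bit 27). Illustrative only."""
    return (pf << 26) | (af << 27)


if __name__ == "__main__":
    result, pf, af = x86_add_pf_af(0x0F, 0x01)
    print(hex(result), pf, af, hex(pack_pf_af(pf, af)))
```

Without hardware support, each emulated ADD, SUB or CMP whose PF or AF might be read later would need roughly this much extra work, which is the penalty the quoted passage says the extension avoids.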
The Rosetta 2 instruction size expansion factor for an sqlite3 binary is ~1.64x (1.05MB of x86 instructions vs 1.72MB of ARM instructions). Surprisingly good, especially given Firestorm cores have six times the instruction cache of Ice Lake (192KiB vs 32KiB).
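As a quick check of the figures above, the two ratios can be recomputed from the numbers quoted in the post. This is a minimal sketch; the function names are hypothetical and the inputs are just the sizes stated above.

```python
def expansion_factor(x86_size_mb: float, arm_size_mb: float) -> float:
    """Ratio of translated ARM code size to original x86 code size."""
    return arm_size_mb / x86_size_mb


def cache_ratio(firestorm_kib: int, ice_lake_kib: int) -> float:
    """Ratio of the two instruction cache sizes."""
    return firestorm_kib / ice_lake_kib


if __name__ == "__main__":
    print(f"code size expansion: {expansion_factor(1.05, 1.72):.2f}x")
    print(f"instruction cache ratio: {cache_ratio(192, 32):.0f}x")
```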
Something I'm not sure I said is that the goal is to have a single, equivalent ARM instruction for each x86 instruction. And, in real-world code, combining all those tricks allows Rosetta 2 to achieve that surprisingly often.