Instruction processing starts with the Group Decode ROM, which places each instruction in one of four classes: a one-byte instruction implemented in logic, a prefix, an instruction of one or more bytes implemented in microcode, or an instruction of two or more bytes (including a ModR/M byte) implemented in microcode. A circuit called the loader fetches one or two bytes from the prefetch queue.
An instruction implemented in logic (e.g. Clear Carry) or a prefix is executed directly. Otherwise, the microcode engine starts executing the micro-instructions that make up the machine instruction.
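
The classify-then-dispatch flow described above can be modeled as a rough sketch in Python. This is an illustration only, not a transcription of the hardware: the names `InstrClass`, `EXAMPLE_CLASSES`, `classify`, `bytes_loaded`, and `handled_by` are hypothetical, the opcode table is a tiny hand-picked subset rather than the contents of the Group Decode ROM, and the rule that the loader takes two bytes exactly when a ModR/M byte is present is a simplifying assumption.

```python
from enum import Enum, auto


class InstrClass(Enum):
    """The four classes named in the text (enum names are hypothetical)."""
    ONE_BYTE_LOGIC = auto()   # one byte, implemented directly in logic
    PREFIX = auto()           # prefix byte
    MICROCODE = auto()        # one or more bytes, implemented in microcode
    MICROCODE_MODRM = auto()  # two or more bytes with ModR/M, in microcode


# Illustrative subset only: a hand-written table for this sketch,
# not a dump of the real Group Decode ROM.
EXAMPLE_CLASSES = {
    0xF8: InstrClass.ONE_BYTE_LOGIC,   # CLC (Clear Carry)
    0x26: InstrClass.PREFIX,           # ES: segment override
    0xF0: InstrClass.PREFIX,           # LOCK
    0x40: InstrClass.MICROCODE,        # INC AX (assumed class)
    0x01: InstrClass.MICROCODE_MODRM,  # ADD r/m16, r16 (assumed class)
}


def classify(opcode: int) -> InstrClass:
    """Stand-in for the Group Decode ROM lookup on the first opcode byte."""
    return EXAMPLE_CLASSES[opcode]


def bytes_loaded(cls: InstrClass) -> int:
    """Bytes the loader takes from the prefetch queue.

    Assumption for this sketch: two bytes only when a ModR/M byte follows.
    """
    return 2 if cls is InstrClass.MICROCODE_MODRM else 1


def handled_by(cls: InstrClass) -> str:
    """Logic-implemented instructions and prefixes bypass the microcode."""
    if cls in (InstrClass.ONE_BYTE_LOGIC, InstrClass.PREFIX):
        return "logic"
    return "microcode"
```

For example, `handled_by(classify(0xF8))` takes the direct path for Clear Carry, while `classify(0x01)` falls into the ModR/M class, so the sketch has the loader take two bytes before the microcode engine starts.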