Most people think of machine instructions as the lowest level of software, but many processors contain microcode underneath. Each machine instruction is broken down into a sequence of simpler micro-instructions. The 8086 has 21-bit micro-instructions; each one executes a register move (source→dest) and an action in parallel.
A string instruction with a repeat (REP) prefix performs a loop up to 64K times, since the count is held in the 16-bit CX register. Because the loop happens inside the processor, it's faster than writing the same loop in assembly. The SI register points to the source, the DI register points to the destination, and the CX register counts the remaining iterations. The details are complicated.
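Setting those details aside, the basic loop can be sketched in a few lines. This is a simplified Python model of a repeated byte move (REP MOVSB), not the 8086's actual microcode. The function name `rep_movsb` and the flat 64K `memory` array are assumptions made for illustration; segment registers, the direction flag, and interrupts are all ignored.

```python
def rep_movsb(memory, si, di, cx):
    """Simplified model of REP MOVSB: copy CX bytes from [SI] to [DI].

    Assumes a flat 64K memory and forward copying only.
    """
    while cx != 0:
        memory[di] = memory[si]   # move one byte from source to destination
        si = (si + 1) & 0xFFFF    # 16-bit registers wrap around
        di = (di + 1) & 0xFFFF
        cx -= 1                   # CX counts the remaining iterations
    return si, di, cx


memory = bytearray(64 * 1024)
memory[0x100:0x105] = b"hello"

# Copy 5 bytes from address 0x100 to address 0x200.
si, di, cx = rep_movsb(memory, si=0x100, di=0x200, cx=5)
```

After the call, SI and DI have each advanced past the copied block and CX has reached zero, which is also the state the registers are left in after the real instruction completes a forward copy.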