The exact claim is that replacing memcpy in a playback loop with an optimized version removes some audible artifact ("edge").
This is within the realm of the plausible.
The code shown contains not only a memcpy, but also several Win32 event waits in the middle of the loop, which are suspicious. As a general remark, if you want to stream samples without any skips, you have to just shove bits into some pipe/device, without synchronizing with other threads.
@dan
I mean, the call:
WaitForSingleObject(hNeedDataEvent, INFINITE);
takes place several times per loop iteration. You have to wonder: is that some auto-reset event that another thread has to keep banging to keep the playback going?
There is also a puzzling call to ZeroMemory in the code; why would you do that on data that isn't a secret that has just been decrypted?