@vladh You can run the software inside QEMU and count the executed instructions, which gives results that are highly comparable across machines.
But I think the holistic benchmarks already suggested are pretty good and probably more practical.
Differences between hardware and environments (e.g. room temperature, power supply efficiency) can be mitigated with an automatic reference benchmark that you run on your own hardware to get a baseline.
To me, this idea seems closely tied to performance work in general.
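A minimal sketch of the reference-benchmark idea, assuming wall-clock timing in Python: time a fixed reference workload and the program under test on the same machine, then report the program's time as a multiple of the reference time. `reference_workload`, `relative_score` and `my_program` are hypothetical names for illustration, not an existing tool.

```python
import statistics
import time
from typing import Callable


def median_runtime(fn: Callable[[], None], runs: int = 5) -> float:
    """Run fn several times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    # The median is less sensitive to one-off hiccups than the mean.
    return statistics.median(timings)


def reference_workload() -> None:
    """Hypothetical fixed CPU-bound workload used as the per-machine baseline."""
    total = 0
    for i in range(200_000):
        total += i * i


def relative_score(target: Callable[[], None], runs: int = 5) -> float:
    """Target runtime expressed in units of the reference runtime on this machine.

    A score of 2.0 means the target took about twice as long as the reference.
    """
    ref = median_runtime(reference_workload, runs)
    return median_runtime(target, runs) / ref


if __name__ == "__main__":
    def my_program() -> None:
        # Hypothetical stand-in for the software being benchmarked.
        sorted(range(500_000, 0, -1))

    print(f"relative score: {relative_score(my_program):.2f}")
```

The ratio only cancels effects that slow both workloads down by the same factor (a slower CPU, thermal throttling); anything that hits the program and the reference differently, such as cache or memory-bandwidth differences, still leaks into the score.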