IMHO this goes deep into algorithm engineering, and it likely depends heavily on your architecture and even on the function you are applying.
E.g. for sorting algorithms, quicksort performs badly if the size of your data exceeds the CPU cache (and then again if it exceeds a memory page etc.).
This is because quicksort jumps around in memory a lot.
Merge sort, on the other hand, works mostly linearly through memory, so it performs better regardless of the concrete cache and memory sizes.
That said: Depending on the kind of calculation you are doing, it might be easiest to just test the performance of the code with different numbers of threads.
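A rough sketch of what such a test could look like, in Python. Everything here is made up for illustration: `modify_chunk` is a stand-in for whatever you do per cell, and the thread counts tried (1, cores, cores + 2, 2 × cores) are arbitrary picks. Note that in standard CPython the GIL keeps pure-Python work like this from actually running in parallel, so treat it as the shape of the harness rather than as meaningful numbers; the same structure carries over to whatever language the real code is in.

```python
import os
import threading
import time


def modify_chunk(cells, start, stop):
    # Hypothetical stand-in for the real per-cell work: update each cell in place.
    for i in range(start, stop):
        cells[i] = cells[i] * 2 + 1


def run_with_threads(cells, n_threads):
    """Split cells into n_threads contiguous chunks; return elapsed seconds."""
    chunk = (len(cells) + n_threads - 1) // n_threads  # ceiling division
    threads = [
        threading.Thread(
            target=modify_chunk,
            args=(cells, t * chunk, min((t + 1) * chunk, len(cells))),
        )
        for t in range(n_threads)
    ]
    begin = time.perf_counter()
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return time.perf_counter() - begin


def benchmark(size=200_000, repeats=3):
    """Time a few thread counts around the core count (arbitrary choices)."""
    cores = os.cpu_count() or 1
    results = {}
    for n in (1, cores, cores + 2, cores * 2):
        # Fresh data for every run; keep the best of a few repeats to damp noise.
        results[n] = min(
            run_with_threads(list(range(size)), n) for _ in range(repeats)
        )
    return results


if __name__ == "__main__":
    for n, seconds in benchmark().items():
        print(f"{n:3d} threads: {seconds * 1000:.1f} ms")
```

Taking the best of a few repeats is just a cheap way to reduce noise from whatever else the machine is doing; for anything serious you would want more runs and a larger input.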
But I am curious about other answers :blobcatgiggle:
@wakame In this case I'm just modifying each cell, not even moving data. I was trying to figure out how many threads I should split the job into. I've tried all sorts of things, and for some reason, using a few more threads than my laptop has cores is faster than using exactly as many threads as cores. So I thought maybe there was some logic to it.