There is only one hard requirement for the MPI distribution, which is
mod(Nx,np0)=0. The reason is the way the FFTW transform plans are set up. If you do not respect this, you will see a runtime error. In your case, any power of 2 will work for
I typically use a distribution that is close to equal in both directions, because that makes it most likely that the data chunks have the same size on each process (“pencil distribution”). In your case, this would be
(np0,np1)=(4,6) for small, and
(np0,np1)=(16,24) for large. However, it might be worth testing whether a “slab distribution” is faster in your case. This means that only one dimension is distributed, let’s say
(np0,np1)=(1,384) for large. The advantage is that you save communication cost. The disadvantage is that the data is more likely to be distributed unequally over the tasks. np1 divides the x-dimension in physical space and the z-dimension in spectral space. If your numerical domain is very long but narrow, a “slab distribution” is probably a bad idea, but your domain has equal sides, so it might be more performant.
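
To compare candidates before running anything, the rule mod(Nx,np0)=0 and the "close to equal" preference can be sketched as a small script. This is only an illustration, not part of the code itself: the function names are made up, Nx=512 is a placeholder value (put in your own grid size), and the block splitting in chunk_extremes is an assumption about how points might be divided among tasks.

```python
def candidate_distributions(n_tasks, nx):
    """List all (np0, np1) with np0 * np1 == n_tasks and nx % np0 == 0.

    The result is sorted so that the most balanced pair (smallest
    |np0 - np1|, i.e. closest to a "pencil distribution") comes first.
    """
    pairs = []
    for np0 in range(1, n_tasks + 1):
        if n_tasks % np0 == 0 and nx % np0 == 0:
            pairs.append((np0, n_tasks // np0))
    return sorted(pairs, key=lambda p: abs(p[0] - p[1]))


def chunk_extremes(n_points, n_parts):
    """Smallest and largest chunk when n_points are block-split over n_parts.

    Assumption: a plain block split where the remainder is spread one
    point at a time; the real code may divide the data differently.
    """
    base, extra = divmod(n_points, n_parts)
    return (base, base + 1 if extra else base)


if __name__ == "__main__":
    nx = 512  # hypothetical grid size, replace with your own Nx
    for n_tasks in (24, 384):
        print(n_tasks, candidate_distributions(n_tasks, nx))
    # Load balance of one direction split over 16 tasks vs. 384 tasks
    print(chunk_extremes(nx, 16), chunk_extremes(nx, 384))
```

With these placeholder numbers, the first entry of each list is the most balanced pair, and the slab pair with np0=1 appears further down the list. chunk_extremes gives a rough idea of how unequal the chunks become when a single direction is split over many tasks.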