Recommendations for -np0 and -np1

I’m running channelflow on a cluster with 28 compute nodes. Each node has two 12-core Xeon E5-2680 CPUs.

I envision two scales of channelflow computations:

  • small, running on one node with 24 cores, 128 x 129 x 128 discretization
  • large, running on, say, sixteen nodes with 384 cores, 1024 x 129 x 1024 discretization

What are decent starting values of -np0 and -np1 for these two simulations?

Hello John,

There is only one hard requirement for the MPI distribution: mod(Nx,np0)=0. The reason is the way the FFTW transform plans are set up. If you do not respect this, you will get a runtime error. In your case, any power of 2 no larger than Nx will work for np0.
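As a quick sanity check before submitting a job, the requirement above can be tested in a few lines of Python. This is only a sketch: `check_distribution` is a hypothetical helper, not part of channelflow, and the second check (np0 * np1 equal to the number of MPI processes) is an assumption based on the examples in this thread rather than something stated as a rule here.

```python
def check_distribution(nx: int, nprocs: int, np0: int, np1: int) -> list[str]:
    """Return a list of problems with (np0, np1); an empty list means it passes.

    Hypothetical helper, not a channelflow function.
    """
    problems = []
    if np0 < 1 or np1 < 1:
        problems.append("np0 and np1 must be positive integers")
        return problems
    # Hard requirement stated in this thread: mod(Nx, np0) = 0.
    if nx % np0 != 0:
        problems.append(f"Nx={nx} is not divisible by np0={np0}")
    # Assumption (not stated as a rule in the thread): the process grid
    # should use exactly the number of MPI processes that were launched.
    if np0 * np1 != nprocs:
        problems.append(f"np0*np1={np0 * np1} does not match {nprocs} processes")
    return problems


if __name__ == "__main__":
    # Small case from the question: Nx = 128 on 24 cores.
    print(check_distribution(128, 24, 4, 6))  # passes: []
    print(check_distribution(128, 24, 3, 8))  # fails: 128 is not divisible by 3
```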

I typically use a distribution in which np0 and np1 are as close to equal as possible, because that makes it most likely that the data chunks are the same size on each process (“pencil distribution”). In your case, this would be (np0,np1)=(4,6) for small, and (np0,np1)=(16,24) for large.

However, it might be worth testing whether a “slab distribution” is faster in your case. This means that only one dimension is distributed, let’s say (np0,np1)=(1,384) for large. The advantage is that you save communication cost. The disadvantage is that the data is more likely to be distributed unequally over the tasks. np1 divides the x-dimension in physical space and the z-dimension in spectral space. If your numerical domain is very long but narrow, a “slab distribution” is probably a bad idea, but your domain has equal sides, so it might perform better.
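To see where these numbers come from, here is a minimal Python sketch that lists the process grids allowed by mod(Nx,np0)=0 and picks the most balanced one (pencil) and the slab. The function names are hypothetical and not part of channelflow, and the sketch assumes that np0 * np1 should equal the total number of MPI processes. It only generates candidates to benchmark; it says nothing about which one is actually faster.

```python
def candidate_grids(nx: int, nprocs: int) -> list[tuple[int, int]]:
    """All (np0, np1) with np0 * np1 == nprocs and nx % np0 == 0.

    Hypothetical helper; assumes the grid must use every MPI process.
    """
    return [
        (np0, nprocs // np0)
        for np0 in range(1, nprocs + 1)
        if nprocs % np0 == 0 and nx % np0 == 0
    ]


def most_balanced(grids: list[tuple[int, int]]) -> tuple[int, int]:
    """Pencil-style choice: the grid whose two factors are closest to each other."""
    return min(grids, key=lambda g: abs(g[0] - g[1]))


if __name__ == "__main__":
    for label, nx, nprocs in [("small", 128, 24), ("large", 1024, 384)]:
        grids = candidate_grids(nx, nprocs)
        print(label, "candidates:", grids)
        print(label, "pencil:", most_balanced(grids))  # small: (4, 6), large: (16, 24)
        print(label, "slab:", (1, nprocs))             # np0 = 1 always divides Nx
```

For the small case this yields the candidates (1,24), (2,12), (4,6) and (8,3), so timing a short run with each of them is cheap.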

On the system that we use (IBM x3750), we find that (np0,np1)=(1,np) is optimal for our (Lx,Lz)=(10,40) domains, at least up to np=64.

Laurette: I see the same on my Intel Xeon CPU E5-2680 cluster. (np0,np1)=(1,np) is optimal for all np I’ve tested, up to 64.