Comparison of MPI performance on different clusters

Dear all,

To check whether Channelflow 2.0 runs properly on your cluster, we could compare MPI performance in this forum.
For this purpose, I suggest using the program “benchmark” in the “tools/” directory. Please post your results in the following format:

@ EPFL cluster Fidis (1 node = 28 CPUs), on 4 nodes = 112 cores
tools/benchmark -nc -np0 8 -np1 14
reading a FlowField file with (Nx,Ny,Nz)=(1024,121,1024) gives
Average time/timeunit: 68.1176s

Looking forward to your numbers :slight_smile:

PS: Just to make sure: such benchmarks must be run with code compiled in “release” mode.

By the way, Channelflow’s performance depends critically on the bandwidth of inter-node communication. The numbers above were obtained with InfiniBand FDR interconnects between nodes. On Garcrux we have InfiniBand EDR interconnects, and get

@ EPFL cluster Garcrux (1 node = 28 CPUs), on 4 nodes = 112 cores, EDR IB
tools/benchmark -nc -np0 8 -np1 14
reading a FlowField file with (Nx,Ny,Nz)=(1024,121,1024) gives
Average time/timeunit: 45.2445s

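In this case, the EDR run needs about 2/3 of the wall-clock time of the FDR run (45.2445 s vs. 68.1176 s per time unit), i.e. it is roughly 1.5 times faster.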
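
For anyone who wants to compare their own numbers against these two runs, here is a minimal Python sketch of the arithmetic. The two timings are copied from the posts above; the function names are only illustrative and are not part of Channelflow.

```python
# Wall-clock seconds per simulated time unit, copied from the two runs above.
fdr_seconds = 68.1176  # Fidis, InfiniBand FDR, 112 cores
edr_seconds = 45.2445  # Garcrux, InfiniBand EDR, 112 cores


def time_ratio(new_seconds: float, ref_seconds: float) -> float:
    """Fraction of the reference wall-clock time that the new run needs."""
    return new_seconds / ref_seconds


def speedup(new_seconds: float, ref_seconds: float) -> float:
    """How many times faster the new run is than the reference run."""
    return ref_seconds / new_seconds


if __name__ == "__main__":
    print(f"EDR/FDR time ratio:   {time_ratio(edr_seconds, fdr_seconds):.3f}")
    print(f"EDR speedup over FDR: {speedup(edr_seconds, fdr_seconds):.2f}x")
```

A time ratio below 1 means the new run is faster. Both runs use the same core count and process grid, so the difference can plausibly be attributed to the interconnect, although other hardware differences between the two clusters are not ruled out here.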


Hello Florian,

Thank you for creating this thread on the forum.

The benchmark routine ends up giving CFL=nan when run with a random initial field of resolution (Nx,Ny,Nz)=(1024,121,1024). I therefore measured the time in the regular simulateflow.cpp instead, using the default conditions of the routine.

  1. Cluster ADA @ IDRIS, Paris, on 4 nodes (1 node = 32 cores) = 128 cores, InfiniBand FDR10 Mellanox network (2 links per node): for a variable “dt” (dt_avg ~ 0.012) at resolution (Nx,Ny,Nz)=(1024,121,1024), Average time/time unit = 168.18 s.

  2. For a fixed dt=0.001, the same case gives Average time/time unit = 40.45 min.

  3. Cluster TURING @ IDRIS, same case, on 64 nodes (1 node = 16 cores) = 1024 cores: for a variable “dt” (dt_avg ~ 0.015), Average time/time unit = 102.4 s.
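
Since the three runs above use different time steps, the time per time unit is not directly comparable between them. One possible way to normalize is to convert to wall-clock time per time step, as in the Python sketch below. It assumes that the number of steps per time unit is simply 1/dt and, for the variable-dt runs, uses the approximate dt_avg quoted above, so the results are rough estimates only. The `Run` class and its field names are hypothetical helpers, not part of Channelflow.

```python
from dataclasses import dataclass


@dataclass
class Run:
    label: str
    cores: int
    dt: float                     # time step (average value for variable-dt runs)
    seconds_per_time_unit: float  # measured wall-clock time per simulated time unit

    def seconds_per_step(self) -> float:
        # Assumption: one time unit takes 1/dt steps, so time per step = time per unit * dt.
        return self.seconds_per_time_unit * self.dt

    def core_seconds_per_step(self) -> float:
        # Total compute cost of one step, summed over all cores.
        return self.seconds_per_step() * self.cores


# Numbers copied from items 1 to 3 above (40.45 min converted to seconds).
runs = [
    Run("ADA, variable dt", cores=128, dt=0.012, seconds_per_time_unit=168.18),
    Run("ADA, fixed dt", cores=128, dt=0.001, seconds_per_time_unit=40.45 * 60),
    Run("TURING, variable dt", cores=1024, dt=0.015, seconds_per_time_unit=102.4),
]

if __name__ == "__main__":
    for run in runs:
        print(
            f"{run.label:20s} "
            f"{run.seconds_per_step():6.3f} s/step  "
            f"{run.core_seconds_per_step():8.1f} core-s/step"
        )
```

The core-seconds column gives a rough idea of parallel efficiency: if a run on more cores needs many more core-seconds per step, a larger share of the time is going into communication rather than computation.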