Odd results with benchmark times

I have the code installed and running and wanted to get a sense of its speed with a timing
benchmark. I am running a flow with Nx = 96, Ny = 129, and Nz = 80. I started to get some
odd results, so I put together an experiment and am reporting the results here.
For both runs, I set dt = dtmin = dtmax = 0.00025 so that the step size is fixed and never adjusted.
Run 1 uses T = 1.0 and dT = 0.1, so it outputs 11 flow fields (0.0, 0.1, …, 1.0). This run took
approximately 30 minutes, but the last flow field was written about 3 minutes before the code finished.
Run 2 uses the same input file and T = 1.0 but writes output at dT = 0.5, so it outputs 3 flow fields
(0.0, 0.5, 1.0). This run took approximately 40 minutes, and the last flow field was written about
13 minutes before the code finished.
Run 1 seemed to take about 2.7 minutes for each set of 400 steps but needed about another 2.7
minutes after it wrote out its last file.
Run 2 seemed to take about 13.5 minutes for each set of 2000 steps but needed another 13.5
minutes after it wrote out its last file.
In both runs, the time spent after the last output is about the same as the time for one output interval.
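
To make the arithmetic explicit, here is a small Python sketch that uses only the numbers quoted above.
The function name summarize and the "one extra output interval" reading are my own assumptions, not
something I have confirmed from the code.

```python
# Parameters and timings copied from the two runs described above.
dt = 0.00025   # fixed time step (dt = dtmin = dtmax)
T = 1.0        # total integration time


def summarize(dT, minutes_per_interval):
    """Return (steps per interval, intervals, seconds per step, predicted total minutes)."""
    steps_per_interval = round(dT / dt)
    n_intervals = round(T / dT)
    seconds_per_step = minutes_per_interval * 60.0 / steps_per_interval
    # Hypothesis only (not confirmed): the code computes one extra output
    # interval after writing the last flow field.
    predicted_total_minutes = (n_intervals + 1) * minutes_per_interval
    return steps_per_interval, n_intervals, seconds_per_step, predicted_total_minutes


run1 = summarize(dT=0.1, minutes_per_interval=2.7)
run2 = summarize(dT=0.5, minutes_per_interval=13.5)

for name, (steps, n, sec, total) in (("Run 1", run1), ("Run 2", run2)):
    print(f"{name}: {steps} steps/interval, {n} intervals, "
          f"{sec:.3f} s/step, predicted total {total:.1f} min")
```

Both runs work out to about 0.405 seconds per step, and the "one extra interval" guess predicts about
29.7 and 40.5 minutes, which is close to the roughly 30 and 40 minutes I measured.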

Have other people seen this phenomenon, or does anyone have an explanation for it?

Thank you

A second question: when I run the “w” command in another terminal while the code is running,
I see that the load average is about 1 for all three periods. I installed without MPI, but I
thought I was using the multi-threaded FFTW. I have a third-generation i7-3820 with 4 cores
and 8 siblings. If the multi-threaded FFTW is properly installed and in use, should I expect
“w” to report a load average noticeably larger than 1 on my machine?
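
For reference, the same three load averages that “w” prints can be read from Python next to the number
of logical CPUs. This is only a sketch for checking the numbers (Unix only); the helper name
load_summary is hypothetical, and it says nothing about how FFTW itself was built.

```python
import os


def load_summary():
    """Return the 1-, 5-, and 15-minute load averages and the logical CPU count."""
    one, five, fifteen = os.getloadavg()   # same three periods that "w" reports
    logical_cpus = os.cpu_count() or 1     # counts hardware threads ("siblings")
    return {"load": (one, five, fifteen), "logical_cpus": logical_cpus}


summary = load_summary()
print("load averages (1, 5, 15 min):", summary["load"])
print("logical CPUs:", summary["logical_cpus"])
```

As I understand it, the load average counts runnable processes and threads, so a single busy thread
on an otherwise idle machine gives a value near 1, while several busy threads would push it higher.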

Again, thank you

Andy