Amdahl law and scale experiments

不论是在论文的experiments的部分还是说各种parallel algorithm的性能分析部分, scale efficiency 始终是一个重要的property。Amdahl law is its fundations.

Weak scaling

We increase the problem size proportionally to the number of processors, so the work/processor stays the same. For example, we assume the work/processor is 50, which means every processor should process four tasks, when test the scaling ability, you could chose following combination of (threadnum, tasknum):(1,50),(2,100),(3,150),(4,200). The key idea behind the weak scale is (work/processnum is a constant value) The ideal line for the weak scaling line is a horizental line.

Strong scale

We keep the problem size constant but increase the number of processors. The idea is that with the increase of used resources, how fast the execution time can improve. For example, the task number is 500, and we increase the thread number from 1, 2 , 4 , 8 to 16, to figure out the improvements of the performance. The key idea is that the fixed value of the task size.

Amdahl law

The amdahl law is more focused on the strong scale case, the x axis is the number of resources or threads used for solving the problem, the y axis is the speed up.

Memorizing how amdal law comes from might be a good way to understand it instead of just memorizing the final equaltion.

Assuming competing a workload on serial machine take time T. In this computing procedure, p (belong to 0 to 1) of them can be benifits by parallel computing, the T = (1-p)T + pT, there is a factor s that indicate the speed up of the parallel computing. T(s) = (1-p)T + (p/s)T Then using T divided by the T(s), we can get the spped up of the program based on parallel case. We can get T/T(s) = 1/[(1-p)+p/s]. In perfect case, when speed up s is infinity, the speed up of whole program is 1/(1-p) which is limited by the serial portion of the program. This is the upperlimit of improving the execution time in a parallel case.

Differnet explanation about what is fixed (for the data IO service)

Yeah that’s a kind of weak scaling too, sort of, but since we don’t care about I/O times, what really matters is the data size, so we should have had an amount of data proportional to the the number of servers

Meaning behind the scale experiments

idead scale is linear number, for example, when we process number increase from 1 to 2, the task number we could solved increase from 100 to 200. But in real case, there is overhead for increasing of the threads, parallel thread will also increase the overhead for communication between threads.

For the following parts, the contexts for linear weak scaling and strong scaling is discussed.

In the case of weak scaling, linear scaling is achieved if the run time stays constant while the workload is increased in direct proportion to the number of processors. Most programs running in this mode (attention to this assumption) should scale well to larger core counts as they typically employ nearest-neighbour communication patterns where the communication overhead is relatively constant regardless of the number of processes used; exceptions include algorithms that employ heavy use of global communication patterns, eg. FFTs and transposes.

In general, it is harder to achieve good strong-scaling at larger process counts since the communication overhead for many/most algorithms increases in proportion to the number of processes used.