Performance Testing
************************************************************
The executable ./whirlpoolsum takes one parameter, a file name, and computes the Whirlpool hash of that file's contents.
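A hashing tool like this usually streams the file through an incremental "add data" interface instead of loading it all into memory. The minimal sketch below assumes that pattern. The callback type hash_update_fn, the function hash_stream and the stand-in count_bytes are hypothetical names. The real whirlpoolsum source and the libwhirlpool API (including the actual signature of whirlpool_add) may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical update callback: stands in for the real hash's "add data"
 * entry point, whose exact signature is not shown in this post. */
typedef void (*hash_update_fn)(void *ctx, const unsigned char *data, size_t len);

/* Feed an open file to `update` in fixed 64 KiB chunks, so memory use stays
 * the same whether the input is 19MB or 22GB. Returns the number of bytes
 * read, or -1 if a read error occurred. */
long long hash_stream(FILE *in, hash_update_fn update, void *ctx)
{
    unsigned char chunk[64 * 1024];
    long long total = 0;
    size_t n;

    assert(in != NULL && update != NULL);
    while ((n = fread(chunk, 1, sizeof chunk, in)) > 0) {
        update(ctx, chunk, n);
        total += (long long)n;
    }
    return ferror(in) ? -1 : total;
}

/* Trivial stand-in update function for trying the loop out: it only counts
 * bytes, where a real one would run them through the Whirlpool compression. */
void count_bytes(void *ctx, const unsigned char *data, size_t len)
{
    (void)data;
    *(long long *)ctx += (long long)len;
}
```

For example, after opening Small.txt with fopen(..., "rb"), calling hash_stream(f, count_bytes, &n) leaves the file size in n. With this structure every input byte passes through the update path, so hashing time should grow roughly in proportion to file size.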
Test servers: Aarchie, Ccharlie
OS: Fedora 28 (Red Hat)
CPU architecture: AArch64
Test inputs (three files of different sizes):
Big.txt: 22GB
Medium.txt: 11GB
Small.txt: 19MB
Note: each file contains the phrase "hello world\n" repeated over and over.
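Test files of this kind can be recreated with something like the following minimal sketch. write_repeated is a hypothetical helper, not the author's script. It writes whole copies of the phrase only, so the size is rounded down to a multiple of the 12-byte phrase; the exact byte counts behind "22GB", "11GB" and "19MB" are not stated here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PHRASE   64     /* longest phrase this sketch supports */
#define BLOCK_COPIES 4096   /* phrase copies per fwrite() call */

/* Hypothetical helper: write whole copies of `phrase` to `out` until one
 * more copy would exceed `target_bytes`. The phrase is replicated into a
 * block once and written with fwrite(), so multi-GB files are not built
 * one 12-byte write at a time. Returns bytes written, or -1 on error. */
long long write_repeated(FILE *out, const char *phrase, long long target_bytes)
{
    static char block[MAX_PHRASE * BLOCK_COPIES];
    assert(out != NULL && phrase != NULL);

    size_t plen = strlen(phrase);
    if (plen == 0 || plen > MAX_PHRASE || target_bytes < 0)
        return -1;
    for (size_t i = 0; i < BLOCK_COPIES; i++)
        memcpy(block + i * plen, phrase, plen);

    long long copies = target_bytes / (long long)plen;  /* rounds down */
    long long written = 0;
    while (copies > 0) {
        long long n = copies < BLOCK_COPIES ? copies : BLOCK_COPIES;
        size_t bytes = (size_t)n * plen;
        if (fwrite(block, 1, bytes, out) != bytes)
            return -1;
        written += (long long)bytes;
        copies -= n;
    }
    return written;
}
```

For example, opening Small.txt with fopen(..., "wb") and calling write_repeated(f, "hello world\n", 19LL * 1000 * 1000) produces a file of just under 19MB, assuming decimal megabytes.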
********** Baseline Results **********
Aarchie:
SYSTEM LOAD (load average: 0.11, 0.08, 0.09)
Big.txt, 22GB
* 3 consecutive runs
Average time (real): 9m59.109s
SYSTEM LOAD (load average: 0.12, 0.07, 0.09)
Medium.txt, 11GB
* 3 consecutive runs
Average time (real): 5m8.530s
SYSTEM LOAD (load average: 0.00, 0.00, 0.00)
Small.txt, 19MB
* 3 consecutive runs
Average time (real): 0m0.540s
Ccharlie:
SYSTEM LOAD (load average: 0.02, 0.01, 0.04)
Big.txt, 22GB
* 3 consecutive runs
Average time (real): 14m57.121s
SYSTEM LOAD (load average: 0.00, 0.02, 0.00)
Medium.txt, 11GB
* 3 consecutive runs
Average time (real): 7m7.179s
SYSTEM LOAD (load average: 1.04, 1.01, 1.31)
Small.txt, 19MB
* 3 consecutive runs
Average time (real): 0m0.728s
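Each baseline figure is an average over three consecutive runs. A minimal sketch of such a harness, assuming a POSIX system: mean_wall_time (a hypothetical name) forks and execs the command, measures each run's elapsed wall-clock time with CLOCK_MONOTONIC (the same quantity that time reports as real), and averages the runs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Run the command in argv (argv[0] is looked up on PATH) `runs` times in a
 * row and return the mean elapsed wall-clock time in seconds, or -1.0 if a
 * run could not be started or exited with a non-zero status. */
double mean_wall_time(char *const argv[], int runs)
{
    assert(argv != NULL && argv[0] != NULL && runs > 0);
    double total = 0.0;

    for (int i = 0; i < runs; i++) {
        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);

        pid_t pid = fork();
        if (pid < 0)
            return -1.0;
        if (pid == 0) {              /* child: become the measured program */
            execvp(argv[0], argv);
            _exit(127);              /* only reached if exec failed */
        }

        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1.0;
        clock_gettime(CLOCK_MONOTONIC, &end);
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            return -1.0;

        total += (double)(end.tv_sec - start.tv_sec)
               + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    }
    return total / runs;
}
```

Usage: with char *cmd[] = {"./whirlpoolsum", "Big.txt", NULL};, mean_wall_time(cmd, 3) returns the average in seconds. Like time, this includes process start-up cost, which is small even relative to the sub-second Small.txt runs.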
******************** HOT FUNCTIONS ********************
Profiling with perf record shows the following hot functions:
84.91% lt-whirlpoolsum libwhirlpool.so.0.0.1 [.] processBuffer
15.09% lt-whirlpoolsum libwhirlpool.so.0.0.1 [.] whirlpool_add
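processBuffer is clearly the hot function. The two percentages add up to 100%, so they are self (exclusive) samples: whirlpool_add does call processBuffer, but time spent inside processBuffer is charged to processBuffer, which means the 15.09% attributed to whirlpool_add is work done in whirlpool_add's own code.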
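In the annotated assembly of processBuffer, the hottest instruction is an fmov, specifically FMOV (general): "floating-point move to or from general-purpose register". It copies the raw 64 bits of SIMD/FP register d1 into general-purpose register x1 without any conversion: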
6.15% │ fmov x1, d1
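To illustrate what this instruction does, here is a minimal C example (not code from processBuffer) that reinterprets a double's bits as a 64-bit integer. On AArch64 the argument arrives in d0 and the result is returned in x0, so an optimizing compiler can typically emit this function as a single fmov from a d register to an x register. Since Whirlpool uses only integer operations (table lookups and XOR), the fmov in processBuffer is presumably shuttling 64-bit integer data between register files rather than doing floating-point work; that is an inference worth confirming against the full annotated listing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the 64 bits of a double as an unsigned integer, with no
 * numeric conversion (assumes a 64-bit IEEE 754 double, as on AArch64).
 * memcpy is the portable way to do this; compilers usually turn it into
 * a single register-to-register move. */
uint64_t double_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}
```

For example, double_bits(1.0) yields 0x3FF0000000000000: the bit pattern is moved unchanged, which is exactly what fmov x1, d1 does with whatever 64 bits are in d1.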
To achieve any performance improvement, I will need to optimize this function.
The code contains no inline assembler and no separate assembler files, so hand-written assembler remains an option, as do vectorization and compiler flags. My plan of action is to try compiler options first, then edit the existing code to see whether anything can be removed or simplified, then vectorization, and finally assembler.
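Whirlpool mixes its 512-bit state (eight 64-bit words) with the round key by XOR, so loops of that shape are natural targets for both the compiler-flag and vectorization steps of the plan. The sketch below uses a hypothetical helper, xor_block, which is not taken from the whirlpool source. A fixed trip count, unit stride and restrict pointers (a promise that the arrays do not overlap) make it the kind of loop GCC may vectorize with NEON on AArch64 at -O3; GCC's -fopt-info-vec option reports which loops it actually vectorized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: XOR two 8-word (512-bit) blocks into dst.
 * `restrict` tells the compiler dst, a and b never alias, which removes
 * a common obstacle to auto-vectorization. */
void xor_block(uint64_t *restrict dst,
               const uint64_t *restrict a,
               const uint64_t *restrict b)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (int i = 0; i < 8; i++)
        dst[i] = a[i] ^ b[i];
}
```

Comparing the -fopt-info-vec output (or the annotated assembly) for a loop like this at different optimization levels is one way to check whether a flag change actually alters the code generated for processBuffer.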