Assignment: stage 2 - profiling

Hello everyone and welcome to my blog!

During this week we are moving to the second stage of our project. We are going to dig deeper into the program and perform so-called "profiling" of it. The requirements for this stage are as follows:
  1. Profile your software to identify where the software is spending its execution time and the amount of memory that the software is using. Perform this profiling using the test case you created in Stage 1, on both x86_64 and AArch64 platforms. Do not compare the absolute performance on different platforms, but do compare the relative performance on a given platform.
  2. Identify the functions or methods that are taking the majority of the CPU time.
  3. Report on your results.
As I mentioned in my previous post, for testing purposes I'm using Manjaro on a 64-bit AArch64 machine and Ubuntu on a 64-bit x86 machine. I also continue to use the same file with the same password credentials that I used in stage 1.

As far as I know (or rather, as our instructor told us during the lecture), there are two ways of profiling.

Option 1: Sampling

We can interrupt the execution of the program very frequently and sample it, that is, record which part of the code is running at that moment.

Option 2: Instrumentation

We can add some extra code to the software so that function calls and method calls are recorded.
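
To make the difference between the two options concrete, here is a minimal Python sketch (not how gprof or perf are implemented, just the same two ideas in miniature). The `instrumented` decorator adds recording code around each function, while the `sampler` thread periodically peeks at what the main thread is running. All names in it are made up for the illustration, and `sys._current_frames()` is a CPython-specific call.

```python
import collections
import functools
import sys
import threading
import time

call_counts = collections.Counter()  # instrumentation: exact number of calls
samples = collections.Counter()      # sampling: function seen running at each tick


def instrumented(func):
    """Instrumentation: wrap a function so that every call is recorded."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__name__] += 1
        return func(*args, **kwargs)
    return wrapper


def sampler(thread_id, stop_event, interval=0.001):
    """Sampling: periodically record which function the target thread is in."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            samples[frame.f_code.co_name] += 1
        time.sleep(interval)


@instrumented
def hot_loop(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


@instrumented
def cheap():
    return 1


def run_workload():
    stop = threading.Event()
    watcher = threading.Thread(target=sampler, args=(threading.get_ident(), stop))
    watcher.start()
    try:
        for _ in range(20):
            hot_loop(50_000)
            cheap()
    finally:
        stop.set()
        watcher.join()


run_workload()
print("calls recorded by instrumentation:", dict(call_counts))
print("most frequently sampled functions:", samples.most_common(3))
```

The call counts are exact, but they cost extra work on every call. The samples are only a statistical picture, and a function as short as `cheap` may never be caught by the sampler at all, which is the "we might miss something" trade-off.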

There are two tools we are going to use: gprof and perf.

Gprof uses instrumentation as well as sampling. It requires us to build a special version of the program, but on the other hand it gives us the whole picture of what's going on.

Perf uses only sampling. To use perf we don't need to make any changes to the program, so it's easier to use, but on the other hand we might miss something.

So first, I navigated into the src folder in the software folder and removed the previous build with the make clean command. Next, I ran ./configure CFLAGS="-g -Og -pg" to reconfigure the build with the following options: -g for debugging information, -Og for the optimization level, and -pg, which stands for profile generation and adds instrumentation to the code. In the next step, I built the executable with make -j16, which lets make run up to 16 jobs in parallel. Now we can actually run the program.

*Remember to remove john.pot (the result of the previous run) before each run
rm john.pot
./john crack.txt

I tried to set everything up for profiling with gprof, but unfortunately I didn't manage to make it work. I configured the build with each of the following options: ./configure CFLAGS="-g -Og -pg", ./configure CFLAGS="-g -O0", ./configure CFLAGS="-g -O3" (only the first of these includes -pg, which gprof needs), but whenever I ran and tested the software it did not generate gmon.out. Therefore, I decided to proceed with the other tool, perf.

  • First step: configure the build with the -Og option.
  • Second step: run perf stat -d ./john to get detailed stats and event counts.
  • Third step: run perf record -g ./john to record events for later reporting, and then perf report to break the events down by process, function, etc. You might need to run these commands with sudo.
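
The perf steps in the list above can be wrapped in a small script so that every run starts from the same state. This is only a sketch under my own assumptions: `perf_commands` and `run_profile` are hypothetical helper names, the file names are the ones used in this post, and `perf report --stdio` is used instead of the interactive report so the output can be saved as text.

```python
import os
import subprocess


def perf_commands(binary="./john", hash_file="crack.txt"):
    """Return the perf invocations from the steps above as argument lists."""
    return [
        ["perf", "stat", "-d", binary, hash_file],    # detailed event counts
        ["perf", "record", "-g", binary, hash_file],  # samples with call graphs
        ["perf", "report", "--stdio"],                # text report of the recording
    ]


def run_profile(run_dir=".", binary="./john", pot_file="john.pot", dry_run=False):
    """Run each step in run_dir; delete the pot file before every john run."""
    executed = []
    pot_path = os.path.join(run_dir, pot_file)
    for cmd in perf_commands(binary):
        if not dry_run:
            if binary in cmd and os.path.exists(pot_path):
                # With an old pot file john has nothing to do and prints
                # "No password hashes left to crack".
                os.remove(pot_path)
            subprocess.run(cmd, cwd=run_dir, check=True)
        executed.append(" ".join(cmd))
    return executed


if __name__ == "__main__":
    # Dry run: only print the commands, nothing is executed or deleted.
    for line in run_profile(dry_run=True):
        print(line)
```

Calling `run_profile()` without `dry_run=True` from the run folder would execute the three commands for real (prefix them with sudo yourself if perf needs it on your machine).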
So let's go!

Ubuntu x86 64-bit machine

-Og compiler option

perf stat -d ./john 

osakhnatska@ubuntu:~/Downloads/JohnTheRipper-bleeding-jumbo/run$ sudo perf stat -d ./john crack.txt

Using default input encoding: UTF-8
Loaded 1 password hash (sha512crypt, crypt(3) $6$ [SHA512 256/256 AVX2 4x])
No password hashes left to crack (see FAQ)

 Performance counter stats for './john crack.txt':

            139.29 msec task-clock                #    0.995 CPUs utilized       
                 3      context-switches          #    0.022 K/sec               
                 0      cpu-migrations            #    0.000 K/sec               
            43,273      page-faults               #    0.311 M/sec               
   <not supported>      cycles                                                   
   <not supported>      instructions                                             
   <not supported>      branches                                                 
   <not supported>      branch-misses                                             
                 0      L1-dcache-loads           #    0.000 K/sec               
                 0      L1-dcache-load-misses     #    0.00% of all L1-dcache hits
   <not supported>      LLC-loads                                                 
   <not supported>      LLC-load-misses                                           

       0.139991867 seconds time elapsed
       0.019838000 seconds user
       0.119031000 seconds sys

perf record -g ./john
perf report



Manjaro AArch64 64-bit machine

-Og compiler option

perf stat -d ./john 

Performance counter stats for './john crack.txt':

            408.36 msec task-clock:u              #    0.994 CPUs utilized       
                 0      context-switches:u        #    0.000 K/sec               
                 0      cpu-migrations:u          #    0.000 K/sec               
             43268      page-faults:u             #    0.106 M/sec               
         210420993      cycles:u                  #    0.515 GHz                 
         250117119      instructions:u            #    1.19  insn per cycle       
   <not supported>      branches:u                                               
            439753      branch-misses:u                                           
          77913075      L1-dcache-loads:u         #  190.795 M/sec               
           1056762      L1-dcache-load-misses:u   #    1.36% of all L1-dcache hits
   <not supported>      LLC-loads:u                                               
   <not supported>      LLC-load-misses:u                                         

       0.410771238 seconds time elapsed
       0.148565000 seconds user
       0.260993000 seconds sys
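
Since the assignment asks for relative numbers on a given platform, a few ratios can be derived from the two perf stat outputs above. Below is a rough Python sketch that parses a handful of lines copied from those outputs; `parse_perf_stat` and `sys_share` are hypothetical helpers written for this post, not part of perf.

```python
import re

# Lines copied from the Manjaro AArch64 `perf stat -d` output above.
AARCH64_STAT = """
         210420993      cycles:u                  #    0.515 GHz
         250117119      instructions:u            #    1.19  insn per cycle
   <not supported>      branches:u
          77913075      L1-dcache-loads:u         #  190.795 M/sec
           1056762      L1-dcache-load-misses:u   #    1.36% of all L1-dcache hits
       0.148565000 seconds user
       0.260993000 seconds sys
"""

# Timing lines from the Ubuntu x86_64 run (its cycle counters were not supported).
X86_64_STAT = """
       0.019838000 seconds user
       0.119031000 seconds sys
"""


def parse_perf_stat(text):
    """Map counter names to numbers; unsupported counters are skipped."""
    values = {}
    for line in text.splitlines():
        seconds = re.match(r"\s*([\d.]+)\s+seconds\s+(\w+)", line)
        if seconds:
            values[seconds.group(2)] = float(seconds.group(1))
            continue
        counter = re.match(r"\s*([\d,.]+)\s+([\w-]+)", line)
        if counter:
            values[counter.group(2)] = float(counter.group(1).replace(",", ""))
    return values


def sys_share(stats):
    """Fraction of CPU time spent in the kernel (sys) rather than in user code."""
    return stats["sys"] / (stats["user"] + stats["sys"])


arm = parse_perf_stat(AARCH64_STAT)
x86 = parse_perf_stat(X86_64_STAT)

ipc = arm["instructions"] / arm["cycles"]
l1_miss_rate = arm["L1-dcache-load-misses"] / arm["L1-dcache-loads"]

print(f"AArch64 instructions per cycle: {ipc:.2f}")
print(f"AArch64 L1 data cache miss rate: {l1_miss_rate:.2%}")
print(f"AArch64 share of CPU time in sys: {sys_share(arm):.1%}")
print(f"x86_64 share of CPU time in sys: {sys_share(x86):.1%}")
```

The first two values reproduce what perf already prints in its right-hand column (1.19 instructions per cycle and a 1.36% miss rate). The sys share is not printed by perf: for these two runs it comes out at roughly 86% on Ubuntu and 64% on Manjaro.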

perf record -g ./john
perf report


Conclusions

Above we can see screenshots of the reports generated by the perf tool. They show the most heavily used functions in the program, the percentage of time spent in each of them, and which file they are a part of. The following things immediately caught my eye when I looked at the profiling results.

First of all, the user, sys and elapsed times were significantly longer on Manjaro than on Ubuntu. Moreover, if we look more closely at the reports for Manjaro and Ubuntu, we can see some interesting information. Most of the time was spent in SIMDSHA512body: on Ubuntu the program spent around 90% of its time in this function, and on Manjaro around 95%.
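
Because a single function takes roughly 90% to 95% of the time, it is worth estimating how much the whole run could gain if only that function were made faster. A small Python sketch using Amdahl's law is below; the fractions are the approximate ones from the reports, while the speedup factors 1.5 and 2.0 are made-up illustration values, not measurements.

```python
def overall_speedup(hot_fraction, hot_speedup):
    """Amdahl's law: whole-program speedup when only the hot part gets faster."""
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / hot_speedup)


# Approximate share of time spent in SIMDSHA512body on each platform.
platforms = [("Ubuntu x86_64", 0.90), ("Manjaro AArch64", 0.95)]

for name, fraction in platforms:
    for factor in (1.5, 2.0):  # hypothetical speedups of the hot function
        total = overall_speedup(fraction, factor)
        print(f"{name}: hot function {factor}x faster -> {total:.2f}x overall")
    # Upper bound: even an infinitely fast hot function leaves the rest of the code.
    print(f"{name}: upper bound {1.0 / (1.0 - fraction):.0f}x overall")
```

The higher the share of the hot function, the closer the overall gain gets to the gain in that function, which is why it makes sense to focus on it.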

When we annotate SIMDSHA512body in the report on Ubuntu, we can see that the instruction that took the most time was the following:

0.63 │        mov        %esi,%ecx
- this instruction executes as soon as SIMDSHA512body starts

When we annotate SIMDSHA512body in the report on Manjaro, we can see that the instruction that took the most time was the following:

           |    SHA512_STEP(a, b, c, d, e, f, g, h, 16, 0xe49b69c19ef14ad2ULL);
  0.27  |        mov        w13, #0x0                       // #0               

To sum everything up, I feel that John the Ripper is an enormously compute-intensive program. Even with my test case, which is very simple (a six-letter password, "alexas"), the program runs for about 2 minutes on Ubuntu and around 4 on Manjaro. When I annotated the reports for both Ubuntu and Manjaro, I saw an enormous amount of code, and every entry showed only about 0.01-0.2, 0.5 at most. However, when I compared some of them, I noticed that the same functions take longer on Manjaro, even if the difference is only around 0.1-0.2.

I ran the program several times, and each time the performance report showed this line. In general, I can also see that the program makes a lot of calls to SHA512_STEP, which as I understand it is part of computing the SHA-512 hash of each candidate password (a hash cannot be decrypted, so the cracker hashes guesses and compares the results). Secure Hash Algorithms, also known as SHA, are a family of cryptographic hash functions designed to keep data secure. They work by transforming the data using a hash function: an algorithm that consists of bitwise operations, modular additions, and compression functions. As for stage 3, I will probably look deeper into SHA512_STEP, as it seems to be the most intensive for Manjaro.
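
To show why hashing dominates the run time, here is a minimal Python sketch of the guess-and-compare idea. It uses plain SHA-512 from the standard hashlib module, which is a simplification: the sha512crypt format in my test adds a salt and many repeated rounds on top of it, so every guess is far more expensive there. The `crack` function and the tiny word list are made up for the illustration and have nothing to do with John the Ripper's real code.

```python
import hashlib


def sha512_hex(password):
    """One-way transformation: password in, 128 hex characters out."""
    return hashlib.sha512(password.encode("utf-8")).hexdigest()


def crack(target_hex, candidates):
    """Hash every candidate and compare; return (match, hashes computed)."""
    tried = 0
    for candidate in candidates:
        tried += 1
        if sha512_hex(candidate) == target_hex:
            return candidate, tried
    return None, tried


target = sha512_hex("alexas")  # the password from my test case
wordlist = ["123456", "password", "alexa", "alexas", "qwerty"]  # hypothetical

found, tried = crack(target, wordlist)
print(f"found {found!r} after computing {tried} hashes")
```

Every guess costs at least one full hash computation, so with millions of candidates almost all of the time ends up inside the hashing code, which matches what the perf reports show.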
