Assignment: stage 3 - optimization

Hello everyone and welcome to my blog!

During this week we are moving to the third stage of our project. We are going to dig more deeply into the program and perform so called "optimization" of the program. If you haven't yet checked the posts for project stage 1 and 2, I would highly recommend to first have a look on those.

Links are here: https://alexspo600.blogspot.com/2020/04/assignment-stage-1-building-and.html, https://alexspo600.blogspot.com/2020/04/assignment-stage-2-profiling.html

The requirements for this stage are as following:
  1. Examine the code of the functions and methods that are taking the most CPU time.
  2. Identify possible approaches to improving the performance of the software. Which changes to data types, software organization, algorithm decisions, or other aspects of the software could improve performance?
  3. Defend your approaches either by testing them or by building a case to support your approach(es).
  4. Report on your results.So as I mentioned in my previous post, for testing purposes I'm using Manjaro OS on Aarch64 64 bit and Ubuntu OS on x86 64 bit. Also, I continue to use the same file with the same password credentials for testing, as I used in stage 1 and 2.

So as I mentioned in my previous post, for testing purposes I'm using Manjaro OS on Aarch64 64 bit and Ubuntu OS on x86 64 bit. Also, I continue to use the same file with the same password credentials for testing, as I used in stage 1. The file containing I'm using for testing of software is called crack.txt and it contains the "key:value" pair of the username:hash of the password. In my system the password was hashed using SHA algorithm and hence when the password cracker runs, the program uses function for cracking SHA type of hash. Therefore, I decided to look deeper into SHA512_STEP, which is located in SIMDSHA512body. SHA512_STEP as I understand is responsible for decryption of SHA type hash. Secure Hash Algorithms, also known as SHA, are a family of cryptographic functions designed to keep data secured. It works by transforming the data using a hash function: an algorithm that consists of bitwise operations, modular additions, and compression functions.

As our instructor, Chris Tyler, told us in the lecture there are different approaches and ways optimization can be achieved. 

Portable Optimization - works on multiple architectures.
  1. Change the algorithm - use a better sorting/searching algorithm etc.
  2. Change the implementation - use a better data structure, data type etc.
  3. Change the build - change build options and/or code structure to enable optimization such as autovectorization. 
Platform Specific Optimization - works on a specific architecture.  
  1. Build-time Code Selection - use preprocessor directives to select an architecture-specific optimization. 
  2. Run-time Code Selection - use processor feature flags to select implementation-specific optimization. 
Some background info about our software

John the Ripper is a free password cracking software tool. It's pretty old. Originally developed for the Unix operating system, it can run on fifteen different platforms (eleven of which are architecture-specific versions of Unix, DOS, Win32, BeOS, and OpenVMS). It is among the most frequently used password testing and breaking programs as it combines a number of password crackers into one package, autodetects password hash types, and includes a customizable cracker. It can be run against various encrypted password formats including several crypt password hash types most commonly found on various Unix versions (based on DES, MD5, or Blowfish), Kerberos AFS, and Windows NT/2000/XP/2003 LM hash. 

Taking all the information into account, that the software is pretty old,  I feel like this particular tool it's already optimized pretty well, and uses the best algorithms and data structures and so on possible. Hence, as I understand, it will be very hard to find a way to optimize software using portable optimization approach. I believe that unfortunately it is not possible to get all of the performance of this software out of my platform. Hence, I'm going to look into what platform specific approaches are available and what preprocessing methods could be applied for my project.

As I stated in my previous article, the program spent the most of its time on SIMDSHA512body function - 95% of it's time on Manjaro, and 90% on Ubuntu. After doing some research and digging through John the Ripper source files I found the file that contains code for SIMDSHA512body and it makes a lot of calls to SHA512_STEP, which asThis code can be found here which is simd-intrinsics.c under src folder.

Here is the part of assembly code that we can see has the highest % of time spent, after I annotated SIMDSHA512body function. I can see in the list of assembly instructions a lot of calls to SHA512_STEP.

           |         SHA512_STEP(a, b, c, d, e, f, g, h, 16, 0xe49b69c19ef14ad2ULL);
  0.27  |               mov        w13, #0x0                       // #0  -      Move register to register.
           | 874        cbz          w13, 2720                                -      Compare and Branch on Zero
Suggestions:

1) While performing benchmarking and profiling, I noticed the increase in performance and positive effect of application of certain compiler feature flags, in particular, -O3 option, which is responsible for optimization more for code size and execution time. In addition, this flag also conducts full optimization as in -O2; also uses more aggressive automatic inlining of subprograms within a unit (Inlining of Subprograms) and attempts to vectorize loops.

2) While searching through the list of platform dependant flags that are available on my aarch64 platform, I noticed that there were no flags defined in our software that would determine which architecture it's working on. So another suggestion would be to define architecture specific flags, write architecture specific code, that would determine which architecture we are currently on and perform work differently.

3) While investigating John's Makefile I noticed that build is also not platform specific. In particular, for example multiple source code files could be used in order to choose platform specific code to execute. As I run a command to find assembly files I saw the following list of files:

[alexa@bumblebee run]$ find ../../ -name '*.[Ss]'
../../JohnTheRipper-bleeding-jumbo/src/x86.S
../../JohnTheRipper-bleeding-jumbo/src/aes/aesni/asm/x64/do_rdtsc.s
../../JohnTheRipper-bleeding-jumbo/src/aes/aesni/asm/x64/iaesx64.s
../../JohnTheRipper-bleeding-jumbo/src/aes/aesni/asm/x86/do_rdtsc.s
../../JohnTheRipper-bleeding-jumbo/src/aes/aesni/asm/x86/iaesx86.s
../../JohnTheRipper-bleeding-jumbo/src/x86-mmx.S
../../JohnTheRipper-bleeding-jumbo/src/alpha.S
../../JohnTheRipper-bleeding-jumbo/src/x86-64.S
../../JohnTheRipper-bleeding-jumbo/src/x86-sse.S

We can see that there folders and files with name x86, hence I assume that either there is just no option to choose my platform or there is no option to choose any platform. Anyways, I think it would be beneficial if we introduce such ability.

Conclusions:

To sum everything up, I feel that it was a great learning opportunity and pleasant experience I had playing with John the Ripper and both my virtual Ubuntu machine and Raspberry Pi4 with Manjaro. I learned how to look on the software from a different perspective. I developed thinking from the performance point of view. I gained some valuable experience of building and configuring Raspberry Pi 4. Also, I learned about different optimization approaches, which ones should be applied in different cases, and how they could be implemented.



References:

https://brilliant.org/wiki/secure-hashing-algorithms/

https://github.com/magnumripper/JohnTheRipper/tree/bleeding-jumbo/src

https://wiki.cdot.senecacollege.ca/wiki/Winter_2020_SPO600_Project

Comments