Stage 3 Optimization (COMPUTER ARCHITECTURE ENDIANNESS)

Seeing as the compiler flags did not provide any optimization, I will move on to my next attempt, which is converting big endian to little endian.

The aarch64 systems I am working on use little-endian byte ordering. However, Whirlpool keeps all of its data in big-endian format, so hopefully we can get some optimization here.


Architecture: aarch64

Byte Order: Little Endian


# DOCUMENTATION INSIDE Whirlpool

/*
 * Though Whirlpool is endianness-neutral, the encryption tables are listed
 * in BIG-ENDIAN format, which is adopted throughout this implementation
 * (but little-endian notation would be equally suitable if consistently
 * employed).
 */



I will do a quick rundown of what Big and Little Endian are and what I hope to achieve by converting between them.


When a computer reads a multi-byte value from memory, the bytes can be ordered in one of two ways, Little Endian or Big Endian; think of it as reading right to left or left to right.

In Big Endian format the biggest, or most significant, byte comes first (at the lowest address), whereas in Little Endian format the least significant byte comes first.

Some architectures are Big Endian and some are in Little Endian.

Just like how there are many languages in the world, different architectures were developed in the past, and as technology advanced and machines began to communicate over networks, machines with the two different byte orders had to talk to each other.

Please note that the order of the bits inside a single byte is not affected by endianness; this is the same across machines. It is only when dealing with multiple bytes that the order can be interpreted differently.


If we have the 4-byte value 0x01234567, its bytes are laid out in memory (lowest address first) as follows:


Big Endian: 01 23 45 67

Little Endian: 67 45 23 01
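To see the layout above on a real machine, here is a minimal sketch in C. The function names host_is_little_endian and print_memory_order are my own for illustration and are not part of Whirlpool.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);   /* byte stored at the lowest address */
    return first_byte == 1;
}

/* Prints the bytes of a 32-bit value in the order they sit in memory. */
static void print_memory_order(uint32_t value)
{
    unsigned char bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);
    for (size_t i = 0; i < sizeof value; i++)
        printf("%02x ", bytes[i]);
    printf("\n");
}
```

Calling print_memory_order(0x01234567) on a little-endian host should print the bytes as 67 45 23 01, and on a big-endian host as 01 23 45 67.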


So currently all of our table data is stored in Big Endian, and we could potentially save the processor some time at run time by converting what we need to read from memory into Little Endian. At least this is the goal; we will see how successful we are, as I have never done this before.


********

Searching for algorithms to perform byte swapping, I came across an assembler instruction called BSWAP, which is great and is exactly what I am looking for. The only problem is that it is an x86 instruction, and the example I found applies it to a 32-bit register using MSVC-style inline assembly, so I cannot use it as is in my C program on aarch64.


/* x86-only, MSVC-style inline assembly: reverses the bytes of a 32-bit value */
uint32 cq_ntohl(uint32 a) {
    __asm {
        mov eax, a;
        bswap eax;
    }
}
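For comparison, GCC and Clang offer byte-swap builtins that work from plain C on any target, including aarch64, where the compiler will typically turn them into a single byte-reverse instruction. This is only a sketch of that alternative, not what I ended up using; the wrapper names swap32_builtin and swap64_builtin are made up for illustration.

```c
#include <stdint.h>

/* Thin wrappers around the GCC/Clang byte-swap builtins (not available in MSVC). */
static uint32_t swap32_builtin(uint32_t v)
{
    return __builtin_bswap32(v);
}

static uint64_t swap64_builtin(uint64_t v)
{
    return __builtin_bswap64(v);
}
```

Because the builtins are part of the compiler, no inline assembly or extra header is needed.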


**********


So continuing my search I have found this algorithm to swap bytes from big to little endian as follows:


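#include <stdint.h>

/* Reverses the byte order of a 64-bit value in three passes. */
int64_t swap_int64( int64_t val )
{
    /* swap adjacent bytes */
    val = ((val << 8) & 0xFF00FF00FF00FF00ULL ) | ((val >> 8) & 0x00FF00FF00FF00FFULL );
    /* swap adjacent 16-bit pairs */
    val = ((val << 16) & 0xFFFF0000FFFF0000ULL ) | ((val >> 16) & 0x0000FFFF0000FFFFULL );
    /* swap the two 32-bit halves */
    return (val << 32) | ((val >> 32) & 0xFFFFFFFFULL);
}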
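One caveat with the signed version: in C, left-shifting a negative signed value is undefined behaviour, and right-shifting one is implementation-defined. A safer sketch of the same three-pass swap uses an unsigned type, which is probably also a closer match for 64-bit table entries. The name swap_uint64 is my own choice here.

```c
#include <stdint.h>

/* Same three-pass byte reversal, on an unsigned 64-bit value. */
static uint64_t swap_uint64(uint64_t val)
{
    /* swap adjacent bytes */
    val = ((val << 8)  & 0xFF00FF00FF00FF00ULL) | ((val >> 8)  & 0x00FF00FF00FF00FFULL);
    /* swap adjacent 16-bit pairs */
    val = ((val << 16) & 0xFFFF0000FFFF0000ULL) | ((val >> 16) & 0x0000FFFF0000FFFFULL);
    /* swap the two 32-bit halves; no mask needed because the shift is logical */
    return (val << 32) | (val >> 32);
}
```

For example, swap_uint64(0x0123456789ABCDEFULL) gives 0xEFCDAB8967452301, and applying the function twice returns the original value.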


I will add this function to the code and call it early on to convert all the arrays that processBuffer references (the c[$] arrays) from big to little endian.
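The conversion pass I have in mind looks roughly like the sketch below. The names swap_bytes64 and swap_table are hypothetical, and I am using the GCC/Clang builtin as a stand-in for the swap routine so the example is self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the swap routine; __builtin_bswap64 is a GCC/Clang builtin. */
static uint64_t swap_bytes64(uint64_t v)
{
    return __builtin_bswap64(v);
}

/* Byte-swaps every entry of a 64-bit table in place (hypothetical helper). */
static void swap_table(uint64_t *table, size_t count)
{
    for (size_t i = 0; i < count; i++)
        table[i] = swap_bytes64(table[i]);
}
```

The pass has to run exactly once before any hashing starts, since running it a second time swaps everything back. If a table is declared const, it would need to be copied into writable memory first.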

I am hoping this will create a performance boost.


**************Results:


So the execution time is down by about 5.35%; however, the output is not the same as the original output, so I cannot deem this a successful optimization.


ccharlie medium original -> real 7m10.935s

ccharlie medium new -> real 6m47.322s


aarchie big original -> real 11m40.245s

aarchie big new -> real 11m04.421s
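The per-machine reduction can be worked out from the timings above with a small helper. This is just a sketch of the arithmetic; to_seconds and percent_reduction are names I made up.

```c
#include <stdio.h>

/* Converts a `time`-style reading (minutes plus seconds) into seconds. */
static double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Percentage by which the run time dropped going from `before` to `after`. */
static double percent_reduction(double before, double after)
{
    return (before - after) / before * 100.0;
}

/* Prints the reduction for both pairs of timings listed above. */
static void print_reductions(void)
{
    printf("medium: %.2f%%\n",
           percent_reduction(to_seconds(7, 10.935), to_seconds(6, 47.322)));
    printf("big:    %.2f%%\n",
           percent_reduction(to_seconds(11, 40.245), to_seconds(11, 4.421)));
}
```

With these timings the two runs come out at roughly 5.5% and 5.1%, in the same ballpark as the figure quoted above.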


OUTPUT:

ORIGINAL

[jadach@ccharlie src]$ ./whirlpoolsum ../../test/small.txt

5946053a0e99486fc2bd707169f5104d2cb9c1a84155f9e254e512d633571ef211a1d53ae8bf0113646c71ae29a0e8dc03076bd5fc65d96a37bad4457f8ebbe1 ../../test/small.txt


NEW

[jadach@ccharlie src]$ ./whirlpoolsum ../../test/small.txt

b462b7706b9d184c25327db1f0bd9adc388262664cf94517613621edcf4e957f13764bc137556b7f4c0896c8ecf31b314503727ddea4fc09200944059830f935 ../../test/small.txt
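One possible explanation for the mismatch, offered as an assumption because the body of processBuffer is not shown in this post: the documentation quoted earlier calls Whirlpool endianness-neutral, which usually means 64-bit words are assembled from bytes with shifts rather than read straight out of memory. If that is the case, the table entries are ordinary numbers, and byte-swapping them changes the values being combined rather than how they are read. The helper name load_be64 below is hypothetical.

```c
#include <stdint.h>

/*
 * Assembles a 64-bit word from 8 bytes, most significant byte first.
 * Shifts operate on values, not on memory layout, so the result is the
 * same number on little-endian and big-endian hosts.
 */
static uint64_t load_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}
```

Given the bytes 01 02 03 04 05 06 07 08, this returns 0x0102030405060708 on any host, so code written this way has nothing left for a byte swap to speed up.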


During testing there was only 1 user on the system (myself) and the system load was very low, meaning almost all resources were at my disposal.

*****************************
