
SIMD, Inline Assembly Language, Vectors

Updated: Oct 12, 2018

SIMD stands for Single Instruction, Multiple Data. One instruction operates on several data elements at once, which our C code can use to potentially speed up a program. For example, if a loop multiplies every element of an array by the same number, the operation never changes from one iteration to the next; only the input element does. SIMD instructions, available at the assembly-language level, let us apply that single multiply to several array elements in one step.


On AArch64 (64-bit ARM), each SIMD register holds 128 bits; used at its full width it is called a Q (quadword) register. We can split this 128-bit register into smaller, equal-sized lanes and treat it as a vector, written for example as ( V0.4s ). Here V0 names vector register 0, and the arrangement specifier ".4s" says the register holds 4 lanes, where "s" (single word) means each lane is 32 bits wide. Other letters select other lane widths: ".2d" gives 2 lanes of 64 bits, ".8h" gives 8 lanes of 16 bits, and ".16b" gives 16 lanes of 8 bits.

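Because one instruction processes every lane of the register, splitting a register into lanes lets us perform several calculations at the same time; for example, a single multiply on a .4s vector handles four 32-bit values.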


====================================

WRITING ASSEMBLER IN C

====================================


To add a section of code written in assembly language to our C program, we introduce it with one of the two keywords below:


asm() // First way

__asm__() // Second way


Next, after the instruction string, we describe the operands: which C variables the assembly writes (outputs), which C values it reads (inputs), and which extra registers it overwrites (clobbers). So we will have a format like the one shown below:


Assembler template (the instructions, as a string) : Output operands : Input operands : Clobbers


Note: Each field is separated by a colon (:).


The Clobbers section tells the C compiler which registers our assembly code overwrites beyond the listed outputs, so the compiler does not keep values it still needs in those registers across our code. We can also pass "memory" in the Clobbers section. This warns the compiler that our code may read or write memory that is not listed as an operand, so any data in memory before our code executes might be different once it has executed; the compiler then stops relying on copies of that data cached in registers.


