We will be working with assembly language and registers in this week's lab.
We were given three different Hello World programs written in C. Each one prints the message a different way (with printf(), with write(), and with a direct syscall()) as follows:
--------------------------------------------------------------------------------------------
hello.c
/* Hello World in traditional C using printf() */
#include <stdio.h>
int main() {
    printf("Hello World!\n");
}
--------------------------------------------------------------------------------------------
hello2.c
/* Hello World with a direct write to stdout (file descriptor 1) */
#include <unistd.h>
int main() {
    write(1,"Hello World!\n",13);
}
--------------------------------------------------------------------------------------------
hello3.c
/* Hello World using a direct kernel system call to write to
file descriptor 1 (stdout) */
#include <unistd.h>
#include <sys/syscall.h>
int main() {
    syscall(__NR_write,1,"Hello World!\n",13);
}
--------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------
After compiling all three programs, I looked at the disassembly of each executable using objdump -d.
Inside the <main> section of each dump we can see the differences between the three programs:
--------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------
hello
00000000004004e6 <main>:
4004e6: 55 push %rbp
4004e7: 48 89 e5 mov %rsp,%rbp
4004ea: bf 80 05 40 00 mov $0x400580,%edi
4004ef: b8 00 00 00 00 mov $0x0,%eax
4004f4: e8 f7 fe ff ff callq 4003f0 <printf@plt>
4004f9: b8 00 00 00 00 mov $0x0,%eax
4004fe: 5d pop %rbp
4004ff: c3 retq
--------------------------------------------------------------------------------------------
hello2
00000000004004e6 <main>:
4004e6: 55 push %rbp
4004e7: 48 89 e5 mov %rsp,%rbp
4004ea: ba 0d 00 00 00 mov $0xd,%edx
4004ef: be 90 05 40 00 mov $0x400590,%esi
4004f4: bf 01 00 00 00 mov $0x1,%edi
4004f9: e8 f2 fe ff ff callq 4003f0 <write@plt>
4004fe: b8 00 00 00 00 mov $0x0,%eax
400503: 5d pop %rbp
400504: c3 retq
400505: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40050c: 00 00 00
40050f: 90 nop
--------------------------------------------------------------------------------------------
hello3
00000000004004e6 <main>:
4004e6: 55 push %rbp
4004e7: 48 89 e5 mov %rsp,%rbp
4004ea: b9 0d 00 00 00 mov $0xd,%ecx
4004ef: ba 90 05 40 00 mov $0x400590,%edx
4004f4: be 01 00 00 00 mov $0x1,%esi
4004f9: bf 01 00 00 00 mov $0x1,%edi
4004fe: b8 00 00 00 00 mov $0x0,%eax
400503: e8 e8 fe ff ff callq 4003f0 <syscall@plt>
400508: b8 00 00 00 00 mov $0x0,%eax
40050d: 5d pop %rbp
40050e: c3 retq
40050f: 90 nop
--------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------
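Right away we can see that main gets longer as we move from printf to write to syscall: 8 instructions for printf, 9 for write, and 11 for syscall (12 if we count the nop padding that follows retq in hello3). The extra instructions are all argument setup, one mov per argument. This count only covers main, though. printf does far more work inside the C library before the text reaches the kernel, so a longer main does not mean more work for the CPU overall.
The write and syscall versions differ mainly in which registers carry the arguments. The x86_64 calling convention passes the first four integer arguments in %edi, %esi, %edx, and %ecx. write() takes three arguments (file descriptor, buffer, length), while syscall() takes the system call number first (__NR_write, which is 1), so the other three move along by one register and one extra mov is needed. Both versions set %eax to 0 after the call, which is the return value of main. The syscall version also sets %eax to 0 before the call, just as the printf version does: both are variadic functions, and the calling convention uses %al to tell a variadic callee how many vector registers hold arguments.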
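To make the register shift concrete, here is a minimal C sketch that wraps the two calls side by side. The function names hello_via_write and hello_via_syscall are made up for this post; the calls themselves are the same ones used in hello2.c and hello3.c. It assumes Linux with glibc and the default gcc settings.

```c
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Three arguments: file descriptor, buffer, length. */
long hello_via_write(const char *msg) {
    return (long)write(1, msg, strlen(msg));
}

/* Four arguments: the system call number comes first,
   followed by the same three arguments that write() takes. */
long hello_via_syscall(const char *msg) {
    return syscall(__NR_write, 1, msg, strlen(msg));
}
```

On success each function returns the number of bytes written. For "Hello World!\n" that should be 13, the same 0xd that is loaded into %edx in hello2 and into %ecx in hello3.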
Moving on to the assembly part of the lab, we will build the assembly language versions of the program and compare their source code as well as their objdump -d output.
The two assembly programs are written for two different assemblers, GAS and NASM. GAS is the GNU assembler; it uses AT&T syntax, where the source operand comes first and the destination last (movq $1,%rax). NASM uses Intel syntax, where the destination comes first (mov rax,1). NASM's destination-first order is closer to AArch64 assembly, which we will see later on.
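The operand order can also be seen from C, because GCC's inline assembly uses AT&T syntax by default. The sketch below is only an illustration: copy_with_mov is a made-up name, the inline asm applies to x86_64 builds only, and other architectures fall back to plain C.

```c
#include <stdint.h>

/* Copy src into dst with a single mov instruction.
   AT&T order: source first, destination second.
   NASM would write the same instruction as: mov dst, src */
uint64_t copy_with_mov(uint64_t src) {
    uint64_t dst;
#if defined(__x86_64__)
    __asm__("movq %1, %0" : "=r"(dst) : "r"(src));
#else
    dst = src; /* plain C fallback on other architectures */
#endif
    return dst;
}
```

In the asm template %1 is the input (src) and %0 is the output (dst), so the template reads "movq src, dst", the same left-to-right order used in hello-gas.s.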
Below is the code for each assembler syntax:
--------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------
hello-gas.s
.text
.globl _start
_start:
    movq $len,%rdx      /* message length */
    movq $msg,%rsi      /* message location */
    movq $1,%rdi        /* file descriptor stdout */
    movq $1,%rax        /* syscall sys_write */
    syscall
    movq $0,%rdi        /* exit status */
    movq $60,%rax       /* syscall sys_exit */
    syscall
.section .rodata
msg: .ascii "Hello, world!\n"
len = . - msg
--------------------------------------------------------------------------------------------
hello-nasm.s
section .text
global _start
_start:
    mov rdx,len     ; message length
    mov rcx,msg     ; message location
    mov rbx,1       ; file descriptor stdout
    mov rax,4       ; syscall sys_write
    int 0x80
    mov rax,1       ; syscall sys_exit
    int 0x80
section .rodata
msg db 'Hello, world!',0xa
len equ $ - msg
--------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------
Next we will look at the objdump -d output for the hello-gas and hello-nasm executables.
--------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------
hello-gas: file format elf64-x86-64
Disassembly of section .text:
0000000000400078 <_start>:
400078: 48 c7 c2 0e 00 00 00 mov $0xe,%rdx // Buffer Length
40007f: 48 c7 c6 a6 00 40 00 mov $0x4000a6,%rsi // Address of buffer
400086: 48 c7 c7 01 00 00 00 mov $0x1,%rdi // File descriptor
40008d: 48 c7 c0 01 00 00 00 mov $0x1,%rax // ID call argument
400094: 0f 05 syscall // syscall invoke
400096: 48 c7 c7 00 00 00 00 mov $0x0,%rdi // prepare to exit
40009d: 48 c7 c0 3c 00 00 00 mov $0x3c,%rax
4000a4: 0f 05 syscall // exit with syscall
--------------------------------------------------------------------------------------------
hello-nasm: file format elf64-x86-64
Disassembly of section .text:
0000000000400080 <_start>:
400080: ba 0e 00 00 00 mov $0xe,%edx
400085: 48 b9 a4 00 40 00 00 movabs $0x4000a4,%rcx
40008c: 00 00 00
40008f: bb 01 00 00 00 mov $0x1,%ebx
400094: b8 04 00 00 00 mov $0x4,%eax
400099: cd 80 int $0x80
40009b: b8 01 00 00 00 mov $0x1,%eax
4000a0: cd 80 int $0x80
--------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------
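The two disassemblies have the same shape (load the arguments, invoke write, invoke exit), but they use different kernel interfaces, and that is a choice made by each program rather than a property of the assembler syntax. hello-gas uses the 64-bit syscall instruction, with the system call number in %rax (1 for write, 0x3c = 60 for exit) and the arguments in %rdi, %rsi, and %rdx. hello-nasm uses int 0x80, the older 32-bit interface, with the number in %eax (4 for write, 1 for exit) and the arguments in %ebx, %ecx, and %edx. hello-nasm also never loads an exit status, so %ebx still holds 1 when exit is invoked. The encodings differ as well: GAS emitted a 7-byte mov for every movq, while NASM shortened most of its moves to the 5-byte 32-bit form and used a 10-byte movabs for the message address.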
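The 64-bit system call numbers can be checked from C, since <sys/syscall.h> exposes the numbers for whatever architecture the program is compiled for. This is a small sketch with made-up function names; it assumes Linux, and the values 1 and 60 only apply to an x86_64 build.

```c
#include <stdio.h>
#include <sys/syscall.h>

/* Number the kernel expects for write on this architecture. */
long write_syscall_number(void) { return __NR_write; }

/* Number the kernel expects for exit on this architecture. */
long exit_syscall_number(void) { return __NR_exit; }

/* Print both numbers in decimal and hex for comparison with objdump. */
void print_syscall_numbers(void) {
    printf("write: %ld (0x%lx)\n", write_syscall_number(),
           (unsigned long)write_syscall_number());
    printf("exit:  %ld (0x%lx)\n", exit_syscall_number(),
           (unsigned long)exit_syscall_number());
}
```

On an x86_64 build this should report 1 for write and 60 (0x3c) for exit, matching the values loaded into %rax in hello-gas. The numbers 4 and 1 used by hello-nasm come from the separate 32-bit table that int 0x80 uses.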