Assembly – Hello World

While trying to learn some basic assembly, I often couldn’t find content that quickly shows the main idea behind the code. This segment is meant to fill that gap.

Keep in mind that I don’t write assembly professionally; these are only snippets that worked for me and are written purely for educational purposes. This example worked on an x86 architecture and was built using the default assembler (as) and linker (ld) that come with Linux [GAS Syntax].

First of all, try to run the code below and check the output:

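.data
message:
    .ascii "Hello World\n"   # 12 bytes, no terminating zero byte
.text
.global _start
_start:
    mov $4, %eax             # system call 4 = WRITE
    mov $1, %ebx             # file descriptor 1 = standard output
    mov $message, %ecx       # address of the text to write
    mov $12, %edx            # number of bytes to write
    int $0x80                # hand over to the kernel
    mov $1, %eax             # system call 1 = EXIT
    mov $0, %ebx             # exit status 0 (success)
    int $0x80                # hand over to the kernel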

file: hello.s

  1. Write the above code in a text editor and save it (in my example, hello.s)
  2. Open your terminal and go to the directory where you saved the file
  3. Assemble the code: user$ as hello.s -o hello.o
  4. Link the object file: user$ ld hello.o -o hello
  5. Run the program and check the output: user$ ./hello

As a result, you should see the glorious output Hello World. Let’s now explain, as simply as possible, all the new things we saw here.

About the code structure

You will often see the structure:

.data
[...]
.text
.global .......
[...]

These lines (the ones starting with a dot) are directives for the assembler; mostly they describe where in the program the following code or data belongs. Every program we execute is given a part of memory, and we divide that region into sub-sections, placing content with a similar purpose together, which makes things easier for both the OS and the programmer. The image compares our example with the typical memory layout of a program:


Sections in code

Their purpose is to separate the different kinds of content in a program. In our example, .data marks the start of the section holding data that the program uses later. Declaring data there is not so different from other languages. In C, for example, you would write something like:

int speed = 40;

The general form is [type] [name] = [value]; and in assembly it looks only slightly different:

message:
.ascii "Hello World\n"

[name]:
.[type] [value]

Constants in as assembler

.text is where the “real” program starts: this section holds the instructions that actually get executed. The last one, .global, is a directive that makes a label global, meaning that this place in the program is visible from outside the file, to the linker and to other object files. You will almost always see it at least once in a program, because ld looks for the _start label as the program’s entry point and can only find it if it is global. To give you a better feeling for this directive, let’s say I have a program written in C (a calculator, for example) and want to use a subroutine written in assembly (just the addition of two values). I can do that by linking the object files together, but only if the adding function, that is, its label in the assembly code, is marked as global.


The fun part

We now have our code and a basic idea of its structure. Our message is already in the .data section, and we somehow need to output it for the user. As you can see, the main part of the magic happens under the global label _start. Like message, the label _start is only a name for a location, in this case a location in the code. In the image above, you could see that the code under _start is divided, not only visually, into two parts. Each part makes one system call.

The first one is WRITE and the second one is EXIT. Of course, neither name appears in the code: system calls are invoked by number. In assembly we have to state which system call should run and with which parameters. For example, when we say that we want to run WRITE, we also have to say where to write, what exactly, and how much. All of this information is placed in registers by the mov instructions in the code above.

One instruction repeats most often: mov. As you might expect, it is short for move: it copies a value from a source into a destination (in GAS syntax the source comes first and the destination second). In our example, we take numbers (remember that everything is binary) and place them in our 32-bit registers. You can read more about registers and the assembly syntax here.

The function calls

The WRITE system call takes five lines of code:

  • mov $4, %eax
    Here we indicate which system call we want to make: the number 4 in the 32-bit register eax selects WRITE. If we put the number 3 in this register instead, the system would prepare to execute READ. At the end we make another call, EXIT, which is selected by the number 1 in eax. The whole list of system calls for 32-bit x86 Linux can be found here.
  • mov $1, %ebx
    The first parameter is the target for the output. We want to see the text in our console, so we pass 1, the file descriptor of standard output, which normally points to the terminal.
  • mov $message, %ecx
    Here we pass the exact location in memory where our message is stored. The ecx register must hold an address, and the ‘$’ sign before ‘message’ does exactly that: it makes mov load the address of the label itself rather than the bytes stored there.
  • mov $12, %edx
    Here we tell the system how many bytes to write: 12, the length of "Hello World\n" (11 characters plus the newline).
  • int $0x80
    What else do we need? Once everything WRITE needs is in the registers, we only have to signal the kernel that they are ready to go: the software interrupt 0x80 hands control to Linux, which then performs the system call.

EXIT is executed in a similar way: we put 1 in the eax register (selecting EXIT) and zero out the ebx register, which holds the program’s exit status (0 means success). In larger programs it is important to keep track of what we leave in the registers. This is only a very simple explanation of the basics, and I can assure you that many things are oversimplified, but I hope it helps you get started 😉