Appendix D
Embedding Assembly Code in a C Function

The gcc C compiler has an extension to standard C that allows a programmer to write assembly language instructions within a C function. Of course, you need to be very careful when doing this because you do not know how the compiler has allocated memory and/or registers for the variables. Yes, you can use the “-S” option to see what the compiler did, but if anybody makes even one change to the function, or simply compiles it with a different version of gcc, the allocation will almost certainly have changed.

The way to do this is covered in the info pages for gcc. In my version (4.1.2) I found it by going to “C Extensions,” then “Extended Asm.” (No, it’s not obvious to me, either.) The presentation here is a very brief introduction.

The overall format is a C statement of the form:

asm("assembly_language_instruction" : output(s) : input(s));

The output operands are destinations for the assembly_language_instruction, and the input operands are sources. Each operand is of the form

"operand_constraint" (C_expression)

where the operand_constraint describes what type of register, memory location, etc. should be used for the operand, and C_expression is a C expression, often just a variable name. If there is more than one operand, they are separated by commas.

The assembly_language_instruction can refer to each operand numerically with the “%n” syntax, starting with n = 0 for the first operand, 1 for the second, etc.

For example, let us consider a case where we wish to add two 32-bit integers. (Yes, there is a C operation to do this, but it is generally better to start with simple examples.) The program is shown in Listing D.1.

 
 1  /*
 2   * embedAsm1.c
 3   * Very simple example of how to embed assembly language
 4   * in a C function.
 5   * Bob Plantz - 18 June 2009
 6   */
 7
 8  #include <stdio.h>
 9
10  int main()
11  {
12      int x, y;
13
14      printf("Enter an integer: ");
15      scanf("%i", &x);
16      printf("Enter another integer: ");
17      scanf("%i", &y);
18      asm("addl %1, %0" : "=m" (x) : "r" (y));
19      printf("There sum is %i\n", x);
20
21      return 0;
22  }
Listing D.1: Embedding an assembly language instruction in a C function (C).

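There is only one output (destination), and its operand constraint is "=m". The ‘=’ sign shows that it is an output. The ‘m’ character shows that this operand is located in memory. Strictly speaking, addl also reads its destination, so the read-write modifier ‘+’ (as in "+m") describes x more accurately; with ‘=’ the compiler is free to assume that the old value of x is not needed, which can matter once optimization is turned on.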

Now, recall that the addl instruction cannot have both of its operands in memory. Since the destination is in memory, we specify the input operand as a register with the "r" operand constraint. We have to do this for the assembly language instruction even though the C code does not specify whether the variable, y, is in memory or in a register.

The operand constraints are described in the info pages for gcc. In my version (4.7.0) I found it by going to “C Extensions,” then “Constraints.” The documentation covers all the architectures supported by gcc, so it is difficult to wade through.

Listing D.2 shows the assembly language actually generated by the compiler.

 
 1          .file  "embedAsm1.c"
 2          .section      .rodata
 3  .LC0:
 4          .string "Enter an integer: "
 5  .LC1:
 6          .string "%i"
 7  .LC2:
 8          .string "Enter another integer: "
 9  .LC3:
10          .string "There sum is %i\n"
11          .text
12          .globl main
13          .type  main, @function
14  main:
15          pushq  %rbp
16          movq  %rsp, %rbp
17          subq  $16, %rsp
18          movl  $.LC0, %edi
19          movl  $0, %eax
20          call  printf
21          leaq  -8(%rbp), %rax
22          movq  %rax, %rsi
23          movl  $.LC1, %edi
24          movl  $0, %eax
25          call  __isoc99_scanf
26          movl  $.LC2, %edi
27          movl  $0, %eax
28          call  printf
29          leaq  -4(%rbp), %rax
30          movq  %rax, %rsi
31          movl  $.LC1, %edi
32          movl  $0, %eax
33          call  __isoc99_scanf
34          movl  -4(%rbp), %eax
35  #APP
36  # 18 "embedAsm1.c" 1
37          addl %eax, -8(%rbp)
38  # 0 "" 2
39  #NO_APP
40          movl  -8(%rbp), %eax
41          movl  %eax, %esi
42          movl  $.LC3, %edi
43          movl  $0, %eax
44          call  printf
45          movl  $0, %eax
46          leave
47          ret
48          .size  main, .-main
49          .ident "GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0"
50          .section      .note.GNU-stack,"",@progbits
Listing D.2: Embedding an assembly language instruction in a C function (gcc assembly language).

In fact, the compiler did allocate y in memory, at -4(%rbp). It had to do that because scanf needs an address when reading a value from the keyboard.

The embedded assembly language is between the #APP and #NO_APP comments on lines 35 and 39, respectively.

34          movl  -4(%rbp), %eax
35  #APP
36  # 18 "embedAsm1.c" 1
37          addl %eax, -8(%rbp)
38  # 0 "" 2
39  #NO_APP

The movl instruction on line 34 loads y into a register so that the addl instruction on line 37 can add the value to a memory location (x). Of course, it would have had to do that even if we had used a C statement for the addition instead of embedding an assembly language instruction.

There may be situations where you need to use a specific register for a variable. Listing D.3 shows how to do this.

 
 1  /*
 2   * embedAsm2.c
 3   * Shows three assembly language instructions embedded
 4   * in a C function.
 5   * Bob Plantz - 18 June 2009
 6   */
 7
 8  #include <stdio.h>
 9
10  int main()
11  {
12      int x, y;
13      register int z asm("edx");
14
15      printf("Enter an integer: ");
16      scanf("%i", &x);
17      printf("Enter another integer: ");
18      scanf("%i", &y);
19      asm("movl %1, %0\n\taddl %2, %0\n\tsall $4, %0" : "=r" (z) : "m" (x), "m" (y));
20      printf("Sixteen times there sum is %i\n", z);
21
22      return 0;
23  }
Listing D.3: Embedding more than one assembly language instruction in a C function and specifying a register (C).

The declaration on line 13,

13   register int z asm("edx");

shows how to request that the compiler use the edx register for the variable z.

We have decided to embed three assembly language instructions. Recall that each assembly language statement goes on a separate line, and that on each new line we tab to the place where the operation code begins. In C, the newline character is ’\n’ and the tab character is ’\t’. So if you read line 19 carefully, you will see that there are three lines of assembly language. The first one is terminated by a ’\n’. The second instruction begins with a ’\t’ and is terminated by a ’\n’. And the third begins with a ’\t’.

The assembly language results are shown in Listing D.4.

 
 1          .file  "embedAsm2.c"
 2          .section      .rodata
 3  .LC0:
 4          .string "Enter an integer: "
 5  .LC1:
 6          .string "%i"
 7  .LC2:
 8          .string "Enter another integer: "
 9          .align 8
10  .LC3:
11          .string "Sixteen times there sum is %i\n"
12          .text
13          .globl main
14          .type  main, @function
15  main:
16          pushq  %rbp
17          movq  %rsp, %rbp
18          subq  $16, %rsp
19          movl  $.LC0, %edi
20          movl  $0, %eax
21          call  printf
22          leaq  -8(%rbp), %rax
23          movq  %rax, %rsi
24          movl  $.LC1, %edi
25          movl  $0, %eax
26          call  __isoc99_scanf
27          movl  $.LC2, %edi
28          movl  $0, %eax
29          call  printf
30          leaq  -4(%rbp), %rax
31          movq  %rax, %rsi
32          movl  $.LC1, %edi
33          movl  $0, %eax
34          call  __isoc99_scanf
35  #APP
36  # 19 "embedAsm2.c" 1
37          movl -8(%rbp), %edx
38          addl -4(%rbp), %edx
39          sall $4, %edx
40  # 0 "" 2
41  #NO_APP
42          movl  %edx, %eax
43          movl  %eax, %esi
44          movl  $.LC3, %edi
45          movl  $0, %eax
46          call  printf
47          movl  $0, %eax
48          leave
49          ret
50          .size  main, .-main
51          .ident "GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0"
52          .section      .note.GNU-stack,"",@progbits
Listing D.4: Embedding more than one assembly language instruction in a C function and specifying a register (gcc assembly language).

Indeed, we can see our embedded assembly language on lines 37 – 39:

35  #APP
36  # 19 "embedAsm2.c" 1
37          movl -8(%rbp), %edx
38          addl -4(%rbp), %edx
39          sall $4, %edx
40  # 0 "" 2
41  #NO_APP

This has been a much abbreviated introduction to embedding assembly language in C. Each situation will be unique, and you will need to study the info pages for gcc in order to determine what needs to be done. You can also expect the rules to change — hopefully become easier to use — as gcc evolves.