Linux C programming: "hello world" (Part 1)


  • hello world
  • Getting started
    • The assembly file hello.s
      • Line 1
      • Line 2
      • Line 3
      • Line 4
      • Line 5
      • Line 6
      • Line 7
      • Line 8
      • Line 9
      • Line 10
      • Line 11
      • Line 12
      • Line 13

OS: CentOS 7
GCC: 4.8.5



  Reference: Using as, the GNU Assembler

hello world


#include <stdio.h>

int main()
{
    printf("hello world!\n");
    return 0;
}



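Compilation system (from hello.c to the executable hello):
hello.c (source program) → preprocessor (cpp) → hello.i (preprocessed program)
hello.i → compiler (cc1) → hello.s (assembly program)
hello.s → assembler (as) → hello.o (relocatable object program)
hello.o + printf.o → linker (ld) → hello (executable object program)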


gcc -E -o hello.i hello.c

-E: Stop after the preprocessing stage; do not run the compiler proper. The output is in the form of preprocessed source code, which is sent to the standard output.
Input files which don’t require preprocessing are ignored.




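gcc -S hello.c

-S: Stop after the stage of compilation proper; do not assemble. The output is in the form of an assembler code file for each non-assembler input file specified (here, hello.s).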




The assembly file hello.s

All assembler directives have names that begin with a period (‘.’). The names are case insensitive for most targets, and usually written in lower case. [from the GNU Assembler Manual 2.31]
  Alternatively, you can read the as manual with info (the version on CentOS 7.4.1708 is 2.25.1):

info as



.file "hello.c"

There are two different versions of the .file directive. Targets that support DWARF2 line number information use the DWARF2 version of .file. Other targets use the default version.
Default Version
This version of the .file directive tells as that we are about to start a new logical file. The syntax is:

.file string

string is the new file name. In general, the filename is recognized whether or not it is surrounded by quotes ('"'); but if you wish to specify an empty file name, you must give the quotes: "". This statement may go away in future: it is only recognized to be compatible with old as programs.

.file ""

DWARF2 Version
When emitting DWARF2 line number information, .file assigns filenames to the .debug_line file name table. The syntax is:

.file fileno filename

The fileno operand should be a unique positive integer to use as the index of the entry in the table. The filename operand is a C string literal.
The detail of filename indices is exposed to the user because the filename table is shared with the .debug_info section of the DWARF2 debugging information, and thus the user must know the exact indices that table entries will have.
  DWARF (Debug With Arbitrary Record Format) is a widely used, standardized debugging data format. [PDF documents for each version of DWARF can be downloaded here; link: ]


.section       .rodata

Use the .section directive to assemble the following code into a section named name.
This directive is only supported for targets that actually support arbitrarily named sections; on a.out targets, for example, it is not accepted, even with a standard a.out section name.

COFF Version
For COFF targets, the .section directive is used in one of the following ways:

.section name[, "flags"]
.section name[, subsection]

If the optional argument is quoted, it is taken as flags to use for the section. Each flag is a single character. The following flags are recognized:

b   bss section (uninitialized data)
n   section is not loaded
w   writable section
d   data section
e   exclude section from linking
r   read-only section
x   executable section
m   mergeable section (TIGCC extension, symbols in the section are considered mergeable constants)
u   unaligned section (TIGCC extension, the contents of the section need not be aligned)
s   shared section (meaningful for PE targets, useless for TIGCC)
a   ignored (for compatibility with the ELF version)
y   section is not readable (meaningful for PE targets)
0-9 single-digit power-of-two section alignment (GNU extension). Note: the alignment is 2^n bytes, where n ∈ [0, 9].

If no flags are specified, the default flags depend upon the section name. If the section name is not recognized, the default will be for the section to be loaded and writable. Note the n and w flags remove attributes from the section, rather than adding them, so if they are used on their own it will be as if no flags had been specified at all.

If the optional argument to the .section directive is not quoted, it is taken as a subsegment number.
Assembled bytes conventionally fall into two sections: text and data. You may have separate groups of data in named sections that you want to end up near to each other in the object file, even though they are not contiguous in the assembler source. as allows you to use subsections for this purpose. Within each section, there can be numbered subsections with values from 0 to 8192. Objects assembled into the same subsection go into the object file together with other objects in the same subsection. For example, a compiler might want to store constants in the text section, but might not want to have them interspersed with the program being assembled. In this case, the compiler could issue a ‘.text 0’ before each section of code being output, and a ‘.text 1’ before each group of constants being output.
  Within each section, subsection numbers can range from 0 to 8192 (the subsection number is simply the index of a subsection).

Subsections are optional. If you do not use subsections, everything goes in subsection number zero.

Each subsection is zero-padded up to a multiple of four bytes. (Subsections may be padded a different amount on different flavors of as.)
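  In other words, each subsection is padded with zero bytes until its size is a multiple of four bytes, so at most three padding bytes are added. Different flavors of as may pad subsections by different amounts.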

Subsections appear in your object file in numeric order, lowest numbered to highest. (All this to be compatible with other people’s assemblers.) The object file contains no representation of subsections; ld and other programs that manipulate object files see no trace of them. They just see all your text subsections as a text section, and all your data subsections as a data section.

To specify which subsection you want subsequent statements assembled into, use a numeric argument to specify it, in a ‘.text expression’ or a ‘.data expression’ statement. When generating COFF output, you can also use an extra subsection argument with arbitrary named sections: ‘.section name, expression’. When generating ELF output, you can also use the .subsection directive (see SubSection) to specify a subsection: ‘.subsection expression’. Expression should be an absolute expression (see Expressions). If you just say ‘.text’ then ‘.text 0’ is assumed. Likewise ‘.data’ means ‘.data 0’. Assembly begins in text 0. For instance:

.text 0     # The default subsection is text 0 anyway.
.ascii "This lives in the first text subsection. *"
.text 1
.ascii "But this lives in the second text subsection."
.data 0
.ascii "This lives in the data section,"
.ascii "in the first data subsection."
.text 0
.ascii "This lives in the first text section,"
.ascii "immediately following the asterisk (*)."

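       To have subsequent statements assembled into a particular subsection, give its number as the argument of a ".text expression" or ".data expression" statement. When generating COFF output, you can also give an extra subsection argument to an arbitrarily named section: ".section name, expression". When generating ELF output, you can also use the .subsection directive: ".subsection expression", which replaces the current subsection with subsection expression. expression must be an absolute expression, i.e. its value is a plain number that does not depend on any section. ".text" means ".text 0", and likewise ".data" means ".data 0". Assembly begins in text 0. The example above shows that statements in the same subsection are assembled next to each other, even when they are not contiguous in the source.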
Each section has a location counter incremented by one for every byte assembled into that section. Because subsections are merely a convenience restricted to as there is no concept of a subsection location counter. There is no way to directly manipulate a location counter—but the .align directive changes it, and any label definition captures its current value. The location counter of the section where statements are being assembled is said to be the active location counter.


ELF Version
This is one of the ELF section stack manipulation directives. The others are .subsection, .pushsection, .popsection, and .previous.
This directive replaces the current section and subsection.
For ELF targets, the .section directive is used like this:

.section name [, "flags"[, @type[,flag_specific_arguments]]]

The optional flags argument is a quoted string which may contain any combination of the following characters:

a   section is allocatable
d   section is a GNU_MBIND section
e   section is excluded from executable and shared library.
w   section is writable
x   section is executable
M   section is mergeable
S   section contains zero terminated strings
G   section is a member of a section group
T   section is used for thread-local-storage
?   section is a member of the previously-current section’s group, if any
<number>   a numeric value indicating the bits to be set in the ELF section header’s flags field.
    Note - if one or more of the alphabetic characters described above is also included in the flags field, their bit values will be ORed into the resulting value.
<target specific>   some targets extend this list with their own flag characters

Note - once a section’s flags have been set they cannot be changed. There are a few exceptions to this rule however. Processor and application specific flags can be added to an already defined section. The .interp, .strtab and .symtab sections can have the allocate flag (a) set after they are initially defined, and the .note.GNU-stack section may have the executable (x) flag added.
The optional type argument may contain one of the following constants:

@progbits   section contains data
@nobits   section does not contain data (i.e., section only occupies space)
@note   section contains data which is used by things other than the program
@init_array   section contains an array of pointers to init functions
@fini_array   section contains an array of pointers to finish functions
@preinit_array   section contains an array of pointers to pre-init functions
@<number>   a numeric value to be set as the ELF section header’s type field.
@<target specific>   some targets extend this list with their own types

Many targets only support the first three section types. The type may be enclosed in double quotes if necessary.

Note on targets where the @ character is the start of a comment (eg ARM) then another character is used instead. For example the ARM port uses the % character.

Note - some sections, eg .text and .data are considered to be special and have fixed types. Any attempt to declare them with a different type will generate an error from the assembler.

If flags contains the M symbol then the type argument must be specified as well as an extra argument—entsize—like this:

.section name , "flags"M, @type, entsize

Sections with the M flag but not S flag must contain fixed size constants, each entsize octets long. Sections with both M and S must contain zero terminated strings where each character is entsize bytes long. The linker may remove duplicates within sections with the same name, same entity size and same flags. entsize must be an absolute expression. For sections with both M and S, a string which is a suffix of a larger string is considered a duplicate. Thus “def” will be merged with “abcdef”; A reference to the first “def” will be changed to a reference to “abcdef”+3.

If flags contains the G symbol then the type argument must be present along with an additional field like this:

.section name , "flags"G, @type, GroupName[, linkage]

The GroupName field specifies the name of the section group to which this particular section belongs.
The optional linkage field can contain:

comdat   indicates that only one copy of this section should be retained
.gnu.linkonce   an alias for comdat

Note: if both the M and G flags are present then the fields for the Merge flag should come first, like this:

.section name , "flags"MG, @type, entsize, GroupName[, linkage]

If flags contains the ? symbol then it may not also contain the G symbol and the GroupName or linkage fields should not be present. Instead, ? says to consider the section that’s current before this directive. If that section used G, then the new section will use G with those same GroupName and linkage fields implicitly. If not, then the ? symbol has no effect.

If no flags are specified, the default flags depend upon the section name. If the section name is not recognized, the default will be for the section to have none of the above flags: it will not be allocated in memory, nor writable, nor executable. The section will contain data.

For ELF targets, the assembler supports another type of .section directive for compatibility with the Solaris assembler:

.section "name"[, flags...]

Note that the section name is quoted. There may be a sequence of comma separated flags:

#alloc   section is allocatable
#write   section is writable
#execinstr   section is executable
#exclude   section is excluded from executable and shared library.
#tls   section is used for thread local storage

This directive replaces the current section and subsection. See the contents of the gas testsuite directory gas/testsuite/gas/elf for some examples of how this directive and the other section stack directives work.



.LC0:

Symbol names begin with a letter or with one of ‘._’. On most machines, you can also use $ in symbol names. That character may be followed by any string of digits, letters, dollar signs (unless otherwise noted for a particular target machine), and underscores.

A symbol name cannot begin with a digit; local labels are the exception.

Local Symbol Names
A local symbol is any symbol beginning with certain local label prefixes. By default, the local label prefix is ‘.L’ for ELF systems or ‘L’ for traditional a.out systems, but each target may have its own set of local label prefixes. On the HPPA local symbols begin with ‘L$’.

Local symbols are defined and used within the assembler, but they are normally not saved in object files. Thus, they are not visible when debugging. You may use the ‘-L’ option to retain the local symbols in the object files. This option (‘-L’) tells as to retain those local symbols in the object file. Usually if you do this you also tell the linker ld to preserve those symbols.
A label is written as a symbol immediately followed by a colon ‘:’. The symbol then represents the current value of the active location counter, and is, for example, a suitable instruction operand. You are warned if you use the same symbol to represent two different locations: the first definition overrides any other definitions.




.string    "hello world!"


.string "str", .string8 "str", .string16 "str", .string32 "str", .string64 "str"

Copy the characters in str to the object file. You may specify more than one string to copy, separated by commas. Unless otherwise specified for a particular machine, the assembler marks the end of each string with a 0 byte. You can use any of the escape sequences described in Strings.


The variants string16, string32 and string64 differ from the string pseudo opcode in that each 8-bit character from str is copied and expanded to 16, 32 or 64 bits respectively. The expanded characters are stored in target endianness byte order.

	.string32 "BYE"

expands to:

	.string   "B\0\0\0Y\0\0\0E\0\0\0"  /* On little endian targets.  */
	.string   "\0\0\0B\0\0\0Y\0\0\0E"  /* On big endian targets.  */

So this directive defines the zero-terminated string "hello world!" in the .rodata section. Next, let's look at line 5.




.text subsection

Tells as to assemble the following statements onto the end of the text subsection numbered subsection, which is an absolute expression. If subsection is omitted, subsection number zero is used.



.globl    main


.global symbol, .globl symbol

.global makes the symbol visible to ld. If you define symbol in your partial program, its value is made available to other partial programs that are linked with it. Otherwise, symbol takes its attributes from a symbol of the same name from another file linked into the same program.
Both spellings (‘.globl’ and ‘.global’) are accepted, for compatibility with other assemblers.
On the HPPA, .global is not always enough to make it accessible to other partial programs. You may need the HPPA-only .EXPORT directive as well.



.type  main, @function

This directive is used to set the type of a symbol.
COFF Version
For COFF targets, this directive is permitted only within .def/.endef pairs. It is used like this:

.type int

This records the integer int as the type attribute of a symbol table entry.

ELF Version
For ELF targets, the .type directive is used like this:

.type name , type description

This sets the type of symbol name to be either a function symbol or an object symbol. There are five different syntaxes supported for the type description field, in order to provide compatibility with various other assemblers.

Because some of the characters used in these syntaxes (such as ‘@’ and ‘#’) are comment characters for some architectures, some of the syntaxes below do not work on all architectures. The first variant will be accepted by the GNU assembler on all architectures so that variant should be used for maximum portability, if you do not need to assemble your code with other assemblers.

The syntaxes supported are:

  .type <name> STT_<TYPE_IN_UPPER_CASE>
  .type <name>,#<type>
  .type <name>,@<type>
  .type <name>,%<type>
  .type <name>,"<type>"

The types supported are:

STT_FUNC / function   Mark the symbol as being a function name.
STT_GNU_IFUNC / gnu_indirect_function   Mark the symbol as an indirect function when evaluated during reloc processing. (This is only supported on assemblers targeting GNU systems).
STT_OBJECT / object   Mark the symbol as being a data object.
STT_TLS / tls_object   Mark the symbol as being a thread-local data object.
STT_COMMON / common   Mark the symbol as being a common data object.
STT_NOTYPE / notype   Does not mark the symbol in any way. It is supported just for completeness.
gnu_unique_object   Marks the symbol as being a globally unique data object. The dynamic linker will make sure that in the entire process there is just one symbol with this name and type in use. (This is only supported on assemblers targeting GNU systems).

Note: Some targets support extra types in addition to those listed above.






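.LFB0:

       This is a label. FB stands for "function begin". The number 0 has no special meaning: the compiler numbers these labels according to internal implementation details so that every generated label name is unique.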



       Directives beginning with .cfi are CFI (Call Frame Information) directives; they help the assembler build stack frame information. There are 25 CFI directives.

.cfi_startproc [simple]

.cfi_startproc is used at the beginning of each function that should have an entry in .eh_frame. It initializes some internal data structures. Don’t forget to close the function by .cfi_endproc.
Unless .cfi_startproc is used along with parameter simple it also emits some architecture dependent initial CFI instructions.

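       Each function call creates a stack frame. In principle, a debugger or exception handler could walk the frames of the call chain using the frame pointer (also called the base pointer, normally kept in ebp on 32-bit CPUs or rbp on 64-bit CPUs). However, GCC's optimizations (for example -fomit-frame-pointer, which frees the base pointer register for other uses) can make it hard or even impossible to unwind the stack that way. The CFI directives therefore describe the frame layout explicitly. The assembler turns them into unwind information stored in the object file's .eh_frame section, so unwinding is no longer affected by compiler optimizations.
       .eh_frame holds the GCC exception frame information; "eh" stands for exception handling.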


pushq   %rbp

  Push the caller's base pointer, held in rbp, onto the stack. This saves the caller's state so that it can be restored when the function returns.


.cfi_def_cfa_offset 16


.cfi_def_cfa_offset offset

.cfi_def_cfa_offset modifies a rule for computing CFA. Register remains the same, but offset is new. Note that it is the absolute offset that will be added to a defined register to compute CFA address.
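CFA (Canonical Frame Address)
An area of memory that is allocated on a stack called a “call frame.” The call frame is identified by an address on the stack. We refer to this address as the Canonical Frame Address or CFA. Typically, the CFA is defined to be the value of the stack pointer at the call site in the previous frame (which may be different from its value on entry to the current frame). [See the DWARF specification, section 6.4]
In main, the call instruction has already pushed the 8-byte return address, so on entry CFA = rsp + 8. pushq %rbp pushes another 8 bytes, which is why .cfi_def_cfa_offset 16 sets the offset to 16.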


.cfi_offset 6, -16
.cfi_offset register, offset

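Previous value of register is saved at offset offset from CFA.
Here register 6 is rbp in the x86-64 DWARF register numbering. The directive therefore records that the pushq above saved the caller's rbp at CFA - 16.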


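movq    %rsp, %rbp

       Copy the stack pointer into rbp, so that rbp becomes the frame pointer (base pointer) of main's stack frame.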



.cfi_def_cfa_register 6
.cfi_def_cfa_register register

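.cfi_def_cfa_register modifies a rule for computing CFA. From now on register will be used instead of the old one. Offset remains the same.
Register 6 is rbp, so from here on the CFA is computed as rbp + 16. Unlike rsp, rbp does not change for the rest of the function body, so this rule stays valid wherever rsp moves.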


movl    $.LC0, %edi

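       From the discussion of line 3, I know that the symbol .LC0 stands for the address of the string "hello world!". With the $ prefix, that address is used as an immediate value. So this instruction loads the address of "hello world!" into register edi, where it becomes the first argument of the call that follows (the x86-64 System V calling convention passes the first integer or pointer argument in rdi). A 32-bit move is enough here for two reasons. In the default non-PIC small code model, static addresses fit in 32 bits. Writing edi also zero-extends into the full rdi.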


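call    puts

       Call puts with that argument. The source calls printf("hello world!\n"). Because the format string contains no conversion specifications and ends with a newline, GCC replaces that call with the cheaper puts("hello world!"), and puts appends the newline itself.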



movl    $0, %eax

       Put main's return value, 0, into eax, the register that carries integer return values.



popq    %rbp

       Restore the caller's base pointer, which pushq %rbp saved at the start of main.



.cfi_def_cfa 7, 8
.cfi_def_cfa register, offset

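.cfi_def_cfa defines a rule for computing CFA as: take address from register and add offset to it.
Register 7 is rsp. After popq %rbp, only the return address is left in main's part of the stack, so the CFA is once again rsp + 8.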






.cfi_endproc

.cfi_endproc is used at the end of a function where it closes its unwind entry previously opened by .cfi_startproc, and emits it to .eh_frame.



.LFE0:

       This is a label. FE stands for "function end".


.size   main, .-main

This directive is used to set the size associated with a symbol.

.size name , expression

This directive sets the size associated with a symbol name. The size in bytes is computed from expression which can make use of label arithmetic. This directive is typically used to set the size of function symbols.

A symbol is one or more characters chosen from the set of all letters (both upper and lower case), digits and the three characters _.$.

The special symbol . refers to the current address that as is assembling into. Thus, the expression melvin: .long . defines melvin to contain its own address. Assigning a value to . is treated the same as a .org directive. Thus, the expression .=.+4 is the same as saying .space 4.



.ident  "GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-16)"

This directive is used by some assemblers to place tags in object files. The behavior of this directive varies depending on the target. When using the a.out object file format, as simply accepts the directive for source-file compatibility with existing assemblers, but does not emit anything for it. When using COFF, comments are emitted to the .comment or .rdata section, depending on the target. When using ELF, comments are emitted to the .comment section.


--> gcc -o hello hello.c
--> strings hello | grep GCC
GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-16)

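As you can see, the generated executable really does contain the tag. Hmm, it looks like I could use this to hide a few little secrets in my programs 😛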


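.section    .note.GNU-stack,"",@progbits

This empty section carries no flags. In particular it lacks the x (executable) flag, which tells the linker that this object file does not need an executable stack.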










