Assembly program structure.

Introduction.

The language in which the source program is written is called the input language, and the language into which it is translated for execution by the processor is called the output language. The process of converting the input language into the output language is called translation. Since processors can execute programs only in binary machine language, which is not used for programming, every source program must be translated. Two methods of translation are known: compilation and interpretation.

In compilation, the source program is first translated completely into an equivalent program in the output language, called the object program, and then executed. This process is carried out by a special program called a compiler. A compiler whose input language is a symbolic representation of the machine (output) language of binary codes is called an assembler.

In interpretation, each line of the source program text is parsed (interpreted) and the command specified in it is executed immediately. This method is implemented by an interpreter program. Interpretation is slow. To make it more efficient, instead of processing each line as text, the interpreter first converts all command strings into symbols. The generated sequence of symbols is then used to perform the functions assigned to the original program.

The assembly language discussed below is implemented using compilation.

Features of the language.

The main features of assembly language:

● instead of binary codes, the language uses symbolic names called mnemonics. For example, the addition command uses the mnemonic ADD, subtraction uses SUB, multiplication uses MUL, division uses DIV, and so on. Symbolic names are also used to address memory cells. To program in assembly language, you need to know only the symbolic names, which the assembler translates into binary codes and addresses;

● each statement corresponds to one machine instruction (code), that is, there is a one-to-one correspondence between machine instructions and statements in an assembly language program;

● the language provides access to all objects and instructions of the machine. High-level languages do not have this ability. For example, assembly language allows you to test a bit of the flag register, while a typical high-level language does not. Note that languages for systems programming (for example, C) often occupy an intermediate position. In terms of access to the hardware they are closer to assembly language, but they have the syntax of a high-level language;

● assembly language is not a universal language. Each specific group of microprocessors has its own assembler. High-level languages do not have this disadvantage.

Writing and debugging an assembly language program takes much more time than doing so in a high-level language. Despite this, assembly language is widely used for the following reasons:

● a program written in assembly language is much smaller and much faster than the same program written in a high-level language. For some applications these characteristics are paramount, for example many system programs (including compilers), programs in credit cards and cell phones, device drivers, and so on;

● some procedures require full access to hardware, which is usually not possible in a high-level language. This case includes interrupts and interrupt handlers in operating systems, as well as device controllers in real-time embedded systems.

In most programs, only a small percentage of the total code is responsible for a large percentage of the program's execution time. Typically, 1% of the program is responsible for 50% of the execution time, and 10% of the program is responsible for 90% of the execution time. Therefore, real programs are usually written in a combination of the two: the small time-critical part in assembly language and the rest in a high-level language.

Operator format in assembly language.

An assembly language program is a list of commands (statements, sentences), each of which occupies a separate line and contains four fields: a label field, an operation field, an operand field, and a comment field. Each field has a separate column.

Label field.

Column 1 is allocated for the label field. A label is a symbolic name, or identifier, of a memory address. It is needed in order to:

● make a conditional or unconditional jump to the command;

● access the place where data is stored.

Such statements are labeled. A name is made up of (uppercase) letters of the English alphabet and digits. The name must start with a letter and end with a colon. A label with its colon can be written on a separate line, with the opcode on the next line in column 2, which simplifies the compiler's work. Without the colon it would be impossible to distinguish a label from an opcode when they are on separate lines.

In some versions of assembly language, colons are placed only after instruction labels, not after data labels, and label length can be limited to 6 or 8 characters.

The label field must not contain duplicate names, since each label is associated with the address of a command. If a command or data item does not need to be referenced from elsewhere in the program, its label field remains empty.

Operation code field.

This field contains the mnemonic of a command or pseudo-command (see below). The mnemonic for each command is chosen by the language designers. In some assembly languages one mnemonic is chosen for loading a register from memory and another for storing the contents of a register in memory. In other assembly languages the same mnemonic can be used for both operations. While the choice of mnemonic names is arbitrary, the need for two distinct machine instructions is determined by the processor architecture.

Operand field.

This field holds the additional information required to perform the operation. For jump instructions, the operand field specifies the address to jump to; for other instructions, it specifies the addresses and registers that are the operands of the machine instruction. As an example, here are the operands that can be used for 8-bit processors:

● numerical data, presented in different number systems. To indicate the number system, the constant is followed by one of the Latin letters B, O (or Q), H, D for the binary, octal, hexadecimal, and decimal number systems, respectively (D may be omitted). If the first digit of a hexadecimal number is A, B, C, D, E, or F, an insignificant 0 (zero) is added in front;

● codes of the microprocessor's internal registers and of the memory cell M (sources or receivers of information), in the form of the letters A, B, C, D, E, H, L, M or their addresses in any number system (for example, 10B is the address of register D in the binary system);

● identifiers: for the register pairs BC, DE, HL, the first letters B, D, H; for the pair of accumulator and flag register, PSW; for the program counter, PC; for the stack pointer, SP;

● labels indicating the addresses of operands or of the next instruction in conditional (when the condition is met) and unconditional jumps. For example, the operand M1 in the command JMP M1 means an unconditional jump to the command whose label field contains the identifier M1;

● expressions, which are built by combining the data discussed above with arithmetic and logical operators.

Note that the way data space is reserved depends on the version of the language. The developers of some assembly languages chose a "define word" pseudo-command and later introduced an alternative that had been present from the very beginning in the languages for other processors. Another version of the language uses a "define a constant" pseudo-command.

Processors process operands of different lengths. Assembler developers have indicated the operand length in different ways, for example:

● registers of different lengths have different names: EAX for 32-bit operands, AX for 16-bit operands, and AH for 8-bit operands;

● for some processors, a suffix is added to each opcode to indicate the operand type, for example the suffix ".B" for a byte;

● different opcodes are used for operands of different lengths, for example separate opcodes to load a byte, a halfword, and a word into a 64-bit register.

Comments field.

This field provides explanations of what the program does. Comments do not affect the operation of the program and are intended for human readers. They may be needed when modifying a program that, without such comments, may be completely incomprehensible even to experienced programmers. A comment begins with a special start character, which can be:

● a semicolon (;) in the languages for some processors;

● an exclamation point (!) in the languages for others.

Each separate line reserved for a comment also begins with the start character.

Pseudo commands (directives).

In assembly language, two main types of commands can be distinguished:

● basic instructions that are equivalent to the machine code of the processor. These commands do all the processing provided by the program;

● pseudo-commands, or directives, which serve the process of translating the program into machine code. As an example, Table 5.2.2 shows some pseudo-commands of an assembler for one processor family.

When programming, there are situations when, according to the algorithm, the same chain of commands must be repeated many times. To get out of this situation, you can:

● write the desired sequence of commands whenever it occurs. This approach leads to an increase in the volume of the program;

● arrange this sequence into a procedure (subroutine) and call it if necessary. Such an exit has its drawbacks: each time you have to execute a special procedure call instruction and a return instruction, which, with a short and frequently used sequence, can greatly reduce the speed of the program.

The simplest and most effective way to handle a repeated chain of commands is to use a macro, which can be thought of as a pseudo-command that causes a frequently occurring group of commands to be translated again wherever it is needed.

A macro, or macro instruction, is characterized by three aspects: macro definition, macro call, and macro expansion.

Macro definition.

A macro definition gives a name to a repeatedly used sequence of program commands; the name is then used to refer to that sequence in the text of the program.

A macro definition has the following structure: a header, a list of expressions forming the body, and a closing command.

In this structure, three parts can be distinguished:

● the macro header, containing the name, the MACRO pseudo-command, and a set of parameters;

● the macro body (shown dotted);

● the command that ends the macro definition.

A macro parameter set contains a list of all parameters given in the operand field for the selected instruction group. If these parameters are given earlier in the program, then they can be omitted in the macro definition header.

To assemble the selected group of instructions again, a call is used that consists of the macro name and a parameter list with other values.

When the assembler encounters a macro definition during compilation, it stores it in the macro definition table. At each subsequent appearance of the macro name in the program, the assembler replaces it with the body of the macro.

Using a macro name as an opcode is called a macro call, and its replacement by the body of the macro is called macro expansion.

If the program is regarded as a sequence of characters (letters, digits, spaces, punctuation marks, and carriage returns that start a new line), then macro expansion consists of replacing some substrings of this sequence with other substrings.

Macro expansion occurs during assembly, not during program execution. Macro facilities are thus a means of manipulating strings of characters.

The assembly process is carried out in two passes:

● On the first pass, all macro definitions are kept and macro calls are expanded. In this case, the source program is read and converted into a program in which all macro definitions are removed, and each macro call is replaced by a macro body;

● The second pass processes the received program without macros.

Macros with parameters.

To work with repeating sequences of commands whose parameters can take different values, macro definitions use two kinds of parameters:

● formal parameters, which are named in the macro definition;

● actual parameters, which are placed in the operand field of the macro call. During macro expansion, each formal parameter that appears in the body of the macro is replaced by the corresponding actual parameter.

Using macros with parameters.

Program 1 shows two similar sequences of commands that differ only in the pair of variables they swap; the first of them swaps P and Q. Program 2 includes a macro with two formal parameters, P1 and P2. During macro expansion, each occurrence of P1 inside the macro body is replaced by the first actual parameter and each occurrence of P2 by the second actual parameter from program 1. In each macro call of program 2, the first operand is the first actual parameter and the second operand is the second actual parameter.

Program 1

MOV EBX,Q
MOV Q,EAX
MOV P,EBX

Program 2

MOV EAX,P1
MOV EBX,P2
MOV P2,EAX
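The substitution of actual parameters for formal ones can be sketched as a small text-processing routine. This is only an illustration in Python, not the algorithm of any particular assembler. The macro name CHANGE, the closing keyword ENDM, the second pair of variables R and S, and the fourth body line (MOV P1,EBX, which completes the swap) are assumptions that do not appear in the listing above. Nested macro calls are not handled.

```python
import re


def substitute(text, mapping):
    """Replace every formal parameter in one body line by its actual parameter."""
    if not mapping:
        return text
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)


def expand_macros(source_lines):
    """Store macro definitions and replace each macro call by its body."""
    macros = {}      # macro name -> (formal parameter names, body lines)
    current = None   # name of the macro being defined, or None
    output = []
    for line in source_lines:
        tokens = line.split()
        if not tokens:
            continue
        if current is not None:                      # inside a definition
            if tokens[0].upper() == "ENDM":          # ENDM is an assumed keyword
                current = None
            else:
                macros[current][1].append(line.strip())
        elif len(tokens) >= 2 and tokens[1].upper() == "MACRO":
            formals = "".join(tokens[2:]).split(",") if len(tokens) > 2 else []
            macros[tokens[0]] = (formals, [])
            current = tokens[0]
        elif tokens[0] in macros:                    # macro call
            formals, body = macros[tokens[0]]
            actuals = "".join(tokens[1:]).split(",") if len(tokens) > 1 else []
            mapping = dict(zip(formals, actuals))
            output.extend(substitute(text, mapping) for text in body)
        else:                                        # ordinary statement
            output.append(line.strip())
    return output


PROGRAM_2 = [
    "CHANGE MACRO P1,P2",      # CHANGE is a hypothetical macro name
    "    MOV EAX,P1",
    "    MOV EBX,P2",
    "    MOV P2,EAX",
    "    MOV P1,EBX",          # assumed fourth line completing the swap
    "ENDM",
    "    CHANGE P,Q",
    "    CHANGE R,S",          # R and S are hypothetical variables
]

expanded = expand_macros(PROGRAM_2)
print("\n".join(expanded))
```

All formal parameters are replaced in a single regular-expression pass, so an actual parameter that happens to be spelled like another formal parameter is not substituted a second time.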

Extended capabilities.

Consider some advanced features of macro assemblers.

If a macro containing a conditional jump instruction and the label to jump to is called two or more times, the label is duplicated (the label duplication problem), which causes an error. One solution is for the programmer to supply a separate label as a parameter in each call. Alternatively, the label can be declared local, in which case the assembler automatically generates a different label each time the macro is expanded.
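One way an assembler might avoid duplicated labels is to rename each local label with a fresh serial number at every expansion. The sketch below, in Python, illustrates the idea only. The generated names of the form ??0000 and the sample macro body are assumptions made for this example.

```python
import itertools
import re


def make_expander():
    """Return a function that expands a macro body with fresh local labels."""
    serial = itertools.count()   # shared by all expansions from this expander

    def expand(body, local_labels):
        # Give every local label a name that no earlier expansion has used.
        mapping = {name: "??%04d" % next(serial) for name in local_labels}
        if not mapping:
            return list(body)
        pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
        return [pattern.sub(lambda m: mapping[m.group(1)], line) for line in body]

    return expand


# Hypothetical macro body: make EAX non-negative; DONE is the local label.
ABS_BODY = ["CMP EAX,0", "JGE DONE", "NEG EAX", "DONE:"]

expand = make_expander()
first = expand(ABS_BODY, ["DONE"])
second = expand(ABS_BODY, ["DONE"])
print(first)
print(second)
```

Because the counter lives in the expander rather than in the macro, two calls of the same macro can never produce the same label.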

Some assemblers allow macros to be defined inside other macros. This advanced feature is very useful when combined with conditional assembly. Consider a macro M1 whose body contains the following lines:

IF WORDSIZE GT 16
M2 MACRO

Macro M2 can be defined in both parts of the IF statement. However, the definition depends on whether the program is being assembled for a 16-bit or a 32-bit processor. If M1 is not called, macro M2 is not defined at all.

Another advanced feature is that macros can call other macros, including themselves - recursive call. In the latter case, in order to avoid an infinite loop, the macro must pass a parameter to itself, which changes with each expansion, and also check this parameter and end the recursion when the parameter reaches a certain value.

On the use of macros in assembler.

When using macros, the assembler must be able to perform two functions: save macro definitions and expand macro calls.

Saving macro definitions.

All macro names are stored in a table. Each name is accompanied by a pointer to the corresponding macro so that it can be called if necessary. Some assemblers have a separate table for macro names, others have a common table in which, along with macro names, there are all machine commands and directives.

When a macro definition is encountered during assembly, the following are created:

● a new table entry with the name of the macro, the number of parameters, and a pointer to another macro definition table where the macro body will be stored;

● a list of formal parameters.

The body of the macro, which is simply a string of characters, is then read and stored in the macro definition table. Formal parameters occurring in the macro body are marked with a special symbol.

The internal representation of the macro from the above example (program 2) is:

MOV EAX,&P1; MOV EBX,&P2; MOV &P2,EAX; MOV &P1,EBX;

where the semicolon is used as the carriage return character and the ampersand (&) marks a formal parameter.

Macro call expansion.

Whenever a macro definition is encountered during assembly, it is stored in the macro table. When a macro is called, the assembler temporarily suspends reading input from the input device and starts reading the saved macro body instead. The formal parameters found in the macro body are replaced by the actual parameters provided by the call. The ampersand (&) in front of the parameters allows the assembler to recognize them.

Although there are many versions of assembler, assembly processes have common features and are similar in many ways. The work of a two-pass assembler is considered below.

Two-pass assembler.

A program consists of a number of statements. Therefore, it would seem that the following sequence of actions could be used during assembly:

● read the next statement and translate it into machine language;

● write the resulting machine code to one file and the corresponding part of the listing to another file;

● repeat the above steps until the entire program has been translated.

However, this approach does not work in general. An example is the so-called forward reference problem. If the first statement is a jump to statement P at the very end of the program, the assembler cannot translate it: it must first determine the address of statement P, and for this it has to read the entire program. Each complete reading of the source program is called a pass. The forward reference problem can be solved using two passes in either of two ways:

● on the first pass, collect all symbol definitions (including labels) and store them in a table, and on the second pass, read and assemble each statement. This method is relatively simple, but the second pass over the source program requires additional I/O time;

● on the first pass, convert the program into an intermediate form and save it in a table, and perform the second pass over the table rather than over the source program. This method saves time, since no I/O operations are performed on the second pass.
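The first of these two methods can be sketched in a few lines of Python. This is a toy illustration, not a real assembler: it knows only three instructions, whose opcode bytes and lengths are taken from the Intel 8080 instruction set (an assumption made to have concrete numbers), and the program is assumed to be loaded at address 0.

```python
# mnemonic -> (opcode byte, instruction length in bytes); a tiny 8080 subset
OPCODES = {"NOP": (0x00, 1), "HLT": (0x76, 1), "JMP": (0xC3, 3)}


def parse(line):
    """Split one source line into (label, mnemonic, operand)."""
    text = line.split(";", 1)[0].strip()       # drop the comment field
    label = None
    if ":" in text:
        label, text = text.split(":", 1)
        label, text = label.strip(), text.strip()
    parts = text.split()
    mnemonic = parts[0].upper() if parts else None
    operand = parts[1] if len(parts) > 1 else None
    return label, mnemonic, operand


def first_pass(lines):
    """Build the symbol table by tracking the address of every statement."""
    symbols, counter = {}, 0
    for line in lines:
        label, mnemonic, _ = parse(line)
        if label is not None:
            if label in symbols:
                raise ValueError("symbol defined more than once: " + label)
            symbols[label] = counter
        if mnemonic is not None:
            counter += OPCODES[mnemonic][1]
    return symbols


def second_pass(lines, symbols):
    """Generate machine code; every label is known by now."""
    code = bytearray()
    for line in lines:
        _, mnemonic, operand = parse(line)
        if mnemonic is None:
            continue
        opcode, length = OPCODES[mnemonic]
        code.append(opcode)
        if length == 3:                          # instruction with an address
            if operand not in symbols:
                raise ValueError("undefined symbol: " + str(operand))
            code += symbols[operand].to_bytes(2, "little")
    return bytes(code)


SOURCE = [
    "        JMP P      ; forward reference to P",
    "        NOP",
    "P:      HLT",
]

table = first_pass(SOURCE)
machine_code = second_pass(SOURCE, table)
print(table, machine_code.hex())
```

The jump in the first line can be encoded only because the first pass has already placed P at address 4 in the symbol table.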

First pass.

The purpose of the first pass is to build a symbol table. As noted above, another goal of the first pass is to save all macro definitions and expand macro calls as they appear. Therefore, both symbol definition and macro expansion occur in the same pass. A symbol can be either a label or a value that is assigned a name by means of a directive:

;Value - buffer size

By assigning values to the symbolic names in the label field of instructions, the assembler in effect determines the address that each instruction will have during program execution. To do this, the assembler maintains an instruction location counter as a special variable during assembly. At the beginning of the first pass, this variable is set to 0, and after each command is processed it is incremented by the length of that command. As an example, Table 5.2.3 shows a program fragment with the lengths of the commands and the counter values. During the first pass, tables of symbol names, directives, and operation codes are built, and, if necessary, a literal table. A literal is a constant for which the assembler automatically reserves memory. Note that modern processors have instructions with immediate operands, so their assemblers do not support literals.

The symbol table contains one entry for each name (Table 5.2.4). Each entry in the symbol table contains the name itself (or a pointer to it), its numerical value, and sometimes additional information, which may include:

● the length of the data field associated with the symbol;

● relocation bits (which indicate whether the value of the symbol changes if the program is loaded at an address other than the one the assembler assumed);

● information about whether the symbol can be accessed from outside the procedure.

Symbolic names are labels. They can also be defined by means of directives.

Table of directives.

This table lists all the directives, or pseudo-commands, that occur when assembling a program.

Operation code table.

For each opcode, the table has separate columns: opcode designation, operand 1, operand 2, hexadecimal value of the opcode, instruction length and instruction type (Table 5.2.5). Operation codes are divided into groups depending on the number and type of operands. The command type determines the group number and specifies the procedure that is called to process all commands in that group.

Second pass.

The purpose of the second pass is to create the object program and, if necessary, print the assembly listing, and also to output the information the linker needs to combine procedures that were assembled at different times into one executable file.

In the second pass (as in the first), the lines containing statements are read and processed one after another. The source statement and the object code generated from it, in hexadecimal, can be printed or buffered for later printing. After the instruction location counter is updated, the next statement is read.

The original program may contain errors, for example:

● a symbol is used but not defined, or is defined more than once;

● the opcode is not a valid name (because of a typo), has too few operands, or has too many operands;

● a required statement is missing.

Some assemblers can detect an undefined symbol and substitute a value for it. However, in most cases, when a statement with an error is found, the assembler displays an error message and tries to continue the assembly process.

Topic 2.5 Processor Programming Basics

As programs grow longer, it becomes harder to remember the codes of the various operations. Mnemonics provide some help in this regard.

A language for the symbolic encoding of instructions is called assembly language.

Assembly language is a language in which each statement corresponds to exactly one machine instruction.

Assembly is the conversion of a program from assembly language into machine language: symbolic operation names are replaced with machine codes and symbolic addresses with absolute or relative numbers, library programs are included, and sequences of symbolic instructions are generated from macro instructions by supplying specific parameters. The program that does this is usually placed in ROM or loaded into RAM from some external medium.

Assembly language has several features that distinguish it from high-level languages:

1. There is a one-to-one correspondence between assembly language statements and machine instructions.

2. The assembly language programmer has access to all objects and commands present on the target machine.

An understanding of the basics of programming in machine-oriented languages is useful for:

Understanding PC architecture better and making better use of computers;

Developing more rational algorithm structures for programs that solve applied problems;

Viewing and correcting executable programs with the .exe and .com extensions, compiled from any high-level language, when the source programs have been lost (by loading these programs into the DEBUG debugger and disassembling them into assembly language);

Writing programs for the most critical tasks (a program written in a machine-oriented language is usually more efficient, shorter and faster by 30-60 percent, than a program obtained by translation from a high-level language);

Implementing procedures included in the main program as separate fragments when they cannot be implemented either in the high-level language being used or with OS service procedures.

An assembly language program can only run on computers of the same family, while a program written in a high-level language can potentially run on different machines.

The assembly language alphabet is made up of ASCII characters.

Numbers are integers only. The following are distinguished:

Binary numbers, ending with the letter B;

Decimal numbers, ending with the letter D;

Hexadecimal numbers, ending with the letter H.
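The suffix convention can be illustrated with a short Python function that converts such a constant to an integer. It is a sketch under stated assumptions: the function name is made up for this example, the octal suffixes O and Q are borrowed from Intel-style assemblers, a missing suffix is treated as decimal, and the constant must start with a digit (which is why a hexadecimal number such as FFH is written 0FFH).

```python
BASES = {"B": 2, "O": 8, "Q": 8, "D": 10, "H": 16}


def parse_number(text):
    """Convert an assembler constant such as 101B or 0FFH to an integer."""
    token = text.strip().upper()
    if not token or not token[0].isdigit():
        raise ValueError("a number must start with a digit: %r" % text)
    if token[-1] in BASES:
        # The last letter selects the base; the rest are the digits.
        return int(token[:-1], BASES[token[-1]])
    return int(token, 10)        # no suffix: decimal


for sample in ("101B", "17Q", "152", "152D", "0FFH"):
    print(sample, "=", parse_number(sample))
```

Checking the last character first resolves the ambiguity between the binary suffix B, the decimal suffix D, and the hexadecimal digits B and D, since every hexadecimal constant ends in H.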

RAM, registers, data representation

Each series of microprocessors (MPs) has its own programming language, its assembly language.

Assembly language occupies an intermediate position between machine codes and high-level languages. Programming in it is easier than in machine codes. An assembly language program uses the capabilities of a particular machine (more precisely, of the MP) more efficiently than a program in a high-level language, although a high-level language is easier for the programmer. We will consider the basic principles of programming in machine-oriented languages using the assembly language of the KR580VM80 MP as an example. Programming in this language follows a general methodology, while the specific techniques for writing programs depend on the architecture and instruction set of the target MP.

Program model of a microprocessor system based on the KR580VM80 MP

The program model of the microprocessor system (MPS) is shown in Figure 1.

Figure 1. Program model of the MPS: MP, ports, memory.

From the programmer's point of view, the KR580VM80 MP has the following program-accessible registers.

A is the 8-bit accumulator register. It is the main register of the MP. Any operation performed in the ALU requires one of the operands to be placed in the accumulator. The result of an ALU operation is also usually stored in A.

B, C, D, E, H, L are 8-bit general-purpose registers, the internal memory of the MP. They are designed to store the information being processed as well as the results of operations. When 16-bit words are processed, the registers form the pairs BC, DE, HL, and each pair is named by its first letter: B, D, H. The first register of a pair holds the high-order byte. Registers H and L have a special property: they are used both for storing data and for storing 16-bit addresses of RAM cells.

FL is the flag register, an 8-bit register that stores five flags describing the result of arithmetic and logical operations in the MP. The FL format is shown in the figure.

Bit C (CY, carry) is set to 1 if an arithmetic operation produced a carry out of the most significant bit of the byte.

Bit P (parity) is set to 1 if the number of ones in the bits of the result is even.

Bit AC (auxiliary carry) stores the carry out of the low-order tetrad (four bits) of the result.

Bit Z (zero) is set to 1 if the result of the operation is 0.

Bit S (sign) is set to 1 if the result is negative and to 0 if the result is positive.
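The rules for the five flags can be checked on a concrete 8-bit addition. The following Python sketch models only the flag definitions given above; the function name and the dictionary layout are assumptions made for this example, not part of any real tool.

```python
def add_with_flags(a, b):
    """Add two 8-bit values and return (result, flags) per the rules above."""
    total = a + b
    result = total & 0xFF                       # keep the low 8 bits
    flags = {
        "C": int(total > 0xFF),                 # carry out of bit 7
        "AC": int((a & 0x0F) + (b & 0x0F) > 0x0F),  # carry out of the low tetrad
        "Z": int(result == 0),                  # result is zero
        "S": int(result & 0x80 != 0),           # bit 7 set: negative result
        "P": int(bin(result).count("1") % 2 == 0),  # even number of ones
    }
    return result, flags


print(add_with_flags(0x3A, 0xC6))
print(add_with_flags(0x7F, 0x01))
```

Adding 3AH and C6H gives 100H, so the 8-bit result is 0 with C, AC, Z, and P set; adding 7FH and 01H gives 80H, which sets S and AC only.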

SP is the stack pointer, a 16-bit register that holds the address of the memory location where the last byte pushed onto the stack was written.

PC is the program counter, a 16-bit register that holds the address of the next instruction to be executed. The content of the program counter is automatically incremented by 1 immediately after each instruction byte is fetched.

The initial memory area, addresses 0000H - 07FFH, holds the control program and demo programs. This is the ROM area.

0800H - 0AFFH is the address area for the programs under study (RAM).

0B00H - 0BB0H is the address area for data (RAM).

0BB0H is the starting address of the stack (RAM).

Stack is a specially organized area of RAM designed for temporary storage of data or addresses. The last number pushed onto the stack is the first number popped off the stack. The stack pointer stores the address of the last stack location where information is stored. When a subroutine is called, the return address to the main program is automatically stored on the stack. As a rule, at the beginning of each subroutine, the contents of all registers involved in its execution are stored on the stack, and at the end of the subroutine, they are restored from the stack.
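The last-in, first-out behaviour and the role of the stack pointer can be modelled in Python as follows. This is a simplified sketch, assuming the stack grows toward lower addresses and that a 16-bit word is stored with its high byte at the higher address, as on the Intel 8080; the class and method names are made up for the example, and the starting address 0BB0H is the one given above.

```python
class Stack8080:
    """Minimal model of a stack of 16-bit words in 64 KB of memory."""

    def __init__(self, start=0x0BB0):
        self.memory = bytearray(0x10000)
        self.sp = start                       # stack pointer

    def push(self, word):
        # Decrement SP before each byte, so SP always addresses
        # the last byte written to the stack.
        self.sp = (self.sp - 1) & 0xFFFF
        self.memory[self.sp] = (word >> 8) & 0xFF    # high byte
        self.sp = (self.sp - 1) & 0xFFFF
        self.memory[self.sp] = word & 0xFF           # low byte

    def pop(self):
        low = self.memory[self.sp]
        high = self.memory[(self.sp + 1) & 0xFFFF]
        self.sp = (self.sp + 2) & 0xFFFF
        return (high << 8) | low


stack = Stack8080()
stack.push(0x1234)       # e.g. a return address
stack.push(0xABCD)       # e.g. a saved register pair
print(hex(stack.sp), hex(stack.pop()), hex(stack.pop()), hex(stack.sp))
```

The word pushed last (ABCDH) is popped first, and after both pops the stack pointer is back at its starting value.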

Assembly Language Data Format and Command Structure

The memory of the KR580VM80 MP is an array of 8-bit words called bytes. Each byte has its own 16-bit address that determines its position in the sequence of memory cells. The MP can address 65536 bytes of memory, which can contain both ROM and RAM.

Data format

Data is stored in memory as 8-bit words:

D7 D6 D5 D4 D3 D2 D1 D0

The least significant bit is bit 0, the most significant bit is bit 7.

A command is characterized by its format, that is, the number of bits allocated to it, which are divided byte by byte into functional fields.

Command format

KR580VM80 commands have a one-, two-, or three-byte format. The bytes of a multi-byte command must be placed in adjacent memory locations. The format of a command depends on the operation being performed.

The first byte of the command contains the opcode, which is written in mnemonic form in assembly language.

It defines the format of the command, the actions the MP must perform on the data during its execution, and the addressing method, and it may also contain information about the location of the data.

The second and third bytes can contain data to be operated on, or addresses that indicate the location of the data. The data on which operations are performed are called operands.

The single-byte command format is shown in Figure 2.

Figure 4

In assembly language instructions, the opcode is written as an abbreviation of English words, a mnemonic. Mnemonics (from the Greek for the art of memorization) make it easier to remember commands by their function.

Before execution, the source program is translated by a translation program, called an assembler, into the language of code combinations (machine language). In this form it is placed in the memory of the MP system and is then used when the commands are executed.

Addressing methods

All operand codes (input and output) must be located somewhere. They can be in the internal registers of the MP (the most convenient and fastest option). They can be located in system memory (the most common option). Finally, they can be in I / O devices (the rarest case). The location of the operands is determined by the instruction code. There are various methods by which the instruction code can determine where to take the input operand from and where to put the output operand. These methods are called addressing methods.

The KR580VM80 MP has the following addressing methods:

Immediate;

Direct;

Register;

Indirect;

Stack.

Immediate addressing assumes that the (input) operand is located in memory immediately after the instruction code. The operand is usually a constant that needs to be sent somewhere, added to something, and so on. The data is contained in the second byte, or in the second and third bytes, of the instruction, with the low data byte in the second command byte and the high data byte in the third command byte.
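The byte layout of immediate operands can be shown with a small Python sketch. The mnemonics MVI (load an 8-bit constant into a register) and LXI (load a 16-bit constant into a register pair) and their opcode bytes come from the Intel 8080 instruction set and are not named in this text, so treat them as assumptions of the example; the function names are made up.

```python
# Opcode bytes of two 8080 instruction families with immediate operands.
MVI_OPCODES = {"B": 0x06, "C": 0x0E, "D": 0x16, "E": 0x1E,
               "H": 0x26, "L": 0x2E, "M": 0x36, "A": 0x3E}
LXI_OPCODES = {"B": 0x01, "D": 0x11, "H": 0x21, "SP": 0x31}


def encode_mvi(register, data8):
    """Two-byte command: opcode, then the 8-bit constant."""
    return bytes([MVI_OPCODES[register], data8 & 0xFF])


def encode_lxi(pair, data16):
    """Three-byte command: opcode, low data byte, high data byte."""
    return bytes([LXI_OPCODES[pair], data16 & 0xFF, (data16 >> 8) & 0xFF])


print(encode_mvi("A", 0x3A).hex())      # MVI A,3AH
print(encode_lxi("H", 0x0B00).hex())    # LXI H,0B00H
```

In the three-byte command the low byte 00H comes second and the high byte 0BH comes third, matching the byte order described above.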

Direct (also called absolute) addressing assumes that the operand (input or output) is located in memory at the address whose code is placed in the program immediately after the instruction code. It is used in three-byte commands.

Register addressing assumes that the operand (input or output) is in an internal MP register. It is used in single-byte commands.

Indirect (implicit) addressing assumes that the internal MP register contains not the operand itself but its address in memory.

Stack addressing assumes that the command does not contain an address. Memory cells are addressed by the contents of the 16-bit SP register (stack pointer).

Command system

The MP command system is the complete list of elementary actions that the MP is capable of performing. Controlled by these commands, the MP performs simple steps such as elementary arithmetic and logical operations, data transfer, comparison of two values, and so on. The KR580VM80 MP has 78 commands (244 including modifications).

The following groups of commands are distinguished:

Data transfer;

Arithmetic;

Logical;

Jump commands;

Commands for input-output, control, and work with the stack.

Symbols and abbreviations used in describing commands and writing programs

Symbol	Meaning
ADDR	16-bit address
DATA	8-bit data
DATA 16	16-bit data
PORT	8-bit address of an I/O device
BYTE 2	Second command byte
BYTE 3	Third command byte
R, R1, R2	One of the registers: A, B, C, D, E, H, L
RP	One of the register pairs: B specifies the pair BC; D specifies the pair DE; H specifies the pair HL
RH	First register of the pair
RL	Second register of the pair
Λ	Logical multiplication (AND)
V	Logical addition (OR)
⊕	Modulo-two addition (exclusive OR)
M	Memory cell whose address is given by the contents of the HL register pair, i.e. M = (HL)

By purpose, the following groups of commands can be distinguished (examples of mnemonic opcodes of an assembler for IBM PC-type computers are given in parentheses):

● arithmetic operations (ADD and ADC: add and add with carry; SUB and SBB: subtract and subtract with borrow; MUL and IMUL: unsigned and signed multiplication; DIV and IDIV: unsigned and signed division; CMP: comparison; and so on);

● logical operations (OR, AND, NOT, XOR, TEST, and so on);

● data transfer (MOV: move; XCHG: exchange; IN: input to the microprocessor; OUT: output from the microprocessor; and so on);

● transfer of control (program branches: JMP: unconditional jump; CALL: procedure call; RET: return from a procedure; J*: conditional jump; LOOP: loop control; and so on);

● character string processing (MOVS: move; CMPS: compare; LODS: load; SCAS: scan). These commands are usually used with the REP prefix (repetition modifier);

● program interrupts (INT: software interrupt; INTO: conditional interrupt on overflow; IRET: return from interrupt);

● microprocessor control (ST* and CL*: set and clear flags; HLT: halt; WAIT: wait; NOP: no operation; and so on).

A complete list of assembler commands can be found in the literature.

Data transfer commands

● MOV dst, src - data transfer (move from src to dst).

Transfers one byte (if src and dst are in byte format) or one word (if src and dst are in word format) between registers or between a register and memory, and also writes an immediate value to a register or memory.

The operands dst and src must have the same format - byte or word.

Src can be of type r (register), m (memory), or i (immediate value). Dst can be of type r or m. The following operand combinations cannot be used in one command: rsegm (segment register) together with i; two operands of type m; two operands of type rsegm. Operand i can be a simple expression:

mov AX, (152 + 101B) / 15

Expression evaluation is performed only during translation. Flags do not change.
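What "evaluated only during translation" means for the expression above can be sketched in Python. This is only an illustration of the arithmetic, under the assumption that the translator works with integers, so that division discards the remainder; the variable name value is introduced here for the example.

```python
# 101B is a binary literal: 1*4 + 0*2 + 1*1 = 5
binary_part = int("101", 2)

# Assumption: the translator uses integer division, so the remainder is dropped.
value = (152 + binary_part) // 15   # 157 // 15

# The processor never sees the expression, only the resulting constant.
print(f"mov AX, {value}")           # prints: mov AX, 10
```

The machine code produced for the original line is therefore the same as for a MOV with the ready constant written directly.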

● PUSH src - pushing a word onto the stack (push to the stack from src). Places the contents of src on the top of the stack; src is any 16-bit register (including a segment register) or two memory locations containing a 16-bit word. Flags do not change.

● POP dst - extracting a word from the stack (pop from the stack into dst). Removes a word from the top of the stack and places it in dst - any 16-bit register (including a segment register) or two memory locations. Flags do not change.
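The behaviour of PUSH and POP can be modelled with a toy Python class. This is a simplified sketch, not a description of any particular translator or processor: the class name Stack16 and the tiny memory size are assumptions made for the illustration, and overflow checks are omitted. It relies on the x86 convention that the stack grows toward lower addresses and that a word is stored low byte first.

```python
class Stack16:
    """Toy model of a word stack that grows toward lower addresses."""

    def __init__(self, size: int = 16):
        self.mem = bytearray(size)
        self.sp = size  # stack pointer starts just past the end of memory

    def push(self, word: int) -> None:
        self.sp -= 2                                # reserve two bytes first
        self.mem[self.sp] = word & 0xFF             # low byte at the lower address
        self.mem[self.sp + 1] = (word >> 8) & 0xFF  # high byte above it

    def pop(self) -> int:
        word = self.mem[self.sp] | (self.mem[self.sp + 1] << 8)
        self.sp += 2                                # release the two bytes
        return word


stack = Stack16()
stack.push(0x1234)
stack.push(0xABCD)
first = stack.pop()    # the word pushed last comes out first
second = stack.pop()
print(hex(first), hex(second))
```

The last word pushed is the first one popped, and after an equal number of pushes and pops the stack pointer returns to its initial value.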

Programming at the level of machine instructions is the lowest level at which programming is possible. The system of machine instructions must be sufficient to implement the required actions by issuing instructions to the computer hardware.

Each machine instruction consists of two parts:

operation part - determines "what to do";
operand part - defines the objects of processing, "what to do it with".

The machine instruction of the microprocessor, written in assembly language, is a single line with the following syntactic form:

label command/directive operand(s) ;comments

The only mandatory field in a line is the command or directive.

The label, command/directive, and operands (if any) are separated by at least one space or tab character.

If a command or directive needs to be continued on the next line, then the backslash character is used: \.

By default, assembly language does not distinguish between uppercase and lowercase letters in commands or directives.

Example lines of code:

Count db 1 ;Name, directive, one operand
mov eax,0 ;Command, two operands
cbw ;Command, no operands

Commands

A command tells the translator what action the microprocessor should perform. In a data segment, a command (or directive) defines a field, workspace, or constant. In a code segment, an instruction defines an action, such as a move (mov) or an addition (add).

Directives

The assembler has a number of operators that allow you to control the process of assembling and generating a listing. These operators are called directives. They act only while the program is being assembled and, unlike instructions, do not generate machine code.

Operands

Operand – an object on which a machine command or a programming language operator is executed.
An instruction may have one or two operands, or no operands at all. The number of operands is implicitly specified by the instruction code.
Examples:

No operands ret ;Return
One operand inc ecx ;Increment ecx
Two operands add eax,12 ;Add 12 to eax

The label, command (directive), and operand do not have to start at any particular position in the line. However, it is recommended to write them in columns for greater readability of the program.

Operands can be:

identifiers;
strings of characters enclosed in single or double quotes;
integers in binary, octal, decimal, or hexadecimal.

Identifiers

Identifiers – sequences of valid characters used to designate program objects such as operation codes, variable names, and label names.

Rules for writing identifiers.

The identifier can be one or more characters.
As characters, you can use letters of the Latin alphabet, numbers and some special characters: _, ?, $, @.
An identifier cannot start with a digit character.
An identifier can be up to 255 characters long.
The translator accepts the first 32 characters of the identifier and ignores the rest.
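The rules above can be collected into a small Python check. This is a minimal sketch that follows only the rules listed here; the function names is_valid_identifier and significant_part are hypothetical, and a real translator may impose further restrictions (for example, reserved words).

```python
import re

# First character: a Latin letter or one of _ ? $ @ (never a digit);
# remaining characters: the same set plus digits.
_IDENTIFIER = re.compile(r"[A-Za-z_?$@][A-Za-z0-9_?$@]*")

MAX_LENGTH = 255    # longest identifier allowed
SIGNIFICANT = 32    # characters the translator actually distinguishes


def is_valid_identifier(name: str) -> bool:
    """Check a name against the rules for writing identifiers."""
    return len(name) <= MAX_LENGTH and _IDENTIFIER.fullmatch(name) is not None


def significant_part(name: str) -> str:
    """Return the part of the name that the translator takes into account."""
    return name[:SIGNIFICANT]


for candidate in ("count_1", "@loop$", "1count", "my-var"):
    print(candidate, is_valid_identifier(candidate))
```

One practical consequence of the last rule is that two long identifiers that differ only after the 32nd character are treated as the same name.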

Comments

Comments are separated from the executable part of the line by the ; character. Everything written after the semicolon up to the end of the line is a comment. The use of comments in a program improves its clarity, especially where the purpose of a set of instructions is unclear. A comment can contain any printable characters, including spaces. A comment can occupy the entire line or follow the command on the same line.
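How a translator might separate the comment from the rest of a line can be sketched in Python. The function name split_comment is hypothetical, and the sketch assumes that a semicolon inside a quoted character string belongs to the string and does not start a comment.

```python
def split_comment(line: str) -> tuple[str, str]:
    """Split a source line into (code, comment)."""
    quote = None  # the quote character we are currently inside, if any
    for i, ch in enumerate(line):
        if quote:
            if ch == quote:      # closing quote ends the string
                quote = None
        elif ch in "'\"":        # opening single or double quote
            quote = ch
        elif ch == ";":          # first semicolon outside a string
            return line[:i].rstrip(), line[i + 1:].strip()
    return line.rstrip(), ""     # no comment on this line


print(split_comment("mov eax,0 ;Command, two operands"))
print(split_comment("msg db 'a;b' ;text with a semicolon"))
```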

Assembly program structure

A program written in assembly language may consist of several parts, called modules. Each module can define one or more data, stack, and code segments, declared with the appropriate directives. Any complete assembly language program must include one main module, from which its execution begins. Before declaring segments, you must specify the memory model using the .MODEL directive.

An example of a "doing nothing" program in assembly language:

.686P
.MODEL FLAT, STDCALL
.DATA
.CODE
START:

RET
END START

This program contains only one microprocessor instruction, RET. It ensures the correct termination of the program. In general, this command is used to exit a procedure.
The rest of the program consists of instructions to the translator.
.686P - allows the protected-mode commands of the sixth-generation Pentium (Pentium II). This directive selects the supported instruction set by specifying the processor model. The letter P at the end of the directive tells the translator that the processor is running in protected mode.
.MODEL FLAT, STDCALL - selects the flat memory model, which is used in the Windows operating system. STDCALL specifies the procedure calling convention.
.DATA - begins the program segment containing data.
.CODE - begins the program segment containing code.
START - a label. Labels play a much larger role in assembler than in modern high-level languages.
END START - marks the end of the program and tells the translator that execution must begin at the label START.
Each module must contain an END directive marking the end of the program source code. All lines that follow the END directive are ignored. Omitting the END directive generates an error.
The label after the END directive tells the translator the point in the main module from which program execution begins. If the program contains one module, the label after the END directive can be omitted.

Topic 1.4 Assembler mnemonics. Command structure and formats. Types of addressing. Microprocessor instruction set

Plan:

1 Assembly language. Basic concepts

2 Assembly language symbols

3 Types of assembler statements

4 Assembly Directives

5 Processor instruction set

1 Assembly language. Basic concepts

Assembly language is a symbolic representation of machine language. All processes in the machine at the lowest, hardware level are driven only by commands (instructions) of the machine language. From this it is clear that, despite the common name, the assembly language for each type of computer is different.

An assembly language program is a collection of blocks of memory called memory segments. A program may consist of one or more of these block-segments. Each segment contains a collection of language sentences, each of which occupies a separate line of program code.

Assembly sentences are of four types:

1) commands or instructions, which are symbolic analogues of machine commands. During translation, assembly instructions are converted into the corresponding commands of the microprocessor instruction set;

2) macros - sentences of the program text, formalized in a certain way, that are replaced by other sentences during translation;

3) directives, which are instructions to the assembler translator to perform some actions. Directives have no counterparts in the machine representation;

4) comment lines, containing any characters, including letters of the Russian alphabet. Comments are ignored by the translator.

Assembly program structure. Assembler syntax.

Each sentence that makes up a program is a syntactic construct corresponding to a command, macro, directive, or comment. For the assembler translator to recognize them, they must be formed according to certain syntactic rules. It is best to describe these rules formally, like the rules of a grammar. The most common ways of describing a programming language in this manner are syntax diagrams and extended Backus-Naur forms. Syntax diagrams are more convenient for practical use. For example, the syntax of assembly language sentences can be described using the syntax diagrams shown in Figures 10, 11, and 12.

Figure 10 - Assembly sentence format

Figure 11 - Format of directives

Figure 12 - Format of commands and macros

In these figures:

label name - an identifier whose value is the address of the first byte of the source code sentence that it marks;

name - an identifier that distinguishes this directive from other directives of the same name. When the assembler processes a given directive, certain characteristics can be assigned to this name;

operation code (opcode) and directive - mnemonic symbols for the corresponding machine instruction, macro instruction, or translator directive;

operands - parts of a command, macro, or assembler directive that denote the objects on which actions are performed. Assembler operands are described by expressions with numeric and text constants, variable labels, and identifiers, using operator signs and some reserved words.

To use a syntax diagram, one finds and then traverses a path from the diagram's input (left) to its output (right). If such a path exists, then the sentence or construction is syntactically correct. If there is no such path, then the translator will not accept the construction.

2 Assembly language symbols

The characters allowed when writing the text of programs are:

1) all Latin letters: A-Z, a-z. Uppercase and lowercase letters are considered equivalent;

2) digits from 0 to 9;

3) signs ? , @ , $ , _ , & ;

4) separators , . () < > { } + / * % ! " " ? = # ^ .

Assembler sentences are formed from tokens, which are syntactically inseparable sequences of valid language characters that make sense for the translator.

The tokens are:

1) identifiers - sequences of valid characters used to designate program objects such as opcodes, variable names, and label names. An identifier may consist of one or more characters;

2) character strings - sequences of characters enclosed in single or double quotes;

3) integers in one of the following number systems: binary, decimal, hexadecimal. Numbers are identified in assembler programs according to the following rules.

Decimal numbers do not require any additional symbols for their identification, for example 25 or 139.

Binary numbers are written as a sequence of zeros and ones followed by the Latin letter "b", for example 10010101b.

Hexadecimal numbers have more conventions in their notation.

First, they are made up of the digits 0...9 and the lowercase or uppercase Latin letters a, b, c, d, e, f or A, B, C, D, E, F.

Second, the translator may have difficulty recognizing hexadecimal numbers, because they can consist only of the digits 0...9 (for example, 190845) or begin with a letter of the Latin alphabet (for example, ef15). To "explain" to the translator that a given token is not a decimal number or an identifier, the programmer must mark the hexadecimal number explicitly. To do this, the Latin letter "h" is written at the end of the sequence of hexadecimal digits. This is a mandatory condition. If a hexadecimal number starts with a letter, it is preceded by a leading zero: 0ef15h.
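The rules for recognizing numbers can be sketched as a small Python function. It follows only the three notations described here (decimal, binary with the suffix b, hexadecimal with the suffix h and a leading digit); the name parse_number is hypothetical, and a real translator may accept other suffixes as well.

```python
import re

_HEX = re.compile(r"[0-9][0-9a-f]*h")  # must start with a digit, ends with h
_BIN = re.compile(r"[01]+b")           # zeros and ones, ends with b
_DEC = re.compile(r"[0-9]+")           # digits only, no suffix


def parse_number(token: str) -> int:
    """Convert an assembler integer token to its value."""
    t = token.strip().lower()          # letter case is not significant
    if _HEX.fullmatch(t):
        return int(t[:-1], 16)
    if _BIN.fullmatch(t):
        return int(t[:-1], 2)
    if _DEC.fullmatch(t):
        return int(t, 10)
    raise ValueError(f"not a number: {token!r}")


for tok in ("139", "10010101b", "190845h", "0ef15h"):
    print(tok, "=", parse_number(tok))
```

A token such as ef15h is rejected by this sketch because, without the leading zero, it looks like an identifier rather than a number.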

Almost every sentence contains a description of the object on which, or with the help of which, some action is performed. These objects are called operands. They can be defined as follows: operands are objects (values, registers, or memory cells) that are affected by instructions or directives, or objects that define or refine the action of instructions or directives.

Operands can be classified as follows:

constant or immediate operands;

address operands;

relocatable operands;

address counter;

base and index operands;

structural operands;

records.

Operands are elementary components that form part of the machine instruction, denoting the objects on which the operation is performed. In a more general case, operands can be included as components in more complex formations called expressions.

Expressions are combinations of operands and operators considered as a whole. The result of expression evaluation can be the address of some memory cell or some constant (absolute) value.

3 Types of assembler statements

Let us list the possible types of assembler operators and the syntactic rules for forming assembler expressions:

arithmetic operators;

shift operators;

comparison operators;

logical operators;

index operator;

type override operator;

segment redefinition operator;

structure type naming operator;

operator for obtaining the segment component of the address of the expression;

expression offset get operator.

4 Assembly Directives

Assembler directives include:

1) Segmentation directives. In the course of the previous discussion, we found out all the basic rules for writing instructions and operands in an assembly language program. The question of how to properly format the sequence of commands so that the translator can process them and the microprocessor can execute them remains open.

When considering the architecture of the microprocessor, we learned that it has six segment registers, through which it can work simultaneously:

with one code segment;

with one stack segment;

with one data segment;

with three additional data segments.

Physically, a segment is a memory area occupied by commands and (or) data whose addresses are calculated relative to the value in the corresponding segment register. The syntactic description of a segment in assembler is the construction shown in Figure 13:

Figure 13 - Syntactic description of the segment in assembler

It is important to note that the functionality of a segment is somewhat broader than simply breaking the program into blocks of code, data, and stack. Segmentation is part of a more general mechanism related to the concept of modular programming. It involves unifying the format of object modules created by compilers, including compilers for different programming languages. This allows programs written in different languages to be combined. The operands of the SEGMENT directive are intended precisely for implementing the various options of such a combination.

2) Listing control directives. Listing control directives are divided into the following groups:

general listing control directives;

directives for listing include files;

directives for listing conditional assembly blocks;

directives for listing macros;

directives for listing cross-reference information;

directives for changing the listing format.

5 Processor instruction set

The processor instruction set is shown in Figure 14.

Consider the main groups of commands.

Figure 14 - Classification of assembly instructions

Commands are:

1 Data transfer commands. These instructions occupy a very important place in the instruction set of any processor. They perform the following essential functions:

saving in memory the contents of the internal registers of the processor;

copying content from one memory area to another;

writing to I/O devices and reading from I/O devices.

In some processors, all these functions are performed by a single instruction MOV (for byte transfers - MOVB), but with different methods of addressing operands.

In other processors, there are several more commands besides MOV for performing the listed functions. Data transfer commands also include information exchange commands (their mnemonics are based on the word Exchange). Exchange may be provided between internal registers, between the two halves of one register (SWAP), or between a register and a memory location.

2 Arithmetic commands. Arithmetic instructions treat operand codes as numeric binary or BCD codes. These commands can be divided into five main groups:

commands for operations with a fixed point (addition, subtraction, multiplication, division);

floating point instructions (addition, subtraction, multiplication, division);

clear commands;

increment and decrement commands;

comparison command.

3 Fixed-point instructions operate on codes in processor registers or in memory as on ordinary binary codes. Floating-point instructions use a number representation format with an exponent and a mantissa (usually these numbers occupy two consecutive memory locations). In modern powerful processors, the floating-point instruction set is not limited to the four arithmetic operations, but also contains many more complex instructions, for example, the calculation of trigonometric and logarithmic functions, as well as complex functions necessary for sound and image processing.

4 Clear commands are designed to write a zero code to a register or memory cell. These commands can be replaced by instructions that transfer a zero code, but special clear instructions are usually faster than transfer instructions.

5 Increment (increase by one) and decrement (decrease by one) commands are also very convenient. They could in principle be replaced by add-one or subtract-one instructions, but increment and decrement are faster than add and subtract. These instructions require one input operand that is also the output operand.

6 The compare instruction compares two input operands. In fact, it calculates the difference of these two operands, but does not form an output operand; it only changes the bits in the processor status register based on the result of this subtraction. The instruction following the compare instruction (usually a jump instruction) analyzes the bits in the processor status register and performs actions based on their values. Some processors provide instructions for chain-comparing two sequences of operands in memory.
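The idea that a compare instruction subtracts but keeps only the flags can be sketched in Python for 8-bit unsigned operands. The function name compare8 and the dictionary of flag names are assumptions made for this illustration; real processors keep these bits in the status register, and the exact set of flags differs between processors.

```python
def compare8(a: int, b: int) -> dict:
    """Model an 8-bit compare: subtract, keep only the flags."""
    diff = (a - b) & 0xFF              # the difference is computed...
    return {                           # ...but only the flags are kept
        "zero": diff == 0,             # operands are equal
        "carry": a < b,                # a borrow was needed (unsigned a < b)
        "sign": bool(diff & 0x80),     # most significant bit of the difference
    }


flags = compare8(3, 5)
if flags["zero"]:
    print("equal")
elif flags["carry"]:
    print("first operand is smaller")   # this branch is taken for 3 and 5
else:
    print("first operand is larger")
```

The if/elif chain plays the role of the conditional jump instructions that normally follow a compare.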

7 Logic commands. Logic instructions perform logical (bitwise) operations on operands, that is, they consider the operand codes not as a single number, but as a set of individual bits. In this they differ from arithmetic commands. Logic commands perform the following basic operations:

logical AND, logical OR, modulo 2 addition (XOR);

logical, arithmetic and cyclic shifts;

checking bits and operands;

setting and clearing bits (flags) of the processor status register (PSW).

Logic instructions allow bit-by-bit computation of basic logic functions from two input operands. In addition, the AND operation is used to force the clearing of the specified bits (as one of the operands, this uses the mask code, in which the bits that require clearing are set to zero). The OR operation is used to force the setting of the specified bits (as one of the operands, the mask code is used in which the bits that require setting to one are equal to one). The XOR operation is used to invert the given bits (as one of the operands, the mask code is used in which the bits to be inverted are set to one). Instructions require two input operands and form one output operand.
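The three uses of mask codes described above can be shown with Python's bitwise operators on an 8-bit value. The particular value and masks are arbitrary examples chosen for the illustration.

```python
value = 0b1011_0110

# AND: bits that are 0 in the mask are cleared, the others are kept.
cleared = value & 0b1111_0000    # clear the four low bits

# OR: bits that are 1 in the mask are set, the others are kept.
set_bits = value | 0b0000_0011   # set the two low bits

# XOR: bits that are 1 in the mask are inverted, the others are kept.
inverted = value ^ 0b1000_0001   # invert the highest and the lowest bit

print(f"{cleared:08b} {set_bits:08b} {inverted:08b}")
# prints: 10110000 10110111 00110111
```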

8 The shift commands allow you to shift the operand code bit by bit to the right (towards the lower bits) or to the left (towards the higher bits). The type of shift (logical, arithmetic, or cyclic) determines what the new value of the most significant bit (when shifting right) or the least significant bit (when shifting left) will be, and also determines whether the old value of the most significant bit (when shifting left) or the least significant bit (when shifting right) will be stored somewhere. Cyclic shifts allow you to shift the bits of the operand code in a circle (clockwise when shifting right or counterclockwise when shifting left). The shift ring may or may not include the carry flag. The carry flag bit (if used) receives the value of the most significant bit on a left cyclic shift and of the least significant bit on a right cyclic shift. Accordingly, the old value of the carry flag bit is written to the least significant bit on a left cyclic shift and to the most significant bit on a right cyclic shift.
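The difference between a cyclic shift with and without the carry flag in the ring can be sketched in Python for 8-bit operands. The function names rol8, rcl8, and rcr8 are hypothetical; each returns the new operand value together with the new carry bit.

```python
def rol8(value: int) -> tuple[int, int]:
    """Rotate left, carry outside the ring: bit 7 wraps around to bit 0."""
    carry = (value >> 7) & 1
    return ((value << 1) | carry) & 0xFF, carry


def rcl8(value: int, carry_in: int) -> tuple[int, int]:
    """Rotate left through carry: old carry enters bit 0, bit 7 becomes carry."""
    carry_out = (value >> 7) & 1
    return ((value << 1) | carry_in) & 0xFF, carry_out


def rcr8(value: int, carry_in: int) -> tuple[int, int]:
    """Rotate right through carry: old carry enters bit 7, bit 0 becomes carry."""
    carry_out = value & 1
    return (value >> 1) | (carry_in << 7), carry_out


result, carry = rol8(0b1000_0001)
print(f"{result:08b} carry={carry}")   # prints: 00000011 carry=1
result, carry = rcl8(0b1000_0001, 0)
print(f"{result:08b} carry={carry}")   # prints: 00000010 carry=1
```

With the carry flag in the ring, nine bits take part in the rotation, so the bit leaving the operand reappears in it only on the next shift.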

9 Jump commands. Jump commands are designed to organize all kinds of loops, branches, subroutine calls, etc., that is, they disrupt the sequential flow of the program. These instructions write a new value to the instruction counter register and thereby cause the processor to jump not to the next instruction in order, but to any other instruction in the program memory. Some jump commands allow you to go back to the point from which the jump was made, while others do not. If a return is provided, then the current processor parameters are stored on the stack. If no return is provided, then the current processor parameters are not saved.

Jump commands without return are divided into two groups:

commands of unconditional jumps;

conditional jump instructions.

The mnemonics of these commands are based on the words Branch and Jump.

Unconditional jump instructions cause a jump to a new address regardless of any conditions. They can cause a jump by a specified offset value (forward or backward) or to a specified memory address. The offset value or the new address value is specified as the input operand.

Conditional jump commands do not always cause a jump, but only when the specified conditions are met. Such conditions are usually the values of the flags in the processor status register (PSW). That is, the jump condition is the result of a previous operation that changed the values of the flags. In total, there can be from 4 to 16 such jump conditions. Some examples of conditional jump commands:

jump if equal to zero;

jump if non-zero;

jump if there is an overflow;

jump if there is no overflow;

jump if greater than zero;

jump if less than or equal to zero.

If the transition condition is met, then a new value is loaded into the instruction counter register. If the jump condition is not met, the instruction counter is simply incremented, and the processor selects and executes the next instruction in sequence.

A comparison instruction (CMP) is used specifically for checking jump conditions; it precedes a conditional jump instruction (or even several conditional jump instructions). But flags can also be set by any other command, such as a data transfer command or any arithmetic or logic command. Note that the jump commands themselves do not change the flags, which is what makes it possible to place several jump commands one after another.

Interrupt commands occupy a special place among the jump commands with return. These instructions require an interrupt number (vector address) as the input operand.

Conclusion:

Assembly language is a symbolic representation of machine language. The assembly language for each type of computer is different. An assembly language program is a collection of blocks of memory called memory segments. Each segment contains a collection of language sentences, each of which occupies a separate line of program code. Assembly statements are of four types: commands or instructions, macros, directives, comment lines.

The valid characters when writing the text of programs are all Latin letters: A-Z, a-z (uppercase and lowercase letters are considered equivalent); digits from 0 to 9; signs ? , @ , $ , _ , & ; separators , . () < > { } + / * % ! " " ? = # ^ .

The following types of assembler operators are used in forming assembler expressions: arithmetic operators, shift operators, comparison operators, logical operators, the index operator, the type override operator, the segment override operator, the structure type naming operator, the operator for obtaining the segment component of the address of an expression, and the operator for obtaining the offset of an expression.

The command system is divided into 8 main groups.

Control questions:

1 What is assembly language?

2 What symbols can be used to write commands in assembler?

3 What are labels and what is their purpose?

4 Explain the structure of assembly instructions.

5 List 4 types of assembler statements.