
Instruction structure in assembly language programming at the machine-instruction level. General characteristics of the assembly language instruction set for the IBM PC (basic instruction set, main operand addressing modes)

Structures in assembly language

The arrays considered above are collections of elements of the same type. Applications, however, often need to treat a set of data of different types as a single type.

This matters, for example, in database programs, where a collection of data of different types has to be associated with one object.

For example, earlier we looked at Listing 4, which worked with an array of three-byte elements. Each element, in turn, consisted of two elements of different types: a one-byte counter field and a two-byte field that could carry other information needed for storage and processing. Readers familiar with a high-level language know that such an object is usually described with a special data type, the structure.

To make assembly language more convenient to use, this data type was introduced into it as well.

By definition, a structure is a data type consisting of a fixed number of elements of different types.

To use structures in a program, you need to do three things:

Define the structure template.

In essence, this means defining a new data type, which can later be used to define variables of this type.

Define a structure instance.

This step initializes a specific variable whose layout is given in advance by the template.

Organize access to the structure members.

It is very important to understand from the very beginning the difference between describing a structure in a program and defining it.

To describe a structure in a program means only to indicate its scheme or template; no memory is allocated.

This template serves only as information for the translator about the location of the fields and their default values.

To define a structure means to instruct the translator to allocate memory and assign a symbolic name to this memory area.

You can describe the structure in the program only once, and define it any number of times.

Structure template description

A structure template declaration has the following syntax:

structure_name STRUC

<field descriptions>

structure_name ENDS

Here <field descriptions> is a sequence of the data definition directives db, dw, dd, dq and dt.

Their operands determine the size of the fields and, optionally, their initial values. These values may initialize the corresponding fields when the structure is defined.

As already noted, no memory is allocated when the template is described, since it is only information for the translator.

The template may be placed anywhere in the program, but because the translator works in one pass, it must appear before the place where a variable of this structure type is defined. That is, when a variable of some structure type is described in a data segment, its template must be placed at the beginning of the data segment or before it.

Let's look at working with structures using the example of modeling a database of the employees of some department.

For simplicity, to avoid the problems of converting information during input, we will agree that all fields are character fields.

Let's define the record structure of this database with the following template:
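The template itself is not reproduced at this point. A minimal sketch of what it might look like follows, in TASM/MASM-style syntax. Only the names worker, nam and age and the total size of 77 bytes (which appears in the later type worker comments) come from the text; the other field names and all field lengths are assumptions chosen so that the sizes add up to 77.

```asm
worker    struc                 ; employee record; no memory is allocated here
nam       db 30 dup (' ')       ; full name (length of 30 bytes is assumed)
sex       db 'm'                ; sex, one character, default 'm' (assumed field)
position  db 30 dup (' ')       ; job title (assumed field)
age       db 2 dup (' ')        ; age as two decimal characters
standing  db 2 dup (' ')        ; length of service (assumed field)
salary    db 4 dup (' ')        ; salary (assumed field)
birthdate db 8 dup (' ')        ; date of birth (assumed field)
worker    ends                  ; 30+1+30+2+2+4+8 = 77 bytes per instance
```

The field is called nam rather than name because NAME is a reserved word in these assemblers.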

Defining data with a structure type

To use a structure described by a template, you must define a variable of that structure type. The following syntax is used for this:

[variable_name] structure_name <[list of values]>

variable_name - the identifier of a variable of the given structure type.

Specifying a variable name is optional. If it is omitted, a memory area whose size is the sum of the lengths of all structure elements is simply allocated.

list of values - a comma-separated list of initial values of the structure elements, enclosed in angle brackets.

Specifying it is also optional.

If the list is incomplete, the fields that are not listed are initialized with the values from the template, if any.

Individual fields may be initialized, but the skipped fields must then be marked by commas. Skipped fields are initialized with the values from the structure template. If all the default field values from the template are acceptable for a new variable, it is enough to write empty angle brackets.

For example: victor worker <>

For example, let's define several variables with the type of the structure described above.
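A few definitions might look as follows. This is a sketch in TASM/MASM-style syntax: sotr1 and mas_sotr follow the names used later in the text, sotr2 is hypothetical, and the template is repeated (with assumed field names and lengths) only so that the fragment stands on its own. Only defaults and a single-character field are overridden here, because whether a field declared with dup may be overridden in the value list depends on the assembler.

```asm
          .model small
worker    struc                     ; assumed 77-byte template
nam       db 30 dup (' ')
sex       db 'm'
position  db 30 dup (' ')
age       db 2 dup (' ')
standing  db 2 dup (' ')
salary    db 4 dup (' ')
birthdate db 8 dup (' ')
worker    ends

          .data
sotr1     worker <>                 ; every field takes its default from the template
sotr2     worker <,'f'>             ; nam skipped (comma), sex overridden, the rest default
          worker <>                 ; unnamed instance: 77 bytes are simply reserved
mas_sotr  worker 10 dup (<>)        ; array of ten records with default values
```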

Methods for working with structures

The idea of introducing a structure type into any programming language is to combine variables of different types into one object.

The language must provide a means of accessing these variables within a particular structure instance. To refer to a field of a structure in an instruction, a special operator is used: the "." (dot) character. It is used in the following syntax:

address_expression.structure_field_name

address_expression - an identifier of a variable of some structure type, or an expression in brackets that follows the syntax rules shown below (Fig. 1);

structure_field_name - a field name from the structure template.

The field name is in fact also an address, or rather the offset of the field from the beginning of the structure.

So the "." (dot) operator evaluates the expression

(address_expression) + (structure_field_name)

Fig. 5. Syntax of an address expression in a structure field access operator

Let's demonstrate some techniques for working with structures, using the worker structure defined above.

For example, let's load the value of the age field into ax. Since the age of a working person is unlikely to exceed 99 years, once the contents of this character field are in the ax register it is convenient to convert them to binary with the aad instruction. Note that aad expects unpacked decimal digits (0 to 9) in ah and al, so the character codes must first be reduced to digits, for example with and ax,0f0fh.

Be careful: because of the "low byte at the low address" storage principle, the high-order digit of the age ends up in al and the low-order digit in ah.

To correct this, just use the instruction xchg al,ah:

mov ax,word ptr sotr1.age ;al = high-order digit of sotr1's age, ah = low-order digit

xchg al,ah ;now ah = high-order digit, al = low-order digit

It can also be done like this:
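A sketch of the variant that goes through a base register is given below as a small DOS program. It assumes the 77-byte worker template with the field names used in this example (all except nam and age are assumptions), stores the sample age "42" at run time, and uses the [bx].age form accepted by TASM in MASM mode and by MASM 5.x; newer MASM versions expect [bx].worker.age instead.

```asm
          .model small
          .stack 100h
worker    struc                           ; assumed 77-byte template
nam       db 30 dup (' ')
sex       db 'm'
position  db 30 dup (' ')
age       db 2 dup (' ')
standing  db 2 dup (' ')
salary    db 4 dup (' ')
birthdate db 8 dup (' ')
worker    ends
          .data
sotr1     worker <>
          .code
start:    mov ax,@data
          mov ds,ax
          mov byte ptr sotr1.age,'4'      ; store the sample age "42"
          mov byte ptr sotr1.age+1,'2'
          lea bx,sotr1                    ; bx = address of the instance
          mov ax,word ptr [bx].age        ; al = '4', ah = '2'
          xchg al,ah                      ; ah = '4', al = '2'
          and ax,0F0Fh                    ; characters to digits: ah = 4, al = 2
          aad                             ; al = 4*10 + 2 = 42, ah = 0
          mov dx,ax                       ; keep the binary age in dx
          mov ax,4C00h                    ; DOS: terminate the program
          int 21h
          end start
```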

Further work with an array of structures proceeds in the same way as with a one-dimensional array. Several questions arise here:

How do we determine the element size, and how do we organize the indexing of array elements?

As with other identifiers defined in the program, the translator assigns a type attribute to the name of the structure type and to the name of a variable of that type. The value of this attribute is the size in bytes occupied by the fields of the structure. You can obtain this value with the type operator.

Once the size of a structure instance is known, indexing in an array of structures presents no particular difficulty.

For example:
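One possible illustration is a loop that steps through the array by type worker bytes at a time. The sketch below assumes the same 77-byte template (field names other than nam and age are assumptions) and counts the records whose sex field holds 'm'; the labels cycl and next are hypothetical.

```asm
          .model small
          .stack 100h
worker    struc                           ; assumed 77-byte template
nam       db 30 dup (' ')
sex       db 'm'
position  db 30 dup (' ')
age       db 2 dup (' ')
standing  db 2 dup (' ')
salary    db 4 dup (' ')
birthdate db 8 dup (' ')
worker    ends
          .data
mas_sotr  worker 10 dup (<>)
          .code
start:    mov ax,@data
          mov ds,ax
          mov bx,type worker              ; bx = 77, size of one element
          lea di,mas_sotr                 ; di = address of element 0
          mov cx,10                       ; number of elements
          xor dx,dx                       ; dx counts records with sex = 'm'
cycl:     cmp byte ptr [di].sex,'m'
          jne next
          inc dx
next:     add di,bx                       ; step to the next element
          loop cycl                       ; with all-default records dx ends up as 10
          mov ax,4C00h                    ; DOS: terminate the program
          int 21h
          end start
```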

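How do we copy a field from one structure to the corresponding field of another structure? Or how do we copy an entire structure? Let's copy the nam field of the third employee into the nam field of the fifth employee:

mas_sotr worker 10 dup (<>)

mov bx,offset mas_sotr

mov si,(type worker)*2 ;si=77*2

add si,bx ;si = address of the third employee

mov di,(type worker)*4 ;di=77*4

add di,bx ;di = address of the fifth employee

mov cx,30 ;length of nam in bytes (nam is assumed to be the first field, 30 bytes long)

rep movsb ;assumes es = ds and the direction flag is clear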

It seems to me that the craft of programming sooner or later makes a person resemble a good housewife. Like her, the programmer is constantly looking for ways to save something, to cut back and to make a wonderful dinner out of a minimum of food. And when this succeeds, the moral satisfaction is no less, and perhaps greater, than the housewife's from a wonderful dinner. The degree of this satisfaction, it seems to me, depends on the degree of love for one's profession.

On the other hand, progress in software and hardware somewhat relaxes the programmer, and quite often the situation resembles the well-known proverb about the fly and the elephant: heavyweight tools are brought in to solve some small problem, although their effectiveness is, in general, significant only in relatively large projects.

The presence in the language of the following two data types is probably explained by the desire of the "hostess" to use the working area of the table (RAM) as efficiently as possible when preparing food or placing products (program data).

For a machine to execute human commands at the hardware level, a sequence of actions must be specified in the language of "zeros and ones". An assembler helps with this: it is a utility that translates commands into machine language. However, writing a program is a very time-consuming and complex process, and the language is not intended for light and simple tasks. At present, whichever programming language you use (assembly language works well), it lets you write specialized, efficient routines that strongly affect how the hardware works. Its main purpose is creating micro-instructions and small pieces of code. The language offers more capabilities than, for example, Pascal or C.

Brief description of assembly languages

All programming languages are divided into low-level and high-level ones. Any language of the assembler "family" stands out in that it combines some of the advantages of the most common modern languages, and, like them, it lets you make full use of the computer system.

A distinctive feature of an assembler is its ease of use; in this it differs from translators that work only with high-level languages. Compared with a program in such a language, assembly code can run up to twice as fast. Writing a small program in it does not take long.

Briefly about the structure of the language

Speaking in general about how the language is organized, its instructions correspond directly to the instructions of the processor. That is, assembly language uses mnemonic codes, which are more convenient for a person to write.

Unlike other programming languages, assembly language uses labels instead of numeric addresses to refer to memory cells; during translation the labels are converted into relative addresses. A program also contains so-called directives. They do not affect the operation of the processor (they are not translated into machine language) but are needed by the programming environment itself.

Each processor family has its own instruction set, so a translated program is correct only for the family it was written for.

Assembly language has several syntaxes, which will be discussed in the article.

Language pros

The main advantage of assembly language is that it can be used to write any program for the processor, and the result is very compact. If the code is large, parts of it are moved out to RAM. Everything still runs quickly and without failures, provided, of course, that a qualified programmer is in control.

Drivers, operating systems, the BIOS, compilers, interpreters and similar software are typical examples of programs that contain assembly language code.

With a disassembler, which translates machine code back into assembly language, you can work out how a particular system routine operates even when no documentation exists for it. However, this is practical only for small programs; non-trivial code is quite difficult to understand.

Cons of the language

Unfortunately, the language is difficult for novice programmers (and often for professionals) to master. Assembly language requires every action to be described in detail. Because you work directly with machine instructions, both the likelihood of errors and the complexity of the work increase.

To write even the simplest program, the programmer must be qualified and have a sufficiently high level of knowledge. An average specialist, unfortunately, often writes poor code.

If the platform for which a program was created changes, all instructions have to be rewritten by hand; this follows from the nature of the language. An assembler does not automatically check the correctness of a program or replace any of its elements.

Language commands

As mentioned above, each processor has its own set of instructions. The simplest instructions, recognized by processors of any type, are the following:

Using directives

Programming microcontrollers in the lowest-level language (assembly language allows this and copes with it very well) is successful in most cases. It suits processors with limited resources best, and it also fits 32-bit hardware well. Directives often appear in the code. What are they, and what are they used for?

To begin with, directives are not translated into machine language. They control how the translator works. Unlike instructions, directives differ not from processor to processor but from translator to translator. The main directives include the following:

Origin of the name

Where does the name of the language, "Assembler", come from? It refers to the translator that converts the program text into machine code. The English word "assembler" means simply "one who assembles": the program was put together not by hand but automatically. By now users and specialists have blurred the difference between the terms, and the programming languages themselves are often called "assembler", although strictly speaking the assembler is just a utility.

Because of this collective name, some people mistakenly assume that there is a single low-level language (or a common standard for such languages). For a programmer to understand which language is meant, the platform for which a given assembly language is used must be specified.

Macro facilities

Relatively recent assemblers have macro facilities that make a program easier to write. A macro is expanded by the translator at translation time, so a frequently used sequence of instructions has to be written only once. For a conditional choice, for example, you could write out a huge block of instructions, but it is easier to use macros, which let you switch quickly between actions depending on whether a condition is met.

Macro language directives give the programmer assembler macros. A macro can be widely applicable, or its functionality can be reduced to a single instruction. Macros make the code easier to work with, clearer and more readable. Still, be careful: in some cases macros make things worse.

By purpose, instructions can be grouped as follows (examples of instruction mnemonics for an IBM PC assembler are given in parentheses):

- arithmetic operations (ADD and ADC: addition and addition with carry; SUB and SBB: subtraction and subtraction with borrow; MUL and IMUL: unsigned and signed multiplication; DIV and IDIV: unsigned and signed division; CMP: comparison; etc.);

- logical operations (OR, AND, NOT, XOR, TEST, etc.);

- data transfer (MOV: move; XCHG: exchange; IN: input to the microprocessor; OUT: output from the microprocessor; etc.);

- transfer of control (program branches: JMP: unconditional jump; CALL: procedure call; RET: return from procedure; J*: conditional jump; LOOP: loop control; etc.);

- character string processing (MOVS: move; CMPS: compare; LODS: load; SCAS: scan; these instructions are usually used with the repetition prefix REP);

- program interrupts (INT: software interrupt; INTO: conditional interrupt on overflow; IRET: return from interrupt);

- microprocessor control (ST* and CL*: set and clear flags; HLT: halt; WAIT: wait; NOP: no operation; etc.).

A complete list of assembler instructions can be found in the literature.

Data transfer commands

- MOV dst, src - data transfer (move from src to dst).

Transfers one byte (if src and dst are in byte format) or one word (if src and dst are in word format) between registers or between a register and memory, and also writes an immediate value to a register or memory.

The operands dst and src must have the same format: byte or word.

src can be of type r (register), m (memory) or i (immediate value). dst can be of type r or m. The following combinations cannot be used in one instruction: rsegm (a segment register) together with i; two operands of type m; two operands of type rsegm. The operand i can be a simple expression:

mov AX, (152 + 101B) / 15

The expression is evaluated only during translation. The flags do not change.

- PUSH src - push a word onto the stack (write src to the stack). Pushes the contents of src onto the top of the stack; src is any 16-bit register (including a segment register) or two memory cells containing a 16-bit word. The flags do not change.

- POP dst - pop a word from the stack (read from the stack into dst). Removes a word from the top of the stack and places it in dst, which is any 16-bit register (including a segment register, except CS) or two memory cells. The flags do not change.
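The operand rules above can be illustrated with a small 16-bit DOS program. This is a sketch in MASM/TASM-style syntax; the variables var1 and var2 are hypothetical and exist only for the example.

```asm
          .model small
          .stack 100h
          .data
var1      dw 1234h
var2      dw 0
          .code
start:    mov ax,@data               ; rsegm cannot be loaded from i, so go through ax
          mov ds,ax                  ; r -> rsegm
          mov ax,var1                ; m -> r
          mov var2,ax                ; r -> m (two moves replace the forbidden m -> m)
          mov bl,5                   ; i -> r in byte format
          mov ax,(152 + 101B) / 15   ; computed by the translator: 157 / 15 = 10
          push ax                    ; stack top: 10
          push var1                  ; stack top: 1234h, below it 10
          pop ax                     ; ax = 1234h
          pop var2                   ; var2 = 10
          mov ax,4C00h               ; DOS: terminate the program
          int 21h
          end start
```

Because the stack is last in, first out, the two pop instructions return the values in the reverse order of the pushes.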

Assembly language commands (Lecture)

LECTURE PLAN

1. Main groups of operations.

2. Mnemonic codes of Pentium processor instructions.

1. Main groups of operations

Microprocessors execute a set of instructions that implement the following main groups of operations:

data transfer operations,

arithmetic operations,

logical operations,

shift operations,

comparison and test operations,

bit operations,

program control operations,

processor control operations.

2. Mnemonic codes of Pentium processor instructions

When instructions are described, their mnemonic designations (mnemonic codes) are normally used; these specify the instruction when programming in assembly language. In different assemblers the mnemonics of some instructions may differ. For example, the subroutine call instruction uses the mnemonic CALL or JSR ("Jump to SubRoutine"). However, the mnemonics of most instructions are the same or nearly the same across the main microprocessor types, since they abbreviate the English words that name the operation. Let us consider the mnemonics adopted for Pentium processors.

Data transfer instructions. The main instruction of this group is MOV, which transfers data between two registers or between a register and a memory cell. Some microprocessors implement a transfer between two memory cells, as well as a group transfer of the contents of several registers to or from memory. For example, the Motorola 68xxx family microprocessors execute the MOVE instruction, which transfers data from one memory cell to another, and the MOVEM instruction, which writes to memory or loads from memory the contents of a given set of registers (up to 16 registers). The XCHG instruction exchanges the contents of two processor registers or of a register and a memory cell.

The input instruction IN and the output instruction OUT transfer data from an external device into a processor register or from a processor register to an external device. These instructions specify the number of the interface device (I/O port) through which the data is transferred. Note that many microprocessors have no special instructions for accessing external devices. In that case input and output are performed with the MOV instruction, which specifies the address of the required interface device. An external device is thus addressed as a memory cell, and a section of the address space is set aside for the addresses of the interface devices (ports) connected to the system.

Arithmetic instructions. The main instructions in this group are addition, subtraction, multiplication and division, each of which has a number of variants. The addition instruction ADD and the subtraction instruction SUB operate on the contents of two registers, a register and a memory cell, or with an immediate operand. The ADC and SBB instructions perform addition and subtraction taking into account the value of the carry flag C, which is set when a carry (or borrow) arises in the previous operation. These instructions make it possible to add, piece by piece, operands whose length exceeds the processor word length. The NEG instruction changes the sign of the operand, converting it to its two's complement.
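As an illustration of piecewise addition, the sketch below adds two 64-bit numbers on a 32-bit processor with ADD followed by ADC. It uses the same MASM-style program skeleton as the "doing nothing" example later in the text; the variable names and values are hypothetical.

```asm
.686P
.MODEL FLAT, STDCALL
.DATA
a_lo dd 0FFFFFFFFh       ; first operand:  00000001FFFFFFFFh
a_hi dd 00000001h
b_lo dd 00000001h        ; second operand: 0000000200000001h
b_hi dd 00000002h
.CODE
START:
    mov eax, a_lo
    mov edx, a_hi        ; edx:eax = first operand
    add eax, b_lo        ; low halves: FFFFFFFFh + 1 = 0, carry flag set
    adc edx, b_hi        ; high halves plus carry: 1 + 2 + 1 = 4
    RET                  ; edx:eax = 0000000400000000h
END START
```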

Multiplication and division can be performed on signed numbers (the IMUL and IDIV instructions) or on unsigned numbers (the MUL and DIV instructions). The result of the operation is placed in registers. Multiplication (MUL, IMUL) produces a double-length result, which occupies two registers. Division (DIV, IDIV) takes a double-length dividend held in two registers, and the quotient and the remainder are written to two registers.
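A short sketch of the register usage for the 32-bit unsigned forms, where the register pair is EDX:EAX. The operand values are hypothetical and were chosen so that the product does not fit in 32 bits while the quotient does.

```asm
.686P
.MODEL FLAT, STDCALL
.DATA
.CODE
START:
    mov eax, 100000
    mov ecx, 100000
    mul ecx              ; edx:eax = 10 000 000 000: edx = 2, eax = 540BE400h
    mov ecx, 7
    div ecx              ; divides edx:eax by 7: eax = 1428571428, edx = 4
    RET
END START
```

If the quotient did not fit in EAX, DIV would raise a divide error, which is why the dividend and divisor have to be chosen with the result size in mind.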

Logical instructions. Almost all microprocessors perform the logical operations AND, OR and Exclusive OR on the corresponding bits of the operands, using the AND, OR and XOR instructions. The operations are performed on the contents of two registers, a register and a memory cell, or with an immediate operand. The NOT instruction inverts the value of each bit of the operand.

Shift instructions. Microprocessors perform arithmetic, logical and cyclic shifts of the addressed operand by one or more bits. The operand to be shifted can be in a register or a memory cell, and the number of bit positions is given by an immediate operand contained in the instruction or by the contents of a specified register. The carry flag C in the status register (SR or EFLAGS) usually takes part in the shift: it receives the last bit shifted out of the register or memory cell.

Comparison and test instructions. Operands are usually compared with the CMP instruction, which subtracts the operands and sets the flags N, Z, V, C in the status register according to the result. The result of the subtraction is not saved, and the operands do not change. Subsequent analysis of the flag values determines the relative value (>, <, =) of signed or unsigned operands. Different addressing modes make it possible to compare the contents of two registers, a register and a memory cell, or an immediate operand with the contents of a register or memory cell.
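On x86 processors the flags N, Z, V and C are named SF, ZF, OF and CF. The sketch below shows how one and the same bit pattern compares differently as a signed and as an unsigned number; the label names and the use of ecx as a result register are choices made for this example.

```asm
.686P
.MODEL FLAT, STDCALL
.DATA
.CODE
START:
    mov eax, -1          ; 0FFFFFFFFh: -1 signed, 4294967295 unsigned
    mov ebx, 1
    xor ecx, ecx         ; ecx collects the outcome
    cmp eax, ebx         ; computes eax - ebx, sets flags, keeps both operands
    jl  signed_less      ; signed view: -1 < 1, jump taken
    jmp done
signed_less:
    or  ecx, 1
    cmp eax, ebx         ; compare again, because or changed the flags
    ja  unsigned_above   ; unsigned view: 4294967295 > 1, jump taken
    jmp done
unsigned_above:
    or  ecx, 2
done:
    RET                  ; ecx = 3: less when signed, above when unsigned
END START
```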

Some microprocessors execute a test instruction TST, which is a single-operand variant of the compare instruction. When it is executed, the flags N and Z are set according to the sign and value (zero or non-zero) of the addressed operand.

Bit manipulation instructions. These instructions set the flag C in the status register to the value of the tested bit bn of the addressed operand. In some microprocessors the flag Z is set according to the result of testing the bit. The number n of the tested bit is given either by the contents of a register specified in the instruction or by an immediate operand.

The instructions of this group differ in what they do to the tested bit. The BT instruction leaves the bit unchanged. The BTS instruction sets bn=1 after testing, and the BTR instruction sets bn=0. The BTC instruction inverts the bit bn after testing it.
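The four variants can be followed step by step in a short sketch; the initial value in eax is arbitrary.

```asm
.686P
.MODEL FLAT, STDCALL
.DATA
.CODE
START:
    mov eax, 00000101b   ; bits 0 and 2 are set
    bt  eax, 2           ; CF = 1 (bit 2), eax unchanged
    bts eax, 1           ; CF = 0 (old bit 1), then bit 1 set:      eax = 0111b
    btr eax, 0           ; CF = 1 (old bit 0), then bit 0 cleared:  eax = 0110b
    btc eax, 2           ; CF = 1 (old bit 2), then bit 2 inverted: eax = 0010b
    RET
END START
```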

Program control operations. Program flow is controlled by a large number of instructions, among them:

- unconditional control transfer instructions;

- conditional jump instructions;

- instructions for organizing program loops;

- interrupt instructions;

- flag modification instructions.

Unconditional transfer of control is performed by the JMP instruction, which loads the program counter PC with new contents: the address of the next instruction to be executed. This address is either specified directly in the JMP instruction (direct addressing) or calculated as the sum of the current contents of PC and an offset given in the instruction as a signed number (relative addressing). Since PC contains the address of the next instruction of the program, the latter method gives a jump address displaced from the next address by the given number of bytes. With a positive offset the jump goes forward to later instructions of the program, with a negative offset back to earlier ones.

A subroutine is also called by an unconditional transfer of control, using the CALL (or JSR) instruction. In this case, however, before PC is loaded with the address of the first instruction of the subroutine, its current value (the address of the next instruction) must be saved so that execution can return to the main program after the subroutine finishes (or to the calling subroutine when subroutines are nested). Conditional jump instructions (program branches) load PC with new contents if certain conditions are met; the conditions are usually defined by the current values of various flags in the status register. If the condition is not met, the next instruction of the program is executed.
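A minimal sketch of a subroutine call follows. On x86 the return address is saved on the stack by CALL and restored by RET; the procedure double_it is a hypothetical helper written for this example.

```asm
.686P
.MODEL FLAT, STDCALL
.DATA
.CODE
double_it PROC           ; hypothetical helper: eax = eax * 2
    add eax, eax
    ret                  ; pops the saved return address into the program counter
double_it ENDP
START:
    mov eax, 21
    call double_it       ; pushes the address of the next instruction, then jumps
    RET                  ; reached after the subroutine returns, eax = 42
END START
```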

Flag control instructions read and write the contents of the status register, which holds the flags, and change the values of individual flags. For example, Pentium processors implement the LAHF and SAHF instructions: LAHF loads the low byte of the EFLAGS status register, which contains the flags, into the AH register (bits 8 to 15 of EAX), and SAHF loads the low byte of EFLAGS from AH. The CLC and STC instructions set the carry flag to CF=0 and CF=1, and the CMC instruction inverts this flag. Since flags determine the course of program execution in conditional jumps, flag-changing instructions are usually used for program control.
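The effect of these instructions on the carry flag can be traced in a few lines; this is only a sketch, and the order of the instructions is chosen for illustration.

```asm
.686P
.MODEL FLAT, STDCALL
.DATA
.CODE
START:
    stc                  ; CF = 1
    lahf                 ; ah = low byte of EFLAGS; bit 0 of ah is CF, so it is 1
    cmc                  ; CF inverted: 1 -> 0
    sahf                 ; flags restored from ah: CF = 1 again
    clc                  ; CF = 0
    RET
END START
```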

Processor control instructions. This group includes the halt instruction, the no-operation instruction, and a number of instructions that determine the operating mode of the processor or of its individual units. The HLT instruction stops program execution and puts the processor into the halt state, which it leaves on receiving an interrupt or reset signal. The NOP instruction (an "empty" instruction), which causes no operation to be performed, is used to implement program delays or to fill gaps in a program.

The special instructions CLI and STI disable and enable the servicing of interrupt requests. In Pentium processors the control bit (flag) IF in the EFLAGS register is used for this.

Many modern microprocessors execute an identification instruction that allows the user or other devices to obtain information about the type of processor used in a given system. In Pentium processors this is the purpose of the CPUID instruction, which places the necessary processor data in the EAX, EBX, ECX and EDX registers, from where it can be read by the user or the operating system.
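As a sketch of how CPUID is used, the fragment below requests function 0, which returns the highest supported function number in EAX and a 12-character vendor string in EBX, EDX and ECX (in that order). The buffer name vendor is hypothetical, and ebx is saved and restored because calling conventions normally expect it to be preserved.

```asm
.686P
.MODEL FLAT, STDCALL
.DATA
vendor db 13 dup (0)             ; 12 characters plus a terminating zero
.CODE
START:
    push ebx                     ; cpuid overwrites ebx
    xor eax, eax                 ; function 0: vendor identification
    cpuid
    mov dword ptr vendor, ebx    ; characters 1-4
    mov dword ptr vendor+4, edx  ; characters 5-8
    mov dword ptr vendor+8, ecx  ; characters 9-12
    pop ebx
    RET
END START
```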

Depending on the operating modes implemented by the processor and the specified types of processed data, the set of executable commands can be significantly expanded.

Some processors perform arithmetic operations on BCD numbers or execute special result-correction instructions when processing such numbers. Many high-performance processors include an FPU, a floating-point unit.

A number of modern processors process several integers or floating-point numbers as a group with a single instruction, following the SIMD principle ("Single Instruction - Multiple Data"). Performing an operation on several operands at once significantly increases processor performance when working with video and audio data. Such operations are widely used in image processing, audio signal processing and other applications. To perform them, processors include special units that implement the corresponding instruction sets, which in various processor types (Pentium, Athlon) are called MMX ("Multi-Media Extension"), SSE ("Streaming SIMD Extension") and "3D Extension".

A characteristic feature of Intel processors, starting with the 80286, is priority (privilege-level) control of memory access, provided when the processor operates in protected virtual address mode ("Protected Mode"). This mode is implemented with special groups of instructions that organize memory protection in accordance with the adopted priority access algorithm.

Programming at the level of machine instructions is the minimum level at which programming is possible. The system of machine instructions must be sufficient to implement the required actions by issuing instructions to the computer hardware.

Each machine instruction consists of two parts:

the operation part, which determines "what to do";
the operand part, which determines the objects to process, "what to do it with".

The machine instruction of the microprocessor, written in assembly language, is a single line with the following syntactic form:

label command/directive operand(s) ;comments

In this case, a mandatory field in a line is a command or directive.

The label, command/directive, and operands (if any) are separated by at least one space or tab character.

If a command or directive needs to be continued on the next line, then the backslash character is used: \.

By default, assembly language does not distinguish between uppercase and lowercase letters in commands or directives.

Example lines of code:

Count db 1 ;Name, directive, one operand
mov eax,0 ;Instruction, two operands
cbw ;Instruction with no operands

Instructions

An instruction tells the translator what action the microprocessor should perform. In a data segment, an instruction (or directive) defines a field, a work area or a constant. In a code segment, an instruction defines an action, such as a move (mov) or an addition (add).

Directives

The assembler has a number of operators that let you control the process of assembling the program and generating a listing. These operators are called directives. They act only while the program is being assembled and, unlike instructions, do not generate machine code.

Operands

Operand – an object on which a machine command or a programming language operator is executed.
An instruction may have one or two operands, or no operands at all. The number of operands is implicitly specified by the instruction code.
Examples:

No operands ret ;Return
One operand inc ecx ;Increment ecx
Two operands add eax,12 ;Add 12 to eax

The label, instruction (directive) and operand do not have to start at any particular position in the line. However, it is recommended to write them in columns to make the program more readable.

Operands can be

identifiers;
strings of characters enclosed in single or double quotes;
integers in binary, octal, decimal, or hexadecimal.
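The kinds of operands listed above might appear in a program as follows. This is a sketch in MASM-style syntax with hypothetical names; all four numeric definitions store the same value, 42.

```asm
.686P
.MODEL FLAT, STDCALL
.DATA
bin_val db 00101010b     ; binary integer
oct_val db 52o           ; octal integer
dec_val db 42            ; decimal integer
hex_val db 2Ah           ; hexadecimal integer (must begin with a digit)
str1    db 'text'        ; string in single quotes
str2    db "text"        ; string in double quotes
.CODE
START:
    mov al, hex_val      ; an identifier used as an operand
    RET
END START
```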

Identifiers

Identifiers – sequences of valid characters used to designate program objects such as operation codes, variable names, and label names.

Rules for writing identifiers.

The identifier can be one or more characters.
The characters allowed are Latin letters, digits and some special characters: _, ?, $, @.
An identifier cannot start with a digit.
An identifier can be up to 255 characters long.
The translator accepts the first 32 characters of the identifier and ignores the rest.

Comments

Comments are separated from the executable part of the line by the character ; (semicolon). Everything written after the semicolon up to the end of the line is a comment. Comments make a program clearer, especially where the purpose of a set of instructions is not obvious. A comment can contain any printable characters, including spaces. It can occupy an entire line or follow an instruction on the same line.

Assembly program structure

A program written in assembly language may consist of several parts, called modules. Each module can define one or more data, stack and code segments. Any complete assembly language program must include one main module, from which its execution begins. A module may contain code, data and stack segments declared with the appropriate directives. Before declaring segments, you must specify the memory model using the .MODEL directive.

An example of a "doing nothing" program in assembly language:

.686P
.MODEL FLAT, STDCALL
.DATA
.CODE
START:

RET
END START

This program contains only one microprocessor instruction, RET. It ensures the correct termination of the program. In general, this instruction is used to exit a procedure.
The rest of the program concerns the operation of the translator.
.686P - allows instructions of the Pentium 6 (Pentium II) generation, including protected-mode ones. This directive selects the supported instruction set by specifying the processor model. The letter P at the end of the directive additionally enables the privileged instructions that are available in protected mode.
.MODEL FLAT, stdcall - the flat memory model. This memory model is used in the Windows operating system. stdcall is the procedure calling convention used.
.DATA - the program segment containing data.
.CODE - the program segment containing code.
START - a label. Labels play a much larger role in assembly language than in modern high-level languages.
END START - the end of the program and a message to the translator that execution must start from the label START.
Each module must contain an END directive that marks the end of the program's source code. All lines that follow the END directive are ignored. Omitting the END directive causes an error.
The label after the END directive tells the translator the entry point of the main module, from which program execution begins. If the program contains one module, the label after the END directive can be omitted.