Hardware and software setup

Sybase eaServer delivery options. Database management systems and information storage support (IBM DB2)

The IBM DB2 database management system began its development in the distant 70s and now occupies a strong position in the corporate DBMS market, meeting high requirements for performance, reliability, security and scalability.

Igor Bulatenko, Information Security Specialist, Positive Technologies

The IBM DB2 database management system began its development in the distant 70s and now occupies a strong position in the corporate DBMS market, meeting high requirements for performance, reliability, security and scalability. In the private sector, DB2 has not been widely adopted, despite the availability of a free version of IBM DB2 Express. Perhaps this is the reason why there aren't many articles on the Internet about configuring and using DB2.

The DB2 security model has a wide range of functionality and allows you to protect data both from external influences and to differentiate access rights for internal users using the DBMS itself.

However, it is difficult for an unprepared user to understand all this diversity from scratch, so some important aspects will be discussed in this article.

Point of entry

The entry point to DB2 looks like this: DBMS -> instance (instance) that can be bound to a specific port -> specific database name. Security settings can be changed both in a specific instance and in a specific database.

Authentication

Authentication is the primary security mechanism that is applied when you try to connect to a DB2 server. Authentication checks if the credentials provided are correct. Main Feature in DB2 is that user authentication is done by external plugins only. Internal users, unlike Oracle or MS SQL Server, do not exist here. Even the user creation feature found in IBM Data Studio does not actually create a user, but assigns the database connection privilege to the specified user.

There are several authentication options, the desired option is controlled by the AUTHENTICATION parameter in the database manager. The value of this parameter affects where clients will be authenticated (on the server side or on the client side) and whether data will be transmitted in encrypted form (values ​​ending with _ENCRYPT). Supported values ​​for this parameter are available at:

You can view the database manager configuration by querying the sysibmadm.dbmcfg table, but this requires access to any of the databases, which is not always possible. If you have local access to the server, then you can open the processor command line(db2 or db2.exe on Windows), connect to the instance and run the following commands:

db2 => attach to db2inst1
db2 => get database manager configuration

The default value for the AUTHENTICATION parameter is SERVER. Verification of the correctness of the provided user credentials is performed on the server side by means of operating system, but all data is transmitted in clear text and can be intercepted by an attacker.

Let's see how the captured information looks like in Wireshark.


The login and password transmitted from the client are visible in the package when viewing EBCDIC.

When changing the authentication type to SERVER_ENCRYPT, the login and password will be transmitted in encrypted form and checked on the server side.

The value changes as follows:

db2 => attach to db2inst1
db2 => update database manager configuration using authentication server_encrypt
db2 => db2stop force
db2 => db2start

The authentication package will then look like this:


However, the text of requests and the result will still be transmitted in the clear.

Wireshark request packet:


Packet with response in Wireshark:


If the AUTHENTICATION parameter is set to DATA_ENCRYPT, then the user's credentials are encrypted, as well as information transmitted between the client and the server.

The value is changed in the same way as the example above:

db2 => attach to db2inst1
db2 => update database manager configuration using authentication data_encrypt
db2 => db2stop force
db2 => db2start

After that, the transmitted data will also be encrypted:


Also, you need to pay attention to the CLIENT authentication type. With this type of authentication, it is considered that there is a secure communication channel between the client and the server, and if the user has gained access to the client, he can access the server without checking the correctness of the credentials. That is, authentication as such occurs on the client side, verification is not performed on the server side. Even if the user who connects to the server does not have access rights, he still receives all the privileges that are assigned to the PUBLIC group. Therefore, you should not use this type of authentication, it will provide attackers with the opportunity to gain access to the server without much effort.

If suddenly, for some reason, this type of authentication is needed, then you need to take into account that there are two more additional parameters that ultimately affect how the user's credentials will be verified. These are the trust_allclnts parameter, with which you can specify which clients are considered trusted, and the trust_clntauth parameter, which determines where to check the login and password, if they were passed during the connection. Both of these options affect authentication only if the AUTHENTICATION option is set to CLIENT.

If successful, the user ID is mapped to a DB2 ID. Typically, the ID is the same as the username, but it uses uppercase characters.

Authorization

The authorization process checks whether the user has the necessary rights for the actions requested by him. There are authorities (authorities) of an instance of a DBMS and a database.

Permissions at the level of a particular instance are written in the configuration of the database manager. These are the following powers:

  • SYSADM (system administrator authority);
  • SYSCTRL (permission to control the system);
  • SYSMAINT (permission to maintain the system);
  • SYSMON (permission to monitor the system).

These privileges are set by specifying the group that the user will belong to. For this, the following parameters of the dbmcfg file are used (according to the above permissions):

It is not easy to get a list of users who are members of a group using DB2 tools, you need to do it in the operating system itself or analyze which groups a particular user belongs to (see the "useful queries" tab for a query).

When configuring DB2, be sure to check the list of users that have been assigned the SYSADM authority. This permission allows you to manage all database objects.

The permissions of a particular database can be viewed in the SYSCAT.DBAUTH view. Pay attention to the CONNECTAUTH privilege, which determines whether the user will have access to the database or not, and the NOFENCEAUTH privilege, which is responsible for creating non-fenced (not fenced) procedures and functions. Such procedures are performed in the database address space and, in the event of an error, may violate the integrity of the database and tables in it.

Privilege

Privileges in DB2 can be granted on various objects. Table access privileges can be viewed in the SYSCAT.TABAUTH view. Data about the type of granted privilege is stored in separate columns, depending on the privilege itself (SELECTAUTH, DELETEAUTH, etc.). When granting a privilege using the GRANT command for the REFERENCES and UPDATE privileges, you can also specify the names of the columns to which the given privileges will be extended. Information about this can be viewed in the SYSCAT.COLAUTH view

The privileges of routines (functions, procedures, and methods) can be viewed in the SYSCAT.ROUTINEAUTH view. Everything here is not entirely trivial, depending on the SPECIFICNAME and TYPENAME fields, privileges can be issued to all subroutines of a given scheme.

Users, groups, roles

All database powers and various privileges can be granted to users, groups or roles. The existence of users, groups, and the membership of users in groups is regulated outside of the database itself. In this regard, it is desirable to take into account certain recommendations and know some subtleties when issuing powers and privileges. It is not recommended to grant database privileges and authority, in particular the ability to connect to the database (CONNECTAUTH), to groups. You should grant privileges to specific users or roles that need it. Support for roles has been available in DB2 since version 9.5. Role membership is managed within the database itself.

Also, DB2 has a built-in PUBLIC role. A database user does not need to be granted the PUBLIC role: it is not possible to revoke the PUBLIC role from a user. When a privilege is granted to the PUBLIC role, the privilege is actually granted to all database users. You should not grant any database authority to the PUBLIC role. Privileges on tables and views should be granted with extreme caution, only for viewing and without the possibility of reassignment (WITH GRANT OPTION).

Due to the peculiarity of authentication when granting privileges, the existence of a user or group in the system is not checked. As a result, authentication users may appear in the system without being tied to real users of the system. You can find such users using the following SQL query:

SELECT authid FROM sysibmadm.authorizationids WHERE authidtype = "U" AND authid NOT IN (SELECT username FROM TABLE(sysfun.USERS()) AS W)

To search for such groups, a similar query is used, but the query indicates that data about PUBLIC should not be displayed:

SELECT authid FROM sysibmadm.authorizationids WHERE authidtype = "G" AND authid NOT IN (SELECT groupname FROM TABLE(sysfun.groups()) AS W) AND authid !="PUBLIC"

LBAC

DB2 has a powerful mechanism for restricting access to data in tables based on labels (Label-based access control). The mechanism allows you to set security labels on specific rows or columns in such a way that a user who does not have access to protected data will not even know about their existence. It makes no sense to go into detail about the methods of implementing LBAC, since the manufacturer has a tutorial on this topic:

Automatic scanning tools

When configuring the security of an IBM DB2 server, it is important to use some kind of security scanner (eg NGS SQuirreL for DB2, MaxPatrol, etc.). Scanners will clearly indicate the vulnerabilities of the settings that you might have missed, or display information in a form convenient for analysis:

Useful queries and commands

Get database manager settings:

select name, value from sysibmadm.dbmcfg

or

db2 => getdbmcfg

Change database manager option:

db2 => update database manager configuration using

After that, you need to restart the instance:

db2 => db2 stopforce
db2 => db2 start

Get database settings:

select name, value from sysibmadm.dbcfg

or

db2 => get db cfg for

List of operating system users:

select username from table(sysfun.USERS()) AS t

List of operating system groups:

select groupname from table(sysfun.GROUPS()) AS t select AUTHID, AUTHIDTYPE from sysibmadm.AUTHORIZATIONIDS

Display the current database name:

select current server from sysibm.sysdummy1

Enter current username:

selectuserfromsysibm. sysdummy1

Get a list of groups the user belongs to:

select GROUPNAME from table(sysfun.groups_for_user(" ")) as t

List of all installed DBMS:

$ db2ls

List of all instances in the DBMS:

$ db2 ilist

Limit the number of output lines:

select * from tabname fetch first 5 rows only

The IBM DB2 DBMS is the result of nearly 30 years of research and development work by IBM. The latest version of this DBMS (6.x) features one of the most thoughtful sets of management and optimization tools and a database engine that can grow from a Windows 95 laptop to an entire cluster of S/390 mainframes running OS/390.

The DB2 package is available in two editions: DB2 Workgroup and DB2 Enterprise Edition. This DBMS implements all the innovative database engine technologies known from previous versions of DB2, such as parallel query processing, a full set of replication tools, query summary tables to improve database performance, object-oriented database design features, and Java language features. In addition, the DB2 system is equipped with a complete set of multimedia extensions that allow you to save and manipulate text, sound and video fragments, images and geographic data. We can say that in terms of scalability, the database clustering technology developed by IBM specialists has no analogues. These extensions greatly facilitate the process of developing applications for the Web, as well as programs containing photographic images and voluminous text reports. The DB2 system is also quite competitive as an application development platform, as there is a Stored Procedure Builder tool that automatically converts the SQL statement into the appropriate Java class and includes it in the database structure. In DB2 6.1, interoperability with other DBMSs has been greatly improved by enabling the use of Microsoft's OLE DB specification, a new database access standard. DB2 administrative controls that are new version rewritten in Java and can be obtained from the Web deserve the highest praise.

The main disadvantages of this DBMS are the relative complexity of administration and the lack (yet) of implementations for popular server operating systems, such as LINUX.

In this DBMS, thanks to the Index Smart-Guide, it is possible to carry out tuning, forming optimal indexes for a given number of accesses, which characterizes the typical load on the database. DB2 is the only package that allows you to generate pivot tables, which significantly improves the efficiency of the DBMS as data warehouses. A PivotTable is a temporary workspace used by the database to store answers to frequently asked queries. Well, it can be said that with new functionality, as well as parallelization and the ability to select almost any type of join and index (except perhaps raster indexes), the DB2 6.1 model turns into the most inexpensive of high-performance systems. The administrative tools of this DBMS are quite appropriate for the level of tasks being solved, in addition, it provides exceptionally wide opportunities for working with multimedia data and for programming (which is clearly lacking in Microsoft SQL Server).

DBMS from Informix.

Recently, there has been a transition from relational DBMS to object-oriented ones (which is clearly seen in the example of Oracle). Informix also following this concept announced a new Centaur DBMS solution based on the Informix Dynamic Server 7.3 relational database and the Informix Universal Data Option object-relational database and combining the high performance of Dynamic Server when working with data with universality and multimedia functions of the Universal Data Option. This implementation is intended for the development of Internet systems. It is expected that this DBMS will have a flexible development environment with scalability to match the intensive workloads characteristic of the Internet, and tools for working with new types of data that have become ubiquitous with the development of the Web. The Java features implemented in the new system will allow developers to create stored procedures, user programs, and DataBlades components in this language, which Informix calls custom database extensions.

From the point of view of Inforix customers, this is a big step forward, because until now, when working with DataBlades, they could only use C and SPL, Informix's internal language for writing stored procedures. In addition, the Centaur package will be equipped with built-in ActiveX object handling. This will make it possible, for example, to create database stored procedures in the Visual Basic language; however, this requires the Centaur package to run in a Windows NT environment.

Centaur will be an add-on to Informix Dynamic Server and work with the traditional database format for this package, so that users will have all the old functions at their disposal, and upgrading the system to the new version will not be very difficult. In addition, the Centaur package will retain all of the design and programming capabilities that made the Informix Universal Server system an outstanding engineering achievement. The new system will be equipped with facilities for object-oriented database design, creation of specialized tables and indexing programs; it will allow users to embed their own functions in queries and not rely solely on standard SQL tools.

Conclusions.

Having considered the main characteristics of architectures for building AIS, server operating systems and DBMS, in the future, as the AIS architecture, we will choose the Internet / Intranet architecture, as the Linux server OS, as the Oracle 8i DBMS. The summary table presents the comparative characteristics of the two most common solutions for today on Microsoft base SQL Server 7.0 (on NT) and Oracle8i (on Unix, Linux).

Microsoft SQL Server 7.0

Administrative management

Graphic tools

Ease of maintenance

Data engine

Working with multiple CPUs

Acceptable

Join function and index selection

Simultaneous access by multiple users

Multimedia data processing

Connecting to the Web

Audio, video, image processing

Search in this text

Interoperability

Acceptable

Interfacing with other databases

single sign-on

Work under various operating systems

Acceptable

Programming options

Acceptable

Stored procedures and triggers

Internal programming language

Database building

Object Oriented Systems

Working with branches

Replication

Distributed transaction processing

Remote administration

Organization of data warehouses and preparation of reports

Loading Tools

Analysis tools

DB2(in Russian it is pronounced “dibi two”, tracing paper from English “dibi tu” is also common) - family software products in Information Management at IBM.

Most often, when referring to DB2, they mean the relational database management system DB2 Universal Database (DB2 UDB), developed and released by IBM.

The spelling "DB/2" is sometimes seen, but this spelling is incorrect: in the IBM notation, the number in the denominator of a fraction means the platform and "/2" means the product for the OS/2 operating system (or the PS/2 series of computers). For example, the version of DB2 for OS/2 was designated "DB2/2".

Implementations

The DB2 DBMS is currently available on the following platforms:

  • DB2 for Linux, UNIX and Windows v9 for AIX, HP-UX, Linux, Solaris, Windows platforms and beta for Mac OS X platform
  • DB2 for z/OS v9 for z/OS and OS/390 platforms
  • DB2 Server for VSE & VM v7 for z/VM and z/VSE platforms
  • DB2 for i for the IBM i platform (integrated into the system at the hardware and software level)

In the past, versions of the DB2 database server for OS/2, UnixWare, PTX have been released.

DB2 DBMS clients, in addition to the platforms listed, are or were released in various versions also for SINIX, IRIX, classic Mac OS and for MS-DOS, as well as in mobile version DB2 Everyplace for Windows CE, Palm OS, Symbian OS, Neutrino and the Java Virtual Machine.

Currently, in addition to the commercial products of the family, IBM also distributes a free distribution DB2 Express-C for Linux (x86, x86-64, POWER), Windows (x86, x86-64), Solaris (x86-64), Mac OS X (x86-64 beta) platforms. Free version has restrictions on the use of no more than one dual-core processor and 2 GB for the operation of the DBMS random access memory(the total number of processors and memory in the system can be any, but resources beyond the specified limits will not be used by the DBMS).

History

DB2 has a long history and is considered by some to be the first DBMS to use SQL.

From 1975 to 1982, the DB2 prototype was developed at IBM under the name System Relational, or System R. The SQL language was first implemented in IBM System R, but this system was of an exploratory nature, and commercial product, which includes SQL, was first released by Oracle in 1979.

DB2 got its name in 1982 with the first commercial release for SQL/DS, and then for MVS called DB2. For a long time, along with "DB2", the "Database 2" variant, also a trademark of IBM, was used. Apparently, this was meant to be the second flagship IBM DBMS after the old hierarchical IMS DBMS.

The development of DB2 goes back to the early 1970s, when Dr. E. F. Codd, who worked for IBM, developed the theory of relational databases and published a data manipulation model in June 1970. To implement this model, he developed a language relational databases data and named it Alpha. IBM chose to outsource further development to a group of programmers outside of Dr. Codd's control. Violating some principles relational model, they implemented it as "Structured English Query Language", SEQUEL for short. Since SEQUEL was already a registered trademark, the name was shortened to SQL - "Structured Query Language" and has remained so to this day.

Thus, historically, DB2 evolved from DB2 for MVS (of which DB2 for z/OS is a descendant) and its sister SQL/DS for VM (of which DB2 Server for VSE & VM is a descendant). Subsequently, another development team at IBM implemented the OS/2 EE Database Manager server, which later evolved into DB2 v2 for OS/2, AIX and then Windows, and then into DB2 UDB (its descendant is DB2 for Linux, UNIX and Windows). Another team completed the integration of the DB2 architecture with the embedded AS/400 database (a descendant - DB2 for i). IBM is gradually moving towards the integration of all these branches.

Peculiarities

The distinguishing features of DB2 include a dialect of the SQL language, which defines, with rare exceptions, a purely declarative meaning of language constructs, and a powerful multi-phase optimizer that builds an effective query execution plan from these declarative constructs. Unlike other SQL dialects, the DB2 SQL dialect has virtually no optimizer hints, little (and for a long time) lack of development of the stored procedure language, and thus everything is aimed at maintaining a declarative style of writing queries. At the same time, the DB2 SQL language is computationally complete, that is, it potentially allows you to define any computable correspondences between the source data and the result in a declarative form. This is achieved, among other things, through the use of table expressions, recursion and other advanced data manipulation mechanisms.

Due to IBM's focus on relational development and the firm's position in the computer industry, the DB2 SQL dialect has a significant impact on the ANSI/ISO SQL standards.

Stored procedures in DB2 are not very widely used, and traditionally, conventional high-level programming languages ​​(C, Java, PL/I, Cobol, etc.) are used to write stored procedures, this allows the programmer to easily format the same code either as part of the application, or as a stored procedure, depending on whether it is more appropriate to execute it on the client or on the server. DB2 also currently implements the SQL procedural extension for stored procedures, in accordance with the ANSI SQL/PSM standard.

The DB2 optimizer makes extensive use of statistics on the distribution of data in tables (if the collection process was done by the database administrator), so the same SQL query can be translated into completely different execution plans depending on statistical characteristics the data it processes.

Since historically DB2 has evolved from multi-user systems on mainframes, a lot of attention in the DB2 architecture is given to issues of security and the distribution of roles of specialists maintaining DB2. In particular, unlike many other DBMSs, DB2 has separate roles for a database administrator (responsible for configuring DB2 software components and running them optimally on a computer system) and a database administrator (responsible for managing the data in a particular database).

The use, if necessary, of static SQL in programs and the concept of packages, unlike most other DBMS, allows the implementation of such a security model when the rights to perform certain operations can be granted to application programs in the absence of such rights for users working with these programs. This allows in this case to guarantee the impossibility of the user working with the database bypassing application program, if the user has only rights to run the program, but not to independently manipulate data.

As part of the concept of increasing the level of integration of security tools in a computer system, DB2 does not have its own means of authenticating users, integrating with operating system tools or specialized security servers. Within DB2, only users authenticated by the system are authorized.

DB2 is the only general-purpose relational DBMS that has implementations at the hardware/software level (IBM i system; DB2 support is also implemented on IBM System z mainframe hardware).

Modern versions of DB2 provide enhanced support for using XML data, including operations on individual elements of XML documents.

Error processing

A useful feature of DB2 SQL Server is the ability to handle errors. The SQLCA structure is used for this purpose. SQL Communications Area- SQL link area) that returns error information to the application program after each execution of the SQL statement.

SQLCODE Structure Fields and Their Values

The main, but not always useful error diagnostics is contained in the field SQLCODE(data type - integer) inside the SQLCA block. It can take the following values:

  • 0 means success.
  • A positive number means success with one or more warnings. For example, +100 means no columns were found.
  • A negative number means failure with an error. For example, −911 means a detected expired lock wait interval (or deadlock) triggering a sequential rollback.

SQLERRM(data type - a string of 71 characters). Contains a text string describing the error if the SQLCODE field is less than zero.

SQLERRD(data type - array, 6 integers). Describes the result of executing the last SQL statement:

  • 1 element - internal information;
  • 2nd element - contains the value of the SERIAL type field generated by the server for the INSERT statement, or additional code errors;
  • 3rd element - equal to the number of processed records;
  • 4th element - the approximate cost of executing this operator;
  • 5th element - offset of the error in the text record of the SQL statement;
  • 6th element - internal information.

Notes

Links

  • Program page on the IBM website
  • DB2 on developerWorks - DB2 articles and training
  • PlanetDB2 - DB2 Blogs

Literature

  • Date K. DB2 Relational DBMS Guide. - M.: Finance and statistics, 1988. - 320 p. - ISBN 5-279-00063-9
  • Zikopoulos P.K., Baklarz J., deRus D., Melnik R.B. DB2 Version 8: The Official Guide = DB2 Version 8: The Official Guide. - M.: KUDITS-OBRAZ, 2004. - 400 p. - ISBN 5-9579-0031-1
  • Smirnov S. N. Working with IBM DB2: Tutorial. - M.: Helios, 2001. - 304 p. - ISBN 5-85438-007-2 (recommended by the UMO of universities in the field of information security as a textbook in the specialties "Integrated information security of automated systems" and "Computer security")
  • Susan Visser, Bill Wong. Teach Yourself DB2 Universal Database in 21 Days = Sams Teach Yourself DB2 Universal Database in 21 Days. - 2nd ed. - M.: Williams, 2004. - 528 p. - ISBN 0-672-32582-9
  • Hook J., Harbus R., Snow D. The Universal Guide to DB2 for Windows NT®. - New Jersey: Prentice Hall PTR, 1999. - P. 504. - ISBN 0-13-099723-4

Wikimedia Foundation. 2010 .

See what "IBM DB2" is in other dictionaries:

    IBM DB2- Developer(s) IBM Initial release 1983 (1983) ... Wikipedia

    IBM DB2- DB2 ist ein kommerzielles relationales Datenbank Management System (RDBMS) der Firma IBM, dessen Ursprünge auf das System R und die Grundlagen von E. F. Codd vom IBM Research aus dem Jahr 1970 zurückgeht. Inhaltsverzeichnis 1 Eigenschaften 1.1… … Deutsch Wikipedia

    IBM DB2- Développeur IBM Dernière version ... Wikipedia en Français

    IBM DB2 Commonstore- DB2 CommonStore Archiving software produced by IBM for managing e mail messages or SAP ERP data. Part of the IBM Information Management portfolio which builds upon the DB2 database platform. DB2 CommonStore is one of several products which are… … Wikipedia

An overview of key concepts and a general description of the IBM DB2 database architecture for Linux, Unix, and Windows platforms

Content series:

This content is part # of a series # of articles: Overview of DB2 LUW

http://www.?q=%D0%9E%D0%B1%D0%B7%D0%BE%D1%80+DB2+LUW&co=ru&lo=ru&ibm-submit.x=11&ibm-submit.y=13&sn= mh&lang=ru&cc=RU&en=utf&hpp=

Stay tuned for new articles in this series.

Source materials and contact information

Special thanks to Mark Barinshtein for taking the time to proofread the material of the articles, attention to detail and valuable comments.

The main part of the article material is a free interpretation of the official DB2 documentation. The information presented has been restructured and reformulated to be concise and at the same time as clear as possible. References to the sources used in all cases are provided in the text of the articles and in the "Resources" section.

As part of a series of articles, it is planned to review the following issues:

  1. (the article you are currently reading)
  2. (installation, configuration, diagnostics, backup and recovery);
  3. advanced administration procedures (transfer of information, performance optimization, management of execution priorities);
  4. tools for building analytical data warehouses;
  5. in-memory analytics technologies - DB2 BLU;
  6. massively parallel analytical data processing with DB2 DPF (Database Partitioning Feature);
  7. distributed databases (failover configurations, data replication and federated data access);
  8. DB2 pureScale clustering capabilities for fault tolerance and scalability.

Articles in the series will be published as the relevant materials are prepared.

DB2 product family

The name "DB2" is used by IBM for a whole family of products that differ from each other both in the composition of the hardware and software platforms on which they are used, and in functionality, architecture and technological features. These differences are due to the close integration of most DB2 family products with the capabilities of the operating systems they run on, as well as the specifics of those operating systems.

The DB2 product family currently includes:

  • DB2 for Linux, Unix and Windows, or DB2 LUW - a DBMS for systems running Linux (RedHat, SuSE, Ubuntu), UNIX (AIX, HP-UX, Solaris) and Microsoft Windows, to which this article and other articles in this series are devoted;
  • DB2 for z/OS- DBMS for the z/OS operating system used on IBM System z mainframes;
  • DB2 Server for VSE & VM- DBMS for operating z/VM and z/VSE used on IBM System z mainframes;
  • DB2 for i- DBMS for the System i operating system used on the IBM Power platform.

Each of the listed DBMS is architecturally adapted for the most efficient functioning in the corresponding operating systems, and includes its own specific set of tools and administration tools.

The terminology used in the documentation for the various DB2 family DBMSs is not uniform, and the same terms may have different meanings for different variants of DB2: for example, the terms "database" and "tablespace" have different meanings for DB2 LUW and DB2 for z/OS, due to architectural differences between these types of DBMS.

Thus, when working with information resources dedicated to DB2, it is necessary to clearly distinguish which of the products of the family is being discussed in order to avoid confusion and possible errors.

Deprecated Features DB2LUW

In one form or another, the DB2 LUW product has been on the market since 1989 (the year the OS/2 1.10 Extended Edition operating system was released, which included the Database Manager component - that is, the relational DBMS that was the basis for DB2 LUW).

During the long development of the product, some originally developed functions were rethought and replaced by another implementation, or completely excluded from the product due to their lack of need. Therefore, when working with materials prepared for older versions of DB2 (for example, version 9.7), be aware that some of the features described in these materials may be replaced in newer versions of DB2 (for example, 10.5 and 11.1). Detailed information about deprecated and replaced functions is given in .

The most notable changes for administrators and developers include:

  • replacement of outdated graphical management tools "Control Center", "Task Center" and a number of others with the functions of the free IBM Data Studio package, as well as the functions of the tools included in the free edition of the IBM Data Server Manager product;
  • deprecation of the DB2 Administration Server (DAS), which was required to run legacy administration tools;
  • replacing the DB2 Governor and Query Patroller workload management tools with DB2 Workload Manager (DB2 WLM) functionality.

Purpose of preparing this article series

The main purpose of writing a series of review articles on IBM DB2 is to fill the lack of materials on this subject in Russian. Indeed, despite the availability of translations of a significant part of the documentation into Russian and accessible books on DB2, there is still a lack of available overview information that would allow interested readers to get an idea of ​​​​DB2 features, built-in functionality and administration specifics.

However, it is not the intention of the author to prepare an overview of all products in the DB2 family (see the sidebar "The DB2 Product Family"), instead, it is planned to focus on the DB2 variant for Linux, Unix, and Windows operating systems - i.e. on a DB2 LUW product.

For readers interested in a practical guide to getting started with DB2, I recommend that you refer to the freely distributed book "", translated into Russian. This book provides many examples of common DB2 software operations, making it easy to get started with both the DB2 9.7 version described in the book and newer DB2 versions (10.5 and 11.1). When working with current versions software DB2, be aware that some functionality in version 9.7 has been deprecated and is not supported in new versions of the product (see the "Deprecated DB2 LUW Features" sidebar).

DB2 LUW functionality

IBM DB2 uses a relational DBMS client-server architecture that is generally accepted at the moment, with information storage on the server and client applications connecting to databases locally or via a network.

To provide concurrent access to data from parallel applications, DB2 uses a transactional mechanism based on locking and transaction logging to provide standard ACID guarantees (Atomicity, Consistency, Isolation, Durability). This mechanism has come a long way of evolution to ensure maximum performance, reliability and minimize delays in application execution.

DB2 provides support for all common industry standards for application data access, including a standard language SQL queries, ODBC and JDBC interfaces, work with typical text table formats, etc. In addition, DB2 includes advanced capabilities for storing and working with semi-structured data in XML, JSON/BSON formats. For developing stored procedures, DB2 provides support for a variety of procedural languages, including:

  • standard for DB2 language SQL PL,
  • the SQL/PL language used in Oracle's DBMS,
  • the ability to develop "external" stored procedures on Java languages, C, C++ and COBOL.

Distinctive features of DB2 are:

  • scalability, limited only by available computing resources, and the most economical use of computing resources;
  • powerful built-in means of differentiation and access control, which provides the ability to granularly restrict access to information in the context of objects (tables, views), as well as implementing a model of mandatory access control;
  • advanced integrated data backup and recovery system;
  • availability of a full set of technologies for building "classic" analytical data warehouses (dividing tables into sections, materialized views, optimizing data caching and scanning tables and indexes, "internal" parallelism in the execution of complex queries, etc.);
  • support for building massively parallel analytical data processing (MPP) configurations from multiple servers connected via a communication network based on the DB2 Database Partitioning Feature (DB2 DPF);
  • maximum fault tolerance and near-linear scaling of DB2 pureScale cluster configurations, with data stored on shared disks;
  • DB2 BLU technology that implements support for modern in-memory "column" analytics without the use of manual optimization of the database structure.

To facilitate the migration of applications from other types of DBMS (primarily Oracle Database), DB2 provides advanced compatibility tools, including support for the necessary data types, stored procedures, and standard system views.

There are several editions of the DB2 LUW product, united by a single set of basic functions and differing from each other in the presence of restrictions on the used computing resources and support for advanced functionality. The DB2 Express-C edition, which is available at no cost, can be used for product evaluation, learning, and even small production deployments. The functionality and resource limitations of the various editions of DB2 LUW are detailed in the article "".

Structure of the database server

A DB2 database server is a computer on which the DB2 server software (DB2 engine) is installed and which provides structured information management services.

Access to DB2 services by applications is provided by the DB2 client software (IBM Data Server Driver Package) that communicates with the DB2 server in accordance with supported application connection methods (including ODBC, JDBC, OLE DB, ADO, CLI, and other methods). In most cases, the required client software is installed with the DB2 server, allowing applications hosted directly on the database server to connect to the DB2 server.

A DB2 database server can host multiple copies of DB2 software, with different software versions and installation directories. Multiple copies of DB2 software may reside independently on the same server, as long as there are no resource conflicts between them (including sufficient server computing resources and no conflicts over operating system logical resources: network names, port numbers, file system directories, and so on. ).

The direct provision of DBMS services is provided by the DB2 database manager (DB2 DBM) component. Each copy can have multiple instances of the DB2 database manager, or more briefly, DB2 instances. An instance is an independent environment in which databases can be created and applications can run. Each DB2 instance has its own configuration and provides access to a specific set of databases. DB2 instances are independent in the sense that the execution of operations on one instance does not affect others, except for resource constraints imposed by running multiple instances on the same physical or virtual server.

Starting and stopping DB2 services is done at the instance level, i.e. each DB2 instance can be in a running or stopped state. The parameters of a DB2 instance can define its resource limits (for example, in terms of memory usage). The resources of a DB2 instance are used to maintain the databases that exist within the instance.

A database is a collection of objects that make up a single information array (tables, views, indexes, etc.). Databases are independent entities and therefore do not typically share objects with other databases (an exception may be distributed database configurations that use data access federation mechanisms).

A schematic example of the database server structure is shown in the figure.


In many cases, the DB2 database server contains a single installed copy DB2 with a single instance created serving a single database. With this configuration, all database server resources are used to run a single DB2 database.

Serving requests from connected applications on the database server side is performed by so-called DB2 agents. For each connected application within a DB2 instance, a coordinating (primary) agent is started, which can start several additional (auxiliary) agents as needed. Technically, each agent is a separate thread of execution, or (for older versions of DB2) a separate operating system process, with associated resources needed to run it.

DB2 configuration options

DB2 server configuration can be set at four different levels:

  • Environment Variables;
  • DB2 profile registry;
  • database manager configuration file (DBM CFG);
  • database configuration file (DB CFG).

Environment variables are set at the server operating system level and by means of the operating system. For Windows OS, these variables are actually global for the server, for OS of Unix and Linux families, their specific environment variable settings can be set for each instance.

DB2 profile registry settings can be set at the operating system level (globally) or at the instance level, with the instance level settings overriding those set at the operating system level. Viewing and setting DB2 profile registry values ​​is done using the db2set command.

The database manager configuration file options are defined at the instance level, while the database configuration options are defined at the database level.

Many parameters are dynamic, i.e. changes made take effect immediately; however, there are settings that require you to stop and start the instance to change. This can be done on the command line using the db2stop and db2start commands. All applications must shut down before stopping the instance. You can use the db2stop force command to force an instance to stop.

The database manager configuration file includes settings that affect the instance and all the databases it contains. The database manager configuration file can be viewed or modified using the command line (GET DBM CFG and UPDATE DBM CFG commands) and IBM Data Studio.

The database configuration file includes options that affect a particular database. The database configuration file can be viewed or modified using the command line (GET DB CFG and UPDATE DB CFG commands) and IBM Data Studio.

A detailed description of the supported , as well as is given in the official DB2 documentation.

Organization of data storage

The smallest physical storage unit in DB2 is page. Allowed page sizes are 4K, 8K, 16K, and 32K. Database object information (such as table entries and index entries) is placed on pages.

Selection extra space for data storage is allocated by groups of pages, which are called extents. Performing extra space allocation operations at the extent level improves the performance of record inserts and updates.

The storage of information in DB2 databases is organized into objects called tablespaces. A tablespace is a named set of containers for storing information placed on the file system of a database server.

One or more tablespaces are created for each tablespace. containers(files or directories in the file system) to store information, as well as set the page size and area for caching data (buffer pool, see below), as well as a number of other parameters.

There are the following types of tablespaces:

  • ordinary: used to host user tables and indexes;
  • large: used to host user tables and indexes, and large object (LOB) data, and XML data. In modern versions of DB2, large tablespaces are used by default instead of regular ones;
  • temporary: used to store temporary information when executing queries (system temporary tablespaces) and temporary tables defined by applications (user temporary tablespaces).

The type of a tablespace is specified when it is created and cannot be changed except by deleting and re-creating the tablespace.

Tablespaces can also be classified by the type of control that is set when the tablespace is created:

  • table spaces managed by the system (SMS, System Managed Storage) - directories are used as table space containers, data files are created to place storage objects in directories. Space is not preallocated, files grow dynamically. When defined, containers are fixed at the time they are created;
  • table spaces managed by a database (DMS, Database Managed Storage) - pre-allocated files are used as table space containers, containers can be added, deleted or changed;
  • automatically managed tablespaces (automatic storage) - automatic detection of the type and location of the container depending on the type of tablespace (DMS for regular and large tablespaces, SMS for temporary tablespaces). Specific definitions of containers are not specified when creating a table space, the necessary containers are created automatically. The growth of existing containers and the addition of new containers are entirely controlled by DB2.

To enable automatic tablespace management, you first need to create a database with automatic storage enabled (which is the default) and bind a set of storage paths to it.

By default, DB2 writes sequentially to extents, "stripping" between containers. For example, if you have a table space with a page size of 4 KB and an extent size of 8 pages, and you use 3 immediate containers in a DMS table space, this means that 32 KB of data (4 KB x 8 pages in extent = 32 KB) will be written one disc before recording to the next begins.

Starting with DB2 Version 10.1, a new concept has been introduced to simplify data storage management − storage group(storage group). A storage group is a named collection of paths in the DBMS server file system that can be used to store data. The composition of storage groups in a database typically defines the set of types of storage devices available to store information. When a database is created, a default storage group is always automatically created in the database.

Each automatically managed table space is associated with one of the created storage groups, which determines the physical location of the data stored in the corresponding table spaces. It is possible to move a table space from one storage group to another using the ALTER TABLESPACE ... USING STOGROUP ... command.

Transaction Logging

IBM DB2 LUW, like most other modern relational DBMSs that provide ACID guarantees, uses the transaction log as one of the primary mechanisms to enforce these requirements.

Data modification operations performed by DB2 are recorded in the transaction log as a series of log entries. Each database maintains its own transaction log, which is a sequence of files on disk. The size of a single file is determined by the LOGFILSIZ parameter, the number of initially created files is determined by the LOGPRIMARY parameter. DB2 can create additional log files as needed, the maximum number of log files that can be created is controlled by the LOGSECOND parameter.

Information is written to the transaction log using a special buffer in RAM. The contents of the buffer are flushed to disk (to transactional log files) as the buffer fills up, as well as when transactions are confirmed or canceled (by an application command or upon an abnormal closing of the connection with the application).

The transactional log file that is required to recover data after a crash is called the active log file. Active transaction log files must be available to the DB2 database manager at all times. Since the availability of transactional log files is critical for ensuring the DBMS performance, a mechanism is provided for mirroring transactional logs in two file systems (configured by the LOGMIRROR parameter).

When wrong choice the size and number of transaction log files that do not correspond to the level of current load, there may be situations when the transaction log is full due to insufficient number of log files allowed to be created or lack of available disk space. Depending on the database settings (see the BLK_LOG_DSK_FUL parameter), an appropriate error message may be returned to applications, or processing may be suspended until the situation is resolved by the administrator.

Also, situations of transaction log overflow can occur when there are long-running transactions that perform data modification operations. Even if such long running transaction performs a single small change to the database, which then remains uncommitted for a long time, the corresponding transaction log file remains active and cannot be reused.

There are two main modes of operation for DB2 transactional logging: circular logging and archive logging. In circular logging mode, DB2 cycles through the generated set of transactional log files. In archive logging mode, DB2 optionally copies transaction log files to the archive using the methods specified by the LOGARCHMETH1 and LOGARCHMETH2 parameters.

The circular logging mode ensures the restoration of the integrity of the database in the event of a crash of the DBMS server. Backing up such a database is possible only after shutting down all applications (that is, with the suspension of user access). Data recovery from backup only possible with bringing the database to the state at the time of the backup.

The archive logging mode also ensures that the integrity of the database is restored if the DBMS server crashes. Additionally, you can back up the database without interrupting user access and include active log files (needed to restore data integrity) in the backup. Restoring data from a backup can be supplemented by applying the changes made to the database since the backup was taken and bringing the database to a state at a selected point in time in the past (but not earlier than the time the backup was taken).

Archive logging mode requires additional resources to perform archiving operations, including increased I/O and additional disk storage space archive files transaction log.

Organization of data caching

To reduce the amount of I/O performed, DB2, like other modern relational database systems, caches reads and writes performed on tablespaces. Caching is done using areas of RAM called buffer pools. DB2 can define several different buffer pools (created by the CREATE BUFFERPOOL command) with the page size, size, and auto size control flag. Each table space is mapped to a specific buffer pool, and a single buffer pool can be shared across multiple table spaces.

When performing a read operation, it first searches for the desired page with data in the buffer pool. If a required page is found, the data is read from the buffer pool, otherwise the page is loaded from disk into the buffer pool. A mechanism is provided for asynchronous prefetching of pages into the buffer pool when a linear (predictable) nature of page accesses is detected. The prefetching mechanism in many cases reduces the waiting time for read operations of the necessary data from disk by performing reads in asynchronous mode.

When a write operation is performed, page adjustment is performed directly in the buffer pool. In this case, the page is not written to disk in synchronous mode, and data safety is ensured by the transactional logging mechanism. Changed tablespace pages are written to disk asynchronously, background, and provides a reasonable minimization of the amount of work that may be required to restore the state of the database when it is abnormally (incorrectly) closed. Closing the database correctly (for example, during a regular shutdown of the DBMS server) ensures that all modified pages of all buffer pools are written to disk.

RAM usage

The DB2 storage model consists of various storage areas at the DB2 instance, database, application, and agent levels.

For a detailed description of DB2 storage areas, see below short description appointments of different areas.

A list of major DB2 storage areas is shown in the figure below (originally taken from ).

The total memory used by the DBMS instance includes:

  • Monitor Heap - memory area for monitoring operations and status, the size is regulated by the MON_HEAP_SZ parameter;
  • FCM Buffers - a memory area for interaction between the coordinating agent and its subagents, as well as for providing internal interactions in partitioned databases;
  • Audit Buffer is a memory area where audit records are placed before being flushed to the audit log.

At the database level, it is customary to distinguish between:

  • a global database area, often referred to as the "Performance memory" area, which includes various caching areas and a locking area;
  • an application data area, often referred to as the "Functional memory", which includes the various working memory areas of agents servicing database connections.

The global database area consists of the following main components:

  • Buffer pools - buffer pools, i.e. areas for caching tablespace data;
  • Lock list - an area for storing information about locks, the size of which is controlled by the LOCKLIST parameter;
  • Package cache - area for caching query execution plans, the size is regulated by the PCKCACHESZ parameter;
  • Catalog cache - area for caching the system catalog, which includes descriptions of all database objects, the size is regulated by the CATALOGCACHE_SZ parameter;
  • Utility heap - RAM for performing database maintenance operations (including backup and restore operations), the size is regulated by the UTIL_HEAP_SZ parameter;
  • Database heap - RAM for servicing database operations (including a transaction log buffer and a cache to speed up access to the system catalog, as well as an audit buffer at the database level), the size is regulated by the DBHEAP parameter.

The total size of the global database area is limited by the DATABASE_MEMORY setting.

The application data area includes:

  • Application Global Memory - common memory areas shared when processing application requests, the maximum amount is regulated by the APPL_MEMORY parameter;
  • Agent Private Memory - private memory areas used for the operation of individual agents serving connected applications.

Optionally, you can allocate areas of memory that are allocated for the DB2 driver to run on the application side. For local applications (those that use IPC rather than network access to connect to the database manager), the DB2 settings you set control how much RAM is allocated (primarily the ASLHEAPSZ setting).

RAM management when performing sort operations

Many types of DBMS operations require sorting of data, so the management of RAM used for sorting is given special attention.

If it is impossible to allocate the sort area entirely in RAM, the data for sorting is placed in the system temporary table space. The performance of queries that require such large sort operations can be significantly degraded.

Parameters that control the allocation of RAM for sorting:

  • SORTHEAP - memory limit for sort operation;
  • SHEAPTHRES - size limit of the agent's private memory allocated for the sort operation;
  • SHEAPTHRES_SHR - the limit on the amount of total RAM that can be used to perform sort operations (by all consumers in total) at any given time.

DB2 supports three basic sort memory management models:

  • Shared sort model (shared sort) – used by default, implies setting the SHEAPTHRES parameter to 0. RAM allocation for sorting is carried out from the global area of ​​the database.
  • Private sort model (private sort) - used when the SHEAPTHRES parameter is non-zero and there is no configured shared sort memory. The allocation of RAM for sorting is carried out from the application data area (more precisely, from private areas belonging to agents).
  • Hybrid sorting model (hybrid sort) - used when the SHEAPTHRES parameter is non-zero and there is a configured shared sorting memory. Operations that require the use of shared sort memory are performed with memory allocation in the global area of ​​the database, other sort operations are performed with memory allocation in private areas of agents.

Using shared (global) memory to perform sort operations provides a number of important benefits:

  • more flexible management of RAM when executing queries, allowing you to increase the efficiency of using RAM;
  • the ability to use a parallel version of the sorting algorithm due to the simultaneous access to the sorting memory area of ​​the coordinating agent and its subordinate DB2 sub-agents.

One of the following settings can be used to enable the use of shared memory when performing sort operations:

  • the general sorting area model is enabled by setting the SHEAPTHRES parameter to 0;
  • parallelism of operations execution is enabled by setting the INTRA_PARALLEL parameter to YES;
  • the DB2_WORKLOAD variable is set to ANALYTICS;
  • the DB2 Connection Concentrator feature is enabled (usually used when accessing DB2 for z/OS and DB2 for i databases, see the description of this feature in ).

Automatic memory allocation management

Availability a large number different areas of RAM and the parameters that govern their size can be a significant effort to manually tune a DB2 server. Therefore, starting with version 9, IBM DB2 supports automatic management of the distribution of RAM between different areas using a self-tuning memory manager (STMM, Self-Tuning Memory Manager).

When bootstrapping is enabled, STMM dynamically allocates available memory resources to memory consumers in the database. STMM responds to changes in workload characteristics by adjusting memory configuration parameter values ​​and the size of buffer pools to optimize performance. To enable STMM, you must set the SELF_TUNING_MEM database configuration parameter to ON.

Automatic management of memory allocation is carried out for those memory areas for which it was explicitly allowed. When setting a configuration parameter value with the UPDATE DBM CFG and UPDATE DB CFG commands, to use STMM, the AUTOMATIC keyword is specified after the parameter value. The numerical value of the parameter specified in this case is used as the initial value, then STMM periodically adjusts the values ​​taking into account the current load, redistributing the RAM between different consumers.

STMM automatic management is supported for the following options:

  • INSTANCE_MEMORY is the total amount of RAM in the DB2 instance;
  • DATABASE_MEMORY - global database areas;
  • DBHEAP - an area for servicing database operations;
  • LOCKLIST – scope for keeping data on locks;
  • MAXLOCKS - the percentage of memory occupied by locks of one application to switch to lock escalation;
  • PCKCACHESZ - caching area for query execution plans;
  • SHEAPTHRES_SHR - general sort area;
  • SORTHEAP - sort area size for one operation;
  • APPL_MEMORY - area of ​​functional memory;
  • APPLHEAPSZ - private memory limit used by one agent;
  • STMTHEAP - limit on the size of the area used by the compiler of SQL and XQuery queries (per query);
  • STAT_HEAP_SZ is the maximum amount of RAM allocated for building statistics by the RUNSTATS utility and allocated from the functional memory area.

Types of database objects

This section provides an overview of the types of DB2 database objects. For a complete list of DB2 database object types and detailed information on each type of object, see the DB2 documentation:

Schemes

Schemas are namespaces for collecting database objects. Schemes are primarily used for:

  • providing an indication of ownership of objects or links to the application;
  • logical grouping of related objects.

All DB2 database objects (except for generic synonyms) have fully qualified two-part names; schema is the first part of such a name:<имя_схемы>.<имя_объекта>

The fully qualified object name must be unique. If you connect to a database and create or access an object without specifying a schema, DB2 will use the user ID that made the connection to the database as the schema name. You can also use the SET SCHEMA statement to set the schema for the current session.

Schema creation can be done explicitly, by calling the CREATE SCHEMA statement, or implicitly, on the first attempt to create an object without specifying a schema name. In the latter case, the user must be granted the IMPLICIT_SCHEMA permission to successfully create the schema.

Synonyms can be created for most kinds of database objects, allowing the original objects to be referred to by a different name (perhaps placed in a different schema). Synonyms are created using the CREATE SYNONYM / CREATE ALIAS statement. It also supports working with public synonyms that are not tied to a specific schema. Access to public synonyms is carried out without specifying the scheme, regardless of the established current scheme of the session. Public synonyms are created using the CREATE PUBLIC SYNONYM / CREATE PUBLIC ALIAS command.

tables

A table is a collection of related data organized logically in columns and rows.

Each table row consists of the same set of named columns. When creating a table, each column is assigned a data type that limits the allowable column values ​​in table rows (database records) and determines the semantics of possible operations on the corresponding values ​​(including comparison, sorting, computational operations).

A table is created with the CREATE TABLE command, and deleted with the DROP TABLE command. It is supported to change the description of a table with the ALTER TABLE command, including adding and deleting columns, changing the data types of columns. After performing some operations of changing the table description, it is required to reorganize it (restructure the physical storage of the table for optimal access to it) using the REORG command.

The classification of DB2 built-in data types that can be used to define table columns is shown in the figure below.

In addition to one of the allowed values ​​of the supported data type, column values ​​can be blank, i.e. empty (NULL) value. The ability for a column to store null values ​​is determined when the table is created.

Most of the data types listed in the figure have a direct counterpart in other modern relational DBMSs and are described in detail in the DB2 documentation. The following is a brief description of data type features that are specific to DB2 or that may be difficult to use.

When working with string data, unlike some other DBMS types, DB2 distinguishes between an empty string (string of zero length) and a NULL value of the string type. This feature affects the search order (using the equality predicate instead of the IS NULL expression) and the composition of the allowed column values ​​(if NULL values ​​are forbidden, an empty string can be stored in the column).

String values ​​of the GRAPHIC, VARGRAPHIC, and DBCLOB types differ from other string types in that they are always stored in UTF-16 encoding. When accessing the corresponding columns from the side of the client application, the DBMS provides data conversion to the encoding used by the client application.

Columns of type DATE (date) do not contain timestamps by default. In Oracle Database compatibility mode, DB2 additionally supports storing time attributes (hours, minutes, seconds) in DATE columns.

If necessary, provide efficient work with exact decimal numbers that include a fractional part (for example, in financial applications), it makes sense to use the DECFLOAT data type, which combines the exact representation of DECIMAL values ​​and the ability to efficiently calculate FLOAT values.

The BLOB data type provides the ability to store unstructured binary information (such as images or office documents) in a database. BLOB values ​​can be stored together with other record fields (if their size allows them to be compact enough), or separately, in a special storage object. In the latter case, the entry contains a reference to the stored BLOB value instead of the value itself. Storage of values ​​of CLOB and DBCLOB types is organized in a similar way.

The XML data type provides storage in table fields of structured hierarchical XML documents. For stored XML documents, attribute access operations are supported (without the need to parse the XML document when accessed), indexing of individual attributes, and other features.

In addition to built-in data types, DB2 supports user-defined data types that are defined based on built-in types. Working with user-defined data types is described in the DB2 documentation.

When creating a table, it is possible to specify rules for automatic filling of their values ​​for columns. A special case of autocomplete columns are identity columns, which are numeric columns that automatically generate a unique numeric value for each inserted row. Automatic filling can be carried out in one of two modes:

  • GENERATED ALWAYS - The value is always set by the DB2 server and cannot be explicitly set by the application;
  • GENERATED BY DEFAULT - The value is set by the DB2 server if the application did not specify an explicit assignment value when the record was inserted.

Also, at the table level, restrictions can be defined that set restrictions on the composition of attribute values. The following types of restrictions are supported:

  • primary key (PRIMARY KEY) - uniqueness constraint on a set of columns, mainly used to search for a single record, there can be only one primary key in a table;
  • uniqueness constraint (UNIQUE) – additional uniqueness constraint on a set of columns;
  • foreign key (FOREIGN KEY) – a reference in the form of a set of column values ​​pointing to a combination of columns of another table for which a foreign key or unique constraint is defined;
  • check (CHECK) - a logical condition that limits the possible values ​​​​of one or several columns in a record at once.

The restriction mechanism implements means of automatic control and ensuring the integrity of the database, including referential integrity of data (control of the presence in the "parent" table of records that are referenced through foreign keys of the records of "child" tables). Proper use of restrictions allows you to guarantee the formal correctness of filling the database and, to a certain extent, protect yourself from application and user errors when correcting data.

Since the restriction mechanism creates additional computational load when entering and updating data, in some cases its use is deliberately abandoned, placing the responsibility for the correct maintenance of the database on the application. At the same time, DB2 uses integrity constraint descriptions to determine the relationships between tables and select the most efficient query execution plan.

Temporary tables

To store temporary application data, DB2 provides a temporary table mechanism that provides full functionality for working with tabular data, but in the context of the current user session.

Accessing temporary tables greatly improves performance by minimizing or eliminating system catalog access conflicts, and by eliminating row locking, logging (optional, depending on table creation mode), and permission checking.

Temporary tables also support indexes, that is, any standard index can be created on a temporary table. You can also collect statistics on such tables (using the RUNSTATS command) to obtain the information needed by the query optimizer.

Temporary tables reside in a user temporary table space, which must be defined before they are created.

There are two main varieties of temporary tables in DB2:

  • declared temporary tables (DGTT - Declared Global Temporary Tables);
  • created global temporary tables (CGTT - Created Global Temporary Tables).

Declared temporary tables are in-memory tables used by the application and automatically dropped when it terminates. Such tables can only be accessed by the application that created them, and they are not stored in any of the DB2 system catalog tables.

Each declared temporary table has a SESSION schema; this scheme must be specified by referring to it. The user ID used to declare temporary tables will have full privileges on those tables. Each application that declares a temporary table will have its own copy of that table.

While DGTTs allow you to declare a temporary table, the definition of such a table cannot be shared across connections or sessions. You must issue a DECLARE GLOBAL TEMPORARY TABLE statement every time you start a session.

When using Generated Global Temporary Tables (CGTT), the temporary table definition only needs to be created once because it is stored in the DB2 system catalog. This means that other connections can use the table definition instead of creating it again.

Although the structure of the CGTT table can be used immediately, data from different connections is independent of each other and disappears when the connection is closed.

Indices

An index is an ordered set of keys, each pointing to a table row. Indexes enforce row uniqueness (that is, implement the unique constraints discussed in the previous section) and improve performance. The following describes some of the characteristics that can be defined for indexes:

  • indexes can be built in ascending or descending order of column values;
  • index keys can be unique or non-unique;
  • indexes can be built on several columns (such indexes are called combined);
  • if index and table data are grouped in the same index sequence, such an index is called clustered (CLUSTERED INDEX).

Index creation is provided by the CREATE INDEX statement, deletion by the DROP INDEX statement. When creating an index, its type (unique / non-unique) and the composition of the columns for building the index are specified.

DB2 provides tools that provide automated index selection to optimize query execution. The most convenient way to work with these tools is organized in IBM Data Studio.

Sequences

Although sequence objects are independent of tables, they function similarly to identity columns and provide the generation of unique numeric sequences. The difference between sequences and identity columns is that identity columns generate unique numbers strictly in the specified column of the table, while sequence objects can be used to generate sequential numeric values, the logic of which is determined by the application.

The creation of sequences is provided by the CREATE SEQUENCE command, the next and current received values ​​are accessed using the NEXT VALUE FOR and PREVIOUS VALUE FOR operators. For compatibility with Oracle Database, the syntax for accessing sequence values ​​via pseudo-columns "NEXTVAL" and "CURRVAL" is also supported.

Representation

A view is a display of data in tables. Data for views is not stored separately, it is fetched when the view is run. Nested views are supported, i.e. views created from other views.

Views are created with the CREATE VIEW command, and deleted with the DROP VIEW command. To facilitate updating (replacing) views, the CREATE OR REPLACE VIEW syntax is provided, which provides for the creation of a new view (if it does not already exist) or replacing an existing view with a new definition (if a view with given name has already been created).

triggers

A trigger is an object that automatically performs an operation on a table or view. A specific action on an object for which a trigger is defined causes the trigger to fire. Typically, a trigger is not considered an application object; accordingly, triggers are usually created not by developers, but by database administrators.

Stored procedures and functions

Stored procedures are database objects that contain SQL statements and business logic. Storing some of the application logic in the database improves performance by reducing the amount of traffic between the application and the database. In addition, stored procedures provide a centralized location for storing code so that other applications can use the same stored procedures. The CALL statement is used to call a stored procedure.

User Defined Functions (UDFs) are database objects that allow users to extend the SQL language with their own logic. A function always returns a value or values, usually as a result of the business logic included in the function. To call a function, use it as part of an SQL statement or with a VALUES statement.

In DB2, stored procedures and user-defined functions can be developed in several programming languages, including PL/SQL, SQL PL, Java, C, C++, COBOL.

System directory

One of the basic information resources of the DBMS is the system catalog, which stores and provides access to information about the structure of the database, including:

  • description of tables, columns and indexes;
  • description and text of views, triggers and stored procedures;
  • information about tablespaces and containers for storing data;
  • established permissions for access to database objects;
  • other database meta information.

Accessing the system catalog is required for many tasks, including automating database administration and maintenance tasks, application development, and more.

The most commonly used tables (really, views) that are part of the system catalog are:

  • SYSCAT.SCHEMAS - description of database schemas;
  • SYSCAT.TABLES - description of database tables;
  • SYSCAT.COLUMNS – description of table columns;
  • SYSCAT.INDEXES - description of indexes.

A detailed description and composition of columns for the above and other tables of the system catalog is given in .

Organization of parallel transactional processing

Transactions

A transaction (or unit of work) consists of one or more SQL statements, which, when executed, are treated as a separate unit; in other words, the failure of one transaction statement causes the entire transaction to fail, with all statements executed up to the point of failure being rolled back.

The transaction ends with a COMMIT statement. The transaction can also end with a ROLLBACK statement or an emergency (abnormal) shutdown of the application, after which all changes made by the application to the database will be canceled. The start of a transaction is the first statement executed after the application's connection to the database was opened, or after the previous transaction completed. Each application connection to the database can have at most one active transaction.

As mentioned earlier, changes to the database are recorded in the transactional log. To ensure that changes made by a transaction being undone can be "rolled back", transaction boundaries are also recorded in the transaction log. At the same time, transactions that perform only data read operations are not written to the transaction log. Information about the beginning of a transaction is placed in the transaction log before the start of the execution of the first (for a given transaction) data write statement.

In the event of an error in the execution of a single statement that writes data, all changes made by that statement are undone using the transaction log data. The application, having received a diagnostic message about the refusal to execute the statement, can cancel the entire transaction (with the ROLLBACK statement) or perform some other actions with the database and, as a result, confirm the changes made (with the COMMIT statement).

An application can define additional rollback points within a transaction (using the SAVEPOINT statement) and undo changes made after a rollback point has been created (using the ROLLBACK TO statement). The use of rollback points allows an application to selectively undo actions taken within a transaction, which can be useful when handling data integrity errors and other scenarios.

Locks

Parallel use means that multiple users can work on the same database objects at the same time. Data access must be properly coordinated to ensure data integrity and consistency.

To obtain consistent results of parallel transactions, control over the parallel use of shared resources is required. Such control is based on the use of locks.

The concepts of locking and concurrency are closely related. The lock temporarily prevents applications from performing other operations until the current operation has completed. The more actively locking is used in the system, the less opportunities for concurrency remain. On the other hand, the less often a lock is applied in a system, the more possibilities there are for concurrency.

A lock is acquired automatically as needed to maintain a transaction and is released when such a transaction is aborted (using a COMMIT or ROLLBACK command). Locks can be placed on tables or rows.

There are two main types of blocking:

  • Shared Lock (S) - Set when an application reads data and prevents other applications from making changes to the same row.
  • Exclusive lock (X) - Set when an application updates, inserts, or deletes a row.

If two or more applications need to perform an operation on the same object, one of them will have to wait to acquire the required lock. By default, the application will wait indefinitely. The application's lock timeout is controlled by the LOCKTIMEOUT database configuration parameter. The default value for this parameter is -1 (infinite wait).

You can use the CURRENT LOCK TIMEOUT session variable to set the lock timeout on a specific connection. By default, this variable is set to LOCKTIMEOUT. You can use the SET LOCK TIMEOUT statement to change this value.

In the case when two (or more) applications connected to the same database wait indefinitely for resources due to the incorrect sequence of accessing these resources, a deadlock situation occurs. The timeout period cannot expire because each application is holding a resource that the other application needs. In all cases, the deadlock problem is due to incorrect application structure or logic.

DB2 automatically detects deadlock situations by performing appropriate checks at the intervals specified by the DLCHKTIME parameter. When it detects that a deadlock has actually occurred, DB2 will use an internal algorithm to determine which of the two transactions should be rolled back and which should be continued.

Isolation levels

A detailed analysis of the problems that can occur if there is no concurrency control is provided in the DB2 documentation, as well as in the literature on the theory of the functioning of relational DBMS. Possible types of problems include:

  • lost update (if one data block is changed simultaneously by different transactions, one of the changes is lost);
  • unreliable reading (reading of data added or changed by a transaction, which will not be subsequently confirmed);
  • non-repeating reading (when re-reading within the same transaction, previously read data is changed);
  • phantom reading (the same selections in one transaction give different sets of rows due to adding, deleting or changing rows by other transactions).

Application-side control of DB2's built-in protection against the problems listed above is done by setting the isolation level to be used. Isolation levels can be thought of as blocking policies where, depending on the isolation level chosen, one can achieve various options database locks by the application. The isolation level required by an application can be set at the session level and at the level of a single query or subrequest that is being executed.

DB2 provides the following levels of protection for data isolation:

  • unreliable reading (Uncommitted Read, UR);
  • cursor stability (Cursor Stability, CS);
  • reading stability (Read Stability, RS);
  • repeated reading (Repeatable Read, RR).

Inauthentic reading also called "dirty". This is the lowest isolation level that allows the highest degree of concurrency. Rows are not locked during read operations, except when another application tries to drop or modify the table; update operations are performed in the same way as when using the next isolation level, the Cursor Stability level.

Using the Bad Read isolation level prevents the following problems:

  • lost update.

Cursor stability is the default isolation level. It provides the minimum degree of blocking. This isolation level locks the "current" row of the cursor. If the line is read-only, the lock is held until transition to new line or completion of the operation. If the row is updated, the lock is held until the operation completes.

Using the Cursor Stability isolation level prevents the following issues:

  • lost update;
  • incorrect reading.

Prior to DB2 9.7, when using the Cursor Stability isolation level, performing a write (UPDATE operation) closed a read (SELECT operation) access to the same row. The logical basis was that since the write operation makes changes to the row, the read should wait for the updates to complete in order to get the final committed value.

DB2 9.7 defaults to a different approach to the Cursor Stability isolation level for new databases. This new approach is implemented using "currently committed" (CC) semantics. When using CC semantics, a write operation does not close access to the same row for a read operation. Previously, this approach was possible using the UR isolation level; the difference with the current approach is that with UR the read operation receives invalid values, while with CC semantics it receives the values ​​currently accepted. The currently committed values ​​are the values ​​that were committed before the start of the write operation.

Reading stability provides a lock on all rows received by the application. For the given query, all rows that match the result set are blocked. So the use this mode Isolation can cause an application to acquire a large number of locks, and if the set limits are reached, locks can be escalated from the row level to the table level. However, at the Read Stability isolation level, the DB2 query optimizer does not explicitly acquire table-level locks in the execution plan of the queries it executes, even if the result set contains most of the records in the table.

Using the Read Stability isolation level prevents the following issues:

  • lost update;
  • incorrect reading;
  • non-repetitive reading.

Repetitive reading is the highest level of isolation. It provides the highest degree of locking and the least amount of concurrency. A lock is placed on the rows that are processed to build the result set; in other words, even rows that do not make it into the final result package can be blocked. Other applications cannot update, delete, or insert rows that will affect the result set until the operation in progress is completed. Repeated reading ensures that the same query, created by the application multiple times in a single operation, produces the same results each time.

The DB2 query optimizer, when using the Repeated Read isolation level, can include explicit table-level locking operations in the query execution plan when the corresponding queries involve scanning all rows of a table (which means that each row of the table must be locked during query execution).

Using the Read Stability isolation level prevents all possible problems competitive access, but at the same time, the possible parallelism of the execution of operations is limited as much as possible.

Conclusion

This article has reviewed the main functionality IBM DB2 LUW DBMS, database server structure, configuration settings and data storage organization. In addition, the basic principles of the DB2 server, the supported types of database objects, and the organization of parallel DB2 transactional processing are considered.

DB2 (pronounced in Russian "dibi two", tracing paper from English "dibitu" is also common) is a family of information management software products from IBM. Most often, referring to DB2, they mean the relational database management system DB2 Universal Database (DB2 UDB), developed and released by IBM.

Despite the favorable attitude towards the Linux operating system, which is distributed under an open source license, IBM has no plans to open source its DB2 database yet. It was declared by the director of the center of IBM Linux Technology Jim Vasco at the last (April, 2011) annual conference of Linux Foundation Collaboration Summit in San Francisco. Inside IBM, there is a constant struggle between representatives of different departments, Vasco explained. In some cases, opting for Linux or Windows means lower software revenues but higher services revenues, while in other cases it may be hardware revenues. We have to look for the optimal solution, he concluded. Transferring packages to Oracle from open source, being developed at Sun Microsystems, created some problems for IBM, Vasco said. Oracle is trying to convince customers to trade in IBM hardware for its own Exadata servers and Oracle database. In 2011, Linux Foundation director Jim Zemlin expects the development of Linux-based specialized high-performance systems like IBM Watson and ready-made devices that require minimal configuration.

Implementations

Currently, in addition to commercial products of the family, IBM also distributes a free distribution of DB2 Express-C for Linux (x86, x86-64, POWER), Windows (x86, x86-64), Solaris (x86-64), Mac OS X platforms. (x86-64 beta). The free version has restrictions on the use of no more than one dual-core processor and 2 GB of RAM for the DBMS (the total number of processors and memory in the system can be any, but resources beyond the specified limits will not be used by the DBMS).

2017: Announcement of additions for data control

Db2 on Cloud

The updated Db2 on Cloud solution is a fully managed service available on the IBM Cloud.

Technology features include:

  • Dynamic scale (“slider”) for setting performance and memory parameters - with a mouse click, you can instantly change the scale, increasing or decreasing data processing performance and the required amount of RAM, in the same way you can increase the amount of memory of the information storage system;
  • Web console - helps customers quickly master the service and speed up work with it.

Overall, Db2 on Cloud avoids the time-consuming process of negotiating and purchasing additional computing resources and complements IBM Db2 Hosted, the version of the database hosted on the IBM Cloud.

Db2 on Cloud Benchmark

DB2 Analytics Accelerator

Versions

2017: JSON and HTTP

DB2 10 represents the first major update to the DBMS in several years: z/OS version 10 was released in 2010, but this release is intended for both Linux, Unix and Windows systems.

Both products contain new functionality. DB2 now supports the RDF (Resource Description Framework) format and InfoSphere can interoperate with Apache Hadoop deployments. Other improvements in DB2 include, among others, faster backup and I/O processes.

DB2 10 is also more flexible. In particular, DBMS administrators received tools for distributing data for storage across different types storage media: for example, operational information can be stored on faster solid-state drives, while less valuable data can be stored on cheaper and slower tape drives.

A new feature called time travel allows for more efficient management of time data and has been a great success for users of version 10 for z/OS. With its help, the user or program can examine the data in the context of the time of their existence in the DBMS for given periods. The use of such environments is relevant for analytics.

DB2 10 can be downloaded free of charge for use in a production environment with up to two processor cores and 2GB of memory. More functional versions will cost starting from $ 6180, which includes the cost of annual maintenance. The cost of InfoSphere is based on the number of processors or the amount of data stored, the basic versions will cost about $40 thousand per TB.

IBM DB2 10.5 version

History

DB2 has a long history and is considered by some to be the first DBMS to use SQL.

From 1975 to 1982, the DB2 prototype was developed at IBM under the name System Relational, or System R. The SQL language was first implemented in IBM System R, but this system was of a research nature, and a commercial product that included SQL was first released by Oracle in 1979 year.

DB2 got its name in 1982 with the first commercial release for the VM called SQL/DS and then the MVS release called DB2. For a long time, along with "DB2", the "Database 2" variant, also a trademark of IBM, was used. Apparently, this was meant to be the second flagship IBM DBMS after the old hierarchical IMS DBMS.

The development of DB2 goes back to the early 1970s, when Dr. E. F. Codd, working for IBM, developed relational database theory and published a data manipulation model in June 1970. To implement this model, he developed a relational database language and called it Alpha. IBM chose to outsource further development to a group of programmers outside of Dr. Codd's control. Violating some of the principles of the relational model, they implemented it as a "Structured English Query Language", SEQUEL for short. Since SEQUEL was already a registered trademark, the name was shortened to SQL - "Structured Query Language" and has remained so to this day.

Thus, historically, DB2 evolved from DB2 for MVS (of which DB2 for z/OS is a descendant) and its sister SQL/DS for VM (of which DB2 Server for VSE & VM is a descendant). Subsequently, another development team at IBM implemented the OS/2 EE Database Manager server, which later evolved into DB2 v2 for OS/2, AIX and then Windows, and then into DB2 UDB (its descendant is DB2 for Linux, UNIX and Windows). Another team completed the integration of the DB2 architecture with the embedded AS/400 database (a descendant - DB2 for i). IBM is gradually moving towards the integration of all these branches.

Peculiarities

The distinguishing features of DB2 include a dialect of the SQL language, which defines, with rare exceptions, a purely declarative meaning of language constructs, and a powerful multi-phase optimizer that builds an effective query execution plan from these declarative constructs. Unlike other SQL dialects, the DB2 SQL dialect has virtually no optimizer hints, little (and for a long time) lack of development of the stored procedure language, and thus everything is aimed at maintaining a declarative style of writing queries. At the same time, the DB2 SQL language is computationally complete, that is, it potentially allows you to define any computable correspondences between the source data and the result in a declarative form. This is achieved, among other things, through the use of table expressions, recursion and other advanced data manipulation mechanisms.

Due to IBM's focus on relational development and the firm's position in the computer industry, the DB2 SQL dialect has a significant impact on the ANSI/ISO SQL standards.

Stored procedures are not very widely used in DB2, and traditionally, conventional high-level programming languages ​​(C, Java, PL/I, COBOL, etc.) are used to write stored procedures, this allows the programmer to easily format the same code either as part of the application, or as a stored procedure, depending on whether it is more appropriate to execute it on the client or on the server. DB2 also currently implements the SQL procedural extension for stored procedures, in accordance with the ANSI SQL/PSM standard.

The DB2 optimizer makes extensive use of table distribution statistics (if the data collection process was done by the DBA), so the same SQL query can be translated into completely different execution plans depending on the statistical characteristics of the data it processes.

Since DB2 has historically evolved from multi-user systems on the mainframe, the DB2 architecture has placed a lot of emphasis on security and the distribution of roles for DB2 maintainers. In particular, unlike many other DBMSs, DB2 has separate roles for a database administrator (responsible for configuring DB2 software components and running them optimally on a computer system) and a database administrator (responsible for managing the data in a particular database).

The use, if necessary, of static SQL in programs and the concept of packages, unlike most other DBMS, allows the implementation of such a security model when the rights to perform certain operations can be granted to application programs in the absence of such rights for users working with these programs. In this case, this makes it possible to guarantee the impossibility of the user working with the database bypassing the application program, if the user has only rights to run the program, but not to independently manipulate data.

As part of the concept of increasing the level of integration of security tools in a computer system, DB2 does not have its own means of authenticating users, integrating with operating system tools or specialized security servers. Within DB2, only users authenticated by the system are authorized.

DB2 is the only general-purpose relational DBMS that has implementations at the hardware/software level (IBM i system; DB2 support is also implemented on IBM System z mainframe hardware).

Modern versions of DB2 provide enhanced support for using XML data, including operations on individual elements of XML documents.

Liked the article? Share with friends!
Was this article helpful?
Yes
Not
Thanks for your feedback!
Something went wrong and your vote was not counted.
Thanks. Your message has been sent
Did you find an error in the text?
Select it, click Ctrl+Enter and we'll fix it!