OpenVMS® Languages and I/O Performance Report

An analysis of the I/O characteristics of OpenVMS® languages

This document contains PROPRIETARY and CONFIDENTIAL information and such information may not be disclosed to others for any purpose without written permission from Touch Technologies, Inc.

Touch Technologies, Inc. 9988 Hibert Street, Suite 310 San Diego, California 92131 Phone (800) 525-2527 FAX (619) 566-3663

NOTICE

Touch Technologies, Inc. (TTI) has prepared this publication for use by TTI personnel, licensees, and customers. This information is protected by copyright. No part of this document may be photocopied, reproduced or translated to another language without prior written consent of Touch Technologies Incorporated.

TTI believes the information described in this publication is accurate and reliable; much care has been taken in its preparation. However, no responsibility, financial or otherwise, is accepted for any consequences arising out of the use of this material.

The information contained herein is subject to change without notice and should not be construed as a commitment by Touch Technologies, Inc.

The following are trademarks of Touch Technologies, Inc., and may be used only to describe products of Touch Technologies, Inc.:

DYNAMIC TAPE ACCELERATOR   INTOUCH 4GL   CleanDisk   REMOTE TAPE FACILITY   DYNAMIC LOAD BALANCER

The following are trademarks of Digital Equipment Corporation, and may be used only to describe products of Digital Equipment Corporation:

DBMS   DCL   DECNET   RDB   RMS   OpenVMS   VMS

Copyright ©1991, 1995 Touch Technologies, Inc.

Footnotes

® OpenVMS is a registered trademark of Digital Equipment Corporation.

Preface

Over the last 15 years, the speed of OpenVMS processors has increased by a factor of 30. Over the same period of time, disk I/O systems have sped up by a factor of only two. This imbalance has caused most OpenVMS/VMS systems to switch from being CPU bound to being I/O bound.

Traditional third-generation languages are very I/O intensive. As the number of users accessing data files increases, the I/O system becomes a severe bottleneck.

Purpose

The purpose of this manual is to compare how well different languages handle I/O and how their I/O behavior affects performance. The languages compared include a number of traditional OpenVMS/VMS 3GLs (COBOL, FORTRAN, ...) as well as INTOUCH, a high-performance 4GL.

In addition, possible solutions to I/O bottlenecks will be discussed.

Topics to be covered include:

Buffered I/Os and Direct I/Os
Language I/O Benchmarks
Speeding up I/O Operations
Eliminating I/O Operations
RMS Buffering
Disk and File Defragmentation

Chapter 1
BUFFERED I/Os VERSUS DIRECT I/Os

1.1 I/O Request Logic Flow

When an application executes a write statement, the request passes through the following layers:
The application issues an RMS $PUT, asking RMS to perform the I/O.
RMS issues a QIO, the VMS mechanism for submitting an I/O request.
The QIO request is processed by the device driver's Function Decision Table (FDT) routines.
From the FDT, the request goes to the device driver's START I/O routine.
The START I/O routine passes the request to the CONTROLLER.
Finally, the CONTROLLER submits the I/O request to the device.

1.2 QIOs

A QIO can be either a direct I/O or a buffered I/O, depending on the device being accessed. I/Os to disks, tapes, and similar devices are direct I/Os. I/Os to DECNET, terminals, and similar devices are buffered I/Os.

1.2.1 Direct I/Os

In a direct I/O, VMS accesses the user's buffer directly. The device driver's START I/O routine takes the data in the user's buffer and submits it to the CONTROLLER, which then submits the I/O request to the device.

Because the device works directly from the user's buffer, the application cannot issue another I/O request with that buffer until the first I/O operation has completed. Direct I/Os are designed for synchronous operations.

In a synchronous I/O operation, one I/O must complete before the next I/O request can be issued.

1.2.2 Buffered I/Os

In a buffered I/O, VMS copies the user's buffer into a system buffer allocated from non-paged dynamic memory (NPAGEDYN).

VMS uses the copy in NPAGEDYN to complete the requested I/O operation. The application can therefore reuse its buffer for a new I/O request without waiting for the first I/O to complete. Buffered I/Os are designed for asynchronous operations.

In an asynchronous I/O operation, a second I/O request can be submitted before the previous I/O request has completed.

Chapter 2
LANGUAGE I/O BENCHMARKS

2.1 Screen I/Os

Digital-provided traditional third-generation languages (3GLs), by default, perform synchronous buffered I/Os when issuing screen write requests. A synchronous buffered I/O is a buffered I/O where the first I/O must complete before the next I/O can be issued.

When the application requests multiple screen writes, 3GLs perform a buffered I/O, wait for it to complete, perform another buffered I/O, wait for it to complete, and so on. The application cannot overlap I/Os.

Terminal I/Os are buffered I/Os, so the application does not have to wait for one I/O to complete before issuing another. However, 3GLs wait for each I/O to complete by default, negating the advantage provided by buffered I/Os.

Screen write requests are issued with statements such as:


        print ...          in BASIC 

        display ....       in COBOL

        print ...          in FORTRAN

        printf (...);      in C

For each screen write request, a synchronous buffered I/O is performed. If you have 1000 write requests, each request must be completed before the next write request can be issued.

2.1.1 Building a Buffer

INTOUCH is a high-performance 4GL designed to reduce I/O overhead. INTOUCH does this by packetizing the I/O requests.

INTOUCH packetizes the I/O requests by building a buffer the size of the SYSGEN parameter MAXBUFCNT. The first screen write request is put in the buffer, and each additional screen write request is appended to it. When the buffer is full or 1/10 of a second has elapsed, INTOUCH issues a SINGLE asynchronous buffered I/O request.

The effect of issuing one asynchronous buffered I/O for multiple write requests is a reduction in QIOs. A reduction in QIOs results in a reduction in CPU time and a reduction in elapsed time.

2.1.2 Benchmark

A simple program was written to benchmark how quickly various programming languages perform multiple screen write requests. The program counts from 1 to 1000 and prints each number to the screen. The following languages were used: BASIC, COBOL, FORTRAN, C, and INTOUCH.

The results of running this benchmark program are listed below. This test was run on an OpenVMS 3100.

Table 2-1 Buffered I/O to the Screen Benchmark Summary
Language   Buffered I/Os   CPU Time (s)   Elapsed Time (s)
BASIC              1,000           2.76               8.00
FORTRAN            1,003           2.68               8.00
COBOL              1,002           2.87               8.00
C                  1,050           2.86               7.00
INTOUCH                3           0.79               6.00

Because INTOUCH packetizes the screen I/O requests and sends the buffered I/Os asynchronously, the total number of buffered I/Os is significantly reduced. As a result, CPU time is also reduced significantly.

INTOUCH dramatically improves performance and removes bottlenecks associated with screen I/O and interactive applications.

2.2 File I/Os

Digital-provided 3GLs, by default, use RMS to issue I/Os to data files. The 3GL application issues an I/O request to RMS. RMS then issues a QIO. The QIO is a direct I/O when accessing a data file.

2.2.1 Populating a Data File

There are many RMS options that can be activated when opening a data file, but, by default, the traditional 3GLs use almost none of the RMS options. The use of the RMS options is complex and often not well understood.

2.2.2 Benchmark

A program was run to benchmark the number of direct I/Os that various programming languages use when populating (writing data to) a data file. The program populated an empty indexed data file by writing 1000 records.

The results of running the benchmark program are listed below. This test was run on an OpenVMS 3100.

Table 2-2 Populating an Indexed File Benchmark
Language   Direct I/Os   CPU Time (s)   Elapsed Time (s)
BASIC            1,814           8.92              70.00
FORTRAN            422           6.40              23.00
COBOL            1,814           9.55              62.00
INTOUCH            164           2.66              11.00

2.2.3 I/Os Performed

BASIC and COBOL both performed 1,814 direct I/Os, while FORTRAN performed only 422. This is because FORTRAN uses the RMS deferred I/O option when writing new data records. With deferred I/O, new records are placed in local data buffers instead of being written to disk immediately. The buffers are written out to the data file only when they are all full, which decreases the number of direct I/Os.

INTOUCH performed only 164 direct I/Os. The results of the benchmark program show that the number of direct I/Os is lowest when using INTOUCH. INTOUCH also uses the RMS deferred I/O option, but INTOUCH packetizes the I/O requests. Packetized requests are written to the data file either when the buffers are all full or after one second has elapsed.

In addition, INTOUCH dynamically controls local and global data buffering to further reduce the direct I/Os to the data file.

INTOUCH optimizes I/O operations by default; the programmer needs no special programming techniques to use this feature.

Because INTOUCH reduces the direct I/Os, both elapsed time and CPU time are dramatically reduced. In elapsed time, INTOUCH was over twice as fast as FORTRAN (11 versus 23 seconds) and more than five and a half times as fast as COBOL (11 versus 62 seconds).

2.2.4 Reading Data Records

When reading a data record in a sequential file, 3GLs, by default, read 16 blocks of data at a time.

INTOUCH dynamically adjusts the number of blocks read at one time to significantly reduce the number of direct I/Os. In addition, INTOUCH performs read ahead operations. The read ahead option tells RMS to read a second buffer of data as the first one is being processed by the application.

2.2.5 Benchmark

A program was written to benchmark the number of direct I/Os each programming language uses when reading 1300 records from a sequential file. The results of running the benchmark program are listed below. This test was run on an OpenVMS 3100.

Table 2-3 Reading Data Structures Benchmark Summary
Language   Direct I/Os   CPU Time (s)   Elapsed Time (s)
BASIC               47           4.06                7.0
FORTRAN             44           5.56                9.0
COBOL               45           4.14                6.0
INTOUCH             10           3.01                6.0

Because INTOUCH dynamically adjusts the number of blocks read at one time and performs read-aheads, it needed only 10 direct I/Os, while the 3GLs needed over 40 direct I/Os for the same operation. INTOUCH also used between 1.05 seconds (versus BASIC) and 2.55 seconds (versus FORTRAN) LESS CPU time than the 3GLs.

2.2.6 Updating Data Records

When updating a data record, traditional 3GLs, by default, fetch the data record, make the requested change, then write the record to disk. This operation is done for each record.

2.2.7 Updating Data Record Example

A buffer of data is read, bringing records A, B and C into memory:


                --------------------------------
                |    A    |     B    |     C   |
                --------------------------------

Record A is updated. The traditional 3GLs then write the WHOLE buffer back out to disk, even though only record A has been changed.

Record B now needs to be updated. The traditional language reads 16 blocks of data from disk again if needed, updates record B, then rewrites the WHOLE buffer back out to disk.

INTOUCH packetizes the requested I/O updates. The packetizing of the I/Os is internal to INTOUCH.

2.2.8 Benchmark

A program was written to benchmark the number of direct I/Os each programming language uses when updating 1300 data records in a sequential file. The results of running the benchmark program are listed below. This test was run on an OpenVMS 3100.

Neither FORTRAN nor C has language syntax for updating records in a sequential file, so those languages were left out of this benchmark.

Table 2-4 Updating Data Records Benchmark Summary
Language   Direct I/Os   CPU Time (s)   Elapsed Time (s)
BASIC            1,429           9.12              60.00
COBOL            1,428           8.50              59.00
INTOUCH             20           6.46               8.00

Because INTOUCH packetizes the I/O requests, the number of direct I/Os is dramatically reduced. INTOUCH performed only 20 direct I/Os as opposed to the other languages performing OVER 1400 direct I/Os. CPU time and elapsed time are also significantly lower.

If you use a 3GL, you can reduce the I/O overhead by using the Digital-provided RMS options.

Chapter 3
REDUCING FILE I/O BOTTLENECKS

3.1 Reducing File I/O Activity

There are steps that can be taken to reduce the I/O overhead. The steps are:

determine which files are the hot files (those with high I/O counts)
take actions that reduce the I/O bottlenecks caused by the hot files

By analyzing file I/O operations, hot files (those with high I/O counts) are identified. Hot files consume valuable I/O resources. Once identified, hot files can be moved to your fastest disk devices. Files with high read/write ratios are excellent candidates for local and global data buffering, or can be moved to a RAM disk.

Two major actions can be taken to reduce the I/O bottlenecks caused by files with high I/O counts:

speed up the I/O operations
eliminate the I/O operations

3.2 Speeding Up I/O Operations

A file's I/O operations can be sped up by moving the file to a faster or less busy device, or by spreading the file across multiple spindles (as in a shadow set). Both read and write operations can be sped up using these methods.

3.3 Eliminating I/O Operations

Eliminating file I/O operations can be accomplished in a number of ways. Some of these ways include:

Table 3-1 Methods of Eliminating File I/O
Method                    Result
host-based data caching   speeds up file reads
RMS global buffering      speeds up file reads
RMS file conversion       speeds up both reads and writes
RMS local buffering       speeds up both reads and writes
disk defragmentation      speeds up both reads and writes
file defragmentation      speeds up both reads and writes

Note
Both RMS local buffering and global buffering can be requested for a specific file.

3.4 Host Based Data Caching

Host based data caching uses free memory for high-speed data caching. I/O requests to the file are intercepted by the caching system. If the I/O request is a write operation, the data is passed to the disk device. No speed up occurs. If a read I/O request is intercepted and the requested data is already in the memory data cache, the request is satisfied with a very fast memory move. No actual I/O to the disk occurs. Host based data caching systems are available from a number of commercial software vendors.

3.5 RMS Buffering

RMS moves data from the disk into memory buffers, and from the buffers into the application program. Whenever the requested data cannot be found in a data buffer, RMS must access the disk, which is much slower than getting the data from a buffer.

RMS provides two types of file data buffers. These are:

local buffers
global buffers

Local data buffers are not shared among processes. Local buffers can only be accessed by the process that they were created for. When RMS opens an indexed file, by default it creates two local data buffers.

Global data buffers are shared among processes. Global buffers can be accessed by all processes that have the file open. By default RMS does not create any global data buffers.

File I/Os can be reduced using either or both of these buffering methods. However, increased buffering requires additional system resources. To avoid running out of system resources, both SYSGEN and AUTHORIZATION (SYSUAF) parameter changes are needed.

3.5.1 RMS Local Buffering

RMS indexed files with high file I/O counts can benefit from increased local buffering. As the number of local buffers is increased, more I/O requests can be satisfied from the local buffer cache in memory, reducing the number of disk I/Os. In some cases, even write requests can be sped up using local buffering (for deferred write operations).

The number of local buffers used by RMS indexed files can be set on either a per-process or system-wide basis. In either case, the Digital provided SET RMS command is used to specify the number of local buffers.

For example, to set the number of local buffers used for indexed files for ALL users on the system to eight, the following DCL command is used:


  $ SET RMS/SYSTEM/INDEX/BUFFER=8

To set the number of local buffers used for indexed files for JUST THIS PROCESS to ten, the following DCL command is used:


  $ SET RMS/INDEX/BUFFER=10

The SET RMS command takes effect the next time a file is opened.

3.5.2 RMS Global Buffering

RMS based hot files with high read I/O percentages (75% or greater) can benefit from increased global buffering. As the number of global buffers is increased, more read I/O requests can be satisfied from the global buffer cache. Write requests are written directly to the disk and are not sped up by global buffering.

To specify the number of global buffers to be used on a file, the file must be closed. To set the number of global buffers on file MYFILE.DAT to thirty, the following DCL command is used:


  $ SET FILE myfile.dat/GLOBAL=30

After the global buffers are set up on a cluster, the global buffers are created the first time the file is accessed cluster-wide.

3.5.3 An RMS Global Buffering Example

Global buffering uses address space and may use physical pages from the free list. For example, suppose a file has a bucket size of 2 blocks and you set up 30 global buffers. When the file is first accessed, 60 pages of address space (2 blocks per bucket x 30 global buffers) are allocated to the user's process. The number of physical pages allocated for the first accessor of the file can range from 0 to 60, depending on what the user is doing. The second accessor does not use any additional physical pages, because global buffers are shared among processes, but it does have 60 pages of address space allocated to its process. So each additional user accessing the file is allocated another 60 pages of address space, while no additional physical memory pages are used; those are shared.

3.5.4 Monitoring RMS Cache Hits

VMS version 5.0 and higher provides a utility for monitoring RMS buffer caching activity.

3.5.5 Statistics Option

To perform RMS monitoring, the file to be monitored must first have the statistics option set. The statistics option takes up a small amount of space in the file header. However, there is no overhead in collecting statistics because VMS always collects this data. The statistics option just allows the user to display the data.

In order to SET the statistics option on a file, the file must be closed. To set statistics on the file MYFILE.DAT, the following DCL command is used:


  $ SET FILE myfile.dat/STATISTICS

After the statistics option has been set on the file, the following MONITOR command is used:


  $ MONITOR RMS/FILE=myfile.dat/ITEM=CAC

The Digital provided MONITOR RMS utility provides both LOCAL and GLOBAL buffer caching information. The higher the cache hit percent shown in the display, the better the I/O performance of the file.


                      OpenVMS/VMS Monitor Utility
                        RMS CACHE STATISTICS
                           on node TTI
                         1-DEC-1989 21:52:11
(Index)  SALES_MASTER.DAT;1
Active Streams:   2           CUR        AVE        MIN        MAX

  Local Cache Hit Percent    37.00      36.65       0.00      40.00
  Local Cache Attempt Rate   51.16       5.53       0.00      51.16
  Global Cache Hit Percent   57.00      57.02       0.00     100.00
  Global Cache Attempt Rate  31.89       3.50       0.00      31.89
  Global Buf Read I/O Rate   13.95       1.48       0.00      13.95
  Global Buf Write I/O Rate   0.00       0.00       0.00       0.00
  Local Buf Read I/O Rate     0.00       0.02       0.00       0.33
  Local Buf Write I/O Rate    0.00       0.00       0.00       0.00

If only global buffers are set, the Local Cache Hit Percent will be zero. VMS looks in the local buffers first, then in the global buffers, and finally reads the data from disk. Data read from disk is placed in a global buffer, since the global buffers were the last place VMS checked, so the local buffers never receive data and never produce a hit.

3.5.6 SYSGEN Parameter Changes

RMS global buffering requires increased use of VMS global pages and global sections. In addition, some RMS related SYSGEN parameters must be changed. The following MINIMUM SYSGEN parameter values are recommended when global buffering is specified:

Table 3-2 RMS Global Buffering Minimum SYSGEN Parameter Values
SYSGEN Parameter   Minimum Value
GBLPAGFIL                  16384
RMS_GBLBUFQUO              16384
GBLPAGES                   50000
GBLSECTIONS                  800

Both RMS local buffering and global buffering require increased use of VMS locking, address space and synchronization resources. The following MINIMUM SYSGEN parameter values are recommended when either local buffering or global buffering is specified:

Table 3-3 RMS Local Buffering Minimum SYSGEN Parameter Values
SYSGEN Parameter   Minimum Value
IRPCOUNT                     500
LOCKIDTBL                   4000
LOCKIDTBL_MAX              16000
PQL_MENQLM                   600
RESHASHTBL                  2500
SRPCOUNT                    4500
VIRTUALPAGECNT             35000
PQL_MPGFLQUO               35000
PQL_MBYTLM                 35000

To view the number of global sections and global pages used you can enter:


        $ INSTALL:==$INSTALL/COMMAND
        $ INSTALL LIST/GLOBAL/SUMMARY

        Summary of Local Memory Global Sections

    272 Global Sections Used,  21964/13036 Global Pages Used/Unused

The SYSGEN parameter GBLSECTIONS sets the maximum number of global sections, and GBLPAGES sets the maximum number of global pages.

3.6 Disk Defragmentation

Disk defragmentation makes files physically contiguous. Contiguous files can be accessed with fewer I/O operations than non-contiguous files. There are two ways to defragment a disk: perform a full BACKUP and RESTORE to the target disk, or use a commercially available disk defragmentation product.

3.7 RMS File CONVERSION

As RMS based files are written to, they become internally fragmented and disorganized. Over time, both read and write operations cause extra physical I/O operations to the RMS file due to this fragmentation. The Digital provided CONVERT utility can be used to defragment and reorganize RMS files. To convert the file MYFILE.DAT, at the DCL prompt enter:


        $ CONVERT myfile.dat  myfile.new

        $ RENAME  myfile.new  myfile.dat;   ! note the trailing ";"

This two-step process safely converts and reorganizes an RMS file.

Note
If the CONVERT fails, DO NOT DO THE RENAME. THIS ENSURES THE INTEGRITY OF YOUR ORIGINAL UNCONVERTED FILE.

3.8 File Defragmentation

If you don't have the time to defragment all of your disks, you can instead defragment your most badly fragmented hot files one at a time.

VMS provides a way to defragment individual files. There are three steps to the defragmentation process:

create a .FDL for the file
customize the .FDL file as needed
convert and rename the file

3.8.1 Create a .FDL for the file

An .FDL file is a File Definition Language file. It can be used with the Digital-provided VMS CONVERT utility to defragment a file. To create an .FDL file for MYFILE.DAT, use the following DCL command:


  $ ANALYZE/RMS/FDL MYFILE.DAT

The ANALYZE command creates a file called MYFILE.FDL. The .FDL is a text file containing a description of MYFILE.DAT.

3.8.2 Customize the .FDL file

Using the text editor of your choice, edit the .FDL file and insert the text "best_try_contiguous yes" as shown:


  FILE
          best_try_contiguous     yes      <--- the inserted text
          ALLOCATION              nnn
          ORGANIZATION            xxx
            .
            .
            .

3.8.3 Converting and Renaming

The Digital-provided CONVERT utility can use an .FDL file to defragment and reorganize your files. A change made to an .FDL file takes effect only when the data file is converted with it. To convert and defragment the file MYFILE.DAT, enter the following at the DCL prompt:


        $ CONVERT/FDL=myfile.fdl myfile.dat  myfile.new

        $ RENAME  myfile.new  myfile.dat;   ! note the trailing ";"