.LM10.RM70.PS58,70.SP1.NHY
.NF.NJ.NONUMBER
#
.B20.C
THE PME PERFORMANCE MEASUREMENT
.B2.C
AND EVALUATION PACKAGE
.B4.C
by
.B.C
Bert Beander
.B.C
Technical Languages
.B.C
Digital Equipment Corporation
.B4.C
December 5, 1979
.PG
#
.B10.C
TABLE OF CONTENTS
.B3.C
1.0  Introduction  . . . . . . . . . . . . . . . . .  1
.B.C
2.0  Information Flow in the Package   . . . . . . .  2
.B.C
3.0  PMECLOCK: Clock-Driven Sampling   . . . . . . .  5
.B.C
4.0  PMETRACE: Trace-Driven Sampling   . . . . . . .  7
.B.C
5.0  PMEBUILD: Building the Bucket File  . . . . . .  9
.C
5.1  Defining the Program Structure  . . . . . . . .  9
.C
5.2  Defining Program Unit Address Ranges  . . . . . 11
.C
5.3  Defining Sampling Buckets   . . . . . . . . . . 14
.C
5.4  Specifying Options  . . . . . . . . . . . . . . 16
.C
5.5  Error Recovery  . . . . . . . . . . . . . . . . 17
.C
5.6  Examples of Use   . . . . . . . . . . . . . . . 18
.B.C
6.0  PMEHISTO: Printing the Performance Histogram  . 20
.B.C
7.0  Suggested Command File Setup  . . . . . . . . . 22
.B.F.J.NUMBER0
.PG
.HL1 INTRODUCTION
.I5
The PME performance measurement and evaluation package is a tool
for measuring where a user's program is spending its time.  To do
so, the package periodically samples the program counter of the
running program, determines what program section each such sample falls
in, and displays the resulting information in histogram form.
.B.I5
The PME package consists of four parts called PMECLOCK, PMETRACE, PMEBUILD,
and PMEHISTO.  PMECLOCK consists of subroutines which collect program counter
samples by trapping a clock interrupt every 10 milliseconds.  PMETRACE
consists of subroutines which collect program counter samples by tracing
the user program; it thus retrieves every single instruction's program
counter value, but it also takes much more time than sampling on clock
interrupts.
.B.I5
PMEBUILD is the program through which the user specifies how
his program is to be divided into sections called ^&buckets\&.  Each
bucket is defined by an address range, and contains a counter which accumulates
the number of program counter samples in that address range.  Finally,
PMEHISTO is the program which prints the accumulated data in histogram form
with one histogram bar per bucket.
.B.I5
These four parts are described in detail in Sections 3 - 6 of this manual.
But first, Section 2 describes the overall structure of the PME package
and how information is communicated between the parts.
.PG
.HL1 INFORMATION FLOW IN THE PACKAGE
.I5
The input to the PME package consists of files and terminal input which
specify the structure and address ranges of the program sections whose
performance the user wants to measure.  The output is the histogram displays
which show where the program spends its time.  This section describes where
the input comes from, where the output goes to, and what intermediate files
are needed to collect the actual program counter samples.
.B.I5
The PMEBUILD program requires as input "program definition statements"
which can come from a "program definition file", from the user's terminal,
or from a combination of the two.  These statements specify the structure
of the program to be measured, i.e., how it is divided into phases, how the
phases are divided into modules, how the modules are broken into routines,
etc.  They can also specify the actual start and end addresses of these program
units, and they can specify certain options.  Finally, program
definition statements are used to specify how the program is to be broken
into "buckets" for the data collection.  As mentioned above, a ^&bucket\&
is defined by an address range (such as the start and end addresses of a program
module) and contains a counter which records the number of program counter
samples found within that range.
.B.I5
The exact formats of the program definition statements are described in
Section 5 below.  The program definition file normally has the extension .PMD.
.B.I5
If an appropriate program definition statement so specifies, the PMEBUILD
program can also retrieve much of the information it needs from the user
program's linker map (the .MAP file) or from the traceback information in
the executable image (the .EXE file).  In particular, module names and
start and end addresses can be extracted from the .MAP file, and both
module and routine names and the corresponding address ranges can be 
extracted from the .EXE file.
.B.I5
The output of PMEBUILD is a single file called a "bucket file".  This file
contains all necessary information about how the user has divided his
program into buckets.  Since no program counter values have been tallied
in it, it is an "empty" bucket file.  Bucket files have the extension .PME
by default.
.B.I5
When clock-driven traps are used to collect program counter values, the
user's program calls the PMECLOCK subroutines, which write the collected
program counter values out to a "sample file".  Its default extension
is .PMS.  The sample file and the empty bucket file then serve as
input to PMEHISTO, which tallies the samples in the appropriate buckets
and produces a histogram showing the number of tallies in each bucket.
PMEHISTO also produces a "filled" bucket file, also with extension .PME,
which contains all information in the empty bucket file plus the counts
for each bucket.
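.B.I5
The tallying step can be pictured with a short sketch in Python, which
is not part of the PME package.  The bucket layout, the function name,
and the sample values are all hypothetical, and the real file formats
are not shown.  How PMEHISTO treats samples that fall outside every
bucket is not stated here, so the sketch merely counts them.
.B
.LITERAL
```python
from bisect import bisect_right


def tally(buckets, samples):
    """Count program counter samples per bucket.

    buckets: (name, start, end) tuples with inclusive
             addresses, sorted by start, non-overlapping.
    Returns (counts by name, samples outside all buckets).
    """
    starts = [start for _, start, _ in buckets]
    counts = {name: 0 for name, _, _ in buckets}
    missed = 0
    for pc in samples:
        # Index of the last bucket starting at or below pc.
        i = bisect_right(starts, pc) - 1
        if i >= 0 and pc <= buckets[i][2]:
            counts[buckets[i][0]] += 1
        else:
            missed += 1
    return counts, missed


# Hypothetical buckets and samples, for illustration only.
demo_buckets = [("READER", 0x200, 0x2AF),
                ("INVERT", 0x2B0, 0x3FF)]
demo_counts, demo_missed = tally(
    demo_buckets, [0x200, 0x2AF, 0x2B0, 0x400, 0x1FF])
```
.END LITERAL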
.B.I5
When tracing is used to collect program counter values, no .PMS file is
written because the volume of data collected (one value for every instruction
executed) is too large.  Instead, the PMETRACE subroutines accept as input
an empty bucket file and produce as output a filled bucket file.  The filled
bucket file can then be passed to PMEHISTO to produce the histogram.
.B.I5
The PMEHISTO program produces two pieces of output.  One is a histogram
file, with a default extension of .HIS, which can be sent to a line printer.
The other is a histogram display on the user's terminal.  This display allows
the user to examine one page at a time and to cycle through the histogram
repeatedly if he so desires.  Each bar in the histogram corresponds to one
bucket and shows the relative proportion of the total processor time spent
in that bucket.  The bucket's symbolic name, address range, and percentage
of the total count are also displayed.
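.B.I5
The arithmetic behind each histogram bar can be sketched as follows in
Python.  The bar width, the line layout, and all names are assumptions
made for illustration; the actual PMEHISTO output format may differ.
.B
.LITERAL
```python
def histogram_lines(counts, width=40):
    """Return one text line per bucket.

    counts: dict mapping bucket name to sample count.
    width:  bar length that would represent 100 percent
            (an arbitrary choice for this sketch).
    """
    total = sum(counts.values())
    lines = []
    for name, n in counts.items():
        share = n / total if total else 0.0
        bar = "*" * round(width * share)
        lines.append(f"{name:<16} {100 * share:5.1f}% {bar}")
    return lines


# Hypothetical counts, for illustration only.
demo_lines = histogram_lines({"READER": 3, "INVERT": 1})
```
.END LITERAL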
.B.I5
The overall structure of the PME package, including the flow of data between
programs and files, is summarized in the figure on the next page.
.PG.NF.NJ
  +--------------------+    +----------+    +------------+
  | Program Definition |    |  Linker  |    | Executable |
  |  File (.PMD) and   |    | Map File |    | Image File |
  |  user's terminal   |    |  (.MAP)  |    |   (.EXE)   |
  +---------+----------+    +----+-----+    +-----+------+
            |                    |                |
            |                    |                |
            V                    V                V
     **************************************************
     *                                                *
     *                PMEBUILD program                *
     *                                                *
     **************************************************
                             |
                             |
                             V
******************    +--------------+    ******************
*                *    | Empty Bucket |    *                *
*     User's     *    | File (.PME)  |    *     User's     *
*     program    *    +--------+--+--+    *     program    *
*                *             |  |       *                *
*  ***************             |  |       ***************  *
*  *  PMECLOCK   *             |  +------>*  PMETRACE   *  *
*  * subroutines *----+        |          * subroutines *  *
******************    |        |          ******************
                      |        |                 |
                      V        |                 |
               +-------------+ |                 |
               |  PC Sample  | |                 |
               | File (.PMS) | |                 |
               +------+------+ |                 |
                      |        |                 V
  +---------------+   |        |         +---------------+
  | Filled Bucket |   |  +-----+         | Filled Bucket |
  |  File (.PME)  |   |  |               |  File (.PME)  |
  +---------------+   |  |               +-------+-------+
          A           |  |                       |
          |Clock Input|  |           Trace Input |
          |           V  V                       V
     **************************************************
     *                                                *
     *                PMEHISTO program                *
     *                                                *
     **************************************************
                 |                        |
                 |                        |
                 V                        V
        +------------------+     +-----------------+
        | Histogram Print- |     | Terminal Histo- |
        | out File (.HIS)  |     |  gram Display   |
        +------------------+     +-----------------+
.B.F.J
.PG
.HL1 PMECLOCK: CLOCK-DRIVEN SAMPLING
.I5
The PMECLOCK subroutines sample the program counter by trapping clock
interrupts every 10 milliseconds.  They thus accumulate approximately
100 samples per second.  The simplest way of doing clock sampling is to link
the user program with the /DEBUG=PME qualifier:
.B.C
 $ LINK/DEBUG=PME  user-modules
.B
Here "PME" is an object module in the PME package which includes the
PMECLOCK subroutines.  When linked in this way, this module is invoked by
VMS as if it were the debugger.  It thus gets control before the user program.
This allows it to initiate clock sampling before starting the user program
and to terminate such sampling after the user program terminates.  The program
counter samples are accumulated in a file called PMEFILE.PMS.
.B.I5
For example, if the user program consists of modules A, B, and C, where A is
the main program, a clock sample file is built by these two commands:
.B.C
 $ LINK/DEBUG=PME  A,B,C
.C
 $ RUN  A              #
.B
Note that no editing or recompilation of any source modules is needed to
use PMECLOCK in this case.
.B.I5
However, if more flexibility is needed in initiating or terminating the
collection of program counter samples, or if a different name is desired
for the output file, the PMECLOCK subroutines must be explicitly called
from the user program.  To initiate sampling, the user program calls this
entry point:
.b.c
CALL PME__INIT
.b
This creates and initializes a sample file whose default name is PMEFILE.PMS. 
It also requests a clock interrupt in 10 milliseconds and sets up a handler
for it.  When the interrupt occurs, the handler retrieves the interrupted
program counter value, saves it for the output file, and requests another
clock interrupt 10 milliseconds later, when the whole cycle repeats
itself.  The file buffer holds 128 samples, so a file write occurs
approximately once every 1.3 seconds.
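.B.I5
The figures quoted above follow directly from the sampling interval
and the buffer size.  The short Python calculation below (not part of
the PME package; the names are hypothetical) reproduces them and
estimates how many samples a run of a given length would yield.
.B
.LITERAL
```python
INTERVAL_MS = 10       # clock interrupt period, from the text
BUFFER_SAMPLES = 128   # samples per file buffer, from the text

samples_per_second = 1000 / INTERVAL_MS
seconds_per_write = BUFFER_SAMPLES * INTERVAL_MS / 1000


def expected_samples(run_seconds):
    """Approximate sample count for a run of this length."""
    return int(run_seconds * samples_per_second)
```
.END LITERAL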
.B.I5
To terminate sampling, the user program calls this entry point:
.b.c
CALL PME__EXIT
.b
This stops the clock interrupts, writes the last buffer to the sample
file, and closes that file.  No more program counter samples are collected
thereafter.
.B.I5
If a name other than the default PMEFILE.PMS is desired for the output
file, this call can be used:
.b.c
CALL PME__SNAME (^&filename\&)
.b
where ^&filename\& is a Fortran character string containing the desired
file name.  If no extension is included, it defaults to .PMS.  This
string must be passed by descriptor (see Appendix C of the VAX11/780
Architecture Handbook); Fortran does this automatically, but other
languages may not.  The string may contain trailing blanks but must otherwise
be a valid VAX/VMS filename.
If called, PME__SNAME must be called ^&before\& PME__INIT.
.B.I5
PMECLOCK also has two dummy entry points called PME__INAME and
PME__ONAME; they both return immediately and do nothing.  They are
included for compatibility with PMETRACE so that a user program set up
to call PMETRACE can be changed to call PMECLOCK by simply relinking
it; no source changes or recompilations are needed.
.B.I5
After being compiled to call PME__INIT and PME__EXIT (and possibly
PME__SNAME), the user's program must be linked to include object module
PMECLOCK.  The /DEBUG=PME qualifier should ^&not\& be used in this case.
.PG
.HL1 PMETRACE: TRACE-DRIVEN SAMPLING
.I5
The PMETRACE subroutines sample the program counter by tracing the user's
program, thus retrieving every single instruction's program counter value.
To initiate sampling, the user program calls this entry point:
.b.c
CALL PME__INIT
.b
This opens, reads in, and closes an empty bucket file (whose default name
is PMEFILE.PME), and it starts tracing the user's program to collect
program counter samples.  These samples are directly tallied in the proper
buckets as defined by the input file.  Tracing is a slow process which
increases the program's execution time about 300-fold, but it collects
a large number of samples.
.B.I5
To terminate sampling, the user program calls this entry point:
.b.c
CALL PME__EXIT
.b
This stops the tracing, creates a filled bucket file whose default name
is PMEFILE.PME, writes the accumulated bucket information to that file,
and closes the file.  This file can then be used as input to PMEHISTO.  No
more program counter samples are collected after the PME__EXIT call.
.B.I5
If a name other than the default PMEFILE.PME is desired for the
input bucket file, this entry point may be called:
.b.c
CALL PME__INAME (^&filename\&)
.b
Similarly, if a different name is desired for the output bucket file,
this entry point may be called:
.b.c
CALL PME__ONAME (^&filename\&)
.b
In either case, ^&filename\& is a Fortran character string containing
the desired file name.  If no extension is included, it defaults to .PME.
This string must be passed by descriptor (see Appendix C of the VAX11/780
Architecture Handbook);  Fortran does this automatically but other languages
may not.  The string may contain trailing blanks but must otherwise be
a valid VAX/VMS file name.
.B.I5
If called, PME__INAME must be called before PME__INIT and PME__ONAME
must be called before PME__EXIT.
.B.I5
PMETRACE also has a dummy entry point called PME__SNAME.  This entry
point returns immediately and does nothing.  It is included for compatibility
with PMECLOCK so that a program set up to call PMECLOCK can be changed
to use PMETRACE by simply relinking it.
.B.I5
After being compiled to call PME__INIT and PME__EXIT (and possibly
PME__INAME and PME__ONAME), the user's program must be linked to
include object module PMETRACE.
.PG
.HL1 PMEBUILD: BUILDING THE BUCKET FILE
.I5
The PMEBUILD program creates a new bucket file based on the program
and bucket definitions entered by the user.  PMEBUILD is invoked as
follows:
.b.c
$ RUN PMEBUILD
.B.I5
PMEBUILD first asks the user if he wishes to specify the input (.PMD) and output (.PME)
file names.  If the answer is "N" (for No), these default names are used:
.b.tp2.nf.nj.i8
Input Program Definition File:  PMEFILE.PMD
.i8
Output Empty Bucket File:       PMEFILE.PME
.b.f.j
If the answer is "Y" (for Yes), PMEBUILD asks the user to enter each of
the two file names.  A blank response to either query causes the corresponding
default name to be used.  If SYS$INPUT is specified as the input file, all
input is taken from the user's keyboard.  Also, if the input file cannot
be opened, all input is taken from the keyboard.
.B.I5
Once the input file is opened, statements are read from this file until the
file ends or an END statement is encountered.  If no END statement is
found, additional input is solicited from the user's terminal until an END
statement is entered.
.B.I5
In the sections below, the actual input statements are described in the
order they would normally be entered through the program definition file
and then the user's keyboard.
.HL2 ^&Defining the Program Structure\&
.I5
The first thing PMEBUILD needs to know is how the user's program is
divided into subunits such as phases, modules, and routines.  The possible
^&program\& ^&unit\& ^&kinds\& are declared as follows:
.b.nf.nj.i11
DEFINE UNITS: ^&kind1\&, ^&kind2\&, ..., ^&kindn\&
.b.f.j
This says that the user's program is to be broken into units called
^&kind1\&, ^&kind2\&, ..., ^&kindn\& where ^&kind2\& is a subunit of ^&kind1\&,
^&kind3\& is a subunit of ^&kind2\&, and so on.  For example,
.b.nf.nj.i12
DEFINE UNITS: PROGRAM, PHASE, MODULE
.b.f.j
declares that the largest unit, called PROGRAM, is divided into smaller
units called PHASEs, and each PHASE is divided into still smaller units
called MODULEs.  These are names of the user's choosing, and up to ten
such subdivisions may be declared.
.B.I5
In addition to declaring program unit kinds, DEFINE UNITS declares that
all following program unit specifications are to be read in "define units
mode," meaning that they declare the structure of the user's program.  This
is best illustrated by example:
.B.tp11.NF.NJ
.I12
DEFINE UNITS: PROGRAM, PHASE, MODULE
.I12
PROGRAM CRUNCH
.I16
PHASE READ__DATA
.I20
MODULE INITIALIZE
.I20
MODULE READER
.I16
PHASE PROCESS__DATA
.I20
MODULE INVERT
.I20
MODULE MINIMIZE
.I20
MODULE COMPUTE
.I16
PHASE PRINT__DATA
.I20
MODULE PRINTER
.B.F.J
Here three ^&kinds\& of program units called PROGRAM, PHASE, and MODULE
are declared.  A program CRUNCH is then declared and its
structure is defined:  it consists of three PHASEs, each of which consists
of one to three MODULEs.  This structure can later be referenced to define
sampling buckets.
.B.I5
As the example illustrates, a program unit specification has, in its
simplest form, the following format:
.b.c
^&kind\& ^&unitname\&
.b
where ^&kind\& is a previously declared program unit kind and ^&unitname\&
is the name of the new program unit. In its most general form, a program
unit specification looks like this:
.B.C
^&kind\& ^&unitname\&, ^&start\& - ^&end\&, ^&step\&, ^&keyword\&
.b
Here ^&start\& and ^&end\& specify the program unit's start and end
addresses in hexadecimal, and ^&step\& specifies that this program
unit should be broken into equal-sized buckets, each ^&step\& bytes long,
if it is included in the output bucket file. ^&step\& is specified in
hexadecimal.  ^&keyword\& is only meaningful if ^&kind\& is MODULE;
the keyword LINES in this position specifies that high level language
line numbers (e.g., Fortran line numbers) should be extracted from the
executable image for this module (see Section 5.2 below). The keyword
NOLINES specifies that line numbers should not be retrieved.  Both the
LINES and NOLINES keywords can be overridden by the DEFINE OPTIONS
command; see Section 5.4.  If no keyword is specified, NOLINES is the
default.
.B.I5
The ^&start\& - ^&end\&, ^&step\&, and ^&keyword\& fields are
all optional.  This specification is thus legal:
.b.c
MODULE MUMBLE,,10
.b
This does not specify module MUMBLE's address range or a keyword, but
it does specify its step size to be 10 hexadecimal, or 16 decimal, bytes.
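.B.I5
To make the field layout concrete, the following Python sketch splits
a program unit specification into its parts.  It is an illustration
only, not PMEBUILD's parser: it assumes the comma-separated form
shown above, and the function name and result layout are hypothetical.
.B
.LITERAL
```python
def parse_unit_spec(line):
    """Split 'kind name, start - end, step, keyword'.

    Omitted fields come back as None, except the keyword,
    which falls back to NOLINES as the text describes.
    Addresses and step are read as hexadecimal.
    """
    head, *fields = [f.strip() for f in line.split(",")]
    kind, unitname = head.split()
    fields += [""] * (3 - len(fields))
    addr_range, step, keyword = fields[:3]
    start = end = None
    if addr_range:
        lo, hi = addr_range.split("-")
        start, end = int(lo, 16), int(hi, 16)
    return {
        "kind": kind.upper(),
        "name": unitname.upper(),
        "start": start,
        "end": end,
        "step": int(step, 16) if step else None,
        "keyword": keyword.upper() or "NOLINES",
    }
```
.END LITERAL
.B
For instance, the specification "MODULE MUMBLE,,10" yields no address
range, a step of 16 decimal, and the default keyword.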
.B.I5
Program unit start and end addresses are not normally specified directly
in program unit specification statements since PMEBUILD can more conveniently
extract this information from the user program's linker map or the traceback
information in its executable image.  Even much program structure information
(such as what modules the program contains) can be determined from these
sources.  How this is done is explained in Section 5.2 below.
.B.I5
Program unit and kind names consist of 1 - 16 characters from the set A - Z,
0 - 9, $, and __, with lower case letters being treated as upper case.
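.B.I5
The naming rule can be expressed as a single pattern.  The Python
sketch below is illustrative only; the function name is hypothetical
and PMEBUILD's own checking may differ in detail.
.B
.LITERAL
```python
import re

# 1 to 16 characters from A-Z, 0-9, $, and underscore.
NAME_RE = re.compile(r"[A-Z0-9$_]{1,16}")


def normalize_name(text):
    """Fold to upper case and check the naming rule."""
    name = text.upper()
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"invalid name: {text!r}")
    return name
```
.END LITERAL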
.B.I5
Three miscellaneous points should be noted about the DEFINE UNITS statement.
First, no program unit kind may be called "DEFINE" or "END" or have the
same name as a previously declared kind.  Second, the abbreviated statement
.b.c
DEFINE UNITS
.b
may be used to switch to "define units mode" from another mode.  This causes
subsequent program unit specifications to be treated as declarations of
additional program structure.  It requires that program unit kinds have
been declared with a previous DEFINE UNITS statement.  And third, the
colon and commas in the DEFINE UNITS statement are optional.  Thus
.b.c
DEFINE UNITS PROGRAM PHASE MODULE
.b
is valid.  It may be less readable, but it is easier to type.
.HL2 ^&Defining Program Unit Address Ranges\&
.I5
Once some program structure has been declared, the statement
.b.c
DEFINE ADDRESSES
.b
can be used to enter "define addresses mode".  This means that subsequent
program unit specifications are allowed to define address ranges, step sizes,
and keywords, but only for previously declared program units.  Thus
.b.tp2.c
DEFINE ADDRESSES     #
.c
MODULE READER, 200-2AF
.b
declares module READER to have a start address of 200 (hexadecimal) and an
end address of 2AF (also hexadecimal).  If module READER is not a previously
declared program unit, this statement is illegal and results in an error
message.
.B.I5
A more useful form of the DEFINE ADDRESSES statement is this:
.b.c
DEFINE ADDRESSES: MAP "CRUNCH.MAP"
.b
This says that all MODULE address definitions (start and end addresses)
are to be retrieved by scanning the specified linker map (file
CRUNCH.MAP in this example).  The addresses in program section (PSECT)
$CODE are normally used since this is the proper default for Fortran
programs.  If a different PSECT ("LIB$CODE", for example) should be
used, this can be specified as follows:
.b.c
DEFINE ADDRESSES: MAP "CRUNCH.MAP" PSECT LIB$CODE
.b.f.j
Use of "DEFINE ADDRESSES:#MAP" ^&requires\& that a program unit kind called
MODULE exists.  If it does not, the statement is in error.
.B.I5
If modules are found in the linker map which have not been declared in
define units mode, those modules are added to the end of PMEBUILD's program
unit list anyway.  Thus the three statements
.b.tp3.nf.nj.i13
DEFINE UNITS: PROGRAM, MODULE
.I13
PROGRAM CHESS
.I13
DEFINE ADDRESSES: MAP "CHESS.MAP"
.b.f.j
are enough to declare program CHESS to consist of all modules found
in the linker map CHESS.MAP.  In addition, PMEBUILD now knows the address
ranges of all those modules.  Hence enough information is available to
immediately specify sampling buckets.
.B.I5
A third form of the DEFINE ADDRESSES statement extracts module, routine,
and line number definitions from the traceback information in the user's
executable image:
.b.nf.nj.i13
DEFINE ADDRESSES: EXE "CRUNCH.EXE"
.b.f.j
For this to work, a program unit kind called MODULE must exist, i.e., must
have been declared with a DEFINE UNITS statement.  Routine information is
extracted only if program unit kind ROUTINE exists, where ROUTINE is a subunit
of MODULE.  Similarly, line numbers are extracted only if kind LINE exists,
where LINE is a subunit of both MODULE and ROUTINE.  Kind ROUTINE does not have
to exist for line numbers to be retrieved.  In Fortran, for example, where there
is exactly one routine per object module, it is enough that kinds MODULE and
LINE exist.
.B.I5
DEFINE ADDRESSES:#EXE also requires that the executable image has been linked,
and its modules compiled, with the /TRACEBACK or /DEBUG option.  However, since
/TRACEBACK is a default option for the linker and most languages, it usually
need not be explicitly specified.
.B.I5
If new modules are found in the .EXE file, PMEBUILD adds them and their
contained routines to the end of the program unit list.  Thus the three
statements
.b.tp3.nf.nj.c
DEFINE UNITS: PROGRAM, MODULE, ROUTINE
.c
PROGRAM CHECKERS                     #
.c
DEFINE ADDRESSES: EXE "CHECKERS.EXE" #
.b.f.j
are enough to declare program CHECKERS to consist of all modules and
routines found in the traceback information in CHECKERS.EXE.
Enough information is then available to create sampling buckets.
Similarly, if modules have been explicitly declared in define units
mode but routines and line numbers have not, the routines and line numbers
are inserted after the proper modules in PMEBUILD's program unit list.
.B.I5
Large high-level language programs can easily have enough lines to overflow
PMEBUILD's internal tables.  For this reason, line numbers are not extracted
from the executable image by default.  They are extracted for a module X only
under two circumstances:#(1) the DEFINE OPTIONS:#LINES command has been used,
or (2) the DEFINE OPTIONS:#SOMELINES option has been set and the LINES
keyword (Section 5.1) has been specified for module X.  (Since SOMELINES is
a default option, it is enough to specify the LINES keyword in the module
declaration.)##The user can thus specify that all line numbers be extracted
with the DEFINE OPTIONS:#LINES command, or he can specify which lines he wants
on a module-by-module basis.  Line numbers have numeric names without leading
zeroes.  Thus "LINE 10" names the line given as 0010 in the left margin of a
Fortran listing.
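.B.I5
The interaction between the line number options and the per-module
keyword reduces to a small decision rule, sketched here in Python.
The function and parameter names are hypothetical; the sketch only
restates the rule given above.
.B
.LITERAL
```python
def extract_lines_for(module_keyword, option="SOMELINES"):
    """Decide whether line numbers are extracted.

    module_keyword: "LINES" or "NOLINES" for the module.
    option: the DEFINE OPTIONS setting in effect.
    """
    if option == "LINES":
        return True          # all modules
    if option == "NOLINES":
        return False         # no modules
    if option == "SOMELINES":
        return module_keyword == "LINES"
    raise ValueError(f"unknown option: {option!r}")
```
.END LITERAL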
.B.I5
Like DEFINE ADDRESSES:#MAP, DEFINE ADDRESSES:#EXE normally extracts only those
address ranges which are in program section $CODE, since this is the proper
default for Fortran programs.  If a different PSECT should be used, it is
specified as follows:
.B.C
DEFINE ADDRESSES:#EXE "CHECKERS.EXE" PSECT LIB$CODE
.B
Here LIB$CODE is the desired program section.
.B.I5
For typing convenience, the keyword ADDRESSES can be shortened to ADDR and
the colon thereafter omitted in all DEFINE ADDRESSES statements.
.HL2 ^&Defining Sampling Buckets\&
.I5
The actual creation of sampling buckets is done in "define sampling
mode" which is entered with the
.b.c
DEFINE SAMPLING
.B
statement.  Subsequent program unit specifications then cause sampling
buckets to be defined.  These specifications are of two kinds.  First, the
statement
.b.i14
^&kind\& ^&unitname\&, ^&start\& - ^&end\&, ^&step\&
.b
causes the specified program unit (which must be previously declared)
to be broken into equal-sized buckets covering ^&step\& bytes each.
The optional ^&start\& - ^&end\& field specifies the start and end of the
desired address range ^&relative\& to the start of the program unit.  Thus
if routine X covers addresses 2000 - 23AB (hexadecimal), "ROUTINE X, 100 - 1FF,
10" causes 16 buckets covering the address range 2100 - 21FF to be generated.
This address range must be wholly contained in the specified program unit
(routine X in this case).  If ^&start\& - ^&end\& is omitted, the whole program
unit is covered.  The ^&step\& field is also optional.  If a step size has
already been specified in define units or define addresses mode, it need not
be specified again.  If no step size is defined at all, the whole program unit
constitutes a single bucket.
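.B.I5
The way a step size divides an address range can be sketched in
Python as follows.  This is an illustration, not PMEBUILD's code: the
names are hypothetical, and shortening the last bucket when the step
does not divide the range evenly is an assumption, since that case
is not described here.
.B
.LITERAL
```python
def step_buckets(unit_start, unit_end, step=None,
                 rel_start=None, rel_end=None):
    """Return (start, end) buckets, addresses inclusive.

    rel_start and rel_end are relative to unit_start,
    as in the start - end field described above.
    """
    lo = unit_start + (rel_start or 0)
    hi = unit_end if rel_end is None else unit_start + rel_end
    if not (unit_start <= lo <= hi <= unit_end):
        raise ValueError("range not inside the program unit")
    if step is None:
        return [(lo, hi)]    # one bucket for the whole range
    return [(a, min(a + step - 1, hi))
            for a in range(lo, hi + 1, step)]


# Routine X from the text: 2000 - 23AB, "100 - 1FF, 10".
demo = step_buckets(0x2000, 0x23AB, 0x10, 0x100, 0x1FF)
```
.END LITERAL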
.B.I5
The following are examples of legal specification statements:
.b.tp4.c
MODULE INVERT,,40          #
.c
MODULE COMPUTE             #
.c
ROUTINE GETLINE, 100-1FF, 20
.c
ROUTINE THINK, 40 - 225    #
.b
Note that ^&start\&, ^&end\&, and ^&step\& are all specified in hexadecimal.
.B.I5
The second kind of sampling specification is of this form:
.b.c
^&kind1\& ^&unitname\& BY ^&kind2\&
.b
Here ^&kind1\& and ^&kind2\& are program unit kinds where ^&kind2\& is a
subunit of ^&kind1\&.  This says that each subunit of kind ^&kind2\&
within the program unit defined by ^&kind1\& ^&unitname\& constitutes a
separate sampling bucket.  For example,
.b.i16
PHASE PROCESS__DATA BY MODULE
.b
causes each module in phase PROCESS__DATA to constitute a sampling bucket.
However, if one
of those modules has a defined step size, it will in turn be broken into
enough buckets of that size to cover the module.
.B.I5
Consider what happens in this simple example:
.b.NF.NJ.tp10
.I12
DEFINE UNITS: PROGRAM, MODULE
.I12
PROGRAM FIDO
.I16
MODULE MAIN
.I16
MODULE SUB1
.I16
MODULE SUB2
.I12
DEFINE ADDRESSES: EXE "FIDO.EXE"
.I12
MODULE SUB1,,20
.I12
DEFINE SAMPLING
.I12
PROGRAM FIDO BY MODULE
.I12
END
.B.F.J
The program structure is defined, all start and end addresses are extracted
from the executable image, and the step size is set to be 20 hexadecimal
(32 decimal) bytes for module SUB1.  Program FIDO is then broken into
sampling buckets by module.  Module MAIN becomes one sampling bucket, module
SUB1 is divided into as many 32-byte buckets as are needed to cover its
address range, and module SUB2 becomes one bucket.  Program counter
values will be collected and eventually displayed in terms of these buckets.
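.B.I5
The resulting bucket list can be pictured with a Python sketch.  The
module addresses below are invented for illustration (the real ones
would come from FIDO.EXE), and the function name is hypothetical.
.B
.LITERAL
```python
def buckets_by_module(modules):
    """Expand (name, start, end, step) entries into buckets.

    A module without a step becomes one bucket; a module
    with a step is cut into pieces of that many bytes.
    """
    out = []
    for name, start, end, step in modules:
        if step is None:
            out.append((name, start, end))
            continue
        for a in range(start, end + 1, step):
            out.append((name, a, min(a + step - 1, end)))
    return out


# Invented address ranges; only SUB1 has a step (20 hex).
fido = [("MAIN", 0x200, 0x2FF, None),
        ("SUB1", 0x300, 0x37F, 0x20),
        ("SUB2", 0x380, 0x3FF, None)]
fido_buckets = buckets_by_module(fido)
```
.END LITERAL
.B
With these invented ranges, SUB1 spans 80 hexadecimal bytes and is
therefore cut into four 32-byte buckets, giving six buckets in all.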
.B.I5
Each statement entered in define sampling mode creates what is called
a "bucket group".  The following restrictions apply to bucket groups:
(1)#At most ten bucket groups may be specified; (2)#No two bucket groups
may have overlapping address ranges; (3)#No two buckets within a bucket
group may have overlapping address ranges; and (4)#No bucket or bucket
group may include both positive and negative addresses (i.e., cover both
system space and user space).
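.B.I5
These four restrictions can be checked mechanically.  The Python
sketch below shows one way to do so; it is not PMEBUILD's
implementation, and the data layout, names, and message texts are
hypothetical.  It treats addresses from 80000000 hexadecimal upward
as negative, that is, as system space.
.B
.LITERAL
```python
MAX_GROUPS = 10
# Addresses at or above this value are negative when viewed
# as signed 32-bit numbers (system space on the VAX).
SYSTEM_BASE = 0x80000000


def check_groups(groups):
    """Return a list of rule violations (empty if none).

    groups: non-empty lists of (start, end) buckets with
            inclusive addresses (an assumed layout).
    """
    errors = []
    if len(groups) > MAX_GROUPS:
        errors.append("more than ten bucket groups")
    spans = []
    for g, buckets in enumerate(groups):
        ordered = sorted(buckets)
        # After sorting by start, any overlap shows up
        # between some pair of neighbours.
        for (_, e1), (s2, _) in zip(ordered, ordered[1:]):
            if s2 <= e1:
                errors.append(f"group {g}: buckets overlap")
                break
        lo = ordered[0][0]
        hi = max(end for _, end in ordered)
        if lo < SYSTEM_BASE <= hi:
            errors.append(f"group {g}: spans both spaces")
        spans.append((lo, hi, g))
    spans.sort()
    for (_, e1, g1), (s2, _, g2) in zip(spans, spans[1:]):
        if s2 <= e1:
            errors.append(f"groups {g1} and {g2} overlap")
    return errors
```
.END LITERAL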
.B.I5
After the last sampling specification, an
.b.c
END
.b
statement must be entered.  This causes PMEBUILD to output the desired bucket
file and then stop.
.HL2 ^&Specifying Options\&
.I5
Certain options can be communicated to PMEBUILD through the DEFINE OPTIONS
statement:
.b.nf.nj.i11
DEFINE OPTIONS: opt1, opt2, ..., optn
.b.f.j
where opt1, opt2, ..., optn are option keywords.  The following option
keywords are allowed:
.b.lm+20.i-15
LIST#########-#This causes all program definition statements to be listed on
the user's terminal as they are read from the .PMD file or keyboard.
.b.i-15
NOLIST#######-#This suppresses the listing of program definition statements.
.b.i-15
PRINT########-#This causes information about each bucket group and the buckets
therein to be printed out each time a bucket group is created in define
sampling mode.
.b.i-15
NOPRINT######-#This suppresses the bucket group printout.
.b.i-15
RELADDR######-#This sets a flag in the bucket file which causes PMEHISTO to
print program-unit-relative addresses in the histogram printout.
.b.i-15
ABSADDR######-#This sets the same bucket file flag so that PMEHISTO prints
absolute addresses in the histogram printout.
.b.i-15
CLEARCOUNTS##-#This sets a flag in the bucket file which causes all bucket
counts to be cleared to zeroes before a new set of program counter values is
tallied in the file.  The effect is that a filled bucket file with this
flag set can serve as input to PMETRACE and PMEHISTO where an empty bucket
file is expected; it is not necessary to rerun PMEBUILD if a filled bucket
file already has the desired bucket definitions.
.b.i-15
ACCUMCOUNTS##-#This sets the same flag so that the bucket counts are not
cleared before new counts are tallied.  The effect is that repeated sampling
runs through PMETRACE (for trace data) or PMEHISTO (for clock-interrupt
data) with the same bucket file cause the sampling counts to be accumulated
through ascending versions of the bucket file.  This is useful for a user
who wants to collect representative data from a large number of sampling
runs.
.b.i-15
LINES########-#This causes subsequent DEFINE ADDRESSES: EXE commands to extract
line numbers from the executable image for ^&all\& modules in the program.
.b.i-15
SOMELINES####-#This causes subsequent DEFINE ADDRESSES: EXE commands to extract
line numbers ^&only\& for those modules which were entered with the LINES
keyword in define units or define addresses mode (see Section 5.1).
.b.i-15
NOLINES######-#This prevents subsequent DEFINE ADDRESSES:#EXE commands from
extracting any line numbers from the executable image.
.LM-20
.B.I5
If no options are specified through DEFINE OPTIONS statements, default options
corresponding to these statements are in effect:
.b.tp2.nf.nj.c
DEFINE OPTIONS: NOLIST, NOPRINT, RELADDR
.c
DEFINE OPTIONS: CLEARCOUNTS, SOMELINES #
.f.j.b
For typing convenience, the colon and commas are optional in the DEFINE
OPTIONS statement.
.HL2 ^&Error Recovery\&
.I5
When erroneous input is entered, PMEBUILD prints an error message and, in
general, discards the offending statement so that it has no effect.  However,
when defining sampling buckets, it is possible to unintentionally overflow
the bucket file or internal tables in PMEBUILD.  Furthermore, this may not
be detected until the END statement is entered.  To recover from this
situation, PMEBUILD solicits additional input from the user, who can then
enter this command:
.b.c
DEFINE CLEAR
.B
This clears PMEBUILD's internal bucket and bucket group tables, after which
the user can redo all his bucket definitions from scratch.  All program
structure definitions remain intact, however.
.HL2 ^&Examples of Use\&
.I5
Here we give two examples of using PMEBUILD, one simple and one more
complicated.  In the simple case, the user has a program whose performance
he wants to measure by module, but he does not set up a program definition
file.  The following  run stream sets up the bucket file:
.b.lm+5.tp5.nj.nf
$#LINK X,PMECLOCK
$#RUN PMEBUILD
Do you wish to specify file names?#(Yes or No):#N
Enter DEFINE statement:#DEFINE UNITS:#PROGRAM, MODULE
Enter program unit spec:#PROGRAM X
Enter program unit spec:#DEFINE ADDRESSES:#EXE "X.EXE"
Enter address spec:#DEFINE SAMPLING
.TP6
Enter sampling spec:#PROGRAM X BY MODULE
Enter sampling spec:#END
.b
***Bucket File created***
.b
$
.b.lm-5.f.j
Prompts and other output from PMEBUILD are shown in mixed case and the
user's input is shown in upper case.  The six input lines (from "DEFINE
UNITS" to "END") represent the minimum input to PMEBUILD, but are enough
to give a complete breakdown of the whole program by module.
.B.I5
In the next example, a more complicated program structure is declared with
the following program definition file:
.b.lm+10.nf.nj.tp9
DEFINE UNITS:#PROGRAM, PHASE, MODULE
PROGRAM GRAVEL
.I8
MODULE GRAVEL
.I4
PHASE ONE
.I8
MODULE FEED
.I4
PHASE TWO
.I8
MODULE CRUNCH
.I8
MODULE CRUSH
.I8
MODULE GRIND
.I4
PHASE THREE
.I8
MODULE SIEVE
.I8
MODULE SHOVEL
PROGRAM SYSTEM$SPACE, 80000000-FFFFFFFF
.TP3
DEFINE ADDRESSES:#EXE "GRAVEL.EXE"
MODULE CRUSH,,100
DEFINE SAMPLING
.B.LM-10.F.J
Assuming that this file has the name PMEFILE.PMD, the following run
stream builds a bucket file:
.b.lm+5.tp9.nf.nj
$#LINK GRAVEL,FEED,CRUNCH,CRUSH, -
       GRIND,SIEVE,SHOVEL,PMECLOCK
$#RUN PMEBUILD
Do you wish to specify file names?#(Yes or No):#N
Enter sampling spec:#PHASE TWO BY MODULE
Enter sampling spec:#END
.B
***Bucket File created***
.b
$
.b.f.j.lm-5
In this case, everything except the actual sampling specification has
been set up in the program definition file.  Hence only two lines ("PHASE
TWO BY MODULE" and "END") need to be entered when PMEBUILD is run.  A program
definition file takes more time to set up initially, but it saves time
and trouble when repeated PMEBUILD runs, specifying differing bucket
configurations, are expected.
.PG
.HL1 PMEHISTO: PRINTING THE PERFORMANCE HISTOGRAM
.I5
PMEHISTO is the reduction program which displays what fraction of the user
program's total time is spent in each bucket.  It is invoked with this
command:
.b.c
$ RUN PMEHISTO
.B.I5
PMEHISTO first asks the user whether clock-generated or trace-generated data
is to be processed.  If the answer is "C" (for Clock), a sample file
(.PMS) and an empty bucket file (.PME) serve as input, and a filled bucket
file (.PME) and the histogram are the output.  If the answer is "T" (for
Trace), a filled bucket file (.PME) is the input and the histogram is the
output.
.B.I5
In either case, PMEHISTO also asks the user whether he wants to specify
the names of the input and output files involved.  If the answer is "N"
(for No), the following default names are used:
.b.tp5.lm+12.nf.nj.i-2
For Clock-driven sampling:
Input Sampling File:    PMEFILE.PMS
Input Bucket File:      PMEFILE.PME
Output Bucket File:     PMEFILE.PME
Output Histogram File:  PMEFILE.HIS
.B.tp3.I-2
For Trace-driven sampling:
Input Bucket File:      PMEFILE.PME
Output Histogram File:  PMEFILE.HIS
.b.lm-12.f.j
If the answer is "Y" (for Yes), PMEHISTO asks the user for each of the
desired file names.  A blank response to any such query causes the corresponding
default name (as listed above) to be used, but with one exception:#for
clock-driven sampling, if the input bucket file's name is specified but
the output bucket file's is not, the specified name is used for both files.
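.B.I5
The defaulting rule just described, including its one exception, can be
summarized in a short Python sketch.  This is an illustration only, not
PMEHISTO's actual logic: the function, the dictionary keys, and the way
responses are passed in are hypothetical.  Only the default file names and
the exception itself are taken from the text above.
.LITERAL
```python
# Default names as listed above; the dictionary keys are invented labels.
DEFAULTS = {
    "clock": {
        "input_sampling": "PMEFILE.PMS",
        "input_bucket": "PMEFILE.PME",
        "output_bucket": "PMEFILE.PME",
        "output_histogram": "PMEFILE.HIS",
    },
    "trace": {
        "input_bucket": "PMEFILE.PME",
        "output_histogram": "PMEFILE.HIS",
    },
}


def resolve_file_names(mode, responses):
    """Map the user's responses (blank or missing = default) to file names."""
    names = {}
    for key, default in DEFAULTS[mode].items():
        answer = responses.get(key, "").strip()
        names[key] = answer or default
    if mode == "clock":
        given_in = responses.get("input_bucket", "").strip()
        given_out = responses.get("output_bucket", "").strip()
        # The one exception: a named input bucket file with a blank output
        # bucket file means the same name is used for both.
        if given_in and not given_out:
            names["output_bucket"] = given_in
    return names


print(resolve_file_names("clock", {"input_bucket": "RUN1.PME"}))
```
.END LITERAL
.B.I5
In this sketch, naming only the input bucket file RUN1.PME for clock-driven
sampling yields RUN1.PME for the output bucket file as well, while the
sampling and histogram files keep their PMEFILE defaults.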
.B.I5
PMEHISTO produces the reduced output in two forms.  The first is a histogram
file  whose default name is PMEFILE.HIS.  This file is suitable for listing
on the line printer.  It shows the symbolic name of each program unit
sampled, the relative or absolute address range of each bucket, the
percentage of the total time spent in that bucket, and a histogram bar
proportional to that percentage.  The histogram is scaled so that the bucket
with the largest reference count has a bar that spans the full width of the
histogram.  Some summary statistics are also printed at the end of the
histogram.
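.B.I5
The scaling rule can be sketched in Python as follows.  This is a minimal
illustration under stated assumptions, not PMEHISTO's implementation: the
function name, the 40-column bar width, the rounding rule, and the sample
counts are all hypothetical.  The only behavior taken from the text is that
percentages are relative to the total count, while bar lengths are relative
to the largest bucket.
.LITERAL
```python
def histogram_rows(counts, width=40):
    """Return (rows, counts_per_asterisk) for a dict of bucket counts.

    Each row is (bucket name, bar of asterisks, percent of total).
    """
    total = sum(counts.values())
    largest = max(counts.values(), default=0)
    if total == 0:
        return [], 0.0
    rows = []
    for name, count in counts.items():
        percent = 100.0 * count / total
        # Scale against the largest bucket, not the total, so that the
        # largest bucket always spans the full width.
        stars = int(width * count / largest + 0.5)  # rounding rule is assumed
        rows.append((name, "*" * stars, percent))
    return rows, largest / width


# Made-up counts for three buckets.
rows, scale = histogram_rows({"FEED": 20, "CRUNCH": 80, "GRIND": 60})
for name, bar, percent in rows:
    print(f"{name:<8}|{bar:<40} {percent:5.1f}%")
print(f"Scaling: {scale:.1f} counts/asterisk")
```
.END LITERAL
.B.I5
With these made-up counts, CRUNCH holds 50 percent of the samples yet its
bar fills all 40 columns, and each asterisk stands for 2.0 counts.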
.B.I5
The second form of the reduced output is a histogram display on the user's
terminal.  This display is identical to that meant for the line printer
except that each page is much shorter--the terminal can only display 24
lines at a time.  After each page of the display, the user is asked to enter
a command.  The following commands are accepted:
.b.tp5.lm+7.nf.nj
Carriage Return#--#Display next histogram page
R#--#Restart histogram display from beginning
S#--#Skip to the summary statistics display
K#--#Kill the display and exit program
H#--#Help:#print this display command list
.b.lm-7.f.j
After completing the display, PMEHISTO automatically restarts it from the
beginning.  The user can thus cycle through the histogram as many times as
he wants.  The only way to stop the display is to enter the K command.
.B.I5
The following figure illustrates what one page of the terminal display looks
like:
.B2.NF.NJ.TP23
   Performance Measurement and Evaluation Histogram   Page 1
.B
          PROGRAM COMPILER BY PHASE
.B
             +----+----+----+----+----+----+----+----+
FRONT__END    |
    0 - C2E6 |**************************************** 22.4%
ALLOCATOR    |
    0 - 1CBD |**                                        0.8%
OUTBIN       |
    0 - 1C50 |*********                                 4.7%
OPTIMIZER    |
    0 - 46A6 |*************************                13.9%
CODE__GEN     |
    0 - 1FFF |*********                                 5.0%
 2000 - 3FFF |**********                                5.6%
 4000 - 5FFF |*************************                13.9%
 6000 - 6583 |*******                                   4.3%
FINAL        |
    0 - 227F |**********                                5.8%
             +----+----+----+----+----+----+----+----+
.B
                Scaling:      8.3 counts/asterisk
.B2.F.J
This illustration has been compressed a bit to make it fit the page.
The actual histogram is slightly wider.
.PG
.HL1 SUGGESTED COMMAND FILE SETUP
.I5
Frequent users of the PME package may find a command file useful for doing
routine setup operations.  The following command file, which assumes that
all PME components are found in the [PERFDIR] directory, is suggested as a
model:
.b.lm+5.nf.nj
$#WRITE SYS$OUTPUT "---PME Setup Run---"
$ !
$10:
$#INQUIRE ANSWER "Do you want to relink? (Y or N)"
$#IF ANSWER .EQS. "Y" THEN GOTO 20
$#IF ANSWER .EQS. "N" THEN GOTO 50
$#GOTO 10
$ !
$20:
$#INQUIRE ANSWER "Clock or Trace sampling? (C or T)"
$#IF ANSWER .EQS. "C" THEN GOTO 30
$#IF ANSWER .EQS. "T" THEN GOTO 40
$#GOTO 20
$ !
$30:
$#WRITE SYS$OUTPUT "---Clock-driven sampling---"
$#LINK/MAP  PROGNAME,...,[PERFDIR]PMECLOCK
$#GOTO 50
$ !
$40:
$#WRITE SYS$OUTPUT "---Trace-driven sampling---"
$#LINK/MAP  PROGNAME,...,[PERFDIR]PMETRACE
$ !
$50:
$#PURGE PROGNAME.*
$#COPY [PERFDIR]PMEFILE.PMD  PMEFILE.PMD
$#PMEBUILD :==#"RUN [PERFDIR]PMEBUILD"
$#PMEHISTO :==#"RUN [PERFDIR]PMEHISTO"
$ !
$#WRITE SYS$OUTPUT "---PME Setup Done---"
.lm-5.f.j
.B.I5
The command file handles the task of linking  the user's program with the
proper sampling subroutines and it sets up PMEFILE.PMD in the user's
default directory.  It also defines two new commands, namely
.b.c
$#PMEBUILD
.BR
and
.c
$#PMEHISTO
.b
which run the corresponding programs.  This is a convenient shorthand.
.B.I5
Using this command file, which copies PMEFILE.PMD to the user's directory,
and then using all the default file names has another advantage: all input
and output files have the name "PMEFILE"--only the extensions differ.  They
can thus all be deleted by this single command:
.b.c
$#DELETE PMEFILE.*;*
.b
All the garbage accumulated by repeated PME runs can be removed in a single
stroke.
