	NTSTEP  -   A TOOL FOR TUNING ALPHA NT APPLICATIONS



WHAT IT'S USEFUL FOR

--------------------

    NTSTEP is a performance tool which traces user mode execution of

    processes running under Microsoft's NT Operating System

    on Digital's Alpha platforms.  The NTSTEP options give the user a

    rich set of filters to apply to the trace information:

	- Using trace functions, the user can start with a broad look at

	  process flow, and narrow down.

	- NTSTEP provides tracing at the instruction level,

	  the function call level, the system call level, and more.

	- In addition, the user can specify where the trace should start

	  and end.

	- Special trace types include alignment faults, cache misses,

	  paging, and others.



    NTSTEP contains the tools for closely examining a portion of code which

    appears to be misbehaving, without requiring the constant user interaction

    that a debugger requires, or that the program be specially compiled for

    a debugger.  





WHEN TO USE NTSTEP

------------------

    NTSTEP can be an asset in finding software performance problems.

    However, before diving into NTSTEP, take the time to look at the

    problem at a higher level first using such tools as the NT

    Performance Monitor.  Make sure that the problem is not caused by

    a factor which has nothing to do with software performance, such

    as excessive disk I/O or network activity.



    Because NTSTEP's traces can become gigantic, it is easy to get lost

    in an avalanche of inappropriate or irrelevant information.  Therefore,

    use NTSTEP after NTPROF, when you have a pretty good idea what portion

    of the code contains a problem.



    Once you are reasonably sure there is a software performance

    problem, NTSTEP can be invaluable.



    For more information on when to use NTSTEP, see triage.doc, which

    is included in triage.zip, www address <TBS>.







WHAT IT DOES

------------

    NTSTEP is a program that uses the Win32 debug API and instruction

    emulation to trace Alpha AXP execution of the given process.



    NTSTEP provides a wide variety of ways to trace a program, at a

    variety of levels.  NTSTEP will print a continuous trace of program

    flow on an instruction, function call, or system call basis.  It

    will also trace events such as cache misses, alignment faults,

    page faults, etc., which, if excessive, are performance problems.

    All of these traces can be controlled with a start and end

    address.





NTSTEP FEATURES

---------------



  Does not require recompilation of the program.



  Tracing:

    - You can optionally start and end the trace at specified places,

      using either a count or symbols.

    - Prints a continuous trace of process execution:

	- each instruction, only function calls, and/or only

	  system calls.

	- You can also specify a watch address range to trace.

    - Prints a continuous trace of events:

	- Instruction and Data cache misses

	- alignment faults

	- page faults

	- stack pointer violations

    - If the image has been linked with symbols, NTSTEP will display them

      when possible.



  Reports:

    As NTSTEP emulates a process, it can also generate summary reports about

    the process:

    	- function calls

        - primary cache line use

	- memory operations

	- page use

	- opcode distribution



  Other features:

    - Will trace a process which is already running.

    - Does not require special compilation for a debugger.

    - Displays symbols when the process was linked with symbols.



HOW TO USE NTSTEP

-----------------

  - Two ways of filtering the trace info: traces and reports

  - There are very few rules about what parameters may be used with

    what other parameters.  You can do two types of traces at once,

    for example.

  - Start tracing with coarse-granularity filters like /system and /stack.

    Work down to instruction tracing.

  - Use filters together to help see what is happening - for example,

    use /stack or /system with /unalign or /miss.  Before combining filters,

    use them separately to familiarize yourself with the output from each one.



  NTSTEP can provide impressive amounts of detailed information about a process.

  Use NTSTEP's options to filter the information into a manageable form.  For

  example, when tracing, use /start and /end or /min and /max to trace only

  those portions of the process's execution which are interesting.  The

  following command produces a trace of all function calls executed by

  the program, ls.  It starts the trace at the symbol main and ends at exit:



	ntstep /stack /start:main /end:exit ls



  If this is more information than you need, reduce the trace by using

  different /start and /end points:  



	ntstep /stack /start:ls!fnext /end:ls!savefile ls



  You could also limit the trace by using /start and /max.  The following starts

  the trace at ls!fnext, and ends the trace 50 instructions later:



	ntstep /stack /start:fnext /max:21050 ls



  Note the instruction count on the left.  This is the number of instructions

  executed in the function listed on that line.  For a graphic indicator of

  function size, use the /scale parameter to see a crude histogram on the

  right side of the display.  Use a larger scale value to see the longest

  functions:



	ntstep /start:fnext /scale:40 ls



  You may also find it useful to use the /output option to write the trace to

  a file, and then search the file for items of particular interest.
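
  The search step can be sketched in Python as follows.  This is only an
  illustration: it assumes nothing about the trace layout beyond it being
  line-oriented text, and the function names and keywords are hypothetical.

```python
def find_lines(lines, keywords):
    """Return (line_number, text) pairs for lines containing any keyword."""
    hits = []
    for number, text in enumerate(lines, start=1):
        if any(keyword in text for keyword in keywords):
            hits.append((number, text.rstrip("\n")))
    return hits


def search_trace(path, keywords):
    """Search a trace file that was written with /output."""
    # errors="replace" keeps the scan going if the file has odd bytes.
    with open(path, "r", errors="replace") as handle:
        return find_lines(handle, keywords)
```

  For instance, search_trace("trace.log", ["fnext"]) would list every line
  of a (hypothetical) trace.log that mentions fnext.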



  After examining the trace, you may wish to trace your program flow at a coarser

  granularity.  You can trace only system service calls:

	ntstep /system ls



  Or you may wish to use a finer granularity and trace every instruction.  The

  following generates an instruction trace starting at ls!fnext and continuing

  for the next 50 instructions:

  

	ntstep /trace /start:fnext /max:50 ls



  NTSTEP has few rules about which options can be used with which.  See the

  examples section of this document for more examples of interesting

  combinations.

 



NTSTEP COMMANDS AND OPTIONS

===========================



NTSTEP has two primary types of options: traces and reports.  



	Usage: ntstep [options] command [command parameters]...

	       ntstep [options] /process:name



    Main Options:

    -------------

	/output:X  - Set output file name (instead of stdout). Can

		     output to up to (n???) files in the same command.  See

		     example below.



	/process:# - Trace a process which is already active.  # = PID or

		     process name.  Note that, when NTSTEP terminates, the

		     process being traced will terminate as well.

	/help      - Usage statement





    Trace Control Options:

    ----------------------

    Trace start/stop points:

        /max:#     - Stop display at trace count #. Trace count is the

		     number on the left side of the /trace NTSTEP output.

		 

	/min:#     - Start display at trace count #.   Trace count is the

		     number on the left side of the /trace NTSTEP output.



	/start:N   - Set address at which to begin the trace.  N = a PC

		     address or symbol name if image is built with symbols.



	/end:N     - Set address at which to stop the trace.  N = a PC

		     address or symbol name if image is built with symbols.

    Parent/child process selection:

	/only      - Trace primary process only (no children)

	/select:P  - Select one [child] process only for tracing.  Do not

                     trace parent process.  P = process name.



    Traces: Instruction, Function, and System Call

    ----------------------------------------------

	/trace[:n] - Display instruction trace.  Without n, traces

		     every instruction.  With n, traces 1 instruction out

		     of every n instructions.  Useful for reducing huge

		     instruction traces when you don't want to look at

		     every instruction and are using /trace in combination with

		     other filters such as /miss, /sp, /unalign, and

		     /page:#.

		Options:

		     /color   - Display stack or instruction trace on

				monitor in color.

		     /watch   - See /watch below.

		Example:

		     Trace the process, ls.exe, starting at symbol ls!main

		     and stopping 50 instructions later.

			ntstep /trace /start:ls!main /max:50 ls
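
	The effect of the optional n in /trace:n can be pictured with the
	short Python sketch below.  It is not NTSTEP's implementation; in
	particular, which instruction of each group of n is displayed is an
	assumption here (the sketch keeps every nth one).

```python
def sample_trace(records, n=1):
    """Yield (count, record) for 1 out of every n records; n=1 keeps all."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for count, record in enumerate(records, start=1):
        # Assumption: the displayed record is the nth, 2nth, 3nth, ...
        if count % n == 0:
            yield count, record
```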



	/stack     - Display stack call depth trace. On color monitor, use

		     with /color.

		Options:

        	     /arguments[:#] - Show # arguments for each function

				call.

		     /color   - Display stack or instruction trace on

				monitor in color.

	/nest      - Display stack call depth trace with braces. 



	/system	   - Display system service calls.

		Options:

        	     /backtrace[:#] - Display a backtrace of the last # call

				frames for each system call.  # is optional

				and default is all call frames.



	/time	   - Display system service elapsed time.  Same as /system

		     but includes time to execute system service.



	/scale:[#] - Display function trace with a bar chart with # 

		     instructions per character.  Indicates the size

		     (number of instructions) of each function called.

		     Provides a crude histogram which can be quickly

		     visually scanned for the largest functions.  Use

		     larger and larger values for # until the desired

		     scale is reached.
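
	A minimal Python sketch of the "# instructions per character" idea
	follows.  The rounding rule, the bar character, and the maximum width
	are assumptions made for illustration, not NTSTEP's actual behavior.

```python
def scale_bar(instruction_count, per_char, width=60, mark="*"):
    """Return a bar with one character per `per_char` instructions."""
    if per_char < 1:
        raise ValueError("per_char must be at least 1")
    # Assumption: partial characters are dropped and long bars are clipped.
    length = min(instruction_count // per_char, width)
    return mark * length
```

	With a small per_char value most bars are clipped at the maximum
	width; raising it shortens the small functions to nothing and leaves
	only the largest ones visible.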



    Special Traces:

    --------------



	/unalign   - Locate unaligned data references.

		Options:

		     /backtrace[:#] - For each unaligned reference

				encountered during the trace, display a

				backtrace of the last # call frames.  # is

				optional.
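
	For readers unfamiliar with the term, the Python sketch below shows
	what makes a data reference unaligned: the address is not a multiple
	of the access size.  The function name is hypothetical and the check
	illustrates the concept only, not how NTSTEP detects the condition.

```python
def is_unaligned(address, size):
    """True if a `size`-byte access at `address` is not naturally aligned."""
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4, or 8 bytes")
    return address % size != 0
```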



	/miss	   - Display Instruction and Data cache misses.  Use with

		     /trace or /stack to determine program flow

		     around cache activity.



	/page:#	   - Display page fault trace.  # = number of page faults

		     for page.  Ex. 1 = first access to any page.  Display

		     a page use summary at the end.  Note: NTSTEP /page

		     displays only the summary.

 		Options:

		     /backtrace[:#]  - Prints a trace of page faults plus

				the instruction at which the page fault

				occurred, and a backtrace of the call frames

				of the last # calls on the stack. # is

				optional.
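
	The "first access to any page" case (/page:1) can be pictured with
	this Python sketch.  It assumes 8 KB pages and works on a plain list
	of addresses; both are assumptions for illustration, and the sketch
	does not cover fault counts greater than 1.

```python
PAGE_SIZE = 8192  # assumed page size (8 KB)


def first_touches(addresses, page_size=PAGE_SIZE):
    """Return (index, page_number) for the first access to each page."""
    seen = set()
    events = []
    for index, address in enumerate(addresses):
        page = address // page_size
        if page not in seen:
            seen.add(page)
            events.append((index, page))
    return events
```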



	/sp	   - Display stack pointer violations (non mod-16 sp)

		Options:

		     /backtrace[:#]  - For each stack pointer violation

				encountered, display a backtrace of the last

				# call frames.  # is optional.



	/watch:X   - Watch a pc or a memory address range and print trace

		     if executed or accessed.  Load and store? Can be used

		     to repeatedly trace a specific portion of a loop.

		     X = pc, memory, load, store.

		Required parameters:

		     /trace   - Required unless /backtrace is used.

            	     /range:N - Required.  Set watch address range(s).

				Format: <addr1>-<addr2>.  NTSTEP will display

				trace or events within specified range.

		Options:

		     /backtrace[:#]  - Display a backtrace of the last #

				call frames for each execution or access of

				the watched address range.  # is optional.
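
	The range test behind /range can be sketched in Python as below.  The
	sketch assumes hexadecimal addresses and treats both ends of the range
	as inclusive; neither detail is stated above, so treat both as
	assumptions.  The function names are hypothetical.

```python
def parse_range(text):
    """Parse '<addr1>-<addr2>' (hex assumed) into a (low, high) pair."""
    low_text, high_text = text.split("-", 1)
    low, high = int(low_text, 16), int(high_text, 16)
    if low > high:
        raise ValueError("range start is above range end")
    return low, high


def in_watch(address, ranges):
    """True if address falls inside any watched (low, high) range."""
    return any(low <= address <= high for low, high in ranges)
```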



    

    Summary Report Options:

    ----------------------

	/function  - Generate function call summary. Produces a list of

		     function calls ordered by % usage.

	/line[:#]  - Generate analysis of primary cache line use.

		     # = cache line to summarize. Use with /verbose for

		     line usage detail.

	/memory    - Display memory operations (fetch, load and store)

		     summary

	/page      - Generate page use summary.  Use with /verbose for

		     summary of each page.

	/opcode    - Generate opcode distribution.  Produces two opcode

		     lists.  One lists opcodes by % usage, the other is

		     alphabetical by opcode.
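
	The two orderings that /opcode produces (by % usage and alphabetical)
	can be illustrated with the Python sketch below.  It works on a plain
	list of opcode names and says nothing about NTSTEP's report layout;
	the tie-breaking rule is an assumption.

```python
from collections import Counter


def opcode_distribution(opcodes):
    """Return (by_usage, alphabetical) lists of (opcode, percent) pairs."""
    counts = Counter(opcodes)
    total = sum(counts.values())
    if total == 0:
        return [], []
    percent = {op: 100.0 * n / total for op, n in counts.items()}
    # Highest percentage first; ties are broken alphabetically (assumed).
    by_usage = sorted(percent.items(), key=lambda item: (-item[1], item[0]))
    alphabetical = sorted(percent.items())
    return by_usage, alphabetical
```

	The same counting approach would apply to a function call summary
	ordered by % usage, with function names in place of opcodes.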



    Other Options:

    -------------

	/safe      - Force real interlock operations.  Otherwise, these

                     instructions are emulated by NTSTEP.

??	/slow      - Normally, NTSTEP emulates the process being traced.  

		     This option forces single step breakpoint mode, which

		     will execute each instruction, but the cost is a

		     context switch for each instruction.  Be forewarned,

		     therefore, that this mode is extremely slow.

	/verbose   - Verbose output.  Use with: /page, /line, (any others)?





EXAMPLES

--------

1.  Start the process, ls, and trace all instructions during execution of

    the command: ls -l ntstep.exe

	    ntstep /trace ls -l ntstep.exe



2.  Start the process winver, and trace all instructions executed.  If 

    running on a color monitor, highlight various instructions in color (as

    described for /color above). 

	    ntstep /trace /color winver



3.  Run winver to get the statistics that NTSTEP supplies.

	    ntstep winver



4.  Run winmsd and print a trace of all function calls, indented to indicate

    nesting.

	    ntstep /stack winmsd



5.	    ntstep /start:main /end:exit hello

6.	    ntstep /page freecell

7.	    ntstep /line freecell



8.  Localize cache misses: display every 100000th instruction executed, and

    all Instruction and Data cache misses during execution of vi:



	    ntstep /t:100000 /miss vi hello.c > vi.log



9.  Generate an instruction trace of sqlserver and put the output into the file

    trace.2.  Generate a function call trace of sqlserver, include function

    call arguments and put the output into the file calls.2:



	    ntstep /out:trace.2 /trace /out:calls.2 /stack /args sqlservr



10. To locate alignment faults and print a backtrace of the call stack to see

    what functions lead to the fault:



	    ntstep /unalign /backtrace excel

 












