Trap Dump Reference
2017-07-03
--------------------
Steven H. Levine
steve53@earthlink.net

== Introduction ==

This note explains how to set up and record trap dumps and how to do basic dump analysis with PMDF.

== Preparing the Dump Volume ==

The trap dump facility writes the dump files to a dedicated FAT formatted volume. The code that writes the dump file is located in OS2DUMP on your boot volume. The kernel contains code that passes control to OS2DUMP when a dump is requested.

The dump volume must be FAT formatted and must have the volume name SADUMP. The volume must be at least as large as the installed memory.

The standard OS2DUMP is limited by the 2GiB FAT volume limit. If you have more than 2GiB of RAM, you will need to install and configure the DUMPFS IFS. This may also be the case for some systems with exactly 2GiB of RAM.

The dump volume must be visible to the BIOS. This usually means it must be on the first or second drive and must be below the 1024 cylinder boundary unless the BIOS supports the Int13 extensions.

In a mixed IDE/SCSI system, the driver for the boot volume must load first. This applies even when booting from IDE and when the SCSI drive contains no bootable devices. Otherwise, OS2DUMP will hang.

The dump partition must be located on a disk with standard logical geometry. OS2DUMP assumes there are 63 sectors/track. This means that OS2DUMP will not be able to find the dump volume on disks larger than 512GiB that use the non-standard 127 sectors/track geometry.

WARNING - OS2DUMP effectively formats the volume and overwrites any existing volume content. Don't store anything you don't want to lose on the volume.

Use LVM or dfsee to create the dump volume. Verify that the partition size is no more than 2047MiB. LVM and dfsee round requests up to the next cylinder boundary, and OS2DUMP cannot handle partitions larger than 2047MiB. On some systems with less than 2GiB of RAM, it appears that a maximum size partition is required.
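Because the partitioning tools allocate whole cylinders, the largest usable dump partition depends on the drive's logical geometry. The Python sketch below is one way to do that arithmetic; it assumes 512-byte sectors and whole-cylinder allocation, and the function names are hypothetical helpers, not part of any OS/2 tool. It picks the largest whole number of cylinders that stays within the 2GiB FAT limit and checks whether the result is at least as large as installed memory.

```python
"""Sketch: size a SADUMP partition in whole cylinders (hypothetical helper)."""

SECTOR_BYTES = 512                 # assumed sector size
MIB = 1024 ** 2
FAT_LIMIT_BYTES = 2 * 1024 ** 3    # 2GiB FAT volume limit


def cylinder_bytes(heads: int, sectors_per_track: int) -> int:
    """Size in bytes of one cylinder for the given logical geometry."""
    return heads * sectors_per_track * SECTOR_BYTES


def max_cylinders(heads: int, sectors_per_track: int) -> int:
    """Largest whole number of cylinders that fits within the 2GiB limit."""
    return FAT_LIMIT_BYTES // cylinder_bytes(heads, sectors_per_track)


def partition_mib(cylinders: int, heads: int, sectors_per_track: int) -> float:
    """Partition size in MiB for a whole-cylinder allocation."""
    return cylinders * cylinder_bytes(heads, sectors_per_track) / MIB


def holds_ram(cylinders: int, heads: int, sectors_per_track: int, ram_bytes: int) -> bool:
    """True if the partition is at least as large as the installed memory."""
    return cylinders * cylinder_bytes(heads, sectors_per_track) >= ram_bytes


if __name__ == "__main__":
    # Typical geometries: 63 sectors/track with 255 (desktop) or 240 (laptop) heads.
    for label, heads in (("desktop", 255), ("laptop", 240)):
        cyl = max_cylinders(heads, 63)
        print(f"{label}: {cyl} cylinders, {partition_mib(cyl, heads, 63):.1f} MiB")
```

Run as a script, this prints `desktop: 261 cylinders, 2047.3 MiB` and `laptop: 277 cylinders, 2045.0 MiB`. Note that `holds_ram(261, 255, 63, 2 * 1024**3)` is False, which is why a maximum size FAT partition may not be enough on some systems with 2GiB of RAM.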
A typical desktop system will have a logical geometry of 63 sectors/track and 255 heads/cylinder. A typical laptop system will have a logical geometry of 63 sectors/track and 240 heads/cylinder. Your drives probably use one of these geometries, but you should check with the dfsee GEO command (from the command line or the menus) or some other method.

So, for the typical desktop system, the number of cylinders in a 2GiB partition is:

    2*1024**3 / (63 * 255 * 512) = 261.083348

or, rounding down, 261 cylinders. If you are using dfsee to create the volume, specify the size as 261,c, where c indicates cylinders. If your partitioning tool requires you to specify the size in MiB, try 2046.5MiB and check that this results in a 261 cylinder partition. The dfsee confirmation dialog will be similar to:

    CREATE logical FAT 2047, @10
    Freespace ID 10 : 8738.5 MiB disk 3
    FAT32-Ext = 0c : 2047.3 MiB Logical

For the typical laptop system, the number of cylinders in a 2GiB partition is:

    2*1024**3 / (240 * 63 * 512) = 277.401057

or, rounding down, 277 cylinders. If you are using dfsee to create the volume, specify the size as 277,c, where c indicates cylinders. If your partitioning tool requires you to specify the size in MiB, try 2045MiB (about 1.997GiB) and check that this results in a 277 cylinder partition. Note that this may not be large enough for some systems with 2GiB of RAM.

Format the volume with

    format X: /fs:FAT /v:SADUMP

where X: is the drive letter you assigned to the volume. Test the volume with the command

    dir X:\

The volume name should display as SADUMP. Since the volume is empty, the shell may display a SYS0002 error message.

== OS2AHCI.ADD ==

If you want to place the SADUMP volume on a disk drive attached to a controller running in AHCI mode, you must use OS2AHCI.ADD 1.25 or newer.

== Enabling the Trap Dump Facility ==

There are several ways to enable and configure the trap dump facility. Pick the one that best suits your needs.
To permanently enable trap dumps, add the following line to CONFIG.SYS

    TRAPDUMP=R0,x:

where x: is the volume where the dumps will be stored.

WARNING - OS2DUMP effectively formats the volume and overwrites any existing volume content. Don't store anything you don't want to lose on the volume.

On LVM aware systems, volume letters may or may not match BIOS assigned drive letters. The kernel maps the LVM volume letter to the SADUMP partition when invoking the dump routines. This works correctly in current kernels. Some older kernels had problems with this mapping. If you are working with an older Warp 4 kernel and the trap dump facility cannot find the SADUMP volume, try to position the SADUMP volume so that its LVM volume letter matches the drive letter assigned by the BIOS scan. There are lots of different volume types and setups, so it may take some experimentation to determine the correct volume letter for some setups.

WARNING - Do not use the OS2DUMP binary delivered with MCP2/ACP2. It had a defect and can possibly OVERWRITE your volume(s).

After editing CONFIG.SYS, reboot to activate the feature.

For more information on this command, type

    view cmdref trapdump

from the command line. This information is accurate but incomplete. Recent FixPaks have added many new features. See \os2\install\readme.dbg for the details.

One new command is PDUMPSYS, which can control the level of detail included in the system dump. These settings do not apply to kernel trap dumps, but can be used to control the detail of ring 0 process dumps. Also added is the capability to set up the dump configuration from the command line using the TRAPDUMP command. For more information, see \os2\system\ras\procdump.doc

If a ring 0 system dump seems to be missing information needed to analyze your problem, try

    pdumpsys paddr(all)

If needed, this command and others can be invoked from CONFIG.SYS. For example

    run=z:\os2\system\ras\pdumpsys.exe paddr(all)

where z: is your boot volume.
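Before rebooting, it can help to confirm that exactly one TRAPDUMP statement is active and that it names the drive you expect. The Python sketch below assumes the `TRAPDUMP=R0,x:` form shown above; other forms described by `view cmdref trapdump` are not handled, and the function names are hypothetical.

```python
"""Sketch: report active TRAPDUMP statements in CONFIG.SYS text."""
import re
from typing import List, Optional

# An active statement starts the line (after optional blanks) with TRAPDUMP=.
# A line such as "REM TRAPDUMP=R0,X:" starts with REM, so it never matches.
TRAPDUMP_RE = re.compile(r"^\s*TRAPDUMP\s*=\s*(\S+)", re.IGNORECASE)
DRIVE_RE = re.compile(r"[A-Za-z]:")


def active_trapdump_statements(config_text: str) -> List[str]:
    """Return the value of every TRAPDUMP= line that is not REMmed out."""
    values = []
    for line in config_text.splitlines():
        match = TRAPDUMP_RE.match(line)
        if match:
            values.append(match.group(1))
    return values


def dump_drive(value: str) -> Optional[str]:
    """Return the drive letter from a value like 'R0,X:', or None."""
    last = value.split(",")[-1].strip()
    if "," in value and DRIVE_RE.fullmatch(last):
        return last.upper()
    return None
```

To use it, read CONFIG.SYS from your boot volume and pass the text to `active_trapdump_statements`. More than one result, or none, is worth a second look before you reboot.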
== Recording Trap Dumps ==

With the above configuration, this is usually automatic. The trap dump file will be created as the traps occur and will be written to the volume you chose. Any existing data on the volume will be erased, including any prior dump file.

Most often the trap will occur at the same cs:eip repeatedly. If not, you may need to copy each trap dump file to a temporary directory before the next trap overwrites it, so that you can analyze the common factors.

Those using non-US keyboards might have trouble responding to the dump prompt with the Y key. If so, use F1 instead.

If you are trying to capture a trap dump file for a hang condition, try Ctrl-Alt-NumLock-NumLock from the keyboard. On some keyboards, you may need to use Ctrl-Alt-F10-F10.

To turn off the Trap Dump Facility, REM out the TRAPDUMP statement in CONFIG.SYS and reboot.

== Preparing to Use the PM Dump Facility - Overview ==

The PM Dump Facility (PMDF) is a generic tool which needs to be configured to understand the dump files generated by a specific kernel version. PMDF is configured with a set of files known as Dump Symbols, which are a combination of programs and data files.

The program files (df_ret.exe and df_deb.exe) are invoked by PMDF to retrieve data from the dump file. These programs understand the layout of the kernel data structures and are typically specific to a range of kernel revisions.

The data files include System Definition (.sdf) files and Symbol (.sym) files. The program and data files collectively are referred to as the Dump Symbols.

The System Definition files configure df_ret.exe and df_deb.exe and are typically specific to a single kernel revision.

The Symbol files contain data used to translate binary addresses within an executable to symbolic names. These files are typically specific to a particular version of the executable.

This means that a set of Dump Symbols is typically kernel version specific, FixPak specific and application version specific.
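Since a Dump Symbols set is matched to an exact kernel revision, it helps to read the revision string without a hex editor. The Python sketch below assumes the revision token (for example 14.106_SMP) is plain ASCII without blanks and follows the text "Internal revision" after optional whitespace; that layout is an assumption, and the function names are hypothetical.

```python
"""Sketch: read the kernel revision string that PMDF matches against."""
import re
from typing import Optional

MARKER = b"Internal revision"
# Assumption: the revision token is printable ASCII without blanks and
# follows the marker after optional whitespace.
TOKEN_RE = re.compile(rb"\s*([\x21-\x7e]+)")


def internal_revision(kernel_bytes: bytes) -> Optional[str]:
    """Return the token after 'Internal revision', or None if absent."""
    pos = kernel_bytes.find(MARKER)
    if pos < 0:
        return None
    match = TOKEN_RE.match(kernel_bytes, pos + len(MARKER))
    return match.group(1).decode("ascii") if match else None


def revision_of_file(path: str) -> Optional[str]:
    """Read a kernel file such as C:\\OS2KRNL and extract its revision."""
    with open(path, "rb") as f:
        return internal_revision(f.read())
```

If the result looks odd, fall back to the hex editor search; the string PMDF reports as "Internal revision" when it opens a dump is the authoritative value.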
Assuming a standard install of the Dump Facility, the Dump Symbols sets are stored in subdirectories of \os2\pdpsi\pmdf on your boot volume and identified by the index file \os2\pdpsi\pmdf\pmdfvers.lst. The pmdfvers.lst index file cross-references kernel versions to subdirectory names.

Note that the Dump Facility and the Kernel Debugger use the same set of symbol files, but the requirements differ as to where the symbol files must be located. The Dump Facility requires that the symbol files be placed in a directory that can be found via pmdfvers.lst. The Kernel Debugger requires that the symbols be stored in the same directory as the associated executable. For example, os2krnl.sym must be located in the root of the boot volume along with os2krnl.

The reason for this is that the Kernel Debugger runs on the Machine Under Test (MUT) and may need access to the symbol files before the file system drivers are loaded. This places some limits on where the symbol files can be located, and the easiest solution was to require the symbol files to be in the same directory as the associated executable.

PMDF is often run on a system other than the MUT which generated the dump. Using pmdfvers.lst to locate the correct symbols for the dump allows PMDF to analyze dumps captured on systems other than the one running PMDF. Pmdfvers.lst will have a line defining where to find the symbols for each specific kernel revision to be analyzed.

== Preparing to Use the PM Dump Facility - pmdfvers.lst ==

Each pmdfvers.lst entry is a single line of the form

    directory;version;comment

or

    directory:version:comment

Directory is the data directory that contains the df_ret.exe and symbol files for the indicated kernel version. This directory must be a subdirectory of \os2\pdpsi\pmdf.

A typical entry is

    warp45_s;14.106_SMP;eComStation 2.x standard install

which states that warp45_s is the data directory which contains the df_ret.exe and symbol data files for the 14.106 SMP kernel.
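The entry format above is simple enough to check mechanically. The Python sketch below parses either delimiter form, looks up a revision with the exact, case sensitive match PMDF requires, and lists blank lines, which the verification section warns can make PMDF trap or hang. Treating whichever delimiter appears first on a line as that line's delimiter is an assumption, and the names are hypothetical.

```python
"""Sketch: look up a kernel revision in pmdfvers.lst text."""
from typing import List, NamedTuple, Optional


class PmdfEntry(NamedTuple):
    directory: str
    version: str
    comment: str


def parse_entry(line: str) -> Optional[PmdfEntry]:
    """Split 'directory;version;comment' or 'directory:version:comment'."""
    line = line.rstrip("\r\n")
    # Use whichever delimiter appears first, so a comment may contain the other.
    hits = [i for i in (line.find(";"), line.find(":")) if i >= 0]
    if not hits:
        return None
    parts = line.split(line[min(hits)], 2)
    comment = parts[2] if len(parts) == 3 else ""
    return PmdfEntry(parts[0], parts[1], comment)


def blank_lines(lst_text: str) -> List[int]:
    """1-based numbers of blank lines, which can make PMDF trap or hang."""
    return [n for n, line in enumerate(lst_text.splitlines(), 1) if not line.strip()]


def find_directory(lst_text: str, revision: str) -> Optional[str]:
    """Data directory for revision; the match is exact and case sensitive."""
    for line in lst_text.splitlines():
        entry = parse_entry(line)
        if entry is not None and entry.version == revision:
            return entry.directory
    return None
```

For example, with the typical entry shown above, `find_directory(text, "14.106_SMP")` returns `warp45_s`, while `"14.106_smp"` finds nothing because the comparison is case sensitive.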
The version string is case sensitive and must match the kernel revision exactly. Anything after the second delimiter is a comment to help you remember what the other two values mean.

If you are not sure what kernel revision you are running, use your favorite hex editor and search for the string "Internal revision" (without the quotes). The string that follows is the value PMDF uses to match the version string in pmdfvers.lst. You can find similar information with

    bldlevel \os2krnl

but the revision number reported in the build level string is often not in exactly the format that PMDF is looking for.

Check if \os2\pdpsi\pmdf\pmdfvers.lst exists. If it does not, create it using the following as a template

    warp45_s;14.106_SMP;WSeB/ACP SMP
    warp45_u;14.106_UNI;WSeB/ACP UNI
    warp45;14.106_W4;Warp 4/MCP

Replace the version string with one that matches the installed kernel.

Check that the data directory for your kernel revision exists. If not, create it as a subdirectory of \os2\pdpsi\pmdf.

Check the data directory for an existing set of Dump Symbols. For a retail kernel, at a minimum, the set will include

    df_ret.exe
    os2krnlr.sym
    os2krnl.sdf
    doscall1.sym

Depending on the issue, you may need additional symbol files.

== Preparing to Use the PM Dump Facility - eComStation Details ==

If you are running eComStation 1.2 or newer, the dump symbols may already be installed. Check the data directory named in the pmdfvers.lst entry for your kernel version. If they are not there, they are available on your installation CD in the \os2image\debug and \os2image\fi\sysmgmt directories. They are also available from your account at the eComStation website (www.ecomstation.com).

If you need to copy files to the PMDF data directory, copy the files to the data directory and not to a subdirectory of the data directory. This is important. PMDF only searches the data directory.

Then skip the Warp4/WSeB details and continue with the Other Applications section.
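A common mistake is unpacking or copying the symbol files into a subdirectory of the data directory, where PMDF will not find them. The Python sketch below checks for the minimum retail set listed above and reports any of those files that sit in subdirectories. The function names are hypothetical, and the check only looks at file names, not at whether the files match your kernel revision.

```python
"""Sketch: check a PMDF data directory for the minimum retail Dump Symbols."""
from pathlib import Path
from typing import Iterable, List

# Minimum set for a retail kernel, as listed in this note.
REQUIRED = ("df_ret.exe", "os2krnlr.sym", "os2krnl.sdf", "doscall1.sym")


def missing_files(data_dir: Path, required: Iterable[str] = REQUIRED) -> List[str]:
    """Required names not found directly in data_dir (case-insensitive)."""
    present = {p.name.lower() for p in data_dir.iterdir() if p.is_file()}
    return [name for name in required if name.lower() not in present]


def misplaced_files(data_dir: Path, required: Iterable[str] = REQUIRED) -> List[Path]:
    """Copies of required names that sit in subdirectories, where PMDF will not look."""
    wanted = {name.lower() for name in required}
    return sorted(
        p for p in data_dir.rglob("*")
        if p.is_file() and p.parent != data_dir and p.name.lower() in wanted
    )
```

An empty result from `missing_files` is necessary but not sufficient; PMDF opening the dump cleanly remains the real test.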
== Preparing to Use the PM Dump Facility - Warp4/WSeB Details ==

Depending on what components you installed when you installed Warp or WSeB on the box where the trap occurred, you might already have the files needed to examine the dump installed. If not, the following examples describe how to get the files you need and how to install the files so that PMDF can use them.

The examples that follow assume

    FixPak 15
    Kernel version 14.062
    Boot volume c:

Be sure to replace the values shown in the examples with values that match your specific system. This applies to kernel revision numbers, boot volume letters and other values that are specific to your system. Add pathname prefixes as needed to match where you have stored the files.

If you are not sure what kernel revision you are running, use your favorite hex editor and search for the string "Internal revision" (without the quotes). The string that follows is the value PMDF uses to match the kernel to the symbols. You can find the same information with

    bldlevel c:\os2krnl

but the revision number reported in the build level string is usually not in exactly the format that PMDF is looking for.

Go to ftp.software.ibm.com/ps/products/os2/fixes/debug and download m015dmp.zip. This is the symbols zip file for FP15 and kernel version 14.062.

If you are using a testcase kernel, the dump symbols will be at testcase.boulder.ibm.com/ps/fromibm/os2 and will be named dfyyyymmdd.zip, where yyyymmdd matches the testcase kernel date. When you download a testcase kernel, be sure to download the symbols file at the same time. Otherwise, it might be gone when you need it. There are a few sites that archive copies of the testcase kernels. One such site is http://www.os2site.com/sw/upgrades/kernel/

Create the subdirectory

    c:\os2\pdpsi\pmdf\warp4.15

Unzip the contents of m015dmp.zip into this subdirectory with

    unzip -j m015dmp.zip -d c:\os2\pdpsi\pmdf\warp4.15

This will put all the files in m015dmp.zip into the data directory.
This is important. If you do not use the -j option or its equivalent, PMDF will not be able to find the symbols. The zip files contain subdirectories because they are structured for use with either the Kernel Debugger or PMDF. As explained previously, the Kernel Debugger requires that the symbol files be in the same directory as the associated executable. The zip files contain subdirectory information so that the files will be placed in the correct directory when unzipped to the root of the boot volume. For PMDF, the symbol files must be in a data directory named in pmdfvers.lst.

== Preparing to Use the PM Dump Facility - Other Applications ==

If your application came with map files and no symbol files, the mapsym utility may be able to create symbol files from the map files. Mapsym is available with most compilers. If you need usage help, just type mapsym from the command line. Copy the symbol files to the PMDF data directory.

If you don't have either map or symbol files for your application, it may be a bit more difficult to analyze the dump file, but PMDF will still work.

== Verify the Trap Dump File Integrity ==

Before attempting to analyze a trap dump file or sending it off to someone else to analyze, you must verify its integrity. You verify the dump file by successfully loading it into PMDF. If PMDF can load the dump file without errors on your system, it should be loadable on any system with the proper set of Dump Symbols. If you cannot load the dump file with PMDF, it is unlikely that anyone else can either.

To verify the dump file, start PMDF. Use File->Open Dump File to open the Open Dump File dialog. Select your dump file and press the OK button. If the dump file and your PMDF setup are valid, PMDF will read in the dump file, find the matching Dump Symbols set using the data in pmdfvers.lst and display a summary of the dump file and the state of the system at the time the dump was recorded.
The display will be similar to:

    IBM OS/2 Dump Formatter for a retail or an hstrict SMP kernel.
    Formatter is --> Internal revision 14.100g_SMP
    Dump file is --> Internal revision 14.106_SMP (system dump)

    Symbol (d:\devtools\pmdf\14_106_smp_t60\os2krnlr.sym) linked

    Current slot number: 009e

     Slot  Pid  Ppid Csid Ord  Sta Pri  pTSD     pPTDA    pTCB     Disp SG Name
    *009e# 005c 001f 005c 0001 run 0200 f909d000 fe638860 f972c690 0ea0 1e CPE
    eax=80030117 ebx=102937b0 ecx=00000000 edx=00000000 esi=1b386560 edi=ffffffff
    eip=1e92f331 esp=000f4544 ebp=000f457c iopl=0 -- -- -- nv up ei pl zr na pe nc
    cs=005b ss=0053 ds=0053 es=0053 fs=150b gs=0000 cr2=00000000 cr3=00211000 p=00
    005b:1e92f331 83c40c         add     esp,+0c
    # Loading version data ... please wait
    Loading structure info for eCS FP6 GA ... please wait.
    ....Structure definition file (14_106_SMP_t60\kernel.sdf) loaded.
    Done.

The names and numbers will be specific to your trap, but the display layout will be the same.

If PMDF appears to trap or hang before selecting a Dump Symbols set, check pmdfvers.lst and make sure it contains no blank lines.

If df_ret.exe traps and you are trying to open a system dump for an SMP kernel, try the 14.100d df_ret.exe in place of whatever you currently have installed.

If PMDF appears to trap or hang after selecting a Dump Symbols set, the Dump Symbols set may contain one or more incorrect files or the dump file might be corrupted. Double check that you have the right Dump Symbols for your kernel installed in the directory.

If PMDF cannot match up the dump file with a Dump Symbols set, it will prompt you to select a Dump Symbols set from the sets defined in pmdfvers.lst. This is an indication that either there is something wrong with the content of your pmdfvers.lst or the dump file is corrupted. Double check pmdfvers.lst for typos and try again. If PMDF continues to prompt for a Dump Symbols set, you should probably request help.

If you are adventurous, you can try selecting one of the available Dump Symbols sets.
PMDF will allow you to do this, but the results will be unpredictable. Unless you really know what you are doing, PMDF is almost sure to misinterpret the dump file content. As a result, it may complain, it may trap or it may go into some sort of loop. If you continue to have trouble opening your dump file with PMDF, request help.

If PMDF can load the dump file without trapping or hanging or prompting you for a Dump Symbols set, the dump file is probably not corrupted and it is OK for you and others to use.

== Analyzing Trap Dumps ==

This is a bare bones overview. To view some of the available data, try the following:

    Select Synopsis -> Trap Screen Info from the Analyze menu.
    Select Thread -> Call Gate from the Analyze menu.
    Select Thread -> Ring 0 Stack Trace from the Analyze menu.
    Select Thread -> Ring 2 Stack Trace from the Analyze menu.
    Select Thread -> Ring 3 Stack Trace from the Analyze menu.
    Select Process -> Open Files from the Analyze menu.
    Select Synopsis -> Process Synopsis from the Analyze menu.
    Select Synopsis -> System Synopsis from the Analyze menu.
    Select Process -> Module Table from the Analyze menu.

Enter the commands

    r
    ln
    u eip-20 eip
    k
    dw bp

in the command line at the bottom of the PMDF window. Press the Enter key after each command.

If the r command does not report the same cs:eip as shown on the Trap Screen, repeat the above commands, substituting the numeric cs:eip value from the Trap Screen for eip and the numeric ss:ebp value from the Trap Screen for ebp.

Select Save Output from the File menu and save the window contents to a file.

If you don't understand what you are seeing, you will have to find someone to help you interpret the content of the dump file. Ask questions and let your helper guide you. Be prepared to spend some time working with your helper to understand the cause of the trap.

The bare bones information you generated is just a starting point. It may or may not be sufficient to identify the source of the trap.
Unless you are lucky, your helper will request additional output and may ask you to generate another dump file using different settings. Often, your helper will want you to send a copy of the dump file and the debug symbols.

It is a good idea to keep notes describing how each dump file was generated and what you were doing when the dump file was generated. This is especially important for intermittent failures where one is looking for a pattern.

Save the dump file, the debug symbols and your notes until analysis is complete.

== Interpreting Trap Dumps ==

This too is just a bare bones overview.

The first goal is to decide what the code was trying to do when it trapped. Start with the Ring 0 Trap Screen Dump.

If CSLIM is all F's, the trap is in the kernel. Experienced developers can sometimes derive a module name from the cs:eip value, but this is beyond the scope of this note.

If CSLIM is not all F's, the trap is either in a device driver or in 16-bit code within the kernel. Scan the device driver list. Look for a matching CS value in the strategy entry point column or a matching DS value in the Device Header column.

If SS is E8 or 15E8, the trap is within an interrupt handler.

== E-mailing Trap Dumps ==

In general, don't e-mail a dump file to someone without asking them if they want you to send it.

If you do need to e-mail the file, compress it first. This will save transmit time and protect the dump file from corruption. While zip is still the de facto standard for generating compressed archive files, other tools do a much better job. I recommend using 7z for compressing system dump files. A version is available at

    http://www.os2site.com/sw/util/archiver/zip/p7zip-9.20.1-os2.zip

The space savings can be significant. A 2GiB system dump will typically compress 5 to 7 times better with 7z compared to zip.

It's always a good idea to give the archive file a useful name. Something like DavesTrapE_20111221.7z will help everyone remember what the file contains.
It's also a good idea to include a note in the archive file describing how and why the trap occurred, along with the bldlevel output, because the archive file may get separated from the e-mail message. You should include the output of

    procdump query

in your note. This will describe the type of data recorded in the dump file.

If the archive file is over 5MB or so, check with your helper before sending it via e-mail. You may need to use a file splitter and send each chunk in a separate e-mail, giving your helper a chance to delete each e-mail from the server before you send the next chunk. Most ISPs limit e-mail inboxes to 10MB, and unless you send the archive file in chunks, it will never get to your helper.

These days FTP is often a better solution. Most helpers capable of analyzing a system dump file will have access to an FTP site where you can upload the archive.

Be careful to send the e-mail containing the dump file only to the intended addressee. Sending a large, unexpected e-mail to all the members of a mailing list, some of whom may still be on dial-up, is sure to upset someone.

== Known Restrictions ==

None other than those described above.

Good luck.

Steven