head	3.1;
access;
symbols;
locks; strict;
comment	@.\" @;
3.1
date	95.09.11.09.51.31;	author grog;	state Exp;
branches;
next	3.0;
3.0
date	95.06.25.10.55.49;	author grog;	state Exp;
branches;
next	2.8;
2.8
date	95.06.25.10.55.49;	author grog;	state Exp;
branches;
next	2.7;
2.7
date	95.06.24.11.45.14;	author grog;	state Exp;
branches;
next	2.6;
2.6
date	95.06.09.04.31.13;	author grog;	state Exp;
branches;
next	2.5;
2.5
date	95.05.17.17.33.19;	author grog;	state Exp;
branches;
next	2.4;
2.4
date	95.02.22.14.24.21;	author grog;	state Exp;
branches;
next	2.3;
2.3
date	95.02.03.13.25.50;	author grog;	state Exp;
branches;
next	2.2;
2.2
date	95.01.25.14.34.05;	author grog;	state Exp;
branches;
next	2.1;
2.1
date	95.01.25.13.34.27;	author grog;	state Exp;
branches;
next	2.0;
2.0
date	94.12.21.16.58.30;	author grog;	state Exp;
branches;
next	;
desc
@@
3.1
log
@Fix typos
@
text
@.\" For emacs, this file is in -*- nroff-fill -*- mode
.\" $Id: testing.ms,v 3.0 1995/06/25 10:55:49 grog Exp grog $
.\" $Log: testing.ms,v $
.\" Revision 3.0 1995/06/25 10:55:49 grog
.\" Final draft
.\"
.\" Revision 2.8 1995/06/25 10:55:49 grog
.\" Final draft, second cut
.\"
.\" Revision 2.7 1995/06/24 11:45:14 grog
.\" Final draft, first cut.
.\"
.\" Revision 2.6 1995/06/09 04:31:13 grog
.\" Remove date from page headers
.\" Minor mods
.\"
.\" Revision 2.5 1995/05/17 17:33:19 grog
.\" Major mods after Andy's final draft review
.\"
.\" Revision 2.4 1995/02/22 14:24:21 grog
.\" Minor mods
.\"
.\" Revision 2.3 1995/02/03 13:25:50 grog
.\" Mods after Andy's review
.\"
.\" Revision 2.2 1995/01/25 14:34:05 grog
.\" Minor mods
.\"
.\" Revision 2.1 1995/01/25 13:34:27 grog
.\" Minor mods
.\"
.\"
.so global.ms
.Se \*[nchtest] "Testing the results"
.St "Testing"
Finally \fImake\fR has run through to the end and has not reported errors. Your
source tree now contains all the objects and executables. You're done!
.LP
After a brief moment of euphoria, you sit down at the keyboard and start the
program:
.Ps
$ \f(CBxterm\f(CW
Segmentation fault - core dumped
.Pe
Well, maybe you're not quite done after all. Occasionally the program does not
work as advertised. What you do now depends on how much programming experience
you have. If you are a complete beginner, you could be in trouble--about the
only thing you can do (apart from asking somebody else) is to go back and check
that you really did configure the package correctly.
.LP
On the other hand, if you have even a slight understanding of programming, you
should try to analyze the cause of the error--it's easier than you think. Hold
on, and try not to look down.
.LP
There are thousands of possible reasons for the problems you encounter when you
try to run a buggy executable, and lots of good books explain debugging
techniques. In this chapter, we will touch only on aspects of debugging that
relate to porting. First we'll attack a typical, if somewhat involved,
real-life bug, and solve it, discussing the pros and cons on the way. Then
we'll look at alternatives to traditional debuggers: kernel and network tracing.
.LP
Before you even start your program, of course, you should check if any test
programs are available. Some packages include their own tests, and separate
test suites are available for others. For other packages there may be test
suites that were not designed for the package, but that can be used with it. If
there are any tests, you should obviously run them. You might also consider
writing some tests and including them as a target \s10\f(CWtest\fR\s0 in the
\fIMakefile\fR.
.Bh "What makes ported programs fail?"
.XX "failure, causes"
Ported programs don't normally fail for the same reasons as programs under
development. A program under development still has bugs that prevent it from
running correctly on any platform, while a ported program has already run
reasonably well on some other platform. If it doesn't run on your platform, the
reasons are usually:
.Ls B
.Li
A latent bug has found more fertile feeding ground. For example, a program may
read from a null pointer. This frequently doesn't get noticed if the data at
address 0 doesn't cause the program to do anything unusual. On the other hand,
if the new platform does not have any memory mapped at address 0, it will cause
a segmentation violation or a bus error.
.\" XXX Do we need this?
.\" \**
.\" .FS
.\" See \*[chsignal], page \*[SIGBUSEGV], for more details.
.\" .FE
.Li
Differences in the implementation of library functions or kernel functionality
cause the program to behave differently in the new environment. For example,
the function \s10\f(CWsetpgrp\fR\s0 has completely different semantics under
System V and under BSD. See \*[chkdepend], page \*[setpgrp], for more details.
.Li
The configuration scripts have never been adequately tested for your platform.
As a result, the program contains bugs that were not in the original versions.
.Le
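.LP
As an illustration of the first point, here is a minimal sketch of the kind of
guard that keeps a latent null pointer read from depending on the platform. The
function name \s10\f(CWsafe_length\fR\s0 is hypothetical and does not come from
any package discussed in this chapter:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of a string that may legitimately be absent.
 * Without the check, strlen(NULL) reads from address 0.  That may go
 * unnoticed on a platform that maps memory there, and kills the program
 * on one that does not. */
size_t safe_length(const char *s)
{
    if (s == NULL)
        return 0;          /* treat a missing string as empty */
    return strlen(s);
}
```

A call such as \s10\f(CWsafe_length(NULL)\fR\s0 then returns 0 on every
platform instead of working on some and dumping core on others.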
.Ah "A strategy for testing"
.XX "strategy, for testing"
.XX "testing, strategy"
When you write your own program with its own bugs, it helps to understand
exactly what the program is trying to do: if you sit back and think about it,
you can usually shorten the debugging process. When debugging software that you
have just ported, the situation is different: you \fIdon't\fR understand the
package, and learning its internals could take months. You need to find a way
to track down the bug without getting bogged down with the specifics of how the
package works.
.LP
.XX "xterm"
You can overdo this approach, of course. It still helps to know what the
program is trying to do. For example, when \fIxterm\fR dies, it's nice to know
roughly how \fIxterm\fR works: it opens a window on an X server and emulates a
terminal in this window. If you know something about the internals of X11, this
will also be of use to you. But it's not time-effective to try to fight your
way through the source code of \fIxterm\fR.
.LP
.XX "GIGO"
.XX "garbage in, garbage out"
In the rest of this chapter, we'll use this bug (yes, it was a real live bug in
X11R6) to look at various techniques that you can use to localize and finally
pinpoint the problem. The principle we use is the old \fIGIGO\fR
principle--garbage in, garbage out. We'll subdivide the program into pieces
which we can conveniently observe, and check which of them does not produce the
expected output. After we find the piece with the error, we subdivide it
further and repeat the process until we find the bug. The emphasis in this
method is on \fIconvenient\fR: it doesn't necessarily have to make sense. As
long as you can continue to divide your problem area into between two and five
parts and localize the problem in one of the parts, it won't take long to find
the bug.
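.LP
The subdivision described above can be sketched in C for the simplest case,
dividing into two parts at each step. This is only an illustration under
stated assumptions: the names \s10\f(CWfirst_bad_stage\fR\s0 and
\s10\f(CWexample_ok\fR\s0 are hypothetical, and the sketch assumes that once the
program state has gone bad, it stays bad in every later stage.

```c
#include <stddef.h>

/* Hypothetical predicate: returns nonzero if the program state is still
 * sane after running stages 0..stage inclusive. */
typedef int (*stage_ok_fn)(size_t stage);

/* Example predicate for illustration only: pretend stage 5 does the damage. */
int example_ok(size_t stage)
{
    return stage < 5;
}

/* Return the index of the first stage after which the state is bad, or
 * nstages if every stage checks out.  Assumes that a bad state stays bad,
 * so the checks can be bisected. */
size_t first_bad_stage(stage_ok_fn ok, size_t nstages)
{
    size_t lo = 0;          /* every stage below lo is known good */
    size_t hi = nstages;    /* stage hi, if it exists, is known bad */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ok(mid))
            lo = mid + 1;   /* the damage happens later */
        else
            hi = mid;       /* the damage happens here or earlier */
    }
    return lo;
}
```

With eight stages and the example predicate, three checks are enough to
localize the problem in stage 5. In practice the "predicate" is you, looking
at the output of a debugger or a trace program.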
.LP
So what's a convenient way to look at the problems? That depends on the tools
you have at your disposal:
.Ls B
.Li
If you have a symbolic debugger, you can divide your problem into the individual
functions and examine what goes in and what goes out.
.Li
If you have a system call trace program, such as \fIktrace\fR or \fItruss\fR,
you can monitor what the program says to the system and what the system replies.
.Li
If you have a communications line trace program, you can try to divide your
program into pieces that communicate across this line, so you can see what they
are saying to each other.
.Le
Of course, we have all these things. In the following sections we'll look at
each of them in more detail.
.Ah "Symbolic debuggers"
.XX "symbolic debugger"
.XX "debugger, symbolic"
.XX "gdb"
.XX "pyramid"
If you don't have a symbolic debugger, get one. Now. Many people still claim
to be able to get by without a debugger, and it's horrifying how many people
don't even know how to use one. Of course you can debug just about anything
without a symbolic debugger. Historians tell us that you can build pyramids
without wheels--that's a comparable level of technology to testing without a
debugger. The GNU debugger, \fIgdb\fR, is available on just about every
platform you're likely to encounter, and though it's not perfect, it runs rings
around techniques like putting \fIprintf\fR statements in your programs.
.LP
.XX "attach, debugger"
.XX "debugger, attach"
In UNIX, a debugger is a process that takes control of the execution of another
process. Most versions of UNIX allow only one way for the debugger to take
control: it must start the process that it debugs. Some versions, notably SunOS
4, but not Solaris 2, also allow the debugger to \fIattach\fR to a running
process.
.LP
Whichever debugger you use, you need only a surprisingly small number of
commands. In the following discussion, we'll look at the command set of
\fIgdb\fR, since it is widely used. The commands for other symbolic debuggers
vary considerably, but they normally have similar purposes.
.Ls B
.Li
.XX "where am I?"
.XX "how did I get here?"
.XX "stack trace"
A \fIstack trace\fR command answers the question, "Where am I, and how did I get
here?", and is almost the most useful of all commands. It's certainly the first
thing you should do when examining a core dump or after getting a signal while
debugging the program. \fIgdb\fR implements this function with the
\s10\f(CWbacktrace\fR\s0 command.
.Li
\fIDisplaying data\fR is the most obvious requirement: what is the current
value of the variable \s10\f(CWbar\fR\s0? In \fIgdb\fR, you do this with the
\s10\f(CWprint\fR\s0 command.
.Li
\fIDisplaying register contents\fR is really the same thing as displaying
program data. In \fIgdb\fR, you display individual registers with the
\s10\f(CWprint\fR\s0 command, or all registers with the \s10\f(CWinfo
registers\fR\s0 command.
.Li
\fIModifying data and register contents\fR is an obvious way of modifying
program execution. In \fIgdb\fR, you do this with the \s10\f(CWset\fR\s0
command.
.XX "breakpoint, debugger"
.XX "debugger, breakpoint"
.Li
\fIBreakpoints\fR stop execution of the process when the process attempts to
execute an instruction at a certain address. \fIgdb\fR sets breakpoints with
the \s10\f(CWbreak\fR\s0 command.
.Li
.XX "watchpoint, debugger"
.XX "debugger, watchpoint"
Many modern machines have hardware support for more sophisticated breakpoint
mechanisms. For example, the i386 architecture can support four hardware
breakpoints on instruction fetch (in other words, traditional breakpoints),
memory read or memory write. These features are invaluable in systems that
support them; unfortunately, UNIX usually does not. \fIgdb\fR simulates this
kind of breakpoint with a so-called \fIwatchpoint\fR. When watchpoints are set,
\fIgdb\fR simulates program execution by single-stepping through the program.
When the condition (for example, writing to the global variable
\s10\f(CWfoo\fR\s0) is fulfilled, the debugger stops the program. This slows
down the execution speed by several orders of magnitude, whereas a real hardware
breakpoint has no impact on the execution speed.\**
.FS
Some architectures slow the overall execution speed slightly in order to test
the hardware registers. This effect is negligible.
.FE
.Li
.XX "program counter"
.XX "instruction pointer"
\fIJumping\fR (changing the address from which the next instruction will be
read) is really a special case of modifying register contents, in this case the
\fIprogram counter\fR (the register that contains the address of the next
instruction). This register is also sometimes called the \fIinstruction
pointer\fR, which makes more sense. In \fIgdb\fR, use the \s10\f(CWjump\fR\s0
command to do this. Use this instruction with care: if the compiler expects the
stack to look different at the source and at the destination, this can easily
cause incorrect execution.
.Li
.XX "single stepping, in debugger"
.XX "debugger, single stepping"
\fISingle stepping\fR in its original form is supported in hardware by many
architectures: after executing a single instruction, the machine automatically
generates a hardware interrupt that ultimately causes a \s10\f(CWSIGTRAP\fR\s0
signal to the debugger. \fIgdb\fR performs this function with the
\s10\f(CWstepi\fR\s0 command.
.Li
You won't want to execute individual machine instructions until you are in deep
trouble. Instead, you will execute a \fIsingle line\fR instruction, which
effectively single steps until you leave the current line of source code. To
add to the confusion, this is also frequently called \fIsingle stepping\fR.
This command comes in two flavours, depending on how it treats function calls.
One form will execute the function and stop the program at the next line after
the call. The other, more thorough form will stop execution at the first
executable line of the function. It's important to notice the difference
between these two functions: both are extremely useful, but for different
things. \fIgdb\fR performs single line execution omitting calls with the
\s10\f(CWnext\fR\s0 command, and includes calls with the \s10\f(CWstep\fR\s0
command.
.Le
There are two possible approaches when using a debugger. The easier one is to
wait until something goes wrong, then find out where it happened. This is
appropriate when the process gets a signal and does not overwrite the stack: the
\s10\f(CWbacktrace\fR\s0 command will show you how it got there.
.LP
Sometimes this method doesn't work well: the process may end up in
no-man's-land, and you see something like:
.Ps
Program received signal SIGSEGV, Segmentation fault.
0x0 in ?? ()
(gdb) \f(CBbt\f(CW \fI\&abbreviation for \f(BIbacktrace\f(CW
#0 0x0 in ?? () \fI\&nowhere\f(CW
(gdb)
.Pe
Before dying, the process has mutilated itself beyond recognition. Clearly, the
first approach won't work here. In this case, we can start by conceptually
dividing the program into a number of parts: initially we take the function
\s10\f(CWmain\fR\s0 and the set of functions which \s10\f(CWmain\fR\s0 calls.
By single stepping over the function calls until something blows up, we can
localize the function in which the problem occurs. Then we can restart the
program and single step through this function until we find what it calls before
dying. This iterative approach sounds slow and tiring, but in fact it works
surprisingly well.
.Ah "Libraries and debugging information"
.XX "libraries, debugging information"
.XX "debugging information, in libraries"
.XX "xterm"
Let's come back to our \fIxterm\fR program and use \fIgdb\fR to figure out what
is going on. We could, of course, look at the core dump, but in this case we
can repeat the problem at will, so we're better off looking at the live program.
We enter:
.Pn first-xterm-death
.Ps
$ \f(CBgdb xterm\f(CW
\fI(political statement for the FSF omitted)\f(CW
(gdb) r -display allegro:0 \fI\&run the program\f(CW
Starting program: /X/X11/X11R6/xc/programs/xterm/xterm -display allegro:0
Program received signal SIGBUS, Bus error.
0x3b0bc in _XtMemmove ()
(gdb) bt \fI\&look back down the stack\f(CW
#0 0x3b0bc in _XtMemmove () \fI\&all these functions come from the X toolkit\f(CW
#1 0x34dcd in XtScreenDatabase ()
#2 0x35107 in _XtPreparseCommandLine ()
#3 0x4e2ef in XtOpenDisplay ()
#4 0x4e4a1 in _XtAppInit ()
#5 0x35700 in XtOpenApplication ()
#6 0x357b5 in XtAppInitialize ()
#7 0x535 in main ()
(gdb) 
.Pe
The stack trace shows that the main program called
\s10\f(CWXtAppInitialize\fR\s0, and the rest of the stack shows the program deep
in the X Toolkit, one of the central X11 libraries. If this were a program that
you had just written, you could expect it to be a bug in your program. In this
case, where we have just built the complete X11 core system, there's also every
possibility that it is a library bug. As usual, the library was compiled
without debug information, and without that you hardly have a hope of finding
it.
.LP
Apart from size constraints, there is no reason why you can't include debugging
information in a library. The object files in libraries are just the same as
any others--we discuss them in detail on page \*[libdef]. If you want, you can
build libraries with debugging information, or you can take individual library
routines and compile them separately.
.LP
.XX "libXt.a"
Unfortunately, the size constraints are significant: without debugging
information, the file \fIlibXt.a\fR is about 330 kB long and contains 53 object
files. With debugging information, it might easily reach 20 MB, since all the
myriad X11 global symbols would be included with each object file in the
archive. It's not just a question of disk space: you also need virtual memory
during the link phase to accommodate all these symbols. Most of these files
don't interest us anyway: the first one that does is the one that contains
\s10\f(CW_XtMemmove\fR\s0. So we find where it is and compile it alone with
debugging information.
.LP
.XX "X Toolkit"
That's not as simple as it sounds: first we need to find the source file, and to
do that we need to find the source directory. We could read the documentation,
but to do that we need to know that the \fIXt\fR functions are in fact the X
toolkit. If we're using GNU \fImake\fR, or if our \fIMakefile\fR documents
directory changes, an alternative would be to go back to our \fImake\fR log and
look for the text \fIXt\fR. If we do this, we quickly find
.Ps
make[4]: Leaving directory `/X/X11R6/xc/lib/Xext'
making Makefiles in lib/Xt...
 mv Makefile Makefile.bak
make[4]: Entering directory `/X/X11R6/xc/lib/Xt'
make[4]: Nothing to be done for `Makefiles'.
make[4]: Leaving directory `/X/X11R6/xc/lib/Xt'
.Pe
.XX "XtMemmove, function"
.XX "function, XtMemmove"
So the directory is \fI/X/X11R6/xc/lib/Xt\fR. The next step is to find the file
that contains \s10\f(CWXtMemmove\fR\s0. There is a possibility that it is
called \fIXtMemmove.c\fR, but in this case there is no such file. We'll have to
grep for it. Some versions of \fIgrep\fR have an option to descend recursively
into subdirectories, which can be very useful if you have one available.
Another useful tool is \fIcscope\fR, which is supplied with System V.
.\" Thanks for cscope, Jox.
.Ps
$ \f(CBgrep XtMemmove *.c\f(CW
Alloc.c:void _XtMemmove(dst, src, length)
Convert.c: XtMemmove(&p->from.addr, from->addr, from->size);
\fI\&... many more references to XtMemmove\f(CW
.Pe
So \s10\f(CWXtMemmove\fR\s0 is in \fIAlloc.c\fR. By the same method, we look
for the other functions mentioned in the stack trace and discover that we also
need to recompile \fIInitialize.c\fR and \fIDisplay.c\fR.
.XX "Alloc.c"
.XX "Initialize.c"
.XX "Display.c"
.LP
In order to compile debugging information, we add the compiler option
\s10\f(CW-g\fR\s0. At the same time, we remove \s10\f(CW-O\fR\s0. \fIgcc\fR
doesn't require this, but it's usually easier to debug a non-optimized program.
We have three choices of how to set the options:
.Ls B
.Li
.XX "make, World target"
.XX "World, make target"
We can modify the \fIMakefile\fR (\fImake World\fR, the main \fImake\fR target
for X11, rebuilds the \fIMakefile\fRs from the corresponding \fIImakefile\fRs,
so this is not overly dangerous).
.Li
If we have a working version of \fIxterm\fR, we can use its facilities: first we
start the compilation with \fImake\fR, but we don't need to wait for the
compilation to complete: as soon as the compiler invocation appears on the
screen, we abort the build with \s10\f(CWCTRL-C\fR\s0. Using the \fIxterm\fR
copy function, we copy the compiler invocation to the command line and add the
options we want:
.\" XXX I don't like the way this looks. I don't seem to be getting f(BI below,
.\" and it doesn't look too good anyway. Any ideas? How about 'reverse video'
.\" for the marked-up stuff?
.Ps
$ \f(CBrm Alloc.o Initialize.o Display.o\f(CW \fI\&remove the old objects\f(CW
$ \f(CBmake\f(CW \fI\&and start make normally\f(CW
rm -f Alloc.o
\f(BIgcc -DNO_ASM -fstrength-reduce -fpcc-struct-return -c -I../.. \e
-DNO_AF_UNIX -DSYSV -DSYSV386 -DUSE_POLL Alloc.c\f(CW
^C			 \fIinterrupt make with CTRL-C\f(CW
make: *** [Alloc.o] Interrupt
\fIcopy the invocation lines above with the mouse, and paste below, then
modify as shown in bold print\f(CW
$ gcc -DNO_ASM -fstrength-reduce -fpcc-struct-return -c -I../.. \e
-DNO_AF_UNIX -DSYSV -DSYSV386 -DUSE_POLL Alloc.c \f(CB-g\f(CW
.Pe
You can also use \s10\f(CWmake -n\fR\s0, which just shows the commands that
\s10\f(CWmake\fR\s0 would execute, rather than aborting the \s10\f(CWmake\fR\s0,
but you frequently find that \s10\f(CWmake -n\fR\s0 prints out a whole lot of
stuff you don't expect. When you have made \fIAlloc.o\fR, you can repeat the
process for the other two object files.
.Li
We could change \s10\f(CWCFLAGS\fR\s0 from the \fImake\fR command line. Our
first attempt doesn't work too well, though. If you compare the following line
with the invocation above, you'll see that a whole lot of options are missing.
They were all in \s10\f(CWCFLAGS\fR\s0; by redefining \s10\f(CWCFLAGS\fR\s0, we
lose them all:
.Ps
$ \f(CBmake CFLAGS=-g\f(CW
rm -f Alloc.o
gcc -DNO_ASM -fstrength-reduce -fpcc-struct-return -c -g Alloc.c
.Pe
.IP
\s10\f(CWCFLAGS\fR\s0 included all the compiler options starting from
\s10\f(CW-I../..\fR\s0, so we need to write:
.Ps
$ \f(CBmake CFLAGS='-g -c -I../.. -DNO_AF_UNIX -DSYSV -DSYSV386 -DUSE_POLL'\f(CW
.Pe
.Le
.XX "Alloc.o"
.XX "Initialize.o"
.XX "Display.o"
When we have created all three new object files, we can let \fImake\fR complete
the library for us. It will not try to remake these object files, since now
they are newer than any of their dependencies:
.Ps
$ \f(CBmake\f(CW \fI\&run make to build a new library\f(CW
rm -f libXt.a
ar clq libXt.a ActionHook.o Alloc.o ArgList.o Callback.o ClickTime.o Composite.o \e
Constraint.o Convert.o Converters.o Core.o Create.o Destroy.o Display.o Error.o \e
Event.o EventUtil.o Functions.o GCManager.o Geometry.o GetActKey.o GetResList.o \e
GetValues.o HookObj.o Hooks.o Initialize.o Intrinsic.o Keyboard.o Manage.o \e
NextEvent.o Object.o PassivGrab.o Pointer.o Popup.o PopupCB.o RectObj.o \e
Resources.o Selection.o SetSens.o SetValues.o SetWMCW.o Shell.o StringDefs.o \e
Threads.o TMaction.o TMgrab.o TMkey.o TMparse.o TMprint.o TMstate.o VarCreate.o \e
VarGet.o Varargs.o Vendor.o
ranlib libXt.a
rm -f ../../usrlib/libXt.a
cd ../../usrlib; ln ../lib/Xt/libXt.a .
$ 
.Pe
Now we have a copy of the X Toolkit in which these three files have been
compiled with symbols. Next, we need to rebuild \fIxterm\fR. That's
straightforward enough:
.Ps
$ \f(CBcd ../../programs/xterm/\f(CW
$ \f(CBpwd\f(CW
/X/X11R6/xc/programs/xterm
$ \f(CBmake\f(CW
rm -f xterm
gcc -DNO_ASM -fstrength-reduce -fpcc-struct-return -fwritable-strings -o xterm \e
-L../../usrlib main.o input.o charproc.o cursor.o util.o tabs.o screen.o \e
scrollbar.o button.o Tekproc.o misc.o VTPrsTbl.o TekPrsTbl.o data.o menu.o -lXaw \e
-lXmu -lXt -lSM -lICE -lXext -lX11 -L/usr/X11R6/lib -lpt -ltermlib
.Pe
.LP
Finally, we try again. Since the library is not in the current directory, we
use the \s10\f(CWdir\fR\s0 command to tell \fIgdb\fR where to find the sources.
Now we get:
.Ps
$ \f(CBgdb xterm\f(CW
(gdb) \f(CBdir ../../lib/X11\f(CW \fI\&set source paths\f(CW
Source directories searched:
/X/X11/X11R6/xc/programs/xterm/../../lib/X11:$cdir:$cwd
(gdb) \f(CBdir ../../lib/Xt\f(CW
Source directories searched:
/X/X11/X11R6/xc/programs/xterm/../../lib/Xt/X/X11/X11R6/xc/programs/xterm/../..\e
/lib/X11:$cdir:$cwd
(gdb) \f(CBr\f(CW \fI\&and run the program\f(CW
Starting program: /X/X11/X11R6/xc/programs/xterm/xterm 
Program received signal SIGBUS, Bus error.
0x3ced6 in _XtMemmove (dst=0x342d8 "\(-DE003‰~", src=0x41c800 "", length=383) \e
at Alloc.c:101
101 *dst++ = *src++;
(gdb) 
.Pe
This shows a typical byte for byte memory move. About the only thing that could
cause a bus error on that statement would be an invalid address, but the
parameters show that they appear to be valid.
.LP
There are at least two possible gotchas here:
.Ls B
.Li
The debugger may be lying. The parameters it shows are the parameters on the
stack. If the code has been optimized, there is a very good chance that the
source and destination addresses are stored in registers, and thus the value of
\s10\f(CWdst\fR\s0 on the stack is not up to date.
.Li
The destination address may be in the text segment, in which case an attempt to
write to it will cause some kind of error. Depending on the system it could be
a segmentation violation or a bus error.
.Le
The most reliable way to find out what is really going on is to look at the
machine instructions being executed. First we tell the debugger to look at the
current instruction and the following five instructions:
.Ps
(gdb) \f(CBx/6i $eip\f(CW \fI\&list the next 6 instructions\f(CW
0x3ced6 <_XtMemmove+74>: movb %al,(%edx)
0x3ced8 <_XtMemmove+76>: incl 0xc(%ebp)
0x3cedb <_XtMemmove+79>: incl 0x8(%ebp)
0x3cede <_XtMemmove+82>: jmp 0x3cec2 <_XtMemmove+54>
0x3cee0 <_XtMemmove+84>: leave 
0x3cee1 <_XtMemmove+85>: ret 
.Pe
The first instruction is a byte move, from register \s10\f(CWal\fR\s0 to the
address stored in register \s10\f(CWedx\fR\s0. Let's look at the address in
\s10\f(CWedx\fR\s0:
.Ps
(gdb) \f(CBp/x $edx\f(CW
$9 = 0x342d8
.Pe
Well, this is our \s10\f(CWdst\fR\s0 address alright--why can't it store there?
It would be nice to be able to try to set values in memory and see if the
debugger can do it:
.Ps
(gdb) \f(CBset *dst = 'X'\f(CW
(gdb) \f(CBp *dst\f(CW
$13 = 88 'X'
.Pe
That looks writable enough. Unfortunately, you can't rely on the debugger to
tell the truth. Debuggers must be able to write to the text segment. If the
write had failed, you could have been sure that the address was not writable,
but if the write succeeds, you can't be sure. What we need to know are the
exact segment limits. Some debuggers show you the segment limits, but current
versions of \fIgdb\fR do not. An alternative is the \fIsize\fR command:
.XX "size"
.Ps
$ \f(CBsize xterm\f(CW
text data bss dec hex filename
846204 56680 23844 926728 e2408 xterm
.Pe
The text segment is 846204 decimal bytes long (0xce97c), and on this system (SCO
UNIX) it starts at address 0, so the address is, indeed, in the text segment.
But where did it come from? To find an answer to that question, we need to look at
the calling function. In \fIgdb\fR, we do this with the \s10\f(CWframe\fR\s0
command:
.Ps
(gdb) \f(CBf 1		\fIlook at the calling function (frame 1)\f(CW
#1 0x35129 in _MergeOptionTables (src1=0x342d8, num_src1=24, 
 src2=0x400ffe, num_src2=64, dst=0x7ffff9c0, num_dst=0x7ffff9bc)
 at Initialize.c:602
602 (void) memmove(table, src1, sizeof(XrmOptionDescRec) * num_src1 );
.Pe
That's funny--last time it died, the function was called from
\f(CWXtScreenDatabase\fR,\**
.FS
See frame 1 in the stack trace on page \*[first-xterm-death].
.FE
not from \f(CW_MergeOptionTables\fR. Why? At the moment it's difficult to say
for sure, but it's possible that this difference happened because we removed
optimization. In any case, we still have a problem, so we should fix this one
first and then go back and look for the other one if solving this problem isn't
enough.
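.LP
To return briefly to the segment check we made with \fIsize\fR: the comparison
is simple enough to write down as a C sketch. The function name
\s10\f(CWin_segment\fR\s0 is hypothetical, and the start address of 0 for the
text segment is the value quoted above for this particular system, not a
general rule:

```c
#include <stdint.h>

/* Return nonzero if addr lies in a segment that starts at start and is
 * size bytes long.  The subtraction is safe because it is only evaluated
 * when addr >= start. */
int in_segment(uintptr_t addr, uintptr_t start, uintptr_t size)
{
    return addr >= start && addr - start < size;
}
```

With the text size of 846204 bytes reported above,
\s10\f(CWin_segment(0x342d8, 0, 846204)\fR\s0 is true for the suspect
destination address, and \s10\f(CWin_segment(0x41c800, 0, 846204)\fR\s0 is
false for the source address.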
.LP
In this case, the \fIframe\fR command doesn't help much, but it does tell us
that the destination variable is called \s10\f(CWtable\fR\s0, and implicitly
that \s10\f(CWmemmove\fR\s0 has been defined as \f(CW_XtMemmove\fR in this
source file. We could now look at the source file in an editor in a different X
window, but it's easier to list the source lines around the current line with
the \f(CWlist\fR command:
.Ps
(gdb) \f(CBl\f(CW
597 enum {Check, NotSorted, IsSorted} sort_order = Check;
598 
599 *dst = table = (XrmOptionDescRec*)
600 XtMalloc( sizeof(XrmOptionDescRec) * (num_src1 + num_src2) );
601 
602 (void) memmove(table, src1, sizeof(XrmOptionDescRec) * num_src1 );
603 if (num_src2 == 0) {
604 *num_dst = num_src1;
605 return;
606 }
.Pe
So, the address is returned by the function \f(CWXtMalloc\fR--it seems to be
allocating storage in the text segment. At this point, we could examine it more
carefully, but let's first be sure that we're looking at the right problem. The
address in \s10\f(CWtable\fR\s0 should be the same as the address in the
parameter \s10\f(CWdst\fR\s0 of \s10\f(CW_XtMemmove\fR\s0. We're currently
examining the environment of \s10\f(CW_MergeOptionTables\fR\s0, so we can look
at it directly:
.Ps
(gdb) \f(CBp table\f(CW
$29 = (XrmOptionDescRec *) 0x41c800
.Pe
That looks just fine. Where did this strange \s10\f(CWdst\fR\s0 address come
from? Let's set a breakpoint on the call to \s10\f(CWmemmove\fR\s0 on line 602,
and then restart the program:
.Xs
(gdb) \f(CBb 602\f(CW
Breakpoint 8 at 0x35111: file Initialize.c, line 602.
(gdb) \f(CBr\f(CW
The program being debugged has been started already.
Start it from the beginning? (y or n) \f(CBy\f(CW
Starting program: /X/X11/X11R6/xc/programs/xterm/xterm 
Breakpoint 8, _MergeOptionTables (src1=0x342d8, num_src1=24, 
 src2=0x400ffe, num_src2=64, dst=0x7ffff9c0, num_dst=0x7ffff9bc)
 at Initialize.c:602
602 (void) memmove(table, src1, sizeof(XrmOptionDe
(gdb) \f(CBp table\f(CW \fI\&look again, to be sure\f(CW
$31 = (XrmOptionDescRec *) 0x41c800
(gdb) \f(CBs\f(CW \fI\&single step into memmove\f(CW
_XtMemmove (dst=0x342d8 "\(-DE003‰~", src=0x41c800 "", length=384)
 at Alloc.c:94
94 if (src < dst) {
.Xe
This is really strange! \s10\f(CWtable\fR\s0 has a valid address in the data
segment, but the address we pass to \s10\f(CW_XtMemmove\fR\s0 is in the text
segment and seems unrelated. It's not clear what we should look at next:
.Ls B
.Li
The source of the function calls \s10\f(CWmemmove\fR\s0, but after preprocessing
it ends up calling \s10\f(CW_XtMemmove\fR\s0. \s10\f(CWmemmove\fR\s0 might
simply be defined as \s10\f(CW_XtMemmove\fR\s0, but it might also be defined
with parameters, in which case some subtle type conversions might result in our
problem.
.Li
If you understand the assembler of the system, it might be instructive to look
at the actual instructions that the compiler produces.
.Le
It's definitely quicker to look at the assembler instructions than to fight your
way through the thick undergrowth in the X11 source tree:
.Ps
(gdb) \f(CBx/8i $eip\f(CW \fI\&look at the next 8 instructions\f(CW
0x35111 <_MergeOptionTables+63>: movl 0xc(%ebp),%edx
0x35114 <_MergeOptionTables+66>: movl %edx,0xffffffd8(%ebp)
0x35117 <_MergeOptionTables+69>: movl 0xffffffd8(%ebp),%edx
0x3511a <_MergeOptionTables+72>: shll $0x4,%edx
0x3511d <_MergeOptionTables+75>: pushl %edx
0x3511e <_MergeOptionTables+76>: pushl 0xfffffffc(%ebp)
0x35121 <_MergeOptionTables+79>: pushl 0x8(%ebp)
0x35124 <_MergeOptionTables+82>: call 0x3ce8c <_XtMemmove>
.Pe
This isn't easy stuff to handle, but it's worth understanding, so we'll pull it
apart, instruction for instruction. It's easier to understand this discussion
if you refer to the diagrams of stack structure in \*[chobj], page
\*[complete-stack].
.Ls B
.Li
\s10\f(CWmovl 0xc(%ebp),%edx\fR\s0 takes the content of the stack word offset 12
in the current stack frame and places it in register \s10\f(CWedx\fR\s0. As we
have seen, this is \s10\f(CWnum_src1\fR\s0, the second parameter passed to
\s10\f(CW_MergeOptionTables\fR\s0.
.Li
\s10\f(CWmovl %edx,0xffffffd8(%ebp)\fR\s0 stores the value of \s10\f(CWedx\fR\s0
at offset -40 in the current stack frame. This is for temporary storage.
.Li
\s10\f(CWmovl 0xffffffd8(%ebp),%edx\fR\s0 does \fIexactly\fR the opposite: it
loads register \s10\f(CWedx\fR\s0 from the location where it just stored it.
These two instructions are completely redundant. They are also a sure sign that
the function was compiled without optimization.
.Li
\s10\f(CWshll $0x4,%edx\fR\s0 shifts the contents of register \s10\f(CWedx\fR\s0
left by 4 bits, multiplying it by 16. If we compare this to the source, it's
evident that the size of \s10\f(CWXrmOptionDescRec\fR\s0 is 16 bytes, and that the
compiler has taken a short cut to evaluate the third parameter of the call.
.Li
\s10\f(CWpushl %edx\fR\s0 pushes the contents of \s10\f(CWedx\fR\s0 onto the
stack.
.Li
\s10\f(CWpushl 0xfffffffc(%ebp)\fR\s0 pushes the value of the word at offset -4
in the current stack frame onto the stack. This is the value of
\s10\f(CWtable\fR\s0, as we can confirm by looking at the instructions generated
for the previous line.
.Li
\s10\f(CWpushl 0x8(%ebp)\fR\s0 pushes the value of the first parameter,
\s10\f(CWsrc1\fR\s0, onto the stack.
.Li
Finally, \s10\f(CWcall _XtMemmove\fR\s0 calls the function. Expressed in C, we
now know that it calls
.Ps
memmove (src1, table, num_src1 << 4);
.Pe
.Le
This is, of course, wrong: the parameter sequence of source and destination has
been reversed. Let's look at \s10\f(CW_XtMemmove\fR\s0 more carefully:
.Ps
(gdb) \f(CBl _XtMemmove\f(CW
89      #ifdef _XNEEDBCOPYFUNC
90      void _XtMemmove(dst, src, length)
91          char *dst, *src;
92          int length;
93      {
94          if (src < dst) {
95              dst += length;
96              src += length;
97              while (length--)
98                  *--dst = *--src;
99          } else {
100             while (length--)
101                 *dst++ = *src++;
102         }
103     }
104     #endif
.Pe
Clearly the function parameters are the same as those of
\s10\f(CWmemmove\fR\s0, but the calling sequence has reversed them. We've found
the problem, but we haven't found what's causing it.
.LP
\fIAside\fR: Debugging is not an exact science. We've found our problem, though
we still don't know what's causing it. But looking back at
.Ref e , p
we see that the address for \s10\f(CWsrc\fR\s0 on entering
\s10\f(CW_XtMemmove\fR\s0 was the same as the address of \s10\f(CWtable\fR\s0.
That tells us as much as analyzing the machine code did. This will happen again
and again: after you find a problem, you discover you did it the hard way.
.LP
The next thing we need to figure out is why the compiler reversed the sequence
of the parameters. Can this be a compiler bug? Theoretically, yes, but it's
very unlikely that such a primitive bug should go undiscovered up to now.
.LP
Remember that the compiler does not compile the sources you see: it compiles
whatever the preprocessor hands to it. It makes a lot of sense to look at the
preprocessor output. To do this, we go back to the library directory. Since we
used \s10\f(CWpushd\fR\s0, this is easy--just enter \s10\f(CWpushd\fR\s0.
In the library, we use the same trick as before in order to run the compiler
with different options, only this time we use the options \s10\f(CW-E\fR\s0
(stop after running the preprocessor), \s10\f(CW-dD\fR\s0 (retain the text of
the definitions in the preprocessor output), and \s10\f(CW-C\fR\s0 (retain
comments in the preprocessor output). In addition, we output to a file
\fIjunk.c\fR:
.Ps
$ \f(CBpushd\f(CW
$ \f(CBrm Initialize.o\f(CW
$ \f(CBmake Initialize.o\f(CW
rm -f Initialize.o
gcc -DNO_ASM -fstrength-reduce -fpcc-struct-return -c -g -I../.. \\
    -D_SVID -DNO_AF_UNIX -DSYSV -DSYSV386 -DUSE_POLL Initialize.c
make: *** [Initialize.o] Interrupt        \fI\&hit CTRL-C\f(CW
\fI\&... copy the command into the command line, and extend:\f(CW
$ \f(CWgcc -DNO_ASM -fstrength-reduce -fpcc-struct-return -c -g -I../.. \\
  -D_SVID -DNO_AF_UNIX -DSYSV -DSYSV386 -DUSE_POLL Initialize.c \\
  \f(CB-E -dD -C>junk.c
$
.Pe
As you might have guessed, we now look at the file \fIjunk.c\fR with an editor.
We're looking for \s10\f(CWmemmove\fR\s0, of course. We find a definition in
\fI/usr/include/string.h\fR, then later on we find, in
\fI/X/X11/X11R6/xc/X11/Xfuncs.h\fR,
.Ps
#define memmove(dst,src,len) bcopy((char *)(src),(char *)(dst),(int)(len))
#define memmove(dst,src,len) _XBCOPYFUNC((char *)(src),(char *)(dst),(int)(len))
#define _XNEEDBCOPYFUNC
.Pe
For some reason, the configuration files have decided that
\s10\f(CWmemmove\fR\s0 is not defined on this system, and have replaced it with
\s10\f(CWbcopy\fR\s0 (which really is not defined on this system). Then they
replace it with the substitute function \s10\f(CW_XBCOPYFUNC\fR\s0, almost
certainly a preprocessor definition. They also define the preprocessor variable
\s10\f(CW_XNEEDBCOPYFUNC\fR\s0 to indicate that \s10\f(CW_XtMemmove\fR\s0 should
be compiled.
.LP
Unfortunately, we don't see what happens with \s10\f(CW_XNEEDBCOPYFUNC\fR\s0.
The preprocessor discards all \s10\f(CW#ifdef\fR\s0 lines. It does include
\s10\f(CW#define\fR\s0s, however, so we can look for where
\s10\f(CW_XBCOPYFUNC\fR\s0 is defined--it's in \fIIntrinsicI.h\fR, as the last
\s10\f(CW#line\fR\s0 directive before the definition indicates.
.Ps
#define _XBCOPYFUNC _XtMemmove
.Pe
\fIIntrinsicI.h\fR also contains a number of definitions for
\s10\f(CWXtMemmove\fR\s0, none of which are used in the current environment, but
all of which have the parameter sequence \s10\f(CW(dst, src, count)\fR\s0.
\s10\f(CWbcopy\fR\s0 has the parameter sequence \s10\f(CW(src, dst,
count)\fR\s0. Clearly, somebody has confused something in this header file, and
under certain rare circumstances the call is defined with the incorrect
parameter sequence.
.LP
Somewhere in here is a lesson to be learnt: this is a real bug that occurred in
X11R6, patch level 3, one of the most reliable and most portable software
packages available, yet here we have a really primitive bug. The real problem
lies in the configuration mechanism: automated configuration can save a lot of
time in normal circumstances, but it can also cause lots of pain if it makes
incorrect assumptions. In this case, the environment was unusual: the kernel
platform was SCO UNIX, which has an old-fashioned library, but the library was
GNU \fIlibc\fR. This caused the assumptions of the configuration mechanism to
break down.
.LP
Let's look more carefully at the part of \fIXfuncs.h\fR where we found the
definitions:
.Ps
/* the new Xfuncs.h */
#if !defined(X_NOT_STDC_ENV) && (!defined(sun) || defined(SVR4))
/* the ANSI C way */
#ifndef _XFUNCS_H_INCLUDED_STRING_H
#include <string.h>
#endif
#undef bzero
#define bzero(b,len) memset(b,0,len)
#else /* else X_NOT_STDC_ENV or SunOS 4 */
#if defined(SYSV) || defined(luna) || defined(sun) || defined(__sxg__)
#include <memory.h>
#define memmove(dst,src,len) bcopy((char *)(src),(char *)(dst),(int)(len))
#if defined(SYSV) && defined(_XBCOPYFUNC)
#undef memmove
#define memmove(dst,src,len) _XBCOPYFUNC((char *)(src),(char *)(dst),(int)(len))
#define _XNEEDBCOPYFUNC
#endif
#else /* else vanilla BSD */
#define memmove(dst,src,len) bcopy((char *)(src),(char *)(dst),(int)(len))
#define memcpy(dst,src,len) bcopy((char *)(src),(char *)(dst),(int)(len))
#define memcmp(b1,b2,len) bcmp((char *)(b1),(char *)(b2),(int)(len))
#endif /* SYSV else */
#endif /* ! X_NOT_STDC_ENV else */
.Pe
This is hairy (and incorrect) stuff. It makes its decisions based on the
variables \s10\f(CWX_NOT_STDC_ENV\fR\s0, \s10\f(CWsun\fR\s0,
\s10\f(CWSVR4\fR\s0, \s10\f(CWSYSV\fR\s0, \s10\f(CWluna\fR\s0,
\s10\f(CW__sxg__\fR\s0 and \s10\f(CW_XBCOPYFUNC\fR\s0. These are the decisions:
.Ls B
.Li
If \s10\f(CWX_NOT_STDC_ENV\fR\s0 is \fInot\fR defined, it assumes ANSI C, unless
this is a pre-SVR4 Sun machine.
.Li
Otherwise it checks the variables \s10\f(CWSYSV\fR\s0 (for System V.3),
\s10\f(CWluna\fR\s0, \s10\f(CWsun\fR\s0 or \s10\f(CW__sxg__\fR\s0. If any of
these are set, it includes the file \fImemory.h\fR and defines
\s10\f(CWmemmove\fR\s0 in terms of \s10\f(CWbcopy\fR\s0. If both
\s10\f(CWSYSV\fR\s0 and \s10\f(CW_XBCOPYFUNC\fR\s0 are defined, it redefines
\s10\f(CWmemmove\fR\s0 as \s10\f(CW_XBCOPYFUNC\fR\s0, reversing the parameters
as it goes.
.Li
If none of these conditions apply, it assumes a vanilla BSD machine and defines
the functions \s10\f(CWmemmove\fR\s0, \s10\f(CWmemcpy\fR\s0 and
\s10\f(CWmemcmp\fR\s0 in terms of the BSD functions \s10\f(CWbcopy\fR\s0 and
\s10\f(CWbcmp\fR\s0.
.Le
There are two errors here:
.Ls B
.Li
The only way that \s10\f(CW_XBCOPYFUNC\fR\s0 is ever defined is as
\s10\f(CW_XtMemmove\fR\s0, which does \fInot\fR have the same parameter sequence
as \s10\f(CWbcopy\fR\s0--instead, it has the same parameter sequence as
\s10\f(CWmemmove\fR\s0. We can fix this part of the header by changing the
definition line to
.Ps
#define memmove(dst,src,len) _XBCOPYFUNC((char *)(dst),(char *)(src),(int)(len))
.Pe
.IP
or even to
.Ps
#define memmove _XBCOPYFUNC
.Pe
.Li
There is no reason to assume that this system does not use ANSI C: it's using
\fIgcc\fR and GNU \fIlibc.a\fR, both of them very much standard compliant. We
need to examine this point in more detail:
.Le
Going back to our \fIjunk.c\fR, we search for \s10\f(CWX_NOT_STDC_ENV\fR\s0 and
find it defined at line 85 of \fI/X/X11/X11R6/xc/X11/Xosdefs.h\fR:
.Ps
#ifdef SYSV386
#ifdef SYSV
#define X_NOT_POSIX
#define X_NOT_STDC_ENV
#endif
#endif
.Pe
In other words, this bug is likely to occur only with System V.3 implementations
on Intel architecture. This is a fairly typical way to make decisions about the
system, but it is wrong: \s10\f(CWX_NOT_STDC_ENV\fR\s0 relates to a compiler,
not an operating system, but both \s10\f(CWSYSV386\fR\s0 and \s10\f(CWSYSV\fR\s0
define operating system characteristics. At first sight it would seem logical
to modify the definitions like this:
.Ps
#ifdef SYSV386
#ifdef SYSV
#ifndef __GNU_LIBRARY__
#define X_NOT_POSIX
#endif
#ifndef __GNUC__
#define X_NOT_STDC_ENV
#endif
#endif
#endif
.Pe
This would define \s10\f(CWX_NOT_POSIX\fR\s0 only if the library is not GNU
\fIlibc\fR, and \s10\f(CWX_NOT_STDC_ENV\fR\s0 only if the compiler is not
\fIgcc\fR. This is still not correct: the relationship between
\s10\f(CW__GNUC__\fR\s0 and \s10\f(CWX_NOT_STDC_ENV\fR\s0, or between
\s10\f(CW__GNU_LIBRARY__\fR\s0 and \s10\f(CWX_NOT_POSIX\fR\s0, has nothing to do
with System V or the Intel architecture. Instead, it makes more sense to
backtrack at the end of the file:
.Ps
#ifdef __GNU_LIBRARY__
#undef X_NOT_POSIX
#endif
#ifdef __GNUC__
#undef X_NOT_STDC_ENV
#endif
.Pe
Whichever way we look at it, this is a mess. We're applying cosmetic patches to
a configuration mechanism which is based on incorrect assumptions. Until some
better configuration mechanism comes along, unfortunately, we're stuck with this
situation.
.Bh "Limitations of debuggers"
Debuggers are useful tools, but they have their limitations. Here are a couple
which could cause you problems:
.Ch "Can't breakpoint beyond fork"
.XX "debugging, past fork"
.XX "fork, debugging past"
.Pn fork-debugging
UNIX packages frequently start multiple processes to do the work on hand.
Frequently enough, the program that you start does nothing more than spawn a
number of other processes and wait for them to stop. Unfortunately, the
\s10\f(CWptrace\fR\s0 interface which debuggers use requires the process to be
started by the debugger. Even in SunOS 4, where you can attach the debugger to
a process that is already running, there is no way to monitor it from the start.
Other systems don't offer even this facility. In some cases you can determine
how the process was started and start it with the debugger in the same manner.
This is not always possible--for example, many child processes communicate with
their parent.
.Ch "Terminal logs out"
The debugger usually shares a terminal with the program being tested. If the
program changes the driver configuration, the debugger should change it back
again when it gains control (for example, on hitting a breakpoint), and set it
back to the way the program set it before continuing. In some cases, however,
it can't: if the process has taken ownership of the terminal with a system call
like \fIsetsid\fR (see \*[chkdepend], page \*[setsid]), the debugger will no
longer have access to the terminal. Under these circumstances, most debuggers
crawl into a
corner and die. Then the shell in control of the terminal awakes and dies too.
If you're running in an \fIxterm\fR, the \fIxterm\fR then stops; if you're
running on a glass tty, you will be logged out.
.LP
The best way out of this dilemma is to start the child process on a different
terminal, if your debugger and your hardware configuration support it. To do
this with an \fIxterm\fR requires starting a program which just sleeps, so that
the window stays open until you can start your test program:
.Ps
$ \f(CBxterm -e sleep 100000&\f(CW
[1] 27013
$ \f(CBps aux|grep sleep\f(CW
grog 27025 3.0 0.0 264 132 p6 S+ 1:13PM 0:00.03 grep sleep
root 27013 0.0 0.0 1144 740 p6 I 1:12PM 0:00.37 xterm -e sleep 100000
grog 27014 0.0 0.0 100 36 p8 Is+ 1:12PM 0:00.06 sleep 100000
$ \f(CBgdb myprog\f(CW
(gdb) \f(CBr < /dev/ttyp8 > /dev/ttyp8\f(CW
.Pe
This example was done on a BSD machine. On a System V machine you will need to
use \fIps -ef\fR instead of \fIps aux\fR. First, you start an \fIxterm\fR with
\fIsleep\fR as controlling shell (so that it will stay there). With \fIps\fR
you grep for the controlling terminal of the \fIsleep\fR process (the third line
in the example), and then you start your program with \fIstdin\fR and
\fIstdout\fR redirected to this terminal.
.Ch "Can't interrupt process"
The \fIptrace\fR interface uses the signal \s10\f(CWSIGTRAP\fR\s0 to communicate
with the process being debugged. What happens if you block this signal, or
ignore it? Nothing--the debugger doesn't work any more. It's bad practice to
block \s10\f(CWSIGTRAP\fR\s0, of course, but it can be done. More frequently,
though, you'll encounter this problem when a process gets stuck in a signal
processing loop and doesn't get round to processing the
\s10\f(CWSIGTRAP\fR\s0--precisely one of the times when you would want to
interrupt it. My favourite one is the program which had a
\s10\f(CWSIGSEGV\fR\s0 handler which went and retried the instruction.
Unfortunately, the only signal to which a process in this state will still
respond is \s10\f(CWSIGKILL\fR\s0, which doesn't help you much in finding out
what's going on.
.Ah "Tracing system calls"
.XX "tracing, system calls"
.XX "system calls, tracing"
An alternative approach is to divide the program between system code and user
code. Most systems have the ability to trace the parameters supplied to each
system call and the results that they return. This is not nearly as good as
using a debugger, but it works with all object files, even if they don't have
symbols, and it can be very useful when you're trying to figure out why a
program doesn't open a specific file. 
.LP
Tracing is a very system-dependent function, and there are a number of different
programs to perform the trace: \fItruss\fR runs on System V.4, \fIktrace\fR runs
on BSD NET/2 and 4.4BSD derived systems, and \fItrace\fR runs on SunOS 4. They
vary significantly in their features. We'll look briefly at each. Other
systems supply still other programs--for example, SGI's IRIX operating system
supplies the program \fIpar\fR, which offers similar functionality.
.XX "par, program"
.XX "program, par"
.Bh "trace"
.XX "trace"
\fItrace\fR is a relatively primitive tool supplied with SunOS 4 systems. It
can either start a process or attach to an existing process, and it can print
summary information or a detailed trace. In particular, it \fIcannot\fR trace
the child of a \s10\f(CWfork\fR\s0 call, which is a great disadvantage. Here's
an example of \fItrace\fR output with a possibly recognizable program:
.Ps
$ \f(CBtrace hello\f(CW
open ("/usr/lib/ld.so", 0, 040250) = 3
read (3, "".., 32) = 32
mmap (0, 40960, 0x5, 0x80000002, 3, 0) = 0xf77e0000
mmap (0xf77e8000, 8192, 0x7, 0x80000012, 3, 32768) = 0xf77e8000
open ("/dev/zero", 0, 07) = 4
getrlimit (3, 0xf7fff488) = 0
mmap (0xf7800000, 8192, 0x3, 0x80000012, 4, 0) = 0xf7800000
close (3) = 0
getuid () = 1004
getgid () = 1000
open ("/etc/ld.so.cache", 0, 05000100021) = 3
fstat (3, 0xf7fff328) = 0
mmap (0, 4096, 0x1, 0x80000001, 3, 0) = 0xf77c0000
close (3) = 0
open ("/opt/lib/gcc-lib/sparc-sun-sunos".., 0, 01010525) = 3
fstat (3, 0xf7fff328) = 0
getdents (3, 0xf7800108, 4096) = 212
getdents (3, 0xf7800108, 4096) = 0
close (3) = 0
open ("/opt/lib", 0, 056) = 3
getdents (3, 0xf7800108, 4096) = 264
getdents (3, 0xf7800108, 4096) = 0
close (3) = 0
open ("/usr/lib/libc.so.1.9", 0, 023170) = 3
read (3, "".., 32) = 32
mmap (0, 458764, 0x5, 0x80000002, 3, 0) = 0xf7730000
mmap (0xf779c000, 16384, 0x7, 0x80000012, 3, 442368) = 0xf779c000
close (3) = 0
open ("/usr/lib/libdl.so.1.0", 0, 023210) = 3
read (3, "".., 32) = 32
mmap (0, 16396, 0x5, 0x80000002, 3, 0) = 0xf7710000
mmap (0xf7712000, 8192, 0x7, 0x80000012, 3, 8192) = 0xf7712000
close (3) = 0
close (4) = 0
getpagesize () = 4096
brk (0x60d8) = 0
brk (0x70d8) = 0
ioctl (1, 0x40125401, 0xf7ffea8c) = 0
write (1, "Hello, World!\n", 14) = Hello, World!
14
close (0) = 0
close (1) = 0
close (2) = 0
exit (1) = ?
.Pe
What's all this output? All we did was a simple write, but we have performed a
total of 43 system calls. This shows in some detail how much the viewpoint of
the world differs when you're on the other side of the system library. This
program, which was run on a SPARCstation 2 with SunOS 4.1.3, first sets up the
shared libraries (the sequences of \s10\f(CWopen\fR\s0, \s10\f(CWread\fR\s0,
\s10\f(CWmmap\fR\s0, and \s10\f(CWclose\fR\s0), then initializes the
\s10\f(CWstdio\fR\s0 library (the calls to \s10\f(CWgetpagesize\fR\s0,
\s10\f(CWbrk\fR\s0, \s10\f(CWioctl\fR\s0, and \s10\f(CWfstat\fR\s0), and finally
writes to \fIstdout\fR and exits. It also looks strange that it closed
\fIstdin\fR before writing the output text: again, this is a matter of
perspective. The \s10\f(CWstdio\fR\s0 routines buffer the text, and it didn't
actually get written until the process exited, just before closing \fIstdout\fR.
.Bh "ktrace"
.XX "ktrace"
.XX "ktrace.out, file"
.XX "file, ktrace.out"
.XX "kdump"
\fIktrace\fR is supplied with newer BSD systems. Unlike the other trace
programs, it writes unformatted data to a log file (by default,
\fIktrace.out\fR), and you need to run another program, \fIkdump\fR, to display
the log file. It has the following options:
.Ls B
.Li
It can trace the descendants of the process it is tracing. This is particularly
useful when the bug occurs in large complexes of processes, and you don't even
know which process is causing the problem.
.Li
It can attach to processes that are already running. Optionally, it can also
attach to existing children of the processes to which it attaches.
.Li
It can restrict the trace to broad classes of events: system calls, namei
translations (translation of file name to inode number), I/O, and signal
processing.
.Le
Here's an example of \fIktrace\fR running against the same program:
.Ps
$ \f(CBktrace hello\f(CW
Hello, World!
$ \f(CBkdump\f(CW
 20748 ktrace RET ktrace 0
 20748 ktrace CALL getpagesize
 20748 ktrace RET getpagesize 4096/0x1000
 20748 ktrace CALL break(0xadfc)
 20748 ktrace RET break 0
 20748 ktrace CALL break(0xaffc)
 20748 ktrace RET break 0
 20748 ktrace CALL break(0xbffc)
 20748 ktrace RET break 0
 20748 ktrace CALL execve(0xefbfd148,0xefbfd5a8,0xefbfd5b0)
 20748 ktrace NAMI "./hello"
 20748 hello RET execve 0
 20748 hello CALL fstat(0x1,0xefbfd2a4)
 20748 hello RET fstat 0
 20748 hello CALL getpagesize
 20748 hello RET getpagesize 4096/0x1000
 20748 hello CALL break(0x7de4)
 20748 hello RET break 0
 20748 hello CALL break(0x7ffc)
 20748 hello RET break 0
 20748 hello CALL break(0xaffc)
 20748 hello RET break 0
 20748 hello CALL ioctl(0x1,TIOCGETA,0xefbfd2e0)
 20748 hello RET ioctl 0
 20748 hello CALL write(0x1,0x8000,0xe)
 20748 hello GIO fd 1 wrote 14 bytes
 "Hello, World!
 "
 20748 hello RET write 14/0xe
 20748 hello CALL exit(0xe)
.Pe
This display contains the following information in columnar format:
.Ls N
.Li
The process ID of the process.
.Li
The name of the program from which the process was started. We can see that the
name changes after the call to \s10\f(CWexecve\fR\s0.
.Li
The kind of event. \s10\f(CWCALL\fR\s0 is a system call, \s10\f(CWRET\fR\s0 is a
return value from a system call, \s10\f(CWNAMI\fR\s0 is a system internal call
to the function \s10\f(CWnamei\fR\s0, which determines the inode number for a
pathname, and \s10\f(CWGIO\fR\s0 is a system internal I/O call.
.Li
The parameters to the call.
.Le
In this trace, run on an Intel 486 with BSD/OS 1.1, we can see a significant
difference from SunOS: there are no shared libraries. Even though each system
call produces two lines of output (the call and the return value), the output is
much shorter.
.Bh "truss"
.XX "truss"
\fItruss\fR, the System V.4 trace facility, offers the most features:
.Ls B
.Li
It can print statistical information instead of a trace.
.Li
It can display the argument and environment strings passed to each call to
\s10\f(CWexec\fR\s0.
.Li
It can trace the descendants of the process it is tracing.
.Li
Like \fIktrace\fR, it can attach to processes which are already running and
optionally attach to existing children of the processes to which it attaches.
.Li
It can trace specific system calls, signals, and interrupts (called \fIfaults\fR
in System V terminology). This is a very useful feature: as we saw in the
\fIktrace\fR example above, the C library may issue a surprising number of
system calls.
.Le
Here's an example of \fItruss\fR output:
.Ps
$ \f(CBtruss -f hello\f(CW
511: execve("./hello", 0x08047834, 0x0804783C) argc = 1
511: getuid() = 1004 [ 1004 ]
511: getuid() = 1004 [ 1004 ]
511: getgid() = 1000 [ 1000 ]
511: getgid() = 1000 [ 1000 ]
511: sysi86(SI86FPHW, 0x80036058, 0x80035424, 0x8000E255) = 0x00000000
511: ioctl(1, TCGETA, 0x08046262) = 0
Hello, World!
511: write(1, " H e l l o , W o r l d".., 14) = 14
511: _exit(14)
.Pe
\fItruss\fR offers a lot of choice in the amount of detail it can display. For
example, you can select a verbose parameter display of individual system calls.
If we're interested in the parameters to the \s10\f(CWioctl\fR\s0 call, we can
enter:
.Ps
$ \f(CBtruss -f -v ioctl hello\f(CW
\fI\&...\f(CW
516: ioctl(1, TCGETA, 0x08046262) = 0
516: iflag=0004402 oflag=0000005 cflag=0002675 lflag=0000073 line=0
516: cc: 177 003 010 030 004 000 000 000
.Pe
.XX "termio"
In this case, \fItruss\fR shows the contents of the \fItermio\fR structure
associated with the \s10\f(CWTCGETA\fR\s0 request--see \*[chterm], pages
\*[termios] and \*[TCGETA], for the interpretation of this information.
.Bh "Tracing through fork"
.XX "tracing, through fork"
.XX "fork, tracing through"
We've seen that \fIktrace\fR and \fItruss\fR can both trace the child of a
\s10\f(CWfork\fR\s0 system call. This is invaluable: as we saw on page
\*[fork-debugging], debuggers can't do this.
.LP
Unfortunately, SunOS \fItrace\fR doesn't support tracing through
\s10\f(CWfork\fR\s0. \fItruss\fR does it better than \fIktrace\fR. In extreme
cases (like debugging a program of this nature on SunOS 4, where there is no
support for trace through \s10\f(CWfork\fR\s0), you might find it an advantage
to port to a different machine running an operating system such as Solaris 2 in
order to be able to test with \fItruss\fR. Of course, Murphy's law says that
the bug won't show up under Solaris 2.
.Bh "Tracing network traffic"
.XX "tracing, network traffic"
.XX "network traffic, tracing"
Another place where we can trace is at the network interface. Many processes
communicate across the network, and if we have tools to look at this
communication, they may help us isolate the part of the package that is causing
the problem.
.LP
Two programs trace message flow across a network:
.Ls B
.Li
.XX "tcpdump"
.XX "Berkeley Packet Filter"
On BSD systems, \fItcpdump\fR and the \fIBerkeley Packet Filter\fR provide a
flexible means of tracing traffic across Internet domain sockets. See
\*[appsource] for availability.
.Li
.XX "trpt"
\fItrpt\fR will print a trace from a socket marked for debugging. This function
is available on System V.4 as well, though it is not clear what use it is under
these circumstances, since System V.4 emulates sockets in a library module. On
BSD systems, it comes in a poor second to \fItcpdump\fR.
.Le
Tracing net traffic is an unusual approach, and we won't consider it here, but
in certain circumstances it is an invaluable tool. You can find all you need to
know about \fItcpdump\fR in \fITCP/IP Illustrated, Volume 1\fR, by W. Richard
Stevens.
.XX "Stevens, W. Richard"
@
3.0
log
@Final draft
@
text
@d2 1
a2 1
.\" $Id: testing.ms,v 2.8 1995/06/25 10:55:49 grog Exp grog $
d4 3
d564 1
a564 1
the the \f(CWlist\fR command:
@
2.8
log
@Final draft, second cut
@
text
@d2 1
a2 1
.\" $Id: testing.ms,v 2.7 1995/06/24 11:45:14 grog Exp grog $
d4 3
@
2.7
log
@Final draft, first cut.
@
text
@d2 1
a2 1
.\" $Id: testing.ms,v 2.6 1995/06/09 04:31:13 grog Exp grog $
d4 3
d39 5
a43 5
Well, this time you're not quite done after all. Occasionally the program does
not work as advertised. What you do now depends on how much programming
experience you have. If you are a complete beginner, you could be in
trouble--about the only thing you can do (apart from asking somebody else) is to
go back and check that you really did configure the package correctly.
d76 6
a81 4
a segmentation violation or a bus error.\**
.FS
See \*[chsignal], page \*[SIGBUSEGV], for more details.
.FE
d262 7
a268 6
dividing the program into a number of parts, the function \s10\f(CWmain\fR\s0
and the set of functions which \s10\f(CWmain\fR\s0 calls. By single stepping
over the function calls until something blows up, we can establish in which part
the problem occurs. Then we can restart the program and single step through
this function until we find what it calls before dying. This iterative approach
sounds slow and tiring, but in fact it works surprisingly well.
d379 2
a380 1
.\" and it doesn't look too good anyway. Any ideas?
d387 2
a388 2
^C
make: *** [Alloc.o] Interrupt \fIinterrupt make with CTRL-C\f(CW
d908 1
a908 1
Other systems don't even offer this facility. In some cases you can determine
d914 1
a914 1
\s10\f(CWfork\fR\s0. \fItruss\fR does it better than \fIktrace\fR. In extreme
d1183 1
a1183 1
\s10\f(CWfork\fR\s0. \fItruss\fR does it better than \fIktrace\fR. In extreme
@
2.6
log
@Remove date from page headers
Minor mods
@
text
@d2 1
a2 1
.\" $Id: testing.ms,v 2.5 1995/05/17 17:33:19 grog Exp grog $
d4 4
d36 5
a40 5
Well, maybe you're not quite done after all. The program does not work as
advertised. What you do now depends on how much programming experience you
have. If you are a complete beginner, you could be in trouble--about the only
thing you can do (apart from asking somebody else) is to go back and check that
you really did configure the package correctly.
d256 7
a262 7
first approach won't work here. In this case, you can start by conceptually
.\" XXX conceptually?
dividing the program into the set of functions called by \s10\f(CWmain\fR\s0.
We can single step over the function calls until something blows up: then we can
restart the program and single step through this function until we find what it
calls before dying. This iterative approach sounds slow and tiring, but in fact
it works surprisingly well.
d891 22
a959 2
.\" XXX James Cox refers to SGI's 'par' (if I read correctly). Are
.\" we interested?
d971 5
a975 1
vary significantly in their features. We'll look briefly at each.
d1172 2
a1173 10
\s10\f(CWfork\fR\s0 system call. This is invaluable: UNIX packages frequently
start multiple processes to do the work on hand. Frequently enough, the program
that you start does nothing more than spawn a number of other processes and wait
for them to stop. One of the significant disadvantages of the
\s10\f(CWptrace\fR\s0 interface to debugging is that the process needs to be
started by the debugger. Even in SunOS 4, where you can attach the debugger to
a process that is already running, there is no way to monitor it from the start.
In some cases you can determine how the process was started and start it with
the debugger in the same manner. This is not always possible--for example, many
child processes communicate with their parent.
@
2.5
log
@Major mods after Andy's final draft review
@
text
@d2 1
a2 1
.\" $Id: testing.ms,v 2.4 1995/02/22 14:24:21 grog Exp grog $
d4 3
d22 1
a22 1
.St "Testing ($Date: 1995/02/22 14:24:21 $)"
d33 15
a47 7
advertised. There are thousands of possible reasons for the problems you
encounter when you try to run a buggy executable, and lots of good books explain
debugging techniques. In this chapter, we will touch only on aspects of
debugging that relate to porting. First we'll attack a typical, if somewhat
involved, real-life bug and solve it, discussing the pros and cons on the way.
Then we'll look at alternatives to traditional debuggers: kernel and network
tracing.
d162 1
a162 1
A \fIstack trace\fR command answers the question "Where am I, and how did I get
d226 10
a235 9
effectively single steps until you leave the current symbolic line. To add to
the confusion, this is also frequently called \fIsingle stepping\fR. This
command comes in two flavours, depending on how it treats function calls. One
form will execute the function and stop the program at the next line after the
call. The other, more thorough form will stop execution at the first executable
line of the function. It's important to notice the difference between these two
functions: both are extremely useful, but for different things. \fIgdb\fR
performs single line execution omitting calls with the \s10\f(CWnext\fR\s0
command, and includes calls with the \s10\f(CWstep\fR\s0 command.
d242 2
a243 2
Sometimes this method doesn't work well: the process may end up in no-mans-land,
and you see something like:
d252 7
a258 6
first approach won't work here. In this case, you can start by dividing the
program into the set of functions called by \s10\f(CWmain\fR\s0. We can single
step over the function calls until something blows up: then we can restart the
program and single step through this function until we find what it calls before
dying. This iterative approach sounds slow and tiring, but in fact it works
surprisingly well.
d267 1
d350 1
a350 1
In order to compile debugging information, we add the compiler flag
d353 1
a353 1
We have three options about how to set the flags:
d363 7
a369 3
start the compilation with \fImake\fR, but before the compilation completes, we
abort it with \s10\f(CWCTRL-C\fR\s0. Using the \fIxterm\fR copy function, we
copy the compiler invocation to the command line and add the flags we want:
d371 1
a371 1
$ \f(CBrm Alloc.o\f(CW \fI\&remove the old object\f(CW
d374 3
a376 2
gcc -DNO_ASM -fstrength-reduce -fpcc-struct-return -c -I../.. \e
-DNO_AF_UNIX -DSYSV -DSYSV386 -DUSE_POLL Alloc.c
d378 2
d382 31
d428 3
a430 21
.Li
We could change \s10\f(CWCFLAGS\fR\s0 from the \fImake\fR command line. Our
first attempt doesn't work too well, though:
.Ps
$ \f(CBmake CFLAGS=-g\f(CW
rm -f Alloc.o
gcc -DNO_ASM -fstrength-reduce -fpcc-struct-return -c -g Alloc.c
.Pe
.Le
\s10\f(CWCFLAGS\fR\s0 included all the compiler flags except \s10\f(CW-c\fR\s0,
so we need to write:
.Ps
$ \f(CBmake CFLAGS='-g -c -I../.. -DNO_AF_UNIX -DSYSV -DSYSV386 -DUSE_POLL'\f(CW
.Pe
.LP
.XX "Alloc.o"
.XX "Initialize.o"
.XX "Display.o"
We now have a copy of the X Toolkit in which the three files \fIAlloc.o\fR,
\fIInitialize.o\fR, and \fIDisplay.o\fR have been compiled with symbols. Next,
we need to rebuild \fIxterm\fR. That's straightforward enough:
d525 1
a525 1
(gdb) \f(CBf 1\f(CW
d532 9
a540 5
\f(CWXtScreenDatabase\fR, not from \f(CW_MergeOptionTables\fR. Why? At the
moment it's difficult to say for sure, but it's possible that this difference
happened because we removed optimization. In any case, we still have a problem,
so we should fix this one first and then go back and look for the other one if
solving this problem isn't enough.
d609 2
a610 2
It's definitely quicker to look at the instructions than to fight your way
through the thick undergrowth in the X11 source tree.:
d666 1
a666 1
(gdb) \f(CBl 90\f(CW
d705 1
a705 1
different flags, only this time we use the flags \s10\f(CW-E\fR\s0 (stop after
d746 1
a746 1
\s10\f(CW#line\fR\s0 directive before the line indicates.
d1149 4
a1152 4
started by the debugger. Even in SunOS 4, where you can attach to a process
that is already running, there is no way to monitor it from the start. In some
cases you can determine how the process was started and start it with the
debugger in the same manner. This is not always possible--for example, many
@
2.4
log
@Minor mods
@
text
@d2 1
a2 1
.\" $Id: testing.ms,v 2.3 1995/02/03 13:25:50 grog Exp grog $
d4 3
d19 1
a19 1
.St "Testing ($Date: 1995/02/03 13:25:50 $)"
d32 5
a36 5
debugging techniques. In this chapter, we will only touch on aspects of
debugging which relate to porting. First we'll look at debuggers, using a
typical, if somewhat involved, real-life bug and solve it, discussing the pros
and cons on the way. Then we'll look at alternatives to traditional debuggers:
kernel and network tracing.
d40 5
a44 5
test suites are available for others. For other packages again there may be
test suites which were not designed for the package, but which can be used with
it. If there are any tests, you should obviously run them. You might also
consider writing some tests and including them as a target \s10\f(CWtest\fR\s0
in the \fIMakefile\fR.
d48 2
a49 2
development. A program under development still has bugs which prevent it from
running correctly on any platform, while a ported program has already run
d58 1
a58 1
a segmentation violation or a memory error.\**
d69 1
a69 1
As a result, the program contains bugs which were not in the original versions.
d93 1
a93 1
X11R6) to look at various techniques which you can use to localize and finally
d115 1
a115 1
program into pieces which communicate across this line, so you can see what they
d118 1
a118 1
Of course, I have all these things. In the following sections we'll look at
d124 1
d129 4
a132 4
without wheels--that's a comparable level of technology. The GNU debugger,
\fIgdb\fR, is available on just about every platform you're likely to encounter,
and though it's not perfect, it runs rings around techniques like putting
\fIprintf\fR statements in your programs.
d136 2
a137 2
In UNIX, a debugger is a process which takes control of the execution of another
process. Most versions of UNIX only allow one way for the debugger to take
d169 24
d198 1
a198 1
program counter (the register which contains the address of the next
d209 1
a209 1
generates a hardware interrupt which ultimately causes a \s10\f(CWSIGTRAP\fR\s0
d218 6
a223 31
form will execute the function and stop the program at the instruction after the
call instruction. The other, more thorough form will stop execution at the
first executable line of the function. It's important to notice the difference
between these two functions: both are extremely useful, but for different
things. \fIgdb\fR performs single line execution omitting calls with the
\s10\f(CWnext\fR\s0 command, and includes calls with the \s10\f(CWstep\fR\s0
command.
.Li
.XX "breakpoint, debugger"
.XX "debugger, breakpoint"
\fIbreakpoints\fR stop execution of the process when the process attempts to
execute an instruction at a certain address. \fIgdb\fR sets breakpoints with
the \s10\f(CWbreak\fR\s0 command.
.Li
.XX "watchpoint, debugger"
.XX "debugger, watchpoint"
Many modern machines have hardware support for more sophisticated breakpoint
mechanisms. For example, the i386 architecture can support four hardware
breakpoints on instruction fetch (in other words, traditional breakpoints),
memory read or memory write. These features are invaluable in systems which
support them; unfortunately, UNIX usually does not. \fIgdb\fR simulates this
kind of breakpoint with a so-called \fIwatchpoint\fR. When watchpoints are set,
\fIgdb\fR simulates program execution by single-stepping through the program.
When the condition (for example, writing to the global variable
\s10\f(CWfoo\fR\s0) is fulfilled, the debugger stops the program. This slows
down the execution speed by several orders of magnitude, whereas a real hardware
breakpoint has no impact on the execution speed.\**
.FS
Some architectures slow the overall execution speed slightly in order to test
the hardware registers. The effect is negligible.
.FE
d250 4
a253 4
Let's come back to our \fIxterm\fR program. This time we'll use \fIgdb\fR to
figure out what is going on. We could, of course, look at the core dump, but in
this case we can repeat the problem at will, so we're better off looking at the
live program. We enter:
d277 2
a278 2
case, where we have just built the complete X11 core system, there's every
possibility that it is a library bug. As usual, the library was compiled
d302 4
a305 4
but to do that we need to know that the \fIXt\fR functions belong to
the X toolkit. If we're using GNU \fImake\fR, or if our \fIMakefile\fR
documents directory changes, an alternative would be to go back to our
\fImake\fR log and look for the text \fIXt\fR. If we do this, we quickly find
d317 1
a317 1
which contains \s10\f(CWXtMemmove\fR\s0. There is a possibility that it is
d319 4
a322 1
grep for it:
d324 1
a324 1
$ grep XtMemmove *.c
d431 2
a432 2
cause a bus error on that statement would be an invalid destination address, but
the parameters show that \s10\f(CWdst\fR\s0 appears to be valid.
d444 1
a444 1
a segmentation violation or a bus error.
d474 5
a478 7
tell the truth. Debuggers must be able to write to the text segment (to set
breakpoints, for example), and they may either consider it a feature to be able
to change the text segment, or they may not even notice. If the write had
failed, you could have been sure that the address was not writable, but if the
write succeeds, you can't be sure. What we need to know are the exact segment
limits. Some debuggers show you the segment limits, but current versions of
\fIgdb\fR do not. An alternative is the \fIsize\fR command:
d502 1
a502 1
we solving this problem doesn't explain the other problem.
d504 6
a509 5
This alone doesn't tell us very much, except that the destination variable is
called \s10\f(CWtable\fR\s0, and implicitly that \s10\f(CWmemmove\fR\s0 has been
defined as \f(CW_XtMemmove\fR in this source file. We could now look at the
source file in an editor in a different X window, but it's easier to list the
instructions around the current line with the \f(CWlist\fR command:
d584 4
a587 4
This isn't easy stuff to handle, but it's so typical of what you might find that
we'll pull it apart, instruction for instruction. It's easier to understand
this discussion if you refer to the diagrams of stack structure in \*[chobj],
page \*[complete-stack].
d592 2
a593 2
have seen, this is the second parameter passed to
\s10\f(CW_MergeOptionTables\fR\s0, \s10\f(CWnum_src1\fR\s0.
a649 1
.QS
a656 1
.QQE
d660 1
a660 1
very unlikely that such a primitive bug should not have been discovered earlier.
d700 1
a700 1
certainly a define. It also defines the preprocessor variable
d712 7
a718 4
\fIIntrinsicI.h\fR also contains a number of definitions for XtMemmove, none of
which are used in the current environment, but all of which have the parameter
sequence \s10\f(CW(dst, src, count)\fR\s0. \s10\f(CWbcopy\fR\s0 has the
parameter sequence \s10\f(CW(src, dst, count)\fR\s0.
d720 1
a720 1
Somewhere in here is a lesson to be learnt: this is a real bug which occurred in
d761 1
a761 2
\s10\f(CW__sxg__\fR\s0 and \s10\f(CW_XBCOPYFUNC\fR\s0. It makes the following
decisions:
d782 3
a784 3
\s10\f(CW_XBCOPYFUNC\fR\s0 is only ever defined as \s10\f(CW_XtMemmove\fR\s0,
which does \fInot\fR have the same parameter sequence as
\s10\f(CWbcopy\fR\s0--instead, it has the same parameter sequence as
d790 1
a790 1
.LP
d810 1
a810 1
In other words, this bug is only likely to occur with System V.3 implementations
d846 48
d896 2
d901 9
a909 5
system call and the results that they return. This is a very system-dependent
function, and there are a number of different programs to perform the trace:
\fItruss\fR runs on System V.4, \fIktrace\fR runs on BSD NET/2 and 4.4BSD
derived systems, and \fItrace\fR runs on SunOS 4. They vary significantly in
their features. We'll look briefly at each.
d912 2
a913 2
\fItrace\fR is a relatively primitive tool supplied with SunOS 4 systems: it can
either start a process or attach to an existing process, and it can print
d991 1
a991 1
It can attach to processes which are already running. Optionally, it can also
d995 1
a995 1
translations (translation of file name to inode number), I/O and signal
d1050 1
a1050 1
difference from SunOS: there are no shared libraries--even though each system
d1063 1
a1063 3
It can trace the descendents of the process it is tracing. This is particularly
useful when the bug occurs in large complexes of processes, and you don't even
know which process is causing the problem.
d1065 2
a1066 2
It can attach to processes which are already running. Optionally, it can also
attach to existing children of the processes to which it attaches.
d1070 2
a1071 1
\fIktrace\fR example above, the C library may call a surprising number
d1089 2
a1090 2
For example, if we're interested in the parameters to the \s10\f(CWioctl\fR\s0
call, we can enter:
d1101 1
a1101 1
\*[termios] and \*[TCGETA], for further information.
d1106 5
a1110 5
\s10\f(CWfork\fR\s0 system call. This is invaluable: UNIX program packages
frequently start multiple processes to do the work on hand. Frequently enough,
the program that you start does nothing more than spawn a number of other
processes and wait for them to stop. One of the significant disadvantages of
the \s10\f(CWptrace\fR\s0 interface to debugging is that the process needs to be
d1117 7
a1123 7
Unfortunately, SunOS trace doesn't support tracing through \s10\f(CWfork\fR\s0,
and \fItruss\fR does it better than \fIktrace\fR. In extreme cases (like
debugging a program of this nature on SunOS 4, where there is no support for
trace through \s10\f(CWfork\fR\s0), you might find it an advantage to port to a
different machine running an operating system such as Solaris 2 in order to be
able to test with \fItruss\fR. Of course, Murphy's law says that the bug won't
show up under Solaris 2.
d1127 1
a1127 1
Another interface which we can trace is the network interface. Many processes
d1129 1
a1129 1
communication, they may help us isolate the part of the package which is causing
d1132 1
a1132 1
Two programs support tracing message flow across a network:
d1145 1
a1145 1
BSD systems, it comes a poor second to \fItcpdump\fR.
d1149 2
a1150 1
know about \fItcpdump\fR in [Stevens 94].
@
2.3
log
@Mods after Andy's review
@
text
@d2 1
a2 1
.\" $Id: testing.ms,v 2.2 1995/01/25 14:34:05 grog Exp grog $
d4 3
d16 1
a16 1
.St "Testing ($Date: 1995/01/25 14:34:05 $)"
d311 2
a312 1
.XX "XtMemmove.c"
d444 1
a444 1
(gdb) \f(CBx/6i $eip\f(CW \fI\&show the \f(CW
@
2.2
log
@Minor mods
@
text
@d2 1
a2 1
.\" $Id: testing.ms,v 2.1 1995/01/25 13:34:27 grog Exp grog $
d4 3
d13 1
a13 1
.St "Testing ($Date: 1995/01/25 13:34:27 $)"
d17 1
a17 1
After a brief moment of euphoria, you sit down at the terminal and start the
d27 4
a30 3
debugging which relate to porting. First we'll look at debugging tools, then
we'll take a typical, if somewhat involved, real-life bug and solve it,
discussing the pros and cons on the way.
d39 26
d66 2
d76 1
d90 7
a96 6
which we can conveniently observe, and check which of them is misbehaving. When
we find the piece which is misbehaving, we keep subdividing it further until we
find the bug. The emphasis in this method is on \fIconvenient\fR: it doesn't
necessarily have to make sense. As long as you can continue to divide your
problem area into between two and five parts and localize the problem in one of
the parts, it won't take long to find the bug.
d98 2
a99 2
So what's a convenient way to look at the problems? That depends entirely on
the tools you have at your disposal:
d112 2
a113 1
Of course, we have all these things. We'll look at each of them in more detail.
d115 3
d125 1
a125 2
\fIprintf\fR statements in your programs. You can find a copy on the companion
CD-ROM in the directory \fIgnu/gdb-4.13\fR.
d127 2
a128 2
.XX "PTRACE_ATTACH"
.XX "debugging, attach process"
d130 4
a133 6
process. It uses the system call \s10\f(CWptrace\fR\s0 to start and stop
execution and to read and write data from the process. In most cases, the
process must agree to this treatment, which means that the debugger must start
it. SunOS 4 systems also offer the \s10\f(CWPTRACE_ATTACH\fR\s0 subcommand of
\s10\f(CWptrace\fR\s0 to take control of a process which has not agreed to such
treatment. This feature is no longer available with SunOS 5 (Solaris 2).
d141 9
d174 2
d189 4
a192 5
first address in the function after the stack trace linkage has been established
(see \*[chobj], page \*[fn-entry-breakpoint]). It's important to notice the
difference between these two functions: both are extremely useful, but for
different things. \fIgdb\fR performs single line execution omitting calls with
the \s10\f(CWnext\fR\s0 command, and includes calls with the \s10\f(CWstep\fR\s0
d195 5
a199 7
\fIbreakpoints\fR, as specified by \s10\f(CWptrace\fR\s0, tell the kernel to
stop execution of the process and send the debugger a \s10\f(CWSIGTRAP\fR\s0
signal when the process attempts to execute an instruction at a certain address.
In traditional architectures, this is achieved by changing the instruction at
the specified location to a system call or an illegal instruction--something
which will generate a processor interrupt. \fIgdb\fR sets breakpoints with the
\s10\f(CWbreak\fR\s0 command.
d201 2
d204 10
a213 8
mechanisms than \s10\f(CWptrace\fR\s0 offers. For example, the i386
architecture can support four hardware breakpoints on instruction fetch (in
other words, traditional breakpoints), memory read or memory write. These
features are invaluable in systems which support them; unfortunately, UNIX
usually does not. \fIgdb\fR simulates this kind of breakpoint with a so-called
\fIwatchpoint\fR, which involves single-stepping through the program. This
slows down the execution speed by several orders of magnitude, whereas a real
hardware breakpoint has no impact on the execution speed.\**
a217 8
.Li
.XX "Where am I?"
.XX "How did I get here?"
A \fIstack trace\fR command answers the question "Where am I, and how did I get
here?", and is almost the most useful of all commands. It's certainly the first
thing you should do when examining a core dump or after getting a signal while
debugging the program. \fIgdb\fR implements this function with the
\s10\f(CWbacktrace\fR\s0 command.
d219 2
a220 8
A couple of properties of debuggers help determine how we subdivide our program
when testing it: the \s10\f(CWbacktrace\fR\s0 command shows our current position
as a hierarchy of functions, and the \s10\f(CWnext\fR\s0 command treats
functions as black boxes. There are two possible approaches when using a
debugger:
.Ls B
.Li
Wait until something goes wrong, then find out where it happened. This is
d223 3
a225 3
.Li
If the process does end up in no-man's-land, you may see something
like:
d229 2
a230 2
(gdb) \f(CBbt\f(CW			\fI\&abbreviation for \f(BIbacktrace\f(CW
#0 0x0 in ?? ()		\fI\&nowhere\f(CW
a239 239
.Le
.Ah "Tracing system calls"
An alternative approach is to divide the program between system code and user
code. Most systems have the ability to trace the parameters supplied to each
system call and the results that they return. This is a very system-dependent
function, and there are a number of different programs to perform the trace:
\fItruss\fR runs on System V.4, \fIktrace\fR runs on BSD NET/2 and 4.4BSD
derived systems, and \fItrace\fR runs on SunOS 4. They vary significantly in
their features. We'll look briefly at each.
.Bh "trace"
\fItrace\fR is a relatively primitive tool supplied with SunOS 4 systems:
.Ls B
.Li
It can either start a process or attach to an existing process.
.Li
It can print summary information or a detailed trace.
.Le
In particular, it \fIcannot\fR trace the child of a \s10\f(CWfork\fR\s0 call,
which is a great disadvantage. Here's an example of \fItrace\fR output with a
possibly recognizable program:
.Ps
$ \f(CBtrace hello\f(CW
open ("/usr/lib/ld.so", 0, 040250) = 3
read (3, "".., 32) = 32
mmap (0, 40960, 0x5, 0x80000002, 3, 0) = 0xf77e0000
mmap (0xf77e8000, 8192, 0x7, 0x80000012, 3, 32768) = 0xf77e8000
open ("/dev/zero", 0, 07) = 4
getrlimit (3, 0xf7fff488) = 0
mmap (0xf7800000, 8192, 0x3, 0x80000012, 4, 0) = 0xf7800000
close (3) = 0
getuid () = 1004
getgid () = 1000
open ("/etc/ld.so.cache", 0, 05000100021) = 3
fstat (3, 0xf7fff328) = 0
mmap (0, 4096, 0x1, 0x80000001, 3, 0) = 0xf77c0000
close (3) = 0
open ("/opt/lib/gcc-lib/sparc-sun-sunos".., 0, 01010525) = 3
fstat (3, 0xf7fff328) = 0
getdents (3, 0xf7800108, 4096) = 212
getdents (3, 0xf7800108, 4096) = 0
close (3) = 0
open ("/opt/lib", 0, 056) = 3
getdents (3, 0xf7800108, 4096) = 264
getdents (3, 0xf7800108, 4096) = 0
close (3) = 0
open ("/usr/lib/libc.so.1.9", 0, 023170) = 3
read (3, "".., 32) = 32
mmap (0, 458764, 0x5, 0x80000002, 3, 0) = 0xf7730000
mmap (0xf779c000, 16384, 0x7, 0x80000012, 3, 442368) = 0xf779c000
close (3) = 0
open ("/usr/lib/libdl.so.1.0", 0, 023210) = 3
read (3, "".., 32) = 32
mmap (0, 16396, 0x5, 0x80000002, 3, 0) = 0xf7710000
mmap (0xf7712000, 8192, 0x7, 0x80000012, 3, 8192) = 0xf7712000
close (3) = 0
close (4) = 0
getpagesize () = 4096
brk (0x60d8) = 0
brk (0x70d8) = 0
ioctl (1, 0x40125401, 0xf7ffea8c) = 0
write (1, "Hello, World!\n", 14) = Hello, World!
14
close (0) = 0
close (1) = 0
close (2) = 0
exit (1) = ?
.Pe
What's all this output? All we did was a simple write, but we have performed a
total of 43 system calls. This shows in some detail how much the viewpoint of
the world differs when you're on the other side of the system library. This
program, which was run on a SparcStation 2 with SunOS 4.1.3, first sets up the
shared libraries (the sequences of \s10\f(CWopen\fR\s0, \s10\f(CWread\fR\s0,
\s10\f(CWmmap\fR\s0, and \s10\f(CWclose\fR\s0), then initializes the
\s10\f(CWstdio\fR\s0 library (the calls to \s10\f(CWgetpagesize\fR\s0,
\s10\f(CWbrk\fR\s0, \s10\f(CWioctl\fR\s0, and \s10\f(CWfstat\fR\s0), and finally
writes to \fIstdout\fR and exits. It also looks strange that it closed
\fIstdin\fR before writing the output text: again, this is a matter of
perspective. The \s10\f(CWstdio\fR\s0 routines buffer the text, and it didn't
actually get written until the process exited, just before closing \fIstdout\fR.
.Bh "ktrace"
\fIktrace\fR is supplied with newer BSD systems. Unlike the other trace
programs, it writes unformatted data to a log file (by default,
\fIktrace.out\fR), and you need to run another program, \fIkdump\fR, to display
the log file. It has the following options:
.Ls B
.Li
It can trace the descendents of the process it is tracing. This is particularly
useful when the bug occurs in large complexes of processes, and you don't even
know which process is causing the problem.
.Li
It can attach to processes which are already running. Optionally, it can also
attach to existing children of the processes to which it attaches.
.Li
It can specify broad subsets of system calls to trace: system calls, namei
translations (translation of file name to inode number), I/O and signal
processing.
.Le
Here's an example of \fIktrace\fR running against the same program:
.Ps
$ \f(CBktrace hello\f(CW
Hello, World!
$ \f(CBkdump\f(CW
 20748 ktrace RET ktrace 0
 20748 ktrace CALL getpagesize
 20748 ktrace RET getpagesize 4096/0x1000
 20748 ktrace CALL break(0xadfc)
 20748 ktrace RET break 0
 20748 ktrace CALL break(0xaffc)
 20748 ktrace RET break 0
 20748 ktrace CALL break(0xbffc)
 20748 ktrace RET break 0
 20748 ktrace CALL execve(0xefbfd148,0xefbfd5a8,0xefbfd5b0)
 20748 ktrace NAMI "./hello"
 20748 hello RET execve 0
 20748 hello CALL fstat(0x1,0xefbfd2a4)
 20748 hello RET fstat 0
 20748 hello CALL getpagesize
 20748 hello RET getpagesize 4096/0x1000
 20748 hello CALL break(0x7de4)
 20748 hello RET break 0
 20748 hello CALL break(0x7ffc)
 20748 hello RET break 0
 20748 hello CALL break(0xaffc)
 20748 hello RET break 0
 20748 hello CALL ioctl(0x1,TIOCGETA,0xefbfd2e0)
 20748 hello RET ioctl 0
 20748 hello CALL write(0x1,0x8000,0xe)
 20748 hello GIO fd 1 wrote 14 bytes
 "Hello, World!
 "
 20748 hello RET write 14/0xe
 20748 hello CALL exit(0xe)
.Pe
This display contains the following information in columnar format:
.Ls B
.Li
The first column shows the process ID of the process.
.Li
The second column shows the name of the program from which the process was
started. We can see that the name changes after the call to
\s10\f(CWexecve\fR\s0.
.Li
The third column shows the kind of event. \s10\f(CWCALL\fR\s0 is a system call,
\s10\f(CWRET\fR\s0 is a return value from a system call, \s10\f(CWNAMI\fR\s0 is
a system internal call to the function \s10\f(CWnamei\fR\s0, which determines
the inode number for a pathname, and \s10\f(CWGIO\fR\s0 is a system internal I/O
call.
.Li
The fourth column shows the parameters to the call.
.Le
In this trace, run on an Intel 486 with BSD/OS 1.1, we can see a significant
difference from SunOS: there are no shared libraries--even though each system
call produces two lines of output (the call and the return value), the output is
much shorter.
.Bh "truss"
\fItruss\fR, the System V.4 trace facility, offers the most features:
.Ls B
.Li
It can print statistical information instead of a trace.
.Li
It can display the argument and environment strings passed to each call to
\s10\f(CWexec\fR\s0.
.Li
It can trace the descendents of the process it is tracing. This is particularly
useful when the bug occurs in large complexes of processes, and you don't even
know which process is causing the problem.
.Li
It can attach to processes which are already running. Optionally, it can also
attach to existing children of the processes to which it attaches.
.Li
It can trace specific system calls, signals, and interrupts (called \fIfaults\fR
in System V terminology). This is a very useful feature: as we saw in the
\fIktrace\fR example above, the C library may call a surprising number of system calls.
.Le
Here's an example of \fItruss\fR output:
.Ps
$ \f(CBtruss -f hello\f(CW
511: execve("./hello", 0x08047834, 0x0804783C) argc = 1
511: getuid() = 1004 [ 1004 ]
511: getuid() = 1004 [ 1004 ]
511: getgid() = 1000 [ 1000 ]
511: getgid() = 1000 [ 1000 ]
511: sysi86(SI86FPHW, 0x80036058, 0x80035424, 0x8000E255) = 0x00000000
511: ioctl(1, TCGETA, 0x08046262) = 0
Hello, World!
511: write(1, " H e l l o , W o r l d".., 14) = 14
511: _exit(14)
.Pe
If we're interested in the parameters to the \s10\f(CWioctl\fR\s0 call, we can
specify this with the \s10\f(CW-v\fR\s0 (verbose) flag:
.Ps
$ \f(CBtruss -f -v ioctl hello\f(CW
\fI\&...\f(CW
516: ioctl(1, TCGETA, 0x08046262) = 0
516: iflag=0004402 oflag=0000005 cflag=0002675 lflag=0000073 line=0
516: cc: 177 003 010 030 004 000 000 000
.Pe
.Bh "Tracing through fork"
We've seen that \fIktrace\fR and \fItruss\fR can both trace the child of a
\s10\f(CWfork\fR\s0 system call. This is invaluable: UNIX program packages
frequently start multiple processes to do the work on hand. Frequently enough,
the program that you start does nothing more than spawn a number of other
processes and wait for them to stop. One of the significant disadvantages of
the \s10\f(CWptrace\fR\s0 interface to debugging is that the process needs to be
started by the debugger. Even in SunOS 4, where you can attach to a process
that is already running, there is no way to monitor it from the start. In some
cases you can determine how the process was started and start it with the
debugger in the same manner. This is not always possible--for example, many
child processes communicate with their parent.
.LP
Unfortunately, SunOS trace doesn't support tracing through \s10\f(CWfork\fR\s0,
and \fItruss\fR does it better than \fIktrace\fR. In extreme cases (like
debugging a program of this nature on SunOS 4, where there is no support for
trace through \s10\f(CWfork\fR\s0), you might find it an advantage to port to a
different platform (such as Solaris 2) in order to be able to test with
\fItruss\fR. Of course, Murphy's law says that the bug won't show up under
Solaris 2.
.Ah "Tracing network traffic"
Another interface which we can trace is the network interface. Many processes
communicate across the network, and if we have tools to look at this
communication, they may help us isolate the part of the package which is causing
the problem.
.LP
Two programs support tracing message flow across a network:
.Ls B
.Li
\fItcpdump\fR and the \fIBerkeley Packet Filter\fR provide a flexible means of
tracing traffic across Internet domain sockets. It is included on the companion
CD-ROM as \fInet/tcpdump-2.2.1\fR.
.Li
\fItrpt\fR will print a trace from a socket marked for debugging. This function
is available on System V.4 as well, though it is not clear what use it is under
these circumstances, since System V.4 emulates sockets in a library module.
On BSD systems, it comes a poor second to \fItcpdump\fR.
.Le
Tracing net traffic is an unusual approach, and we won't consider it here, but
in certain circumstances it is an invaluable tool. You can find all you need to
know about \fItcpdump\fR in [Stevens 94].
.XX "Stevens, W. Richard"
d241 3
d251 1
a251 1
(gdb) r -display allegro:0	\fI\&run the program\f(CW
d256 2
a257 2
(gdb) bt				\fI\&look back down the stack\f(CW
#0 0x3b0bc in _XtMemmove ()	\fI\&all these functions come from the X toolkit\f(CW
d270 11
a280 17
you had just written, it would probably be a bug in your program. In this case,
where we have just built the complete X11 core system, there's every possibility
that it is a library bug. As usual, the library was compiled without debug
information, and without that you hardly have a hope of finding it. Apart from
size constraints, there is no reason why you can't include debugging information
in a library. The object files in libraries are just the same as any others--we
discuss them in detail on page \*[libdef]. If you want, you can build libraries
with debugging information, or you can take individual library routines and
compile them separately. Unfortunately, the size constraints are significant:
without debugging information, the file \fIlibXt.a\fR is about 330 kB long and
contains 53 object files. With debugging information, it might easily reach 20
MB, since all the myriad X11 global symbols would be included with each object
file in the archive. It's not just a question of disk space: you also need
virtual memory during the link phase to accommodate all these symbols. Most of
these files don't interest us anyway: the first one that does is the one that
contains \s10\f(CW_XtMemmove\fR\s0. So we find where it is and compile it alone
with debugging information.
d282 12
d297 3
a299 2
the X toolkit. An alternative would be to go back to our \fImake\fR log and
look for the text \fIXt\fR. If we do this, we quickly find
d303 1
a303 1
	mv Makefile Makefile.bak
d308 1
d319 6
a324 3
So \s10\f(CWXtMemmove\fR\s0 is in \fIAlloc.c\fR. By the same method, we look for
the other functions mentioned in the stack trace and discover that we also need
to recompile \fIInitialize.c\fR and \fIDisplay.c\fR.
d332 5
a336 2
We can modify the \fIMakefile\fR (the modifications will go away at the next
\fImake World\fR, so this is not overly dangerous).
d343 2
a344 2
$ \f(CBrm Alloc.o\f(CW			\fI\&remove the old object\f(CW
$ \f(CBmake\f(CW				\fI\&and start make normally\f(CW
d348 1
a348 1
make: *** [Alloc.o] Interrupt		\fIinterrupt make with CTRL-C\f(CW
d351 1
a351 1
$ \f(CBmake\f(CW				\fI\&run make to build a new library\f(CW
d374 1
a374 1
.IP
d380 4
a383 1
.IP
d388 1
a388 1
$ \f(CBpushd ../../programs/xterm/\f(CW
d398 1
a398 7
.IP
The shell \s10\f(CWpushd\fR\s0 command changes directories, like the
\s10\f(CWcd\fR\s0 command. Unlike the \s10\f(CWcd\fR\s0 command, it doesn't
forget the old directory, and you can change back using \s10\f(CWpushd\fR\s0
without an argument. Since we expect to be back in the library directory again,
this saves us some time.
.Le
d404 1
a404 1
(gdb) \f(CBdir ../../lib/X11\f(CW	\fI\&set source paths\f(CW
d411 1
a411 1
(gdb) \f(CBr\f(CW			\fI\&and run the program\f(CW
d415 1
a415 1
0x3ced6 in _XtMemmove (dst=0x342d8 "ƒ~E003‰~", src=0x41c800 "", length=383) \e
d440 1
a440 1
(gdb) \f(CBx/6i $eip\f(CW		\fI\&show the \f(CW
d471 1
d540 1
a540 1
(gdb) \f(CBp table\f(CW			\fI\&look again, to be sure\f(CW
d542 2
a543 2
(gdb) \f(CBs\f(CW			\fI\&single step into memmove\f(CW
_XtMemmove (dst=0x342d8 "ƒ~E003‰~", src=0x41c800 "", length=384)
d565 1
a565 1
(gdb) \f(CBx/8i $eip\f(CW		\fI\&look at the next 8 instructions\f(CW
d641 1
a641 1
.RS
d649 1
a649 1
.RE
d671 1
a671 1
make: *** [Initialize.o] Interrupt		\fI\&hit CTRL-C\f(CW
d678 2
a679 7
It doesn't really matter what you call the output file unless you use
\fIemacs\fR. \fIemacs\fR recognizes the suffix \fI.c\fR and loads macros for C
programs.
.LP
As you might have guessed, we now look at the file \fIjunk.c\fR with
\fIemacs\fR, though you could use any other editor. We're looking for
\s10\f(CWmemmove\fR\s0, of course. We find a definition in
d710 9
a718 6
This is a real example. It happened with X11R6, Patch level 3. Somewhere in here
is a lesson to be learnt: X11 is one of the most reliable and most portable
software packages available, and yet here we have a really primitive bug. The
reason it has not been found before is doubtless due to the fact that I was
building this version of X11 in an unusual environment (SCO UNIX and GNU libc),
and so the usual assumptions didn't apply.
d755 2
a756 2
If the variable \s10\f(CWX_NOT_STDC_ENV\fR\s0 is \fInot\fR defined, it assumes
ANSI C, unless this is a pre-SVR4 Sun machine.
d758 1
a758 1
Otherwise it checks the variables \s10\f(CWSYSV\fR\s0 (for System V.3),
d781 1
a781 1
.IP
d789 1
a789 1
need to look further for this one.
d837 156
a992 1
.Ah "Summary"
d995 77
a1071 1
Debugging is a black art.
d1073 5
a1077 1
Most bugs in ported software are due to incorrect configuration.
d1079 5
a1083 1
FOO
d1085 4
a1088 1
@
2.1
log
@Minor mods
@
text
@d2 1
a2 1
.\" $Id: testing.ms,v 2.0 1994/12/21 16:58:30 grog Exp grog $
d4 2
d7 1
d10 1
a10 1
.St "Testing ($Date: 1994/12/21 16:58:30 $)"
d79 1
a79 1
.Bh "Symbolic debuggers"
d132 3
a134 3
generates a hardware interrupt which ultimately results in a
\s10\f(CWSIGTRAP\fR\s0 signal to the debugger. \fIgdb\fR performs this function
with the \s10\f(CWstepi\fR\s0 command.
d136 2
a137 2
Nowadays, you won't want to look at single machine instructions until you are in
deep trouble. Instead, you will execute a \fIsingle line\fR instruction, which
d139 6
a144 6
the confusion, this is also frequently called \fIsingle stepping\fR. They come
in two flavours, depending on how they treat function calls. One form will
execute the function and stop the program at the instruction after the call
instruction. The other, more thorough form will stop execution at the first
address in the function after the stack trace linkage has been established (see
\*[chobj], page \*[fn-entry-breakpoint]). It's important to notice the
d164 7
a170 2
\fIwatchpoint\fR. This involves single-stepping through the program, which
slows down the execution speed by several orders of magnitude.
d174 4
a177 4
A \fIstack trace\fR command is almost the most useful of all commands. It's
certainly the first thing you should do when examining a core dump or after
getting a signal while debugging the program. It answers the question "Where am
I, and how did I get here?" \fIgdb\fR implements this function with the
d196 1
a196 1
(gdb) \f(CBbt\f(CW		\fI\&abbreviation for \f(BIbacktrace\f(CW
d208 1
a208 1
.Bh "Tracing system calls"
d216 1
a216 1
.Ch "trace"
d286 1
a286 1
.Ch "ktrace"
d301 2
a302 1
translations, I/O and signal processing.
d304 1
a304 2
Here's an example of \fIktrace\fR running against a possibly recognizable
program:
d361 1
a361 1
.Ch "truss"
d404 1
a404 1
.Ch "Tracing through fork"
d412 4
a415 3
that is already running, there is no way to monitor it from the start. This may
not be possible--sometimes the child process will only work correctly if it is
started by its parent.
d417 8
a424 7
Unfortunately, not all packages support this feature, and \fItruss\fR does it
better than \fIktrace\fR. In extreme cases (like debugging a program of this
nature on SunOS 4, where there is no support for tracing through
\s10\f(CWfork\fR\s0), you might find it an advantage to port to a different
platform (such as Solaris 2) in order to be able to test with \fItruss\fR. Of
course, Murphy's law says that the bug won't show up under Solaris 2.
.Bh "Tracing network traffic"
d443 3
a445 1
in certain circumstances it is an invaluable tool.
d459 1
a459 1
(gdb) bt			\fI\&look back down the stack\f(CW
d474 13
a486 13
where we have just built the complete X11 core system, it looks more like a
library bug. As usual, the library was compiled without debug information, and
without that you hardly have a hope of finding it. Apart from size constraints,
there is no reason why you can't include debugging information in a library. As
we discussed on page \*[libdef], the object files in libraries are just the same
as any others. If you want, you can build libraries with debugging information,
or you can take individual library routines and compile them separately.
Unfortunately, the size constraints are significant: without debugging
information, the file \fIlibXt.a\fR is about 330 kB long and contains 53 object
files. With debugging information, it might easily reach 20 MB, since all the
myriad X11 global symbols would be included with each object file in the
archive. It's not just a question of disk space: you also need virtual memory
during the link phase to accommodate all these symbols. Of course, most of
@
2.0
log
@checked in with -k by grog at 1995/01/09 13:22:41
@
text
@d2 1
a2 1
.\" $Id: testing.ms,v 1.25 1994/12/21 16:58:30 grog Exp grog $
a3 2
.\" Revision 1.25 1994/12/21 16:58:30 grog
.\" Revised, mods for bignuts macros
a4 19
.\" Revision 1.24 1994/11/07 17:14:21 grog
.\" Mods after Andy's review
.\"
.\" Revision 1.23 1994/10/17 17:30:06 grog
.\" *** empty log message ***
.\"
.\" Revision 1.22 1994/09/30 17:58:33 grog
.\" Snapshot 30 September 94
.\"
.\" Revision 1.21 1994/09/01 13:30:58 grog
.\" Snapshot 1 September 1994
.\"
.\" Revision 1.20 1994/08/25 17:10:13 grog
.\" Change all names from .roff to .ps, set uniform version number 1.20, minor mods
.\"
.\" Revision 1.1 1994/08/19 12:13:37 grog
.\" Initial revision
.\"
.\"
a15 1
$
d20 4
a23 4
debugging techniques. We can only touch on aspects of debugging which relate to
porting. In this chapter, we'll first look at debugging tools, then we'll take
a typical, if somewhat involved, real-life bug and solve it, discussing the pros
and cons on the way.
d29 2
a30 2
it. If there are any tests, you should obviously run them. Otherwise you
should consider writing some and including them as a target \s10\f(CWtest\fR\s0
d57 3
a59 3
necessarily have to make sense, but if you can divide your problem into between
two and five parts and easily determine where the problem is, it won't take long
to find the bug.
d83 3
a85 3
and though it's not perfect, it runs rings around putting \fIprintf\fR
statements in your programs. You can find a copy on the companion CD-ROM in the
directory \fIgnu/gdb-4.13\fR.
@