Core File Creation and Debugging
initfpu-1.0.tar.gz
Background
When doing numerics you often end up with programs that, because of bugs in your code, produce invalid results like nan or inf. It would then be very nice if your program dumped core on the first occurrence of such an invalid result, instead of happily running on and completing "successfully" after writing out hundreds or thousands of lines of complete nonsense.
You can achieve this (if your operating system does not already do so by default) by setting the relevant FPU flags so that invalid operations or results generate FPU exceptions. These raise the signal SIGFPE, which, unless it is handled otherwise, makes your program dump core and exit. This core file, together with the executable itself, can then be used to examine the program and find out what went wrong.
Core File Creation
Even though the following code fragments are written in C, Fortran programmers can use them too: a Fortran program can call C code as long as the calling conventions are respected. We will come back to that later.
Implementation
The following C fragments implement this for Linux and FreeBSD. While setting the relevant FPU flags is OS-dependent, the signal handling part is portable.
Setting the relevant FPU Flags
- Linux, glibc >= 2.2
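#define _GNU_SOURCE 1  // must precede every #include, otherwise feenableexcept() is not declared
#include <fenv.h>

// inside a function, e.g. at the start of main():
int excepts = 0
    //| FE_INEXACT      // inexact result
    | FE_DIVBYZERO      // division by zero
    //| FE_UNDERFLOW    // result not representable due to underflow
    | FE_OVERFLOW       // result not representable due to overflow
    | FE_INVALID        // invalid operation
    ;
feenableexcept(excepts);  // returns -1 if the exceptions could not be enabled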
You may need to link against the math library explicitly with -lm.
- FreeBSD
#include <ieeefp.h>

// inside a function, e.g. at the start of main():
fp_except_t excepts = 0
    | FP_X_INV      /* invalid operation */
    | FP_X_DNML     /* denormal operand */
    | FP_X_DZ       /* zero divide */
    | FP_X_OFL      /* overflow */
    //| FP_X_UFL    /* underflow */
    //| FP_X_IMP    /* (im)precision */
    | FP_X_STK      /* stack fault */
    ;
fpsetmask(excepts);
Re-installing the default Signal Handler
Unfortunately, when using Fortran, the standard Fortran runtime libraries catch the FPE signal themselves, and you do not get the desired behaviour of dumping core and exiting. In that case it may help to re-install the default handler for SIGFPE.
#include <signal.h>
signal(SIGFPE, SIG_DFL);
Applications
To make this easier to use, I put it all into a library (source package initfpu-1.0.tar.gz). Autoconf takes care of the correct Fortran name mangling scheme. For installation instructions please refer to the source package; here I only want to briefly discuss how to use the library.
The demo program initfpu_demo.c is listed below. It sets the FPU flags using this library and then performs an illegal computation (the logarithm of a negative number).
#include <math.h>
#include <stdio.h>
#include <initfpu.h>
int main(void) {
    initfpu();
    double d = -1.0;
    printf("%f\n", log(d));
    return 0;
}
Normally (without setting the FPU flags) this operation executes without an error and produces the result nan. This is exactly what we do not want.
Compile, execute and debug the program:
$ gcc -g -o initfpu_demo initfpu_demo.c -I. -L. -linitfpu -lm
$ ./initfpu_demo
Floating point exception (core dumped)
$ gdb initfpu_demo initfpu_demo.core
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
[...]
Core was generated by initfpu_demo.
Program terminated with signal 8, Arithmetic exception.
[...]
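What to do with this core file is described in the section Core File Debugging below. (On FreeBSD the core file is named initfpu_demo.core; on Linux it is usually just core or core.<pid>, depending on the system configuration.)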
Compare this to the default behaviour if you comment out
the line containing the initfpu call:
$ vi initfpu_demo.c
$ gcc -g -o initfpu_demo initfpu_demo.c -I. -L. -linitfpu -lm
$ ./initfpu_demo
nan
$
In Fortran, the demo program initfpu_demo_f.f looks like this:
      program initfpu_demo_f
      implicit none
      real*8 d
      call initfpu
      d=-1d0
      write (6,*) dlog(d)
      end program
Configuring the shell's ulimit
Many Linux distributions are configured to suppress the generation of core files. This is done in /etc/profile or a similar file with a statement like ulimit -c 0. You need to comment this out or replace it with something like ulimit -c unlimited.
Only root is allowed to increase a hard limit, and in bash ulimit -c 0 without the -S option lowers the hard limit as well as the soft limit. Therefore an ordinary user cannot increase it again afterwards; the system must be configured so that core file generation is never suppressed in the first place.
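The difference between soft and hard limits can also be seen from C. This sketch uses the POSIX getrlimit()/setrlimit() calls to raise the soft core file limit as far as an unprivileged process may, namely up to the hard limit; raise_core_limit() is a hypothetical helper. If a login script has already set the hard limit to 0, the resulting soft limit is still 0, which is exactly the situation described above.

```c
#include <sys/resource.h>

/* Hypothetical helper: set the soft RLIMIT_CORE to the current hard limit
 * (always permitted without privileges). Stores the new soft limit in
 * *soft and returns 0 on success, -1 on error. */
int raise_core_limit(rlim_t *soft)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) == -1)
        return -1;
    rl.rlim_cur = rl.rlim_max;   /* soft := hard; raising the hard limit needs root */
    if (setrlimit(RLIMIT_CORE, &rl) == -1)
        return -1;
    *soft = rl.rlim_cur;
    return 0;
}
```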
This limit is inherited by every process from its parent. Therefore, if you start your jobs remotely via ssh, for example, you need to make sure that the remote sshd runs with the correct ulimit setting, which can usually only be achieved by changing the system configuration as root.
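The inheritance can be checked with a small sketch: it lowers the soft core file limit to 0 (always permitted, just like ulimit -c 0 in a login script), forks, and lets the child report through its exit status whether it sees the same limits. child_sees_core_limit() is a hypothetical helper; resource limits are likewise preserved across execve(), which is how a job started via sshd can end up with sshd's settings.

```c
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: lower the caller's soft RLIMIT_CORE to 0 (this
 * change stays in effect in the caller), then check in a forked child.
 * Returns 1 if the child inherited the limits, 0 if not, -1 on error. */
int child_sees_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) == -1)
        return -1;
    rl.rlim_cur = 0;                  /* lowering needs no privilege */
    if (setrlimit(RLIMIT_CORE, &rl) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        struct rlimit seen;
        if (getrlimit(RLIMIT_CORE, &seen) == -1)
            _exit(2);
        /* exit status 0 means: same soft and hard limit as the parent */
        _exit(seen.rlim_cur == 0 && seen.rlim_max == rl.rlim_max ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```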
Core File Debugging
Now that we have our core file, we want to debug it. For this to be useful, make sure your program has been compiled with the -g option, which includes debug information in the executable.
In the following we demonstrate what to do with the core file once the program has died and dumped core: we load it, together with the binary that created it, into a debugger and examine it:
dominik@daemon ~/initfpu-1.0$ ./initfpu_demo
Floating point exception (core dumped)
dominik@daemon ~/initfpu-1.0$ gdb initfpu_demo initfpu_demo.core
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...
Core was generated by initfpu_demo.
Program terminated with signal 8, Arithmetic exception.
Reading symbols from /lib/libm.so.3...done.
Loaded symbols for /lib/libm.so.3
Reading symbols from /lib/libc.so.5...done.
Loaded symbols for /lib/libc.so.5
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
#0 0x080485d5 in main () at initfpu_demo.c:38
38 printf("%f\n", log(d));
(gdb) where
#0 0x080485d5 in main () at initfpu_demo.c:38
(gdb) list
38 printf("%f\n", log(d));
39 return 0;
40 }
There are some interesting pieces of information in this output.
- First, the debugger tells us why the program died (signal 8 is SIGFPE):
Program terminated with signal 8, Arithmetic exception.
- Second, it tells us where:
#0 0x080485d5 in main () at initfpu_demo.c:38
which means line 38 of the file initfpu_demo.c.
- If you have more nested function calls, you can examine the stack with the command where.
- With the command list you can view the source code.