ETM -- A Program Exception and Termination Manager

Paul DuBois
dubois@primate.wisc.edu
Wisconsin Regional Primate Research Center

Revision date: 10 April 1997

1. Introduction

This document describes the Exception and Termination Manager (ETM), a simple(-minded) library to manage exceptional conditions that arise during program execution, and to provide for orderly program shutdown.

There are at least a couple of approaches one may adopt for handling error conditions within an application:

- Have functions always return a value and have all callers test the return value and respond accordingly.
- Force the program to give up and exit early.

Each approach has strengths and weaknesses. A difficulty with the first is that actions composed of many subsidiary actions, each of which may itself succeed or fail, can easily become very unwieldy when an attempt is made to handle all possible outcomes. However, such a program will also continue in the face of extreme adversity.

An advantage of the second approach is that it is, conceptually at least, simpler to let a program die when a serious error occurs. The difficulty lies in making sure the program cleans up and shuts down properly before it exits. This can be a problem especially when a program uses a number of independent modules which can each encounter exceptional conditions and need to be shut down, and which may know nothing of each other. ETM is designed to alleviate the difficulties of this second approach.

The general architecture assumed for this discussion is that of an application which uses zero or more subsystems which may be more or less independent of each other, and which may each require initialization and/or termination.
Also, other application-specific initialization and/or termination actions may need to be performed which are unrelated to those of the subsystems, e.g., temporary files created at the beginning of the application need to be removed before final termination, network connections need to be shut down, terminal state needs to be restored.

Ideally, when an application executes normally, it will initialize, perform the main processing, then shut down in an orderly fashion. This does not always occur. Exceptional conditions may be detected which necessitate a "panic" (an immediate program exit) because processing cannot continue further, or because it is judged too burdensome to try to continue.

An individual subsystem may be easily written such that a panic within itself causes its own shutdown code to be invoked. It is more difficult to arrange for other subsystems to be notified of the panic so that they can shut down as well, since the subsystem in which the panic occurs may not even know about them.

An additional difficulty is that some exceptions may occur for reasons not related to algorithmically detectable conditions. For instance, the user of an application may cause a signal to be delivered to it at any time. This has nothing to do with normal execution and cannot be predicted.

The goals of ETM are thus twofold:

(1) Panics triggered anywhere within an application or any of its subsystems should cause orderly shutdown of all subsystems and the application itself.
(2) Signals that normally terminate a program should be caught and trigger a panic to allow shutdown as per (1).

2. Processing Model

The model used by ETM is that the application initializes subsystems in the order required by any dependencies among them, and then terminates them in the reverse order.
The presumption here is that if subsystem ss2 is dependent upon subsystem ss1, then ss1 should be initialized first and terminated last; the dependency is unlikely to make it wise to shut down ss1 before ss2.

ETM must itself be initialized before any other subsystem which uses it. The initialization call, ETMInit(), takes as an argument a pointer to a routine which performs any application-specific cleanup not related to its subsystems, or NULL if there is no such routine.

Each of the subsystems should then be initialized. A subsystem's initialization routine should call ETMAddShutdownProc() to register its own shutdown routine with ETM, if there is one. (Some subsystems may require no explicit initialization or termination. However, if there is a shutdown routine, you should at least call ETMAddShutdownProc() to register it.)

When the program detects an exceptional condition, it calls ETMPanic() to describe the problem and exit. ETMPanic() is also called automatically when a signal is caught. A message is printed, and all the shutdown routines that have been registered are automatically executed, including the application-specific one.

ETM is designed to handle shutting down under unusual circumstances, but it also works well for terminating normally. Instead of calling ETMPanic(), the application calls ETMEnd(), which takes care of calling all the shutdown routines that have been registered. This is much like calling ETMPanic(), except that no error message is printed, and ETMEnd() returns to the caller.

It is evident that the functionality provided by ETM is somewhat like that of the atexit() routine provided on some systems. Some differences between the two are:

- atexit() is either built in or not available. ETM can be put on any system to which it can be ported (extent unknown, but includes at least SunOS, Ultrix, Mips RISC/os and THINK C).
- ETM is more suited for handling exceptional conditions.
- ETM shutdown routines can be installed and removed later.
  atexit() provides only for installation (although you could simulate removal by setting a flag which shutdown routines examine to see whether to execute or not).

Here is a short example of how to set up and shut down using ETM.

    main ()
    {
        . . .
        ETMInit (Cleanup);  /* register application-specific cleanup */
        SS1Init ();         /* registers SS1End() for shutdown */
        SS2Init ();         /* registers SS2End() for shutdown */
        SS3Init ();         /* registers SS3End() for shutdown */
        ... main processing here ...
        ETMEnd ();          /* calls SS3End (), SS2End () and SS1End () */
        exit (0);
    }

Subsystems that are themselves built on other subsystems may follow this model, except that they would not call ETMInit() or ETMEnd().

If there is no special initialization or shutdown activity, and you don't care about catching signals, it is not necessary to call ETMInit() and ETMEnd(). The application may still call ETMPanic() to print error messages and terminate. (Even if the application does use ETMInit() and ETMEnd(), it is safe to call ETMPanic() before any initialization has been done, because nothing needs to be shut down at that point yet.)

If ETM itself encounters an exceptional condition (e.g., it cannot allocate memory when it needs to), it will, of course, trigger a panic. This should be rare, but if it occurs, ETM will generate a message indicating what the problem was.

3. Caveats

Shutdown routines shouldn't call ETMPanic(), since ETMPanic() causes shutdown routines to be executed. ETM detects loops of this sort, but their occurrence indicates a flaw in program logic.

Similarly, if you install a print routine to redirect ETM's output somewhere other than stderr, the routine shouldn't call ETM to print any messages.

kill -9 is uncatchable and there's nothing you can do about it.

4. Programming Interface

The ETM library should be installed in /usr/lib/libetm.a or local equivalent, and applications should link in the ETM library with the -letm flag. Source files that use ETM routines should include etm.h.
If you use ETM functions in a source file without including etm.h, the ETM types such as ETMProcPtr are undeclared and the compiler cannot check your calls against the library's declarations. (Undefined symbol errors at link time indicate instead that the library was not linked in with -letm.)

The abstract types ETMProcRetType and ETMProcPtr may be used for declaring and passing pointers to functions that are passed to ETM routines. By default these will be void and void(*)(), but on deficient systems with C compilers lacking the void type they will be int and int(*)(), the usual C defaults for functions.

These types make it easier to declare properly typed functions and NULL pointers. For instance, if you don't pass any shutdown routine to ETMInit(), use

    ETMInit ((ETMProcPtr) NULL);

If you do, use

    ETMProcRetType ShutdownProc ()
    {
        . . .
    }
    . . .
    main ()
    {
        . . .
        ETMInit (ShutdownProc);
        . . .
    }

Descriptions of the ETM routines follow.

    ETMProcRetType ETMInit (p)
    ETMProcPtr p;

Registers the application's cleanup routine p (which should be NULL if there is none) and registers default handlers for the following signals (all of which normally cause program exit): SIGHUP, SIGINT, SIGQUIT, SIGILL, SIGSYS, SIGTERM, SIGBUS, SIGSEGV, SIGFPE, SIGPIPE.

If p is not NULL, it should point to a routine that takes no arguments and returns no value.

    ETMProcRetType ETMEnd ()

Causes all registered shutdown routines to be executed. The application may then exit normally with exit(0).

    ETMProcRetType ETMPanic (fmt, ...)
    char *fmt;

ETMPanic() is called when a panic condition occurs, and the program cannot continue. The arguments are as those for printf() and are used to print a message after shutting down all subsystems and executing the application's cleanup routine, and before calling exit(). ETMPanic() adds a newline to the end of the message.

ETMPanic() may be called at any time, including prior to calling ETMInit(), but only those shutdown routines which have been registered are invoked.

A common problem with applications that encounter exceptional conditions such as segmentation faults is that you often don't see all the output your application has produced.
This is because stdout is often buffered. To alleviate this problem, stdout is flushed before any message is printed, so that any pending application output appears before the error message.

By default, ETMPanic() prints the message on stderr. This behavior may be modified with ETMSetPrintProc().

The default exit() value is 1. This may be modified with ETMSetExitStatus().

    ETMProcRetType ETMMsg (fmt, ...)
    char *fmt;

ETMMsg() is like ETMPanic() except that it just prints the message and returns. It is useful in that if panic message output has been redirected somewhere other than stderr (e.g., to the system log), ETMMsg() will write its output there, too. The application does not need to know whether such redirection has taken place.

ETMMsg() may be called at any time, including prior to calling ETMInit().

    ETMProcRetType ETMAddShutdownProc (p)
    ETMProcPtr p;

Registers a shutdown routine with ETM. This is normally called within a subsystem's initialization routine. p should point to a routine that takes no arguments and returns no value.

    ETMProcRetType ETMRemoveShutdownProc (p)
    ETMProcPtr p;

Deregisters a shutdown routine previously registered with ETM. This is useful for routines that only need to be registered temporarily, e.g., during execution of some piece of code that temporarily creates some file that needs to be removed if the program crashes, but which removes it itself if execution proceeds normally.

    ETMProcRetType ETMSetSignalProc (signo, p)
    int signo;
    ETMProcPtr p;

Registers a signal-catching routine to override ETM's default. The routine will be called with one argument, the signal number. It should return no value, regardless of the usual return type of signal handler routines on your system. (When ETM is configured on your system, it knows the proper return value for signal() but hides differences among systems from your application so you don't have to think about it.)
To return a signal to its default action or to cause a signal to be ignored, pass the following values for p (these are defined in etm.h):

    ETMSigIgnore     signal is ignored
    ETMSigDefault    signal default action is restored

    ETMProcPtr ETMGetSignalProc (signo)
    int signo;

Returns the function currently used to catch signal signo, or NULL if the signal is handled with the default action or being ignored (it's not possible to distinguish between the last two cases).

    ETMProcRetType ETMSetPrintProc (p)
    ETMProcPtr p;

This routine is used to register a procedure that ETM can use to print messages. The default is to send messages to stderr, which is appropriate for most programs. Applications may prefer to send messages elsewhere. For instance, non-interactive programs like network servers might send them to syslog() instead. Or a program may wish to send messages to multiple destinations.

To override the default, pass the address of an alternate print routine to ETMSetPrintProc(). The routine should take one argument, a pointer to a character string, and return no value. The argument will be the fully formatted panic message, complete with a newline on the end. To restore the default, pass NULL.

The printing routine shouldn't call ETMPanic() or ETMMsg() or a loop will be detected and ETM will conveniently panic as a service to let you know you have a logic error in your program.

    ETMProcPtr ETMGetPrintProc ()

Returns a pointer to the current printing function, or NULL if the default is being used.

    ETMProcRetType ETMSetExitStatus (status)
    int status;

This routine is used to register the status value that is passed to exit() when a panic occurs. The default is 1.

For some applications it is desirable to return a different value. For instance, a mail server that processes messages may send back a message to the person who sent mail when a request is erroneous, then panic (perhaps by writing a message to the system log).
On some systems, if a program invoked to handle mail returns non-zero, the mailer will send another message to that person stating that there was a problem handling the mail. This extra message is unnecessary, and can be suppressed by registering an exit status of 0.

If ETMSetAbort() has been called to force an abort() on a panic, the exit status is not returned.

    int ETMGetExitStatus ()

Returns the current exit status which will be returned if a panic occurs.

    ETMProcRetType ETMSetAbort (val)
    int val;

Calling this function with a non-zero value of val causes ETM to try to generate a core image when ETMPanic() is called (after the panic message is printed). This can sometimes be useful for debugging. If val is zero, image generation is suppressed. The default is no image.

ETMSetAbort() is meaningless on systems with no concept of a core image. Also, if you install a signal catcher for SIGABRT, you may end up in a panic loop.

    int ETMGetAbort ()

Returns the current image generation value.
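
The registration model described in Section 2 (shutdown routines are registered as subsystems initialize, may be removed again later, and are run in the reverse of the order in which they were registered) can be illustrated with a few lines of ANSI C. What follows is only a sketch of the idea and is not taken from the ETM source. The names add_shutdown_proc(), remove_shutdown_proc(), run_shutdown_procs() and demo(), the fixed-size table, and the trace buffer are all hypothetical and were chosen for this illustration. The sketch also leaves out signal catching, message printing, and loop detection.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PROCS 16                 /* hypothetical fixed capacity */

typedef void (*ShutdownProc)(void);  /* stand-in for ETMProcPtr */

static ShutdownProc procs[MAX_PROCS];
static int nprocs = 0;

/* Register a shutdown routine; returns 0 on success, -1 if the table is full. */
int add_shutdown_proc(ShutdownProc p)
{
    if (nprocs >= MAX_PROCS)
        return -1;
    procs[nprocs++] = p;
    return 0;
}

/* Deregister the most recent registration of p; returns 0 if found, -1 if not. */
int remove_shutdown_proc(ShutdownProc p)
{
    int i, j;

    for (i = nprocs - 1; i >= 0; i--) {
        if (procs[i] == p) {
            for (j = i; j < nprocs - 1; j++)   /* close the gap */
                procs[j] = procs[j + 1];
            nprocs--;
            return 0;
        }
    }
    return -1;
}

/* Run the registered routines in reverse order of registration. */
void run_shutdown_procs(void)
{
    while (nprocs > 0)
        procs[--nprocs]();
}

/* Demo routines: each one appends its name to a trace buffer. */
char trace[64];

void ss1_end(void)     { strcat(trace, "ss1 "); }
void ss2_end(void)     { strcat(trace, "ss2 "); }
void tmp_cleanup(void) { strcat(trace, "tmp "); }

/* Register three routines, remove the temporary one, then shut down. */
const char *demo(void)
{
    trace[0] = '\0';
    add_shutdown_proc(ss1_end);
    add_shutdown_proc(tmp_cleanup);     /* needed only temporarily */
    add_shutdown_proc(ss2_end);
    remove_shutdown_proc(tmp_cleanup);  /* normal path: no longer needed */
    run_shutdown_procs();
    return trace;
}
```

Calling demo() from a main() routine and printing its result gives "ss2 ss1 ": the routine registered last runs first, and the temporary routine, having been removed, does not run at all. A fixed-size table keeps the sketch short. A real library would more likely allocate entries dynamically, which is consistent with the remark in Section 2 that ETM may panic if it cannot allocate memory.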