Author:
Björn Gustavsson <bjorn(at)erlang(dot)org>
Status:
Draft
Type:
Standards Track
Created:
14-Sep-2020
Erlang-Version:
24
Post-History:
14-Oct-2020, 27-Nov-2020

EEP 54: Provide more information about errors #

Abstract #

This EEP proposes a mechanism for reporting more human-readable information about what went wrong when a BIF raises an exception. The same mechanism can be used by libraries or applications to provide more detailed error messages.

Specification #

In OTP 23 and earlier, the shell prints a terse message when a call to a built-in function (BIF) fails:

1> element(a,b).
** exception error: bad argument
     in function  element/2
        called as element(a,b)

The bad argument message informs us that one or more of the arguments to the call were incorrect in some way (in this example, both arguments have wrong types).

We propose a mechanism that enables the shell to print more helpful error messages. Here is how the message would be printed with the reference implementation of this EEP:

1> element(a, b).
** exception error: bad argument
     in function  element/2
        called as element(a,b)
        *** argument 1: not an integer
        *** argument 2: not a tuple

Note that the exact formatting and phrasing of the messages is an implementation detail outside the scope of this EEP. What will be specified here are the APIs and conventions that make these messages possible.

Proposals in this EEP #

  • An extension of the format of the call-stack back trace (stacktrace) format to indicate that there exists extended error information for that call, and a convention for how extended error information can be provided.

  • A new erlang:error/3 BIF to allow libraries and applications to raise an exception with extended error information in the stacktrace.

  • New functions erl_error:format_exception/3 and erl_error:format_exception/4 to allow libraries and applications to format stacktraces in the same style as the shell.

Extending the stacktrace #

The stack back-trace (stacktrace) is currently a list of tuples. For the purpose of this EEP we are only interested in the first entry in the stacktrace. It looks like {Module,Function,Arguments,ExtraInfo}, where ExtraInfo is a list of two-tuples. As an indication that extended error info is available, we propose adding an {error_info,ErrorInfoMap} tuple to ExtraInfo in the first element in the stacktrace.

The map ErrorInfoMap may contain further information about the error or hints on how to handle the error.

Currently, three optional keys have a defined meaning:

  • The value of the key module is a module name of a module that can be called to provide additional information about the error. Default is Module from the stacktrace entry.

  • The value of the key function is the name of the function to be called in the module providing the error information. The default name is format_error.

  • The value of the key cause, if it exists, provides additional information about the error.

To obtain more information about the error, the function named by the values of the module and function keys can be called. Hereafter in this document, for brevity we will call that function format_error/2.

The arguments for format_error/2 are the exception reason (usually badarg for BIF calls) and the stacktrace.

Thus, if a call to element/2 fails with a badarg exception and the first entry in the stacktrace is:

{erlang,element,[1,no_tuple],[{error_info,ErrorInfoMap}]}

and assuming that the stacktrace is bound to the variable StackTrace, the following call will provide provide additional information about the error:

FormatModule = maps:get(module, ErrorInfoMap, erlang),
FormatFunction = maps:get(function, ErrorInfoMap, format_error),
FormatModule:FormatError(badarg, StackTrace)

The format_error/2 function should return a map. For each argument that was in error, there should be a map element with the argument number as the key (that is, 1 for the first argument, 2 for the second, and so on) and a unicode:chardata() term as the value.

The atoms general and reason can also be returned in the map. general represents a generic error that is not attributed for a specific argument (for example badarg of io:format(“Hello”) when the default device is dead). reason will tell the error pretty printer to print the returned string instead of the error reason. The value pointed to by general and reason should be a unicode:chardata() term.

As an example:

Args = [1,no_tuple],
StackTrace = [{erlang, element, Args, [{error_info,Map}]}],
erlang:format_error(badarg, StackTrace)

could return:

#{2 => <<"not a tuple">>}

And:

Args = [0, b],
StackTrace = [{erlang, element, Args, [{error_info,Map}]}],
erlang:format_error(badarg, Entry)

could return:

#{1 => <<"out of range">>, 2 => <<"not a tuple">>}

And:

Args = ["Hello"],
StackTrace = [{io, format, Args, [{error_info,Map}]}],
erlang:format_error(badarg, Entry)

could return:

#{general => "the device has terminated"}

Note that the value for the key cause, if present, in the ErrorInfoMap term is only to be used by format_error/2. The actual value for a particular error could change at any time.

The cause key will typically only be present when an error occurs in a BIF that depends on the internal state in the runtime system (such as register/2 or the ETS BIFs), or for BIFs with complex arguments (such as system_flag/2) that would make it tedious and error prone to figure out which argument was in error.

Here is one way that format_error/2 for the erlang module could be implemented:

format_error(ExceptionReason, [{erlang, F, As, Info} | _]) ->
    ErrorInfoMap = proplists:get_value(error_info, Info, #{}),
    Cause = maps:get(cause, ErrorInfoMap, none),
    do_format_error(F, As, ExceptionReason, Cause).

do_format_error(_, _, system_limit, _) ->
    %% The explanation for system_limit is clear enough, so we don't
    %% need any detailed explanations for the arguments.
    #{};
do_format_error(F, As, _, Cause) ->
    do_format_error(F, As, Cause).

do_format_error(element, [Index, Tuple], _) ->
    Arg1 = if
               not is_integer(Index) ->
                   <<"not an integer">>;
               Index =< 0; Index > tuple_size(Tuple) ->
                   <<"out of range">>;
               true ->
                   []
           end,
    Arg2 = if
               not is_tuple(Tuple) -> <<"not a tuple">>;
               true -> []
           end,
    PotentialErrors = [{1, Arg1}, {2, Arg2}],
    maps:from_list([{ArgNum, Err} ||
                       {ArgNum, Err} <- PotentialErrors,
                       Err =/= []]);

do_format_error(list_to_atom, _, _) ->
    #{1 => <<"not a flat list of characters">>};

do_format_error(register, [Name,PidOrPort], Cause) ->
    [Arg1, Arg2] =
    case Cause of
        registered_name ->
            [[],<<"this process or port already has a name">>];
        notalive ->
            [[],<<"the pid does not refer to an existing process">>];
        _ ->
            Errors =
                [if
                     Name =:= undefined -> <<"'undefined' is not a valid name">>;
                     is_atom(Name) -> [];
                     true -> <<"not an atom">>
                 end,
                 if
                     is_pid(PidOrPort) -> [];
                     is_port(PidOrPort) -> [];
                     true -> <<"not a pid or a port">>
                 end],
            case Errors of
                [[],[]] ->
                    [<<"name is in use">>];
                [_,_] ->
                    Errors
            end,
    PotentialErrors = [{1, Arg1}, {2, Arg2}],
    maps:from_list([{ArgNum, Err} ||
                       {ArgNum, Err} <- PotentialErrors,
                       Err =/= []]);
      .
      .
      .

do_format_error(_, _, _) ->
    #{}.

Note that different strategies for different BIFS are used for determining the extended error information:

  • First the system_limit exception is handled (regardless of the BIF that was called). No extended error information is returned, because the explanation for system_limit is sufficiently clear.

  • If element/2 failed, the format_error/2 function only examines the arguments for element/2.

  • If list_to_atom/1 raised a badarg exception, there is only one possible error reason, so there is no need to examine the arguments.

  • If register/2 BIF fails, the value corresponding to the cause key provides specific error reasons for two of the possible failure reasons. If the reason is not one of the two, format_error/2 will figure out the other reasons based on the arguments.

Supplying extended error information using erlang:error/3 #

A library or application can raise an error exception with extended error information by calling erlang:error(Reason, Arguments, Options). Reason should be the error reason (for example badarg), Arguments should be arguments for the calling function, and Options should be [{error_info,ErrorInfoMap}].

The caller of erlang:error/3 should provide a format_error/2 function (not necessarily with that name if the ErrorInfoMap has a function key) that behaves as described in the previous section.

Formatting stacktraces #

To make it possible for applications and libraries to format stacktraces in the same style as the shell, the functions erl_error:format_exception/3 and erl_error:format_exception/4 are provided. Here is an example how erl_error:format_exception/3 can be used:

try
    .
    .
    .
catch
    C:R:Stk ->
        Message = erl_error:format_exception(C, R, Stk),
        io:format(LogFile, "~ts\n", [Message])
end.

The erl_error:format_exception/4 function is similar but has a fourth option argument to support customizing the message. See the documentation in the reference implementation for details.

Possible future extensions #

Since the error_info tuple in the stacktrace contains a map, more data could be added to the map in future extension of this EEP.

Similarly, since the return value of format_error/2 is a map, additional keys in the map could be assigned a meaning in the future.

For example, the value for the key hint could be a longer message that gives more context or provides concrete advice on how to investigate or avoid the error.

Additional examples #

Let’s look at some examples using ETS:

1> T = ets:new(table, []).
#Ref<0.2290824696.4161404930.5168>
2> ets:update_counter(T, k, 1).
** exception error: bad argument
     in function  ets:update_counter/3
        called as ets:update_counter(#Ref<0.2290824696.4161404930.5168>,k,1)
        *** argument 2: not a key that exists in the table

Note that when an error occurs while evaluating an expression entered in the shell, the evaluator process terminates and any ETS tables created by that process are deleted. Thus, calling update_counter a second time with the same arguments results in a different message:

3> ets:update_counter(T, k, 1).
** exception error: bad argument
     in function  ets:update_counter/3
        called as ets:update_counter(#Ref<0.2290824696.4161404930.5168>,k,1)
        *** argument 1: the table identifier does not refer to an existing ETS table

Starting over, creating a new ETS table:

4> f(T), T = ets:new(table, []).
#Ref<0.2290824696.4161404930.5205>
5> ets:insert(T, {k,a,0}).
true
6> ets:update_counter(T, k, 1).
** exception error: bad argument
     in function  ets:update_counter/3
        called as ets:update_counter(#Ref<0.2290824696.4161404930.5205>,k,1)
        *** argument 3: the value in the given position in the object is not an integer
7> ets:update_counter(T, k, bad).
** exception error: bad argument
     in function  ets:update_counter/3
        called as ets:update_counter(#Ref<0.2290824696.4161404930.5205>,k,bad)
        *** argument 1: the table identifier does not refer to an existing ETS table
        *** argument 3: not a valid update operation

Motivation #

When a call to a BIF fails with the reason badarg it is not always obvious even to an experienced developer exactly which argument was “bad” and in which way. For a newcomer, having to figure out what a badarg means is another stumbling block standing in the way of mastering a new language.

Even for an experienced developer, figuring out the reason for a badarg exception for some BIFs is hard or impossible. For example, the documentation for ets:update_counter/4 at the time of writing lists 8 situations in which ets:update_counter/4 will fail. That number is too low. Missing from list are, for example, reasons such as the ETS table having been deleted or having insufficient access right.

The general return key was added in order to allow information to be given about the default I/O device in io:format(“hello”). It also allows third-party error_report implementations (such as Elixir) much more freedom in what they can return.

The reason return key was added in order for third-party error_report implementations (such as Elixir) to influence what is printed to describe the actual error.

Rationale #

Why not change badarg to something more informational? #

An alternative way to provide more information about errors would be to introduce additional exception reasons. For example, the call:

element(a, b)

could raise the exception:

{badarg,[{1,not_integer},{2,not_tuple}]}

That change could break code that expects that BIFs should raise a badarg exception. It is less likely that existing code would match the fourth entry in the stacktrace.

A related reason is the amount of work needed to revise the error handling code for all built-in functions. Implementing building of Erlang terms in C is tedious and error prone. There would always be a risk that bugs in that code would crash the runtime system when an error occurred. The test suite would have to be extremely thorough to ensure that all bugs were found, because error handling code is typically infrequently executed.

Why can’t the stacktrace contain the complete error reason? #

We did consider modifying the implementation of all BIFs so that they would produce complete error information in the stacktrace when they failed. However, as mentioned earlier, building Erlang terms in C is tedious and error prone.

With the approach we have taken to let Erlang code do most of the analysis of the error reason, there is a much lower risk that error handling would crash the application or runtime system.

Why is the map key named cause and not reason? #

To avoid confusion with the exception reason.

Why is the value for cause in the ErrorInfoMap undocumented? #

The reason in the ErrorInfoMap is not meant to be used for programmatically figuring out why an error occurred, but only to be used by Module:format_error/2 to produce a human-readable message.

Also, for many BIFs the cause key will not be present, as the Module:format/4 function will produce the messages based solely on the name of the BIF and its arguments.

How is the module key useful? #

  • In OTP all format_error/2 functions will be in modules separate from the implementing modules to facilitate reducing the size of OTP in systems with tight storage constraints. Having the module key avoids the need for having a redirecting format_error/2 function in the implementing module.

  • A library or application may want to have a single module that implements format_error/2 for multiple modules. For example, in OTP we might have the module erl_stdlib_errors that implements format_error/2 for the modules binary, ets, lists, maps, string, and unicode.

How is the function key useful? #

  • A module could already have a function named format_error/2.

  • In the future we might want to extend the compiler to generate its own format_error/2 error function to provide more information about badmatch or function_clause errors.

Backwards Compatibility #

All exceptions from BIFs will now have a ExtraInfo element (called Location in the documentation for OTP 23) in the call-stack back trace (stacktrace) that includes an error_info tuple. In previous releases the ExtraInfo element would be an empty list for a failed BIF call.

Applications that explicitly do matching on the stacktrace and do assumptions of the layout of the ExtraInfo element (for example, assuming that Location is either an empty list or a list of file and line tuples in a specific order) may need modifications. Note that such assumptions have never been safe and that the documentation for error handling strongly discourages developers to rely on stacktrace entries for purposes other than debugging.

Implementation #

The reference implementation includes extended error information for most BIFs implemented in C in the erlang and ets modules. It can be found in PR #2849.

Copyright #

This document has been placed in the public domain.