Theory on error handling?

情歌与酒 2021-01-29 18:28

Most advice concerning error handling boils down to a handful of tips and tricks (see this post for example). These hints are helpful but I think they don't answer all question

14 Answers
  • 2021-01-29 18:35

    Disclaimer: I do not know any theory on error handling. I have, however, thought repeatedly about the subject as I explored various languages and programming paradigms, and as I toyed with programming language designs (and discussed them). What follows is thus a summary of my experience so far, with objective arguments.

    Note: this should cover all the questions, but I did not even try to address them in order, preferring instead a structured presentation. At the end of each section, I present a succinct answer to those questions it answered, for clarity.


    Introduction

    As a premise, I would like to note that, whatever the topic under discussion, some parameters must be kept in mind when designing a library (or reusable code).

    The author cannot hope to fathom how this library will be used, and should thus avoid strategies that make integration more difficult than it should be. The most glaring defect would be relying on globally shared state; thread-local shared state can also be a nightmare for interactions with coroutines/green-threads. The use of such coroutines and threads also highlights that synchronization is best left to the user: in single-threaded code it will mean none (best performance), whilst with coroutines and green-threads the user is best suited to implement (or use existing implementations of) dedicated synchronization mechanisms.

    That being said, when libraries are for internal use only, global or thread-local variables might be convenient; if used, they should be clearly documented as a technical limitation.


    Logging

    There are many ways to log messages:

    • with extra information such as timestamp, process-ID, thread-ID, server name/IP, ...
    • via synchronous calls or with an asynchronous mechanism (and an overflow handling mechanism)
    • in files, databases, distributed databases, dedicated log-servers, ...

    As the author of a library, you should make sure the logs can be integrated within the client infrastructure (or turned off). This is best achieved by letting the client provide hooks so as to deal with the logs themselves. My recommendation is:

    • to provide 2 hooks: one to decide whether to log or not, and one to actually log (the message being formatted and the latter hook called only when the client decided to log)
    • to provide, on top of the message: a severity (aka level), the filename, line and function name if open-source or otherwise the logical module (if several)
    • to, by default, write to stdout and stderr (depending on severity), until the client explicitly says not to log

    I would note that, following the guidelines delineated in the introduction, synchronization is left to the client.
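
    The two-hook recommendation can be sketched roughly as follows. This is only an illustration under assumptions: `Severity`, `LogRecord`, `LogHooks` and `logMessage` are hypothetical names, not part of any existing library, and no synchronization is attempted (it is left to the client, as noted above).

```cpp
#include <functional>
#include <iostream>
#include <string>

// Hypothetical names throughout; this is not an existing library's API.
enum class Severity { Debug, Info, Warning, Error };

struct LogRecord {
    Severity severity;
    const char* file;
    int line;
    const char* function;
    std::string message;
};

struct LogHooks {
    // Hook 1: decides whether a record of this severity should be produced.
    std::function<bool(Severity)> enabled = [](Severity) { return true; };
    // Hook 2: receives the formatted record.
    // Default: stderr for warnings and errors, stdout otherwise.
    std::function<void(const LogRecord&)> write = [](const LogRecord& r) {
        std::ostream& out = (r.severity >= Severity::Warning) ? std::cerr : std::cout;
        out << r.file << ':' << r.line << " (" << r.function << ") "
            << r.message << '\n';
    };
};

// The message is built by `format` only if the client agreed to log.
template <typename Formatter>
void logMessage(const LogHooks& hooks, Severity severity, const char* file,
                int line, const char* function, Formatter format) {
    if (!hooks.enabled || !hooks.write || !hooks.enabled(severity)) return;
    hooks.write(LogRecord{severity, file, line, function, format()});
}
```

    A client that wants silence replaces `enabled` with a function returning false; a client with its own infrastructure replaces `write`. Passing the formatter as a callable keeps the formatting cost off the path where logging is disabled.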

    Regarding whether to log errors: do not log (as errors) what you already report via your API; you can, however, still log the details at a lower severity. The client can decide whether to report the error when handling it, and may for example choose not to report it if this was just a speculative call.

    Note: some information should not make it into the logs and some other pieces are best obfuscated. For example, passwords should not be logged, and Credit-Card or Passport/Social Security Numbers are best obfuscated (partly at least). In a library designed for such sensitive information, this can be done during logging; otherwise the application should take care of this.
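
    As a small illustration of partial obfuscation, here is a sketch of a helper that hides all but the last few digits of a number before it reaches the logs. `maskDigits` is a hypothetical name, and keeping the last four digits is an assumption made for the example, not a compliance recommendation.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: replaces every digit except the last `keep` digits
// with '*'. Separators such as spaces or dashes are left untouched.
std::string maskDigits(const std::string& value, std::size_t keep = 4) {
    std::size_t digits = 0;
    for (unsigned char c : value) {
        if (std::isdigit(c)) ++digits;
    }
    std::string masked = value;
    std::size_t toMask = digits > keep ? digits - keep : 0;
    for (char& c : masked) {
        if (toMask == 0) break;
        if (std::isdigit(static_cast<unsigned char>(c))) {
            c = '*';
            --toMask;
        }
    }
    return masked;
}
```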

    Is logging something that should only be done in application code? Or is it ok to do some logging from library code.

    Application code should decide the policy. Whether a library logs or not depends on whether it needs to.


    Going on after an error?

    Before we actually talk about reporting errors, the first question we should ask is whether the error should be reported (for handling) or if things are so wrong that aborting the current process is clearly the best policy.

    This is certainly a tricky topic. In general, I would advise designing so that going on is an option, with a purge/reset if necessary. If this cannot be achieved in certain cases, then those cases should abort the process.

    Note: on some systems, it is possible to get a memory dump of the process. If an application handles sensitive data (passwords, credit cards, passports, ...), memory dumps are best deactivated in production (but can be used during development).

    Note: it can be useful to have a debug switch that turns a portion of the error-reporting calls into aborts with a memory dump, to assist debugging during development.
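
    One way such a switch might look is sketched below, assuming a hypothetical compile-time flag `MYLIB_ABORT_ON_ERROR` and a single hypothetical function `makeError` through which the library creates its errors. Whether `std::abort` actually produces a memory dump depends on the system configuration.

```cpp
#include <cstdlib>
#include <string>
#include <utility>

// Hypothetical build flag: define MYLIB_ABORT_ON_ERROR in development builds.
#ifdef MYLIB_ABORT_ON_ERROR
constexpr bool kAbortOnError = true;
#else
constexpr bool kAbortOnError = false;
#endif

struct Error {
    int code;
    std::string message;
};

// Errors reported by the library are created through this single function,
// so the switch only has to live in one place.
inline Error makeError(int code, std::string message) {
    if (kAbortOnError) {
        std::abort();  // stops at the point of failure; may yield a core dump
    }
    return Error{code, std::move(message)};
}
```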


    Reporting an error

    The occurrence of an error signifies that the contract of a function/interface could not be fulfilled. This has several consequences:

    • the client should be warned, which is why the error should be reported
    • no partially correct data should escape in the wild

    The latter point will be treated later on; for now let us focus on reporting the error. The client should not, ever, be able to accidentally ignore this report. Which is why using error codes is such an abomination (in languages where return values can be ignored):

    ErrorStatus_t doit(Input const* input, Output* output);
    

    I know of two schemes that require explicit action on the client part:

    • exceptions
    • result types (optional<T>, either<T, U>, ...)

    The former is well-known, the latter is very much used in functional languages and was introduced in C++11 under the guise of std::future<T> though other implementations exist.

    I advise preferring the latter when possible, as it is easier to fathom, but reverting to exceptions when no result is expected. Contrast:

    Option<Value&> find(Key const&);
    
    void updateName(Client::Id id, Client::Name name);
    

    In the case of "write-only" operations such as updateName, the client has no use for a result. It could be introduced, but it would be easy to forget the check.
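
    The contrast can be made concrete with a minimal sketch. The `Clients` class and its integer ids are hypothetical; `std::optional` (C++17) stands in for `Option`, and since `std::optional` cannot hold a reference, this version returns a copy rather than `Value&`.

```cpp
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical in-memory repository used only for illustration.
class Clients {
public:
    void add(int id, std::string name) { names_[id] = std::move(name); }

    // Read operation: absence is part of the return type, so the caller
    // has to look inside the optional to get at the value.
    std::optional<std::string> find(int id) const {
        auto it = names_.find(id);
        if (it == names_.end()) return std::nullopt;
        return it->second;
    }

    // Write-only operation: no natural result, so failure is an exception.
    void updateName(int id, std::string name) {
        auto it = names_.find(id);
        if (it == names_.end()) throw std::out_of_range("unknown client id");
        it->second = std::move(name);
    }

private:
    std::map<int, std::string> names_;
};
```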

    Reverting to exceptions also occurs when a result type is impractical, or insufficient to convey the details:

    Option<Value&> compute(RepositoryInterface&, Details...);
    

    In such a case of externally defined callback, there is an almost infinite list of potential failures. The implementation could use the network, a database, the filesystem, ... in this case, and in order to report errors accurately:

    • the externally defined callback should be expected to report errors via exceptions when the interface is insufficient (or impractical) to convey the full details of the error.
    • the functions based on this abstract callback should be transparent to those exceptions (let them pass, unmodified)

    The goal is to let this exception bubble up to the layer where the implementation of the interface was decided (at least), for it's only at this level that there is a chance to correctly interpret the exception thrown.

    Note: the externally defined callback is not forced to use exceptions, we should just expect it might be using some.
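
    A minimal sketch of this transparency follows. The interface, `computeDouble`, `NetworkDown` and `FlakyRepository` are all hypothetical names chosen for the example; the point is only that the library function contains no `try`/`catch`, so the exception reaches the layer that chose the implementation.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical abstract callback supplied by the client.
struct RepositoryInterface {
    virtual ~RepositoryInterface() = default;
    // May throw anything: network, database, filesystem errors...
    virtual std::optional<int> load(const std::string& key) = 0;
};

// Library function: it knows nothing about the implementation, so it does
// not catch. Whatever load() throws passes through unmodified.
std::optional<int> computeDouble(RepositoryInterface& repo, const std::string& key) {
    std::optional<int> value = repo.load(key);
    if (!value) return std::nullopt;
    return *value * 2;
}

// Client side: the layer that picked this implementation also knows which
// exceptions it can throw and how to interpret them.
struct NetworkDown : std::runtime_error {
    using std::runtime_error::runtime_error;
};

struct FlakyRepository : RepositoryInterface {
    std::optional<int> load(const std::string&) override {
        throw NetworkDown("connection refused");
    }
};
```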


    Using an error

    In order to use an error report, the client needs enough information to make a decision. Structured information, such as error codes or exception types, should be preferred (for automatic actions) and additional information (message, stack, ...) can be provided in a non-structured way (for humans to investigate).

    It would be best if a function clearly documented all possible failure modes: when they occur and how they are reported. However, especially in case arbitrary code is executed, the client should be prepared to deal with unknown codes/exceptions.

    A notable exception is, of course, result types: boost::variant<Output, Error0, Error1, ...> provides a compiler-checked exhaustive list of known failure modes... though a function returning this type could still throw, of course.
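
    The same idea can be sketched with `std::variant` from C++17 in place of `boost::variant`. The failure types `EmptyInput` and `NotANumber` and the functions below are hypothetical; the `static_assert` in the last branch is what makes the handling fail to compile if a new alternative is added without being handled.

```cpp
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical failure modes for a parsing function.
struct EmptyInput {};
struct NotANumber { char offending; };

using ParseResult = std::variant<int, EmptyInput, NotANumber>;

// Illustration only: no overflow check for very long inputs.
ParseResult parseDigits(const std::string& text) {
    if (text.empty()) return EmptyInput{};
    int value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return NotANumber{c};
        value = value * 10 + (c - '0');
    }
    return value;
}

// The visitor must handle every alternative, or compilation fails.
std::string describe(const ParseResult& result) {
    return std::visit([](const auto& alt) -> std::string {
        using T = std::decay_t<decltype(alt)>;
        if constexpr (std::is_same_v<T, int>) {
            return "value " + std::to_string(alt);
        } else if constexpr (std::is_same_v<T, EmptyInput>) {
            return "error: empty input";
        } else {
            static_assert(std::is_same_v<T, NotANumber>, "unhandled alternative");
            return std::string("error: unexpected '") + alt.offending + "'";
        }
    }, result);
}
```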

    How to decide between logging an error, or showing it as an error message to the user?

    The user should always be warned when their request could not be fulfilled; however, a user-friendly (understandable) message should be displayed. If possible, advice or workarounds should be presented as well. Details are for the investigating teams.


    Recovering from an error?

    Last, but certainly not least, comes the truly frightening part about errors: recovery.

    This is something that databases (real ones) are so good for: transaction-like semantics. If anything unexpected occurs, the transaction is aborted as if nothing had happened.

    In the real world, things are not so simple. The example of cancelling an e-mail that was already sent comes to mind: too late. Protocols may exist, depending on your application domain, but they are outside the scope of this discussion. The first step, though, is the ability to recover a sane in-memory state; and that is far from simple in most languages (and STM can only do so much today).

    First of all, an illustration of the challenge:

    void update(Client& client, Client::Name name, Client::Address address) {
        client.update(std::move(name));
        client.update(std::move(address)); // Throws
    }
    

    Now, after updating the address failed, I am left with a half-updated client. What can I do?

    • attempting to undo all the updates that occurred is close to impossible (the undo might fail)
    • copying the state prior to executing any single update is a performance hog (supposing we can even swap it back in a sure way)

    In any case, the book-keeping required is such that mistakes will creep in.

    And worst of all: there is no safe assumption that can be made as to the extent of the corruption (except that client is now botched). Or at least, no assumption that will stand the test of time (and code changes).

    As often, the only way to win is not to play.


    A possible solution: Transactions

    Wherever possible, the key idea is to define macro functions (coarse-grained operations, not preprocessor macros) that will either fail or produce the expected result. Those are our transactions. And their form is invariant:

    Either<Output, Error> doit(Input const&);
    
    // or
    
    Output doit(Input const&); // throw in case of error
    

    A transaction does not modify any external state, thus if it fails to produce a result:

    • the external world has not changed (nothing to rollback)
    • there is no partial result to observe

    Any function that is not a transaction should be considered as having corrupted anything it touched, and thus the only sane way of dealing with an error from non-transactional functions is to let it bubble up until a transaction layer is reached. Any attempt to deal with the error prior is, in the end, doomed to fail.
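
    Applied to the earlier `update` example, a transaction-shaped version might look like the sketch below. The `Client` struct, the `withUpdates` name and the empty-address check are assumptions made for illustration. Note that it works on a copy, which has the cost mentioned earlier; the benefit is that a failure leaves the original object untouched.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical Client, loosely mirroring the update() example.
struct Client {
    std::string name;
    std::string address;
};

// Transaction: reads its input, touches no external state, and either
// returns a fully updated Client or throws.
Client withUpdates(const Client& current, std::string name, std::string address) {
    if (address.empty()) throw std::invalid_argument("address must not be empty");
    Client next = current;  // work on a private copy
    next.name = std::move(name);
    next.address = std::move(address);
    return next;
}

// The caller commits only after the transaction has succeeded.
void update(Client& client, std::string name, std::string address) {
    Client next = withUpdates(client, std::move(name), std::move(address));
    client = std::move(next);  // moving std::string members does not throw
}
```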

    How to decide if an error should be handled locally or propagated to higher level code ?

    In case of exceptions, where should you generally catch them? In low-level or higher level code?

    Deal with them whenever it is safe to do so and there is value in doing so. Most notably, it's okay to catch an error, check if it can be dealt with locally, and then either deal with it or pass it up.
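
    The "catch, check, then deal with it or pass it up" pattern can be sketched as below. `FetchError`, its `transient` flag and `fetchWithRetry` are hypothetical, and the sketch assumes the wrapped operation is safe to call again (that is, it behaves like a transaction).

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical error type: `transient` marks failures worth retrying locally.
struct FetchError : std::runtime_error {
    bool transient;
    FetchError(const std::string& what, bool isTransient)
        : std::runtime_error(what), transient(isTransient) {}
};

// Retries transient failures up to maxAttempts; anything else is passed up.
std::string fetchWithRetry(const std::function<std::string()>& fetch, int maxAttempts) {
    for (int attempt = 1;; ++attempt) {
        try {
            return fetch();
        } catch (const FetchError& e) {
            // Cannot be dealt with here: rethrow the same exception object.
            if (!e.transient || attempt >= maxAttempts) throw;
            // Transient and attempts remain: handled locally by looping again.
        }
    }
}
```

    Using a bare `throw;` rather than `throw e;` keeps the original exception intact for the layer above.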


    Should you strive for a unified error handling strategy through all layers of code, or try to develop a system that can adapt itself to a variety of error handling strategies (in order to be able to deal with errors from 3rd party libraries).

    I did not address this question previously; however, I believe it is clear that the approach I highlighted is already dual, since it consists of both result types and exceptions. As such, dealing with 3rd party libraries should be a cinch, though I do advise wrapping them anyway for other reasons (3rd party code is better insulated behind a business-oriented interface tasked with the impedance adaptation).
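
    A wrapper of that kind might look like the following sketch. The `thirdparty::lookup_rate` function is a made-up stand-in for a C-style library that reports errors through status codes; only the wrapper `exchangeRate` would be visible to the rest of the application.

```cpp
#include <optional>
#include <string>

// Stand-in for a third-party C-style API (hypothetical): returns 0 on
// success, a negative code on failure, and writes through an out-parameter.
namespace thirdparty {
inline int lookup_rate(const char* currency, double* out) {
    if (std::string(currency) == "EUR") {
        *out = 1.0;
        return 0;
    }
    return -2;  // unknown currency
}
}  // namespace thirdparty

// Business-oriented wrapper: callers see only the project's own conventions
// (here a result type), never the library's status codes.
std::optional<double> exchangeRate(const std::string& currency) {
    double rate = 0.0;
    int status = thirdparty::lookup_rate(currency.c_str(), &rate);
    if (status != 0) return std::nullopt;
    return rate;
}
```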

  • 2021-01-29 18:41

    Here is an awesome blog post which explains how error handling should be done. http://damienkatz.net/2006/04/error_code_vs_e.html

    How to decide if an error should be handled locally or propagated to higher level code? Like Martin Becket says in another answer, this is a question of whether the error can be fixed here or not.

    How to decide between logging an error, or showing it as an error message to the user? You should probably never show a raw error to the user. Rather, show them a well-formed message explaining the situation, without giving too much technical information. Then log the technical information, especially if the error occurred while processing input. If your code doesn't know how to handle faulty input, then that MUST be fixed.

    Is logging something that should only be done in application code? Or is it ok to do some logging from library code. Logging in library code is not useful, because you may not even have written it. However, the application could log interaction with the library code and even through statistics detect errors.

    In case of exceptions, where should you generally catch them? In low-level or higher level code? See question one.

    Similar question: at what point should you stop propagating an error and deal with it? See question one.

    Should you strive for a unified error handling strategy through all layers of code, or try to develop a system that can adapt itself to a variety of error handling strategies (in order to be able to deal with errors from 3rd party libraries). Throwing exceptions is an expensive operation in most heavy languages, so use them only where the entire program flow is broken for that operation. On the other hand, if you can predict all outcomes of a function, return any data through a reference parameter and return an error code (0 on success, 1+ on errors).
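
    The error-code convention described here (0 on success, 1+ on errors, data through a reference parameter) could look like this sketch. `parsePort` and the `ParseStatus` values are hypothetical names picked for the example.

```cpp
#include <string>

// Hypothetical status codes: 0 on success, 1+ on errors.
enum ParseStatus { kOk = 0, kEmpty = 1, kNotNumeric = 2, kOutOfRange = 3 };

// Parses a TCP port number. The result is returned through `port`,
// which is written only on success.
int parsePort(const std::string& text, int& port) {
    if (text.empty()) return kEmpty;
    int value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return kNotNumeric;
        value = value * 10 + (c - '0');
        if (value > 65535) return kOutOfRange;
    }
    port = value;
    return kOk;
}
```

    Every outcome is listed in the enum, which is what makes the function's failure modes predictable for the caller.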

    Does it make sense to create a list of error codes? Or is that old fashioned these days? Make a list of error codes for a particular function, and document it inside it as a list of possible return values. See previous question as well as the link.

  • 2021-01-29 18:46

    A couple of years ago I thought exactly about the same question :)

    After searching and reading several things, I think that the most interesting reference I found was Patterns for Generation, Handling and Management of Errors from Andy Longshaw and Eoin Woods. It is a short and systematic attempt to cover the basic idioms you mention and some others.

    The answer to these questions is quite controversial, but the authors above were brave enough to expose themselves in a conference, and then put their thoughts on paper.

  • 2021-01-29 18:47

    How to decide if an error should be handled locally or propagated to higher level code?

    If the exception breaks the operation of a method, it is a good approach to throw it to a higher level. If you are familiar with MVC, exceptions must be handled in the Controller.

    How to decide between logging an error, or showing it as an error message to the user? Logging errors and all information available about the error is a good approach. If the error breaks the operation, or the user needs to know that an error has occurred, you should display it to the user. Note that in a Windows service, logs are very important.

    Is logging something that should only be done in application code? Or is it ok to do some logging from library code.

    I don't see any reason to log errors in a DLL. It should only throw errors. There may be a specific reason to do so, of course. In our company, a DLL logs information about the process (not only errors).

    In case of exceptions, where should you generally catch them? In low-level or higher level code? Similar question: at what point should you stop propagating an error and deal with it?

    In a controller.

    Edit: I need to explain this a bit in case you are not familiar with MVC. Model View Controller is a design pattern. In the Model you develop application logic. In the View you display content to the user. In the Controller you receive user events, call the relevant Model function, then invoke the View to display the result to the user.

    Suppose that you have a form with two textboxes, a label, and a button named Add. As you might guess, this is your View. The Button_Click event is defined in the Controller, and an add method is defined in the Model. When the user clicks, the Button_Click event is triggered and the Controller calls the add method. Here the textbox values can be empty, or they can be letters instead of numbers. An exception occurs in the add function and is thrown. The Controller handles it and displays an error message in the label.
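
    The form example can be sketched in a few lines. The sketch is in C++ without any GUI toolkit, so `Model`, `View`, `Controller` and `onAddClicked` are hypothetical stand-ins for the real form, its Button_Click handler and its widgets.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical Model: application logic, reports bad input by throwing.
struct Model {
    int add(const std::string& a, const std::string& b) const {
        return toInt(a) + toInt(b);
    }

private:
    static int toInt(const std::string& text) {
        std::size_t used = 0;
        // std::stoi throws std::invalid_argument or std::out_of_range.
        int value = std::stoi(text, &used);
        if (used != text.size()) throw std::invalid_argument("trailing characters");
        return value;
    }
};

// Hypothetical View: two text boxes and a label.
struct View {
    std::string firstBox, secondBox, label;
};

// Controller: reacts to the click, calls the Model, handles its exceptions.
struct Controller {
    Model& model;
    View& view;

    void onAddClicked() {
        try {
            view.label = std::to_string(model.add(view.firstBox, view.secondBox));
        } catch (const std::logic_error&) {
            // Both exception types thrown by the Model derive from logic_error.
            view.label = "Please enter two whole numbers.";
        }
    }
};
```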

    Should you strive for a unified error handling strategy through all layers of code, or try to develop a system that can adapt itself to a variety of error handling strategies (in order to be able to deal with errors from 3rd party libraries).

    I prefer the second one. It would be easier. And I don't think you can build one general mechanism for error handling, especially across different libraries.

    Does it make sense to create a list of error codes? Or is that old fashioned these days?

    That depends on how you will use it. In a single application (a web site, a desktop application), I don't think it is needed. But if you develop a web service, how will you inform users of errors? Providing an error code is always important here.

    if (error.Message == "User Login Failed")
    {
       //do something.
    }
    
    if (error.Code == "102")
    {
       //do something.
    }
    

    Which one do you prefer?

    And there is another way for error codes these days:

    if (error.Code == "LOGIN_ERROR_102") // wrong password
    {
       //do something.
    }
    

    Others may be: LOGIN_ERROR_103 (e.g. the user has expired), etc.

    This one is also human readable.
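
    On the service side, such codes could be kept in one place, for instance as in this C++ sketch. The enum, its values and the message texts are hypothetical, following the LOGIN_ERROR_102 / LOGIN_ERROR_103 naming used above.

```cpp
#include <string>

// Hypothetical codes, following the LOGIN_ERROR_102 / LOGIN_ERROR_103 naming.
enum class LoginError { WrongPassword = 102, UserExpired = 103 };

// Stable, human-readable identifier to send to web-service clients.
std::string toCode(LoginError error) {
    return "LOGIN_ERROR_" + std::to_string(static_cast<int>(error));
}

// Text for humans; clients should branch on the code, never on this text.
std::string toMessage(LoginError error) {
    switch (error) {
        case LoginError::WrongPassword: return "Wrong password";
        case LoginError::UserExpired:   return "User account has expired";
    }
    return "Unknown login error";
}
```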

  • 2021-01-29 18:53

    My view on logging (or other actions) from library code is NEVER.

    A library should not impose policy on its user, and the user may have INTENDED an error to occur. Perhaps the program was deliberately soliciting a particular error, in the expectation of it arriving, to test some condition. Logging this error would be misleading.

    Logging (or anything else) imposes policy on the caller, which is bad. Moreover, if a harmless error condition (which would be ignored or retried harmlessly by the caller, for example) were to happen with a high frequency, the volume of logs could mask any legitimate errors or cause robustness problems (filling discs, using excessive IO etc)

  • 2021-01-29 18:54

    Is logging something that should only be done in application code? Or is it ok to do some logging from library code.

    Just wanted to comment on this. My view is to never log directly in the library code, but to provide hooks or callbacks so that logging is implemented in the application code, and the application can decide what to do with the log output (if anything at all).
