Save debug time by logging properly - boreddev


    A good logging mechanism helps us debug quickly and saves time.


    When we need to track down errors in a production environment or understand an unexpected response, logs can be our best friends or our worst enemies.


    The day we need to add a new feature or service to the product is the day we realize how important logs are.


    When we start developing, we all make a few logging mistakes, and the price we pay is working through the night, crawling through heaps of unwieldy logs to debug. Here are a few lessons worth noting when writing logs.


    Don't let logs overflow your disk space


    Not enough disk space for logs

    When developing on a local machine, we usually don't care about log storage capacity. The local hard drive is large compared to the relatively small volume of logs written, because the local environment is only for development and does not generate the continuous stream of log-producing traffic that production does.


    Once the hard disk is full, no more logs can be stored, and the greater risk is that a full disk can crash the whole system in production.


    If you don't want logs to build up until they overflow the disk, don't forget to use a rotating file handler, which limits the maximum amount of log data that can be written. The rotating file handler overwrites the oldest logs when the threshold is reached.
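    In Python, for example, the standard library ships a logging.handlers.RotatingFileHandler that does exactly this. A minimal sketch (the file name and size limits below are only illustrative):

```python
import logging
from logging.handlers import RotatingFileHandler

logger = logging.getLogger("simple_example")
logger.setLevel(logging.INFO)

# Keep at most 5 backup files of ~10 MB each (about 50 MB total).
# When app.log reaches maxBytes it is rotated to app.log.1, app.log.2,
# ... and the oldest file is deleted, so logs can never fill the disk.
handler = RotatingFileHandler("app.log",
                              maxBytes=10 * 1024 * 1024,
                              backupCount=5)
handler.setFormatter(logging.Formatter(
    "%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
logger.addHandler(handler)

logger.info("entered request")
```

    The total disk budget is simply maxBytes × (backupCount + 1), so you can size it to whatever your machines can afford.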


    Avoid scattering logs across machines


    Logs scattered across machines

    Services in a production environment usually run on different machines, so finding a specific log record means searching across all of them. Meanwhile, a broken service needs to be fixed fast; there is no time to waste figuring out where the error occurred and which machine holds the relevant log.


    Instead of storing logs on each machine's hard disk, stream them to a centralized logging system (e.g. a syslog server). This lets us search all logs across the system easily and quickly.
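    As a sketch in Python, the standard library's SysLogHandler can stream records to such a server; the address below is a placeholder for your own collector:

```python
import logging
from logging.handlers import SysLogHandler

logger = logging.getLogger("simple_example")
logger.setLevel(logging.INFO)

# Ship records over UDP to a central syslog server instead of the local
# disk. Replace "localhost" with the address of your own syslog collector.
handler = SysLogHandler(address=("localhost", 514))
handler.setFormatter(logging.Formatter(
    "%(name)s: %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.info("entered request")
```

    The application code stays unchanged; only the handler decides where the records end up, which is exactly why centralizing logs is cheap to adopt.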


    If you are using AWS or GCP, you can use their logging agents. These agents take care of streaming the logs to a searchable logging backend.


    Should we log more or less?


    Only meaningful logs are worth writing

    Personally, I think the number of logs is not what matters; what matters is that every log is meaningful and serves the purpose of investigating errors or problems in production. When you hesitate over adding a few new logs, think about how you will use them in the future: speculate on the errors that could occur and what information you would need to debug and trace them easily. So the answer to this question is: log whatever information gives developers a chance to understand what went wrong when they read it later.


    I have often come across logs written for user analytics, such as "User A clicked on button B". This is not a meaningful type of log for development or debugging.


    Avoid needle-in-a-haystack logs



    Look at the logs below: they show three requests being processed.


    Try to answer the question: how long did it take to process the 2nd request? 1 ms, 4 ms, or 6 ms?


    2018-10-21 22:39:07,051 - simple_example - INFO - entered request
    2018-10-21 22:39:07,053 - simple_example - INFO - entered request
    2018-10-21 22:39:07,054 - simple_example - INFO - ended request
    2018-10-21 22:39:07,056 - simple_example - INFO - entered request
    2018-10-21 22:39:07,057 - simple_example - INFO - ended request
    2018-10-21 22:39:07,059 - simple_example - INFO - ended request

    Without any additional information in each log line, we cannot be sure which answer is correct.


    Consider the logs below: adding an id to each request (such as request 1, request 2) lets us answer the question above clearly. In addition, adding metadata to each log line (for example, the module name "simple_example", or a few words describing what is happening: "entered request 1", "req 1 invalid request structure") helps us filter logs and find the relevant records more easily.


    We add a few metadata to the log as follows:


    2018-10-21 23:17:09,139 - INFO - entered request 1 - simple_example
    2018-10-21 23:17:09,141 - INFO - entered request 2 - simple_example
    2018-10-21 23:17:09,142 - INFO - ended request id 2 - simple_example
    2018-10-21 23:17:09,143 - INFO - req 1 invalid request structure - simple_example
    2018-10-21 23:17:09,144 - INFO - entered request 3 - simple_example
    2018-10-21 23:17:09,145 - INFO - ended request id 1 - simple_example
    2018-10-21 23:17:09,147 - INFO - ended request id 3 - simple_example

    Because we read from left to right, to make logs easier to search, track, and map to the module flow, structure each line as follows:


    • Timestamp in the first position
    • Log level in the second position (immediately after the timestamp)
    • Module name in the third position (the module name should not contain spaces; write it as one word or separate words with underscores "_" to make stream searches easier)
    • The message describing the processing step, plus auxiliary metadata, in the fourth position
    // timestamp - log level - module_name - message + metadata
    2018-10-21 23:12:39,497 - INFO - simple_example - entered request (user/create) (req: 1)
    2018-10-21 23:12:39,500 - INFO - simple_example - entered request (user/login) (req: 2)
    2018-10-21 23:12:39,502 - INFO - simple_example - ended request (user/login) (req: 2)
    2018-10-21 23:12:39,504 - ERROR - simple_example - invalid request structure (user/login) (req: 1)
    2018-10-21 23:12:39,506 - INFO - simple_example - entered request (user/create) (req: 3)
    2018-10-21 23:12:39,507 - INFO - simple_example - ended request (user/create) (req: 1)
    2018-10-21 23:12:39,509 - INFO - simple_example - ended request (user/create) (req: 3)
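    With Python's logging module, this structure maps directly onto a Formatter pattern; passing the route and request id through the message keeps every line filterable (the helper function and its names are illustrative):

```python
import logging

# timestamp - log level - module_name - message + metadata
formatter = logging.Formatter(
    "%(asctime)s - %(levelname)s - %(name)s - %(message)s")

handler = logging.StreamHandler()
handler.setFormatter(formatter)

logger = logging.getLogger("simple_example")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

def handle_request(req_id, route):
    """Log entry and exit of a request, tagged with its route and id."""
    logger.info("entered request (%s) (req: %s)", route, req_id)
    # ... process the request ...
    logger.info("ended request (%s) (req: %s)", route, req_id)

handle_request(1, "user/create")
```

    Because the layout is defined once in the formatter, every module that logs through this logger produces lines with the same searchable shape.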

    Analyze logs


    Statistical analysis in Graylog

    Searching physical files for log records is a time-consuming and expensive process: dealing with large log files and resorting to regular-expression searches is costly and slow.


    A more modern approach is to take advantage of fast search engines such as Elasticsearch and index all logs there. Using the ELK stack gives us the ability to analyze logs and answer questions like:


    1. Which machine does the error come from? Does it happen across the whole environment?
    2. When does the error occur? How often does it occur?

    Because statistical analysis can be performed on all logs, error alerting can be set up as well. For example, you can use a log storage and analysis system such as Graylog, Logstash, or Fluentd, or cloud services such as Sentry, Bugsnag, Loggly, or Raygun.


    In short, don't log just for the sake of logging. For each new feature we develop, we need to think ahead about which logs will help us during development and operation, and which logs will only distract us.


    Remember: logs will only help you solve production problems if you take care of them.

