You probably have detailed error and exception logging in your application that captures business or system failures. But how do you know which users and which requests caused them? During live usage, when many users are accessing your site at once, how do you really know who has done what?
When errors occur, it’s not always easy to link them to a particular user and the action they took. Diagnosis is harder because it’s difficult to piece together the story of what the user did, which support staff and developers need in order to reproduce and fix the problem. Also, if the application is performing poorly for an end user and no exceptions or errors are being logged, it’s a challenge to identify the parts of the architecture that are being called too often or performing sub-optimally.
To help, one approach we’ve often proposed is a logging framework that records, for each request, how long the application took to respond. We typically couple this with an ‘application log transaction id’ that can then be propagated to any subsequent exceptions or logs created within the application. In Java, this can be implemented as a servlet filter, using a ThreadLocal variable to create and store a log transaction id, which can then be included in the log records created by any subsequent log4j calls. Not only does it give a rudimentary audit log (per user and request) and an objective ‘front-door’ view of the application’s response time (to help diagnose slow response issues), it also enables errors and log messages associated with an individual user and request to be found quickly (by searching the log files for the log transaction id).
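As a rough illustration of the ThreadLocal part of this idea, here is a minimal sketch. It deliberately leaves out the servlet API and log4j so that it runs on its own: the `handle` method stands in for what a filter’s `doFilter` would do, and an in-memory list stands in for the log file. The class name, method names and log line format are all hypothetical, chosen only for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;
import java.util.function.Supplier;

/** Hypothetical sketch; names and log format are illustrative only. */
final class RequestLogContext {
    // One log transaction id per thread. This assumes each request is
    // handled start to finish on a single thread.
    private static final ThreadLocal<String> TXN_ID = new ThreadLocal<>();

    // Stand-in for a real log appender, so the sketch has no dependencies.
    static final List<String> LOG = Collections.synchronizedList(new ArrayList<>());

    /** Returns the id for the current request, or "-" outside a request. */
    static String currentId() {
        String id = TXN_ID.get();
        return id == null ? "-" : id;
    }

    /** Every log line is prefixed with the current log transaction id. */
    static void log(String message) {
        LOG.add("[txn=" + currentId() + "] " + message);
    }

    /** Plays the role of the filter: assign an id, time the request, clean up. */
    static <T> T handle(String user, String path, Supplier<T> request) {
        TXN_ID.set(UUID.randomUUID().toString());
        long start = System.nanoTime();
        try {
            return request.get();
        } catch (RuntimeException e) {
            log("ERROR " + e); // the failure carries the same id as the request
            throw e;
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            log("user=" + user + " path=" + path + " elapsedMs=" + elapsedMs);
            TXN_ID.remove(); // pooled threads are reused, so always clear the id
        }
    }

    public static void main(String[] args) {
        handle("alice", "/orders", () -> {
            log("loading orders"); // application code logs as usual
            return 3;
        });
        LOG.forEach(System.out::println);
    }
}
```

In a real servlet application the body of `handle` would sit inside the filter, and rather than prefixing messages by hand you would more likely put the id into log4j’s mapped diagnostic context (`MDC.put` in log4j 1.x, `ThreadContext.put` in Log4j 2) and reference it in the layout pattern with `%X{...}`.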
For distributed server architectures, this solution has been combined with automated log file retrieval and analysis tools to trend response times, raise monitoring alerts for successive slow requests and identify the problem behind an individual user request more quickly. For a previous client, this gave a better understanding of which parts of the application architecture were loaded at peak times and how best to scale and optimise them.
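To give a feel for the ‘successive slow requests’ alert, the sketch below scans a batch of log lines and reports whether a run of consecutive requests exceeded a threshold. It assumes each timing line contains an `elapsedMs=<number>` field; that format, the class name and the threshold values are assumptions made for this example, not a description of any particular tool.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; the log format and names are assumptions. */
final class SlowRequestAlert {
    // Matches the assumed timing field, e.g. "elapsedMs=1200".
    private static final Pattern ELAPSED = Pattern.compile("elapsedMs=(\\d+)");

    /**
     * Returns true if at least `run` consecutive timed requests each took
     * `thresholdMs` or longer. Lines without a timing field are ignored.
     */
    static boolean hasSlowRun(List<String> lines, long thresholdMs, int run) {
        int streak = 0;
        for (String line : lines) {
            Matcher m = ELAPSED.matcher(line);
            if (!m.find()) {
                continue; // not a timing line, e.g. an ordinary log message
            }
            long elapsedMs = Long.parseLong(m.group(1));
            streak = elapsedMs >= thresholdMs ? streak + 1 : 0;
            if (streak >= run) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "[txn=a1] user=alice path=/orders elapsedMs=1200",
            "[txn=a2] loading basket",
            "[txn=a2] user=bob path=/basket elapsedMs=1500",
            "[txn=a3] user=carol path=/home elapsedMs=90");
        System.out.println(hasSlowRun(lines, 1000, 2));
    }
}
```

A production version would more likely work per server or per URL and over a time window, but the principle is the same: because every request writes one timing line, a simple scan is enough to drive an alert.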
There are many similar, larger-scale solutions available in the market. However, the key with this approach is that it’s small, relatively simple and focussed. It captures information that is useful and measurable, and it can be extended over time to meet additional requirements based on operational and support experience. Done well, it saves a lot of time otherwise spent investigating support issues and optimising performance.
If you would like to learn more about how such a solution could be implemented or tailored for your web or SaaS application, or how to minimise the time spent analysing support-related issues, contact us to book a no-obligation initial consultation.