Recently we encountered a production problem in an application that connects to multiple systems of record (SORs). An Oracle RAC cluster is one of its primary systems of record. This cluster slowed down due to resource constraints, and the slowdown degraded the entire application’s response time. In this post, let’s discuss the steps we pursued to troubleshoot this problem.

Capturing Troubleshooting Artifacts

The application was running on a WebLogic server. All of a sudden, it became unresponsive. We ran the yCrash open-source script against it. This script captures 16 different troubleshooting artifacts from the application stack, such as the garbage collection log, thread dump, heap dump, netstat, vmstat, iostat, and top. An application can become unresponsive for many reasons: garbage collection pauses, blocked threads, network connectivity issues, CPU starvation, memory constraints, and so on. Because the cause is not known up front, it’s ideal to capture all the troubleshooting artifacts.
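
To give a feel for what a few of these artifacts contain, here is a minimal Java sketch that collects a thread dump, a heap usage summary, and garbage collection counters from inside a running JVM using the standard `java.lang.management` API. This is not the yCrash script, which runs outside the application and also gathers OS-level data such as netstat, vmstat, iostat, and top. The class name `ArtifactSnapshot` and its method names are hypothetical, chosen only for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (not part of yCrash): snapshots a few JVM-level artifacts.
public class ArtifactSnapshot {

    // Thread dump: one entry per live thread, with lock details where the JVM supports them.
    static String threadDump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(
                threads.isObjectMonitorUsageSupported(),
                threads.isSynchronizerUsageSupported())) {
            // ThreadInfo.toString() prints the thread state and a truncated stack trace.
            out.append(info);
        }
        return out.toString();
    }

    // Heap usage in bytes; heapMax is -1 when the maximum is undefined.
    static String heapSummary() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return String.format("heapUsed=%d heapCommitted=%d heapMax=%d%n",
                heap.getUsed(), heap.getCommitted(), heap.getMax());
    }

    // Cumulative collection count and time for each garbage collector in this JVM.
    static String gcSummary() {
        StringBuilder out = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.append(String.format("gc=%s count=%d timeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(heapSummary());
        System.out.print(gcSummary());
        System.out.print(threadDump());
    }
}
```

An in-process snapshot like this only works while the JVM can still schedule the code that takes it. That is one reason to prefer an external capture for an application that is already unresponsive.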
