The following utilities allow you to collect information about system resource usage and errors, and can help you to identify performance problems caused by overloaded disks, network, memory, or CPUs:
- Displays the contents of the kernel ring buffer, which can contain errors about system resource usage. Provided by the
- Displays statistics about system resource usage. Provided by the
- Displays the amount of free and used memory in the system. Provided by the
- Reports I/O statistics. Provided by the
- Monitors disk and swap I/O on a per-process basis. Provided by the
- Reports network interface statistics and errors. Provided by the
- Reports processor-related statistics. Provided by the
- Reports information about system activity. Provided by the
- Reports network interface statistics. Provided by the
- Provides a dynamic real-time view of the tasks that are running on a system. Provided by the
- Displays the system load averages for the past 1, 5, and 15 minutes. Provided by the
- Reports virtual memory statistics. Provided by the
Many of these utilities provide overlapping functionality. For more information, see the individual manual page for the utility.
See Section 5.2.3, “Parameters that Control System Performance” for a list of kernel parameters that affect system performance.
9.2.2 Monitoring Usage of System Resources
Collect and monitor system resource usage regularly so that you have a continuous record of how a system behaves. Establish a baseline of acceptable measurements under typical operating conditions. You can then use the baseline as a reference point to make it easier to identify memory shortages, spikes in resource usage, and other problems when they occur. Monitoring system performance also allows you to plan for future growth and to see how configuration changes might affect future performance.
To run a monitoring command every interval seconds in real time and watch its output change, use the watch command. For example, the following command runs the mpstat command once per second:
Alternatively, many of the commands allow you to specify the sampling interval in seconds, for example:
If installed, the sar command records statistics every 10 minutes while the system is running and retains this information for every day of the current month. The following command displays all the statistics that sar recorded for day DD of the current month:
sar -A -f /var/log/sa/sa
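When this lookup is scripted, the day number can be turned into a file name. The following is a minimal sketch, assuming the default sysstat layout in which the data file for day DD is named saDD under /var/log/sa. The function name sa_file is hypothetical.

```shell
# Print the sar data file path for a given day of the month.
# Assumption: sysstat keeps one file per day named saDD under /var/log/sa.
sa_file() {
    day="${1#0}"                      # drop one leading zero so 08 is not read as octal
    printf '/var/log/sa/sa%02d\n' "$day"
}

# Example: show everything recorded yesterday (requires GNU date and sar).
# sar -A -f "$(sa_file "$(date -d yesterday +%d)")"
```

The commented example is left inactive because it depends on sar data being present on the system.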
To run the sar command as a background process and collect data in a file that you can display later by using the -f option:
count>/dev/null 2>&1 &
count is the number of samples to record.
Oracle OSWatcher Black Box (OSWbb) and OSWbb analyzer (OSWbba) are useful tools for collecting and analysing performance statistics. For more information, see Section 9.2.4, “About OSWatcher Black Box”.
The uptime, mpstat, sar, dstat, and top utilities allow you to monitor CPU usage. When a system's CPU cores are all occupied executing the code of processes, other processes must wait until a CPU core becomes free or the scheduler switches a CPU core to run their code. If too many processes are queued too often, this can represent a bottleneck in the performance of the system.
The commands mpstat -P ALL and sar -u -P ALL display CPU usage statistics for each CPU core and averaged across all CPU cores.
The %idle value shows the percentage of time that a CPU was not running system code or process code. If the value of %idle is near 0% most of the time on all CPU cores, the system is CPU-bound for the workload that it is running. The percentage of time spent running system code (%sys) should not usually exceed 30%, especially if %idle is close to 0%.
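These two rules of thumb can be expressed as a small check. This is only a sketch: the function name cpu_state is hypothetical, and the 5% cutoff used for "near 0%" is an assumption, not a value from this guide.

```shell
# Classify CPU state from %idle and %sys values (for example, read from mpstat).
# Assumption: "near 0%" is treated as below 5%; change idle_floor to suit.
cpu_state() {
    awk -v idle="$1" -v syspct="$2" -v idle_floor=5 'BEGIN {
        bound    = (idle < idle_floor)   # almost no idle time left
        high_sys = (syspct > 30)         # more than 30% spent in system code
        if (bound && high_sys) print "cpu-bound, high system time"
        else if (bound)        print "cpu-bound"
        else if (high_sys)     print "high system time"
        else                   print "ok"
    }'
}
```

A single sample proves little; the text refers to values that hold most of the time, so the check is more meaningful when applied to averages.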
The system load average represents the number of processes that are running on CPU cores, waiting to run, or waiting for disk I/O activity to complete, averaged over a period of time. On a busy system, the load average reported by uptime or sar -q should not usually exceed two times the number of CPU cores over periods as long as 5 or 15 minutes. If the load average exceeds four times the number of CPU cores for long periods, the system is overloaded.
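The two thresholds can be sketched as a comparison of a load average against the core count. The function name load_state and the label "elevated" for the range between the two thresholds are illustrative choices, not terms from this guide.

```shell
# Compare a load average with the number of CPU cores.
# More than 4x cores: overloaded. More than 2x cores: elevated. Otherwise: ok.
load_state() {
    awk -v avg="$1" -v cores="$2" 'BEGIN {
        if (avg > 4 * cores)      print "overloaded"
        else if (avg > 2 * cores) print "elevated"
        else                      print "ok"
    }'
}

# Example on a live Linux system, using the 15-minute average:
# load_state "$(cut -d " " -f 3 /proc/loadavg)" "$(nproc)"
```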
In addition to load averages (ldavg-*), the sar -q command reports the number of processes currently waiting to run (the run-queue size, runq-sz) and the total number of processes (plist-sz). The value of runq-sz also provides an indication of CPU saturation.
Determine the system's average load under normal loads where users and applications do not experience problems with system responsiveness, and then look for deviations from this benchmark over time. A dramatic rise in the load average can indicate a serious performance problem.
A combination of a sustained large load average or large run queue size and low %idle can indicate that the system has insufficient CPU capacity for the workload. When CPU usage is high, use a command such as dstat or top to determine which processes are most likely to be responsible. For example, the following dstat command shows which processes are using CPUs, memory, and block I/O most intensively:
dstat --top-cpu --top-mem --top-bio
The top command provides a real-time display of CPU activity. By default, top lists the most CPU-intensive processes on the system. In its upper section, top displays general information including the load averages over the past 1, 5 and 15 minutes, the number of running and sleeping processes (tasks), and total CPU and memory usage. In its lower section, top displays a list of processes, including the process ID number (PID), the process owner, CPU usage, memory usage, running time, and the command name. By default, the list is sorted by CPU usage, with the top consumer of CPU listed first. Type f to select which fields top displays, o to change the order of the fields, or O to change the sort field. For example, entering On sorts the list on the percentage memory usage field (
The sar -r command reports memory utilization statistics, including %memused, which is the percentage of physical memory in use.
sar -B reports memory paging statistics, including pgscank/s, which is the number of memory pages scanned by the kswapd daemon per second, and pgscand/s, which is the number of memory pages scanned directly per second.
sar -W reports swapping statistics, including pswpout/s, which are the numbers of pages swapped in and out per second.
If %memused is near 100% and the scan rate is continuously over 200 pages per second, the system has a memory shortage.
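As a rough illustration, the rule can be written as a check on one sample. The function name memory_shortage is hypothetical, the 95% cutoff for "near 100%" is an assumption, and the scan rate is assumed to be supplied by the caller (for example, pgscank/s plus pgscand/s from sar -B).

```shell
# Flag a possible memory shortage from one sample.
# $1 = %memused, $2 = page scan rate in pages per second.
# Assumption: "near 100%" is treated as 95% or more.
memory_shortage() {
    awk -v used="$1" -v scan="$2" -v used_floor=95 'BEGIN {
        if (used >= used_floor && scan > 200) print "shortage"
        else                                  print "ok"
    }'
}
```

Because the rule refers to a scan rate that is continuously high, a result from one sample is only a hint; repeated "shortage" results across many samples carry more weight.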
Once a system runs out of real or physical memory and starts using swap space, its performance deteriorates dramatically. If you run out of swap space, your programs or the entire operating system are likely to crash. If free or top indicates that little swap space remains available, this is also an indication that you are running low on memory.
The output from the dmesg command might include notification of any problems with physical memory that were detected at boot time.
The iostat command monitors the loading of block I/O devices by observing the time that the devices are active relative to the average data transfer rates. You can use this information to adjust the system configuration to balance the I/O loading across disks and host adapters.
iostat -x reports extended statistics about block I/O activity at one-second intervals, including %util, which is the percentage of CPU time spent handling I/O requests to a device, and avgqu-sz, which is the average queue length of I/O requests that were issued to that device. If %util approaches 100% or avgqu-sz is greater than 1, device saturation is occurring.
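The saturation rule can be sketched as follows. The function name device_saturated is hypothetical, and the 90% cutoff for "approaches 100%" is an assumption rather than a value from this guide.

```shell
# Flag block device saturation from one iostat -x sample.
# $1 = %util, $2 = avgqu-sz (average request queue length).
# Assumption: "approaches 100%" is treated as 90% or more.
device_saturated() {
    awk -v util="$1" -v queue="$2" -v util_ceiling=90 'BEGIN {
        if (util >= util_ceiling || queue > 1) print "saturated"
        else                                   print "ok"
    }'
}
```

Either condition alone is enough to report saturation, which mirrors the "or" in the rule above.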
You can also use the sar -d command to report on block I/O activity, including values for
The iotop utility can help you identify which processes are responsible for excessive disk I/O. iotop has a similar user interface to top. In its upper section, iotop displays the total disk input and output usage in bytes per second. In its lower section, iotop displays I/O information for each process, including disk input and output usage in bytes per second, the percentage of time spent swapping in pages from disk or waiting on I/O, and the command name. Use the left and right arrow keys to change the sort field, and press A to toggle the I/O units between bytes per second and total number of bytes, or O to toggle between displaying all processes or only those processes that are performing I/O.
The sar -v command reports the number of unused cache entries in the directory cache (dentunusd) and the numbers of in-use file handles (file-nr), inode handlers (inode-nr), and pseudo terminals (
iostat -n reports I/O statistics for each NFS file system that is mounted.
The ip -s link command displays network statistics and errors for all network devices, including the numbers of bytes transmitted (TX) and received (
overrun fields provide an indicator of network interface saturation.
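The same counters can also be read in a script. This sketch assumes that the kernel exposes per-interface counters as files named rx_dropped and rx_over_errors under /sys/class/net/INTERFACE/statistics, and that rx_over_errors corresponds to the receive overrun field. The function name net_trouble is hypothetical.

```shell
# Print interfaces whose receive drop or overrun counters are non-zero.
# $1 = directory holding one subdirectory per interface
#      (assumed default: /sys/class/net).
net_trouble() {
    root="${1:-/sys/class/net}"
    for dir in "$root"/*/; do
        stats="${dir}statistics"
        [ -r "$stats/rx_dropped" ] || continue      # skip entries without counters
        dropped=$(cat "$stats/rx_dropped")
        overrun=$(cat "$stats/rx_over_errors" 2>/dev/null || echo 0)
        if [ "$dropped" -gt 0 ] || [ "$overrun" -gt 0 ]; then
            echo "$(basename "$dir") dropped=$dropped overrun=$overrun"
        fi
    done
}
```

The counters are cumulative since the interface was brought up, so comparing two readings taken some time apart shows whether drops are still occurring.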
The ss -s command displays summary statistics for each protocol.
The GNOME desktop environment includes a graphical system monitor that allows you to display information about the system configuration, running processes, resource usage, and file systems.
To display the System Monitor, use the following command:
The Resources tab displays:
- CPU usage history in graphical form and the current CPU usage as a percentage.
- Memory and swap usage history in graphical form and the current memory and swap usage.
- Network usage history in graphical form, the current network usage for reception and transmission, and the total amount of data received and transmitted.
To display the System Monitor Manual, press F1 or select Help > Contents.
Oracle OSWatcher Black Box (OSWbb) collects and archives operating system and network metrics that you can use to diagnose performance issues. OSWbb operates as a set of background processes on the server and gathers data on a regular basis, invoking such Unix utilities as vmstat, mpstat, netstat, iostat, and top.
OSWbb is particularly useful for Oracle RAC (Real Application Clusters) and Oracle Grid Infrastructure configurations. The RAC-DDT (Diagnostic Data Tool) script file includes OSWbb, but does not install it by default.
To install OSWbb:
- Log on to My Oracle Support (MOS) at http://support.oracle.com.
- Download OSWatcher from the link listed by Doc ID 301137.1 at https://support.oracle.com/epmos/faces/DocumentDisplay?id=301137.1.
- Copy the file to the directory where you want to install OSWbb, and run the following command:
tar xvf oswbb
VERS represents the version number of OSWatcher, for example 730 for OSWatcher 7.30. Extracting the tar file creates a directory named oswbb, which contains all the directories and files that are associated with OSWbb, including the startOSWbb.sh script.
- To enable the collection of iostat information for NFS volumes, edit the OSWatcher.sh script in the oswbb directory, and set the value of
To start OSWbb, run the startOSWbb.sh script from the
The optional frequency and duration arguments specify how often in seconds OSWbb should collect data and the number of hours for which OSWbb should run. The default values are 30 seconds and 48 hours. The following example starts OSWbb recording data at intervals of 60 seconds, and has it record data for 12 hours:
./startOSWbb.sh 60 12
...
Testing for discovery of OS Utilities...
VMSTAT found on your system.
IOSTAT found on your system.
MPSTAT found on your system.
IFCONFIG found on your system.
NETSTAT found on your system.
TOP found on your system.
Testing for discovery of OS CPU COUNT
oswbb is looking for the CPU COUNT on your system
CPU COUNT will be used by oswbba to automatically look for cpu problems
CPU COUNT found on your system.
CPU COUNT = 4
Discovery completed.
Starting OSWatcher Black Box v7.3.0 on date and time
With SnapshotInterval = 60
With ArchiveInterval = 12
...
Data is stored in directory: OSWbba_archive
Starting Data Collection...
oswbb heartbeat: date and time
oswbb heartbeat: date and time + 60 seconds
...
OSWbba_archive is the path of the archive directory that contains the OSWbb log files.
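When choosing the frequency and duration arguments, it can help to estimate how many samples a run produces. The following sketch assumes one sample per interval for the whole duration; the function name snapshot_count is hypothetical and is not part of OSWbb.

```shell
# Estimate the number of samples for a given interval and duration.
# $1 = interval in seconds (default 30), $2 = duration in hours (default 48).
snapshot_count() {
    interval="${1:-30}"
    hours="${2:-48}"
    echo $(( hours * 3600 / interval ))
}
```

For the example above, snapshot_count 60 12 gives 720 samples, and the default values give 5760.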
To stop OSWbb prematurely, run the stopOSWbb.sh script from the
OSWbb collects data in the following directories under the
OSWbb stores data in hourly archive files named
. Each entry in a file is preceded by a timestamp.
From release v4.0.0, you can use the OSWbb analyzer (OSWbba) to provide information on system slowdowns, system hangs and other performance problems, and also to graph data collected from iostat, netstat, and vmstat. OSWbba requires that you have installed Java version 1.4.2 or higher on your system. You can use yum to install Java, or you can download a Java RPM for Linux from http://www.java.com.
Use the following command to run OSWbba from the
java -jar oswbba.jar -i
OSWbba_archive is the path of the archive directory that contains the OSWbb log files.
You can use OSWbba to display the following types of performance graph:
- Process run, wait and block queues.
- CPU time spent running in system, user, and idle mode.
- Context switches and interrupts.
- Free memory and available swap.
- Reads per second, writes per second, service time for I/O requests, and percentage utilization of bandwidth for a specified block device.
You can also use OSWbba to save the analysis to a report file, which reports instances of system slowdown, spikes in run queue length, or memory shortage, describes probable causes, and offers suggestions of how to improve performance.
java -jar oswbba.jar -i