Troubleshoot I/O & Wait Latency with OraLatencyMap and PyLatencyMap

I recently chased an Oracle performance issue where most reads were sub-millisecond (cache), but a thin band around ~10 ms (spindles) dominated total wait time. Classic bimodal latency: the fast band looked fine in averages, yet the rare slow band owned the delay.

To investigate the problem, and to prove the diagnosis, I refreshed two of my old tools:

OraLatencyMap (SQL*Plus script): samples Oracle’s microsecond wait-event histograms and renders two terminal heat maps with wait event latency details over time
PyLatencyMap (Python): a general latency heat-map visualizer that reads record-oriented histogram streams from Oracle scripts, BPF/bcc, SystemTap, DTrace, trace files, etc.

Both now have fresh releases with minor refactors and dependency checks.

TL;DR: When your database “feels slow,” metrics are key to the investigation, but relying on averages can mislead you. Wait latency histograms and heat map visualization help you see where and how wait time is spent, so you can fix what actually matters.

Why heat maps for wait latency?

Latency data naturally forms histograms (bucketed counts by latency). That’s a 2D view. But performance evolves over time, adding a third dimension (latency × time × magnitude). A heat map projects this 3D story onto your terminal so you can spot patterns at a glance.
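To make the "bucketed counts by latency" idea concrete, here is a minimal sketch that bins raw latencies into a histogram. It assumes power-of-two bucket boundaries in microseconds; the function names and the sample values are hypothetical and are not taken from either tool.

```python
from collections import Counter


def bucket_upper_bound_us(latency_us):
    """Return the smallest power of two (in microseconds) that is >= latency_us."""
    bound = 1
    while bound < latency_us:
        bound *= 2
    return bound


def build_histogram(latencies_us):
    """Count events per power-of-two latency bucket."""
    return Counter(bucket_upper_bound_us(x) for x in latencies_us)


# Hypothetical sample: six fast reads and two slow ones, in microseconds.
sample = [180, 220, 250, 300, 400, 480, 9500, 11000]
histogram = build_histogram(sample)

for bound in sorted(histogram):
    print(f"<= {bound:>6} us: {histogram[bound]} events")
```

One such histogram per sampling interval gives one column of the heat map; stacking the columns left to right adds the time axis.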

Why two heat maps?

Frequency heat map answers: Are many calls slow? (events/sec)
Intensity heat map answers: Are a few slow calls consuming most time? (time/sec)

In my case, frequency showed a bright <1 ms band (healthy) while intensity lit up at ~10 ms (spindles). Rare tail, real culprit.
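The difference between the two views can be sketched in a few lines of Python. The numbers below are made up to resemble the case described here, and using each bucket's latency value as the representative wait time is an assumption of this sketch, not necessarily what the tools do internally.

```python
def frequency_and_intensity(histogram, interval_s):
    """Convert {bucket_latency_ms: event_count} sampled over interval_s seconds
    into {bucket_latency_ms: (events_per_sec, wait_ms_per_sec)}."""
    result = {}
    for latency_ms, count in histogram.items():
        frequency = count / interval_s
        # Assumption: every event in a bucket waited for the bucket's latency value.
        intensity = frequency * latency_ms
        result[latency_ms] = (frequency, intensity)
    return result


# Hypothetical 1-second sample: a busy fast band and a thin slow band.
histogram = {0.5: 9500, 16: 500}
rates = frequency_and_intensity(histogram, interval_s=1)

total_events = sum(freq for freq, _ in rates.values())
total_wait_ms = sum(inten for _, inten in rates.values())
average_ms = total_wait_ms / total_events

for latency_ms, (freq, inten) in sorted(rates.items()):
    print(f"{latency_ms:>5} ms: {freq / total_events:6.1%} of events, "
          f"{inten / total_wait_ms:6.1%} of wait time")
print(f"average latency: {average_ms:.3f} ms")
```

In this made-up sample the slow band is 5% of the events but well over half of the wait time, while the average sits at a latency that almost no individual read actually experienced.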

Reading the canvas

Y-axis = latency buckets (displayed in ms).
X-axis = time, newest at the right.
Top = Frequency (events/sec). Bottom = Intensity (time/sec).

Look for bands (stable tiers), streaks (bursts), and hot tails (small but expensive).
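For readers who want to see how such a canvas can be laid out, here is a toy renderer. It is only a sketch of the layout described above (latency buckets on the Y-axis with the highest on top, time on the X-axis with the newest sample at the right); the character ramp, function name, and sample data are hypothetical, and the real tools use their own rendering and colors.

```python
RAMP = " .:*#"  # hypothetical shading ramp, from empty to hottest


def render_heat_map(samples, buckets):
    """Render a list of {bucket_ms: value} samples (oldest first) as text rows.

    Rows are latency buckets, highest on top; columns are samples over time.
    """
    peak = max((s.get(b, 0) for s in samples for b in buckets), default=0)
    rows = []
    for bucket in sorted(buckets, reverse=True):
        cells = []
        for sample in samples:
            value = sample.get(bucket, 0)
            if peak == 0 or value == 0:
                level = 0
            else:
                # Scale non-zero values onto ramp levels 1..len(RAMP)-1.
                level = 1 + int((len(RAMP) - 2) * value / peak)
            cells.append(RAMP[level])
        rows.append(f"{bucket:>6} |" + "".join(cells))
    return rows


# Hypothetical frequency samples (events/sec) for a 1 ms and a 16 ms bucket.
samples = [{1: 100, 16: 0}, {1: 100, 16: 5}, {1: 50, 16: 10}]
rows = render_heat_map(samples, buckets=[1, 16])
print("\n".join(rows))
```

Feeding the same function with intensity values (time/sec) instead of frequency values would give the bottom map.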

Example output from OraLatencyMap v1.3

This example shows the latency heat map measured and displayed with OraLatencyMap for the db file sequential read event.
The system shows a bimodal latency distribution, with two distinct latency bands:

Reads from fast storage (SSD) with latency < 1 ms (visible in the Frequency heat map, blue area).
Reads from slower storage (spinning disks) with latency ≈ 10 ms (visible in the Intensity heat map, yellow-red areas).

Quick starts

Oracle (OraLatencyMap)

OraLatencyMap is a command-line script/widget for Oracle databases.

Prerequisites
• SQL*Plus

• Privileges: SELECT on GV$EVENT_HISTOGRAM_MICRO and EXECUTE on DBMS_LOCK

Basic usage
SQL> @OraLatencyMap

Example (sample single-block reads every 3s):
SQL> @OraLatencyMap 3 "db file sequential read"

Example (check commit stalls / commit latency, 5s sampling):
SQL> @OraLatencyMap 5 "log file sync"

Tip: Quote event names exactly as shown (including spaces).

Advanced driver (more control)
Syntax:
@OraLatencyMap_advanced <interval_s> "<event_name>" <bins> <columns> "<where_clause>"

Example (focus on single-block reads, 5s sampling, 11 bins, 100 columns, only inst_id=1):
SQL> @OraLatencyMap_advanced 5 "db file sequential read" 11 100 "and inst_id=1"

Parameter notes:

<interval_s> — sampling interval in seconds
<event_name> — Oracle wait event to analyze (quoted)
<bins> — number of latency buckets
<columns> — width (time axis) of the heat map
<where_clause> — optional extra filter (e.g., RAC: and inst_id=1)
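The sampling interval matters because the tool works on differences between snapshots. The sketch below assumes the histogram view exposes cumulative counts per bucket since instance startup (shaped here as {WAIT_TIME_MICRO: WAIT_COUNT}), so each heat map column comes from the delta between two consecutive snapshots. The function name and the snapshot values are hypothetical; this is not the script's actual code.

```python
def rate_per_second(previous, current, interval_s):
    """Per-bucket events/sec between two cumulative histogram snapshots."""
    rates = {}
    for bucket, count in current.items():
        delta = count - previous.get(bucket, 0)
        # Guard against counters going backwards, e.g. after an instance restart.
        rates[bucket] = max(delta, 0) / interval_s
    return rates


# Hypothetical cumulative snapshots taken 3 seconds apart.
snap_t0 = {512: 1_000_000, 1024: 40_000, 16384: 5_000}
snap_t1 = {512: 1_028_500, 1024: 40_600, 16384: 6_500}

rates = rate_per_second(snap_t0, snap_t1, interval_s=3)
for bucket in sorted(rates):
    print(f"<= {bucket:>6} us: {rates[bucket]:8.1f} events/sec")
```

A shorter interval gives finer time resolution on the X-axis but noisier columns; a longer one smooths out short bursts.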

Which wait events to start with?

db file sequential read — typical for random single-block I/O; good starting point for read latency issues.
log file sync — measures commit latency; use when users report slow commits.

Linux & everything else (PyLatencyMap)

PyLatencyMap is a terminal-based visualizer for latency histograms, written in Python and intended for performance tuning and troubleshooting. It renders two scrolling heat maps, Frequency and Intensity, so you can see how latency distributions evolve over time. It runs from the command line and plays nicely with sources that output latency histograms (Oracle wait histograms, BPF/bcc, DTrace, SystemTap, trace files, etc.).

Install:

pip install PyLatencyMap
latencymap --help (or: python -m LatencyMap --help)

Example, replay sample data:

cat SampleData/example_latency_data.txt | latencymap

Use it live with your own sources. Example, Oracle microsecond histograms → heat maps:

sqlplus -S / as sysdba @Event_histograms_oracle/ora_latency_micro.sql "db file sequential read" 3 | latencymap

Use it with system instrumentation (BPF). Example, Linux block I/O latency with bcc (run as root; bcc must be installed):

python -u BPF-bcc/pylatencymap-biolatency.py -QT 3 100 | latencymap

What’s new

OraLatencyMap v1.3 – updated to use GV$EVENT_HISTOGRAM_MICRO by default, small refactors.
PyLatencyMap v1.3 – cleaner CLI, better record parsing, examples refreshed (Oracle, BPF/bcc, SystemTap, DTrace).

When to reach for which

Suspect Oracle waits? Start with OraLatencyMap on db file sequential read or log file sync.
Need cross-stack visibility (OS/storage/trace)? Use PyLatencyMap and feed it histograms from your favorite tracer.

See it, prove it, and fix it

Bottom line: If you only look at averages, you may miss complex behaviors like multimodal I/O (fast I/O with a slow tail). Heat maps also show you the time evolution of the latency patterns. With two small tools and two heat maps, you can see it, prove it, and fix it.

OraLatencyMap (README, scripts, examples)
PyLatencyMap (README, scripts, examples) + PyLatencyMap on PyPI

This work was carried out as part of CERN’s Databases and Analytics activities. I am especially grateful to my colleagues in the IT Database Services and the ATLAS Database Team for their support and collaboration.
