January 20, 2026

Here you will find all the important explanations about our tool: DT Performance Analyzer v0.1
Performance analysis in darktable: Finding the bottleneck
Darktable is an extremely powerful RAW developer, but that power takes its toll on the hardware.
A performance analysis helps you understand what’s happening “under the hood.” Here we explain the most important concepts.
- When and where is speed lost?
- Why is a fast GPU alone not always the best solution?
The information and examples on this page refer to exporting photos at maximum resolution. Since editing within darktable (DT) generally requires fewer resources, exporting represents the most demanding scenario, which is ideal for analyzing performance.
If the export process runs quickly, editing will generally run smoothly as well.
1. The Pixelpipe: An assembly line for data
You can think of editing in darktable as an assembly line. The image is passed from module to module (e.g., demosaic -> exposure -> white balance -> tone mapping such as AgX).
The goal is to load the image onto the graphics card (GPU) once, process all steps there, and export the finished image.
Where is speed lost?
A. The “Ping-Pong” Effect (CPU vs. GPU)
The biggest bottleneck for performance is not a slow GPU, but rather the unnecessary transfer of data between the GPU and CPU.
- Ideal case: Image -> GPU -> Module A -> Module B -> Module C -> Output.
- Problem: Image -> GPU -> Module A -> CPU module (data must be returned to RAM) -> CPU calculates -> Back to GPU -> Module C.
This data transfer via the PCIe bus is slow compared to the computing speed. A single module that is not running on the GPU can slow down the entire pipeline because it interrupts the flow.
The illustration on the right shows you the differences in processing time between GPU and CPU.

When does this problem occur?
- The module does not support OpenCL.
- Insufficient GPU memory (VRAM). Not every module can use tiling (next section); if the GPU memory is insufficient, the only option is to use the CPU.
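The ping-pong effect described above can be illustrated with a toy cost model. This Python sketch does not reflect how darktable actually schedules work; the module names, the per-module timings, and the fixed transfer cost are made-up values, chosen only to show how a single CPU-only module adds two extra transfers.

```python
# Toy model of the "ping-pong" effect.
# All numbers are illustrative assumptions, not darktable measurements.

TRANSFER_MS = 30.0  # assumed cost of one copy between RAM and VRAM


def pipeline_time_ms(modules):
    """Return (total_ms, transfers) for a list of (name, device, compute_ms)."""
    location = "cpu"  # the image starts in system RAM
    total = 0.0
    transfers = 0
    for _name, device, compute_ms in modules:
        if device != location:
            # The data has to move before this module can run.
            total += TRANSFER_MS
            transfers += 1
            location = device
        total += compute_ms
    if location != "cpu":
        # The finished image has to return to RAM to be written to disk.
        total += TRANSFER_MS
        transfers += 1
    return total, transfers


# Ideal case: every module runs on the GPU.
ideal = [("A", "gpu", 5.0), ("B", "gpu", 5.0), ("C", "gpu", 5.0)]
# Problem case: module B only runs on the CPU (and is slower there).
ping_pong = [("A", "gpu", 5.0), ("B", "cpu", 60.0), ("C", "gpu", 5.0)]

print("ideal:    ", pipeline_time_ms(ideal))
print("ping-pong:", pipeline_time_ms(ping_pong))
```

In this model the ideal pipeline needs two transfers (upload and download), while the pipeline with one CPU module needs four, in addition to the slower CPU computation itself.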
B. Tiling
Graphics cards have extremely fast, but limited, memory (VRAM). A 24-megapixel image often occupies several gigabytes of VRAM during processing, as darktable calculates with 32-bit floating point numbers and requires temporary storage for algorithms.
- What happens? If the image does not fit into the VRAM in one piece, darktable breaks it down into tiles.
- The disadvantage: Each tile must be calculated individually. To prevent edges from being visible, the tiles must overlap (“ghosting”). These overlapping areas are therefore calculated twice.
- The result: Tiling enables the export of huge images on small graphics cards, but comes at the cost of (massive) performance hits.
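To get a feeling for the numbers, the memory footprint and the cost of overlapping tiles can be estimated with a few lines of Python. This is a rough sketch under stated assumptions: four 32-bit float channels per pixel, square tiles, and a tile size and overlap that are purely hypothetical. darktable's real tiling logic depends on the module and is not reproduced here.

```python
import math


def buffer_bytes(width, height, channels=4, bytes_per_value=4):
    """Size of one full-resolution buffer (assumed: 4 channels of float32)."""
    return width * height * channels * bytes_per_value


def tiling_overhead(width, height, tile, overlap):
    """Return (tile_count, computed_pixels / image_pixels).

    `tile` is the edge length of a square tile including its border,
    `overlap` is the border width on each side. Simplification: every
    tile is counted at full size, so the ratio is an upper estimate.
    """
    usable = tile - 2 * overlap  # interior part that ends up in the output
    if usable <= 0:
        raise ValueError("overlap is too large for this tile size")
    tiles = math.ceil(width / usable) * math.ceil(height / usable)
    return tiles, tiles * tile * tile / (width * height)


# A 24-megapixel image (6000 x 4000 pixels).
one_buffer = buffer_bytes(6000, 4000)
print(f"one buffer: {one_buffer / 1e6:.0f} MB")

# Hypothetical tile size and overlap, for illustration only.
count, ratio = tiling_overhead(6000, 4000, tile=2048, overlap=128)
print(f"{count} tiles, about {ratio:.2f}x the pixels of the untiled image")
```

A module needs at least an input and an output buffer plus any temporary storage, which is how a single image can add up to a large share of the available VRAM.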

2. The driver interfaces: Who communicates with the hardware?
Darktable uses OpenCL to communicate with the graphics card. But how OpenCL is implemented makes a difference.
Example: AMD RX 9060 XT with 8GB (gfx1200) – OpenCL-Mesa (Rusticl) vs. ROCm

Important: This example does not indicate whether ROCm is generally better than OpenCL-Mesa! This may vary depending on the system.
ROCm (Radeon Open Compute)
This is AMD’s modern approach to high-performance computing on Linux.
- Advantage: Often very fast and stable with newer cards (RX 6000/7000/9000 series). Uses hardware very efficiently.
- Disadvantage: Officially only supported for certain distributions and cards, sometimes tricky to install.
Rusticl (OpenCL-Mesa)
A newer OpenCL driver written in the Rust language, which is part of the Mesa project.
- Advantage: Often works “out of the box” on many Linux systems and also supports older hardware or integrated graphics units (iGPUs) that are no longer supported by proprietary drivers.
- Performance: Now often on par with proprietary drivers, sometimes even faster for specific tasks.
Proprietary drivers (AMD Pro / Nvidia)
- Nvidia: There are hardly any alternatives here. The proprietary driver is extremely mature and performs well.
- AMD Pro: Generally solid, but under Linux it is increasingly being displaced by ROCm or Rusticl.
3. Why CPU-only is not a (good) option
Modern image processing algorithms (such as Diffuse or Sharpen or Denoise (profiled)) perform millions of calculations per pixel.
- A CPU has a few, very complex cores (e.g., 16 cores).
- A GPU has thousands of simple cores (e.g., 4,000 shader units).
For image processing, where each pixel can be processed in parallel, the GPU is much faster than the CPU. Performance analysis shows this clearly: an export that takes 2 seconds on the GPU can take 40 seconds on the CPU.

4. Conclusion
How does the analysis help me?
The analysis serves as a diagnostic tool: if DT stalls or export takes too long, you can immediately see which component (CPU, GPU, or RAM) is the bottleneck. This allows you to target the hardware limitation. The same applies to the graphics driver.
Options for action
When working in darktable:
- Temporarily disable modules: Only activate computationally intensive modules that are not crucial for the current look (e.g., Diffuse or Sharpen, Denoise (profiled)) at the very end of the editing process. This keeps the preview fluid.
Important: The order of the modules in darktable’s pixelpipe is fixed. Even if you activate a module “later” during your editing workflow, it will be calculated at the technically correct point in the pixelpipe, so it’s worth leaving the “heavy hitters” deactivated until the end.
- Optimize OpenCL drivers (Linux/AMD): Test whether your AMD card performs better with the ROCm driver or the newer Rusticl stack. A simple A/B test (timing the same export with each driver) will quickly clarify this.
- iGPU memory management: If you don’t have a dedicated graphics card, allocate more system RAM to the iGPU in the BIOS. While 4 GB is sufficient for simple editing, 8 GB or 16 GB is significantly more stable for high-resolution sensors and complex modules.
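The A/B test for OpenCL drivers mentioned above can be done with a stopwatch, but a small timing helper makes the comparison more repeatable. The following Python sketch is one possible approach, not part of DT Performance Analyzer. It assumes that `darktable-cli` is installed and on the PATH; the file names `test.raw` and `out.jpg` are placeholders, and the `runner` parameter exists only so the helper can be tried without darktable.

```python
import statistics
import subprocess
import time

# Placeholder command: replace the file names with your own RAW file
# and output path. Assumes darktable-cli is available on the PATH.
EXPORT_CMD = ["darktable-cli", "test.raw", "out.jpg"]


def time_command(cmd, runs=3, runner=subprocess.run):
    """Run `cmd` several times and return the median wall-clock seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        runner(cmd, check=True)  # raises if the export fails
        durations.append(time.perf_counter() - start)
    # The median is less sensitive to a slow first run (cold caches,
    # kernel compilation) than the mean.
    return statistics.median(durations)
```

Call `time_command(EXPORT_CMD)` once with each driver stack active, using the same image and the same edit, and compare the two medians.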
When exporting:
- Targeted scaling: If the photo is only intended for social media or the web, scale down the resolution when exporting. This saves processing time and storage space.
- OpenCL prioritization: In the darktable settings, ensure that the GPU is preferred during export (profile “Very fast GPU” or “Multiple GPUs”) to reduce the load on the CPU.
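How much targeted scaling reduces the workload can be estimated from the pixel count alone. The short Python sketch below uses a 6000 x 4000 source image and a 2048-pixel long edge as example values; both are assumptions for illustration, and the actual time saved depends on where in the pipeline the scaling takes place.

```python
def scaled_size(width, height, long_edge):
    """Return (width, height) after fitting the longer side to `long_edge`."""
    longest = max(width, height)
    if long_edge >= longest:
        return width, height  # never upscale
    return round(width * long_edge / longest), round(height * long_edge / longest)


full_w, full_h = 6000, 4000  # example: 24-megapixel sensor
web_w, web_h = scaled_size(full_w, full_h, 2048)

reduction = (full_w * full_h) / (web_w * web_h)
print(f"{web_w} x {web_h} pixels, about {reduction:.1f}x fewer pixels than full size")
```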
