VNG demosaicer OpenCL performance regression on 5.4 #20055

@agat114

Is there an existing issue for this?

  • I checked and did not find my issue in the already reported ones

Describe the bug

After upgrading to 5.4 and 5.5, I am experiencing short freezes when using OpenCL-accelerated modules such as exposure, raw chromatic aberrations, and demosaic. Some of these are applied when a file is opened, so the freeze occurs right at the start.
Strangely, this did not happen with DT 5.2.
There is a similar issue, #20050, although that reporter's hardware is much more powerful, so the freezes may be less pronounced there.

The short freeze coincides with the following lines in the output:
8.4097 [opencl copy_host_to_device_constant] could not allocate oversize buffer on device ‘NVIDIA CUDA Quadro K2000’ id=0: CL_SUCCESS
8.4098 [opencl copy_host_to_device_constant] could not allocate oversize buffer on device ‘NVIDIA CUDA Quadro K2000’ id=0: CL_SUCCESS
8.4099 [opencl copy_host_to_device_constant] could not allocate oversize buffer on device ‘NVIDIA CUDA Quadro K2000’ id=0: CL_SUCCESS
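
To see how often the warning fires and on which device, the debug output can be filtered with a small script. This is only a sketch: the regular expression is derived from the three lines quoted above, and the function name `count_oversize_warnings` is made up for illustration, not part of darktable.

```python
import re
from collections import Counter

# Pattern built from the warning lines quoted in this report.
# The pasted log uses typographic quotes around the device name;
# plain ASCII quotes are accepted as well.
WARNING = re.compile(
    r"(?P<time>\d+\.\d+)\s+\[opencl (?P<func>\w+)\] "
    r"could not allocate oversize buffer on device "
    r"[‘'](?P<device>[^’']+)[’'] id=(?P<id>\d+): (?P<status>\w+)"
)


def count_oversize_warnings(lines):
    """Count oversize-buffer warnings per (device id, function name)."""
    counts = Counter()
    for line in lines:
        match = WARNING.search(line)
        if match:
            counts[(int(match["id"]), match["func"])] += 1
    return counts


sample = [
    "8.4097 [opencl copy_host_to_device_constant] could not allocate oversize buffer on device ‘NVIDIA CUDA Quadro K2000’ id=0: CL_SUCCESS",
    "8.4098 [opencl copy_host_to_device_constant] could not allocate oversize buffer on device ‘NVIDIA CUDA Quadro K2000’ id=0: CL_SUCCESS",
    "8.4099 [opencl copy_host_to_device_constant] could not allocate oversize buffer on device ‘NVIDIA CUDA Quadro K2000’ id=0: CL_SUCCESS",
    "0.2958 [opencl_init] found 1 platform",
]
print(count_oversize_warnings(sample))
```

In practice the `lines` argument could be the saved output of `darktable -d opencl`, read from a file.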

Could you help interpret the above?
Is it a problem with GPU memory usage by OpenCL?
Can it be tuned by modifying some limits?
Could this be due to changes in the newest DT releases?
Could the new DT conflict with the OpenCL packages for kernel 6.8?
It is important to understand the nature of the issue before taking action: buying new GPUs, downgrading the DT version, or changing the configuration.

Steps to reproduce

  1. Get the same setup
  2. Launch DT from the command line in debug mode with darktable -d opencl
  3. While watching stdout, open a RAW file in DT and perform an operation with one of the OpenCL-accelerated modules

Expected behavior

No freezes

Logfile | Screenshot | Screencast

Debug output:

darktable -d opencl
darktable 5.5.0~git41.fa8b49d6-1+13547.1
Copyright (C) 2012-2025 Johannes Hanika and other contributors.

Compile options:
Bit depth → 64 bit
Exiv2 → 0.27.5
Lensfun → 0.3.2
Debug → DISABLED
SSE2 optimizations → ENABLED
OpenMP → ENABLED
OpenCL → ENABLED
Lua → ENABLED - API version 9.6.0
Colord → ENABLED
gPhoto2 → ENABLED
OSMGpsMap → ENABLED - map view is available
GMIC → ENABLED - Compressed LUTs are supported
GraphicsMagick → ENABLED
ImageMagick → DISABLED
libavif → DISABLED
libheif → DISABLED
libjxl → DISABLED
LibRaw → ENABLED - Version 0.22.0-PreRC1
OpenJPEG → ENABLED
OpenEXR → ENABLED
WebP → ENABLED

See https://www.darktable.org/resources/ for detailed documentation.
See https://github.com/darktable-org/darktable/issues/new/choose to report bugs.

0.0001 [dt starting]

0.2614 [dt_dlopencl_init] could not find default opencl runtime library ‘libOpenCL’
0.2615 [dt_dlopencl_init] could not find default opencl runtime library ‘libOpenCL.so’
0.2618 [opencl_init] opencl library ‘libOpenCL.so.1’ found on your system and loaded, preference ‘default path’
0.2958 [opencl_init] found 1 platform
[opencl_init] found 2 devices

[dt_opencl_device_init]
DEVICE: 0: ‘Quadro K2000’
CONF KEY: cldevice_v5_nvidiacudaquadrok2000
PLATFORM, VENDOR & ID: NVIDIA CUDA, NVIDIA Corporation, ID=4318
CANONICAL NAME: nvidiacudaquadrok2000
DRIVER VERSION: 470.256.02
DEVICE VERSION: OpenCL 3.0 CUDA, SM_20 SUPPORT
DEVICE_TYPE: GPU, dedicated mem
GLOBAL MEM SIZE: 1991 MB
MAX MEM ALLOC: 498 MB
MAX IMAGE SIZE: 16384 x 16384
MAX CONSTANT BUFFER: 64 KB
ADDRESS ALIGN: 512
MAX WORK GROUP SIZE: 1024
MAX WORK ITEM DIMENSIONS: 3
MAX WORK ITEM SIZES: [ 1024 1024 64 ]
ASYNC PIXELPIPE: NO
PINNED MEMORY TRANSFER: NO
AVOID ATOMICS: NO
MICRO NAP: 250
ROUNDUP WIDTH & HEIGHT 16x16
CHECK EVENT HANDLES: 128
TILING ADVANTAGE: 61427584.000
DEFAULT DEVICE: NO
KERNEL BUILD DIRECTORY: /usr/share/darktable/kernels
KERNEL DIRECTORY: /home/alex/.cache/darktable/cached_v5_kernels_for_NVIDIACUDAQuadroK2000_47025602
CL COMPILER OPTION: -cl-fast-relaxed-math
CL COMPILER COMMAND: -w -cl-fast-relaxed-math -DNVIDIA_SM_20=1 -DNVIDIA=1 -I"/usr/share/darktable/kernels"
CL EXCEPTION: DT_OPENCL_ONLY_CUDA
KERNEL LOADING TIME: 0.0404 sec

[dt_opencl_device_init]
DEVICE: 1: ‘Quadro K2000’
CONF KEY: cldevice_v5_nvidiacudaquadrok2000
PLATFORM, VENDOR & ID: NVIDIA CUDA, NVIDIA Corporation, ID=4318
CANONICAL NAME: nvidiacudaquadrok2000
DRIVER VERSION: 470.256.02
DEVICE VERSION: OpenCL 3.0 CUDA, SM_20 SUPPORT
DEVICE_TYPE: GPU, dedicated mem
GLOBAL MEM SIZE: 2000 MB
MAX MEM ALLOC: 500 MB
MAX IMAGE SIZE: 16384 x 16384
MAX CONSTANT BUFFER: 64 KB
ADDRESS ALIGN: 512
MAX WORK GROUP SIZE: 1024
MAX WORK ITEM DIMENSIONS: 3
MAX WORK ITEM SIZES: [ 1024 1024 64 ]
ASYNC PIXELPIPE: NO
PINNED MEMORY TRANSFER: NO
AVOID ATOMICS: NO
MICRO NAP: 250
ROUNDUP WIDTH & HEIGHT 16x16
CHECK EVENT HANDLES: 128
TILING ADVANTAGE: 61427584.000
DEFAULT DEVICE: NO
KERNEL BUILD DIRECTORY: /usr/share/darktable/kernels
KERNEL DIRECTORY: /home/alex/.cache/darktable/cached_v5_kernels_for_NVIDIACUDAQuadroK2000_47025602
CL COMPILER OPTION: -cl-fast-relaxed-math
CL COMPILER COMMAND: -w -cl-fast-relaxed-math -DNVIDIA_SM_20=1 -DNVIDIA=1 -I"/usr/share/darktable/kernels"
CL EXCEPTION: DT_OPENCL_ONLY_CUDA
KERNEL LOADING TIME: 0.0351 sec
[opencl_init] OpenCL successfully initialized. internal numbers and names of available devices:
[opencl_init] 0 ‘NVIDIA CUDA Quadro K2000’
[opencl_init] 1 ‘NVIDIA CUDA Quadro K2000’
0.4540 [opencl_init] FINALLY: opencl PREFERENCE=ON is AVAILABLE and ENABLED.
[opencl_init] opencl_scheduling_profile: ‘multiple GPUs’
[opencl_init] opencl_device_priority: ‘*/!0,*/*/*/!0,*’
[opencl_init] opencl_mandatory_timeout: 1000
[opencl_update_priorities] these are your device priorities:
[opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[dt_opencl_update_priorities] 1 1 1 1 1
[opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[opencl_update_priorities] image preview export thumbs preview2
[opencl_update_priorities] 0 0 0 0 0
[opencl_synchronization_timeout] synchronization timeout set to 20
[opencl_update_priorities] these are your device priorities:
[opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[dt_opencl_update_priorities] 1 1 1 1 1
[opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[opencl_update_priorities] image preview export thumbs preview2
[opencl_update_priorities] 0 0 0 0 0
[opencl_synchronization_timeout] synchronization timeout set to 20
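
For a rough sense of scale, the MAX MEM ALLOC value reported above (498 MB for device 0) can be compared with the size of a single full-resolution buffer. The sketch below rests on several assumptions that are not confirmed by the log: that "MB" means 2**20 bytes, that a pipeline buffer holds 4 float32 channels per pixel, and that the image dimensions used are purely hypothetical. It does not model tiling or any of darktable's own headroom logic, and it does not claim that this limit is what triggers the warning.

```python
MIB = 1024 * 1024  # assumption: the log's "MB" means 2**20 bytes


def buffer_bytes(width, height, channels=4, bytes_per_channel=4):
    """Size in bytes of one uncompressed buffer (assumed 4 x float32 per pixel)."""
    return width * height * channels * bytes_per_channel


def fits_single_allocation(width, height, max_alloc_mb, **layout):
    """True if one buffer of this size stays within the per-allocation limit."""
    return buffer_bytes(width, height, **layout) <= max_alloc_mb * MIB


def max_pixels(max_alloc_mb, channels=4, bytes_per_channel=4):
    """Largest pixel count whose buffer fits in a single allocation."""
    return (max_alloc_mb * MIB) // (channels * bytes_per_channel)


# 498 is the MAX MEM ALLOC reported for device 0 in the log above.
print(max_pixels(498))                          # 32636928, about 32.6 megapixels
# Hypothetical sensor sizes, chosen only for illustration.
print(fits_single_allocation(6000, 4000, 498))  # True
print(fits_single_allocation(8000, 6000, 498))  # False
```

Under these assumptions a 24-megapixel buffer fits in one allocation on this card, while a 48-megapixel one does not.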

darktablerc configuration:
cat $HOME/.config/darktable/darktablerc | grep -e opencl -e nvid
cldevice_v5_nvidiacudaquadrok2000=0 250 0 16 16 128 0 0 0.000 61427584.000 0.250
cldevice_v5_nvidiacudaquadrok2000_building=-cl-fast-relaxed-math
cldevice_v5_nvidiacudaquadrok2000_id0=600
cldevice_v5_nvidiacudaquadrok2000_id1=600
clplatform_intelropenclhdgraphics=FALSE
clplatform_nvidiacuda=TRUE
clplatform_openclon12=FALSE
opencl=TRUE
opencl_checksum=1654065287
opencl_device_priority=*/!0,*/*/*/!0,*
opencl_library=
opencl_mandatory_timeout=1000
opencl_scheduling_profile=multiple GPUs
opencl_tune_headroom=TRUE
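
Since the freezes did not occur with 5.2, it may help to compare the OpenCL-related settings of an old and a new darktablerc. A minimal sketch follows; the helper names are made up for illustration, the "old" configuration is hypothetical, and only the "new" lines are taken from the listing above.

```python
def parse_rc(lines):
    """Parse key=value lines (darktablerc style) into a dict."""
    settings = {}
    for line in lines:
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)  # values may themselves contain '='
        settings[key] = value
    return settings


def opencl_settings(settings):
    """Keep entries that the grep above would match (opencl or nvid anywhere)."""
    return {
        key: value
        for key, value in settings.items()
        if "opencl" in f"{key}={value}" or "nvid" in f"{key}={value}"
    }


def diff_settings(old, new):
    """Return {key: (old value, new value)} for keys that differ or are missing."""
    changed = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            changed[key] = (old.get(key), new.get(key))
    return changed


# Lines copied from the listing above.
new_rc = parse_rc([
    "opencl=TRUE",
    "opencl_library=",
    "opencl_mandatory_timeout=1000",
    "opencl_scheduling_profile=multiple GPUs",
    "opencl_tune_headroom=TRUE",
])
# Hypothetical older configuration, invented only to show the diff.
old_rc = parse_rc([
    "opencl=TRUE",
    "opencl_library=",
    "opencl_mandatory_timeout=1000",
    "opencl_scheduling_profile=multiple GPUs",
    "opencl_tune_headroom=FALSE",
])
print(diff_settings(opencl_settings(old_rc), opencl_settings(new_rc)))
```

With real files, each list would be replaced by the lines of the corresponding darktablerc.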

Settings:
darktable resources: large
Activate OpenCL support: ON
OpenCL scheduling profile: multiple GPUs
tuned GPU memory: ON

Commit

No response

Where did you obtain darktable from?

darktable.org / GitHub release

darktable version

darktable 5.5.0~git41.fa8b49d6-1+13547.1

What OS are you using?

Linux

What is the version of your OS?

Ubuntu 22.04

Describe your system

My setup is as follows:
Linux kernel 6.8.0-90-generic
Ubuntu 22.04
2x NVIDIA CUDA Quadro K2000
CPU: Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz
~ cat /etc/OpenCL/vendors/nvidia.icd
libnvidia-opencl.so.1

Are you using OpenCL GPU in darktable?

Yes

If yes, what is the GPU card and driver?

2x NVIDIA CUDA Quadro K2000, driver Nvidia 470

Please provide additional context if applicable. You can attach files too, but might need to rename to .txt or .zip

No response

Labels

  • OpenCL: Related to darktable OpenCL code
  • priority: high: core features are broken and not usable at all, software crashes
