Improve DLL/shared library stability by replacing static initializers #16717
-
Hi llama.cpp team,
I'm writing to propose a couple of small changes to improve the library's stability when it's loaded as a DLL/shared library, particularly from non-C++ environments like the JVM (via JNA/JNI) or perhaps Python (via ctypes).
The Problem: Unreliable Static Initialization in Foreign Hosts
When a C++ DLL is loaded into a "foreign" process (such as java.exe), the C++ runtime's automatic static-initialization mechanism can be unreliable. This is a known issue in complex plugin architectures.
The result is that global static objects can be left uninitialized. For llama.cpp, this can lead to crashes when, for example, a static std::mutex is locked before its constructor has ever run. I have verified this behavior by loading the library from a JVM environment, where it results in a memory access violation.
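To make the failure mode concrete, here is a minimal sketch (not taken from the llama.cpp sources) contrasting the two initialization strategies; whether the namespace-scope mutex actually needs a load-time dynamic initializer depends on the CRT (see the VS2022 17.10 note under Case 2 below):

#include <mutex>

// Namespace-scope static: on affected CRTs, its constructor must run via the
// DLL's dynamic-initializer list at load time. If the host process breaks that
// step, the object's storage is zeroed but the mutex was never constructed,
// and calling lock() on it is undefined behavior.
static std::mutex g_unsafe_mutex;

// Function-local static: construction is deferred to the first call, and the
// compiler emits a thread-safe guard (C++11 "magic statics"), so correctness
// does not depend on load-time initializers.
static std::mutex & get_safe_mutex() {
    static std::mutex m;
    return m;
}

int main() {
    g_unsafe_mutex.lock();    // works only if the load-time initializer ran
    g_unsafe_mutex.unlock();
    std::lock_guard<std::mutex> lock(get_safe_mutex());  // safe even on first use
}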
Proposed Solutions
I've identified a couple of areas where the problematic implicit static initialization can be replaced with more robust, explicit patterns that work reliably in any environment.
Case 1: Lazy, Thread-Safe Initialization with std::call_once
Location: ggml/src/ggml-cuda/ggml-cuda.cu lines 3848-3853
Current Code:
static bool initialized = false;
{
    static std::mutex mutex;
    std::lock_guard<std::mutex> lock(mutex);
    if (!initialized) {
        // rest of the code
    }
}
Proposed Change:
{
    static std::once_flag init_flag;
    std::call_once(init_flag, []() {
        // same code
    });
}
Rationale: This pattern moves initialization from unreliable library-load time to a thread-safe, on-demand call made the first time the resource is needed. Because std::once_flag has a constexpr constructor, it is constant-initialized in the binary image and needs no dynamic initializer at load time, so the static-initialization problem is bypassed entirely.
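For illustration, here is a self-contained version of the pattern; init_backend_once() is a placeholder, not the actual CUDA setup code:

#include <cstdio>
#include <mutex>

// Placeholder for the one-time setup performed in ggml_backend_cuda_reg()
// (device enumeration, backend registration, ...).
static void init_backend_once() {
    std::puts("backend initialized exactly once");
}

void get_backend() {
    // std::once_flag is constant-initialized (constexpr constructor), so it is
    // valid even if no dynamic initializer ever runs for this DLL.
    static std::once_flag init_flag;
    std::call_once(init_flag, init_backend_once);
}

int main() {
    get_backend();
    get_backend();  // init_backend_once() is not invoked a second time
}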
Case 2: Safely Initializing a Global Lock
A static std::mutex is a common point of failure. A more robust approach is to use a function-local static, which is guaranteed to be initialized in a thread-safe manner since C++11 (the "Construct on First Use Idiom").
Location: ggml/src/ggml-threading.cpp
Current Code:
std::mutex ggml_critical_section_mutex;

void ggml_critical_section_start() {
    ggml_critical_section_mutex.lock();
}

void ggml_critical_section_end(void) {
    ggml_critical_section_mutex.unlock();
}
Proposed Change:
#ifndef _WIN32

namespace {
    // Construct on First Use: the mutex is created on the first call and
    // deliberately leaked, so it is never destroyed during shutdown while
    // another thread might still be using it.
    static std::mutex * get_critical_section_mutex() {
        static std::mutex * mtx = new std::mutex();
        return mtx;
    }
} // namespace

void ggml_critical_section_start() {
    get_critical_section_mutex()->lock();
}

void ggml_critical_section_end() {
    get_critical_section_mutex()->unlock();
}

#else

#include <windows.h>

namespace {
    // RAII wrapper around a Win32 CRITICAL_SECTION, sidestepping the CRT's
    // std::mutex implementation entirely.
    struct CriticalSectionWrapper {
        CRITICAL_SECTION cs;
        CriticalSectionWrapper() { InitializeCriticalSection(&cs); }
        ~CriticalSectionWrapper() { DeleteCriticalSection(&cs); }
    };

    static CriticalSectionWrapper * get_cs_wrapper() {
        static CriticalSectionWrapper * wrapper = new CriticalSectionWrapper();
        return wrapper;
    }
} // namespace

void ggml_critical_section_start() {
    EnterCriticalSection(&get_cs_wrapper()->cs);
}

void ggml_critical_section_end() {
    LeaveCriticalSection(&get_cs_wrapper()->cs);
}

#endif
Rationale: This modern C++ idiom ensures the mutex is constructed correctly, and only once, the first time it is needed. It's clean, efficient, and completely avoids the global static-initialization issue. For compilers without reliable "magic statics", std::call_once could also be used to initialize a static pointer to the mutex (see the sketch below). The Windows-specific code works around a known bug in some CRTs bundled with, for example, recent JVMs; the fix shipped with VS2022 17.10.
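A minimal sketch of that std::call_once variant, with illustrative names (an assumption of mine, not code proposed for the tree):

#include <mutex>

static std::once_flag g_mutex_once;       // constant-initialized, always valid
static std::mutex *   g_mutex = nullptr;  // zero-initialized, always valid

static std::mutex & get_critical_section_mutex() {
    // Explicit one-time construction for compilers without reliable "magic
    // statics"; the mutex is deliberately leaked so it can never be destroyed
    // while another thread still holds or needs it.
    std::call_once(g_mutex_once, [] { g_mutex = new std::mutex(); });
    return *g_mutex;
}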
DeepWiki problem analysis:
You're experiencing crashes due to the static initialization order fiasco (SIOF) across DLL boundaries, specifically with static std::mutex objects in ggml-threading.cpp and ggml-cuda.cu.
Problem Analysis
The crashes occur at these locations:
ggml-threading.cpp: the global ggml_critical_section_mutex (ggml-threading.cpp:4) is a static mutex that may not be initialized when accessed across DLL boundaries (ggml-threading.cpp:6-8).
ggml-cuda.cu: multiple static mutexes are used in backend registration functions:
ggml_cuda_lock (ggml-cuda.cu:526)
the static mutex in ggml_backend_cuda_buffer_type() (ggml-cuda.cu:734-735)
the static mutex in ggml_backend_cuda_split_buffer_type() (ggml-cuda.cu:1051-1052)
the static mutex in ggml_backend_cuda_reg() (ggml-cuda.cu:3851-3852)
Similar patterns exist in other backends such as CANN (ggml-cann.cpp:2972-2973), SYCL (ggml-sycl.cpp:4575-4576), and OpenCL (ggml-opencl.cpp:4022-4025); the Case 1 transformation would apply at each of these sites, as sketched below.
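To show how the Case 1 transformation would generalize to those sites, here is a hedged sketch; example_backend_reg() and one_time_setup() are placeholders, not actual ggml symbols:

#include <mutex>

// Placeholder for the backend-specific one-time work guarded at each site.
static void one_time_setup() { /* register devices, query features, ... */ }

void example_backend_reg() {
    // Replaces the { static std::mutex + static bool initialized } pair with a
    // single constant-initialized std::once_flag.
    static std::once_flag init_flag;
    std::call_once(init_flag, one_time_setup);
}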
Thanks for considering these suggestions. I believe they would make llama.cpp even more robust and versatile.
Best regards,
GM
-
The proposed changes look good to me.
-
Thank you for considering my contribution. My suggestions are based on a rather limited use of llama.cpp's features, but they solve the crashes I have encountered so far. Given the complexity of my use case, I don't rule out that there are other places where I will need to intervene as I progress. If so, I'll report them, along with the solutions I find, as I have done here.