XLA Errors Overview
XLA errors are grouped by error source. Each source defines a list of additional context, beyond the error message itself, that is attached to every error in that category.
🚧 Note that this standardization effort is a work in progress, so not all error messages have an attached error code yet.
Example error logs might look like:
XlaRuntimeError: RESOURCE_EXHAUSTED: XLA:TPU compile permanent error. Ran out of memory in memory space hbm. Used 49.34G of 32.00G hbm. Exceeded hbm capacity by 17.34G.
Total hbm usage >= 49.34G:
    reserved 3.12M
    program unknown size
    arguments 49.34G

JaxRuntimeError: RESOURCE_EXHAUSTED: Ran out of memory in memory space vmem while allocating on stack for %ragged_latency_optimized_all_gather_lhs_contracting_gated_matmul_kernel.18 = bf16[2048,4096]{1,0:T(8,128)(2,1)} custom-call(%get-tuple-element.18273, %get-tuple-element.18274, %get-tuple-element.18275, %get-tuple-element.18276, %get-tuple-element.18277, /*index=5*/%bitcast.8695, %get-tuple-element.19201, %get-tuple-element.19202, %get-tuple-element.19203, %get-tuple-element.19204), custom_call_target=""
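Both example logs share the shape "exception type, status code, message". As a rough sketch, the Python snippet below splits such a string into those three parts. The layout is inferred only from the two examples above and is an assumption, not a documented guarantee; `split_xla_error` is a hypothetical helper, not part of XLA or JAX.

```python
import re

# Assumed layout, inferred from the example logs:
#   <ExceptionType>: <STATUS_CODE>: <message>
_ERROR_RE = re.compile(
    r"^(?P<exc>\w+):\s*(?P<code>[A-Z_]+):\s*(?P<message>.*)$",
    re.DOTALL,  # the message may span several lines
)


def split_xla_error(text):
    """Return (exception_type, status_code, message), or None if no match."""
    match = _ERROR_RE.match(text.strip())
    if match is None:
        return None
    return match.group("exc"), match.group("code"), match.group("message")


example = (
    "XlaRuntimeError: RESOURCE_EXHAUSTED: XLA:TPU compile permanent error. "
    "Ran out of memory in memory space hbm. Used 49.34G of 32.00G hbm."
)
parts = split_xla_error(example)
if parts is not None:
    exc_type, status_code, message = parts
    print(exc_type)
    print(status_code)
```

A helper like this can be handy for grouping logs by status code, but since not every message follows the standardized format yet, callers should be prepared for a `None` result.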
Statuses and CHECK failures
In general, XLA flags faulty execution through two mechanisms: statuses and CHECK macro failures.
Statuses are meant for non-fatal, recoverable errors. The assumption is that the function returns, and execution continues down the path where the caller explicitly checks the returned Status object. Statuses are useful for handling invalid user input or expected resource constraints.
CHECK failures, on the other hand, cover programmer errors or violations of invariants that should never happen if the code is correct. When a CHECK fails, the program logs the error message and terminates immediately. CHECKs are used to enforce internal consistency, for example by verifying that a pointer is non-null before dereferencing it.
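The difference between the two mechanisms can be sketched in a few lines. The snippet below is a Python analogy only; XLA itself implements these patterns in C++. `Status`, `CheckFailure`, `check`, `reserve_memory`, and `read_first` are all hypothetical names introduced for illustration. A real CHECK terminates the process, whereas this stand-in raises an exception so the sketch stays easy to run.

```python
from dataclasses import dataclass


@dataclass
class Status:
    """Hypothetical stand-in for a returned status object (not XLA's type)."""
    ok: bool
    message: str = ""


class CheckFailure(Exception):
    """Hypothetical stand-in for a failed CHECK (a real CHECK aborts)."""


def check(condition, message):
    # Invariant guard: a failure here means the code itself has a bug.
    if not condition:
        raise CheckFailure(message)


def reserve_memory(requested_gb, capacity_gb):
    # Expected resource constraint: report it and return normally,
    # leaving the decision about what to do next to the caller.
    if requested_gb > capacity_gb:
        return Status(
            False,
            f"RESOURCE_EXHAUSTED: requested {requested_gb}G of {capacity_gb}G",
        )
    return Status(True)


def read_first(buffer):
    # Callers must never pass None; this mirrors a non-null pointer CHECK.
    check(buffer is not None, "buffer must not be None")
    return buffer[0]


status = reserve_memory(49.34, 32.0)
if not status.ok:
    # Recoverable path: the caller inspects the status and continues.
    print("recoverable:", status.message)
```

The caller of `reserve_memory` stays in control after a failure, while a failed `check` in `read_first` stops the current operation at the point where the invariant was violated.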
Error codes
Here is an index list with all error codes.