A UMAP With Arrows Is Not a Benchmark. This Is

raised ValueError: setting an array element with a sequence on both synthetic fixtures and the real pancreas data.

The root cause is in scvelo's leastsq_generalized:

gamma[i] = np.linalg.pinv(A.T.dot(A)).dot(A.T.dot(y[:, i]))

In NumPy 2.x, pinv(A).dot(b) returns a 1-element array rather than a scalar. Assigning a 1-element array into a pre-allocated float32 scalar slot raises ValueError. This is a known upstream incompatibility (github.com/theislab/scvelo/issues/966).
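One way to sidestep this class of failure is to extract an explicit scalar before assigning into the pre-allocated array. The following is a minimal sketch, not scvelo's code: `fit_slopes` is a hypothetical helper that fits a no-intercept slope per column and assumes a single regressor, so each solve yields exactly one coefficient.

```python
import numpy as np

def fit_slopes(x, y):
    """Fit y[:, i] ~ gamma[i] * x for each column i (no intercept).

    Hypothetical helper for illustration; not part of scvelo.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    A = x.reshape(-1, 1)                   # design matrix, shape (n, 1)
    gram_inv = np.linalg.pinv(A.T @ A)     # shape (1, 1)
    gamma = np.zeros(y.shape[1], dtype=np.float32)
    for i in range(y.shape[1]):
        coef = gram_inv @ (A.T @ y[:, i])  # shape (1,), an array, not a scalar
        gamma[i] = coef.item()             # explicit scalar extraction
    return gamma

# Two columns with known slopes 2.0 and 0.5
x = np.array([1.0, 2.0, 3.0])
y = np.column_stack([2.0 * x, 0.5 * x])
gamma = fit_slopes(x, y)
```

Calling `.item()` also fails loudly if the coefficient vector ever holds more than one element, which makes a shape mismatch visible at the point where it happens.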

The fix is to run the velocity step with mode='deterministic'. The deterministic steady-state model from La Manno et al. 2018 is the foundational method and produces correct trajectory ordering without triggering the NumPy 2.x incompatibility.

HVG selection after normalisation causes infinity errors

Adding sc.pp.highly_variable_genes after scv.pp.normalize_per_cell raised ValueError: cannot specify integer bins when input data contains infinity.

Root cause: some genes have near-zero total counts after per-cell normalisation. When pandas tries to bin gene expression levels for HVG dispersion computation, infinity values in the bin array trigger the error.

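Fix: remove the HVG step entirely. The scvelo workflow handles its own feature selection: scv.tl.velocity flags the genes it fits in adata.var['velocity_genes'], while scv.pp.moments smooths counts over nearest neighbours. A separate HVG step before log1p is architecturally incorrect for the scvelo pipeline.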
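Before dropping a step like this, it can help to confirm which genes produce the non-finite values. The sketch below is an illustration under stated assumptions: `nonfinite_gene_mask` is a hypothetical helper, the input is a dense cells-by-genes matrix, and dispersion is taken as variance over mean, which may differ in detail from scanpy's internal computation.

```python
import numpy as np

def nonfinite_gene_mask(X):
    """Return a boolean mask of genes whose log-dispersion is NaN or infinite.

    Hypothetical diagnostic; dispersion is assumed to be variance / mean.
    """
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=0)
    var = X.var(axis=0)
    # Silence the expected 0/0 and log(0) warnings; the results are what we inspect
    with np.errstate(divide="ignore", invalid="ignore"):
        log_dispersion = np.log(var / mean)
    return ~np.isfinite(log_dispersion)

# Gene 0 is all zeros (NaN), gene 1 varies (finite), gene 2 is constant (-inf)
X = np.array([[0.0, 1.0, 2.0],
              [0.0, 3.0, 2.0],
              [0.0, 5.0, 2.0]])
mask = nonfinite_gene_mask(X)
```

Genes flagged by the mask are the ones that would push NaN or infinity into any binning step that follows.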

dpt_pseudotime is not pre-stored in the pancreas h5ad

The initial implementation required dpt_pseudotime to exist in the loaded AnnData. It does not. The pancreas h5ad from scvelo's repository stores cell-type labels and raw counts, but not pseudotime.

Fix: compute diffusion pseudotime during preprocessing using sc.tl.dpt() rooted at Ductal cells. This is actually a better design -- the oracle is computed fresh from the data, not loaded from a file that might have been computed by a different pipeline version.
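A minimal sketch of this step follows. It is not the repository's implementation: `pick_root_index` and `add_dpt_pseudotime` are hypothetical helpers, the label column name `clusters` is an assumption, and choosing the first Ductal cell as the root is a simplification. The scanpy calls are imported lazily so the root-picking helper runs without scanpy installed.

```python
def pick_root_index(labels, root_label="Ductal"):
    """Return the position of the first cell whose label equals root_label."""
    for i, label in enumerate(labels):
        if label == root_label:
            return i
    raise ValueError(f"no cell labelled {root_label!r}")

def add_dpt_pseudotime(adata, label_key="clusters", root_label="Ductal"):
    """Compute diffusion pseudotime in place; requires scanpy.

    Assumes adata.obs[label_key] holds cell-type labels (an assumption).
    """
    import scanpy as sc  # lazy import: only needed when this function is called

    adata.uns["iroot"] = pick_root_index(adata.obs[label_key], root_label)
    sc.pp.neighbors(adata)
    sc.tl.diffmap(adata)
    sc.tl.dpt(adata)  # writes adata.obs["dpt_pseudotime"]
    return adata

# The helper works on any sequence of labels
root = pick_root_index(["Beta", "Ductal", "Alpha", "Ductal"])
```

Failing with a clear error when no root cell exists keeps a mislabelled dataset from silently producing a pseudotime rooted somewhere arbitrary.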

The synthetic fixture that validates the benchmark

Unit tests for a velocity benchmark cannot download real data on every run. The solution is a biologically grounded synthetic fixture that embeds a genuine velocity signal.

# Velocity genes: unspliced leads spliced along the trajectory axis
# This is the precursor-product relationship RNA velocity detects
pseudotime = np.linspace(0, 1, n_cells)
spliced_velocity = np.outer(pseudotime, np.ones(n_velocity_genes)) * 8
unspliced_velocity = np.outer(
    np.clip(pseudotime + 0.15, 0, 1), np.ones(n_velocity_genes)
) * 8

The test test_velocity_signal_in_early_cells verifies this works:

early_ratio = (unspliced[early_mask] / spliced[early_mask]).mean()
late_ratio = (unspliced[late_mask] / spliced[late_mask]).mean()
assert early_ratio > late_ratio

Early trajectory cells have higher unspliced-to-spliced ratio than late cells. This is the biological constraint RNA velocity is built to detect. If the fixture does not satisfy it, the fixture is wrong. Having a test that verifies this means you can trust the downstream velocity tests.
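The fixture and its check can be put together in one self-contained sketch. Several details here are assumptions rather than the repository's values: the function names, the default sizes, and the 0.25 and 0.75 cut-offs for early and late cells. The small `eps` is also an addition: spliced counts in this fixture are exactly zero at pseudotime 0, so a raw division yields infinity for the first cell.

```python
import numpy as np

def make_velocity_fixture(n_cells=200, n_velocity_genes=10, lead=0.15, scale=8.0):
    """Build spliced/unspliced matrices where unspliced leads along pseudotime."""
    pseudotime = np.linspace(0, 1, n_cells)
    ones = np.ones(n_velocity_genes)
    spliced = np.outer(pseudotime, ones) * scale
    unspliced = np.outer(np.clip(pseudotime + lead, 0, 1), ones) * scale
    return pseudotime, spliced, unspliced

def mean_ratio(unspliced, spliced, mask, eps=1e-6):
    """Mean unspliced/spliced ratio over the masked cells; eps avoids 0-division."""
    return float((unspliced[mask] / (spliced[mask] + eps)).mean())

pseudotime, spliced, unspliced = make_velocity_fixture()
early_mask = pseudotime < 0.25   # assumed cut-off for early cells
late_mask = pseudotime > 0.75    # assumed cut-off for late cells
early_ratio = mean_ratio(unspliced, spliced, early_mask)
late_ratio = mean_ratio(unspliced, spliced, late_mask)
assert early_ratio > late_ratio
```

With `eps` in place the early ratio stays finite, so the assertion compares two real numbers instead of passing because one side is infinite.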

The numbers

Metric	Value
Dataset	Pancreas endocrinogenesis (Bastidas-Ponce 2019)
Cells	3,696
Velocity genes	1,598
Task 1 Spearman rho	0.8926 (PASS)
Task 2 pairs passing	5/7, 71.4% (FAIL)
Task 3 rho drop	0.0029 (ROBUST)
Tests passing	99
Test coverage	92%
Pipeline runtime	47.2 seconds
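For readers who want to see how two of these figures are formed, here is a dependency-free sketch. It assumes the rho is a Spearman rank correlation between two per-cell orderings; the input values are made up for illustration and are not taken from the pancreas data. The function names are hypothetical. Only the 5/7 arithmetic comes from the table.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks

def spearman_rho(a, b):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5

def pass_rate(n_passing, n_total):
    """Percentage of checks that passed."""
    return 100.0 * n_passing / n_total

# Made-up orderings that agree perfectly in rank
rho = spearman_rho([0.1, 0.4, 0.35, 0.9], [0.0, 0.5, 0.3, 1.0])
# The Task 2 figure from the table: 5 of 7 pairs
task2_percent = round(pass_rate(5, 7), 1)
```

Because Spearman works on ranks, it rewards getting the order of cells right even when the two pseudotime scales differ.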

What this project is actually about

The rho of 0.8926 is a good number. But what the project demonstrates is something different: the ability to ask whether the number is meaningful.

Anyone can run scv.tl.velocity_pseudotime and get a number. The question that separates a bioinformatics engineer from a tutorial follower is: how do you know whether that number reflects real biology? The answer requires an independent oracle, a test that checks specific biologically defined transitions, a perturbation experiment, and honest documentation of what fails and why -- including cases like the terminal fates where the benchmark design itself needs correction because the biology is branching, not linear.

The benchmark framework is the contribution. The rho is just the output.

Repository

github.com/gbadedata/scvelo-trajectory-benchmark

The README covers the full pipeline architecture, all four engineering challenges, the branching topology correction, and complete instructions for reproducing the results from a fresh clone.

Building and shipping bioinformatics, data engineering, or DevOps projects, including trajectory inference benchmarks or single-cell evaluation frameworks? Connect on GitHub or LinkedIn.