Paired-data pipeline: 4 bugs in CSI recorder + ground-truth aligner corrupt or block camera-supervised training data #1007
Description
Context
Found during ADR-152 §2.2 measurement (b) (2026-06-10/11), when a fresh 40-minute paired collection initially aligned to zero windows and the trained-model forensics exposed silent data corruption. These bugs also retroactively explain pathologies in earlier sessions (#645, #509). Full forensic record: benchmarks/wiflow-std/RESULTS.md on branch feat/adr-152-wiflow-std-benchmark.
Bug 1 — scripts/record-csi-udp.py stamps local time with a Z (UTC) suffix
parse_csi_packet() builds timestamp via time.strftime('%Y-%m-%dT%H:%M:%S.') + ... + 'Z' — local wall time labeled as UTC. The camera collector writes true-epoch ts_ns. The aligner parses the CSI ISO string as UTC, so camera and CSI disagree by the UTC offset (−4 h under EDT) and alignment produces 0 pairs. Workaround used: --clock-offset-ms=-14400000. Fix: write datetime.now(timezone.utc).isoformat() or just use the already-present ts_ns in the aligner (preferred — see Bug 4 note).
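A minimal sketch of the recorder-side fix, assuming the packet already carries an integer epoch-nanosecond ts_ns as described above. The helper names utc_iso_from_ns and ns_from_utc_iso are hypothetical and do not exist in scripts/record-csi-udp.py; the point is only that the ISO string is derived from the epoch value in UTC, so the Z suffix is truthful and round-trips to the same instant.

```python
from datetime import datetime, timezone

NS_PER_S = 1_000_000_000


def utc_iso_from_ns(ts_ns: int) -> str:
    """Format an epoch-nanosecond timestamp as ISO 8601 UTC with millisecond precision."""
    secs, rem_ns = divmod(ts_ns, NS_PER_S)
    # tz=timezone.utc forces UTC regardless of the machine's local time zone.
    dt = datetime.fromtimestamp(secs, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{rem_ns // 1_000_000:03d}Z"


def ns_from_utc_iso(stamp: str) -> int:
    """Parse the string back to epoch nanoseconds (what an aligner would compare against)."""
    dt = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    whole_secs = int(dt.replace(microsecond=0).timestamp())
    return whole_secs * NS_PER_S + dt.microsecond * 1_000


if __name__ == "__main__":
    import time

    now_ns = time.time_ns()
    print(utc_iso_from_ns(now_ns))
```

If the aligner reads ts_ns directly, as the issue prefers, the string becomes purely informational and this conversion only has to stay consistent for human readers.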
Bug 2 — scripts/align-ground-truth.js dilutes window confidence with non-detection frames
loadGroundTruth() keeps records with keypoints: [] (empty array is truthy) at confidence 0; window avgConf then averages detections and empties. At a normal ~27% MediaPipe detection rate, every window's avgConf lands ~0.22 < the 0.5 threshold → all windows rejected even when detections themselves average 0.80 confidence. Fix: skip empty-keypoint records at load (treat as gaps); confidence statistics should be over detections only. --min-camera-frames still guards sparse windows.
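The dilution arithmetic can be reproduced with a small Python model of the averaging logic. This is a sketch of the behavior described above, not the JavaScript in scripts/align-ground-truth.js; the record fields keypoints and confidence follow the issue text, while window_avg_conf and the sample values are illustrative assumptions.

```python
def window_avg_conf(records, detections_only):
    """Average confidence over a window.

    detections_only=False models the buggy behavior (empty-keypoint records
    count as confidence 0); detections_only=True models the proposed fix.
    """
    confs = [
        r["confidence"]
        for r in records
        if not detections_only or len(r["keypoints"]) > 0
    ]
    return sum(confs) / len(confs) if confs else None


# 100 camera frames at a 27% detection rate, detections averaging 0.80.
detected = [{"keypoints": [[0.5, 0.5]], "confidence": 0.80} for _ in range(27)]
empty = [{"keypoints": [], "confidence": 0.0} for _ in range(73)]
records = detected + empty

diluted = window_avg_conf(records, detections_only=False)  # 0.27 * 0.80
clean = window_avg_conf(records, detections_only=True)

THRESHOLD = 0.5
print(f"diluted={diluted:.3f} kept={diluted >= THRESHOLD}")
print(f"clean={clean:.3f} kept={clean >= THRESHOLD}")
```

With the empties included the window falls below the 0.5 threshold; over detections only it passes, which matches the collapse described in this bug.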
Bug 3 — heterogeneous csi_shape with silent zero-padding
extractCsiMatrix() stamps the window's subcarrier count from window[0].subcarriers and zero-pads/truncates the other 19 frames to match. Tonight's session produced five different csi_shape values: [70,20], [134,20], [26,20], [12,20], and [20,20]; ~20% of frames inside even native-70 windows were silently zero-padded. Mixed-subcarrier frames come from the ESP32 emitting different packet formats (HT20/HT40/fragments). Fix: either filter frames to the session's modal subcarrier count before windowing, or record the per-frame subcarrier count and reject mixed windows; never silently pad.
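The first option, filtering to the session's modal subcarrier count before windowing, might look like the following Python sketch. It models the logic only and is not the aligner's JavaScript; the per-frame subcarriers field is taken from the issue text, while the function names and the drop-the-trailing-partial-window policy are assumptions.

```python
from collections import Counter

WINDOW_FRAMES = 20  # window length used throughout this issue


def filter_to_modal_subcarriers(frames):
    """Keep only frames whose subcarrier count equals the session's most common one."""
    if not frames:
        return None, []
    counts = Counter(f["subcarriers"] for f in frames)
    modal, _ = counts.most_common(1)[0]
    return modal, [f for f in frames if f["subcarriers"] == modal]


def make_windows(frames, size=WINDOW_FRAMES):
    """Chunk homogeneous frames into full windows; a trailing partial window is dropped."""
    return [frames[i:i + size] for i in range(0, len(frames) - size + 1, size)]


# Synthetic session: mostly 70-subcarrier frames with a few other formats mixed in.
session = (
    [{"subcarriers": 70} for _ in range(45)]
    + [{"subcarriers": 134} for _ in range(6)]
    + [{"subcarriers": 26} for _ in range(4)]
)
modal, kept = filter_to_modal_subcarriers(session)
windows = make_windows(kept)
print(modal, len(kept), len(windows))
```

Because every surviving frame has the same subcarrier count, no window needs padding or truncation, and csi_shape is the same for every window in the file.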
Bug 4 — transposed shape label in extractCsiMatrix
The matrix is filled frame-major (matrix[f * nSc + s]) but declared shape: [nSc, nFrames] (~line 351). Consumers that trust the label transpose the data. Found because the measurement-(b) trainer had to correct it on load. Fix the label or the fill order, and add a round-trip test.
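A round-trip test for the label could follow the shape of this Python sketch. It picks one of the two fixes named above (keep the frame-major fill, correct the label to [nFrames, nSc]) purely for illustration; flatten_frame_major and unflatten are hypothetical helpers, not functions in the aligner.

```python
def flatten_frame_major(frames):
    """Flatten frames (list of per-frame subcarrier lists) with index f * n_sc + s."""
    n_frames, n_sc = len(frames), len(frames[0])
    flat = [0.0] * (n_frames * n_sc)
    for f, row in enumerate(frames):
        for s, value in enumerate(row):
            flat[f * n_sc + s] = value
    # The label must describe the row-major layout actually written: frames first.
    return flat, [n_frames, n_sc]


def unflatten(flat, shape):
    """Rebuild a 2-D list from a flat row-major buffer, trusting the shape label."""
    rows, cols = shape
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 frames, 2 subcarriers
flat, shape = flatten_frame_major(frames)

round_trip = unflatten(flat, shape)                   # correct label
mislabeled = unflatten(flat, [shape[1], shape[0]])    # the transposed label from this bug
print(round_trip == frames, mislabeled == frames)
```

A consumer that trusts the transposed label gets rows that mix values from different frames, which is the silent corruption this bug describes; asserting the round trip in a test catches it.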
Acceptance
- A fresh paired session aligns with zero clock-offset flags needed
- Window kept-rate ≈ csi_frames/20 × detection_coverage (no silent confidence collapse)
- No zero-padded frames in output windows; csi_shape homogeneous per file
- Shape label matches memory layout (tested)
- Re-run alignment on tonight's raw files (data/recordings/csi-1781143789.csi.jsonl + data/ground-truth/keypoints_20260610_221000.jsonl) reproduces ≥2,046 pairs without workarounds
Related
#645 (paired-data quantity/quality tracking), #509 (external reproducibility), ADR-152 §2.2, the 92.9% retraction (CHANGELOG + PR #535).
🤖 Generated with claude-flow