Releases: playeriv65/EasyLocomo

v0.1.0: Legacy Logic Alignment & Baseline Freeze

07 Jan 05:13

@playeriv65

Tag: v0.1.0

Commit: d5f5f4a


Release v0.1.0 - Baseline Alignment Verified

We are excited to announce the initial release of EasyLocomo. This version focuses on providing a streamlined, easy-to-use interface to the LoCoMo benchmark while keeping its logic and data strictly consistent with the official LoCoMo repository.

🎯 Baseline Alignment Verification

We ran extensive tests with gpt-4o-mini to verify that EasyLocomo produces results consistent with the original authors' implementation. The small differences we observed stem mainly from the non-deterministic nature of LLM outputs and from the randomized option ordering used for Category 5 questions.
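If option-order randomness needs to be removed for a reproducibility study, one approach is to drive the shuffle from a generator seeded per question rather than from global random state. The sketch below is illustrative only: `build_category5_prompt`, the two-option template, and the "Not mentioned in the conversation" distractor text are assumptions, not code from LoCoMo or EasyLocomo.

```python
import random

# Assumed distractor text; the original Category 5 template may differ.
NOT_MENTIONED = "Not mentioned in the conversation"


def build_category5_prompt(question: str, answer: str, question_id: str, seed: int = 0) -> str:
    """Hypothetical helper: format a two-option question with a reproducible order."""
    # random.Random hashes a str seed with SHA-512, so the resulting order is
    # stable across processes and unaffected by PYTHONHASHSEED.
    rng = random.Random(f"{seed}:{question_id}")
    options = [answer, NOT_MENTIONED]
    rng.shuffle(options)
    return f"{question} Select the correct answer: (a) {options[0]} (b) {options[1]}."


if __name__ == "__main__":
    first = build_category5_prompt("Where did Alice travel?", "Paris", "conv-1_q7")
    second = build_category5_prompt("Where did Alice travel?", "Paris", "conv-1_q7")
    assert first == second  # same seed and question id give the same prompt
    print(first)
```

Seeding per question id, rather than once per run, keeps each question's order fixed even when questions are evaluated in a different order or after a resumed run.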

Performance Comparison (Macro F1)

Note on Reproducibility: The original implementation uses unordered set containers, which introduces randomness at the prompt level, and its legacy F1 scoring logic does not recognize answers that are semantically equivalent but phrased differently. As a result, re-running the evaluation with identical models and code typically produces run-to-run variation of up to 5%. Exact, bit-level parity is therefore unattainable; what has been achieved is alignment at the level of aggregate statistics.
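Python randomizes `str` hashing per process by default (controlled by the `PYTHONHASHSEED` environment variable), so iterating over a `set` of strings can yield a different order on each run. The sketch below is a standalone illustration, not code from either repository: it runs the same snippet under several hash seeds and compares the raw set order with a `sorted()` copy. The speaker names and `run_with_hash_seed` are hypothetical.

```python
import os
import subprocess
import sys

# Code executed in a fresh interpreter: iterate a set of (hypothetical) speaker
# names, as prompt-building code might, and also print a sorted copy.
SNIPPET = (
    "names = {'Alice', 'Bob', 'Carol', 'Dave', 'Erin', 'Frank', 'Grace', 'Heidi'}\n"
    "print('|'.join(names))\n"
    "print('|'.join(sorted(names)))\n"
)


def run_with_hash_seed(seed: int) -> tuple[str, str]:
    """Run SNIPPET with a fixed PYTHONHASHSEED; return (set order, sorted order)."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    lines = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    ).stdout.splitlines()
    return lines[0], lines[1]


if __name__ == "__main__":
    results = [run_with_hash_seed(seed) for seed in range(5)]
    print("distinct set orders:", len({r[0] for r in results}))     # usually more than 1
    print("distinct sorted orders:", len({r[1] for r in results}))  # always 1
```

Sorting, or otherwise fixing the order, before a collection is rendered into a prompt removes this particular source of variation; it does not address LLM sampling non-determinism.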

The following table compares F1 scores produced by the official LoCoMo logic (Original) with those produced by EasyLocomo:

QA Category	Original (Official) F1	EasyLocomo F1	Difference (EasyLocomo - Original)
Temporal	0.3439	0.3551	+0.0112
Single-hop	0.2594	0.2885	+0.0291
Multi-hop	0.3808	0.3935	+0.0127
Open-domain	0.6231	0.6202	-0.0029
Adversarial	0.1883	0.1794	-0.0089
Overall Accuracy	0.4284	0.4301	+0.0017
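To illustrate why a lexical F1 scorer penalizes paraphrases, here is a sketch assuming a SQuAD-style token-overlap F1 (lowercase, strip punctuation and articles, then count shared tokens). The original LoCoMo scorer may add further steps, such as stemming, so this is an approximation rather than the exact legacy logic.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection: each shared token counts at most min(count) times.
    num_same = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(token_f1("New York City", "new york city."))  # 1.0
    print(token_f1("NYC", "New York City"))             # 0.0, despite equal meaning
```

Because the score depends only on surface tokens, two runs whose answers differ only in wording can receive different F1 values, which compounds the prompt-level randomness described in the reproducibility note.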

📦 Release Attachments

For full transparency, the release assets include the following six JSON files with raw predictions and statistical summaries:

new_res.json: Raw predictions from EasyLocomo.
new_res_stats.json: Detailed per-question metrics for EasyLocomo.
new_res_summary.json: Aggregated performance summary for EasyLocomo.
old_res.json: Raw predictions from the original official code.
old_res_stats.json: Detailed per-question metrics for the original code.
old_res_summary.json: Aggregated performance summary for the original code.
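A sketch for diffing the two summary attachments. It assumes, without having checked the actual files, that each `*_summary.json` is a JSON object whose top-level numeric values are per-category scores; if the real files nest their metrics, `load_summary` would need adapting. Both helper names are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_summary(path: str | Path) -> dict[str, float]:
    """Load a summary file, keeping only top-level numeric values (assumed schema)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return {
        key: float(value)
        for key, value in data.items()
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


def compare_summaries(old: dict[str, float], new: dict[str, float]) -> dict[str, float]:
    """Return new - old, rounded to 4 decimals, for every key present in both."""
    return {key: round(new[key] - old[key], 4) for key in sorted(old.keys() & new.keys())}


if __name__ == "__main__":
    old_path, new_path = Path("old_res_summary.json"), Path("new_res_summary.json")
    if old_path.exists() and new_path.exists():
        diff = compare_summaries(load_summary(old_path), load_summary(new_path))
        for key, delta in diff.items():
            print(f"{key:<20} {delta:+.4f}")
```

Only keys present in both files are compared, so a metric that exists in just one summary is silently skipped rather than raising an error.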

🚀 Key Improvements in v0.1.0

Streamlined Workflow: Unified environment management via uv.
OpenAI Compatibility: Support for OpenAI-compatible API endpoints.
Robustness: Interrupted runs can resume from where they stopped, and parsing errors in JSON-mode responses are handled.
Cost Control: Built-in token estimation utility.
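These notes do not describe how resumption or JSON-mode error handling is implemented. The sketch below shows one common pattern, with that caveat: results are keyed by question id and written to a checkpoint file after each call, so a restarted run skips finished questions, and replies that fail to parse are not saved, so they are retried on the next run. `run_with_resume`, `parse_json_reply`, and the `ask` callback are hypothetical names, not EasyLocomo's API.

```python
from __future__ import annotations

import json
from collections.abc import Callable, Mapping
from pathlib import Path

FENCE = "`" * 3  # Markdown code fence that some models wrap around JSON replies


def parse_json_reply(raw: str) -> dict | None:
    """Parse a JSON-mode reply, tolerating a Markdown fence; None if unparseable."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line, then everything from the last fence onward.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def run_with_resume(
    questions: Mapping[str, str],
    ask: Callable[[str], str],
    checkpoint: Path,
) -> dict[str, dict]:
    """Answer each question once, persisting results so a restarted run resumes."""
    done: dict[str, dict] = (
        json.loads(checkpoint.read_text(encoding="utf-8")) if checkpoint.exists() else {}
    )
    for qid, question in questions.items():
        if qid in done:
            continue  # already answered in an earlier run
        reply = parse_json_reply(ask(question))
        if reply is None:
            continue  # not saved, so it is retried on the next run
        done[qid] = reply
        # Write to a temporary file, then rename, so a crash mid-write
        # cannot leave a truncated checkpoint behind.
        tmp = checkpoint.with_suffix(".tmp")
        tmp.write_text(json.dumps(done, ensure_ascii=False), encoding="utf-8")
        tmp.replace(checkpoint)
    return done
```

Saving after every question trades some disk I/O for never losing more than one API call's worth of work when a run is interrupted.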

Assets 8
