Releases: playeriv65/EasyLocomo

v0.1.0: Legacy Logic Alignment & Baseline Freeze

07 Jan 05:13
@playeriv65 playeriv65

Release v0.1.0 - Baseline Alignment Verified

We are excited to announce the initial release of EasyLocomo. The primary focus of this version is to provide a streamlined, easy-to-use interface for the LoCoMo benchmark while maintaining strict logical and data consistency with the original official repository.

🎯 Baseline Alignment Verification

We tested EasyLocomo extensively with gpt-4o-mini to verify that it produces results consistent with the original author's implementation. The minor differences observed are primarily due to the non-deterministic nature of LLM outputs and to randomized option ordering in Category 5 questions.
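To illustrate why randomized option ordering changes results between runs, here is a minimal sketch of a two-option prompt builder. The function name `build_options`, the distractor text, and the prompt layout are assumptions made for illustration, not code taken from EasyLocomo or the official repository. It only shows that the option order follows the random number generator's state, so two runs agree only when they share a seed.

```python
import random

# Hypothetical distractor text, used only for this illustration.
NOT_MENTIONED = "Not mentioned in the conversation"


def build_options(answer: str, rng: random.Random) -> str:
    """Return a two-option prompt fragment with the option order randomized."""
    options = [NOT_MENTIONED, answer]
    rng.shuffle(options)  # the resulting order depends on the RNG state
    return f"(a) {options[0]} (b) {options[1]}"


# Two generators created with the same seed yield the same option order.
first = build_options("a guitar", random.Random(0))
second = build_options("a guitar", random.Random(0))
```

With an unseeded generator the order can differ from run to run, which is enough to change the model's answer on some questions.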

Performance Comparison (Macro F1)

Note on Reproducibility: The original implementation builds prompts from unordered set containers, which introduces prompt-level randomness. In addition, the legacy F1 scoring logic does not recognize responses that are semantically equivalent to the reference but phrased differently. As a result, re-running the evaluation with identical models and code typically produces scores that vary by up to 5%. Exact run-to-run parity is therefore not attainable in practice, but the two implementations agree at the aggregate level.
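The scoring limitation is easier to see with a concrete case. The sketch below is a simplified token-overlap F1 in the style commonly used for QA evaluation. It is an assumption about the general shape of such a metric, not the exact legacy scoring code, which may normalize text differently. The helper names `normalize` and `token_f1` are hypothetical.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two strings."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


exact = token_f1("7 May 2023", "7 May 2023")
paraphrase = token_f1("the seventh of May", "7 May 2023")
```

Here `exact` is 1.0, while `paraphrase` shares only the token "may" with the reference and scores about 0.33, even though a human reader might accept it as largely correct. Small wording changes in the model output therefore move the score noticeably.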

The following table compares the F1 scores between the official LoCoMo logic (Original) and the EasyLocomo implementation:

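| QA Category      | Original (Official) F1 | EasyLocomo F1 | Difference |
|------------------|------------------------|---------------|------------|
| Temporal         | 0.3439                 | 0.3551        | +0.0112    |
| Single-hop       | 0.2594                 | 0.2885        | +0.0291    |
| Multi-hop        | 0.3808                 | 0.3935        | +0.0127    |
| Open-domain      | 0.6231                 | 0.6202        | -0.0029    |
| Adversarial      | 0.1883                 | 0.1794        | -0.0089    |
| Overall Accuracy | 0.4284                 | 0.4301        | +0.0017    |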
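
The Difference column is EasyLocomo F1 minus Original F1. As a quick check, the short sketch below recomputes it from the values in the table above. The variable and function names are illustrative only and are not part of the release.

```python
# (category, original F1, EasyLocomo F1), copied from the table above.
rows = [
    ("Temporal", 0.3439, 0.3551),
    ("Single-hop", 0.2594, 0.2885),
    ("Multi-hop", 0.3808, 0.3935),
    ("Open-domain", 0.6231, 0.6202),
    ("Adversarial", 0.1883, 0.1794),
    ("Overall Accuracy", 0.4284, 0.4301),
]


def differences(table):
    """Map each category to EasyLocomo F1 minus Original F1, rounded to 4 places."""
    return {name: round(new - old, 4) for name, old, new in table}


diffs = differences(rows)
# Category with the largest absolute gap between the two implementations.
largest = max(diffs, key=lambda name: abs(diffs[name]))
```

The largest absolute gap is in the Single-hop category (0.0291), and every other row differs by less than 0.013.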

📦 Release Attachments

For full transparency, the following 6 JSON files containing raw predictions and statistical summaries are included in the release assets:

  1. new_res.json: Raw predictions from EasyLocomo.
  2. new_res_stats.json: Detailed per-question metrics for EasyLocomo.
  3. new_res_summary.json: Aggregated performance summary for EasyLocomo.
  4. old_res.json: Raw predictions from the original official code.
  5. old_res_stats.json: Detailed per-question metrics for the original code.
  6. old_res_summary.json: Aggregated performance summary for the original code.

🚀 Key Improvements in v0.1.0

  • Streamlined Workflow: Unified environment management via uv.
  • OpenAI Standard: Support for all OpenAI-compatible API endpoints.
  • Robustness: Integrated breakpoint resumption and JSON-mode parsing error handling.
  • Cost Control: Built-in token estimation utility.
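
As an illustration of the breakpoint resumption idea, here is a minimal sketch of a resumable evaluation loop. It is not EasyLocomo's actual implementation: the function `run_with_resume`, the `"id"` field, and the single-file JSON layout are all assumptions. The loop saves results after every question and, on restart, skips questions that already have an answer on disk.

```python
import json
from pathlib import Path


def run_with_resume(questions, answer_fn, out_path):
    """Answer each question once, persisting progress so a rerun can resume.

    questions: iterable of dicts with a string "id" key (hypothetical schema).
    answer_fn: callable that takes a question dict and returns a
               JSON-serializable answer.
    out_path:  JSON file that stores a mapping of question id to answer.
    """
    out_path = Path(out_path)
    done = {}
    if out_path.exists():
        # Reload answers saved by a previous, possibly interrupted, run.
        done = json.loads(out_path.read_text(encoding="utf-8"))
    for question in questions:
        qid = str(question["id"])  # JSON object keys are always strings
        if qid in done:
            continue  # already answered, no second API call
        done[qid] = answer_fn(question)
        # Write after every answer so an interruption loses at most one call.
        out_path.write_text(json.dumps(done, indent=2), encoding="utf-8")
    return done
```

Rewriting the whole file after each answer is simple but slow for large runs. An append-only JSON Lines file would be a reasonable alternative under the same assumptions.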