feat(cli): import existing PageIndex Cloud indices via add --from-pageindex-cloud (closes #88) #97
Open
KylinMountain wants to merge 7 commits into
KylinMountain force-pushed the `feat/import-pageindex-cloud` branch from f2d427e to 127e75e (June 14, 2026 11:23)
KylinMountain changed the base branch from `fix/doc-name-collision` to `main` (June 14, 2026 11:23)
Summary
Implements #88, importing a document that is already indexed in PageIndex Cloud into a local OpenKB knowledge base, with no local PDF and no re-processing:
- […] `doc_id` (`get_document`/`get_page_content`), bypassing the local `convert → raw → page-count → col.add` pipeline entirely.
- […] (`_write_long_doc_artifacts`) and compiles concepts via `compile_long_doc`.
- […] `pageindex_cloud` with a synthetic identity key `pageindex-cloud:<doc_id>` (sha256-keyed; re-import of the same doc-id is idempotent/skipped).
- `openkb add` runs under the exclusive KB lock, so the cloud-import path is serialized like every other ingest.
- `openkb remove` on an imported doc cleans up local artifacts only; the user's cloud corpus is never touched (the existing `long_pdf` gate already excludes the new type; proven by a regression test).
- `pageindex` switched to track `git+https://github.com/VectifyAI/PageIndex.git@dev` for the cloud API surface (lock pins a concrete commit). Follow-up: re-pin to an exact tag once a new PageIndex release is published.

Notes
- […] `doc_name` groundwork (`resolve_doc_name`, registry path identity), now in `main`. Rebased onto `main` after #96 ("fix: collision-resistant doc_name - same-stem documents no longer overwrite each other") merged; base is `main`.
- […] (`images: []`). Concept compilation reads the summary, so output quality is unaffected.

Test Plan
Automated (full suite green: 742 passed on the rebased branch):
- `uv run --extra dev pytest -q`
- `uv run --extra dev pytest tests/test_converter.py tests/test_indexer.py tests/test_add_command.py tests/test_remove.py -v`

Manual (requires `PAGEINDEX_API_KEY` and a real cloud doc-id):
- `export PAGEINDEX_API_KEY=...` then `openkb add --from-pageindex-cloud <DOC_ID>` → wiki/summaries + wiki/sources written, concepts compiled, doc appears in `openkb list`.
- […] `[SKIP] Already imported` (idempotent).
- `openkb add foo.pdf --from-pageindex-cloud X` → errors "not both"; `openkb add` with neither → "Provide a PATH...".
- […] `PAGEINDEX_API_KEY` then import → clear error, nothing written.
- `openkb remove <doc>` on the imported doc → local artifacts removed, cloud doc still present in PageIndex.
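
Two of the manual checks above (argument validation and idempotent re-import) can be sketched without touching the network. This is only an illustration, not the PR's code: `identity_key`, `plan_add`, the in-memory `registry` set, and the error wording are all hypothetical, and hashing the `pageindex-cloud:<doc_id>` string with SHA-256 is just one plausible reading of "sha256-keyed".

```python
import hashlib


def identity_key(doc_id: str) -> str:
    """SHA-256 hex digest of the synthetic identity string (assumed scheme)."""
    return hashlib.sha256(f"pageindex-cloud:{doc_id}".encode("utf-8")).hexdigest()


def plan_add(path, cloud_doc_id, registry):
    """Decide what an add call should do (hypothetical helper, not OpenKB code).

    Returns "local", "import", or "skip"; raises ValueError on invalid input.
    `registry` is a plain set of identity keys standing in for the KB registry.
    """
    if path and cloud_doc_id:
        # Illustrative wording, not the CLI's actual message.
        raise ValueError("pass a PATH or a cloud doc-id, not both")
    if not path and not cloud_doc_id:
        # Illustrative wording, not the CLI's actual message.
        raise ValueError("provide a PATH or a cloud doc-id")
    if path:
        return "local"

    key = identity_key(cloud_doc_id)
    if key in registry:
        # Same doc-id seen before: re-import is a no-op.
        return "skip"
    registry.add(key)
    return "import"
```

Calling `plan_add(None, "abc", registry)` twice with the same set would yield `"import"` and then `"skip"`, which is the behavior the `[SKIP] Already imported` check looks for.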