Fix: Resolve Windows clone failure from invoice directory with trailing space #2071
Open
+1,203
−4,608
AlbMej force-pushed the main branch 3 times, most recently from 3f59370 to 2a0b273 (August 20, 2025 00:55)
Summary
This PR deletes the `extracted_invoice_json` directory and removes the trailing space from the name of the `'extracted_invoice_json '` directory. This fixes an invalid path error when cloning the repository with git on Windows machines (screenshot: invalid_path_git_clone).
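For context, Windows does not accept file or directory names that end in a space or a period, which is why the checkout fails there. The snippet below is a minimal sketch, not part of this PR, of how such paths could be flagged before they are committed. The function names `invalid_components` and `find_problem_paths` and the sample file name are hypothetical; the input is assumed to be a list of repository-relative paths, for example the output of `git ls-files`.

```python
from pathlib import PurePosixPath


def invalid_components(path):
    """Return the components of `path` that end in a space or a period.

    Windows rejects such names, so a repository containing them
    cannot be checked out there.
    """
    return [
        part
        for part in PurePosixPath(path).parts
        if part not in (".", "..") and part.endswith((" ", "."))
    ]


def find_problem_paths(paths):
    """Keep only the paths that contain at least one offending component."""
    return [p for p in paths if invalid_components(p)]


if __name__ == "__main__":
    # Hypothetical sample paths; in practice these could come from `git ls-files`.
    sample = [
        "data/hotel_invoices/extracted_invoice_json/example.json",
        "data/hotel_invoices/extracted_invoice_json /example.json",
    ]
    for path in find_problem_paths(sample):
        print(repr(path), invalid_components(path))
```

Printing with `repr` keeps the trailing space visible, which is otherwise easy to miss in terminal output.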
Motivation
These changes are necessary to allow Windows users to successfully clone the repo on their machines. This fixes 2 open issues (#1934, #1837), improves the repository's file structure, and reduces confusion. Both directories contain the same filenames, but the files themselves differ slightly due to the non-deterministic nature of LLMs. Most likely, a typo was made in the directory name when the cookbook was first created, and the cookbook was then re-run without the space, producing the two directories. Because the files differ, we ran an analysis to determine which directory is the correct one.
Correctness
The choice of which directory to delete is based on its associated cookbook and notebook. Using the JSON shown at the end of part 1, we can determine the referenced filename with a simple search, which leads us to `premierinn_GABCI19014325_extracted.json`. From there, we can determine which of the 2 directories is the one used for the cookbook: we check that file in both directories, and whichever one matches is the correct one, so the non-matching directory is deleted. We also see that the correct directory name is specified in the notebook with `extracted_invoice_json_path = "./data/hotel_invoices/extracted_invoice_json"`, so we remove the trailing space from the remaining directory. This analysis can be found in this notebook.