-
Notifications
You must be signed in to change notification settings - Fork 839
crop text in middle instead of decimate it #809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So maybe the entire else
block should be removed?
@Copilot
Copilot
AI
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates the text cropping functionality to remove excess text from the middle rather than discarding every N sentence, while also adjusting token limit calculations and separator characters for a more natural output. Key changes include:
- Modifying cropText to split text by an expanded set of delimiters and crop from the middle with an appended disclaimer.
- Updating the token limit logic and default configurations for extended model contexts.
- Adjusting the YouTube site adapter to use newline as a separator and refining the summary prompt.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
src/utils/crop-text.mjs | Revised crop text function: new cropping algorithm, delimiter update, and token limit adjustments. |
src/content-script/site-adapters/youtube/index.mjs | Updated subtitle extraction and summary prompt adjustments. |
src/config/index.mjs | Increased default maxResponseTokenLength to accommodate longer texts. |
Copilot
AI
May 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Consider explicitly converting the result of toFixed to a number (e.g., using Number(...)) to avoid any unexpected type coercion issues.
Copilot uses AI. Check for mistakes.
Copilot
AI
May 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the input text is short and the first pass never exceeds the halfTargetTokens, middleIndex will remain -1, which may lead to unexpected slicing. Consider adding a condition to bypass cropping when the text length is within limits.
Copilot uses AI. Check for mistakes.
/review
Qodo Merge was enabled for this repository. To continue using it, please link your Git account with your Qodo account here.
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
🎫 Ticket compliance analysis 🔶
800 - Partially compliant
Compliant requirements:
• Fix the "decimation" approach to text cropping which creates garbage text
Non-compliant requirements:
• Fix the issue where the extension uses subtitles instead of transcript on YouTube
• Ensure the extension can handle uncommon words that are missed in subtitles
Requires further human verification:
• Improve the quality of text used for summarization - The PR changes the cropping method but still uses subtitles, so verification is needed to see if the summary quality has improved despite still using subtitles
Percentage Calculation
The percentage calculation for the removed tokens might be incorrect. The expression (croppedTokens / totalTokens).toFixed(2) * 100
first converts to a string with 2 decimal places, then multiplies by 100, which could lead to unexpected results.
(croppedTokens / totalTokens).toFixed(2) * 100
first converts to a string with 2 decimal places, then multiplies by 100, which could lead to unexpected results.croppedText += `\n\n**Important disclaimer**, this text is incomplete! ${croppedTokens} or ${ (croppedTokens / totalTokens).toFixed(2) * 100 }% of tokens have been removed from this location in the text due to lack limited model context of ${maxLength}\n\n`
Incomplete Implementation
The PR addresses the cropping method but doesn't solve the core issue from the ticket - it still uses subtitles instead of transcript data. The PR improves how text is cropped but doesn't change the data source.
const splits = text.split(/[,,。??!!;;\n]/).map((s) => s.trim()) const splitsLength = splits.map((s) => (tiktoken ? encode(s).length : s.length)) const cropTargetLength = maxLength - startLength - endLength let firstHalfTokens = 0 let secondHalfTokens = 0 const halfTargetTokens = Math.floor(cropTargetLength / 2) let middleIndex = -1 let endStartIndex = splits.length let totalTokens = splitsLength.reduce((sum, length) => sum + length + 1, 0) let croppedTokens = 0 let croppedText = '' let currentLength = 0 // First pass: find the middle for (let i = 0; i < splits.length; i++) { if (firstHalfTokens < halfTargetTokens) { firstHalfTokens += splitsLength[i] + 1 } else { middleIndex = i break } } // Second pass: find the start of the end section for (let i = splits.length - 1; i >= middleIndex; i--) { secondHalfTokens += splitsLength[i] + 1 if (secondHalfTokens >= halfTargetTokens) { endStartIndex = i break } } // Calculate cropped tokens croppedTokens = totalTokens - firstHalfTokens - secondHalfTokens // Construct the cropped text croppedText = splits.slice(0, middleIndex).join('\n') if (croppedTokens > 0) { croppedText += `\n\n**Important disclaimer**, this text is incomplete! ${croppedTokens} or ${ (croppedTokens / totalTokens).toFixed(2) * 100 }% of tokens have been removed from this location in the text due to lack limited model context of ${maxLength}\n\n` } croppedText += splits.slice(endStartIndex).join('\n')
/improve
Qodo Merge was enabled for this repository. To continue using it, please link your Git account with your Qodo account here. PR Code Suggestions ✨
|
5d81453
to
ffad115
Compare
Uh oh!
There was an error while loading. Please reload this page.
A few problems:
So I've
#800