🐣 add tools for reflowing the transcript into one paragraph per sentence / speaker #510
134df4c to c7a9403
79e9045 to 6fad5ae
Can we have tests for the transformations in this file? :)
Should we add a warning to these if they are applied to a document in a non-Latin-script language?
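A rough sketch of how such a warning could be triggered, assuming the check runs on the plain transcript text. The names `latinLetterRatio`, `shouldWarnAboutScript` and the threshold `MIN_LATIN_RATIO` are hypothetical and not part of this PR; the threshold value is a guess that would need tuning.

```typescript
// Hypothetical helpers, not taken from the PR.
// RegExp constructors are used so the Unicode property escapes are
// evaluated at runtime (requires an ES2018+ JavaScript engine).
const LETTER = new RegExp("\\p{L}", "u");
const LATIN_LETTER = new RegExp("\\p{Script=Latin}", "u");

// Assumed threshold: warn when fewer than half of the letters are Latin.
const MIN_LATIN_RATIO = 0.5;

// Share of letters in `text` that belong to the Latin script.
// Text without any letters counts as fully Latin so it never warns.
function latinLetterRatio(text: string): number {
  let letters = 0;
  let latin = 0;
  for (const ch of Array.from(text)) {
    if (!LETTER.test(ch)) continue;
    letters += 1;
    if (LATIN_LETTER.test(ch)) latin += 1;
  }
  return letters === 0 ? 1 : latin / letters;
}

// True when the sentence-based reflow tools are likely to misbehave.
function shouldWarnAboutScript(text: string): boolean {
  return latinLetterRatio(text) < MIN_LATIN_RATIO;
}
```

This only looks at the script of the letters, not at the language itself, so it would still let through Latin-script languages with different sentence punctuation.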
This says ~50 tokens but divides by 100. That seems contradictory, or am I missing something?
Also, the magic paragraph length could probably be a constant that is used both here and for the <= 100 further up.
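To illustrate the suggestion, a minimal sketch with a single shared constant. The names `TARGET_PARAGRAPH_TOKENS`, `isShortParagraph` and `lengthRatio` are made up for this example and do not come from the PR; whether the value should be 50 or 100 is exactly the open question above.

```typescript
// Hypothetical constant: 100 mirrors the literal in the code under review,
// while the accompanying comment mentions ~50. The author would pick one.
const TARGET_PARAGRAPH_TOKENS = 100;

// Stands in for the `<= 100` comparison further up in the file.
function isShortParagraph(tokenCount: number): boolean {
  return tokenCount <= TARGET_PARAGRAPH_TOKENS;
}

// Stands in for the division by 100 that this comment is attached to.
function lengthRatio(tokenCount: number): number {
  return tokenCount / TARGET_PARAGRAPH_TOKENS;
}
```

With both places reading from the same constant, the comment and the code cannot drift apart again.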
Why does it get reduced with every additional token?
Why the JSON dance?
I think doing it this way totally fucks up collaborative editing...
This should be gated on data?.can_write, no?
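Something along these lines, as a sketch only. The `DocumentData` interface and the function name `canShowReflowTools` are hypothetical; only the `can_write` field is taken from the comment above.

```typescript
// Hypothetical shape of the loaded document data; only `can_write`
// is taken from the review comment, everything else is assumed.
interface DocumentData {
  can_write: boolean;
}

// The reflow tools modify the document, so they are only offered when the
// data has loaded and the current user has write access.
function canShowReflowTools(data: DocumentData | undefined): boolean {
  return data?.can_write === true;
}
```

Comparing against `true` keeps the tools hidden while `data` is still undefined, for example during loading.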
No description provided.