Sharing an improvement: a Highly Customizable Text Extractor. #607

sbihaiko started this conversation in Show and tell

Hey Guys!

The attached file makes it easy to override the extraction method when customizing a new pipeline. I originally wrote it for my own use, but I think it may be useful to you as well. Here is an example:

var mbuilder = new MemoryClientBuilder();
var memory = mbuilder.Build();
var orchestrator = mbuilder.GetOrchestrator();

// Replacing the default MsWordDecoder
var textExtractor = new TextExtractionHandler("extraction", orchestrator);
textExtractor.AddExtractor(
    (pipeline, file, content, ctoken) =>
    {
        // return new MsWordDecoder().DocToText(content);
        return new MyDecoder().DocToText(content);
    },
    MimeTypes.MsWord
);

Best Regards,
Sandro Bihaiko.

TextExtractionHandler.cs.txt


This discussion was converted from issue #59 on June 05, 2024 02:59.
