StabRise

Document processing solutions

Hi there 👋

StabRise - Document Processing Solutions

Our projects

PDF DataSource for Apache Spark



Source Code: https://github.com/StabRise/spark-pdf

Home page: https://stabrise.com/spark-pdf/

Quick Start Jupyter Notebook: https://github.com/StabRise/spark-pdf/blob/main/examples/PdfDataSource.ipynb


The project provides a custom data source for Apache Spark that lets you read PDF files into a Spark DataFrame.

Key features:

  • Read PDF documents into a Spark DataFrame
  • Read PDF files lazily, page by page
  • Handle large files of up to 10k pages
  • Process scanned PDF files by running OCR
  • No need to install Tesseract OCR separately; it is included in the package
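
As a rough illustration of what reading PDF files through a custom data source can look like from PySpark, here is a minimal sketch. It relies only on Spark's generic `DataFrameReader` chain (`format`, `option`, `load`). The `read_pdfs` helper is hypothetical, and the `"pdf"` format name is an assumption; check the project README for the exact format name and the options it supports.

```python
from typing import Any, Mapping, Optional


def read_pdfs(spark: Any, path: str,
              options: Optional[Mapping[str, str]] = None) -> Any:
    """Load PDF files via Spark's generic DataFrameReader API.

    `spark` is expected to be a pyspark.sql.SparkSession that has the
    spark-pdf package available. The "pdf" format name is an assumption
    made for this sketch, not a confirmed part of the project's API.
    """
    reader = spark.read.format("pdf")  # assumed data source short name
    # Pass any data source options through unchanged.
    for key, value in (options or {}).items():
        reader = reader.option(key, value)
    # Spark builds the DataFrame lazily; nothing is read until an action runs.
    return reader.load(path)
```

A call such as `read_pdfs(spark, "docs/*.pdf")` would return a DataFrame; both the path and any option names you pass are placeholders to adapt to your setup.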

ScaleDP



Source Code: https://github.com/StabRise/scaledp

Home page: https://stabrise.com/scaledp/

Quick Start Jupyter Notebook: https://github.com/StabRise/ScaleDP-Tutorials/blob/master/1.QuickStart.ipynb


ScaleDP is an Open-Source Library for processing documents using Apache Spark.

Key features:

  • Load PDF documents/Images
  • Extract text from PDF documents/Images
  • Extract images from PDF documents
  • OCR Images/PDF documents
  • Run NER on text extracted from PDF documents/Images
  • Visualize NER results

PDF Redaction

pdf-redaction

Home page: https://pdf-redaction.com

Free AI-powered online tool for redacting PDF files (removing sensitive information).

Pinned

  1. spark-pdf (Public)

    PDF DataSource for Apache Spark that reads PDF files directly into a DataFrame and can run OCR on them

    Scala · 78 stars · 4 forks

  2. ScaleDP (Public)

    ScaleDP is an Open-Source extension of Apache Spark for Document Processing

    Python · 17 stars · 1 fork

  3. ScaleDP-Tutorials (Public)

    Tutorials for the ScaleDP library. ScaleDP is an Open-Source Library for Processing Documents in Apache Spark.

    Jupyter Notebook · 5 stars

Repositories

Showing 8 of 8 repositories

People

This organization has no public members. You must be a member to see who’s a part of this organization.
