Extract and analyze bank or payment transaction data from PDF statements — all in one unified CLI tool.

ibnu-umer/finance-parser

🧾 Finance Parser

Extract and analyze bank or payment transaction data from PDF statements — an all-in-one CLI tool.
The Finance Parser reads PDFs (GPay, Canara Bank, etc.), extracts structured transaction details, and exports them to CSV or JSON for easy analysis or integration.

🚀 Features

  • ⚙️ Multi-bank support (GPay, Canara, and extendable to others)
  • 📄 Smart PDF parsing using Camelot / pdfplumber
  • 🧩 CLI tool for easy automation
  • 🧹 Data normalization & cleaning
  • 📊 Exports to CSV and JSON
  • 🔒 Fully offline — no external APIs required

🏗️ Project Structure

finance-parser/
├── src/
│   └── finance_parser/
│       ├── __init__.py
│       ├── __main__.py          # CLI entry point
│       ├── main.py              # Core logic
│       ├── canara_parser.py     # Bank-specific parsers
│       ├── gpay_parser.py
│       └── utils/               # Shared helpers
├── media/
│   └── sample_statement.pdf     # Example input
├── output/
│   ├── transactions.csv
│   └── transactions.json
├── pyproject.toml               # Build system & CLI entry config
├── requirements.txt
└── README.md

⚙️ Setup

1️⃣ Clone the repo

git clone https://github.com/ibnu-umer/finance-parser.git
cd finance-parser

2️⃣ Install dependencies

pip install -r requirements.txt

3️⃣ Add your statement PDF

Place your bank statement (e.g., GPay, Canara) inside the media/ folder.

🧩 Usage

Basic Command

python -m finance_parser --file "media/canara_statement.pdf" --type canara --format csv

Or, if installed as a package:

finance-parser --file "media/canara_statement.pdf" --type canara --format csv

⚙️ CLI Options

| Flag | Description | Example |
| --- | --- | --- |
| -f, --file | Path to PDF file | --file media/canara_statement.pdf |
| -t, --type | Bank/statement type (gpay, canara, etc.) | --type canara |
| -o, --output | Output folder | --output output/ |
| --format | Output format (csv, json, or both) | --format both |
| -p, --privacy | Processing mode (raw, clean, or masked) | --privacy clean |

Example:

finance-parser --file media/canara_statement.pdf --type canara --format both --privacy masked
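For readers who want to see how these flags could be wired up, here is a minimal argparse sketch. It is not the project's actual CLI code: the function name `build_parser`, the default values, and the restriction of `--type` to `gpay` and `canara` are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (defaults are assumed)."""
    parser = argparse.ArgumentParser(prog="finance-parser")
    parser.add_argument("-f", "--file", required=True, help="Path to PDF file")
    parser.add_argument(
        "-t", "--type", required=True, choices=["gpay", "canara"],
        help="Bank/statement type",
    )
    # Assumed default: the output/ folder shown in the project structure.
    parser.add_argument("-o", "--output", default="output/", help="Output folder")
    parser.add_argument(
        "--format", choices=["csv", "json", "both"], default="csv",
        help="Output format",
    )
    parser.add_argument(
        "-p", "--privacy", choices=["raw", "clean", "masked"], default="raw",
        help="Processing mode",
    )
    return parser


# Parse the same arguments as the example command above.
args = build_parser().parse_args([
    "--file", "media/canara_statement.pdf",
    "--type", "canara",
    "--format", "both",
    "--privacy", "masked",
])
print(args.type, args.format, args.privacy)  # prints: canara both masked
```

Using `choices` lets argparse reject an unsupported value with a usage message before any PDF is opened.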

🧠 How It Works

  1. Detects and reads statement text using Camelot or pdfplumber.
  2. Chooses the correct parser based on --type.
  3. Extracts structured transaction data (date, description, debit/credit, balance, etc.).
  4. Applies normalization, masking, or cleaning if requested.
  5. Writes the data as CSV, JSON, or both, depending on --format.
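Step 2 above (choosing a parser from --type) could be sketched as a simple registry lookup. The names `parse_gpay`, `parse_canara`, `PARSERS`, and `select_parser` are hypothetical, and the parser bodies are empty stubs; the real modules (gpay_parser.py, canara_parser.py) may be organised quite differently.

```python
from typing import Callable, Dict, List

Transaction = Dict[str, str]
Parser = Callable[[str], List[Transaction]]


def parse_gpay(text: str) -> List[Transaction]:
    """Stub standing in for the GPay-specific parser."""
    return []


def parse_canara(text: str) -> List[Transaction]:
    """Stub standing in for the Canara-specific parser."""
    return []


# Registry mapping each --type value to its parser function.
PARSERS: Dict[str, Parser] = {
    "gpay": parse_gpay,
    "canara": parse_canara,
}


def select_parser(statement_type: str) -> Parser:
    """Return the parser registered for the given type, case-insensitively."""
    try:
        return PARSERS[statement_type.lower()]
    except KeyError:
        supported = ", ".join(sorted(PARSERS))
        raise ValueError(
            f"Unsupported statement type {statement_type!r}; "
            f"expected one of: {supported}"
        ) from None


parser_fn = select_parser("canara")
transactions = parser_fn("statement text extracted from the PDF")
```

With a registry like this, supporting another bank would only require writing one parser function and adding one dictionary entry.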

🧰 Dependencies

  • camelot-py / pdfplumber – PDF parsing
  • pandas – Data manipulation
  • argparse – Command-line interface (Python standard library)
  • re – Regex-based parsing (Python standard library)

Install manually if needed:

pip install camelot-py pdfplumber pandas

🧼 Output

GPay

  • date – Transaction date
  • time – Transaction time
  • type – Credit/Debit
  • payee – Counterparty / Payee name
  • txn_id – UPI Transaction ID
  • account – Account
  • amount – Transaction amount

Canara

  • date – Transaction date
  • time – Transaction time
  • txn_type – Credit/Debit
  • mode – UPI, NEFT, IMPS, etc.
  • txn_id – Transaction ID (for UPI/IMPS)
  • bank_code – 4-letter bank code
  • payee – Counterparty / Payee name
  • upi_id – UPI ID if available
  • amount – Transaction amount
  • balance – Account balance after transaction
  • cheque_no – Cheque number if present
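As an illustration of downstream analysis, the sketch below totals amounts per transaction type from a Canara-style CSV using only the standard library. The sample rows are invented for the example, and the exact cell values (`Debit`, `Credit`, ISO dates, plain decimal amounts) are assumptions about how the exported file is formatted.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal
from typing import Dict, TextIO

# Invented sample rows using a subset of the Canara columns listed above.
SAMPLE_CSV = """date,txn_type,mode,payee,amount,balance
2024-01-05,Debit,UPI,Example Store,250.00,9750.00
2024-01-09,Credit,NEFT,Example Employer,5000.00,14750.00
2024-01-12,Debit,UPI,Example Cafe,120.50,14629.50
"""


def totals_by_type(csv_file: TextIO) -> Dict[str, Decimal]:
    """Sum the amount column per txn_type, using Decimal to avoid float error."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for row in csv.DictReader(csv_file):
        totals[row["txn_type"]] += Decimal(row["amount"])
    return dict(totals)


totals = totals_by_type(io.StringIO(SAMPLE_CSV))
print(totals["Debit"], totals["Credit"])  # prints: 370.50 5000.00
```

For a real export, replace the `io.StringIO` object with `open("output/transactions.csv", newline="")`. For GPay output the type column is named `type` rather than `txn_type`.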

🥧 Sensitive Fields

Some transaction fields contain sensitive information. These are handled differently depending on the mode selected with --privacy.

Sensitive Fields by Source

  • Canara Bank
    • upi_id
    • txn_id
    • cheque_no
  • GPay
    • txn_id

Output Modes

  1. Raw
    • All columns are included.
    • Sensitive fields are not masked.
  2. Masked
    • All columns are included.
    • Sensitive fields are masked (partial hiding of UPI IDs, txn IDs, cheque numbers).
  3. Clean
    • All sensitive fields are dropped from the output.
    • Only non-sensitive columns remain.

This ensures privacy while maintaining flexibility for analysis.
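The three modes can be pictured with a small sketch over plain dictionaries. The masking rule used here (keep the last four characters, replace the rest with `*`) is an assumption, as are the names `mask_value` and `apply_privacy`; the project's own masking format may differ.

```python
from typing import Dict, List

# Sensitive fields per source, as listed above.
SENSITIVE_FIELDS = {
    "canara": ["upi_id", "txn_id", "cheque_no"],
    "gpay": ["txn_id"],
}


def mask_value(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters (assumed masking rule)."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def apply_privacy(rows: List[Dict[str, str]], source: str, mode: str) -> List[Dict[str, str]]:
    """Return copies of rows processed according to the raw, masked, or clean mode."""
    sensitive = SENSITIVE_FIELDS[source]
    if mode == "raw":
        # Every column is kept unchanged.
        return [dict(row) for row in rows]
    if mode == "masked":
        # Every column is kept; non-empty sensitive values are partially hidden.
        return [
            {k: mask_value(v) if k in sensitive and v else v for k, v in row.items()}
            for row in rows
        ]
    if mode == "clean":
        # Sensitive columns are dropped entirely.
        return [
            {k: v for k, v in row.items() if k not in sensitive}
            for row in rows
        ]
    raise ValueError(f"Unknown privacy mode: {mode!r}")


# Invented example row for a GPay statement.
rows = [{"payee": "Example Store", "txn_id": "123456789012", "amount": "250.00"}]
masked = apply_privacy(rows, "gpay", "masked")
clean = apply_privacy(rows, "gpay", "clean")
print(masked[0]["txn_id"])  # prints: ********9012
print(sorted(clean[0]))     # prints: ['amount', 'payee']
```

Each mode returns new dictionaries, so the original rows stay available if several outputs are produced in one run.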
