
InfoHub - Personal Information Aggregation System

A robust backend system for aggregating and processing articles from various sources (RSS, Zhihu, etc.) with AI-powered summarization and tagging.

Tech Stack

Backend: Python 3.10, FastAPI
Database: PostgreSQL
Cache/Queue: Redis, Celery
ORM: SQLAlchemy
AI: LangChain, OpenAI
Crawling: feedparser, httpx, trafilatura
Containerization: Docker, docker-compose

Project Structure

InfoHub/
├── app/
│ ├── main.py # FastAPI entry point
│ ├── core/ # Configuration & dependencies
│ ├── db/ # Database setup
│ ├── models/ # SQLAlchemy models
│ ├── schemas/ # Pydantic schemas
│ ├── crud/ # Database operations
│ ├── crawlers/ # Source crawlers
│ ├── services/ # AI processing
│ ├── workers/ # Celery tasks
│ └── utils/ # Utilities
├── docker-compose.yml
├── Dockerfile
├── requirements.txt
└── .env.example

Quick Start

1. Environment Setup

Copy the example environment file and configure your settings:

cp .env.example .env

Edit .env, set your OpenAI API key, and adjust the remaining values to match your environment:

DATABASE_URL=postgresql+psycopg2://user:password@localhost:5432/infohub
REDIS_URL=redis://localhost:6379/0
OPENAI_API_KEY=sk-your-actual-api-key
SECRET_KEY=your-secret-key-here
ENVIRONMENT=dev
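
As an illustration of how the application might consume these variables, the following is a minimal sketch that builds a settings object from the environment and fails fast when a required value is missing. The Settings class and load_settings function are hypothetical names, not taken from app/core; the sketch reads variables that are already exported and does not parse the .env file itself.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in .env.example."""
    database_url: str
    redis_url: str
    openai_api_key: str
    secret_key: str
    environment: str = "dev"


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ).

    Raises RuntimeError naming every required variable that is unset or empty.
    """
    env = os.environ if env is None else env
    required = ["DATABASE_URL", "REDIS_URL", "OPENAI_API_KEY", "SECRET_KEY"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        openai_api_key=env["OPENAI_API_KEY"],
        secret_key=env["SECRET_KEY"],
        # ENVIRONMENT is treated as optional here (an assumption), defaulting to "dev".
        environment=env.get("ENVIRONMENT", "dev"),
    )
```

Calling load_settings() at startup surfaces a missing OPENAI_API_KEY immediately rather than at the first AI processing task.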

2. Docker Deployment (Recommended)

Start all services with docker-compose:

docker-compose up -d

This will start:

Web: FastAPI server on http://localhost:8000
Worker: Celery background worker
DB: PostgreSQL on port 5432
Redis: Redis on port 6379

3. Local Development

Install dependencies:

pip install -r requirements.txt

Start PostgreSQL and Redis (or use docker-compose for just these services):

docker-compose up -d db redis

Run the FastAPI server:

uvicorn app.main:app --reload

Run the Celery worker (in a separate terminal):

celery -A app.workers.tasks worker --loglevel=info

API Endpoints

Health Check

GET http://localhost:8000/

Sources

Create Source: POST /api/v1/sources

{
 "platform": "rss",
 "identity": "https://example.com/rss",
 "is_active": true
}

List Sources: GET /api/v1/sources
Get Source: GET /api/v1/sources/{source_id}
Trigger Crawl: POST /api/v1/sources/{source_id}/crawl
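
For reference, a Create Source call could be issued from Python roughly as follows. This is a sketch using only the standard library; the base URL assumes the default address from Quick Start, the function names are hypothetical, and the shape of the response body is not documented here, so it is returned as raw text.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: default address from Quick Start


def build_create_source_request(identity, platform="rss", is_active=True):
    """Prepare (but do not send) a POST /api/v1/sources request."""
    payload = {"platform": platform, "identity": identity, "is_active": is_active}
    return urllib.request.Request(
        BASE_URL + "/api/v1/sources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_source(identity, platform="rss"):
    """Send the request to a running server and return (status, body text)."""
    request = build_create_source_request(identity, platform)
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")
```

With the server running, create_source("https://example.com/rss") submits the same JSON body shown above.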

Articles

List Articles: GET /api/v1/articles?status=processed
Get Article: GET /api/v1/articles/{article_id}
Create Article: POST /api/v1/articles

Quick Crawl

GET http://localhost:8000/api/v1/crawl?source_url=https://example.com/rss&platform=rss
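
Because source_url is itself a URL, it is safer to percent-encode it when building the query string, especially if the feed URL contains its own ? or & characters. A small sketch with the standard library (build_quick_crawl_url is a hypothetical helper name):

```python
from urllib.parse import urlencode


def build_quick_crawl_url(source_url, platform="rss", base_url="http://localhost:8000"):
    """Return the Quick Crawl URL with query parameters percent-encoded."""
    query = urlencode({"source_url": source_url, "platform": platform})
    return f"{base_url}/api/v1/crawl?{query}"


print(build_quick_crawl_url("https://example.com/rss"))
# http://localhost:8000/api/v1/crawl?source_url=https%3A%2F%2Fexample.com%2Frss&platform=rss
```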

Features

Multi-Platform Crawling: Supports RSS feeds and is extensible to other platforms
AI Processing: Automatic summarization, tagging, and quality scoring
Async Task Queue: Celery-based background processing
Duplicate Detection: Prevents storing duplicate articles
Clean Content: HTML to Markdown conversion for better readability
RESTful API: Well-structured API with FastAPI
Database Migrations: Alembic migrations for the SQLAlchemy models on PostgreSQL (see Development)
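
The README does not say how duplicate detection is implemented. Since the Article model declares source_url as unique, one plausible approach is to let the database constraint reject repeats. The sketch below illustrates that idea with the standard library's sqlite3 as a stand-in for PostgreSQL; the table layout and function names are hypothetical.

```python
import sqlite3


def open_store():
    """Create an in-memory table with a UNIQUE constraint on source_url."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " source_url TEXT NOT NULL UNIQUE)"
    )
    return conn


def save_if_new(conn, title, source_url):
    """Insert the article unless its source_url is already stored.

    Returns True when a row was inserted, False when it was a duplicate.
    """
    cursor = conn.execute(
        "INSERT OR IGNORE INTO articles (title, source_url) VALUES (?, ?)",
        (title, source_url),
    )
    conn.commit()
    return cursor.rowcount == 1


conn = open_store()
print(save_if_new(conn, "First post", "https://example.com/a"))   # True
print(save_if_new(conn, "Same post", "https://example.com/a"))    # False
```

On PostgreSQL the comparable statement is INSERT ... ON CONFLICT (source_url) DO NOTHING.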

Database Models

Article

id: Primary key
title: Article title
author: Author name
source_url: Original URL (unique)
content: Cleaned Markdown content
summary: AI-generated summary
tags: AI-extracted tags
ai_score: Quality score (0-10)
status: Processing status (pending, processed, failed)
created_at: Creation timestamp
updated_at: Last update timestamp
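
To make the field list concrete, here is a plain-Python sketch of the same shape. It is not the SQLAlchemy model from app/models; which fields are optional, the list type for tags, and the UTC timestamps are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class ArticleStatus(str, Enum):
    """The three processing states listed above."""
    PENDING = "pending"
    PROCESSED = "processed"
    FAILED = "failed"


def _utc_now():
    return datetime.now(timezone.utc)


@dataclass
class Article:
    id: int
    title: str
    source_url: str                      # unique per article
    author: Optional[str] = None         # assumption: may be absent
    content: str = ""                    # cleaned Markdown
    summary: Optional[str] = None        # filled in after AI processing
    tags: List[str] = field(default_factory=list)
    ai_score: Optional[float] = None     # 0-10 once scored
    status: ArticleStatus = ArticleStatus.PENDING
    created_at: datetime = field(default_factory=_utc_now)
    updated_at: datetime = field(default_factory=_utc_now)

    def __post_init__(self):
        # Enforce the documented 0-10 range for the quality score.
        if self.ai_score is not None and not 0 <= self.ai_score <= 10:
            raise ValueError("ai_score must be between 0 and 10")
```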

Source

id: Primary key
platform: Platform type (rss, zhihu, etc.)
identity: URL or user identifier
is_active: Active status
last_crawled_at: Last crawl timestamp

Development

Running Tests

pytest

Database Migrations

Using Alembic (to be set up). Run alembic init only once; skip it if the alembic/ directory and alembic.ini already exist:

alembic init alembic
alembic revision --autogenerate -m "Initial migration"
alembic upgrade head

Troubleshooting

Worker not processing tasks

Check Redis connection: docker-compose logs redis
Check worker logs: docker-compose logs worker
Verify tasks are registered: celery -A app.workers.tasks inspect registered

Database connection issues

Ensure PostgreSQL is running: docker-compose logs db
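Check that DATABASE_URL in .env matches the docker-compose configuration. Inside containers the database host is typically the compose service name (db) rather than localhost.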
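
Before digging through logs, it can help to confirm that the PostgreSQL and Redis ports are reachable at all. The sketch below only tests TCP connectivity, not credentials or protocol; the hosts and ports are the defaults from Quick Start and the function names are hypothetical.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services=None):
    """Probe each named (host, port) pair and report reachability."""
    if services is None:
        # Assumption: default ports published by docker-compose.
        services = {"postgres": ("localhost", 5432), "redis": ("localhost", 6379)}
    return {name: port_is_open(host, port) for name, (host, port) in services.items()}
```

Running check_services() from the host shows whether each port is published; a False entry usually points at a stopped container or a port mapping problem rather than a bad password.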

AI processing fails

Verify OPENAI_API_KEY is set correctly
Check API quota/billing status
Review worker logs for detailed error messages

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
