flamingol1/infohub

InfoHub - Personal Information Aggregation System

A robust backend system for aggregating and processing articles from various sources (RSS, Zhihu, etc.) with AI-powered summarization and tagging.

Tech Stack

  • Backend: Python 3.10, FastAPI
  • Database: PostgreSQL
  • Cache/Queue: Redis, Celery
  • ORM: SQLAlchemy
  • AI: LangChain, OpenAI
  • Crawling: feedparser, httpx, trafilatura
  • Containerization: Docker, docker-compose

Project Structure

InfoHub/
├── app/
│ ├── main.py # FastAPI entry point
│ ├── core/ # Configuration & dependencies
│ ├── db/ # Database setup
│ ├── models/ # SQLAlchemy models
│ ├── schemas/ # Pydantic schemas
│ ├── crud/ # Database operations
│ ├── crawlers/ # Source crawlers
│ ├── services/ # AI processing
│ ├── workers/ # Celery tasks
│ └── utils/ # Utilities
├── docker-compose.yml
├── Dockerfile
├── requirements.txt
└── .env.example

Quick Start

1. Environment Setup

Copy the example environment file and configure your settings:

cp .env.example .env

Edit .env, add your OpenAI API key, and adjust the remaining values for your setup:

DATABASE_URL=postgresql+psycopg2://user:password@localhost:5432/infohub
REDIS_URL=redis://localhost:6379/0
OPENAI_API_KEY=sk-your-actual-api-key
SECRET_KEY=your-secret-key-here
ENVIRONMENT=dev
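The variables above can be validated at startup so that a missing key fails fast instead of surfacing later inside a worker. The following is a minimal sketch using only the standard library; the `Settings` class and `load_settings` function are hypothetical names for illustration and are not taken from the project's `app/core/` code, which may well use a different mechanism.

```python
import os
from dataclasses import dataclass

# Keys taken from the .env example above; all of them are treated as required here.
REQUIRED_KEYS = ("DATABASE_URL", "REDIS_URL", "OPENAI_API_KEY", "SECRET_KEY")


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container (not the project's actual config class)."""
    database_url: str
    redis_url: str
    openai_api_key: str
    secret_key: str
    environment: str = "dev"


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ), failing on missing keys."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        openai_api_key=env["OPENAI_API_KEY"],
        secret_key=env["SECRET_KEY"],
        environment=env.get("ENVIRONMENT", "dev"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.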

2. Docker Deployment (Recommended)

Start all services with docker-compose:

docker-compose up -d

This will start:

  • Web: FastAPI server on http://localhost:8000
  • Worker: Celery background worker
  • DB: PostgreSQL on port 5432
  • Redis: Redis on port 6379

3. Local Development

Install dependencies:

pip install -r requirements.txt

Start PostgreSQL and Redis (or use docker-compose for just these services):

docker-compose up -d db redis

Run the FastAPI server:

uvicorn app.main:app --reload

Run the Celery worker (in a separate terminal):

celery -A app.workers.tasks worker --loglevel=info

API Endpoints

Health Check

GET http://localhost:8000/

Sources

  • Create Source: POST /api/v1/sources

    {
     "platform": "rss",
     "identity": "https://example.com/rss",
     "is_active": true
    }
  • List Sources: GET /api/v1/sources

  • Get Source: GET /api/v1/sources/{source_id}

  • Trigger Crawl: POST /api/v1/sources/{source_id}/crawl
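
As an illustration of the Create Source call, the sketch below builds the POST request with the standard library. The base URL assumes the default local deployment on port 8000, and the helper name `build_create_source_request` is hypothetical. The response format is not documented here, so the sketch does not assume one.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed default from the Quick Start section


def build_create_source_request(identity, platform="rss", is_active=True, base_url=BASE_URL):
    """Prepare (but do not send) a POST /api/v1/sources request."""
    payload = {"platform": platform, "identity": identity, "is_active": is_active}
    return urllib.request.Request(
        url=f"{base_url}/api/v1/sources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send a prepared request and return (status code, response body as text)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

With the server running, `send(build_create_source_request("https://example.com/rss"))` would submit the same JSON body shown above.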

Articles

  • List Articles: GET /api/v1/articles?status=processed

  • Get Article: GET /api/v1/articles/{article_id}

  • Create Article: POST /api/v1/articles

Quick Crawl

GET http://localhost:8000/api/v1/crawl?source_url=https://example.com/rss&platform=rss
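
Because `source_url` is itself a URL, it should be percent-encoded when placed in the query string; otherwise a feed URL containing `?` or `&` would be split into separate parameters. A small sketch, assuming the default local base URL (the helper name `build_quick_crawl_url` is hypothetical):

```python
from urllib.parse import urlencode


def build_quick_crawl_url(source_url, platform="rss", base_url="http://localhost:8000"):
    """Return the Quick Crawl URL with source_url and platform safely encoded."""
    query = urlencode({"source_url": source_url, "platform": platform})
    return f"{base_url}/api/v1/crawl?{query}"


print(build_quick_crawl_url("https://example.com/rss"))
# http://localhost:8000/api/v1/crawl?source_url=https%3A%2F%2Fexample.com%2Frss&platform=rss
```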

Features

  • Multi-Platform Crawling: Supports RSS feeds and is extensible to other platforms
  • AI Processing: Automatic summarization, tagging, and quality scoring
  • Async Task Queue: Celery-based background processing
  • Duplicate Detection: Prevents storing duplicate articles
  • Clean Content: HTML to Markdown conversion for better readability
  • RESTful API: Well-structured API with FastAPI
  • Database Layer: SQLAlchemy ORM with PostgreSQL (Alembic migrations are planned but not yet set up)
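
To make the duplicate-detection feature concrete, here is one way such a check could work: normalize each article URL and skip those already seen. This is a sketch under assumptions, not the project's implementation; the README only states that `source_url` is unique, and the actual code may rely on that database constraint alone. The helper names `normalize_url` and `filter_new` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Lowercase scheme and host, drop the fragment, and strip trailing slashes."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def filter_new(urls, seen):
    """Return URLs whose normalized form is not in `seen`, updating `seen` in place."""
    fresh = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            fresh.append(url)
    return fresh
```

In a real deployment the `seen` set would be replaced by a lookup against the articles table, with the unique constraint acting as the final safeguard.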

Database Models

Article

  • id: Primary key
  • title: Article title
  • author: Author name
  • source_url: Original URL (unique)
  • content: Cleaned Markdown content
  • summary: AI-generated summary
  • tags: AI-extracted tags
  • ai_score: Quality score (0-10)
  • status: Processing status (pending, processed, failed)
  • created_at: Creation timestamp
  • updated_at: Last update timestamp
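
The field list above can be mirrored in plain Python to show how the status values and the 0-10 score range fit together. This is an illustrative dataclass, not the SQLAlchemy model in `app/models/`; the names `ArticleRecord`, `ArticleStatus`, and `mark_processed` are hypothetical, and `id` is omitted because the database assigns it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class ArticleStatus(str, Enum):
    PENDING = "pending"
    PROCESSED = "processed"
    FAILED = "failed"


def _now():
    return datetime.now(timezone.utc)


@dataclass
class ArticleRecord:
    """Plain-Python mirror of the Article fields (illustration only)."""
    title: str
    source_url: str  # unique per article
    content: str = ""
    author: Optional[str] = None
    summary: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    ai_score: Optional[float] = None  # quality score in the range 0-10
    status: ArticleStatus = ArticleStatus.PENDING
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)

    def mark_processed(self, summary, tags, ai_score):
        """Store AI results and move the record to the processed state."""
        if not 0 <= ai_score <= 10:
            raise ValueError("ai_score must be between 0 and 10")
        self.summary = summary
        self.tags = list(tags)
        self.ai_score = ai_score
        self.status = ArticleStatus.PROCESSED
        self.updated_at = _now()
```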

Source

  • id: Primary key
  • platform: Platform type (rss, zhihu, etc.)
  • identity: URL or user identifier
  • is_active: Active status
  • last_crawled_at: Last crawl timestamp

Development

Running Tests

pytest

Database Migrations

Alembic is not yet set up in this repository. Once it is, the usual workflow is as follows (alembic init is a one-time step):

alembic init alembic
alembic revision --autogenerate -m "Initial migration"
alembic upgrade head

Troubleshooting

Worker not processing tasks

  • Check Redis connection: docker-compose logs redis
  • Check worker logs: docker-compose logs worker
  • Verify tasks are registered: celery -A app.workers.tasks inspect registered

Database connection issues

  • Ensure PostgreSQL is running: docker-compose logs db
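  • Check that DATABASE_URL in .env matches the docker-compose configuration (inside containers, the host is usually the compose service name, such as db, rather than localhost)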
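
A quick way to tell a configuration problem from a service that is simply down is to test whether the host and port in each URL accept TCP connections. The sketch below uses only the standard library; the function names and the default-port table are assumptions for illustration, and a successful connection only shows that the port is open, not that credentials are valid.

```python
import socket
from urllib.parse import urlsplit

# Fallback ports used when the URL does not specify one.
DEFAULT_PORTS = {"postgresql": 5432, "redis": 6379}


def endpoint_from_url(url):
    """Extract (host, port) from a DATABASE_URL or REDIS_URL style string."""
    parts = urlsplit(url)
    backend = parts.scheme.split("+", 1)[0]  # "postgresql+psycopg2" -> "postgresql"
    return parts.hostname, parts.port or DEFAULT_PORTS.get(backend)


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `is_reachable(*endpoint_from_url(os.environ["DATABASE_URL"]))` (after `import os`) reports whether PostgreSQL is listening at the configured address.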

AI processing fails

  • Verify OPENAI_API_KEY is set correctly
  • Check API quota/billing status
  • Review worker logs for detailed error messages

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

About

Personal information aggregation platform - supports RSS and Zhihu content crawling, with AI summarization and scoring
