An AI-powered research assistant for collecting and analyzing academic papers from ArXiv and Semantic Scholar. Built with async Python, FAISS vector search, and LLM integration for intelligent paper analysis.
- 📚 Multi-source paper collection from ArXiv and Semantic Scholar APIs
- ⚡ Async/await architecture for efficient concurrent API calls
- 🚦 Intelligent rate limiting with adaptive backoff strategies
- 🧠 LLM-powered analysis for extracting insights from papers
- 🔍 Vector similarity search using FAISS for finding related papers
- 🖥️ Rich CLI interface with colorful tables and progress tracking
To install from source:

    # Clone the repository
    git clone https://github.com/davidburton/ResearchAssistantAgent.git
    cd ResearchAssistantAgent

    # Create and activate virtual environment
    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate

    # Install in development mode
    pip install -e .

Search for papers across both ArXiv and Semantic Scholar:

    # Basic search
    research-assistant search "transformer neural networks"

    # Search only ArXiv
    research-assistant search "quantum computing" --source arxiv --limit 20

    # Search by author
    research-assistant advanced-search --author "Yoshua Bengio" --limit 10

    # Search by category (ArXiv)
    research-assistant advanced-search --category cs.AI --limit 15

    # Store results in vector database (requires OpenAI API key for embeddings)
    research-assistant search "large language models" --store

The collectors can also be used from Python:

    import asyncio

    from research_assistant import ArxivCollector, SemanticScholarCollector


    async def search_papers():
        # Search ArXiv
        async with ArxivCollector() as arxiv:
            papers = await arxiv.search("cat:cs.LG transformer", max_results=5)
            for paper in papers:
                print(f"{paper.title} - {paper.arxiv_id}")

        # Search Semantic Scholar
        async with SemanticScholarCollector() as s2:
            papers = await s2.search("deep learning", limit=5)
            for paper in papers:
                print(f"{paper.title} - Citations: {paper.citation_count}")


    asyncio.run(search_papers())

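Because both collectors are async, the two sources can be queried concurrently rather than one after the other. The sketch below shows that pattern with `asyncio.gather`. `fake_search` and `search_all` are hypothetical stand-ins for the real collector calls, used here so the snippet runs without network access.

```python
import asyncio


async def fake_search(source: str, query: str, delay: float) -> list[str]:
    """Hypothetical stand-in for a collector search; sleeps instead of calling an API."""
    await asyncio.sleep(delay)
    return [f"{source}: result for '{query}'"]


async def search_all(query: str) -> list[str]:
    # gather() runs both coroutines concurrently and returns their
    # results in argument order, regardless of which finishes first.
    arxiv_hits, s2_hits = await asyncio.gather(
        fake_search("arxiv", query, 0.02),
        fake_search("semantic_scholar", query, 0.01),
    )
    return arxiv_hits + s2_hits


results = asyncio.run(search_all("transformers"))
for line in results:
    print(line)
```

With the real collectors, each `fake_search` call would be replaced by a coroutine that opens the collector and awaits its `search` method.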
The project follows a modular architecture:

    src/research_assistant/
    ├── collectors/          # API clients for paper sources
    │   ├── arxiv_collector.py
    │   └── semantic_scholar_collector.py
    ├── analyzers/           # LLM-based paper analysis
    │   └── paper_analyzer.py
    ├── vector_store/        # FAISS similarity search
    │   └── faiss_store.py
    └── utils/               # Rate limiting and helpers
        └── rate_limiter.py

Set environment variables for API keys:

    export OPENAI_API_KEY="your-api-key"          # For paper analysis and embeddings
    export SEMANTIC_SCHOLAR_API_KEY="your-key"    # Optional, for higher rate limits

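In Python code, the same keys can be read from the environment. This is a minimal sketch, not the package's own configuration loader: `load_api_keys` is a hypothetical helper, and it treats the Semantic Scholar key as optional, in line with the comment above.

```python
import os


def load_api_keys(env=None):
    """Hypothetical helper: read both API keys from an environment mapping."""
    env = os.environ if env is None else env
    return {
        # Needed for paper analysis and embeddings
        "openai": env.get("OPENAI_API_KEY"),
        # Optional; None means anonymous access with lower rate limits
        "semantic_scholar": env.get("SEMANTIC_SCHOLAR_API_KEY"),
    }


# A plain dict is passed here so the example does not depend on the real environment.
keys = load_api_keys({"OPENAI_API_KEY": "your-api-key"})
if keys["semantic_scholar"] is None:
    print("No Semantic Scholar key set; anonymous rate limits apply")
```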
For development, install the extra dependencies and run the checks:

    # Install development dependencies
    pip install -r requirements-dev.txt

    # Run tests
    pytest

    # Format code
    black src/ tests/

    # Type checking
    mypy src/

The tool respects API rate limits:
- ArXiv: Max 3 requests/second (configurable)
- Semantic Scholar: 100 requests per 5 minutes (anonymous)
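These limits translate into a minimum average spacing between requests, which helps when estimating how long a batch job will take. The arithmetic below is only an illustration; `min_interval` is a hypothetical helper, not part of the package.

```python
def min_interval(max_calls: int, window_seconds: float) -> float:
    """Average spacing in seconds that keeps max_calls within window_seconds."""
    return window_seconds / max_calls


arxiv_gap = min_interval(3, 1.0)        # 3 requests per second
s2_gap = min_interval(100, 5 * 60.0)    # 100 requests per 5 minutes

print(f"ArXiv: one request every {arxiv_gap:.3f} s")
print(f"Semantic Scholar: one request every {s2_gap:.1f} s")

# Lower bound on the time for 500 sequential anonymous Semantic Scholar requests
batch_minutes = 500 * s2_gap / 60
print(f"500 requests: at least {batch_minutes:.0f} minutes")
```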
`RateLimiter` and `AdaptiveRateLimiter` can also be used directly:

    import asyncio

    from research_assistant import RateLimiter, AdaptiveRateLimiter

    # Fixed rate limiting
    limiter = RateLimiter(max_calls=10, time_window=60)  # 10 calls per minute

    # Adaptive rate limiting (adjusts based on server responses)
    adaptive = AdaptiveRateLimiter(
        initial_rate=10.0,
        min_rate=1.0,
        max_rate=50.0,
        backoff_factor=0.5,
    )


    async def main():
        # Use as an async context manager
        async with limiter:
            # Your API call here
            pass


    asyncio.run(main())

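The adaptive parameters above suggest a multiplicative scheme: cut the rate by `backoff_factor` when the server pushes back, then recover gradually on success. The sketch below illustrates that idea under stated assumptions. It is not the `AdaptiveRateLimiter` implementation, and `AdaptiveRate`, `on_throttled`, `on_success`, and `recovery_step` are hypothetical names.

```python
class AdaptiveRate:
    """Hypothetical model of an adaptive request rate (requests per second)."""

    def __init__(self, rate=10.0, min_rate=1.0, max_rate=50.0,
                 backoff_factor=0.5, recovery_step=1.0):
        self.rate = rate
        self.min_rate = min_rate
        self.max_rate = max_rate
        self.backoff_factor = backoff_factor
        self.recovery_step = recovery_step  # assumed additive increase per success

    def on_throttled(self):
        # For example after an HTTP 429: shrink multiplicatively, never below min_rate.
        self.rate = max(self.min_rate, self.rate * self.backoff_factor)

    def on_success(self):
        # Recover gradually, never above max_rate.
        self.rate = min(self.max_rate, self.rate + self.recovery_step)


model = AdaptiveRate()
model.on_throttled()   # 10.0 -> 5.0
model.on_throttled()   # 5.0 -> 2.5
model.on_success()     # 2.5 -> 3.5
print(model.rate)
```

Halving on failure and adding a small step on success makes the rate drop quickly under pressure and climb back slowly, which is one common way to avoid repeatedly hitting a server's limit.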
Analyze a paper with the LLM-based analyzer:

    import asyncio

    from research_assistant import PaperAnalyzer, AnalysisType


    async def main():
        analyzer = PaperAnalyzer(api_key="your-openai-key")

        # Analyze a paper
        analysis = await analyzer.analyze_paper(
            paper_text="Paper abstract or full text...",
            paper_id="arxiv.2301.00001",
            paper_title="Attention Is All You Need",
            analysis_type=AnalysisType.METHODOLOGY,
        )
        print(analysis.methodology)
        print(analysis.key_contributions)


    asyncio.run(main())

Store and search embeddings with the FAISS vector store:

    from research_assistant import FAISSVectorStore, Document

    # Initialize vector store
    store = FAISSVectorStore(dimension=1536, index_type="flat")

    # Add documents
    doc = Document(
        id="paper_001",
        text="Paper content...",
        metadata={"title": "Paper Title", "authors": ["Author 1"]},
        embedding=[0.1, 0.2, ...],  # 1536-dimensional vector
    )
    store.add_documents([doc])

    # Search similar documents
    # (query_embedding is a 1536-dimensional vector for the query text)
    results = store.search(query_embedding, k=10)

    # Save and load
    store.save("./my_index")
    loaded_store = FAISSVectorStore.load("./my_index")

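A "flat" index performs exact search by comparing the query against every stored vector. The pure-Python sketch below shows that idea with squared L2 distance on tiny 3-dimensional vectors. It illustrates the technique only and is not the `FAISSVectorStore` code; `squared_l2` and `flat_search` are hypothetical names, and the choice of L2 distance (rather than inner product) is an assumption.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_search(vectors, query, k):
    """Exact k-nearest-neighbour search: score every vector, keep the k closest."""
    scored = [(squared_l2(vec, query), doc_id) for doc_id, vec in vectors.items()]
    scored.sort()  # smallest distance first
    return [(doc_id, dist) for dist, doc_id in scored[:k]]


vectors = {
    "paper_001": [1.0, 0.0, 0.0],
    "paper_002": [0.0, 1.0, 0.0],
    "paper_003": [1.0, 1.0, 0.0],
}
hits = flat_search(vectors, query=[1.0, 0.0, 0.0], k=2)
print(hits)
```

Exhaustive search costs time proportional to the number of stored vectors, which is acceptable for small collections and is the reason approximate index types exist for larger ones.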
Run the test suite:

    # Run all tests
    pytest

    # Run with coverage
    pytest --cov=research_assistant tests/

    # Run specific test file
    pytest tests/unit/utils/test_rate_limiter.py

This is an actively developed research tool. Current focus areas:
- ✅ Core API collectors (ArXiv, Semantic Scholar)
- ✅ Rate limiting and async architecture
- ✅ FAISS vector store integration
- ✅ CLI interface
- 🚧 Full paper content extraction
- 🚧 Advanced LLM analysis pipelines
- 📋 Web UI dashboard
- 📋 Citation graph analysis
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
MIT License - see LICENSE file for details.
Built with: