DarkDk123/AI-Web-Scraper

Folders and files

.example_env
.gitignore
Readme.md
example.gif
geckodriver
parse_LLM.py
requirements.txt
scrape.py
streamlit_main.py


AI Web Scraper 🤖

An AI web scraper built with LangChain, HuggingFace, Selenium, and related libraries.

Usage

1. Install the required packages: pip install -r requirements.txt
2. Set the environment variables described under "Environment Variables".
3. Run the Streamlit app: streamlit run streamlit_main.py
4. Enter a URL and a description of what you want to parse from the website.
5. The app scrapes the website, extracts the relevant text, and uses the HuggingFace model to parse that text.
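
The "extract the relevant text" step can be pictured with a small sketch. This is not the project's code: the project lists bs4 for HTML parsing, while the example below swaps in Python's standard-library `html.parser` so it runs without extra dependencies. The names `VisibleTextExtractor` and `extract_visible_text` are hypothetical.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Put each text node on its own line, then drop empty lines.
        lines = (line.strip() for line in "\n".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)


def extract_visible_text(html):
    """Return the visible text of an HTML string (hypothetical helper)."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()
```

Calling `extract_visible_text(page_source)` on the HTML returned by the browser would yield plain text that can then be passed to the model along with the user's description.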

Example: Scraping GitHub profiles

URL: https://github.com/techwithtim
Query: Provide info about the Github profile

Demo: see example.gif.

Environment Variables

The AI Web Scraper uses the following environment variables:

HUGGINGFACE_MODEL_ID: The ID of the HuggingFace model used to parse the text.
HUGGINGFACEHUB_API_TOKEN: Your HuggingFace Hub API token.
SBR_WEBDRIVER (optional, for CAPTCHA support): The URL of the Bright Data WebDriver used to solve CAPTCHAs.
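
As an illustration of how these variables might be read at startup, here is a minimal sketch using only the standard library. It assumes the token variable is spelled `HUGGINGFACEHUB_API_TOKEN`; the `ScraperConfig` class and `load_config` function are hypothetical and are not taken from the repository.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Variables the app needs; SBR_WEBDRIVER is optional.
REQUIRED_VARS = ("HUGGINGFACE_MODEL_ID", "HUGGINGFACEHUB_API_TOKEN")


@dataclass(frozen=True)
class ScraperConfig:
    """Hypothetical container for the scraper's settings."""
    model_id: str
    api_token: str
    sbr_webdriver: Optional[str] = None


def load_config(env: Mapping[str, str] = os.environ) -> ScraperConfig:
    """Read settings from a mapping, failing early if a required one is unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return ScraperConfig(
        model_id=env["HUGGINGFACE_MODEL_ID"],
        api_token=env["HUGGINGFACEHUB_API_TOKEN"],
        # An empty or unset value means captcha support is disabled.
        sbr_webdriver=env.get("SBR_WEBDRIVER") or None,
    )
```

Failing early with a clear message is a design choice here: a missing token would otherwise surface only later, when the model is first called.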

Development

The AI Web Scraper is built using the following technologies:

streamlit: The web app framework.
langchain_huggingface: The library for using HuggingFace models in LangChain.
langchain: The main LangChain library.
selenium: The library for driving the browser.
bs4: The library for parsing HTML.

About

AI Web Scraper to scrape simple webpages using an LLM.

Languages

Python 100.0%
