Not actually scraping/no response · ScrapeGraphAI/Scrapegraph-ai · Discussion #507

Chris-421
Aug 6, 2024

Hi, I am starting out with this project and want to scrape some data from the following website: jumbo.com. However, I am not getting the expected response. The code is essentially the tutorial's, with headless: False added and both the link and the prompt changed.

from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        "model": "ollama/llama3.1",
        "temperature": 0,
        "format": "json",
        "base_url": "http://localhost:11434",
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "temperature": 0,
        "base_url": "http://localhost:11434",
    },
    "verbose": True,
    "headless": False,
    "loader_kwargs": {
        "proxy": {
            "server": "broker",
            "criteria": {
                "anonymous": True,
                "secure": True,
                "countryset": {"IT"},
                "timeout": 10.0,
                "max_shape": 3,
            },
        },
    },
}

# Create the SmartScraperGraph instance
smart_scraper_graph = SmartScraperGraph(
    prompt="list me all categories of the products and corresponding links to these categories",
    source="https://www.jumbo.com/producten/",
    config=graph_config,
)

# Run the scraper graph
result = smart_scraper_graph.run()
print("Scraper Result:", result)

graph_exec_info = smart_scraper_graph.get_execution_info()
print(graph_exec_info)

This, however, does not generate the expected response. Instead of a list of product categories and web links, I get:
--- Executing Fetch Node ---
--- (Fetching HTML from: https://www.jumbo.com/producten/) ---
--- Executing Parse Node ---
--- Executing GenerateAnswer Node ---
Processing chunks: 0%| | 0/1 [00:28<?, ?it/s]
Scraper Result: {'type': 'accordion', 'title': 'Openingstijden', 'content': 'https://www.jumbo.com/winkels'}
exec_info: [{'node_name': 'Fetch', 'total_tokens': 0, 'prompt_tokens': 0, 'completion_tokens': 0, 'successful_requests': 0, 'total_cost_USD': 0.0, 'exec_time': 83.95596218109131}, {'node_name': 'Parse', 'total_tokens': 0, 'prompt_tokens': 0, 'completion_tokens': 0, 'successful_requests': 0, 'total_cost_USD': 0.0, 'exec_time': 0.00398707389831543}, {'node_name': 'GenerateAnswer', 'total_tokens': 0, 'prompt_tokens': 0, 'completion_tokens': 0, 'successful_requests': 0, 'total_cost_USD': 0.0, 'exec_time': 28.795868158340454}, {'node_name': 'TOTAL RESULT', 'total_tokens': 0, 'prompt_tokens': 0, 'completion_tokens': 0, 'successful_requests': 0, 'total_cost_USD': 0.0, 'exec_time': 112.75581741333008}]

I also tested the tutorial itself (i.e. the original prompt and link), which returns only a single video:
Scraper Result: {'type': 'video', 'title': 'Tech Support: Pyrotechnician Answers Fireworks Questions From Twitter', 'description': 'WIRED is where tomorrow is realized. It is the essential source of information and ideas that make sense of a world in constant transformation.', 'url': 'https://www.wired.com/video/watch/tech-support-pyrotechnician-answers-fireworks-questions-from-twitter'}
The exec info is similar, again with 0 tokens.
What am I doing wrong? Is the code incorrect, is my LLM setup not working, or is it something else?
PS: From a similar discussion I found out that it might be due to blockers, so I tried other sites, including Wikipedia. However, the results still did not match the prompt or the tutorial's output. Additionally, these blockers should theoretically be circumvented by the proxy and headless: False, right?
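
One way to narrow down the "is my LLM setup not working" question is to check that the Ollama server at the configured base_url is reachable and has both models pulled. The following is a minimal sketch, not part of the tutorial: it assumes Ollama's /api/tags endpoint (which lists locally available models) and the model names from the config above with the "ollama/" prefix dropped. The helper names missing_models and fetch_tags are hypothetical.

```python
import json
import urllib.request

# Assumption: same base_url as in graph_config above.
OLLAMA_URL = "http://localhost:11434"


def missing_models(tags_payload, required):
    """Return the required model names that are absent from an /api/tags payload."""
    installed = set()
    for entry in tags_payload.get("models", []):
        name = entry.get("name", "")
        installed.add(name)
        # Also accept the name without its tag, e.g. "llama3.1:latest" -> "llama3.1".
        installed.add(name.split(":", 1)[0])
    return [model for model in required if model not in installed]


def fetch_tags(base_url=OLLAMA_URL, timeout=5.0):
    """Fetch the list of locally available models from the Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    required = ["llama3.1", "nomic-embed-text"]
    try:
        payload = fetch_tags()
    except (OSError, ValueError) as exc:
        # Connection errors and malformed responses both end up here.
        print("Could not query Ollama:", exc)
    else:
        print("Missing models:", missing_models(payload, required) or "none")
```

If a model is reported missing, pulling it with Ollama before rerunning the scraper would be the obvious next step. If both are present, the problem is more likely in the fetched page content or in the library version.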

Replies: 1 comment

VinciGit00
Sep 19, 2024
Maintainer

OK, please update to the new version.
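
The reply does not name a specific version. As a small sketch, the installed release can be checked from Python before and after upgrading (typically done with pip install --upgrade scrapegraphai). The helper name installed_version is hypothetical; it only wraps the standard library's importlib.metadata.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report which scrapegraphai release, if any, is installed in this environment.
print("scrapegraphai:", installed_version("scrapegraphai") or "not installed")
```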

0 replies
