pulling data from USPSA and avoiding rate limits · CodeHowlerMonkey/hitfactor.info · Discussion #18

jrdoran
Feb 25, 2024

I do something very similar here: http://www.steelrankings.com/
I too ran into the HTTP 429 issue. Cloudflare is checking the IP, so I ended up using zenrows.com. I process 17,000 USPSA numbers for Steel Challenge classification rankings 4x a week. I use 25 concurrent threads and it averages about .07 per request. Happy to discuss further. I took a look at your data files and am happy to help if you want to rebuild them on demand.
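For anyone curious what the 25-thread fan-out looks like, here is a minimal sketch using only the standard library. It is not my production code: `process_all` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real request-and-parse step so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_all(member_numbers, fetch_one, max_workers=25):
    """Run fetch_one(member_number) on a thread pool; collect results and errors."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the member number it was submitted for.
        futures = {pool.submit(fetch_one, n): n for n in member_numbers}
        for future in as_completed(futures):
            number = futures[future]
            try:
                results[number] = future.result()
            except Exception as exc:
                # Record the failure and keep going so one bad number
                # does not abort the whole batch.
                errors[number] = exc
    return results, errors


def fake_fetch(member_number):
    """Hypothetical stand-in for the real HTTP request and parse step."""
    if member_number.endswith("0"):
        raise RuntimeError("simulated HTTP 429")
    return f"parsed:{member_number}"


if __name__ == "__main__":
    ok, failed = process_all(["FY1", "FY2", "FY10"], fake_fetch, max_workers=2)
    print(ok, failed)
```

Collecting errors separately makes it easy to feed the failed numbers back in for a second pass instead of rerunning the full list.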

Replies: 5 comments 2 replies

CodeHowlerMonkey
Feb 26, 2024
Maintainer

Hey this is awesome! I didn't even think of checking if there's a SaaS for scraping around rate limits.

I just use the mobile API, which doesn't have rate limiting (well, it didn't back in January).

Seeing how similar the USPSA and SCSA mobile apps are, maybe you can use their mobile API too and save money on the scraper (I don't know if it costs you anything). Let me know if you need help with that.

The link you posted didn't open for me for some reason.

0 replies

jrdoran
Feb 26, 2024
Author

Could you paste a sample mobile API link? I wish they had a Swagger doc, but from what I see they (USPSA / SCSA) don't really have genuine APIs (maybe you can prove me wrong).
Fat finger on my part, I typed the name of my own site wrong!
http://www.steelrankings.com/

1 reply

CodeHowlerMonkey Feb 26, 2024
Maintainer

I used a sniffer on the iOS app to see what it's hitting. Then I just reused the same request from the browser's Snippets tab using this:

https://github.com/CodeHowlerMonkey/hitfactorlol/blob/main/scripts/uspsaScript.js

All HHFs come from a single endpoint, so I just took that thing right off the sniffer app (it can save files).

For classifications and classifiers, the API URLs are these:

https://api.uspsa.org/api/app/classification/A100099
https://api.uspsa.org/api/app/classifiers/A100099
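As a rough sketch, the two URLs above can be built per member number and fetched like this. The URL pattern comes from the examples above; everything else is an assumption: `mobile_api_urls` and `fetch_json` are hypothetical helper names, the response is assumed to be JSON, and the headers (including whatever carries the API key) are left for the caller to supply from their own sniffer capture, since their names are not given here.

```python
import json
import urllib.request

API_BASE = "https://api.uspsa.org/api/app"


def mobile_api_urls(member_number):
    """Build the classification and classifiers URLs for one member number."""
    return {
        "classification": f"{API_BASE}/classification/{member_number}",
        "classifiers": f"{API_BASE}/classifiers/{member_number}",
    }


def fetch_json(url, headers=None, timeout=30):
    """GET a URL and decode the body as JSON (assumed format).

    headers: whatever the sniffer captured for the app's requests;
    the exact header names are not specified in this thread.
    """
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    for name, url in mobile_api_urls("A100099").items():
        print(name, url)
```

Keeping URL construction separate from the network call makes it easy to generate the full URL list up front and hand it to whatever fetcher you already have.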

jrdoran
Feb 26, 2024
Author

OK, this is getting good!
This is the page that is rate limited on SCSA (likely USPSA also):
https://scsa.org/classification/FY105260
https://uspsa.org/classification/fy105260
So, I'm scraping via a Python URL request and then parsing the DOM with Beautiful Soup. I run this on either macOS or AWS Linux, depending on whether I'm busy with my machine. It seems like you are running almost a Selenium-style approach in your browser. I'll need to play with the sniffer to track the endpoints. I'm using HTTP GET without an API key (I'm familiar with how to use them), but I don't see how I get an API key to consume those mobile API endpoints. Could you explain whether there is an API I call to create my token?

Here is my sample HTTP request code (I feed it 17k URLs, which I retrieve from AWS RDS):

import time

import requests
from bs4 import BeautifulSoup
from zenrows import ZenRowsClient

client_key = "xxx"
client = ZenRowsClient(client_key)

# Note: timeout is kept from the original signature but is not passed to the client here.
def make_request_with_retry(url, retries=9, backoff_factor=1.9, timeout=50):
    for attempt in range(1, retries + 1):
        print(f"URL Request Attempt {attempt}/{retries} for {url}")
        request_start_time = time.time()  # Start timing before the try so every branch can report it
        try:
            # params = {"premium_proxy": "true"}
            # response = client.get(url, params=params)
            response = client.get(url)
            if response.status_code == 200:
                soup = BeautifulSoup(response.text, 'html.parser')
                result = get_expiration_date(soup)  # parsing helper defined elsewhere (not shown)
                print(f"\t\n{url} expiration date {result}")
                return result  # Return the parsed value so callers can tell success from failure
            print(f"Request for {url} returned a non-success status code: {response.status_code}")
        except requests.exceptions.RequestException as e:
            print(f"Request for {url} failed: {e}")
        except Exception as e:
            print(f"An unexpected error occurred for {url}: {e}")
        finally:
            elapsed_time = time.time() - request_start_time  # Always defined, even if the request raised
            print(f"Elapsed Time for {url}: {elapsed_time:.2f} seconds")
        if attempt < retries:
            time.sleep(backoff_factor * (2 ** (attempt - 1)))  # Exponential backoff before the next attempt
    print(f"Maximum retry attempts reached for {url}. Request failed.")
    return None
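One thing worth checking with these defaults is how long the waits get: the delay is `backoff_factor * 2 ** (attempt - 1)`, so by the ninth attempt that is 1.9 × 256 ≈ 486 seconds. A small sketch of a capped schedule with optional jitter follows; `backoff_delays`, `max_delay`, and `jitter` are hypothetical names I am introducing for illustration, not part of the code above.

```python
import random


def backoff_delays(retries=9, backoff_factor=1.9, max_delay=60.0, jitter=0.0, rng=random):
    """Yield the wait in seconds before each retry.

    Base delay is backoff_factor * 2 ** (attempt - 1), capped at max_delay.
    If jitter > 0, up to jitter * delay extra seconds are added at random so
    that many threads do not retry at the same instant.
    """
    for attempt in range(1, retries + 1):
        delay = min(max_delay, backoff_factor * (2 ** (attempt - 1)))
        if jitter:
            delay += rng.uniform(0, jitter * delay)
        yield delay


if __name__ == "__main__":
    for attempt, delay in enumerate(backoff_delays(), start=1):
        print(f"attempt {attempt}: wait {delay:.1f}s")
```

With 25 threads hitting the same host, a cap plus some jitter tends to spread the retries out rather than having every worker wake up together.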

1 reply

@CodeHowlerMonkey

CodeHowlerMonkey Feb 26, 2024
Maintainer

I took the API key from the sniffer. For iOS I used this app: https://apps.apple.com/us/app/storm-sniffer-packet-capture/id1610958307

It comes with instructions on how to install a MITM certificate for sniffing HTTPS traffic.

jrdoran
May 25, 2024
Author

Hey, it looks like you are using ZenRows; how do you like it? Have you taken any different directions as a result of SaaS scraping?

0 replies

jrdoran
Jun 21, 2024
Author

Congrats on the launch of your site. Your UI work is outstanding. I have been trying various scraping methods with and without ZenRows.

0 replies
