pulling data from USPSA and avoiding rate limits #18

jrdoran started this conversation in Ideas

I do something very similar here: http://www.steelrankings.com/
I ran into the HTTP 429 issue too. Cloudflare is checking the IP, so I ended up using zenrows.com. I process 17,000 USPSA numbers for Steel Challenge classification rankings 4x a week. I use 25 concurrent threads and it averages about .07 per request. Happy to discuss this further. I took a look at your data files and am happy to help if you want to rebuild them on demand.
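
For anyone following along, the 25-thread fan-out described above could look roughly like the sketch below. This is not the code behind steelrankings.com; `fetch_all` and its `fetch` argument are hypothetical names, and `fetch` stands in for whatever function actually retrieves one URL.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(urls, fetch, max_workers=25):
    """Run fetch(url) for every URL on a fixed-size thread pool.

    `fetch` is a placeholder (assumption) for the real per-URL request
    function. Returns a dict mapping each URL to its result, or to the
    exception it raised, so one bad URL does not stop the whole batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front; the pool runs at most max_workers at once.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                results[url] = exc
    return results
```

Capping `max_workers` keeps the number of in-flight requests bounded, which matters when the upstream site or the proxy service limits concurrency.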


Replies: 5 comments 2 replies


Hey, this is awesome! I didn't even think of checking whether there's a SaaS for scraping around rate limits.

I just use the mobile API, which doesn't have rate limiting (well, it didn't back in January).

Seeing how similar the USPSA and SCSA mobile apps are, maybe you can use their mobile API too and save money on the scraper (I don't know if it costs you anything). Let me know if you need help with that.

The link you posted didn't open for me for some reason.


Could you paste a sample mobile API link? I wish they had a Swagger doc, but from what I see they (USPSA / SCSA) don't really have genuine APIs (maybe you can prove me wrong).
Fat finger on my part, I typed the name of my own site wrong!
http://www.steelrankings.com/


I used a sniffer on the iOS app to see what it's hitting. Then I just reused the same request from the browser's Snippets tab using this:

https://github.com/CodeHowlerMonkey/hitfactorlol/blob/main/scripts/uspsaScript.js

All HHFs come from a single endpoint, so I just took that thing right off the sniffer app (it can save files).

For classifications and classifiers, the API URLs are these:

https://api.uspsa.org/api/app/classification/A100099
https://api.uspsa.org/api/app/classifiers/A100099
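
A minimal sketch of calling those two endpoints from Python, under some loud assumptions: the helper names `endpoint_urls` and `fetch_json` are made up for illustration, the response is assumed to be JSON, and `captured_headers` must be filled with whatever headers the sniffer shows the app sending (the thread does not say which header carries the key, so none is guessed here).

```python
import json
import urllib.request

# Base path taken from the two URLs quoted in this thread.
API_BASE = "https://api.uspsa.org/api/app"


def endpoint_urls(member_number):
    """Build the classification and classifiers URLs for one member number."""
    return {
        "classification": f"{API_BASE}/classification/{member_number}",
        "classifiers": f"{API_BASE}/classifiers/{member_number}",
    }


def fetch_json(url, captured_headers, timeout=30):
    """GET one endpoint, replaying headers copied from the sniffed request.

    Assumption: the endpoint returns a JSON body. `captured_headers` is a
    plain dict supplied by the caller; nothing about its contents is known
    from this thread.
    """
    request = urllib.request.Request(url, headers=captured_headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would be something like `fetch_json(endpoint_urls("A100099")["classification"], captured_headers)`, with `captured_headers` copied from your own capture.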


OK, this is getting good!
This is the page that is rate limited on SCSA (likely USPSA also):
https://scsa.org/classification/FY105260
https://uspsa.org/classification/fy105260
So, I'm scraping via a Python URL request and then parsing the DOM with BeautifulSoup. I run this on either macOS or AWS Linux, depending on whether I'm busy with my machine. It seems like you are running something closer to a Selenium approach in your browser. I'll need to play with the sniffer to track the endpoints. I'm using HTTP GET without an API key (I'm familiar with how to use them), but I don't see how I get an API key to consume those mobile API endpoints. Could you explain whether there is an API I call to create my token?

Here is my sample HTTP request code (I feed it 17k URLs, which I retrieve from AWS RDS):

import time

import requests
from bs4 import BeautifulSoup
from zenrows import ZenRowsClient

client_key = "xxx"
client = ZenRowsClient(client_key)

def make_request_with_retry(url, retries=9, backoff_factor=1.9, timeout=50):
    # Note: timeout is accepted but not currently passed to the client.
    for attempt in range(1, retries + 1):
        print(f"URL Request Attempt {attempt}/{retries} for {url}")
        request_start_time = time.time()  # Start timing the request
        try:
            # params = {"premium_proxy": "true"}
            # response = client.get(url, params=params)
            response = client.get(url)
            elapsed_time = time.time() - request_start_time
            print(f"Elapsed Time for {url}: {elapsed_time:.2f} seconds")

            if response.status_code == 200:
                soup = BeautifulSoup(response.text, 'html.parser')
                result = get_expiration_date(soup)  # parsing helper, not shown here
                print("\t\n" + url + " expiration date ", result)
                return result
            print(f"Request for {url} returned a non-success status code: {response.status_code}")
        except requests.exceptions.RequestException as e:
            # Compute elapsed time here too; the request may have failed before it was set.
            elapsed_time = time.time() - request_start_time
            print(f"Request for {url} failed: {e}")
            print(f"Elapsed Time for {url}: {elapsed_time:.2f} seconds")
        except Exception as e:
            elapsed_time = time.time() - request_start_time
            print(f"An unexpected error occurred for {url}: {e}")
            print(f"Elapsed Time for {url}: {elapsed_time:.2f} seconds")
        if attempt < retries:
            # Exponential backoff before the next attempt, for any kind of failure
            time.sleep(backoff_factor * (2 ** (attempt - 1)))
    print(f"Maximum retry attempts reached for {url}. Request failed.")
    return None
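
To make the retry cost concrete, the sketch below just evaluates the backoff expression from the snippet above, `backoff_factor * (2 ** (attempt - 1))`, with its default arguments. `backoff_delays` is a hypothetical helper name used only for this illustration.

```python
def backoff_delays(retries=9, backoff_factor=1.9):
    """Delay in seconds computed for each attempt number, 1..retries."""
    return [backoff_factor * (2 ** (attempt - 1)) for attempt in range(1, retries + 1)]


delays = backoff_delays()
print([round(d, 1) for d in delays])
print(f"Upper bound on total sleep: {sum(delays):.1f} seconds")
```

With the defaults, the delays run from 1.9 seconds up to 486.4 seconds, and they sum to about 970.9 seconds (roughly 16 minutes) if every attempt fails and every delay is taken. With 17k URLs that upper bound is worth keeping in mind when choosing `retries`.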

I took the API key from the sniffer. For iOS I used this app: https://apps.apple.com/us/app/storm-sniffer-packet-capture/id1610958307

It comes with instructions on how to install a MITM certificate for sniffing HTTPS traffic.


Hey, it looks like you are using ZenRows. How do you like it? Have you taken any different directions as a result of SaaS scraping?


Congrats on the launch of your site. Your UI work is outstanding. I have been trying various scraping methods, with and without ZenRows.
