I need to make some GET/POST requests to a website that I have credentials to log in to. I plan to do this with Ruby and Net::HTTP. As I'm new to this kind of thing, I'm struggling with the fact that the log-in page requires robot verification (check-box kind), which means I'm not able to automate the log-in phase. Besides that, the session only stays alive for some time; once the server detects that no activity has been made, it requests the log-in page again. The website is built with PHP and JS (most of it is JS), and it requires the user to enter a "restrict-area" browser mode after the log-in phase.
It would be no problem to manually log in and then execute an operation (a few requests) every time I need it. But I don't know how I could pass credential information from the browser, such as the session id, to my script. I need some conceptual ideas about this.
Additional information:
- There is no public API.
- The "restrict-area" browser's mode is a browser without some buttons (forward and backward in history pages) and it don't permit to change the URL - that is all I know.
- I need this for automating some manually tasks that take hours to do.
- The website uses Ajax.
If additional information is needed I can add it, just ask in the comments.
Thanks in advance!
EDIT
My intention isn't to crawl random websites, but to make specific HTTP requests to a specific website where credentials are necessary to do so.
1 Answer
For JS-intensive websites, it might be much more convenient to use a "headless browser" approach, such as the capybara-webkit gem, which basically allows automation on top of WebKit, the popular browser engine used by Safari (and from which the engines in Chrome and Opera were derived). I'm not sure it's good enough to fool the robot verification (leaving the moral aspect aside), but at least it beats Net::HTTP in cases like getting Google search results.
Also, have a look at PhantomJS, which is a JS browser-automation tool (as capybara-webkit is a Ruby one); it gives the additional convenience of working with in-page elements in the same language that controls the browser.
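A minimal sketch of that approach with capybara-webkit; the login URL, field names, and button label below are assumptions, not the real site's:

```ruby
# Gemfile: gem 'capybara'; gem 'capybara-webkit'
require 'capybara'
require 'capybara/webkit'   # registers the :webkit driver

session = Capybara::Session.new(:webkit)

# Hypothetical login form -- adjust URL and locators to the real page.
session.visit('https://example.com/login')
session.fill_in('username', with: 'my_user')
session.fill_in('password', with: 'my_password')
session.click_button('Log in')

# The session keeps its cookies, so pages behind the login can be fetched directly.
session.visit('https://example.com/restricted/report')
puts session.html   # page HTML after the in-page JS has run
```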
Hey, thank you for the insight, +1. I didn't make clear in the question the kind of requests I will be doing, but it's basically requests to specific endpoints and dealing with JSON data. I will probably not need to scrape pages for data. Right now I'm testing Ruby Mechanize. My intention is to copy the cookies from an open session in Firefox and use Mechanize to simulate a parallel browser using the same session (hijack the session), so I won't be dealing with robot verification. – Pedro Gabriel Lima Mar 6, 2018 at 13:36
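For reference, a rough sketch of that cookie-reuse idea with Mechanize; the cookie name PHPSESSID, the domain, and the endpoint are assumptions for illustration:

```ruby
# Gemfile: gem 'mechanize'
require 'mechanize'
require 'json'

agent = Mechanize.new

# Session id copied by hand from Firefox (Storage -> Cookies) after a manual log-in.
# PHPSESSID and the domain are assumptions -- use whatever cookie the site actually sets.
cookie = HTTP::Cookie.new('PHPSESSID', 'paste-session-id-here',
                          domain: 'example.com', path: '/', for_domain: true)
agent.cookie_jar.add(cookie)

# Hypothetical Ajax endpoint returning JSON.
page = agent.get('https://example.com/ajax/orders')
data = JSON.parse(page.body)
puts data.inspect
```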
The site's Ajax calls may also authenticate with Authorization: Bearer tokens, which can be copied from the browser's developer tools and reused in much the same way as the session cookie. Unfortunately, without a public API, whatever you build is prone to break every time the target website does a new release.
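A minimal Net::HTTP sketch of reusing such a token; the endpoint and the token value are placeholders you would copy from the browser's developer tools:

```ruby
require 'net/http'
require 'uri'
require 'json'

# Hypothetical JSON endpoint; the bearer token comes from the Network tab in dev tools.
uri = URI('https://example.com/api/orders')

req = Net::HTTP::Get.new(uri)
req['Authorization'] = 'Bearer paste-token-here'
req['Accept'] = 'application/json'

res = Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == 'https') do |http|
  http.request(req)
end

data = JSON.parse(res.body) if res.is_a?(Net::HTTPSuccess)
puts data.inspect
```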