sunil-dhaka/python-webScrappers

points and resources

creating api key on data-gov-in

  • register and log in
  • go to the "my account" section (while logged in)
  • click on "create API key" to generate your key
  • this key can be used for any dataset
  • to see how to use it look at
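
As a rough illustration of how the key might be used, here is a minimal sketch that builds a request URL for one dataset. The base URL `https://api.data.gov.in/resource/`, the query parameter names (`api-key`, `format`, `limit`, `offset`) and the helper names `build_url` and `fetch_records` are assumptions for illustration; check the API page of the dataset you want for the exact form.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL for data.gov.in resources; confirm it on the dataset's API page.
BASE_URL = "https://api.data.gov.in/resource/"


def build_url(resource_id, api_key, limit=10, offset=0):
    """Build the request URL for one dataset (hypothetical helper)."""
    query = urllib.parse.urlencode({
        "api-key": api_key,   # the key created in the account section
        "format": "json",     # assumed parameter: ask for JSON output
        "limit": limit,       # assumed parameter: records per page
        "offset": offset,     # assumed parameter: records to skip
    })
    return f"{BASE_URL}{resource_id}?{query}"


def fetch_records(resource_id, api_key, limit=10):
    """Download one page of records; needs network access and a valid key."""
    url = build_url(resource_id, api_key, limit=limit)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholders only: substitute a real resource id and your own key.
    print(build_url("YOUR_RESOURCE_ID", "YOUR_API_KEY"))
```

Since the same key works for any dataset, only the resource id should need to change between calls.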

important open data websites


avoid headers problem

  • some websites, like amazon, use systems that can detect bot/automated requests (requests made programmatically rather than through a web browser)
  • to avoid being detected, we can pass the headers parameter in our GET request
  • to get your user-agent visit
  • or, to learn more about headers, go to the requests docs, and look at the network tab of the browser's inspection tools to see which headers a particular website receives
  • we can also use selenium/helium automation, but that is resource-heavy even when running headless
  • Notes:
    • we can also use the user agents of other web browsers (safari, firefox, etc.) and see what sort of info we get
    • you can also try rotating through different user agents when your usual user-agent runs into problems; this might help when requests are being blocked
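
The points above can be sketched roughly as follows, assuming the `requests` library is installed. The user-agent strings are illustrative examples rather than values from this repository, and `make_headers` and `fetch` are hypothetical helper names; a site may still reject the request.

```python
import random

# Illustrative user-agent strings (assumptions); replace them with your own browser's.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
]


def make_headers(user_agent=None):
    """Return request headers, picking a random user agent if none is given."""
    chosen = user_agent or random.choice(USER_AGENTS)
    return {"User-Agent": chosen, "Accept-Language": "en-US,en;q=0.9"}


def fetch(url, user_agent=None):
    """GET a page with browser-like headers; needs network access."""
    import requests  # third-party (pip install requests); imported here so make_headers works without it

    response = requests.get(url, headers=make_headers(user_agent), timeout=30)
    response.raise_for_status()  # raise an error on 4xx/5xx responses
    return response.text
```

Calling `fetch(url)` repeatedly picks a user agent at random each time, which is one simple way to rotate; passing `user_agent=...` pins a specific one.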

About

web crawler to collect data from internet ...
