Bulbasaur was created to share components used by the Preadly crawler. It is a module for crawler operations built on top of Nokogiri, an HTML and XML parser, and it simplifies common HTML operations such as extracting images, replacing tags and normalizing URLs. It is one of pread.ly's contributions to the open source community.
Add this line to your application's Gemfile:
gem 'bulbasaur'
Or to get the latest updates:
gem 'bulbasaur', github: 'preadly/bulbasaur', branch: 'master'
And then execute:
$ bundle
Or install it manually with:
$ gem install bulbasaur
Bulbasaur has three main operations: Extract, Replace and Other.
Extract has four sub-operations:
- ExtractImagesFromHTML
- ExtractImagesFromYoutube
- ExtractImagesFromVimeo
- ExtractImagesFromAllResorces
html = "<img src='test.jpg' alt='test' /><img src='test-2.jpg' alt='test' />"
images = Bulbasaur::ExtractImagesFromHTML.new(html).call
p images # => [{url: 'test.jpg', alt: 'test'}, {url: 'test-2.jpg', alt: 'test'}]
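To make the returned structure concrete, the sketch below reproduces the kind of result ExtractImagesFromHTML gives in the example above: one hash per img tag, with its src as :url and its alt text as :alt. It is not Bulbasaur's implementation (Bulbasaur relies on Nokogiri). The helper name extract_images and the regex approach are assumptions for illustration, and it only handles simple tags with quoted attributes.

```ruby
# Hypothetical helper, NOT part of Bulbasaur: a regex-based illustration of the
# result shape returned by Bulbasaur::ExtractImagesFromHTML in the example above.
# It only understands simple <img> tags with quoted attributes; real-world HTML
# should go through a proper parser such as Nokogiri, which Bulbasaur uses.
IMG_TAG = /<img\b[^>]*>/i
QUOTED_ATTR = /([\w:-]+)\s*=\s*(['"])(.*?)\2/

def extract_images(html)
  html.scan(IMG_TAG).map do |tag|
    # Turn the tag's quoted attributes into a Hash, e.g. {"src" => "test.jpg", "alt" => "test"}
    attrs = tag.scan(QUOTED_ATTR).to_h { |name, _quote, value| [name.downcase, value] }
    # Keep only the two fields the README example shows
    { url: attrs["src"], alt: attrs["alt"] }
  end
end

html = "<img src='test.jpg' alt='test' /><img src='test-2.jpg' alt='test' />"
p extract_images(html)
```

Attribute names are matched whole (including hyphens), so an attribute like data-src is kept separate and does not overwrite src.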
Replace has two sub-operations:
- ReplacesByTagImage
- ReplacesByTagLink
html = "<img src='test.jpg' alt='test' />"
image_replaces = [{original_image_url: "test.jpg", url: "new-image.png"}]
new_html = Bulbasaur::ReplacesByTagImage.new(html, image_replaces).call
puts new_html # prints <img src='new-image.png' alt='test' />
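In the example above, image_replaces maps each original_image_url to the url that should replace it. The following stdlib-only sketch applies that same mapping to the src attributes of img tags. The helper replace_image_sources is hypothetical and is not how ReplacesByTagImage is implemented; it assumes simple, quoted src attributes.

```ruby
# Hypothetical helper, NOT part of Bulbasaur: rewrites the src attribute of
# <img> tags using a list of {original_image_url:, url:} hashes, the same
# shape passed to ReplacesByTagImage in the example above.
def replace_image_sources(html, image_replaces)
  # Lookup table: original image URL => replacement URL
  mapping = image_replaces.to_h { |r| [r[:original_image_url], r[:url]] }

  html.gsub(/<img\b[^>]*>/i) do |tag|
    # Rewrite only a standalone src attribute (not data-src), keeping its quote style
    tag.sub(/(?<![\w-])src\s*=\s*(['"])(.*?)\1/i) do
      quote = Regexp.last_match(1)
      src = Regexp.last_match(2)
      # URLs without a replacement are left unchanged
      "src=#{quote}#{mapping.fetch(src, src)}#{quote}"
    end
  end
end

html = "<img src='test.jpg' alt='test' />"
image_replaces = [{ original_image_url: "test.jpg", url: "new-image.png" }]
puts replace_image_sources(html, image_replaces)
# prints <img src='new-image.png' alt='test' />
```

Building a hash from the replacement list keeps each lookup constant-time, which matters when a page contains many images.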
Other has one sub-operation:
- NormalizeURL
base_url = 'http://github.com'
context_url = 'preadly'
url = Bulbasaur::NormalizeURL.new(base_url, context_url).call
puts url # prints http://github.com/preadly
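NormalizeURL turns a relative context_url into an absolute URL against base_url. For comparison, Ruby's standard library performs RFC 3986 reference resolution with URI.join. This sketch assumes NormalizeURL serves the same purpose; it does not describe how Bulbasaur implements it, and the gem's edge cases may differ. The helper name resolve_url is hypothetical.

```ruby
require "uri"

# Resolves context_url against base_url with Ruby's standard library
# (RFC 3986 reference resolution). Assumption: this is comparable to what
# Bulbasaur::NormalizeURL is used for; the gem's own edge cases may differ.
def resolve_url(base_url, context_url)
  URI.join(base_url, context_url).to_s
end

puts resolve_url("http://github.com", "preadly")            # http://github.com/preadly
puts resolve_url("http://github.com/preadly/", "bulbasaur") # http://github.com/preadly/bulbasaur
# Without a trailing slash, the last path segment of the base is replaced
puts resolve_url("http://github.com/preadly", "bulbasaur")  # http://github.com/bulbasaur
# Absolute URLs are returned unchanged
puts resolve_url("http://github.com", "https://example.com/a.png") # https://example.com/a.png
```

The trailing-slash case shows why a crawler needs a dedicated normalization step: plain string concatenation of base and context would produce broken URLs such as http://github.compreadly.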
For more details about each component, run the RSpec tests with the documentation formatter (--format d):
rspec --format d --color
To contribute:
- Fork it (https://github.com/preadly/bulbasaur);
- Create your feature branch (git checkout -b my-new-feature);
- Commit your changes (git commit -am 'Add some feature');
- Push to the branch (git push origin my-new-feature);
- Create a new Pull Request.