= Image Downloader
Quite often there is a need to collect pictures from one or another page on the Internet. This plugin solves this particular task.
== Installation
sudo gem install image_downloader
== Requirements
- ruby 1.8 or 1.9
- gem nokogiri
== Description
Image Downloader is a rather simple library which does the following:
- get web page (with Net::HTTP)
- parse html page (use regexp or nokogiri)
- download images (in one or multi-threads)
== Example usage
After installation, you can use the following code as an example:
require 'rubygems'
require 'image_downloader'
page_url = 'www.test.com'
target_path = 'img_dir/'
downloader = ImageDownloader::Process.new(page_url,target_path)
download all images on page in any place (by regexp, all that look like url with image)
downloader.parse(:any_looks_like_image => true)
or
download images from all elements where usually images placed (<img...>, <a...>, ...)
downloader.parse()
or
download image from exect places in page
downloader.parse(:collect => {:link_icon => true})
or
download images by regexp
downloader.parse(:regexp => /[^'"]+.jpg/i)
downloader.download()
For "parse" method available following options
find all url which contain image extansion
:any_looks_like_image => true
find images in specified location
:collect => {
:all => true, # all image places
:(img_src|a_href|style_url|link_icon) => true # specified location
}
find by regexp
:regexp => /'"[^'"]*['"]/i) # for ruby 1.8 (in 1.9 not allowed () for scan method)
:regexp => /[^'"]+.jpg/i # the same, but shorter
:regexp => /[^'"]+.css/ # other files can also be downloaded
ignore URLs with images according to given parameters
:ignore_without => {:(extension|image_extension) => true}
setting the favorite User-Agent (vary important for exclude 403, 404... responses from server)
:user_agent => "ruby" # Mozilla/5.0 by default
Detailed location description
- img_src - tag: img, attribute: src="url"
- a_href - tag: a, attribute: href="url"
- style_url - tag: any, attribute: style="(background|background-image): url('url')"
- link_icon - tag: link, attribute: rel="shortcut icon" href="url"
For "download" method you can use following directives
:parallel => true # for multi thread downloading (this is default if no options)
:consequentially => true, # for sequential downloading into a single stream
:user_agent => "ruby" # Mozilla/5.0 by default
== Executables
You can simply use the executed shell commands:
For any looks like image download
download_any_images url dir/
For download favicon only
download_icon url dir/
For download all, that is located in the places for pictures
download_images url dir/
For download by regexp
download_by_regexp url dir/ "[^'"]+\.js"
== Debugging
"-d", "--debug"
To monitor the process of downloading, use the -d flag in the parameters.
Perhaps there is an error URI::InvalidURIError in some cases.
download_images url dir/ -d
== Copyright
Copyright (c) 2011 Malykh Oleg. See LICENSE.txt for
further details.
== License
The MIT License
== Authors
Personal blog author: {Malykh Oleg}[http://malyh.com/] - blog in russian