image_downloader

0.2.4
Rubygems

Version published: 13 years ago

Maintainers: 1

Created: 13 years ago

= Image Downloader

It is often necessary to collect the images from a particular web page. This gem does exactly that.

== Installation

  sudo gem install image_downloader

== Requirements

* Ruby 1.8 or 1.9
* the nokogiri gem

== Description

Image Downloader is a fairly simple library that does the following:

1. fetches the web page (with Net::HTTP);
2. parses the HTML (with regular expressions or Nokogiri);
3. downloads the images (in a single thread or in multiple threads).
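The three steps can be sketched in plain Ruby as follows. This is not the gem's code: download_page_images and HTTP_FETCH are hypothetical names, the parsing step only looks at <img src="..."> with a regexp, and the sketch assumes a current Ruby rather than 1.8/1.9. The fetcher is passed in as a parameter so the steps can be tried without a network connection.

```ruby
require 'net/http'
require 'uri'
require 'fileutils'

# Default fetcher: a plain GET request with Net::HTTP, returning the body.
HTTP_FETCH = ->(url) { Net::HTTP.get(URI(url)) }

# Hypothetical pipeline, not part of image_downloader's API.
def download_page_images(page_url, target_dir, fetch: HTTP_FETCH)
  # 1. get the web page
  html = fetch.call(page_url)

  # 2. parse it: collect <img src="..."> values with a regexp
  #    and resolve relative links against the page URL
  srcs = html.scan(/<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/i).flatten
  urls = srcs.map { |src| URI.join(page_url, src).to_s }.uniq

  # 3. download each image (one after another) into target_dir
  FileUtils.mkdir_p(target_dir)
  urls.map do |url|
    path = File.join(target_dir, File.basename(URI(url).path))
    File.binwrite(path, fetch.call(url))
    path
  end
end
```

Called as download_page_images('http://example.com/', 'img_dir/'), it uses Net::HTTP for both the page and the images.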

== Example usage

After installation, you can use the following code as an example:

  require 'rubygems'
  require 'image_downloader'

  page_url = 'www.test.com'
  target_path = 'img_dir/'
  downloader = ImageDownloader::Process.new(page_url, target_path)

  # download all images found anywhere on the page
  # (by regexp: everything that looks like an image URL)
  downloader.parse(:any_looks_like_image => true)
  # or
  # download images from all elements where images are usually placed (<img...>, <a...>, ...)
  downloader.parse()
  # or
  # download images from specific places in the page
  downloader.parse(:collect => {:link_icon => true})
  # or
  # download images by regexp
  downloader.parse(:regexp => /[^'"]+\.jpg/i)

  downloader.download()

The "parse" method accepts the following options.

Find all URLs that contain an image extension:

  :any_looks_like_image => true
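A rough idea of what "looks like an image URL" can mean: scan the raw HTML for runs of characters other than quotes, whitespace, parentheses and angle brackets that end in an image extension. The pattern and the extension list below are assumptions for illustration, not the gem's own regexp. Because the group is non-capturing (?:...), String#scan returns whole matches.

```ruby
# Assumed extension list; the gem's own list may differ.
IMAGE_EXTENSIONS = %w[jpg jpeg png gif bmp ico svg].freeze

# Hypothetical pattern: a run of characters other than quotes, whitespace,
# parentheses and angle brackets, followed by a dot and an image extension.
LOOKS_LIKE_IMAGE = /[^'"\s()<>]+\.(?:#{IMAGE_EXTENSIONS.join('|')})\b/i

# Returns every distinct image-like string found anywhere in the HTML.
def any_looks_like_image(html)
  html.scan(LOOKS_LIKE_IMAGE).uniq
end
```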

Find images in the specified locations:

  :collect => {
    :all => true,                                    # all image locations
    :(img_src|a_href|style_url|link_icon) => true    # one specific location
  }

Find by regular expression:

  :regexp => /'"[^'"]*['"]/i)   # for Ruby 1.8 (in 1.9, () are not allowed for the scan method)
  :regexp => /[^'"]+\.jpg/i     # the same, but shorter
  :regexp => /[^'"]+\.css/      # other files can also be downloaded
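The remark about () concerns how String#scan treats groups. With no groups it returns each whole match; with a capturing group it returns, for each match, an array of the captured groups, so the result has a different shape. A non-capturing group (?:...) keeps the flat shape. The snippet below runs on a current Ruby and uses made-up HTML; it does not test 1.8.

```ruby
html = %q{<img src="a.jpg"> <img src='b.jpg'>}

# No groups: each element is the whole match.
whole = html.scan(/[^'"]+\.jpg/i)                 # => ["a.jpg", "b.jpg"]

# A capturing group: each element is an array of the captured groups.
grouped = html.scan(/['"]([^'"]+\.jpg)['"]/i)     # => [["a.jpg"], ["b.jpg"]]

# A non-capturing group: whole matches again (here including the quotes).
non_capturing = html.scan(/['"](?:[^'"]+\.jpg)['"]/i)

p whole
p grouped.flatten   # flatten turns the grouped result back into plain strings
p non_capturing
```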

Ignore URLs according to the given parameters (URLs without an extension, or without an image extension):

  :ignore_without => {:(extension|image_extension) => true}
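One possible reading of :ignore_without, sketched with the extension of the URL's path. The option's exact semantics are not spelled out here, so the method name, its keyword arguments and the extension list are assumptions: extension: true drops URLs whose path has no extension at all, and image_extension: true drops those whose extension is not an image one.

```ruby
require 'uri'

# Assumed list of image extensions, in the form File.extname returns (with the dot).
IMAGE_EXTNAMES = %w[.jpg .jpeg .png .gif .bmp .ico .svg].freeze

# Hypothetical filter mirroring :ignore_without => {:extension => true}
# or :ignore_without => {:image_extension => true}.
def ignore_without(urls, extension: false, image_extension: false)
  urls.reject do |url|
    # URI#path excludes the query string, so "b.JPG?x=1" yields ".jpg" here
    ext = File.extname(URI(url).path).downcase
    (extension && ext.empty?) ||
      (image_extension && !IMAGE_EXTNAMES.include?(ext))
  end
end
```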

Set a custom User-Agent (very important for avoiding 403, 404, ... responses from the server):

  :user_agent => "ruby"   # Mozilla/5.0 by default
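With Net::HTTP, a custom User-Agent is just a request header. A minimal sketch, not the gem's code: build_request and fetch_with_user_agent are hypothetical names, and the default value simply mirrors the "Mozilla/5.0" default mentioned above.

```ruby
require 'net/http'
require 'uri'

# Build a GET request that carries the given User-Agent header.
def build_request(url, user_agent: 'Mozilla/5.0')
  request = Net::HTTP::Get.new(URI(url))
  request['User-Agent'] = user_agent
  request
end

# Send it; Net::HTTP.start opens the connection and closes it after the block.
def fetch_with_user_agent(url, user_agent: 'Mozilla/5.0')
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(build_request(url, user_agent: user_agent))
  end
end
```

fetch_with_user_agent returns a Net::HTTPResponse, so a 403 or 404 can be checked with response.code before saving the body.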

Detailed description of the locations:

* img_src - tag: img, attribute: src="url"
* a_href - tag: a, attribute: href="url"
* style_url - tag: any, attribute: style="(background|background-image): url('url')"
* link_icon - tag: link, attributes: rel="shortcut icon" href="url"
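The four locations can be approximated with one regexp each. The gem may use Nokogiri for this instead; the patterns, the LOCATION_PATTERNS table and the collect method are assumptions for illustration. They only handle quoted attributes and, for link_icon, a rel attribute written before href.

```ruby
# Hypothetical regexp per location; each one captures the URL in group 1.
LOCATION_PATTERNS = {
  img_src:   /<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/i,
  a_href:    /<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["']/i,
  style_url: /\bstyle\s*=\s*["'][^"']*?background(?:-image)?\s*:\s*url\(\s*['"]?([^'")]+)['"]?\s*\)/i,
  link_icon: /<link\b[^>]*?\brel\s*=\s*["']shortcut icon["'][^>]*?\bhref\s*=\s*["']([^"']+)["']/i
}.freeze

# locations: e.g. [:img_src, :link_icon], or [:all] for every location.
def collect(html, locations)
  locations = LOCATION_PATTERNS.keys if locations.include?(:all)
  locations.flat_map { |loc| html.scan(LOCATION_PATTERNS.fetch(loc)).flatten }.uniq
end
```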

The "download" method accepts the following options:

  :parallel => true          # multi-threaded download (the default when no options are given)
  :consequentially => true   # sequential download in a single thread
  :user_agent => "ruby"      # Mozilla/5.0 by default
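The difference between the two modes can be sketched with Ruby threads. download_all is a hypothetical helper, not the gem's method, and its block stands in for whatever downloads a single URL. The parallel branch starts one thread per URL, which is simple but unbounded; a real downloader would usually cap the number of threads.

```ruby
# Hypothetical helper: the block downloads one URL and returns its result.
def download_all(urls, parallel: true, &fetch)
  if parallel
    # parallel mode: one thread per URL. Thread#value waits for the thread
    # and returns its block's result, so results keep the order of urls.
    threads = urls.map { |url| Thread.new(url) { |u| fetch.call(u) } }
    threads.map(&:value)
  else
    # sequential mode: one URL after another in the calling thread.
    urls.map { |url| fetch.call(url) }
  end
end
```

With net/http and uri required, download_all(urls) { |url| Net::HTTP.get(URI(url)) } fetches every URL in its own thread.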

== Executables

You can also use the following shell commands.

Download everything that looks like an image:

  download_any_images url dir/

Download the favicon only:

  download_icon url dir/

Download everything located in places where images usually appear:

  download_images url dir/

Download by regexp:

  download_by_regexp url dir/ "[^'\"]+\.js"

== Debugging

  -d, --debug

To monitor the download process, add the -d flag to the command's parameters. In some cases a URI::InvalidURIError may occur.

  download_images url dir/ -d
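URI::InvalidURIError is what Ruby's URI library raises for strings it cannot parse, for example a scraped src that contains a space. One way to keep a run going is to skip such URLs and report them only in debug mode; absolute_uri and its debug: flag are hypothetical, not options of the gem.

```ruby
require 'uri'

# Hypothetical helper: resolve a scraped href against the page URL,
# returning nil (and warning when debug is on) if it is not a valid URI.
def absolute_uri(page_url, href, debug: false)
  URI.join(page_url, href.strip)
rescue URI::InvalidURIError => e
  warn "skipping #{href.inspect}: #{e.message}" if debug
  nil
end
```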

== Copyright

== License

The MIT License

== Authors

{Malykh Oleg}[http://malyh.com/] (personal blog, in Russian)

Package last updated on 26 Jul 2011
