uri_pathname

  • 0.1.1.0
  • Rubygems

= uri_pathname

UriPathname eases conversion between URIs and unique, valid pathnames. This can be useful, for instance, in these cases:

  • Web Spidering: You want to save webpages to files (or all their contents to a directory, or only some scraped data) and don't know how to name them. UriPathname can easily assign unique, valid names to files or directories from already known URIs by combining scheme, hostname, path and query.

  • Web Stubbing / Testing: You need to retrieve previously saved webpages by their URIs. UriPathname derives the pathname from a given URI, so you can then read that file/page with, for instance, File.read.

== Installation

$ sudo gem install uri_pathname

The main source repository is http://github.com/syborg/uri_pathname.

== Examples

(see examples directory)

=== First of all

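Require at least the following, then create the UriPathname instance (up) used throughout the examples:

require 'rubygems'
require 'uri_pathname'

up = UriPathname.new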

=== Generate pathnames from URIs

These examples are useful when web spidering, where you need to generate a pathname from a given URI.

From URI with path and query

puts up.uri_to_pathname("http://www.fake.fak/path1/path2?query")

=> www.fake.fak__|path1_|_path2?query(http)

From URI without path and query

puts up.uri_to_pathname("http://www.fake.fak")

=> www.fake.fak__|NOPATH(http)

Fragments or other URI parts are silently ignored

puts up.uri_to_pathname('http://donaldfagen.com/disc_nightfly.php#rubybaby')

=> donaldfagen.com__|disc_nightfly.php(http)

If a directory is given as a second argument, the pathname is prefixed with that expanded directory

puts up.uri_to_pathname("http://www.fake.fak", "~/my_webdumps")

=> /home/marcel/my_webdumps/www.fake.fak__|NOPATH(http)

A third argument can also be given; it is appended as an extension

puts up.uri_to_pathname("http://www.fake.fak/path","", ".html")

=> www.fake.fak__|path(http).html
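To make the naming scheme in the outputs above easier to follow, here is a minimal sketch that reproduces it with Ruby's standard URI library. It is inferred only from the example outputs and is not the gem's implementation; the name sketch_uri_to_pathname is hypothetical, and cases the examples don't show (ports, user info, a trailing slash) are guesses.

```ruby
require 'uri'

# Hypothetical helper (NOT part of uri_pathname): mimics the naming scheme
# visible in the examples above, i.e. host__|seg1_|_seg2?query(scheme)ext
def sketch_uri_to_pathname(uri_string, dir = nil, ext = nil)
  uri = URI.parse(uri_string)

  # Drop the leading "/" and join path segments with "_|_";
  # an empty path becomes the placeholder NOPATH.
  path = uri.path.to_s.sub(%r{\A/}, '')
  path = path.empty? ? 'NOPATH' : path.gsub('/', '_|_')

  name = "#{uri.host}__|#{path}"
  name += "?#{uri.query}" if uri.query   # the fragment is simply not used
  name += "(#{uri.scheme})"
  name += ext.to_s

  # Prepend the expanded directory only when one is given.
  dir.nil? || dir.empty? ? name : File.join(File.expand_path(dir), name)
end

puts sketch_uri_to_pathname("http://www.fake.fak/path1/path2?query")
puts sketch_uri_to_pathname("http://www.fake.fak")
puts sketch_uri_to_pathname("http://www.fake.fak/path", "", ".html")
```

For the inputs used in this section, the sketch prints the same names as the examples above.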

=== Recovering URIs from (correct) pathnames

When stubbing with tools like fakeweb, you can use reverse conversion to register fake accesses to real URIs.

Pathnames can be parsed if they were correctly generated, as in the examples above (see docs)

p up.parse('/home/marcel/my_webdumps/www.fake.fak__|path1_|_path2?query(http).html.gz')

URIs can also be retrieved from correct (Uri)pathnames

puts up.pathname_to_uri('/home/marcel/my_webdumps/www.fake.fak__|NOPATH(http).html.gz')

=> http://www.fake.fak

puts up.pathname_to_uri('www.fake.fak__|path?query(http).html.gz')

=> http://www.fake.fak/path?query
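The reverse direction can be sketched with a single regular expression. This is an illustration inferred from the example pathnames, not the gem's parser; NAME_PATTERN and sketch_pathname_to_uri are hypothetical names, and the sketch assumes lowercase schemes and dot-separated extensions.

```ruby
# Hypothetical pattern (NOT the gem's): host__|path[?query](scheme)[.ext...]
NAME_PATTERN = /\A(?<host>[^|]+?)__\|(?<path>[^?]*?)(?:\?(?<query>.*))?\((?<scheme>[a-z][a-z0-9+.-]*)\)(?<ext>(?:\.\w+)*)\z/

# Hypothetical helper: rebuilds a URI string from a pathname that follows
# the naming scheme shown above. Returns nil if the name does not match.
def sketch_pathname_to_uri(pathname)
  # Only the file name matters; any leading directory is discarded.
  m = NAME_PATTERN.match(File.basename(pathname))
  return nil unless m

  # NOPATH stands for an empty path; otherwise "_|_" maps back to "/".
  path  = m[:path] == 'NOPATH' ? '' : '/' + m[:path].gsub('_|_', '/')
  query = m[:query] ? "?#{m[:query]}" : ''
  "#{m[:scheme]}://#{m[:host]}#{path}#{query}"
end

puts sketch_pathname_to_uri('/home/marcel/my_webdumps/www.fake.fak__|NOPATH(http).html.gz')
puts sketch_pathname_to_uri('www.fake.fak__|path?query(http).html.gz')
```

For these two pathnames the sketch prints the same URIs as the examples above. Extensions such as .html.gz are matched and then dropped.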

=== Web Spidering / Dumping / Stubbing

This example shows how to use UriPathname to name files and then register those files to stub real accesses later.

require 'rubygems'
require 'fileutils'
require 'open-uri'
require 'uri_pathname'
require 'fakeweb' # gem install fakeweb

# put here whatever temporary directory you want to use
MY_DIR = File.expand_path '~/my_dumps'
# put here whatever URIs you want to access
MY_URIS = [
  'http://en.wikipedia.org/wiki/Ruby_Bridges',
  'http://donaldfagen.com/disc_nightfly.php',
  'http://www.rubi.cat/ajrubi/portada/index.php',
  'http://www.google.com/cse?q=array&cx=013598269713424429640%3Ag5orptiw95w&ie=UTF-8&sa=Search'
]

# some convenient defs
def prepare_example
  # create the dump directory (and any missing parents) if it doesn't exist yet
  FileUtils.mkdir_p(MY_DIR) unless File.directory?(MY_DIR)
end

# preparation (comment this if you've already got your test dir)
prepare_example

up = UriPathname.new

# 1st round: Capture MY_URIS, and save them with the appropriate UriPathname
puts "1- Capturing URIs"
data = nil
sizes = []
MY_URIS.each do |uri|
  open uri do |u|
    data=u.read
    pathname = up.uri_to_pathname(uri,MY_DIR,".html")
    File.open(pathname,'w') do |f|
      f.write data
      sizes << data.size
      puts "SAVED #{uri} : #{data.size} bytes"
    end
  end
end

# 2nd round: checking saved files and preparing fake web accesses
puts "\n2- CHECKING CAPTURED FILES AND PREPARING FAKE WEB ACCESSES"
FakeWeb.allow_net_connect=false
Dir[File.join(MY_DIR,"*")].each do |name|
  uri = up.pathname_to_uri name
  FakeWeb.register_uri :any, uri, :body=>name, :content_type=>"text/html"
  puts "#{name}\n\tcorresponds to #{uri}"
end

# 3rd round: Access Web without actually accessing web
puts "\n3- FAKE WEB ACCESSES"
MY_URIS.each_with_index do |uri,i|
  open uri do |u|
    data=u.read
    puts "FAKE #{uri} ACCESS #{(data.size == sizes[i]) ? 'OK' : 'KO'}: #{data.size} bytes"
  end
end

== Release Notes

  • At present, UriPathname uses only some parts of a URI (scheme, hostname, path and query) to generate a valid and unique pathname that can be converted back to the URI. Port, user and other URI features are not yet used. I haven't needed to include them so far ;-).

  • Only Linux pathnames have been taken into account. I don't know whether UriPathname generates correct Windows or OS X pathnames, for instance. Test it and feel free to collaborate.

  • This is a very early release. I haven't had the time to study and prepare tests; nonetheless, some examples will become tests in the future.
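
On the Linux-only note above: the names shown in the examples contain | and ?, which Windows does not allow in file names, so they would likely need extra escaping there. A minimal check, independent of the gem (WINDOWS_RESERVED and windows_unsafe_chars are hypothetical names):

```ruby
# Characters that Windows forbids in file names.
WINDOWS_RESERVED = /[<>:"\/\\|?*]/

# Hypothetical helper: lists the distinct Windows-reserved characters
# found in a file name, in order of first appearance.
def windows_unsafe_chars(filename)
  filename.scan(WINDOWS_RESERVED).uniq
end

p windows_unsafe_chars('www.fake.fak__|path1_|_path2?query(http).html')
# => ["|", "?"]
```

An empty result only means that no reserved character was found; it does not cover other Windows rules such as reserved device names.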

== Note on Patches/Pull Requests

  • Fork the project.
  • Make your feature addition or bug fix.
  • Add tests for it. This is important so I don't break it in a future version unintentionally.
  • Commit, do not mess with rakefile, version, or history. (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
  • Send me a pull request. Bonus points for topic branches.

== Copyright

Copyright (c) 2011 Marcel Massana. See LICENSE for details.

Package last updated on 31 Aug 2011
