tag href attribute
tag background attribute
tag src attribute
tag src attribute
tag src attribute
<img> tag src attribute
tag href attribute
tag content attribute containing URL
tag data attribute
style attribute of any tag (parsed as CSS content)
body of tag (parsed as CSS content)
body of <script> tag when type or language attribute identifies JavaScript (parsed as JavaScript content)
CSS content
JavaScript content
Can convert relative URLs to absolute URLs by providing resource URL
Can convert relative URLs to absolute URLs when base URL found in HTML content
== Examples
=== Find URLs in an HTML document
Provide the HTML content and the content type and obtain an array of unique URLs.
ContentUrls.urls(html, 'text/html').each do |url|
puts "Found URL: #{url}"
end
=== Rewrite URLs in an HTML document
Provide the HTML content, the content type, and a block to rewrite each URL's extension.
rewritten_html = ContentUrls.rewrite_each_url(html, 'text/html') {|url| url.sub(/.htm/, '.html'}
== Requirements
== Development
To test and develop this gem, additional requirements are:
- bundler
- jeweler
- rake
- rdoc
- rspec
- yard
== Goals for ContentUrls
- Include support for:
- Acrobat (.pdf)
- Flash (.swf)
- Microsoft Office (.doc, .xls, .ppt)
- text (regular expression for URLs)
- Capture links retrieved from a headless web browser which executes the code (JavaScript, etc.)
== Contributing to content_urls
- Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
- Check out the issue tracker to make sure someone already hasn't requested it and/or contributed it.
- Fork the project.
- Start a feature/bugfix branch.
- Commit and push until you are happy with your contribution.
- Make sure to add tests for it. This is important so I don't unintentionally break it in a future version.
- Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or is otherwise necessary, that is fine, but please isolate to its own commit so I can cherry-pick around it.
== Copyright
Copyright (c) 2013 Dennis Sutch. See LICENSE.txt for further details.