= Ruby Semantic Crawler Library

This project encapsulates data gathering from different sources. It simplifies extending internal data with publicly available knowledge. Its major strength is the use of semantic technologies to bypass complex NLP (natural language processing).

== Supported Sources

  • {Geonames}[http://www.geonames.org/]
  • {CIA Factbook RDF Dump}[http://www4.wiwiss.fu-berlin.de/factbook/directory/countries]
  • {FAO - Food and Agriculture Organization of the United Nations}[http://www.fao.org]
  • {LinkedGeoData - LGD}[http://linkedgeodata.org]
  • {GDACS}[http://gdacs.org]
  • {Freebase}[http://freebase.com]
  • Microdata on Websites, e.g. annotated with {schema.org}[http://schema.org/] vocabulary

=== TODO

  • DBPedia
  • Different Government Sources

== Installation

$ gem install semantic-crawler

Or from source:

$ git clone git://github.com/obale/semantic_crawler.git
$ cd semantic_crawler
$ bundle install
$ rake build
$ gem install pkg/semantic-crawler-*.gem

You can also add this library as a dependency in your Gemfile:

gem "semantic-crawler"

Or from source:

gem "semantic-crawler", :git => "git://github.com/obale/semantic_crawler.git"                       # for the master branch or
gem "semantic-crawler", :git => "git://github.com/obale/semantic_crawler.git", :branch => "develop" # for the developer branch

== Examples

These examples are only a short outline of how to use the library. For more information, read the documentation or look into the source code. To use the library, include or execute the following line:

>> require "semantic_crawler"

=== Microdata from Website

Easily extract microdata, e.g. annotated with the schema.org vocabulary, from websites.

>> microdata = SemanticCrawler::Websites::MicroData.new("https://www.alex-oberhauser.com")
>> org = microdata.to_s['http://schema.org/Organization'][1]
>> puts org['name']
"Sigimera Ltd."
>> puts org['url']
"http://www.sigimera.com"
>> puts org['employee']['http://schema.org/Person'].first['name']
"Alex Oberhauser"
>> puts org['employee']['http://schema.org/Person'].first['jobTitle']
"Co-Founder & CEO"

=== GeoNames

The GeoNames module can return a Factbook::Country and a Fao::Country object based on input GPS coordinates (lat/long).

>> innsbruck = SemanticCrawler::GeoNames::Country.new(47.271338, 11.395333)
>> articles = innsbruck.get_wikipedia_articles
>> articles.each do |article|
>>      puts article.wikipedia_url
>> end
>> factbook_obj = innsbruck.get_factbook_country
>> fao_obj = innsbruck.get_fao_country
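The returned objects can be used like the Factbook::Country and Fao::Country objects shown in the sections below, for example (a sketch, assuming the lookup for the given coordinates succeeded):

>> puts factbook_obj.background
>> puts fao_obj.name_currency("en")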

=== Factbook

Fetch Factbook information about Austria:

>> austria = SemanticCrawler::Factbook::Country.new("austria")
>> puts austria.background
>> puts austria.climate

=== GDACS

Parse the crisis information feed from GDACS.org:

>> feed = SemanticCrawler::Gdacs::Feed.new
>> puts feed.title.to_s
>> puts feed.description.to_s
>> feed.items.each do |item|
>>      puts item.title.to_s
>>      puts item.eventtype.to_s
>>      item.resources.each do |resource|
>>          puts resource.url.to_s
>>      end
>> end

Get information from the GDACS.org emergency feed:

>> emergency_feed = SemanticCrawler::Gdacs::EmergencyFeed.new
>> puts emergency_feed.title.to_s
>> items = emergency_feed.items
>> items.each do |item|
>>      puts item.title.to_s
>>      puts item.link.to_s
>> end

=== FAO

Country information from {FAO}[http://www.fao.org]:

>> austria = SemanticCrawler::Fao::Country.new("Austria")
>> puts austria.name_currency("en")
>> puts austria.official_name("es")

=== LinkedGeoData

Geo information from {LinkedGeoData}[http://linkedgeodata.org]:

>> # All nodes around the center of Dresden, within a radius of 1000m
>> dresden = SemanticCrawler::LinkedGeoData::RelevantNodes.new(51.033333, 13.733333, 1000)
>> # Optional with type: SemanticCrawler::LinkedGeoData::RelevantNodes.new(51.033333, 13.733333, 1000, "TrafficSignals")
>> nodes = dresden.relevant_nodes
>> nodes.each do |item|
>>      puts item.latitude
>>      puts item.longitude
>>      puts item.type
>> end

=== Freebase

Freebase.com country information:

>> country = SemanticCrawler::Freebase::Country.new("Austria")
>> links = country.same_as
>> links.each do |link|
>>      puts link if link.start_with?("http")
>> end
>> puts country.website

== Tested with

  • Ruby 1.9.3-p125 and Rails 3.2
  • Ruby 2.0.0-p0 and Rails 3.2

== Additional Links

== License

(c) 2012 - 2013 by Alex Oberhauser for {Sigimera Ltd.}[http://www.sigimera.com], published under the MIT license.

== Warranty

This software is provided "as is" and without any express or implied warranties, including, without limitation, the implied warranties of merchantability and fitness for a particular purpose.
