DoverToCalais allows the user to send a wide range of data sources (files & URLs) to OpenCalais and receive asynchronous responses when OpenCalais has finished processing the inputs. In addition, DoverToCalais enables response filtering in order to find relevant tags and/or tag values.
In short, and quoting the OpenCalais creators:
"The OpenCalais Web Service automatically creates rich semantic metadata for the content you submit – in well under a second. Using natural language processing (NLP), machine learning and other methods, Calais analyzes your document and finds the entities within it. But, Calais goes well beyond classic entity identification and returns the facts and events hidden within your text as well."
In general, OpenCalais Simple XML Format (the one used by DoverToCalais) returns three kinds of tags: Entities, Events and Topics. Entities are static 'things', like Persons, Places, et al. that are involved in the textual context in some capacity. OpenCalais assigns a relevance score to each entity to indicate its relevance within the context of the data source's general topic. Events are facts or actions that pertain to one or more Entities. Topics are a characterisation or generic description of the data source's context.
We can use these tags and the information within them to extract relevant information from the data or to draw useful conclusions about it. For example, if the data source tags include an <Event> with the value of 'CompanyExpansion', I can then look for the <City> or <Company> tags to find out which company is expanding and if it's near my location (hint: they may be looking for more staff :)) Or, I could pick out all <Company>s involved in a <JointVenture>, or all <Person>s implicated in an <Arrest> in my <City>, etc.
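The reasoning above can be sketched in plain Ruby. This is only an illustration: `Tag`, `SAMPLE_TAGS` and `values_given_event` are hypothetical names, and the sample data is hand-written rather than taken from a real OpenCalais response.

```ruby
# Hypothetical stand-in for a parsed tag; not a DoverToCalais class.
Tag = Struct.new(:name, :value)

# Hand-written sample tags, standing in for a parsed OpenCalais response.
SAMPLE_TAGS = [
  Tag.new('Event', 'CompanyExpansion'),
  Tag.new('Company', 'Acme Ltd'),
  Tag.new('City', 'Leeds'),
  Tag.new('Person', 'Jane Doe')
].freeze

# Returns the values of all tags called `wanted`, but only when the tags
# also contain an Event with the value `event`; otherwise returns [].
def values_given_event(tags, wanted, event)
  has_event = tags.any? { |t| t.name == 'Event' && t.value == event }
  return [] unless has_event

  tags.select { |t| t.name == wanted }.map(&:value)
end

puts values_given_event(SAMPLE_TAGS, 'Company', 'CompanyExpansion').inspect
puts values_given_event(SAMPLE_TAGS, 'City', 'CompanyExpansion').inspect
puts values_given_event(SAMPLE_TAGS, 'Company', 'JointVenture').inspect
```

The last call returns an empty list because the sample contains no JointVenture event.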
There are many reasons to use DoverToCalais, mainly:
Multiple data source support: Thanks to the power of Yomu, DoverToCalais can process a vast range of files (and, of course, web pages), extract text from them and send them to OpenCalais for analysis and tag generation.
Asynchronous responses (callbacks): Users can set callbacks to receive the processed meta-data, once the OpenCalais Web Service response has been received. Furthermore, a user can set multiple callbacks for the same request (data source), thus enabling cleaner, more modular code.
Result filtering: DoverToCalais uses the OpenCalais Simple XML Format as the preferred response format. The user can work directly with the XML-formatted response, or -if feeling a bit lazy- can take advantage of the DoverToCalais filtering functionality and receive specific entities, optionally based on specified conditions.
For more details of the features and code samples, see Usage.
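To illustrate why several callbacks on one request can keep code modular, here is a minimal sketch in plain Ruby. `MiniDeferred` is a hypothetical, much-simplified stand-in for a deferrable object; it is not how DoverToCalais or EventMachine is implemented.

```ruby
# Hypothetical, simplified deferrable: it collects callbacks and hands the
# same result to each of them once that result is available.
class MiniDeferred
  def initialize
    @callbacks = []
  end

  # Register a block to run when the result arrives; returns self.
  def callback(&block)
    @callbacks << block
    self
  end

  # Deliver the result to every registered callback, in registration order.
  def succeed(result)
    @callbacks.each { |cb| cb.call(result) }
  end
end

# Two independent concerns (logging and storing) attached to one request.
def demo_multiple_callbacks
  seen = []
  request = MiniDeferred.new
  request.callback { |xml| seen << "logged #{xml.length} characters" }
  request.callback { |xml| seen << "stored: #{xml}" }
  request.succeed('<OpenCalaisSimple/>')
  seen
end

puts demo_multiple_callbacks
```

Each callback handles one concern and neither needs to know about the other.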
##Pre-requisites
To use the OpenCalais Web Service and -by extension- DoverToCalais, one needs to possess an OpenCalais API key, which is easily obtainable from the OpenCalais web site.
Also, DoverToCalais requires the presence of a working JRE.
Add this line to your application's Gemfile:
gem 'dover_to_calais'
And then execute:
$ bundle
Or install it yourself as:
$ gem install dover_to_calais
DoverToCalais has been developed in Ruby 1.9.3 and relies on the following gems to work (installation with the gem command will automatically install all dependencies)
As Yomu depends on a working JRE in order to function, so does DoverToCalais.
Using DoverToCalais is extremely simple.
As DoverToCalais uses the awesome-ness of EventMachine, code must be placed within an EM run block:
EM.run do
  # use Control + C to stop the EM
  Signal.trap('INT') { EventMachine.stop }
  Signal.trap('TERM') { EventMachine.stop }
  # we need an API key to use OpenCalais
  DoverToCalais::API_KEY = 'my-opencalais-api-key'
  # create a new dover
  dover = DoverToCalais::Dover.new('http://www.bbc.co.uk/news/world-africa-24412315')
  # parse the text and send it to OpenCalais
  dover.analyse_this
  puts 'do some stuff....'
  # set a callback for when we receive a response
  dover.to_calais { |response| puts response.error ? response.error : response }
  puts 'do some more stuff....'
end
This will produce the following result:
do some stuff....
do some more stuff....
<OpenCalaisSimple>
..........
(the rest of the XML response from OpenCalais)
As can be observed, the callback (#to_calais) is triggered after the rest of the code has been executed and only when the OpenCalais request has been completed.
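The ordering can be reproduced without any network access. The sketch below uses `TinyLoop`, a hypothetical toy event loop that is far simpler than EventMachine: it only queues blocks and runs them after the main block has finished.

```ruby
# Hypothetical toy event loop: deferred blocks are queued and only run
# once the main block passed to #run has returned.
class TinyLoop
  def initialize
    @pending = []
  end

  # Register work to run later, after the main block has finished.
  def defer(&block)
    @pending << block
  end

  # Run the main block first, then drain the queue in registration order.
  def run
    yield self
    @pending.shift.call until @pending.empty?
  end
end

def demo_order
  log = []
  TinyLoop.new.run do |reactor|
    log << 'do some stuff....'
    reactor.defer { log << 'callback fired' }
    log << 'do some more stuff....'
  end
  log
end

puts demo_order
```

The deferred block is registered between the two log lines but runs last, which mirrors the output order shown above.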
Of course, we can analyse more than one source at a time:
EM.run do
  # use Control + C to stop the EM
  Signal.trap('INT') { EventMachine.stop }
  Signal.trap('TERM') { EventMachine.stop }
  DoverToCalais::API_KEY = 'my-opencalais-api-key'
  d1 = DoverToCalais::Dover.new('http://www.bbc.co.uk/news/world-africa-24412315')
  d2 = DoverToCalais::Dover.new('/home/fred/Documents/RailsRecipes.pdf')
  d3 = DoverToCalais::Dover.new('//network-drive/annual_forecast.doc')
  d1.analyse_this; d2.analyse_this; d3.analyse_this;
  puts 'do some stuff....'
  d1.to_calais { |response| puts response.error ? response.error : response }
  d2.to_calais { |response| puts response.error ? response.error : response }
  d3.to_calais { |response| puts response.error ? response.error : response }
  puts 'do some more stuff....'
end
This will print the output of the two puts statements, followed by the output of the three callbacks (d1, d2, d3) in the order in which they are triggered, i.e. the callback whose OpenCalais response arrives first will fire first.
###Filtering the response
Why parse the response XML ourselves when DoverToCalais can do it for us? We'll just use the #filter method on the response object, passing a filtering hash:
my_filter = {:entity => 'Entity1', :value => 'Value1', :given => {:entity => 'Entity2', :value => 'Value2'}}
response.filter(my_filter)
The above tells DoverToCalais to look in the response for an entity called 'Entity1' with a value of 'Value1', only if the response contains an entity called 'Entity2' which has a value of 'Value2'.
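The semantics just described can be sketched in plain Ruby. This is an illustration of the idea only, not the gem's implementation: `FilterItem`, `matches_spec?` and `sketch_filter` are hypothetical names, and the sample items are hand-written.

```ruby
# Hypothetical stand-in for the items of a parsed response.
FilterItem = Struct.new(:name, :value)

# True when `item` matches the :entity (and, if present, the :value) of `spec`.
def matches_spec?(item, spec)
  item.name == spec[:entity] && (spec[:value].nil? || item.value == spec[:value])
end

# If a :given clause is present and no item satisfies it, nothing is
# returned; otherwise every item matching the main clause is returned.
def sketch_filter(items, filter)
  given = filter[:given]
  return [] if given && items.none? { |i| matches_spec?(i, given) }

  items.select { |i| matches_spec?(i, filter) }
end

FILTER_SAMPLE = [
  FilterItem.new('Event', 'Business Partnership'),
  FilterItem.new('Company', 'Google'),
  FilterItem.new('Company', 'Flutter')
].freeze

met = { :entity => 'Company', :given => { :entity => 'Event', :value => 'Business Partnership' } }
unmet = { :entity => 'Company', :given => { :entity => 'Event', :value => 'Arrest' } }

puts sketch_filter(FILTER_SAMPLE, met).map(&:value).inspect
puts sketch_filter(FILTER_SAMPLE, unmet).map(&:value).inspect
puts sketch_filter(FILTER_SAMPLE, { :entity => 'Company', :value => 'Google' }).map(&:value).inspect
```

The second call yields an empty list because its condition is not met, even though the sample does contain companies.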
The conditional clause (:given) is optional; the filtering hash can be used in pretty much any permutation. For instance:
EM.run do
  DoverToCalais::API_KEY = 'my-opencalais-api-key'
  dover = DoverToCalais::Dover.new('http://www.bbc.co.uk/news/world-africa-24412315')
  dover.analyse_this
  dover.to_calais do |response|
    if response.error
      puts response.error
    else
      puts response.filter({:entity => 'Company'})
    end
  end
end
This will pick out all entities tagged 'Company' from the data source. The output will be an Array of ResponseItem objects.
<struct DoverToCalais::ResponseItem name="Company", value="BBC News", relevance=0.654, count=13, normalized=nil, importance=nil, originalValue=nil>
<struct DoverToCalais::ResponseItem name="Company", value="TV Radio", relevance=0.565, count=2, normalized="HERALD & WEEKLY-TV,RADIO OPS", importance=nil, originalValue=nil>
<struct DoverToCalais::ResponseItem name="Company", value="Reuters", relevance=0.255, count=2, normalized="THOMSON REUTERS GROUP LIMITED", importance=nil, originalValue=nil>
<struct DoverToCalais::ResponseItem name="Company", value="Twitter", relevance=0.395, count=1, normalized="TWITTER, INC.", importance=nil, originalValue=nil>
<struct DoverToCalais::ResponseItem name="Company", value="Huffington Post UK", relevance=0.136, count=1, normalized=nil, importance=nil, originalValue=nil>
<struct DoverToCalais::ResponseItem name="Company", value="Ireland Kenya", relevance=0.144, count=1, normalized=nil, importance=nil, originalValue=nil>
<struct DoverToCalais::ResponseItem name="Company", value="Yahoo! UK", relevance=0.144, count=1, normalized="YAHOO! UK LIMITED", importance=nil, originalValue=nil>
If this output looks a bit cluttered, we can easily tidy it up:
EM.run do
  DoverToCalais::API_KEY = 'my-opencalais-api-key'
  dover = DoverToCalais::Dover.new('http://www.bbc.co.uk/news/world-africa-24412315')
  dover.analyse_this
  dover.to_calais do |response|
    if response.error
      puts response.error
    else
      items = response.filter({:entity => 'Company'})
      items.each do |item|
        puts "#{item.name}: #{item.value}, relevance = #{item.relevance}"
      end
    end
  end
end
Which will give us:
Company: BBC News, relevance = 0.656
Company: TV Radio, relevance = 0.566
Company: Reuters, relevance = 0.26
Company: Guardian.co.uk, relevance = 0.143
Company: Twitter, relevance = 0.399
Company: Huffington Post UK, relevance = 0.132
Company: Ireland Kenya, relevance = 0.139
Company: Yahoo! UK, relevance = 0.139
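Because the filtered result is an ordinary Array, the usual Enumerable methods apply to it. The sketch below ranks items by relevance and drops the weaker ones. `SketchItem` is a local stand-in that copies the field names from the struct output shown earlier; `most_relevant` and the 0.2 threshold are hypothetical choices, and the sample values are copied from the outputs above.

```ruby
# Local stand-in with the same field names as the ResponseItem structs
# printed earlier; it is not the gem's own class.
SketchItem = Struct.new(:name, :value, :relevance, :count,
                        :normalized, :importance, :originalValue)

SKETCH_ITEMS = [
  SketchItem.new('Company', 'BBC News', 0.656, 13),
  SketchItem.new('Company', 'Reuters', 0.26, 2),
  SketchItem.new('Company', 'Twitter', 0.399, 1),
  SketchItem.new('Company', 'Yahoo! UK', 0.139, 1)
].freeze

# Keep items at or above the threshold, most relevant first.
def most_relevant(items, min_relevance)
  items.select { |i| i.relevance >= min_relevance }
       .sort_by { |i| -i.relevance }
end

most_relevant(SKETCH_ITEMS, 0.2).each do |item|
  puts "#{item.name}: #{item.value}, relevance = #{item.relevance}"
end
```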
Let's see if the data source refers to any business partnerships:
EM.run do
  DoverToCalais::API_KEY = 'my-opencalais-api-key'
  dover = DoverToCalais::Dover.new('http://www.bbc.co.uk/news/technology-24380202')
  dover.analyse_this
  dover.to_calais do |response|
    if response.error
      puts response.error
    else
      items = response.filter({:entity => 'Event', :value => 'Business Partnership'})
      puts "There are #{items.length} events like that in the source"
    end
  end
end
which will produce:
There are 1 events like that in the source
Now let's find all companies involved in any business partnerships:
EM.run do
  DoverToCalais::API_KEY = 'my-opencalais-api-key'
  dover = DoverToCalais::Dover.new('http://www.bbc.co.uk/news/technology-24380202')
  dover.analyse_this
  dover.to_calais do |response|
    if response.error
      puts response.error
    else
      items = response.filter( {:entity => 'Company', :given => {:entity => 'Event', :value => 'Business Partnership'}} )
      items.each do |item|
        puts "#{item.name}: #{item.value} a.k.a #{item.normalized}, relevance = #{item.relevance}"
      end
    end
  end
end
which gives us:
Company: BBC News a.k.a , relevance = 0.678
Company: Google a.k.a GOOGLE INC., relevance = 0.508
Company: Flutter a.k.a FLUTTER COM INC, relevance = 0.531
Company: TV Radio a.k.a HERALD & WEEKLY-TV,RADIO OPS, relevance = 0.558
Company: Microsoft a.k.a MICROSOFT CORPORATION, relevance = 0.303
Company: Adobe a.k.a ADOBE SYSTEMS INCORPORATED, relevance = 0.193
Company: Netflix a.k.a NETFLIX, INC., relevance = 0.301
Company: Y Combinator a.k.a Y Combinator, relevance = 0.258
Company: Nintendo a.k.a Nintendo Co., Ltd., relevance = 0.286
Company: Samsung a.k.a Samsung C&T Corporation, relevance = 0.285
Company: Glyndwr University a.k.a , relevance = 0.269
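The blank "a.k.a ," entries appear where an item has no normalized name, since interpolating nil into a string produces an empty string. One way to tidy this, sketched with a hypothetical helper named `company_line`, is to add the alias only when it is present:

```ruby
# Hypothetical formatting helper: appends the "a.k.a" part only when a
# normalized name is available (neither nil nor empty).
def company_line(name, value, normalized, relevance)
  alias_part = normalized.to_s.empty? ? '' : " a.k.a #{normalized}"
  "#{name}: #{value}#{alias_part}, relevance = #{relevance}"
end

puts company_line('Company', 'BBC News', nil, 0.678)
puts company_line('Company', 'Google', 'GOOGLE INC.', 0.508)
```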
At this point, someone may ask: "But what if we want to get more than one entity for a given condition? The filter hash doesn't allow that!"
No, it doesn't. However, given that filtering is done on the whole response after it's been received, we can apply many filters on the same response:
EM.run do
  DoverToCalais::API_KEY = 'my-opencalais-api-key'
  dover = DoverToCalais::Dover.new('http://www.bbc.co.uk/news/technology-24380202')
  dover.analyse_this
  dover.to_calais do |response|
    if response.error
      puts response.error
    else
      result1 = response.filter( {:entity => 'Company', :value => 'Google', :given => {:entity => 'Technology', :value => 'gesture recognition'}} )
      result2 = response.filter( {:entity => 'Product', :given => {:entity => 'Technology', :value => 'gesture recognition'}} )
      puts result1 | result2
    end
  end
end
Which will give us all the gesture-recognition products that Google is associated with according to our data source:
<struct DoverToCalais::ResponseItem name="Company", value="Google", relevance=0.506, count=7, normalized="GOOGLE INC.", importance=nil, originalValue=nil>
<struct DoverToCalais::ResponseItem name="Product", value="Xbox Kinect", relevance=0.286, count=1, normalized=nil, importance=nil, originalValue=nil>
<struct DoverToCalais::ResponseItem name="Product", value="Galaxy S4 smartphone", relevance=0.282, count=1, normalized=nil, importance=nil, originalValue=nil>
<struct DoverToCalais::ResponseItem name="Product", value="Wii", relevance=0.286, count=1, normalized=nil, importance=nil, originalValue=nil>
<struct DoverToCalais::ResponseItem name="Product", value="Galaxy S4", relevance=0.282, count=1, normalized=nil, importance=nil, originalValue=nil>
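The `result1 | result2` expression is Ruby's Array union, which also removes duplicates. Struct instances compare equal when all their members are equal, so an item returned by both filters appears only once. A small sketch with a hypothetical `UnionItem` struct and hand-written data:

```ruby
# Hypothetical two-field struct; Struct supplies value-based #eql? and
# #hash, which Array#| relies on to detect duplicates.
UnionItem = Struct.new(:name, :value)

UNION_A = [UnionItem.new('Company', 'Google'), UnionItem.new('Product', 'Wii')].freeze
UNION_B = [UnionItem.new('Product', 'Wii'), UnionItem.new('Product', 'Xbox Kinect')].freeze

# Union of two result arrays, keeping first-seen order and dropping repeats.
def merged_results(first, second)
  first | second
end

merged_results(UNION_A, UNION_B).each { |item| puts "#{item.name}: #{item.value}" }
```

Using `+` instead of `|` would keep the repeated "Wii" item.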
PS: If you're not sure about the names or values of the tags you want to filter, you can get a listing with the following Constants:
CalaisOntology::CALAIS_ENTITIES
CalaisOntology::CALAIS_EVENTS
CalaisOntology::CALAIS_TOPICS
###Code samples
More examples of using DoverToCalais can be found as GitHub Gists:
Using DoverToCalais to semantically tag all files in a directory
Use DoverToCalais to find all Persons or Organizations with a relevance score greater than 0.1, if the data source contains an environmental event
If you're behind a corporate firewall and the only way to reach outside is through a proxy then you need to set the DoverToCalais::PROXY constant:
DoverToCalais::PROXY = {
  :proxy => {
    :host => 'www.myproxy.com',
    :port => 8080,
    :authorization => ['username', 'password'] # optional
  }
}
If you're connecting through a SOCKS5 Proxy just set the :type key to :socks5.
DoverToCalais::PROXY = {
  :proxy => {
    :host => 'www.myproxy.com',
    :port => 8080,
    :type => :socks5
  }
}
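If the proxy details are already available as a URL, the hash can be built from it instead of being typed by hand. This is a sketch only: `proxy_settings` is a hypothetical helper that is not part of DoverToCalais, and it assumes the nested `:proxy` hash shape shown above.

```ruby
require 'uri'

# Hypothetical helper: turns a proxy URL into a hash with the same keys as
# the examples above. Credentials and :type are added only when given.
def proxy_settings(url, type = nil)
  uri = URI.parse(url)
  proxy = { :host => uri.host, :port => uri.port }
  proxy[:authorization] = [uri.user, uri.password] if uri.user
  proxy[:type] = type if type
  { :proxy => proxy }
end

p proxy_settings('http://username:password@www.myproxy.com:8080')
p proxy_settings('http://www.myproxy.com:8080', :socks5)
```

The resulting hash could then be assigned to DoverToCalais::PROXY in place of the literal.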
Comprehensive documentation can be found at http://rubydoc.info/gems/dover_to_calais.
A list of Cucumber features and scenarios can be found in the features directory. The list is far from exhaustive, so feel free to add your own scenarios and steps.
To run the tests, there is already a rake task set up. Just type:
rake features API_KEY='my_api_key'
git checkout -b my-new-feature
git commit -am 'Add some feature'
git push origin my-new-feature
##Changelog