calais-entity-extractor
Comparing version 1.0.0 to 1.1.0
@@ -0,6 +1,10 @@ | ||
//Sample plaintext. | ||
var text = "The gravity of Volkswagen's emissions scandal is sinking in as the automaker's new boss warned that the company is planning comprehensive cuts to navigate the costly crisis. We will review all planned investments, and what isn’t absolutely vital will be canceled or delayed,' Volkswagen CEO Matthias Mueller told workers at Volkswagen's headquarters in Wolfsburg, Germany. 'And that’s why we will re-adjust our efficiency program. I will be completely clear: this won’t be painless. Mueller's comment — which was made in German and confirmed by an English speaking spokesman for this story — reflects the depth of the financial crisis Volkswagen is facing after it admitted to rigging 11 million diesel cars worldwide with software that allowed cars to cheat emissions regulations. The cost cuts come amid swirling speculation over the ultimate pricettag of the scandal, which is expected to be much higher than the $7 billion Volkswagen has already set aside. In the U.S., where only 482,000 cars were fitted with the 'defeat device' software, fines and the cost of vehicle buybacks could exceed 15 billion euros, or nearly $17 billion, Sanford C. Bernstein analyst Max Warburton said Monday in a research note. The company — which passed Toyota as the world's largest automaker for the first six months of 2015 — made a global operating profit of 12.7 billion euros in 2014. Warburton posited that the scandal may not be as grave in Europe, where, he said, cars may be fitted with the cheating software yet still compliant with local emissions regulations. European standards on nitrogen oxide emissions — which can exacerbate respiratory conditions such as asthma — are not as stringent as U.S. standards. Still, the financial ripple effects of the crisis are 'potentially terrifying' if the scandal mushrooms in Europe, Warburton said. Industry observers have theorized it could reach the $30 billion range, although speculation varies widely. 
It was not immediately clear how cost-cuts would affect Volkswagen's only U.S. factory in Chattanooga, Tenn. A U.S.-based spokeswoman did not respond to a request for information. According to a copy of Mueller's prepared remarks, the CEO reiterated that 'it is still not possible to quantify' how much the scandal will cost. But he vowed to ensure the company's survival and regain the public's trust. 'We can and we will overcome this crisis, because Volkswagen is a group with a strong foundation,' He said. 'And above all because we have the best automobile team anyone could wish for.' Mueller admitted, however, that he does not know the full extent of the scandal, which dates back to 2009 models and led to the resignation of his predecessor, Martin Winterkorn. 'Believe me — like you, I am impatient,' he said. Volkswagen is expected to propose fixes to regulators this month. Mueller said software updates will be enough for many vehicles, but others will require hardware upgrades. In his remarks, Mueller signaled that the company won't reduce its commitment to new product development. 'We cannot afford to economize on the future,' he said. 'That is something else we will also be addressing over the coming weeks and months.' The automaker is facing a litany of investigations and lawsuits over the scandal, including a U.S. Justice Department criminal probe and numerous class-action suits filed by consumers. It has also ordered a outside law firm to conduct an investigation of its handling of the matter."; | ||
var Calais = require('./lib/calais-entity-extractor.js').Calais; | ||
var calais = new Calais('API KEY GOES HERE'); //See valid options below | ||
var Calais = require('calais-entity-extractor').Calais; | ||
//You can enter options as the second parameter. | ||
var calais = new Calais('ENTER API KEY HERE'); | ||
// You can set options after the constructor using .set(option, value). The example below sets | ||
@@ -11,3 +15,5 @@ // the text that we want to analyze. | ||
calais.extract(function(result, err) { //perform the request | ||
var util = require('util'); //for printing the results. | ||
calais.extractFromText(function(result, err) { //perform the request | ||
if (err) { | ||
@@ -18,4 +24,2 @@ console.log('Uh oh, we got an error! : ' + err); | ||
//Take a look at the results! | ||
var util = require('util'); | ||
@@ -29,2 +33,21 @@ //The results have two fields: 'entities' and 'tags' | ||
console.log('\nTags: ' + util.inspect(result.tags, false, null)); | ||
//Now let's try analyzing a webpage. We supply a URL. | ||
calais.extractFromUrl('http://www.reuters.com/article/2015/10/07/us-iran-us-talks-idUSKCN0S10P220151007', function(result, err) { | ||
if (err) { | ||
console.log('Uh oh, we got an error! : ' + err); | ||
return; | ||
} | ||
//The results have the same format as the extractFromText function. | ||
//'entities' contains a list of the detected entities, and gives basic info & confidence | ||
console.log('Entities: ' + util.inspect(result.entities, false, null)); | ||
//'tags' are a list of string tags (the "socialTags" from Calais). | ||
console.log('\nTags: ' + util.inspect(result.tags, false, null)); | ||
}); | ||
}); |
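The example's comments describe a result with two fields, `entities` and `tags`. The following is a minimal standalone sketch of how that shape can be derived from a Calais-style response, mirroring the logic of `_parseCalaisData` in this diff. The function name `parseCalaisData`, the `sample` object, and its `hypothetical/id/...` keys are assumptions made for illustration; only the field names (`_typeGroup`, `_type`, `name`, `confidencelevel`, `resolutions`) are taken from the diff. Unlike the original, this sketch also guards against an empty `resolutions` array.

```javascript
// Sketch of the entity/tag split; not the package's actual source.
function parseCalaisData(result, minConfidence) {
  var entities = [];
  var tags = [];

  Object.keys(result).forEach(function (id) {
    var p = result[id];

    // Skip anything that is not a named, typed record (e.g. document metadata).
    if (!p || typeof p !== 'object') return;
    if (!p.hasOwnProperty('name') || !p.hasOwnProperty('_typeGroup')) return;

    if (p._typeGroup === 'socialTag') {
      tags.push(p.name);
    } else if (p._typeGroup === 'entities' && p.hasOwnProperty('_type')) {
      var confidence = p.hasOwnProperty('confidencelevel') ? p.confidencelevel : 0.0;

      // Prefer the first resolution's name; fall back to the short name.
      var fullName = (p.resolutions && p.resolutions.length > 0) ? p.resolutions[0].name : '';
      if (!fullName) fullName = p.name;

      if (confidence >= minConfidence) {
        entities.push({
          type: p._type,
          name: p.name,
          fullName: fullName,
          confidence: confidence
        });
      }
    }
  });

  return { entities: entities, tags: tags };
}

// Hypothetical response fragment, invented for this sketch.
var sample = {
  doc: { info: { language: 'English' } },
  'hypothetical/id/1': { _typeGroup: 'socialTag', name: 'Automotive industry' },
  'hypothetical/id/2': {
    _typeGroup: 'entities', _type: 'Company', name: 'Volkswagen',
    confidencelevel: 0.9, resolutions: [{ name: 'VOLKSWAGEN AG' }]
  },
  'hypothetical/id/3': {
    _typeGroup: 'entities', _type: 'Person', name: 'Max Warburton',
    confidencelevel: 0.2
  }
};

var parsed = parseCalaisData(sample, 0.5);
console.log(JSON.stringify(parsed, null, 2));
```

With a threshold of 0.5 the low-confidence person record is dropped; lowering `minConfidence` to 0 keeps it, with `fullName` falling back to the short name because no resolution is present.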
@@ -40,2 +40,54 @@ var request = require('request'); | ||
_parseCalaisData: function(result, minConfidence) { | ||
var entities = [ ]; | ||
var tags = [ ]; | ||
for(var i in result) { | ||
var p = result[i]; | ||
for (var key in p) { | ||
if (p.hasOwnProperty(key) && (key == "name")) { | ||
if (!p.hasOwnProperty('_typeGroup')) | ||
continue; | ||
if (p._typeGroup === 'socialTag') { | ||
var name = p[key]; | ||
tags.push(name); | ||
} else if (p._typeGroup == 'entities') { //if it's an entity, grab that | ||
if (p.hasOwnProperty('_type')) { | ||
var type = p._type; | ||
var confidenceLevel = 0.0; | ||
var name = ""; | ||
var fullName = ""; | ||
if (p.hasOwnProperty('confidencelevel')) | ||
confidenceLevel = p.confidencelevel; | ||
if (p.hasOwnProperty('resolutions')) { | ||
name = p[key]; | ||
fullName = p.resolutions[0].name; | ||
} else | ||
name = p[key]; | ||
//No further full name? Use the 'short' name | ||
if (fullName.length == 0) | ||
fullName = name; | ||
if (confidenceLevel >= minConfidence) | ||
entities.push({ | ||
'type': type, | ||
'name': name, | ||
'fullName': fullName, | ||
'confidence': confidenceLevel | ||
}); | ||
} | ||
} | ||
} | ||
} | ||
} | ||
return {'entities' : entities, 'tags' : tags }; | ||
}, | ||
set: function (key, value) { | ||
@@ -50,4 +102,13 @@ this.options[key] = value; | ||
//cb = function(resultData, error); | ||
extract: function (cb) { | ||
/** | ||
* Perform the analysis request with Calais. If no |text| is given or |text| is empty, | ||
* then we fall back to the set options.content value. If that is also empty, an error is | ||
* returned. | ||
* | ||
* @param cb Callback function of form function(resultData, error); | ||
* @param text Optional, the text to perform extraction on. If not set, the options.content | ||
* value is used. | ||
* @returns nothing | ||
*/ | ||
extractFromText: function (cb, text) { | ||
var calais = this; | ||
@@ -58,3 +119,8 @@ | ||
var outputFormat = calais.options.outputFormat; | ||
if (this._undefinedOrNull(text) || typeof text != 'string' || text.length == 0) | ||
text = this.options.content; | ||
if (this._undefinedOrNull(text) || typeof text != 'string' || text.length == 0) | ||
return cb({}, 'No text given in options or parameter'); | ||
var params = { | ||
@@ -65,4 +131,4 @@ 'Host' : calais.options.apiHost, | ||
'Content-Type' : calais.options.contentType, | ||
'Accept' : outputFormat, | ||
'Content-Length' : calais.options.content.length, | ||
'Accept' : 'application/json', | ||
'Content-Length' : text.length, | ||
'OutputFormat' : 'application/json' | ||
@@ -73,5 +139,5 @@ } | ||
var options = { | ||
uri : 'https://' + this.options.apiHost + this.options.apiPath, | ||
uri : 'https://' + calais.options.apiHost + calais.options.apiPath, | ||
method : 'POST', | ||
body : this.options.content, | ||
body : text, | ||
headers: params | ||
@@ -90,62 +156,84 @@ }; | ||
// take note of whether Javascript object output was requested | ||
var jsOutput = (calais.options.outputFormat === 'object'); | ||
// parse to a Javascript object if requested | ||
var result = (jsOutput) ? JSON.parse(calaisData) : calaisData; | ||
var result = JSON.parse(calaisData); | ||
result = (typeof result === 'string') ? JSON.parse(result) : result; | ||
var entities = [ ]; | ||
var tags = [ ]; | ||
for(var i in result) { | ||
var p = result[i]; | ||
var parsedResult = calais._parseCalaisData(result, calais.options.minConfidence); | ||
for (var key in p) { | ||
if (p.hasOwnProperty(key) && (key == "name")) { | ||
if (!p.hasOwnProperty('_typeGroup')) | ||
continue; | ||
return cb(parsedResult, calais.errors); | ||
} else | ||
return cb({}, 'Request error: ' + (typeof response === 'string' ? response : JSON.stringify(response))); | ||
if (p._typeGroup === 'socialTag') { | ||
var name = p[key]; | ||
tags.push(name); | ||
} else if (p._typeGroup == 'entities') { //if it's an entity, grab that | ||
}); | ||
}, | ||
if (p.hasOwnProperty('_type')) { | ||
var type = p._type; | ||
var confidenceLevel = 0.0; | ||
var name = ""; | ||
var fullName = ""; | ||
/** | ||
* Extract tags and entities from a given URL. We download the HTML from the URL, and submit | ||
* that to Calais using the extractFromText function | ||
* | ||
* @param url The URL to analyze. | ||
* @param cb The callback function, of form function(result, error) | ||
*/ | ||
extractFromUrl: function(url, cb) { | ||
var calais = this; | ||
if (p.hasOwnProperty('confidencelevel')) | ||
confidenceLevel = p.confidencelevel; | ||
if (!calais.validateOptions()) | ||
return cb({}, 'Bad options'); | ||
if (p.hasOwnProperty('resolutions')) { | ||
name = p[key]; | ||
fullName = p.resolutions[0].name; | ||
//Make sure we were given a URL. | ||
if (this._undefinedOrNull(url) || typeof url != 'string' || url.length == 0) | ||
return cb({}, 'No URL given.'); | ||
} else | ||
name = p[key]; | ||
//Make sure it's a valid URL. | ||
if (!(/^(https?|ftp):\/\/(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*@)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|\/|\?)*)?$/i.test(url))) | ||
return cb({}, 'Bad URL'); | ||
//No further full name? Use the 'short' name | ||
if (fullName.length == 0) | ||
fullName = name; | ||
if (confidenceLevel >= calais.options.minConfidence) | ||
entities.push({ | ||
'type': type, | ||
'name': name, | ||
'fullName': fullName, | ||
'confidence': confidenceLevel | ||
}); | ||
} | ||
} | ||
} | ||
} | ||
} | ||
request(url, function(error, response, html) { | ||
if (error) | ||
return cb({}, error); | ||
return cb({'entities' : entities, 'tags' : tags }, calais.errors); | ||
} else | ||
return cb({}, 'Request error: ' + (typeof response === 'string' ? response : JSON.stringify(response))); | ||
//We can upload the html directly to Calais if we set the contentType as text/html | ||
var params = { | ||
'Host' : calais.options.apiHost, | ||
'x-ag-access-token' : calais.apiKey, | ||
'x-calais-language' : calais.options.language, | ||
'Content-Type' : 'text/html', | ||
'Accept' : 'application/json', | ||
'Content-Length' : html.length, | ||
'OutputFormat' : 'application/json' | ||
}; | ||
var options = { | ||
uri : 'https://' + calais.options.apiHost + calais.options.apiPath, | ||
method : 'POST', | ||
body : html, | ||
headers: params | ||
}; | ||
request(options, function(error, response, calaisData) { | ||
if (error) | ||
return cb({}, error); | ||
if (response === undefined) { | ||
return cb({}, 'Undefined Calais response'); | ||
} else if (response.statusCode === 200) { | ||
// parse to a Javascript object if requested | ||
var result = JSON.parse(calaisData); | ||
result = (typeof result === 'string') ? JSON.parse(result) : result; | ||
var parsedResult = calais._parseCalaisData(result, calais.options.minConfidence); | ||
return cb(parsedResult, calais.errors); | ||
} else | ||
return cb({}, 'Request error: ' + (typeof response === 'string' ? response : JSON.stringify(response))); | ||
}); | ||
}); | ||
} | ||
@@ -152,0 +240,0 @@ }; |
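Both `extractFromText` and `extractFromUrl` set `Content-Length` from the string's `.length`, which counts UTF-16 code units rather than bytes. The sample text in example.js contains characters such as `’` and `—`, which occupy more than one byte in UTF-8, so the two numbers can differ. The sketch below only illustrates that difference using Node's `Buffer.byteLength`; the helper `buildHeaders` is hypothetical and not part of the package, and whether the mismatch matters in practice depends on how the HTTP client treats an explicitly supplied header.

```javascript
// Character count vs. UTF-8 byte count for a body with non-ASCII text.
var body = 'this won’t be painless';

var charCount = body.length;                      // UTF-16 code units
var byteCount = Buffer.byteLength(body, 'utf8');  // bytes on the wire as UTF-8

console.log('characters:', charCount, 'bytes:', byteCount);

// Hypothetical helper: builds headers with a byte-accurate Content-Length.
function buildHeaders(requestBody, contentType) {
  return {
    'Content-Type': contentType,
    'Accept': 'application/json',
    'Content-Length': Buffer.byteLength(requestBody, 'utf8')
  };
}

console.log(buildHeaders(body, 'text/html'));
```

For pure ASCII input the two counts are identical, which is why the discrepancy is easy to miss when testing with simple strings.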
{ | ||
"name": "calais-entity-extractor", | ||
"version": "1.0.0", | ||
"version": "1.1.0", | ||
"description": "Extract entities from text using Open Calais.", | ||
@@ -5,0 +5,0 @@ "scripts": { |
calais-entity-extractor | ||
======================= | ||
An npm package that provides an easy way to extract entities from blocks of text using Open Calais. A valid Calais key is required. You can get a free one at the [Open Calais site](http://new.opencalais.com). | ||
An npm package that provides an easy way to extract entities from blocks of text using Open Calais. A valid Calais key is required. You can get a free one at the [Open Calais site](http://new.opencalais.com). This module was inspired by [node-calais](https://github.com/mcantelon/node-calais), but that project doesn't (as of 10/6/2015) support the Calais API changes. | ||
@@ -23,3 +23,3 @@ We perform *named entity recognition* and output clean *entity markup tags* and *socialTags* in JSON. | ||
calais.extract(function(result, err) { //perform the request | ||
calais.extractFromText(function(result, err) { //perform the request | ||
if (err) { | ||
@@ -89,2 +89,13 @@ console.log('Uh oh, we got an error! : ' + err); | ||
We also support analyzing text directly from webpages. Set up the `calais` object just like in | ||
the previous example, and perform a query like this: | ||
calais.extractFromUrl(url, function(result, err) { | ||
... | ||
}); | ||
The results are returned in the same way as the extractFromText method. | ||
For working examples, see *example.js* | ||
## Tests | ||
@@ -91,0 +102,0 @@ |