
A suite of tools for collecting, pre-processing, analyzing and sentiment-scoring Twitter data. An additional brief walkthrough can be found here.
Install:
pip install twitter-nlp-toolkit
To use the sentiment analysis package, you will also need to install spaCy's small English language model.
python -m spacy download en_core_web_sm
While the package is still under active development, the following functionality is expected to be stable:
twitter_nlp_toolkit.twitter_listener is the listener module, which can be used to monitor Twitter and stream tweets to disk in .json format.
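# Words to track in the live stream
keywords = ["python"]
# "credentials" is a dict holding your Twitter API key (see the notes that follow)
stream = twitter_listener.TwitterStreamListener(**credentials)
# Stop after 10 tweets and write them to python_tweets.json
stream.collect_from_stream(max_tweets=10, output_json_name="python_tweets.json", target_words=keywords)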
"keywords" is matched by the Twitter API's own keyword matching. Documentation and tips for setting up smart keyword queries can be found here.
"credentials" contains your Twitter API key, which can be obtained for free here.
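One way to keep the API key out of your source files is to build the "credentials" dict from environment variables. The following is a minimal sketch, not part of the package: the dict key names and the environment variable names are assumptions chosen for illustration, so check the package documentation for the keyword arguments the listener actually expects.

```python
import os

# Hypothetical names: neither the dict keys nor the environment variable
# names come from the package; adjust both to match its documentation.
CREDENTIAL_ENV_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_secret": "TWITTER_ACCESS_SECRET",
}


def load_credentials(environ=os.environ):
    """Build a credentials dict from environment variables.

    Raises KeyError naming every missing variable, so a misconfigured
    environment fails before any network call is attempted.
    """
    missing = [var for var in CREDENTIAL_ENV_VARS.values() if var not in environ]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[var] for name, var in CREDENTIAL_ENV_VARS.items()}
```

The resulting dict can then be unpacked with ** in the same way as "credentials" in the listener example.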
The module also contains a parser that converts the .json-formatted tweets into .csv for easy use (e.g., with pandas) or loads them straight into a pandas DataFrame.
# Convert the streamed .json file to .csv
parser = tweet_json_parser.json_parser()
parser.stream_json_file(json_file_name="python_tweets.json", output_file_name="parsed_python_tweets.csv")
# Or load the streamed .json file straight into a pandas DataFrame
parser = tweet_json_parser.json_parser()
df = parser.parse_json_file_into_dataframe(json_file_name="python_tweets.json")
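To make the .json-to-.csv step concrete, here is a rough standard-library sketch of the general idea. It is not the package's parser: it assumes the stream file holds one JSON tweet object per line, and it keeps only three common tweet fields (id_str, created_at, text). The function name json_lines_to_csv is hypothetical.

```python
import csv
import io
import json

# Fields kept in the CSV; assumed to be present in each tweet object.
FIELDS = ["id_str", "created_at", "text"]


def json_lines_to_csv(json_lines, out_file, fields=FIELDS):
    """Write one CSV row per tweet and return the number of rows written."""
    writer = csv.DictWriter(out_file, fieldnames=fields)
    writer.writeheader()
    count = 0
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between tweets
        tweet = json.loads(line)
        # Missing fields become empty cells instead of raising an error.
        writer.writerow({field: tweet.get(field, "") for field in fields})
        count += 1
    return count


# Two made-up tweets with a blank line between them.
sample = [
    '{"id_str": "1", "created_at": "Wed Jan 01 00:00:00 +0000 2020", "text": "I love python", "lang": "en"}',
    "",
    '{"id_str": "2", "created_at": "Wed Jan 01 00:01:00 +0000 2020", "text": "python, again"}',
]
buffer = io.StringIO()
rows_written = json_lines_to_csv(sample, buffer)
```

Passing an open file handle instead of the in-memory buffer writes the same rows to disk.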
twitter_nlp_toolkit.twitter_REST_downloader is the bulk download module, which can be used to collect the last 200 (or so) tweets from a single user.
downloader = twitter_REST_downloader.bulk_downloader(**credentials)
downloader.get_tweets_csv_for_this_user("@nytimes", "nyt_tweet_output.csv")
twitter_nlp_toolkit.tweet_sentiment_classifier is the sentiment analysis module, which can be used to classify the sentiment of tweets.
Classifier = tweet_sentiment_classifier.SentimentAnalyzer()
Classifier.load_small_ensemble()
Classifier.predict(['I am happy', 'I am sad', 'I am cheerful', 'I am mad']) # will return [1, 0, 1, 0]
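The numeric output can be turned into readable labels and a simple summary. This sketch assumes 1 means positive and 0 means negative, which is inferred from the example above rather than stated by the package. The helper names are hypothetical, and a hard-coded list stands in for the real Classifier.predict output so that the snippet runs without the models.

```python
# Label meaning inferred from the example: happy/cheerful -> 1, sad/mad -> 0.
LABEL_NAMES = {0: "negative", 1: "positive"}


def label_predictions(texts, predictions):
    """Pair each text with a readable sentiment label."""
    if len(texts) != len(predictions):
        raise ValueError("texts and predictions must have the same length")
    return [(text, LABEL_NAMES[int(p)]) for text, p in zip(texts, predictions)]


def positive_share(predictions):
    """Fraction of predictions labelled positive (0.0 for empty input)."""
    predictions = list(predictions)
    if not predictions:
        return 0.0
    return sum(1 for p in predictions if int(p) == 1) / len(predictions)


texts = ['I am happy', 'I am sad', 'I am cheerful', 'I am mad']
predictions = [1, 0, 1, 0]  # stand-in for Classifier.predict(texts)
labelled = label_predictions(texts, predictions)
share = positive_share(predictions)
```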
Currently only two ensembles are provided: the small ensemble, which uses a bag-of-words logistic regression model and two long short-term memory (LSTM) neural networks, and the large ensemble, which uses the bag-of-words model, two larger LSTM networks, and a Google BERT model. The large ensemble is more accurate (and expected to become much more accurate), but it is extremely resource intensive and, as such, isn't recommended for processing large numbers of tweets unless you have a powerful GPU.
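For readers unfamiliar with ensembles, the sketch below shows one common way to combine several models: average their positive-class probabilities and apply a threshold (often called soft voting). This is a generic illustration only. How the package actually combines its models is not described here, and the probabilities used are made up.

```python
def soft_vote(model_probabilities, threshold=0.5):
    """Average per-model positive-class probabilities, then threshold.

    model_probabilities holds one list per model; each inner list gives
    that model's probability of the positive class for every input.
    """
    if not model_probabilities:
        raise ValueError("need at least one model")
    n_inputs = len(model_probabilities[0])
    if any(len(probs) != n_inputs for probs in model_probabilities):
        raise ValueError("every model must score the same number of inputs")
    n_models = len(model_probabilities)
    # zip(*...) regroups the scores so each tuple holds one input's scores.
    averaged = [sum(scores) / n_models for scores in zip(*model_probabilities)]
    return [1 if p >= threshold else 0 for p in averaged]


# Made-up scores from three hypothetical models for three tweets.
bag_of_words = [0.9, 0.2, 0.6]
lstm_a = [0.8, 0.4, 0.3]
lstm_b = [0.7, 0.1, 0.5]
votes = soft_vote([bag_of_words, lstm_a, lstm_b])
```

In the third tweet, two models sit at or above 0.5, yet the averaged score falls below the threshold, so the ensemble labels it 0.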
These ensembles were trained primarily on the Sent140 dataset and primarily tested against the US Airlines dataset previously hosted on Crowdflower.com.
Please see the Jupyter notebook (.ipynb) files in the root directory for further demonstrations of working code.
If you have domain-specific training data, you can refine the ensembles:
Classifier.refine(train_x, train_y)
Classifier.save_models()
# To reload your saved models, you can run Classifier.load_models()
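When refining on your own data, it helps to hold back some labelled examples so you can check whether refinement actually improved the predictions. The helpers below are a plain-Python sketch with hypothetical names (split_holdout, accuracy). They are not part of the package and assume your data is a list of texts with a parallel list of 0/1 labels.

```python
import random


def split_holdout(texts, labels, holdout_fraction=0.2, seed=0):
    """Shuffle reproducibly and split into train and held-out portions."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)
    n_holdout = max(1, int(len(indices) * holdout_fraction))
    holdout_idx, train_idx = indices[:n_holdout], indices[n_holdout:]

    def pick(items, idx):
        return [items[i] for i in idx]

    return (pick(texts, train_idx), pick(labels, train_idx),
            pick(texts, holdout_idx), pick(labels, holdout_idx))


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot score an empty list")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


# Made-up data: ten texts with alternating labels.
all_texts = ["tweet %d" % i for i in range(10)]
all_labels = [i % 2 for i in range(10)]
train_x, train_y, test_x, test_y = split_holdout(all_texts, all_labels)
```

A typical check would compare accuracy(test_y, Classifier.predict(test_x)) before and after calling Classifier.refine(train_x, train_y).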
Other advanced uses, such as building your own models, are possible but not currently recommended, as the models are still in development. Further documentation will be added once development stabilizes.
The developers are always open to feature requests, bug reports, pull requests, and new opportunities to collaborate. Don't hesitate to reach out with questions, feedback, or requests.
Developers:
Moe Antar (@Moe520)
Eric Schibli (@eschibli)