Twitter scraper selenium

Python package to scrape Twitter's front-end easily with selenium.

PyPI license | Python >= 3.6.9 | Maintenance

Table of Contents

  1. Getting Started
  2. Usage
  3. Privacy
  4. License


Prerequisites

  • Internet Connection
  • Python 3.6+
  • Chrome or Firefox browser installed on your machine

Installation

Installing from the source

    Download the source code or clone it with:

    git clone https://github.com/shaikhsajid1111/twitter-scraper-selenium
    

    Open a terminal inside the downloaded folder and run:


     python3 setup.py install
    

    Installing with PyPI

    pip3 install twitter-scraper-selenium
    

    Usage

    Available Functions in this Package - Summary

    | Function Name | Function Description | Scraping Method | Scraping Speed |
    |---|---|---|---|
    | scrape_profile() | Scrapes a Twitter user's profile tweets | Browser Automation | Slow |
    | get_profile_details() | Scrapes a Twitter user's details | HTTP Request | Fast |
    | scrape_profile_with_api() | Scrapes tweets by Twitter profile username. It expects the username of the profile | Browser Automation & HTTP Request | Fast |

    Note: the HTTP Request method sends requests directly to Twitter's API to collect data, while Browser Automation visits the page and scrolls through it while collecting data.



    To scrape Twitter profile details:

    from twitter_scraper_selenium import get_profile_details
    
    twitter_username = "TwitterAPI"
    filename = "twitter_api_data"
    browser = "firefox"
    headless = True
    get_profile_details(twitter_username=twitter_username, filename=filename, browser=browser, headless=headless)
    
    

    Output:

    {
    	"id": 6253282,
    	"id_str": "6253282",
    	"name": "Twitter API",
    	"screen_name": "TwitterAPI",
    	"location": "San Francisco, CA",
    	"profile_location": null,
    	"description": "The Real Twitter API. Tweets about API changes, service issues and our Developer Platform. Don't get an answer? It's on my website.",
    	"url": "https:\/\/t.co\/8IkCzCDr19",
    	"entities": {
    		"url": {
    			"urls": [{
    				"url": "https:\/\/t.co\/8IkCzCDr19",
    				"expanded_url": "https:\/\/developer.twitter.com",
    				"display_url": "developer.twitter.com",
    				"indices": [
    					0,
    					23
    				]
    			}]
    		},
    		"description": {
    			"urls": []
    		}
    	},
    	"protected": false,
    	"followers_count": 6133636,
    	"friends_count": 12,
    	"listed_count": 12936,
    	"created_at": "Wed May 23 06:01:13 +0000 2007",
    	"favourites_count": 31,
    	"utc_offset": null,
    	"time_zone": null,
    	"geo_enabled": null,
    	"verified": true,
    	"statuses_count": 3656,
    	"lang": null,
    	"contributors_enabled": null,
    	"is_translator": null,
    	"is_translation_enabled": null,
    	"profile_background_color": null,
    	"profile_background_image_url": null,
    	"profile_background_image_url_https": null,
    	"profile_background_tile": null,
    	"profile_image_url": null,
    	"profile_image_url_https": "https:\/\/pbs.twimg.com\/profile_images\/942858479592554497\/BbazLO9L_normal.jpg",
    	"profile_banner_url": null,
    	"profile_link_color": null,
    	"profile_sidebar_border_color": null,
    	"profile_sidebar_fill_color": null,
    	"profile_text_color": null,
    	"profile_use_background_image": null,
    	"has_extended_profile": null,
    	"default_profile": false,
    	"default_profile_image": false,
    	"following": null,
    	"follow_request_sent": null,
    	"notifications": null,
    	"translator_type": null
    }
    

    get_profile_details() arguments:

    | Argument | Argument Type | Description |
    |---|---|---|
    | twitter_username | String | Twitter username |
    | output_filename | String | Filename where the output is stored |
    | output_dir | String | Directory where the output file should be saved |
    | proxy | String | Optional. Proxy to use for scraping. For an authenticated proxy, use the format username:password@host:port |


    Keys of the output:

    Details of each key can be found here.
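    The profile JSON shown above is an ordinary dict once loaded, so it can be post-processed directly. A minimal sketch, using an abbreviated literal copied from the output above rather than a live call (a real call needs a browser and network access):

    ```python
    import json

    # Abbreviated sample of the output shown above (not a live scrape).
    profile_json = """{
        "screen_name": "TwitterAPI",
        "followers_count": 6133636,
        "friends_count": 12,
        "verified": true,
        "created_at": "Wed May 23 06:01:13 +0000 2007"
    }"""

    profile = json.loads(profile_json)

    # Example: a compact one-line summary of the account.
    summary = "@{screen_name}: {followers_count:,} followers, verified={verified}".format(**profile)
    print(summary)  # @TwitterAPI: 6,133,636 followers, verified=True
    ```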


    To scrape a profile's tweets:

    In JSON format:

    from twitter_scraper_selenium import scrape_profile
    
    microsoft = scrape_profile(twitter_username="microsoft", output_format="json", browser="firefox", tweets_count=10)
    print(microsoft)
    

    Output:

    {
      "1430938749840629773": {
        "tweet_id": "1430938749840629773",
        "username": "Microsoft",
        "name": "Microsoft",
        "profile_picture": "https://twitter.com/Microsoft/photo",
        "replies": 29,
        "retweets": 58,
        "likes": 453,
        "is_retweet": false,
        "retweet_link": "",
        "posted_time": "2021-08-26T17:02:38+00:00",
        "content": "Easy to use and efficient for all \u2013 Windows 11 is committed to an accessible future.\n\nHere's how it empowers everyone to create, connect, and achieve more: https://msft.it/6009X6tbW ",
        "hashtags": [],
        "mentions": [],
        "images": [],
        "videos": [],
        "tweet_url": "https://twitter.com/Microsoft/status/1430938749840629773",
        "link": "https://blogs.windows.com/windowsexperience/2021/07/01/whats-coming-in-windows-11-accessibility/?ocid=FY22_soc_omc_br_tw_Windows_AC"
      },...
    }
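    The printed output above suggests a JSON mapping from tweet ID to tweet record; whether scrape_profile returns it as a str or an already-parsed dict is not confirmed here, so this sketch parses a string with json.loads. It uses one abbreviated record from the sample output instead of a live call:

    ```python
    import json

    # One tweet from the sample output above, abbreviated (not a live scrape).
    raw = """{
      "1430938749840629773": {
        "tweet_id": "1430938749840629773",
        "username": "Microsoft",
        "likes": 453,
        "retweets": 58,
        "tweet_url": "https://twitter.com/Microsoft/status/1430938749840629773"
      }
    }"""

    tweets = json.loads(raw)

    # The top-level keys are tweet IDs; each value is one tweet record.
    for tweet_id, tweet in tweets.items():
        print(tweet_id, tweet["likes"], tweet["tweet_url"])
    ```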
    

    In CSV format:

    from twitter_scraper_selenium import scrape_profile
    
    
    scrape_profile(twitter_username="microsoft", output_format="csv", browser="firefox", tweets_count=10, filename="microsoft", directory="/home/user/Downloads")
    
    
    

    Output:

    The CSV contains one row per tweet, with the columns:

    tweet_id, username, name, profile_picture, replies, retweets, likes, is_retweet, retweet_link, posted_time, content, hashtags, mentions, images, videos, post_url, link



    scrape_profile() arguments:

    | Argument | Argument Type | Description |
    |---|---|---|
    | twitter_username | String | Twitter username of the account |
    | browser | String | Browser to use for scraping; only Chrome and Firefox are supported. Default is Firefox |
    | proxy | String | Optional. Proxy to use for scraping. For an authenticated proxy, use the format username:password@host:port |
    | tweets_count | Integer | Number of posts to scrape. Default is 10 |
    | output_format | String | Output format, either JSON or CSV. Default is JSON |
    | filename | String | Required when output_format is CSV. If not passed, the filename defaults to the username |
    | directory | String | Used when output_format is CSV. If not passed, the CSV file is saved in the current working directory |
    | headless | Boolean | Whether to run the crawler headlessly. Default is True |


    Keys of the output

    | Key | Type | Description |
    |---|---|---|
    | tweet_id | String | Post identifier (an integer cast to string) |
    | username | String | Username of the profile |
    | name | String | Name of the profile |
    | profile_picture | String | Profile picture link |
    | replies | Integer | Number of replies to the tweet |
    | retweets | Integer | Number of retweets of the tweet |
    | likes | Integer | Number of likes of the tweet |
    | is_retweet | Boolean | Is the tweet a retweet? |
    | retweet_link | String | The retweet link if it is a retweet, otherwise an empty string |
    | posted_time | String | Time the tweet was posted, in ISO 8601 format |
    | content | String | Content of the tweet as text |
    | hashtags | Array | Hashtags present in the tweet, if any |
    | mentions | Array | Mentions present in the tweet, if any |
    | images | Array | Image links, if present in the tweet |
    | videos | Array | Video links, if present in the tweet |
    | tweet_url | String | URL of the tweet |
    | link | String | Any external link present inside the tweet |
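    These keys are enough for simple filtering once the output is loaded as a dict. A sketch over made-up records shaped like the table above (the values are illustrative, not scraped):

    ```python
    # Illustrative records using the documented keys (values are made up).
    tweets = {
        "1": {"tweet_id": "1", "likes": 453, "is_retweet": False, "retweet_link": ""},
        "2": {"tweet_id": "2", "likes": 12, "is_retweet": True,
              "retweet_link": "https://twitter.com/example/status/2"},
    }

    # Keep only original tweets (is_retweet is False) with at least 100 likes.
    popular_originals = [
        t for t in tweets.values()
        if not t["is_retweet"] and t["likes"] >= 100
    ]
    print([t["tweet_id"] for t in popular_originals])  # ['1']
    ```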


    To scrape a profile's tweets with the API:

    from twitter_scraper_selenium import scrape_profile_with_api
    
    scrape_profile_with_api('elonmusk', output_filename='musk', tweets_count=100)
    

    scrape_profile_with_api() Arguments:

    | Argument | Argument Type | Description |
    |---|---|---|
    | username | String | Twitter profile username |
    | tweets_count | Integer | Number of tweets to scrape |
    | output_filename | String | Filename where the output is stored |
    | output_dir | String | Directory where the output file should be saved |
    | proxy | String | Optional. Proxy to use for scraping. For an authenticated proxy, use the format username:password@host:port |
    | browser | String | Browser to use for extracting the GraphQL key. Default is firefox |
    | headless | Boolean | Whether to run the browser in headless mode |

    Output:

    {
      "1608939190548598784": {
        "tweet_url" : "https://twitter.com/elonmusk/status/1608939190548598784",
        "tweet_details":{
          ...
        },
        "user_details":{
          ...
        }
      }, ...
    }
    


    Using the scraper with a proxy (HTTP proxy)

    Just pass the proxy argument to the function.

    from twitter_scraper_selenium import scrape_profile
    
    scrape_profile("elonmusk", headless=False, proxy="66.115.38.247:5678", output_format="csv", filename="musk")  # in IP:PORT format
    
    

    Proxy that requires authentication:

    
    from twitter_scraper_selenium import scrape_profile
    
    microsoft_data = scrape_profile(twitter_username="microsoft", browser="chrome", tweets_count=10, output_format="json",
                          proxy="sajid:pass123@66.115.38.247:5678")  # username:password@IP:PORT
    print(microsoft_data)
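    The authenticated-proxy format above is plain username:password@host:port, so the string can be assembled from parts. A small hypothetical helper (build_proxy is not part of this package):

    ```python
    def build_proxy(host, port, username=None, password=None):
        """Build a proxy string in the format this package expects:
        host:port, or username:password@host:port when credentials are given."""
        if username and password:
            return f"{username}:{password}@{host}:{port}"
        return f"{host}:{port}"

    print(build_proxy("66.115.38.247", 5678))                      # 66.115.38.247:5678
    print(build_proxy("66.115.38.247", 5678, "sajid", "pass123"))  # sajid:pass123@66.115.38.247:5678
    ```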
    
    
    


    Privacy

    This scraper only collects public data available to unauthenticated users and cannot scrape anything private.



    License

    MIT
