facebook-page-scraper

Python package for scraping the front end of Facebook pages, with no limitations

5.0.6
PyPI

Maintainers: 1

Facebook Page Scraper

No API key needed and no limit on the number of requests. Import the library and just do it!

Table of Contents

Getting Started
Prerequisites
Installation
Installing from source
Installing with PyPI

Usage
How to instantiate?
Parameters for Facebook_scraper()
Scrape in JSON format
JSON Output Format

Scrape in CSV format
Parameters for scrap_to_csv() method

Keys of the output data
Tech
License

Prerequisites

Internet Connection
Python 3.7+
Chrome or Firefox browser installed on your machine

Installation:

Installing from source:

git clone https://github.com/shaikhsajid1111/facebook_page_scraper

Then, inside the project's directory:

python3 setup.py install

Installing with PyPI:

pip3 install facebook-page-scraper

How to use?

    # import the Facebook_scraper class from facebook_page_scraper
    import os

    from facebook_page_scraper import Facebook_scraper

    # instantiate the Facebook_scraper class
    page_or_group_name = "Meta"
    posts_count = 10
    browser = "firefox"
    proxy = "IP:PORT"  # if the proxy requires authentication: user:password@IP:PORT
    timeout = 600  # 600 seconds
    headless = True

    # read the Facebook credentials from environment variables;
    # they are only needed for logged-in scraping (username and password parameters)
    fb_password = os.getenv('fb_password')
    fb_email = os.getenv('fb_email')

    # indicates whether the Facebook target is a group or a page
    isGroup = False

    meta_ai = Facebook_scraper(page_or_group_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless, isGroup=isGroup)

Parameters for the Facebook_scraper(page_or_group_name, posts_count, browser, proxy, timeout, headless, isGroup, username, password) class

| Parameter Name | Parameter Type | Description |
|---|---|---|
| page_or_group_name | String | Name of the Facebook page or group |
| posts_count | Integer | Number of posts to scrape. Default is 10 |
| browser | String | Which browser to use, either chrome or firefox. Default is chrome |
| proxy (optional) | String | Proxy to use, in the format IP:PORT. If the proxy requires authentication, the format is user:password@IP:PORT |
| timeout | Integer | The maximum amount of time, in seconds, the bot should run for. Default is 600 (10 minutes) |
| headless | Boolean | Whether to run the browser in headless mode. Default is True |
| isGroup | Boolean | Whether the Facebook target is a group or a page. Default is False |
| username | String | Username to log into Facebook when scraping (recommended to keep it in .env) |
| password | String | Password to log into Facebook when scraping (recommended to keep it in .env) |
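The proxy format described in the table can be assembled with a small helper. This is a minimal sketch, not part of the library: `build_proxy` and the environment variable names `PROXY_USER` and `PROXY_PASSWORD` are hypothetical and chosen only for illustration.

```python
import os


def build_proxy(host, port, user=None, password=None):
    """Return 'IP:PORT', or 'user:password@IP:PORT' when credentials are given."""
    address = f"{host}:{port}"
    if user and password:
        return f"{user}:{password}@{address}"
    return address


# Read the proxy credentials from the environment so they stay out of the source.
# PROXY_USER and PROXY_PASSWORD are hypothetical variable names.
proxy = build_proxy(
    "127.0.0.1",
    8080,
    user=os.getenv("PROXY_USER"),
    password=os.getenv("PROXY_PASSWORD"),
)
print(proxy)
```

The resulting string can then be passed as the `proxy` argument when instantiating the scraper.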

⚠️ Warning: Use Logged-In Scraping at Your Own Risk ⚠️
Using logged-in scraping methods may result in the permanent suspension of your account. Proceed with caution, as violating a platform's terms of service can lead to severe consequences. Exercise discretion and adhere to ethical practices when collecting data through scraping. The library/provider assumes no responsibility for any consequences resulting from the misuse of scraping methods.

Done with instantiation? Let the scraping begin!

To get the posts' data in JSON format:

    # call the scrap_to_json() method
    json_data = meta_ai.scrap_to_json()
    print(json_data)

Output:

    {
      "2024182624425347": {
        "name": "Meta AI",
        "shares": 0,
        "reactions": {
          "likes": 154,
          "loves": 19,
          "wow": 0,
          "cares": 0,
          "sad": 0,
          "angry": 0,
          "haha": 0
        },
        "reaction_count": 173,
        "comments": 2,
        "content": "We’ve built data2vec, the first general high-performance self-supervised algorithm for speech, vision, and text. We applied it to different modalities and found it matches or outperforms the best self-supervised algorithms. We hope this brings us closer to a world where computers can learn to solve many different tasks without supervision. Learn more and get the code: https://ai.facebook.com/…/the-first-high-performance-self-s…",
        "posted_on": "2022-01-20T22:43:35",
        "video": [],
        "image": [
          "https://scontent-bom1-2.xx.fbcdn.net/v/t39.30808-6/s480x480/272147088_2024182621092014_6532581039236849529_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=8024bb&_nc_ohc=j4_1PAndJTIAX82OLNq&_nc_ht=scontent-bom1-2.xx&oh=00_AT9us__TvC9eYBqRyQEwEtYSit9r2UKYg0gFoRK7Efrhyw&oe=61F17B71"
        ],
        "post_url": "https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/?type=3&__xts__%5B0%5D=68.ARBoSaQ-pAC_ApucZNHZ6R-BI3YUSjH4sXsfdZRQ2zZFOwgWGhjt6dmg0VOcmGCLhSFyXpecOY9g1A94vrzU_T-GtYFagqDkJjHuhoyPW2vnkn7fvfzx-ql7fsBYxL5DgQVSsiC1cPoycdCvHmi6BV5Sc4fKADdgDhdFvVvr-ttzXG1ng2DbLzU-XfSes7SAnrPs-gxjODPKJ7AdqkqkSQJ4HrsLgxMgcLFdCsE6feWL7rXjptVWegMVMthhJNVqO0JHu986XBfKKqB60aBFvyAzTSEwJD6o72GtnyzQ-BcH7JxmLtb2_A&__tn__=-R"
      },
      ...
    }

Output structure for JSON format:

    {
      "id": {
        "name": string,
        "shares": integer,
        "reactions": {
          "likes": integer,
          "loves": integer,
          "wow": integer,
          "cares": integer,
          "sad": integer,
          "angry": integer,
          "haha": integer
        },
        "reaction_count": integer,
        "comments": integer,
        "content": string,
        "video": list,
        "image": list,
        "posted_on": datetime,  // string containing datetime in ISO 8601
        "post_url": string
      }
    }
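Once the data is in this shape, it can be aggregated with the standard library alone. The following is a minimal sketch that assumes the result of scrap_to_json() is either a JSON string or an already-parsed dictionary with the structure above; `load_posts` and `summarize` are hypothetical helper names, and the sample data is made up for illustration.

```python
import json
from datetime import datetime


def load_posts(data):
    """Accept either a JSON string or an already-parsed dict of posts."""
    return json.loads(data) if isinstance(data, str) else data


def summarize(posts):
    """Return the total reaction count and the post IDs ordered newest first."""
    total = sum(post["reaction_count"] for post in posts.values())
    newest_first = sorted(
        posts,
        key=lambda post_id: datetime.fromisoformat(posts[post_id]["posted_on"]),
        reverse=True,
    )
    return total, newest_first


# Made-up sample containing only the keys this sketch reads.
sample = json.dumps({
    "111": {"reaction_count": 173, "posted_on": "2022-01-20T22:43:35"},
    "222": {"reaction_count": 20, "posted_on": "2022-02-01T08:00:00"},
})
total, order = summarize(load_posts(sample))
print(total, order)
```

Parsing `posted_on` with `datetime.fromisoformat` works here because the timestamps are ISO 8601 strings without a timezone suffix, as in the sample output.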

To save the posts' data directly to a CSV file:

    # call the scrap_to_csv(filename, directory) method
    filename = "data_file"  # file name without the .csv extension, where the data will be saved
    directory = r"E:\data"  # directory where the CSV file will be saved (raw string keeps the backslash literal)
    meta_ai.scrap_to_csv(filename, directory)

Content of data_file.csv:

    id,name,shares,likes,loves,wow,cares,sad,angry,haha,reactions_count,comments,content,posted_on,video,image,post_url
    2024182624425347,Meta AI,0,154,19,0,0,0,0,0,173,2,"We’ve built data2vec, the first general high-performance self-supervised algorithm for speech, vision, and text. We applied it to different modalities and found it matches or outperforms the best self-supervised algorithms. We hope this brings us closer to a world where computers can learn to solve many different tasks without supervision. Learn more and get the code: https://ai.facebook.com/…/the-first-high-performance-self-s…",2022-01-20T22:43:35,,https://scontent-bom1-2.xx.fbcdn.net/v/t39.30808-6/s480x480/272147088_2024182621092014_6532581039236849529_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=8024bb&_nc_ohc=j4_1PAndJTIAX82OLNq&_nc_ht=scontent-bom1-2.xx&oh=00_AT9us__TvC9eYBqRyQEwEtYSit9r2UKYg0gFoRK7Efrhyw&oe=61F17B71,https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/?type=3&__xts__%5B0%5D=68.ARAse4eiZmZQDOZumNZEDR0tQkE5B6g50K6S66JJPccb-KaWJWg6Yz4v19BQFSZRMd04MeBmV24VqvqMB3oyjAwMDJUtpmgkMiITtSP8HOgy8QEx_vFlq1j-UEImZkzeEgSAJYINndnR5aSQn0GUwL54L3x2BsxEqL1lElL7SnHfTVvIFUDyNfAqUWIsXrkI8X5KjoDchUj7aHRga1HB5EE0x60dZcHogUMb1sJDRmKCcx8xisRgk5XzdZKCQDDdEkUqN-Ch9_NYTMtxlchz1KfR0w9wRt8y9l7E7BNhfLrmm4qyxo-ZpA&__tn__=-R
    ...

Parameters for scrap_to_csv(filename, directory) method.

| Parameter Name | Parameter Type | Description |
|---|---|---|
| filename | String | Name of the CSV file where the posts' data will be saved |
| directory | String | Directory where the CSV file will be stored |
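A saved file can be read back with Python's built-in csv module. This is a minimal sketch that assumes the column names shown in the sample data_file.csv above; `read_posts` and `INT_COLUMNS` are hypothetical names, and an in-memory sample stands in for the real file.

```python
import csv
import io

# Columns that hold counts in the sample header and should become integers.
INT_COLUMNS = (
    "shares", "likes", "loves", "wow", "cares", "sad", "angry",
    "haha", "reactions_count", "comments",
)


def read_posts(fileobj):
    """Parse CSV rows into dictionaries, converting count columns to int."""
    rows = []
    for row in csv.DictReader(fileobj):
        for column in INT_COLUMNS:
            row[column] = int(row[column])
        rows.append(row)
    return rows


# Made-up sample with the same header as data_file.csv.
sample_csv = io.StringIO(
    "id,name,shares,likes,loves,wow,cares,sad,angry,haha,reactions_count,"
    "comments,content,posted_on,video,image,post_url\n"
    '1,Meta AI,0,154,19,0,0,0,0,0,173,2,"Hello, world",2022-01-20T22:43:35,,,\n'
)
rows = read_posts(sample_csv)
print(rows[0]["name"], rows[0]["reactions_count"])
```

For a file on disk, the same function can be given `open(path, newline="", encoding="utf-8")`, where the UTF-8 encoding is an assumption about how the file was written.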

Keys of the outputs:

| Key | Type | Description |
|---|---|---|
| id | String | Post identifier (an integer cast to a string) |
| name | String | Name of the page |
| shares | Integer | Share count of the post |
| reactions | Dictionary | Dictionary with reaction names as keys and their counts as values. Keys => ["likes","loves","wow","cares","sad","angry","haha"] |
| reaction_count | Integer | Total reaction count of the post |
| comments | Integer | Comment count of the post |
| content | String | Content of the post as text |
| video | List | List containing URLs of all videos present in the post |
| image | List | List containing URLs of all images present in the post |
| posted_on | Datetime | Time at which the post was published (in ISO 8601 format) |
| post_url | String | URL of the post |
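For downstream code, the keys above can be mapped onto a typed record. The sketch below is an illustration only: the `Post` dataclass and its `from_entry` method are hypothetical and not part of the library, and the image list is read from the `image` key as it appears in the sample output.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class Post:
    """Typed view of one scraped post (hypothetical helper, not a library class)."""
    id: str
    name: str
    shares: int
    reactions: Dict[str, int]
    reaction_count: int
    comments: int
    content: str
    posted_on: datetime
    post_url: str
    video: List[str] = field(default_factory=list)
    image: List[str] = field(default_factory=list)

    @classmethod
    def from_entry(cls, post_id, data):
        """Build a Post from one (id, dict) pair of the JSON output."""
        return cls(
            id=post_id,
            name=data["name"],
            shares=data["shares"],
            reactions=dict(data["reactions"]),
            reaction_count=data["reaction_count"],
            comments=data["comments"],
            content=data["content"],
            # posted_on arrives as an ISO 8601 string; convert it to datetime.
            posted_on=datetime.fromisoformat(data["posted_on"]),
            post_url=data["post_url"],
            video=list(data.get("video", [])),
            image=list(data.get("image", [])),
        )


# Made-up entry with the documented keys.
entry = {
    "name": "Meta AI", "shares": 0,
    "reactions": {"likes": 154, "loves": 19},
    "reaction_count": 173, "comments": 2, "content": "Hello",
    "posted_on": "2022-01-20T22:43:35", "video": [], "image": [],
    "post_url": "https://www.facebook.com/",
}
post = Post.from_entry("1", entry)
print(post.name, post.posted_on.year)
```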

Tech

This project relies on the following libraries:

Selenium
Webdriver Manager
Python Dateutil
Selenium-wire

If you encounter anything unusual, please feel free to create an issue on the project's GitHub repository.

LICENSE
MIT



Keywords

web-scraping selenium facebook facebook-pages
