instamancer

Version: 3.3.1

Version published: 5 years ago

Maintainers: 1

Created: 7 years ago

Instamancer

Scrape Instagram's API with Puppeteer.

Install | Usage | Comparison | Website | FAQ | Examples

Instamancer is a new type of scraping tool that leverages Puppeteer's ability to intercept requests made by a webpage to an API.

Read more about how Instamancer works here.
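The interception idea can be sketched as a response listener that keeps only API traffic. This is a minimal illustration, not Instamancer's actual implementation: `ResponseLike`, `createCollector`, and the `API_FRAGMENT` value are hypothetical names chosen for the example, and the endpoints Instamancer really watches may differ. Puppeteer's response objects expose `url()` and `json()`, so they fit the small interface used here.

```typescript
// Minimal shape of the response objects this sketch needs. Puppeteer's
// HTTPResponse provides url() and json(), so real responses fit this shape.
interface ResponseLike {
    url(): string;
    json(): Promise<unknown>;
}

// Hypothetical URL fragment used to recognise API traffic (an assumption).
const API_FRAGMENT = "/graphql/query";

// Builds a listener that stores the parsed JSON body of every matching response.
function createCollector(fragment: string = API_FRAGMENT) {
    const payloads: unknown[] = [];
    const onResponse = async (response: ResponseLike): Promise<void> => {
        if (!response.url().includes(fragment)) {
            return; // not an API call: images, scripts, HTML, ...
        }
        try {
            payloads.push(await response.json());
        } catch {
            // Body was not valid JSON; skip it.
        }
    };
    return { payloads, onResponse };
}

// With Puppeteer, the listener would be attached before navigating, e.g.:
//   const collector = createCollector();
//   page.on("response", collector.onResponse);
//   await page.goto(someInstagramPageUrl);
```

The page itself makes the API requests while it loads and scrolls; the scraper only reads the responses as they pass by.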

Features

Scrape hashtags, users' posts, and individual posts
Download images, albums, and videos
Output JSON, CSV
Batch scraping
Search hashtags, users, and locations
API response validation
Upload files to S3 and depot
Plugins

Data

Metadata that Instamancer is able to gather from posts:

Text
Timestamps
Tagged users
Accessibility captions
Like counts
Comment counts
Images (Thumbnails, Dimensions, URLs)
Videos (URL, View count, Duration)
Comments (Timestamp, Text, Like count, User)
User (Username, Full name, Profile picture, Profile privacy)
Location (Name, Street, Zip code, City, Region, Country)
Sponsored status
Gating information
Fact checking information

Install

Linux

Enable user namespace cloning:

sysctl -w kernel.unprivileged_userns_clone=1

Or run without a sandbox:

# WARNING: unsafe
export NO_SANDBOX=true

See Puppeteer troubleshooting

Without downloading Chromium

To install Instamancer without downloading Chromium, set the PUPPETEER_SKIP_CHROMIUM_DOWNLOAD environment variable before installation:

export PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true

From NPM

npm install -g instamancer

If you're installing globally as root, use the following command so that the Puppeteer dependency installs:

sudo npm install -g instamancer --unsafe-perm=true

From NPX

npx instamancer

From this repository

git clone https://github.com/ScriptSmith/instamancer.git
cd instamancer
npm install
npm run build
npm install -g

Usage

Command Line

$ instamancer
Usage: instamancer <command> [options]

Commands:
  instamancer hashtag [id]       Scrape a hashtag
  instamancer user [id]          Scrape a users posts
  instamancer post [ids]         Scrape a comma-separated list of posts
  instamancer search [query]     Perform a search of users, tags and places
  instamancer batch [batchfile]  Read newline-separated arguments from a file

Configuration
  --count, -c    Number of posts to download (0 for all)   [number] [default: 0]
  --full, -f     Retrieve full post data              [boolean] [default: false]
  --sleep, -s    Seconds to sleep between interactions     [number] [default: 2]
  --graft, -g    Enable grafting                       [boolean] [default: true]
  --browser, -b  Browser path. Defaults to the puppeteer version        [string]
  --sameBrowser  Use a single browser when grafting   [boolean] [default: false]

Download
  --download, -d      Save images from posts          [boolean] [default: false]
  --downdir           Download path       [default: "downloads/[endpoint]/[id]"]
  --video, -v         Download videos (requires full) [boolean] [default: false]
  --sync              Force download between requests [boolean] [default: false]
  --threads, -k       Parallel download / depot threads    [number] [default: 4]
  --waitDownload, -w  Download media after scraping   [boolean] [default: false]

Upload
  --bucket  Upload files to an AWS S3 bucket                            [string]
  --depot   Upload files to a URL with a PUT request (depot)            [string]

Output
  --file, -o       Output filename. '-' for stdout    [string] [default: "[id]"]
  --type, -t       Filetype   [choices: "csv", "json", "both"] [default: "json"]
  --mediaPath, -m  Add filepaths to _mediaPath        [boolean] [default: false]

Display
  --visible    Show browser on the screen             [boolean] [default: false]
  --quiet, -q  Disable progress output                [boolean] [default: false]

Logging
  --logging, -l    [choices: "none", "error", "info", "debug"] [default: "none"]
  --logfile      Log file name             [string] [default: "instamancer.log"]

Validation
  --strict  Throw an error on response type mismatch  [boolean] [default: false]

Plugins
  --plugin, -p  Use a plugin from the plugins directory    [array] [default: []]

Options:
  --help     Show help                                                 [boolean]
  --version  Show version number                                       [boolean]

Examples:
  instamancer hashtag instagood -fvd        Download all the available posts,
                                            and their media from #instagood
  instamancer user arianagrande --type=csv  Download Ariana Grande's posts to a
  --logging=info --visible                  CSV file with a non-headless
                                            browser, and log all events

Source code available at https://github.com/ScriptSmith/instamancer
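When many runs need to be scripted, it can help to generate the argument lists rather than type them. The sketch below turns a small job description into CLI arguments using only flags from the help text above, and joins several jobs into batch-file content. `CliJob`, `toArgs`, and `toBatchFile` are hypothetical helpers, and the one-run-per-line batch layout is an assumption based on the "newline-separated arguments" description.

```typescript
// Subset of the documented CLI flags, modelled as a plain object.
interface CliJob {
    command: "hashtag" | "user" | "post" | "search";
    id: string;
    count?: number;                 // --count
    full?: boolean;                 // --full
    download?: boolean;             // --download
    type?: "csv" | "json" | "both"; // --type
}

// Converts one job into the argument list for a single instamancer run.
function toArgs(job: CliJob): string[] {
    const args: string[] = [job.command, job.id];
    if (job.count !== undefined) args.push(`--count=${job.count}`);
    if (job.full) args.push("--full");
    if (job.download) args.push("--download");
    if (job.type) args.push(`--type=${job.type}`);
    return args;
}

// Assumed batch layout: one line of arguments per run.
function toBatchFile(jobs: CliJob[]): string {
    return jobs.map((job) => toArgs(job).join(" ")).join("\n") + "\n";
}

const batch = toBatchFile([
    { command: "hashtag", id: "instagood", count: 10, full: true, download: true },
    { command: "user", id: "arianagrande", type: "csv" },
]);
console.log(batch);
```

The resulting text could be written to a file and passed to `instamancer batch`, provided the assumed line format matches what the tool expects.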

Module

ES2018 TypeScript example:

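import {createApi, IOptions} from "instamancer";

// Stop after 10 posts (0 means unlimited)
const options: IOptions = {
    total: 10
};
// Scrape posts tagged #beach
const hashtag = createApi("hashtag", "beach", options);

(async () => {
    // Iterate over posts as the async generator yields them
    for await (const post of hashtag.generator()) {
        console.log(post);
    }
})();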

Generator functions

import {createApi} from "instamancer"

createApi("hashtag", id, options);
createApi("user", id, options);
createApi("post", ids, options);
createApi("search", query, options);
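Because the module example consumes results with `for await`, ordinary async-iteration helpers should apply to them. As a sketch, the hypothetical `take` helper below collects a fixed number of items from any async iterable and then stops. `fakePosts` is a stand-in generator so the example runs without a browser; it is not part of Instamancer.

```typescript
// Collects up to `limit` items from any async iterable, then stops iterating.
async function take<T>(source: AsyncIterable<T>, limit: number): Promise<T[]> {
    const items: T[] = [];
    if (limit <= 0) return items;
    for await (const item of source) {
        items.push(item);
        // Breaking out of for-await calls return() on the iterator,
        // which gives a generator the chance to run its cleanup code.
        if (items.length >= limit) break;
    }
    return items;
}

// Stand-in for a scraper generator (hypothetical, endless).
async function* fakePosts(): AsyncGenerator<{ id: number }> {
    let id = 0;
    while (true) {
        yield { id: id++ };
    }
}

(async () => {
    const firstThree = await take(fakePosts(), 3);
    console.log(firstThree.map((post) => post.id));
})();
```

With the package installed, the same helper could presumably be used as `await take(hashtag.generator(), 5)`, although the `total` option already covers the common case of limiting the number of posts.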

Options

const options: Instamancer.IOptions = {
    // Total posts to download. 0 for unlimited
    total: number,

    // Run Chrome in headless mode
    headless: boolean,

    // Logging events
    logger: winston.Logger,

    // Run without output to stdout
    silent: boolean,

    // Time to sleep between interactions with the page
    sleepTime: number,

    // Throw an error if type validation fails
    strict: boolean,

    // Time to sleep when rate-limited
    hibernationTime: number,

    // Enable the grafting process
    enableGrafting: boolean,

    // Extract the full amount of information from the API
    fullAPI: boolean,

    // Use a proxy in Chrome to connect to Instagram
    proxyURL: string,

    // Location of the chromium / chrome binary executable
    executablePath: string,

    // Custom io-ts validator
    validator: Type<unknown>,

    // Custom plugins
    plugins: IPlugin[]
}
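Several of these options have an obvious counterpart among the command-line flags listed earlier. The sketch below maps one to the other. The pairing (for example `--visible` to `headless`, `--count` to `total`) is an assumption inferred from the descriptions, not something the source states, and `OptionsSubset`, `CliFlags`, and `flagsToOptions` are hypothetical names. The defaults are the ones shown in the CLI help.

```typescript
// Local copy of a few documented option fields so the sketch compiles
// without the package installed; with instamancer installed, use IOptions.
interface OptionsSubset {
    total?: number;
    headless?: boolean;
    silent?: boolean;
    sleepTime?: number;
    strict?: boolean;
    enableGrafting?: boolean;
    fullAPI?: boolean;
    executablePath?: string;
}

// CLI flags from the help text that have a likely option equivalent.
interface CliFlags {
    count: number;
    visible: boolean;
    quiet: boolean;
    sleep: number;
    strict: boolean;
    graft: boolean;
    full: boolean;
    browser?: string;
}

// Defaults as printed by the CLI help.
const cliDefaults: CliFlags = {
    count: 0, visible: false, quiet: false, sleep: 2,
    strict: false, graft: true, full: false,
};

// Assumed flag-to-option mapping (an illustration, not the tool's own code).
function flagsToOptions(flags: Partial<CliFlags>): OptionsSubset {
    const f: CliFlags = { ...cliDefaults, ...flags };
    const options: OptionsSubset = {
        total: f.count,
        headless: !f.visible, // a visible browser is a non-headless one
        silent: f.quiet,
        sleepTime: f.sleep,
        strict: f.strict,
        enableGrafting: f.graft,
        fullAPI: f.full,
    };
    if (f.browser !== undefined) {
        options.executablePath = f.browser;
    }
    return options;
}

console.log(flagsToOptions({ count: 10, visible: true }));
```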

Comparison

A comparison of Instagram scraping tools. Please suggest more tools and criteria through a pull request.

To see a speed comparison, visit this page

Tool	Hashtags	Users	Tagged posts	Locations	Posts	Stories	Login not required	Private feeds	Batch mode	Plugins	Command-line	Library/Module	Download media	Download metadata	Scraping method	Daily builds	Main language	Speed	Build status	Test coverage	Code quality
Instamancer	Yes	Yes	No	No	Yes	No	Yes	No	Yes	Yes	Yes	Yes	Yes	Yes	Web API request interception	Yes	TypeScript
Instaphyte	Yes	No	No	No	No	No	Yes	No	No	No	Yes	Yes	Yes	Yes	Web API simulation	Yes	Python
Instaloader	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	No	No	Yes	Yes	Yes	Yes	Web API simulation	No	Python			?	?
Instalooter	Yes	Yes	No	Yes	Yes	No	No	Yes	Yes	No	Yes	Yes	Yes	Yes	Web API simulation	No	Python
Instagram crawler	Yes	Yes	No	No	Yes	No	Yes	No	No	No	Yes	Yes	No	Yes	Web DOM reading	No	Python	?		?	?
Instagram Scraper	Yes	Yes	Yes	Yes	No	Yes	No	Yes	No	No	Yes	Yes	Yes	Yes	Web API simulation	No	Python			?	?
Instagram Private API	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	No	No	No	Yes	Yes	Yes	App and Web API simulation	No	Python	?		?	?
Instagram PHP Scraper	Yes	Yes	No	Yes	Yes	No	Yes	Yes	No	No	No	Yes	Yes	Yes	Web API simulation	No	PHP	?	?	?	?

Package last updated on 24 May 2020
