github.com/amyangfei/image_viewer

v0.0.0-20180927120347-691e9b3f688a
Go

Version published: 6 years ago

Created: 6 years ago


Simple Image Crawling and File System Mapping

Have fun

Introduction

This is a simple tool that maps the images and sub-links of a single HTML page to a file system directory structure.

When image_tool runs, a simple file system server runs in the background. The server is based on the PathFileSystem provided by go-fuse. File system operations such as ls, cd and cat trigger the interface methods defined in go-fuse, so we implement several of these methods to update the file system structure dynamically. Currently all file system information, including directory entry lists, file attributes and file data, is stored in memory.
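The in-memory, dynamically updated structure described above can be sketched without go-fuse. The sketch uses a plain map from absolute paths to entries, and a directory is filled in lazily the first time it is listed. `MemTree`, `Entry`, `Attr`, `ReadDir`, `GetAttr`, `ReadFile` and the `populate` callback are hypothetical names, not go-fuse's API. In the real tool, roughly this work would sit behind the path-based file system methods that go-fuse calls for ls, stat and cat.

```go
package main

import (
	"fmt"
	"os"
	"path"
	"sort"
	"sync"
)

// Entry is a hypothetical in-memory node: a directory (one per crawled page)
// or a regular file (one per image). It is not a go-fuse type.
type Entry struct {
	IsDir bool
	Data  []byte // image bytes; unused for directories
}

// Attr holds the few attributes that ls/stat need.
type Attr struct {
	Mode os.FileMode
	Size int64
}

// MemTree keeps dir entry lists, attributes and file data in memory,
// keyed by absolute path inside the mount point.
type MemTree struct {
	mu       sync.Mutex
	entries  map[string]*Entry
	loaded   map[string]bool
	populate func(dir string) map[string]*Entry // e.g. crawl the page behind dir
}

func NewMemTree(populate func(dir string) map[string]*Entry) *MemTree {
	return &MemTree{
		entries:  map[string]*Entry{"/": {IsDir: true}},
		loaded:   map[string]bool{},
		populate: populate,
	}
}

// ReadDir serves an ls: the first listing of a directory fills it in lazily.
func (t *MemTree) ReadDir(dir string) ([]string, error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if e, ok := t.entries[dir]; !ok || !e.IsDir {
		return nil, os.ErrNotExist
	}
	if !t.loaded[dir] {
		for name, child := range t.populate(dir) {
			t.entries[path.Join(dir, name)] = child
		}
		t.loaded[dir] = true
	}
	var names []string
	for p := range t.entries {
		if p != dir && path.Dir(p) == dir {
			names = append(names, path.Base(p))
		}
	}
	sort.Strings(names)
	return names, nil
}

// GetAttr serves a stat on a single path.
func (t *MemTree) GetAttr(p string) (Attr, error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	e, ok := t.entries[p]
	if !ok {
		return Attr{}, os.ErrNotExist
	}
	if e.IsDir {
		return Attr{Mode: os.ModeDir | 0o755}, nil
	}
	return Attr{Mode: 0o644, Size: int64(len(e.Data))}, nil
}

// ReadFile serves a cat.
func (t *MemTree) ReadFile(p string) ([]byte, error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	e, ok := t.entries[p]
	if !ok || e.IsDir {
		return nil, os.ErrNotExist
	}
	return e.Data, nil
}

func main() {
	tree := NewMemTree(func(dir string) map[string]*Entry {
		// Stand-in for crawling the page mapped to dir.
		return map[string]*Entry{
			"cat.png": {Data: []byte("fake image bytes")},
			"next":    {IsDir: true},
		}
	})
	names, _ := tree.ReadDir("/")
	fmt.Println(names)
	attr, _ := tree.GetAttr("/cat.png")
	fmt.Println(attr.Mode, attr.Size)
}
```

In this sketch, a path only exists after its parent has been listed. This matches the usual ls-then-cd flow, but a fuller version would also populate the parent on a direct lookup.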

Build

$ export build_path=/path/to/build
$ mkdir -p $build_path/src/github.com/amyangfei && cd $build_path/src/github.com/amyangfei
$ git clone https://github.com/amyangfei/image_viewer
$ export GOPATH=$GOPATH:$build_path
$ cd image_viewer && make

Headless Crawling

JavaScript execution is turned off by default. To execute JavaScript, turn on the --headless option and headless Chrome will be used. Headless mode requires both Chrome and ChromeDriver. Dependency installation instructions for Ubuntu/Debian are as follows:

$ curl -sSL https://dl.google.com/linux/linux_signing_key.pub | apt-key add -
$ echo "deb [arch=amd64] https://dl.google.com/linux/chrome/deb/ stable main" > /etc/apt/sources.list.d/google.list
$ apt-get update && apt-get install -y google-chrome-stable
$ wget -N https://chromedriver.storage.googleapis.com/2.42/chromedriver_linux64.zip && unzip chromedriver_linux64.zip
$ mv -f chromedriver /usr/local/bin/chromedriver
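This small sketch shows how a program could gate the --headless option on the dependencies installed above, using the standard flag and os/exec packages. The executable names `google-chrome` and `chromedriver` are assumptions based on the packages installed above. `checkHeadlessDeps` is a hypothetical helper, and this is not image_tool's actual startup code.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"os/exec"
)

// checkHeadlessDeps returns the executables from bins that cannot be
// found on PATH (hypothetical helper; names are assumptions).
func checkHeadlessDeps(bins ...string) []string {
	var missing []string
	for _, b := range bins {
		if _, err := exec.LookPath(b); err != nil {
			missing = append(missing, b)
		}
	}
	return missing
}

func main() {
	headless := flag.Bool("headless", false, "execute JavaScript with headless Chrome")
	flag.Parse()
	if !*headless {
		fmt.Println("plain fetch: JavaScript is not executed")
		return
	}
	if missing := checkHeadlessDeps("google-chrome", "chromedriver"); len(missing) > 0 {
		fmt.Fprintln(os.Stderr, "headless mode needs:", missing)
		os.Exit(1)
	}
	fmt.Println("headless Chrome will render pages before extraction")
}
```

Failing fast at startup gives a clearer error than a failed driver connection in the middle of a crawl.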

TODO

Add test cases
Dependency management
JavaScript simulator, e.g. headless Chrome
Better filename handling for URL-encoded names
Image type detection, for filenames without an extension
Image preloading to speed up directory listing
CI support
Duplicate URL optimization
Better URL and img src extraction strategy
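Two of the items above, URL-encoded filenames and type detection for names without an extension, can be sketched together with the standard library. The sketch decodes the last path segment of the image URL, because `url.Parse` stores `Path` in decoded form. If that segment has no extension, it sniffs the first bytes with `http.DetectContentType`. `fileName` and the `extByType` table are hypothetical, and the "index" fallback name is an arbitrary choice.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"path"
)

// extByType maps MIME types reported by http.DetectContentType to
// file extensions (hypothetical table; extend as needed).
var extByType = map[string]string{
	"image/png":  ".png",
	"image/jpeg": ".jpg",
	"image/gif":  ".gif",
	"image/webp": ".webp",
	"image/bmp":  ".bmp",
}

// fileName derives a readable local name from an image URL. head holds the
// first bytes of the image and is only consulted when the name lacks an
// extension.
func fileName(rawURL string, head []byte) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	name := path.Base(u.Path) // u.Path is already percent-decoded
	if name == "/" || name == "." {
		name = "index" // URL has no usable last segment
	}
	if path.Ext(name) == "" {
		if ext, ok := extByType[http.DetectContentType(head)]; ok {
			name += ext
		}
	}
	return name, nil
}

func main() {
	pngHead := []byte("\x89PNG\r\n\x1a\n") // PNG signature bytes
	name, err := fileName("https://example.com/photos/%E7%8C%AB", pngHead)
	if err != nil {
		panic(err)
	}
	fmt.Println(name)
}
```

`http.DetectContentType` looks at no more than the first 512 bytes. Only that prefix of each image would need to be fetched before the name is fixed, which could also help with the preloading item.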

Package last updated on 27 Sep 2018
