.. image:: https://travis-ci.org/eight04/ComicCrawler.svg?branch=master
   :target: https://travis-ci.org/eight04/ComicCrawler
Comic Crawler is a Python script for scraping images. It comes with a simple download manager, a library feature, and convenient extensibility.
Comic Crawler is on `PyPI <https://pypi.python.org/pypi/comiccrawler/>`__. After installing Python, you can install it automatically with the pip command.
Install Python
~~~~~~~~~~~~~~
You need Python 3.11 or later. The installer can be downloaded from the
`official website <https://www.python.org/>`__.

When installing, remember to check "Add python.exe to path", otherwise the pip command won't be available.
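You can then confirm from cmd that Python and pip are on the PATH (the reported version should be 3.11 or later):

::

    python --version
    pip --version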
Install Deno
~~~~~~~~~~~~
Comic Crawler uses Deno to analyze sites that require running JavaScript:
https://docs.deno.com/runtime/manual/getting_started/installation

On Windows 10 (1709) or later, you can install it directly from cmd with:

::

    winget install deno
Install Comic Crawler
~~~~~~~~~~~~~~~~~~~~~

Run the following command in cmd:

::

    pip install comiccrawler

To upgrade:

::

    pip install comiccrawler --upgrade --upgrade-strategy eager

Finally, run the following command in cmd to launch Comic Crawler:

::

    comiccrawler gui
Supported domains
~~~~~~~~~~~~~~~~~

.. DOMAINS ..
163.bilibili.com 8comic.com 99.hhxxee.com ac.qq.com beta.sankakucomplex.com chan.sankakucomplex.com comic.acgn.cc comic.sfacg.com comicbus.com coomer.su copymanga.com danbooru.donmai.us deviantart.com e-hentai.org exhentai.org fanbox.cc fantia.jp gelbooru.com hk.dm5.com ikanman.com imgbox.com jpg4.su kemono.party kemono.su konachan.com linevoom.line.me m.dmzj.com m.manhuabei.com m.wuyouhui.net manga.bilibili.com manhua.dmzj.com manhuagui.com nijie.info pixabay.com raw.senmanga.com seemh.com seiga.nicovideo.jp smp.yoedge.com tel.dm5.com tsundora.com tuchong.com tumblr.com tw.weibo.com twitter.com wix.com www.177pic.info www.1manhua.net www.33am.cn www.36rm.cn www.99comic.com www.aacomic.com www.artstation.com www.buka.cn www.cartoonmad.com www.chuixue.com www.chuixue.net www.cocomanhua.com www.colamanga.com www.comicabc.com www.comicvip.com www.dm5.com www.dmzj.com www.facebook.com www.flickr.com www.gufengmh.com www.gufengmh8.com www.hhcomic.cc www.hheess.com www.hhmmoo.com www.hhssee.com www.hhxiee.com www.iibq.com www.instagram.com www.mangacopy.com www.manhuadui.com www.manhuaren.com www.mh160.com www.mhgui.com www.ohmanhua.com www.pixiv.net www.sankakucomplex.com www.setnmh.com www.tohomh.com www.tohomh123.com www.xznj120.com x.com yande.re
.. END DOMAINS
As a CLI tool:
::

    Usage:
      comiccrawler [--profile=<profile>] (domains | download <url> [--dest=<save_path>] | gui)
      comiccrawler (--help | --version)

    Commands:
      domains    List supported sites.
      download   Download from the given <url>.
      gui        Launch the main window.

    Options:
      --profile  Folder where the settings are stored (default: "~/comiccrawler").
      --dest     Download destination folder (default: ".").
      --help     Show this help message.
      --version  Show version.
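For example, to list the supported sites and then download a gallery into a ``comics`` folder (the URL below is only a placeholder for a page on a supported site):

::

    comiccrawler domains
    comiccrawler download "https://www.example.com/comic/123" --dest=comics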
Or you can use it in your Python script:

.. code:: python

    from comiccrawler.mission import Mission
    from comiccrawler.analyzer import Analyzer
    from comiccrawler.crawler import download

    # create a mission
    m = Mission(url="http://example.com")
    Analyzer(m).analyze()

    # select the episodes you want
    for ep in m.episodes:
        if ep.title != "chapter 123":
            ep.skip = True

    # download to savepath
    download(m, "path/to/save")
.. figure:: http://i.imgur.com/ZzF0YFx.png
   :alt: Main window
.. code:: ini

    [DEFAULT]
    ; The command to run after a download completes. {target} is replaced with
    ; the absolute path of the mission folder.
    runafterdownload = 7z a "{target}.zip" "{target}"

    ; Automatically check the library for updates on startup
    libraryautocheck = true

    ; Update check interval (unit: hours)
    autocheck_interval = 24

    ; Download destination folder. A relative path is resolved against the
    ; profile folder.
    savepath = download

    ; Enable grabber error logging
    errorlog = false

    ; Auto-save every 5 minutes
    autosave = 5

    ; Save files with their original filenames instead of page numbers.
    ; It is strongly recommended NOT to use this option, see
    ; https://github.com/eight04/ComicCrawler/issues/90
    originalfilename = false

    ; Automatically reformat numbers in episode titles, e.g. for zero-padding.
    ; Example: 第1集 -> 第001集
    ; For the format details see
    ; https://docs.python.org/3/library/string.html#format-specification-mini-language
    ; Note: this setting affects every number in the filename, including
    ; alphanumeric IDs such as those used by instagram.
    titlenumberformat = {:03d}

    ; Use an http/https proxy for connections
    proxy = 127.0.0.1:1080

    ; Select all episodes by default when adding a new mission
    selectall = true

    ; Don't create a subfolder for each episode; put all images directly in
    ; the mission folder
    noepfolder = true

    ; Action to take when a duplicate mission is added
    ; update: check for updates
    ; reselect_episodes: reselect the episodes
    mission_conflict_action = update

    ; Whether to verify encrypted (SSL) connections. Default is true.
    verify = false

    ; Read cookies from the browser, using yt-dlp's cookies-from-browser
    ; https://github.com/yt-dlp/yt-dlp/blob/e5d4f11104ce7ea1717a90eea82c0f7d230ea5d5/yt_dlp/cookies.py#L109
    browser = firefox

    ; Name of the browser profile
    browser_profile = act3nn7e.default
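For example, the ``{:03d}`` value shown for ``titlenumberformat`` zero-pads every number in an episode title to three digits. A minimal sketch of the underlying Python format spec:

.. code:: python

    # Illustration only: what the {:03d} format spec does to episode numbers.
    print("{:03d}".format(1))    # -> 001
    print("{:03d}".format(123))  # -> 123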
The settings file is located at ``~\comiccrawler\setting.ini``. You can pass the ``--profile`` option at launch to change the default location. (On Windows, ``~`` is expanded to ``%HOME%`` or ``%USERPROFILE%``.)

Run ``comiccrawler gui`` once and close it, and the settings file is generated automatically. If an update of Comic Crawler adds new settings, they are appended to the file automatically when the program closes.

Individual sites have their own sections, usually for login-related information.

Changes to the settings file take effect after a restart. If Comic Crawler is already running, you can click "重載設定檔" (reload settings) to load the new settings.

.. warning::

    If you edit and save the settings file while Comic Crawler is running and
    then close the program, your changes will be lost, because Comic Crawler
    writes its settings back to the file before exiting.

Per-site sections do not affect each other. For example, with ``savepath = a`` in ``[DEFAULT]`` and ``savepath = b`` in ``[Pixiv]``, everything downloaded from pixiv is saved to folder ``b``, while other sites use the default and are saved to folder ``a``.
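This mirrors how Python's ``configparser`` resolves a key through the ``DEFAULT`` section. A minimal sketch (the section contents are illustrative):

.. code:: python

    # Demonstrates how a per-site section overrides [DEFAULT] while other
    # sections fall back to it (standard configparser behavior).
    from configparser import ConfigParser

    config = ConfigParser()
    config["DEFAULT"] = {"savepath": "a"}
    config["Pixiv"] = {"savepath": "b"}
    config["Example"] = {}

    print(config["Pixiv"]["savepath"])    # b  (site-specific override)
    print(config["Example"]["savepath"])  # a  (falls back to DEFAULT)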
As long as ``browser`` and ``browser_profile`` are set in the settings file, Comic Crawler can read cookies from the browser and log in automatically. However, recent versions of Chrome have hardened their cookie protection, so currently only Firefox works properly.

Some sites let you specify ``cookie`` or ``curl`` in the settings file, but these settings will be phased out in favor of automatic login with browser cookies.
Starting from version 2016.4.21, you can add your own module to ``~/comiccrawler/mods/module_name.py``.
.. code:: python

    #! python3

    """
    This is an example to show how to write a comiccrawler module.
    """

    import re
    from urllib.parse import urljoin, urlparse

    from comiccrawler.episode import Episode

    # The header used in grabber method. Optional.
    header = {}

    # The cookies. Optional.
    cookie = {}

    # Match domain. Sub-domains are supported, which means "example.com" will
    # match "*.example.com".
    domain = ["www.example.com", "comic.example.com"]

    # Module name
    name = "Example"

    # With noepfolder = True, Comic Crawler won't generate a subfolder for
    # each episode. Optional, defaults to False.
    noepfolder = False

    # If False, set up the referer header automatically to mimic browser
    # behavior. If True, disable this behavior.
    # Default: False
    no_referer = True

    # Wait 5 seconds before downloading another image. Optional, defaults to 0.
    rest = 5

    # Wait 5 seconds before analyzing the next page in the analyzer. Optional,
    # defaults to 0.
    rest_analyze = 5

    # User settings which could be modified from setting.ini. The keys are
    # case-sensitive.
    #
    # After loading the module, the config dictionary is converted into a
    # ConfigParser section data object, so you can e.g. call
    # config.getboolean("use_largest_image") directly.
    #
    # Optional.
    config = {
        # The config value can only be str
        "use_largest_image": "true",

        # Special config keys starting with `cookie_` will automatically be
        # used when grabbing html or images.
        "cookie_user": "user-default-value",
        "cookie_hash": "hash-default-value"
    }
    def load_config():
        """This function will be called each time the config reloads.
        Optional.
        """
        pass

    def get_title(html, url):
        """Return the mission title.

        The title is used in the save path, so be sure to avoid duplicated
        titles.
        """
        return re.search("<h1 id='title'>(.+?)</h1>", html).group(1)

    def get_episodes(html, url):
        """Return the episode list.

        The episode list should be sorted by date, oldest first.

        If it is a multi-page list, specify the URL of the next page in
        get_next_page. Comic Crawler will grab the next page and call this
        function again.

        The `Episode` object accepts an `image` property which can be a list
        of `Image`. However, unlike `get_images`, the `Episode` object is
        JSON-stringified and saved to disk, so you must only use
        JSON-compatible types i.e. no `Image.get_url`.
        """
        match_list = re.findall("<a href='(.+?)'>(.+?)</a>", html)
        return [Episode(title, urljoin(url, ep_url))
                for ep_url, title in match_list]
    def get_images(html, url):
        """Get the URLs of all images.

        The return value can be:

        - A list of images.
        - A generator yielding images.
        - A single image, when there is only one image on the current page.

        Comic Crawler treats the following types as an image:

        - str - the URL of the image
        - callable - returns a URL when called
        - comiccrawler.core.Image - use it to provide a customized filename.

        The received value is converted to an Image instance. See
        ``comiccrawler.core.Image.create()``.

        If the episode has multiple pages, use get_next_page to change page.

        Use generators with caution! If the generator raises any error between
        two images, the next call to the generator will always result in
        StopIteration, which means that Comic Crawler will think it has crawled
        all images and navigate to the next page. If you have to call
        grabhtml() for each image (i.e. it may raise HTTPError), use a list of
        callbacks instead!
        """
        return re.findall("<img src='(.+?)'>", html)

    def get_next_page(html, url):
        """Return the URL of the next page."""
        match = re.search("<a id='nextpage' href='(.+?)'>next</a>", html)
        if match:
            return match.group(1)

    def get_next_image_page(html, url):
        """Return the URL of the next page.

        If this method is defined, it will be used by the crawler and
        ``get_next_page`` will be ignored. Therefore ``get_next_page`` will
        only be used by the analyzer.
        """
        pass
    def redirecthandler(response, crawler):
        """The downloader calls this hook if a redirect happens while
        downloading an image. Sometimes services redirect users to an
        unexpected URL. You can check for it here.
        """
        if response.url.endswith("404.jpg"):
            raise Exception("Something went wrong")

    def errorhandler(error, crawler):
        """The downloader calls errorhandler if an error happens while
        downloading an image. Normally you can just ignore this function.
        """
        pass

    def imagehandler(ext, b):
        """If this function exists, Comic Crawler will call it before writing
        the image to disk. This allows the module to modify the image after
        the download.

        @ext  str, file extension, including ".". (e.g. ".jpg")
        @b    The bytes object of the image.

        It should return a (modified_ext, modified_b) tuple.
        """
        return (ext, b)
    def grabhandler(grab_method, url, **kwargs):
        """Called when the crawler is going to make a web request. Use this
        hook to override the default grabber behavior.

        @grab_method  function, could be ``grabhtml`` or ``grabimg``.
        @url          str, request URL.
        @kwargs       other arguments that will be passed to the grabber.

        By returning ``None``, the default grabber is used.
        """
        if "/api/" in url:
            kwargs["headers"] = {"some-api-header": "some-value"}
            return grab_method(url, **kwargs)

    def after_request(crawler, response):
        """Called after the request is made."""
        if response.url.endswith("404.jpg"):
            raise Exception("Something went wrong")

    def session_key(url):
        """Return a key to identify the session. If the key is the same, the
        session is shared. Otherwise, a new session is created.

        For example, you may want to separate the session between the main
        site and the API endpoint.

        Return None to pass the URL to the next key function.
        """
        r = urlparse(url)
        if r.path.startswith("/api/"):
            return (r.scheme, r.netloc, "api")
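During development it can help to run a module's parser functions against saved HTML before adding it to the GUI. A minimal sketch, assuming the example above was saved as ``~/comiccrawler/mods/example.py`` (the file name and the sample HTML are only placeholders):

.. code:: python

    import importlib.util
    from pathlib import Path

    # Load the custom module straight from the mods folder (hypothetical name).
    path = Path.home() / "comiccrawler" / "mods" / "example.py"
    spec = importlib.util.spec_from_file_location("example", path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)

    # Feed the parsers a small HTML snippet matching the regexes shown above.
    html = "<h1 id='title'>My Comic</h1><a href='/ep/1'>chapter 1</a>"
    print(mod.get_title(html, "http://www.example.com/"))     # My Comic
    print(mod.get_episodes(html, "http://www.example.com/"))  # [<Episode ...>]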
Changelog
~~~~~~~~~

- 2024.12.9
- 2024.12.6
- 2024.11.14
- 2024.8.14
- 2024.8.9
- 2024.4.11
- 2024.4.10: ``max_errors`` setting.
- 2024.4.2
- 2024.3.25
- 2024.1.4
- 2023.12.24
- 2023.12.11
- 2023.10.11
- 2023.10.8: ``get_next_image_page``.
- 2022.11.21
- 2022.11.11
- 2022.2.6: Fix: magic import error. Add: support replacing argument in runafterdownload.
- 2022.2.3: Fix: analyze error in seemh. Add: support fantia.jp. Add: support br encoding. Add: open episode URL on right-click when selecting episodes. Add: display completed episodes as green. Add: exponential backoff. Change: use curl in sankaku. Change: skip 404 ep on twitter. Change: use python-magic to detect file type.
- 2021.12.2
- 2021.11.15
- 2021.9.15
- 2021.8.31: ``grabhandler`` hook.
- 2020.10.29
- 2020.9.2: ``no_referer``.
- 2020.6.3: ``verify`` option to disable security check.
- 2019.12.25
- 2019.11.19: ``LastPageError`` in ``get_episodes``; ``.jpg@YYYY-mm-dd``; ``redirecthandler`` hook.
- 2019.11.12: ``get_images`` to raise ``SkipPageError``.
- 2019.10.28
- 2019.10.19
- 2019.9.2
- 2019.8.19
- 2019.7.1
- 2019.5.20
- 2019.5.3
- 2019.3.27
- 2019.3.26
- 2019.3.18
- 2019.3.13
- 2018.12.25
- 2018.11.18
- 2018.10.24: hhmmoo.com for hhxiee.
- 2018.9.30
- 2018.9.24
- 2018.9.23: ``on_success`` is executed when analyzation failed.
- 2018.9.11
- 2018.9.7
- 2018.8.20
- 2018.8.11
- 2018.8.10
- 2018.7.18
- 2018.6.21
- 2018.6.14
- 2018.6.8
- 2018.5.24
- 2018.5.13
- 2018.5.5
- 2018.4.16: ``raise_429`` arg in ``grabhtml``. Add ``retry``.
- 2018.4.8
- 2018.3.18
- 2018.3.15
- 2018.3.9
- 2018.3.7
- 2018.1.30.2
- 2018.1.30.1
- 2018.1.30
- 2017.12.15
- 2017.12.14
- 2017.12.9
- 2017.12.4: `#82 <https://github.com/eight04/ComicCrawler/issues/82>`__, `#83 <https://github.com/eight04/ComicCrawler/issues/83>`__.
- 2017.11.29: `#81 <https://github.com/eight04/ComicCrawler/issues/81>`__, `动漫之家助手 <https://greasyfork.org/zh-TW/scripts/33087-%E5%8A%A8%E6%BC%AB%E4%B9%8B%E5%AE%B6%E5%8A%A9%E6%89%8B>`__, `#78 <https://github.com/eight04/ComicCrawler/issues/78>`__.
- 2017.9.9
- 2017.9.5
- 2017.8.31
- 2017.8.26
- 2017.8.20.1
- 2017.8.20: ``proxy``.
- 2017.8.16
- 2017.8.13: `#66 <https://github.com/eight04/ComicCrawler/issues/66>`__.
- 2017.6.14
- 2017.5.29: `#63 <https://github.com/eight04/ComicCrawler/issues/63>`__.
- 2017.5.26
- 2017.5.22: `#62 <https://github.com/eight04/ComicCrawler/issues/62>`__.
- 2017.5.19: `#58 <https://github.com/eight04/ComicCrawler/issues/58>`__, `#59 <https://github.com/eight04/ComicCrawler/issues/59>`__, `#61 <https://github.com/eight04/ComicCrawler/issues/61>`__.
- 2017.5.5: ``<title>`` as title in search result (pixiv).
- 2017.4.26: `#54 <https://github.com/eight04/ComicCrawler/issues/54>`__.
- 2017.4.24
- 2017.4.23
- 2017.4.22
- 2017.4.18: `#49 <https://github.com/eight04/ComicCrawler/issues/49>`__, `#47 <https://github.com/eight04/ComicCrawler/issues/47>`__, `#46 <https://github.com/eight04/ComicCrawler/issues/46>`__, `#45 <https://github.com/eight04/ComicCrawler/issues/45>`__.
- 2017.4.6
- 2017.4.3
- 2017.3.26
- 2017.3.25
- 2017.3.9: `#36 <https://github.com/eight04/ComicCrawler/issues/36>`__.
- 2017.3.6: `#35 <https://github.com/eight04/ComicCrawler/issues/35>`__.
- 2017.2.5: `#33 <https://github.com/eight04/ComicCrawler/issues/33>`__.
- 2017.1.10: `#31 <https://github.com/eight04/ComicCrawler/issues/31>`__.
- 2017.1.6: `#30 <https://github.com/eight04/ComicCrawler/pull/30>`__ by `@kuanyui <https://github.com/kuanyui>`__.
- 2017.1.3.1: ``comiccrawler.core.Image``.
- 2017.1.3
- 2016.12.20
- 2016.12.6
- 2016.12.1: ``mimetypes.guess_extension`` is not reliable with ``application/octet-stream``. ``.webp`` to valid file type.
- 2016.11.27
- 2016.11.25
- 2016.11.2: `#16 <https://github.com/eight04/ComicCrawler/issues/16>`__.
- 2016.10.8
- 2016.10.4
- 2016.9.30: ``params`` option to grabber.
- 2016.9.27
- 2016.9.11
- 2016.8.24.1
- 2016.8.24
- 2016.8.22
- 2016.8.19
- 2016.8.8
- 2016.7.2
- 2016.7.1: ``clam`` theme for GUI under linux. ``pythreadworker`` to 0.6. ``gui.get_scale``.
- 2016.6.30
- 2016.6.25: ``(error, crawler)`` instead of ``(error, episode)``.
- 2016.6.14.1
- 2016.6.14
- 2016.6.12
- 2016.6.10
- 2016.6.4: ``~/comiccrawler/pool/`` folder to save the memory.
- 2016.6.3: ``runafterdownload`` command both from the default section and the module section.
- 2016.5.30
- 2016.5.28
- 2016.5.24
- 2016.5.20
- 2016.5.15
- 2016.5.2: ``Content-Type`` header to guess file extension.
- 2016.5.1.1: ``Episode.image``, so the module can supply image list during constructing Episode.
- 2016.5.1
- 2016.4.27
- 2016.4.26.1
- 2016.4.26
- 2016.4.22.3: ``ComicCrawler`` section to replace ``DEFAULT`` section.
- 2016.4.22.2
- 2016.4.22.1
- 2016.4.22
- 2016.4.21: ``~/comiccrawler/mods``.
- 2016.4.20
- 2016.4.13
- 2016.4.8
- 2016.4.4
- 2016.4.2: ``mods.get_module``.
- 2016.3.27: `#8 <https://github.com/eight04/ComicCrawler/issues/8>`__.
- 2016.2.29
- 2016.2.27
- 2016.2.15.1
- 2016.2.15: ``lastcheckupdate`` setting. Now the library will only automatically check updates once a day.
- 2016.1.26
- 2016.1.17
- 2016.1.15
- 2016.1.13
- 2016.1.12: ``circular`` option in module, which should be set to ``True`` if downloader doesn't know which is the last page of the album (e.g. Facebook).
- 2016.1.3
- 2015.12.9
- 2015.11.8
- 2015.10.25
- 2015.10.9
- 2015.10.8
- 2015.10.7
- 2015.10.6
- 2015.9.29
- 2015.8.7
- 2015.7.31
- 2015.7.23: http://manhua.dmzj.com/name => http://m.dmzj.com/info/name.html
- 2015.7.22
- 2015.7.17
- 2015.7.16
- 2015.7.15
- 2015.7.14
- 2015.7.7
- 2015.7.6
- 2015.7.5
- 2015.7.4
- 2015.6.22
- 2015.6.18
- 2015.6.14: ``safeprint``: use ``echo`` command. ``content_write``: add ``append=False`` option. ``Crawler``: cache imgurl. ``grabber``: add ``cookie=None`` option; change errorlog behavior. Fix ``grabber`` unicode encoding issue.
- 2015.6.13: ``clean_finished``, ``console_download``, ``get_by_state``.