scrapy-folder-tree
This is a Scrapy pipeline that provides an easy way to store files and images using various folder structures.
Supported folder structures:
Given the scraped file 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg, you can choose one of the following folder structures:
Using file name
class: scrapy_folder_tree.ImagesHashTreePipeline
full
├── 0
. ├── 5
. . ├── b
. . . ├── 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
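The layout above can be reproduced with a few lines of Python. This is only a sketch of the idea, not the package's actual implementation: the function name `hash_tree_path` and its `depth` and `root` parameters are hypothetical, and it assumes each folder level is one leading character of the file name.

```python
from pathlib import PurePosixPath


def hash_tree_path(filename: str, depth: int = 3, root: str = "full") -> str:
    """Nest a file under one folder per leading character of its name.

    Hypothetical helper for illustration; not part of scrapy_folder_tree.
    """
    stem = PurePosixPath(filename).stem
    # Take the first `depth` characters of the name, one folder level each.
    folders = list(stem[:depth])
    return str(PurePosixPath(root, *folders, filename))


print(hash_tree_path("05b40af07cb3284506acbf395452e0e93bfc94c8.jpg"))
# full/0/5/b/05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
```

Because the file name is a hash, its leading characters are roughly uniformly distributed, so files spread evenly across the folders.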
Using crawling time
class: scrapy_folder_tree.ImagesTimeTreePipeline
full
├── 0
. ├── 11
. . ├── 48
. . . ├── 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
Using crawling date
class: scrapy_folder_tree.ImagesDateTreePipeline
full
├── 2022
. ├── 1
. . ├── 24
. . . ├── 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
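The time and date layouts can be sketched the same way. The helpers below are hypothetical and not taken from the package. They assume that the time tree's levels are hour, minute and second, that the date tree's levels are year, month and day, and that no level is zero-padded, which is inferred from the example trees rather than confirmed by the package.

```python
from datetime import datetime
from pathlib import PurePosixPath


def time_tree_path(filename: str, when: datetime, root: str = "full") -> str:
    """Hypothetical helper: nest a file under hour/minute/second folders."""
    parts = (str(when.hour), str(when.minute), str(when.second))
    return str(PurePosixPath(root, *parts, filename))


def date_tree_path(filename: str, when: datetime, root: str = "full") -> str:
    """Hypothetical helper: nest a file under year/month/day folders."""
    parts = (str(when.year), str(when.month), str(when.day))
    return str(PurePosixPath(root, *parts, filename))


# Assumed crawl timestamp matching the two example trees.
crawled_at = datetime(2022, 1, 24, 0, 11, 48)
name = "05b40af07cb3284506acbf395452e0e93bfc94c8.jpg"

print(time_tree_path(name, crawled_at))
print(date_tree_path(name, crawled_at))
```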
Installation
pip install scrapy_folder_tree
Usage
Use the following settings in your project:
ITEM_PIPELINES = {
    'scrapy_folder_tree.FilesHashTreePipeline': 300,
}
FOLDER_TREE_DEPTH = 3
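A slightly fuller `settings.py` might look like the sketch below. `FILES_STORE` is Scrapy's standard setting for the directory where downloaded files are saved. This sketch assumes the pipelines in this package build on Scrapy's file pipeline and therefore honour it, and the `downloads` path is only a placeholder.

```python
# settings.py (sketch, under the assumptions stated above)

ITEM_PIPELINES = {
    "scrapy_folder_tree.FilesHashTreePipeline": 300,
}

# Standard Scrapy setting: base directory for downloaded files.
# "downloads" is a placeholder path, not a value required by the package.
FILES_STORE = "downloads"

# Number of nested folder levels, as in the examples above.
FOLDER_TREE_DEPTH = 3
```

For images, the matching choice would presumably be one of the `Images...TreePipeline` classes listed above together with Scrapy's `IMAGES_STORE` setting.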