.. image:: https://badges.frapsoft.com/os/mit/mit.png?v=103
    :target: https://opensource.org/licenses/mit-license.php
    :alt: MIT License

.. image:: https://badge.fury.io/py/gs-wrap.svg
    :target: https://badge.fury.io/py/gs-wrap
    :alt: PyPI - version

.. image:: https://img.shields.io/pypi/pyversions/gs-wrap.svg
    :alt: PyPI - Python Version

.. image:: https://readthedocs.org/projects/gs-wrap/badge/?version=latest
    :target: https://gs-wrap.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status
gs-wrap
=======

``gs-wrap`` wraps the `Google Cloud Storage API
<https://cloud.google.com/storage/>`_ for multi-threaded data manipulation,
including copying, reading, writing and hashing.
Originally, we used our `gsutilwrap <https://github.com/Parquery/gsutilwrap/>`_,
a thin wrapper around the ``gsutil`` command-line interface, to simplify the
deployment and backup tasks related to Google Cloud Storage. However,
``gsutilwrap`` was prohibitively slow at copying many objects into different
destinations. Therefore we developed ``gs-wrap`` to accelerate these
operations while keeping it equally fast or faster than ``gsutilwrap`` at
other operations.
While the `google-cloud-storage
<https://github.com/googleapis/google-cloud-python/tree/master/storage/>`_
library provided by Google offers sophisticated features and good performance,
its use cases and behavior differ from ``gsutil``. Since we wanted the
simplicity and usage patterns of ``gsutil``, we created ``gs-wrap``, which
wraps ``google-cloud-storage`` at its core and sets its interface to behave
like ``gsutil``.
``gs-wrap`` is not the first Python library wrapping the Google Cloud Storage
API. `cloud-storage-client <https://github.com/Rakanixu/cloud-storage-client/>`_
takes a similar approach and aims to manage both Amazon's S3 and Google Cloud
Storage. Parts of it are also based on ``google-cloud-storage``; however, the
library's behavior differs from ``gsutil``, which made it hard to use as an
in-place replacement for ``gsutilwrap``. Additionally, the library did not
offer all the operations we needed, for example copying to many destinations,
reading, writing and hashing.
The main strength of ``gs-wrap`` is its ability to copy many objects from many
different paths to multiple destinations, while still mimicking the ``gsutil``
interface. A direct performance comparison between ``gs-wrap`` and
``gsutilwrap`` can be found in the section `Benchmarks
<https://github.com/Parquery/gs-wrap#benchmarks>`_.
Usage
-----

You need to create a Google Cloud Storage bucket to use this client library.
Follow along with the `official Google Cloud Storage documentation
<https://cloud.google.com/storage/docs/cloud-console#_creatingbuckets>`_ to
learn how to create a bucket.
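For instance, a bucket can also be created programmatically with the
underlying ``google-cloud-storage`` library. This is a minimal sketch, not
part of ``gs-wrap``; the project and bucket names are placeholders:

.. code-block:: python

    from google.cloud import storage

    # Assumption: you are already authenticated via the Google Cloud SDK.
    storage_client = storage.Client(project="your-project")

    # Bucket names are globally unique across all of Google Cloud Storage.
    bucket = storage_client.create_bucket("your-bucket")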
First, a client for interacting with the Google Cloud Storage API needs to be
created. Internally, it uses the `Storage Client
<https://googleapis.github.io/google-cloud-python/latest/storage/client.html#google.cloud.storage.client.Client/>`_
from ``google-cloud-storage``.

One parameter can be passed to the client:

The Google Cloud Storage project which the client acts on behalf of. It is
passed when creating the internal client. If not passed, it falls back to the
default inferred from the locally authenticated `Google Cloud SDK
<http://cloud.google.com/sdk>`_ environment. Each project needs a separate
client; operations between two different projects are not supported.
.. code-block:: python

    import gswrap

    client = gswrap.Client()  # project is optional
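To act on behalf of a specific project, it can be passed explicitly. This is
a sketch; we assume the parameter is named ``project``, and ``your-project``
is a placeholder:

.. code-block:: python

    import gswrap

    # Assumption: "project" is the keyword for the single client parameter
    # described above.
    client = gswrap.Client(project="your-project")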
.. warning::

    Wildcards (\*, \*\*, \?, \[chars\], \[char range\]) are supported neither
    by the Google Cloud Storage API nor by ``gs-wrap`` at the moment
    [2019-01-16]. The reasons are that ``gsutil`` wildcards can hardly be
    reconstructed equivalently and that the top-level search is extremely
    inefficient. More information about ``gsutil`` wildcards can be found
    here: `<https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames>`_
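As a workaround, you can approximate a wildcard match yourself by listing
recursively and filtering the returned URLs. This is a sketch, assuming
``client.ls`` returns the URLs as a list of strings (as the listing example
below suggests):

.. code-block:: python

    import fnmatch

    urls = client.ls(gcs_url="gs://your-bucket/your-dir", recursive=True)

    # Keep only the URLs that match a gsutil-like pattern.
    text_files = [url for url in urls
                  if fnmatch.fnmatch(url, "gs://your-bucket/your-dir/*.txt")]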
.. code-block:: python

    client.ls(gcs_url="gs://your-bucket/your-dir", recursive=False)
    # gs://your-bucket/your-dir/your-subdir1/
    # gs://your-bucket/your-dir/your-subdir2/
    # gs://your-bucket/your-dir/file1

    client.ls(gcs_url="gs://your-bucket/your-dir", recursive=True)
    # gs://your-bucket/your-dir/your-subdir1/file1
    # gs://your-bucket/your-dir/your-subdir1/file2
    # gs://your-bucket/your-dir/your-subdir2/file1
    # gs://your-bucket/your-dir/file1
If both the source and destination URL are cloud URLs from the same provider,
``gsutil`` copies data "in the cloud" (i.e. without downloading to and
uploading from the machine where you run ``gs-wrap``).
.. note::

    ``client.cp()`` runs single-threaded by default. When multi-threading is
    activated, the maximum number of workers is the number of processors on
    the machine, multiplied by 5. This is the multi-threading default of
    ``ThreadPoolExecutor`` from the `concurrent.futures library
    <https://docs.python.org/3.5/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor>`_.
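For illustration, the worker cap described above corresponds to the following
computation (a sketch, not part of the ``gs-wrap`` API):

.. code-block:: python

    import os

    # ThreadPoolExecutor's default max_workers in Python 3.5: CPU count * 5.
    max_workers = (os.cpu_count() or 1) * 5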
.. code-block:: python

    # your-bucket before:
    # gs://your-bucket/file1
    client.cp(src="gs://your-bucket/file1",
              dst="gs://your-bucket/your-dir/",
              recursive=True)
    # your-bucket after:
    # gs://your-bucket/file1
    # gs://your-bucket/your-dir/file1

    # your-backup-bucket before:
    # "empty"
    client.cp(src="gs://your-bucket/file1",
              dst="gs://your-backup-bucket/backup-file1",
              recursive=False)
    # your-backup-bucket after:
    # gs://your-backup-bucket/backup-file1
.. code-block:: python

    # your-bucket before:
    # "empty"
    client.cp(src="gs://your-bucket/some-dir/",
              dst="gs://your-bucket/another-dir/", recursive=False)
    # google.api_core.exceptions.GoogleAPIError: No URLs matched

    # your-bucket before:
    # gs://your-bucket/some-dir/file1
    # gs://your-bucket/some-dir/dir1/file11

    # Destination URL without a trailing slash
    client.cp(src="gs://your-bucket/some-dir/",
              dst="gs://your-bucket/another-dir", recursive=True)
    # your-bucket after:
    # gs://your-bucket/another-dir/file1
    # gs://your-bucket/another-dir/dir1/file11

    # Destination URL with a trailing slash
    client.cp(src="gs://your-bucket/some-dir/",
              dst="gs://your-bucket/another-dir/", recursive=True)
    # your-bucket after:
    # gs://your-bucket/another-dir/some-dir/file1
    # gs://your-bucket/another-dir/some-dir/dir1/file11

    # Choose to copy multi-threaded (default: False).
    client.cp(src="gs://your-bucket/some-dir/",
              dst="gs://your-bucket/another-dir", recursive=True,
              multithreaded=True)
    # your-bucket after:
    # gs://your-bucket/another-dir/file1
    # gs://your-bucket/another-dir/dir1/file11
.. note::

    **recursive** causes directories, buckets, and bucket subdirectories to be
    copied recursively. If you upload from local disk to Google Cloud Storage
    and set recursive to ``False``, ``gs-wrap`` will raise an exception and
    inform you that no URL matched. This mimics the behavior of ``gsutil``
    when no wildcards are used.
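If you prefer to handle this case explicitly, the exception can be caught. A
sketch, assuming the same ``GoogleAPIError`` as in the copy example above:

.. code-block:: python

    import google.api_core.exceptions

    try:
        client.cp(src="/home/user/storage/",
                  dst="gs://your-bucket/local/",
                  recursive=False)
    except google.api_core.exceptions.GoogleAPIError as error:
        print(error)  # No URLs matched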
.. code-block:: python

    # Your local directory:
    # /home/user/storage/file1
    # /home/user/storage/file2
    # your-bucket before:
    # "empty"

    client.cp(src="/home/user/storage/",
              dst="gs://your-bucket/local/",
              recursive=True)

    # your-bucket after:
    # gs://your-bucket/local/storage/file1
    # gs://your-bucket/local/storage/file2
.. code-block:: python

    # your-bucket before:
    # gs://your-bucket/file1

    client.cp(src="gs://your-bucket/file1",
              dst="/home/user/storage/file1")

    # Your local directory after:
    # /home/user/storage/file1
.. note::

    All parameters can be used for any kind of ``cp`` operation.
.. code-block:: python

    # Parameter no_clobber example:
    import os

    # File content before: "hello"
    os.stat("/home/user/storage/file1").st_mtime  # 1537947563

    client.cp(src="gs://your-bucket/file1",
              dst="/home/user/storage/file1",
              no_clobber=True)

    # The no_clobber option prevents overwriting.
    # File content after: "hello"
    os.stat("/home/user/storage/file1").st_mtime  # 1537947563

    client.cp(src="gs://your-bucket/file1",
              dst="/home/user/storage/file1",
              no_clobber=False)

    # File content after: "hello world"
    os.stat("/home/user/storage/file1").st_mtime  # 1540889799

    # Parameters recursive and multithreaded example:
    # Your local directory:
    # /home/user/storage/file1
    # ...
    # /home/user/storage/file1000
    # your-bucket before:
    # "empty"

    # Execute a normal recursive copy in multiple threads.
    client.cp(src="/home/user/storage/",
              dst="gs://your-bucket/local/",
              recursive=True, multithreaded=True)

    # your-bucket after:
    # gs://your-bucket/local/storage/file1
    # ...
    # gs://your-bucket/local/storage/file1000

    # Parameter preserve_posix example:
    # Your file before:
    # /home/user/storage/file1
    # e.g. file_mtime: 1547653413, equivalent to 2019-01-16 16:43:33

    client.cp(src="/home/user/storage/file1",
              dst="gs://your-backup-bucket/file1",
              preserve_posix=False)
    # your-backup-bucket after:
    # gs://your-backup-bucket/file1, with no file_mtime metadata

    # Preserve the POSIX attributes. POSIX attributes are the metadata of a file.
    client.cp(src="/home/user/storage/file1",
              dst="gs://your-backup-bucket/file1",
              preserve_posix=True)
    # your-backup-bucket after:
    # gs://your-backup-bucket/file1, with file_mtime: 2019-01-16 16:43:33
.. code-block:: python

    import pathlib

    sources_destinations = [
        # Copy within Google Cloud Storage
        ('gs://your-bucket/your-dir/file',
         'gs://your-bucket/backup-dir/file'),

        # Copy from Google Cloud Storage to local
        ('gs://your-bucket/your-dir/file',
         pathlib.Path('/home/user/storage/backup-file')),

        # Copy from local to Google Cloud Storage
        (pathlib.Path('/home/user/storage/new-file'),
         'gs://your-bucket/your-dir/new-file'),

        # Copy locally
        (pathlib.Path('/home/user/storage/file'),
         pathlib.Path('/home/user/storage/new-file'))]

    client.cp_many_to_many(srcs_dsts=sources_destinations)
.. code-block:: python

    # your-bucket before:
    # gs://your-bucket/file
    client.rm(url="gs://your-bucket/file")
    # your-bucket after:
    # "empty"

    # your-bucket before:
    # gs://your-bucket/file1
    # gs://your-bucket/your-dir/file2
    # gs://your-bucket/your-dir/sub-dir/file3
    client.rm(url="gs://your-bucket/your-dir", recursive=True)
    # your-bucket after:
    # gs://your-bucket/file1
.. code-block:: python

    client.write_text(url="gs://your-bucket/file",
                      text="Hello, I'm text",
                      encoding='utf-8')

    client.read_text(url="gs://your-bucket/file",
                     encoding='utf-8')
    # Hello, I'm text

    client.write_bytes(url="gs://your-bucket/data",
                       data="I'm important data".encode('utf-8'))

    data = client.read_bytes(url="gs://your-bucket/data")
    data.decode('utf-8')
    # I'm important data
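These helpers also make it easy to round-trip small structured payloads. A
sketch using only ``write_bytes`` and ``read_bytes`` as documented above:

.. code-block:: python

    import json

    record = {"name": "example", "size": 42}

    # Serialize to JSON and store the bytes in the bucket.
    client.write_bytes(url="gs://your-bucket/record.json",
                       data=json.dumps(record).encode('utf-8'))

    # Read the bytes back and restore the dictionary.
    restored = json.loads(
        client.read_bytes(url="gs://your-bucket/record.json").decode('utf-8'))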
.. note::

    POSIX attributes include meta information about a file. When copying a
    file locally or copying a file within Google Cloud Storage, the POSIX
    attributes are always preserved. On the other hand, when downloading or
    uploading files to Google Cloud Storage, the POSIX attributes are only
    preserved when **preserve_posix** is set to ``True``.
.. code-block:: python

    import pathlib

    file = pathlib.Path('/home/user/storage/file')
    file.touch()
    print(file.stat())
    # os.stat_result(st_mode=33204, st_ino=19022665, st_dev=64769, st_nlink=1,
    # st_uid=1000, st_gid=1000, st_size=0, st_atime=1544015997,
    # st_mtime=1544015997, st_ctime=1544015997)

    # An upload without preserve_posix does not preserve POSIX attributes.
    client.cp(src=pathlib.Path('/home/user/storage/file'),
              dst="gs://your-bucket/file")

    stats = client.stat(url="gs://your-bucket/file")
    stats.creation_time  # 2018-11-21 13:27:46.255000+00:00
    stats.update_time  # 2018-11-21 13:27:46.255000+00:00
    stats.content_length  # 1024 [bytes]
    stats.storage_class  # REGIONAL
    stats.file_atime  # None
    stats.file_mtime  # None
    stats.posix_uid  # None
    stats.posix_gid  # None
    stats.posix_mode  # None
    stats.md5  # b'1B2M2Y8AsgTpgAmY7PhCfg=='
    stats.crc32c  # b'AAAAAA=='

    # Uploading with preserve_posix also copies the POSIX attributes to the
    # blob. POSIX attributes are the metadata of a file.
    # The same works for downloading.
    client.cp(src=pathlib.Path('/home/user/storage/file'),
              dst="gs://your-bucket/file", preserve_posix=True)

    stats = client.stat(url="gs://your-bucket/file")
    stats.creation_time  # 2018-11-21 13:27:46.255000+00:00
    stats.update_time  # 2018-11-21 13:27:46.255000+00:00
    stats.content_length  # 1024 [bytes]
    stats.storage_class  # REGIONAL
    stats.file_atime  # 2018-11-21 13:27:46
    stats.file_mtime  # 2018-11-21 13:27:46
    stats.posix_uid  # 1000
    stats.posix_gid  # 1000
    stats.posix_mode  # 777
    stats.md5  # b'1B2M2Y8AsgTpgAmY7PhCfg=='
    stats.crc32c  # b'AAAAAA=='
.. code-block:: python

    # Check the modification time when copied with preserve_posix.
    client.same_modtime(path='/home/user/storage/file',
                        url='gs://your-bucket/file')

    # Check the MD5 hash to ensure content equality.
    client.same_md5(path='/home/user/storage/file', url='gs://your-bucket/file')

    # Retrieve hex digests of MD5 checksums for multiple URLs.
    urls = ['gs://your-bucket/file1', 'gs://your-bucket/file2']
    client.md5_hexdigests(urls=urls, multithreaded=False)
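These checks compose naturally into a verified backup. A sketch built only
from the functions documented above; the paths are placeholders:

.. code-block:: python

    import pathlib

    local_file = pathlib.Path('/home/user/storage/file')

    # Upload, then delete the local copy only if the remote MD5 matches.
    client.cp(src=local_file, dst="gs://your-bucket/file")
    if client.same_md5(path=str(local_file), url="gs://your-bucket/file"):
        local_file.unlink()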
Documentation
-------------

The documentation is available on `readthedocs
<https://gs-wrap.readthedocs.io/en/latest/>`_.
Setup
-----

In order to use this library, you need to go through the following steps:

1. `Select or create a Cloud Platform project. <https://console.cloud.google.com/project>`_
2. `Enable billing for your project. <https://console.cloud.google.com/project>`_
3. `Enable the Google Cloud Storage API. <https://cloud.google.com/storage>`_
4. `Setup Authentication using the Google Cloud SDK. <https://googlecloudplatform.github.io/google-cloud-python/latest/core/auth.html>`_ (see the sketch below)
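One common way to set up local authentication is through the Google Cloud SDK.
A sketch, assuming ``gcloud`` is installed:

.. code-block:: bash

    # Log in and store application-default credentials locally.
    gcloud auth application-default login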
Installation
------------

Install ``gs-wrap`` with pip:

.. code-block:: bash

    pip3 install gs-wrap
Development
-----------

* Check out the repository.

* In the repository root, create the virtual environment:

  .. code-block:: bash

      python3 -m venv venv3

* Activate the virtual environment:

  .. code-block:: bash

      source venv3/bin/activate

* Install the development dependencies:

  .. code-block:: bash

      pip3 install -e .[dev]
We use ``tox`` for testing and packaging the distribution. Assuming that the
virtual environment has been activated and the development dependencies have
been installed, run:

.. code-block:: bash

    tox
Pre-commit Checks
-----------------

We provide a set of pre-commit checks that lint and check code for formatting.

Namely, we use:

* `yapf <https://github.com/google/yapf>`_ to check the formatting.
* `pydocstyle <https://github.com/PyCQA/pydocstyle>`_.
* `mypy <http://mypy-lang.org/>`_.
* `isort <https://github.com/timothycrosley/isort>`_ to sort your imports for you.
* `pylint <https://www.pylint.org/>`_.
* The `doctest module <https://docs.python.org/3.5/library/doctest.html>`_.
* `pyicontract-lint <https://github.com/Parquery/pyicontract-lint/>`_ lints contracts in Python code defined with the `icontract library <https://github.com/Parquery/icontract/>`_.
* `twine <https://pypi.org/project/twine/>`_ to check the README for invalid markup which prevents it from rendering correctly on PyPI.

Run the pre-commit checks locally from an activated virtual environment with
development dependencies:
.. code-block:: bash

    ./precommit.py

The pre-commit script can also automatically re-format the code in place:

.. code-block:: bash

    ./precommit.py --overwrite
Benchmarks
----------

Assuming that the virtual environment has been activated, the development
dependencies have been installed and the ``PYTHONPATH`` has been set to the
project directory, run the benchmarks with:

.. code-block:: bash

    ./benchmark/main.py *NAME OF YOUR GCS BUCKET*
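For example, from the repository root (a sketch; ``your-gcs-bucket`` is a
placeholder for your own bucket name):

.. code-block:: bash

    # Point PYTHONPATH to the project directory before running the benchmarks.
    export PYTHONPATH="$(pwd)"
    ./benchmark/main.py your-gcs-bucket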
Here are some of our benchmark results:
Benchmark list 10000 files:

+------------+---------+---------+
| TESTED     | TIME    | SPEEDUP |
+------------+---------+---------+
| gswrap     | 3.22 s  | -       |
+------------+---------+---------+
| gsutilwrap | 3.98 s  | 1.24 x  |
+------------+---------+---------+

Benchmark upload 10000 files:

+------------+---------+---------+
| TESTED     | TIME    | SPEEDUP |
+------------+---------+---------+
| gswrap     | 45.12 s | -       |
+------------+---------+---------+
| gsutilwrap | 34.85 s | 0.77 x  |
+------------+---------+---------+

Benchmark upload-many-to-many 500 files:

+------------+---------+---------+
| TESTED     | TIME    | SPEEDUP |
+------------+---------+---------+
| gswrap     | 2.14 s  | -       |
+------------+---------+---------+
| gsutilwrap | 65.2 s  | 30.49 x |
+------------+---------+---------+

Benchmark download 10000 files:

+------------+---------+---------+
| TESTED     | TIME    | SPEEDUP |
+------------+---------+---------+
| gswrap     | 43.92 s | -       |
+------------+---------+---------+
| gsutilwrap | 43.01 s | 0.98 x  |
+------------+---------+---------+

Benchmark download-many-to-many 500 files:

+------------+---------+---------+
| TESTED     | TIME    | SPEEDUP |
+------------+---------+---------+
| gswrap     | 5.85 s  | -       |
+------------+---------+---------+
| gsutilwrap | 62.93 s | 10.76 x |
+------------+---------+---------+

Benchmark copy on remote 1000 files:

+------------+---------+---------+
| TESTED     | TIME    | SPEEDUP |
+------------+---------+---------+
| gswrap     | 5.09 s  | -       |
+------------+---------+---------+
| gsutilwrap | 4.47 s  | 0.88 x  |
+------------+---------+---------+

Benchmark copy-many-to-many-on-remote 500 files:

+------------+---------+---------+
| TESTED     | TIME    | SPEEDUP |
+------------+---------+---------+
| gswrap     | 6.55 s  | -       |
+------------+---------+---------+
| gsutilwrap | 62.76 s | 9.57 x  |
+------------+---------+---------+

Benchmark remove 1000 files:

+------------+---------+---------+
| TESTED     | TIME    | SPEEDUP |
+------------+---------+---------+
| gswrap     | 3.16 s  | -       |
+------------+---------+---------+
| gsutilwrap | 3.66 s  | 1.16 x  |
+------------+---------+---------+

Benchmark read 100 files:

+------------+---------+---------+
| TESTED     | TIME    | SPEEDUP |
+------------+---------+---------+
| gswrap     | 16.56 s | -       |
+------------+---------+---------+
| gsutilwrap | 64.73 s | 3.91 x  |
+------------+---------+---------+

Benchmark write 30 files:

+------------+---------+---------+
| TESTED     | TIME    | SPEEDUP |
+------------+---------+---------+
| gswrap     | 2.67 s  | -       |
+------------+---------+---------+
| gsutilwrap | 32.55 s | 12.17 x |
+------------+---------+---------+

Benchmark stat 100 files:

+------------+---------+---------+
| TESTED     | TIME    | SPEEDUP |
+------------+---------+---------+
| gswrap     | 6.39 s  | -       |
+------------+---------+---------+
| gsutilwrap | 48.15 s | 7.53 x  |
+------------+---------+---------+
All results of our benchmarks can be found `here
<https://github.com/Parquery/gs-wrap/blob/master/benchmark/benchmark_results>`_.
Versioning
----------

We follow `Semantic Versioning <http://semver.org/spec/v1.0.0.html>`_.
The version X.Y.Z indicates:

* X is the major version (backward-incompatible),
* Y is the minor version (backward-compatible), and
* Z is the patch version (backward-compatible bug fix).