Security News
Research
Data Theft Repackaged: A Case Study in Malicious Wrapper Packages on npm
The Socket Research Team breaks down a malicious wrapper package that uses obfuscation to harvest credentials and exfiltrate sensitive data.
A Taskcluster client library for Python.
This library is a complete interface to Taskcluster in Python. It provides both synchronous and asynchronous interfaces for all Taskcluster API methods.
For a general guide to using Taskcluster clients, see Calling Taskcluster APIs.
Before calling an API end-point, you'll need to create a client instance.
There is a class for each service, e.g., Queue
and Auth
. Each takes the
same options, described below. Note that only rootUrl
is
required, and it's unusual to configure any other options aside from
credentials
.
For each service, there are sync and async variants. The classes under
taskcluster
(e.g., taskcluster.Queue
) operate synchronously. The classes
under taskcluster.aio
(e.g., taskcluster.aio.Queue
) are asynchronous.
Here is a simple set-up of an Index client:
import taskcluster
index = taskcluster.Index({
'rootUrl': 'https://tc.example.com',
'credentials': {'clientId': 'id', 'accessToken': 'accessToken'},
})
The rootUrl
option is required as it gives the Taskcluster deployment to
which API requests should be sent. Credentials are only required if the
request is to be authenticated -- many Taskcluster API methods do not require
authentication.
In most cases, the root URL and Taskcluster credentials should be provided in standard environment variables. Use taskcluster.optionsFromEnvironment()
to read these variables automatically:
auth = taskcluster.Auth(taskcluster.optionsFromEnvironment())
Note that this function does not respect TASKCLUSTER_PROXY_URL
. To use the Taskcluster Proxy from within a task:
auth = taskcluster.Auth({'rootUrl': os.environ['TASKCLUSTER_PROXY_URL']})
If you wish to perform requests on behalf of a third-party that has small set
of scopes than you do. You can specify which scopes your request should be
allowed to
use,
in the authorizedScopes
option.
opts = taskcluster.optionsFromEnvironment()
opts['authorizedScopes'] = ['queue:create-task:highest:my-provisioner/my-worker-type']
queue = taskcluster.Queue(opts)
The following additional options are accepted when constructing a client object:
signedUrlExpiration
- default value for the expiration
argument to buildSignedUrl
maxRetries
- maximum number of times to retry a failed requestAPI methods are available as methods on the corresponding client object. For sync clients, these are sync methods, and for async clients they are async methods; the calling convention is the same in either case.
There are four calling conventions for methods:
client.method(v1, v1, payload)
client.method(payload, k1=v1, k2=v2)
client.method(payload=payload, query=query, params={k1: v1, k2: v2})
client.method(v1, v2, payload=payload, query=query)
Here, v1
and v2
are URL parameters (named k1
and k2
), payload
is the
request payload, and query
is a dictionary of query arguments.
For example, in order to call an API method with query-string arguments:
await queue.listTaskGroup('JzTGxwxhQ76_Tt1dxkaG5g',
query={'continuationToken': previousResponse.get('continuationToken')})
It is often necessary to generate the URL for an API method without actually calling the method.
To do so, use buildUrl
or, for an API method that requires authentication, buildSignedUrl
.
import taskcluster
index = taskcluster.Index(taskcluster.optionsFromEnvironment())
print(index.buildUrl('findTask', 'builds.v1.latest'))
secrets = taskcluster.Secrets(taskcluster.optionsFromEnvironment())
print(secret.buildSignedUrl('get', 'my-secret'))
Note that signed URLs are time-limited; the expiration can be set with the signedUrlExpiration
option to the client constructor, or with the expiration
keyword arguement to buildSignedUrl
, both given in seconds.
If you have non-temporary taskcluster credentials you can generate a set of temporary credentials as follows. Notice that the credentials cannot last more than 31 days, and you can only revoke them by revoking the credentials that was used to issue them (this takes up to one hour).
It is not the responsibility of the caller to apply any clock drift adjustment to the start or expiry time - this is handled by the auth service directly.
import datetime
start = datetime.datetime.now()
expiry = start + datetime.timedelta(0,60)
scopes = ['ScopeA', 'ScopeB']
name = 'foo'
credentials = taskcluster.createTemporaryCredentials(
# issuing clientId
clientId,
# issuing accessToken
accessToken,
# Validity of temporary credentials starts here, in timestamp
start,
# Expiration of temporary credentials, in timestamp
expiry,
# Scopes to grant the temporary credentials
scopes,
# credential name (optional)
name
)
You cannot use temporary credentials to issue new temporary credentials. You
must have auth:create-client:<name>
to create a named temporary credential,
but unnamed temporary credentials can be created regardless of your scopes.
Many taskcluster APIs require ISO 8601 time stamps offset into the future
as way of providing expiration, deadlines, etc. These can be easily created
using datetime.datetime.isoformat()
, however, it can be rather error prone
and tedious to offset datetime.datetime
objects into the future. Therefore
this library comes with two utility functions for this purposes.
dateObject = taskcluster.fromNow("2 days 3 hours 1 minute")
# -> datetime.datetime(2017, 1, 21, 17, 8, 1, 607929)
dateString = taskcluster.fromNowJSON("2 days 3 hours 1 minute")
# -> '2017-01-21T17:09:23.240178Z'
By default it will offset the date time into the future, if the offset strings
are prefixed minus (-
) the date object will be offset into the past. This is
useful in some corner cases.
dateObject = taskcluster.fromNow("- 1 year 2 months 3 weeks 5 seconds");
# -> datetime.datetime(2015, 10, 30, 18, 16, 50, 931161)
The offset string is ignorant of whitespace and case insensitive. It may also
optionally be prefixed plus +
(if not prefixed minus), any +
prefix will be
ignored. However, entries in the offset string must be given in order from
high to low, ie. 2 years 1 day
. Additionally, various shorthands may be
employed, as illustrated below.
years, year, yr, y
months, month, mo
weeks, week, w
days, day, d
hours, hour, h
minutes, minute, min
seconds, second, sec, s
The fromNow
method may also be given a date to be relative to as a second
argument. This is useful if offset the task expiration relative to the the task
deadline or doing something similar. This argument can also be passed as the
kwarg dateObj
dateObject1 = taskcluster.fromNow("2 days 3 hours");
dateObject2 = taskcluster.fromNow("1 year", dateObject1);
taskcluster.fromNow("1 year", dateObj=dateObject1);
# -> datetime.datetime(2018, 1, 21, 17, 59, 0, 328934)
To generate slugIds (Taskcluster's client-generated unique IDs), use
taskcluster.slugId()
, which will return a unique slugId on each call.
In some cases it is useful to be able to create a mapping from names to
slugIds, with the ability to generate the same slugId multiple times.
The taskcluster.stableSlugId()
function returns a callable that does
just this.
gen = taskcluster.stableSlugId()
sometask = gen('sometask')
assert gen('sometask') == sometask # same input generates same output
assert gen('sometask') != gen('othertask')
gen2 = taskcluster.stableSlugId()
sometask2 = gen('sometask')
assert sometask2 != sometask # but different slugId generators produce
# different output
The scopeMatch(assumedScopes, requiredScopeSets)
function determines
whether one or more of a set of required scopes are satisfied by the assumed
scopes, taking *-expansion into account. This is useful for making local
decisions on scope satisfaction, but note that assumed_scopes
must be the
expanded scopes, as this function cannot perform expansion.
It takes a list of a assumed scopes, and a list of required scope sets on disjunctive normal form, and checks if any of the required scope sets are satisfied.
Example:
requiredScopeSets = [
["scopeA", "scopeB"],
["scopeC:*"]
]
assert scopesMatch(['scopeA', 'scopeB'], requiredScopeSets)
assert scopesMatch(['scopeC:xyz'], requiredScopeSets)
assert not scopesMatch(['scopeA'], requiredScopeSets)
assert not scopesMatch(['scopeC'], requiredScopeSets)
Many Taskcluster API methods are paginated. There are two ways to handle pagination easily with the python client. The first is to implement pagination in your code:
import taskcluster
queue = taskcluster.Queue({'rootUrl': 'https://tc.example.com'})
i = 0
tasks = 0
outcome = queue.listTaskGroup('JzTGxwxhQ76_Tt1dxkaG5g')
while outcome.get('continuationToken'):
print('Response %d gave us %d more tasks' % (i, len(outcome['tasks'])))
if outcome.get('continuationToken'):
outcome = queue.listTaskGroup('JzTGxwxhQ76_Tt1dxkaG5g', query={'continuationToken': outcome.get('continuationToken')})
i += 1
tasks += len(outcome.get('tasks', []))
print('Task Group %s has %d tasks' % (outcome['taskGroupId'], tasks))
There's also an experimental feature to support built in automatic pagination in the sync client. This feature allows passing a callback as the 'paginationHandler' keyword-argument. This function will be passed the response body of the API method as its sole positional arugment.
This example of the built in pagination shows how a list of tasks could be built and then counted:
import taskcluster
queue = taskcluster.Queue({'rootUrl': 'https://tc.example.com'})
responses = []
def handle_page(y):
print("%d tasks fetched" % len(y.get('tasks', [])))
responses.append(y)
queue.listTaskGroup('JzTGxwxhQ76_Tt1dxkaG5g', paginationHandler=handle_page)
tasks = 0
for response in responses:
tasks += len(response.get('tasks', []))
print("%d requests fetch %d tasks" % (len(responses), tasks))
This library can generate exchange patterns for Pulse messages based on the
Exchanges definitions provded by each service. This is done by instantiating a
<service>Events
class and calling a method with the name of the vent.
Options for the topic exchange methods can be in the form of either a single
dictionary argument or keyword arguments. Only one form is allowed.
from taskcluster import client
qEvt = client.QueueEvents({rootUrl: 'https://tc.example.com'})
# The following calls are equivalent
print(qEvt.taskCompleted({'taskId': 'atask'}))
print(qEvt.taskCompleted(taskId='atask'))
Note that the client library does not provide support for interfacing with a Pulse server.
Logging is set up in taskcluster/__init__.py
. If the special
DEBUG_TASKCLUSTER_CLIENT
environment variable is set, the __init__.py
module will set the logging
module's level for its logger to logging.DEBUG
and if there are no existing handlers, add a logging.StreamHandler()
instance. This is meant to assist those who do not wish to bother figuring out
how to configure the python logging module but do want debug messages
The Object service provides an API for reliable uploads and downloads of large objects. This library provides convenience methods to implement the client portion of those APIs, providing well-tested, resilient upload and download functionality. These methods will negotiate the appropriate method with the object service and perform the required steps to transfer the data.
All methods are available in both sync and async versions, with identical APIs except for the async
/await
keywords.
In either case, you will need to provide a configured Object
instance with appropriate credentials for the operation.
NOTE: There is an helper function to upload s3
artifacts, taskcluster.helper.upload_artifact
, but it is deprecated as it only supports the s3
artifact type.
To upload, use any of the following:
await taskcluster.aio.upload.uploadFromBuf(projectId=.., name=.., contentType=.., contentLength=.., uploadId=.., expires=.., maxRetries=.., objectService=.., data=..)
- asynchronously upload data from a buffer full of bytes.await taskcluster.aio.upload.uploadFromFile(projectId=.., name=.., contentType=.., contentLength=.., uploadId=.., expires=.., maxRetries=.., objectService=.., file=..)
- asynchronously upload data from a standard Python file.
Note that this is probably what you want, even in an async context.await taskcluster.aio.upload(projectId=.., name=.., contentType=.., contentLength=.., expires=.., uploadId=.., maxRetries=.., objectService=.., readerFactory=..)
- asynchronously upload data from an async reader factory.taskcluster.upload.uploadFromBuf(projectId=.., name=.., contentType=.., contentLength=.., expires=.., uploadId=.., maxRetries=.., objectService=.., data=..)
- upload data from a buffer full of bytes.taskcluster.upload.uploadFromFile(projectId=.., name=.., contentType=.., contentLength=.., expires=.., uploadId=.., maxRetries=.., objectService=.., file=..)
- upload data from a standard Python file.taskcluster.upload(projectId=.., name=.., contentType=.., contentLength=.., expires=.., uploadId=.., maxRetries=.., objectService=.., readerFactory=..)
- upload data from a sync reader factory.A "reader" is an object with a read(max_size=-1)
method which reads and returns a chunk of 1 .. max_size
bytes, or returns an empty string at EOF, async for the async functions and sync for the remainder.
A "reader factory" is an async callable which returns a fresh reader, ready to read the first byte of the object.
When uploads are retried, the reader factory may be called more than once.
The uploadId
parameter may be omitted, in which case a new slugId will be generated.
To download, use any of the following:
await taskcluster.aio.download.downloadToBuf(name=.., maxRetries=.., objectService=..)
- asynchronously download an object to an in-memory buffer, returning a tuple (buffer, content-type).
If the file is larger than available memory, this will crash.await taskcluster.aio.download.downloadToFile(name=.., maxRetries=.., objectService=.., file=..)
- asynchronously download an object to a standard Python file, returning the content type.await taskcluster.aio.download.download(name=.., maxRetries=.., objectService=.., writerFactory=..)
- asynchronously download an object to an async writer factory, returning the content type.taskcluster.download.downloadToBuf(name=.., maxRetries=.., objectService=..)
- download an object to an in-memory buffer, returning a tuple (buffer, content-type).
If the file is larger than available memory, this will crash.taskcluster.download.downloadToFile(name=.., maxRetries=.., objectService=.., file=..)
- download an object to a standard Python file, returning the content type.taskcluster.download.download(name=.., maxRetries=.., objectService=.., writerFactory=..)
- download an object to a sync writer factory, returning the content type.A "writer" is an object with a write(data)
method which writes the given data, async for the async functions and sync for the remainder.
A "writer factory" is a callable (again either async or sync) which returns a fresh writer, ready to write the first byte of the object.
When uploads are retried, the writer factory may be called more than once.
Artifacts can be downloaded from the queue service with similar functions to those above.
These functions support all of the queue's storage types, raising an error for error
artifacts.
In each case, if runId
is omitted then the most recent run will be used.
await taskcluster.aio.download.downloadArtifactToBuf(taskId=.., runId=.., name=.., maxRetries=.., queueService=..)
- asynchronously download an object to an in-memory buffer, returning a tuple (buffer, content-type).
If the file is larger than available memory, this will crash.await taskcluster.aio.download.downloadArtifactToFile(taskId=.., runId=.., name=.., maxRetries=.., queueService=.., file=..)
- asynchronously download an object to a standard Python file, returning the content type.await taskcluster.aio.download.downloadArtifact(taskId=.., runId=.., name=.., maxRetries=.., queueService=.., writerFactory=..)
- asynchronously download an object to an async writer factory, returning the content type.taskcluster.download.downloadArtifactToBuf(taskId=.., runId=.., name=.., maxRetries=.., queueService=..)
- download an object to an in-memory buffer, returning a tuple (buffer, content-type).
If the file is larger than available memory, this will crash.taskcluster.download.downloadArtifactToFile(taskId=.., runId=.., name=.., maxRetries=.., queueService=.., file=..)
- download an object to a standard Python file, returning the content type.taskcluster.download.downloadArtifact(taskId=.., runId=.., name=.., maxRetries=.., queueService=.., writerFactory=..)
- download an object to a sync writer factory, returning the content type.The Python Taskcluster client has a module taskcluster.helper
with utilities which allows you to easily share authentication options across multiple services in your project.
Generally a project using this library will face different use cases and authentication options:
The class taskcluster.helper.TaskclusterConfig
is made to be instantiated once in your project, usually in a top level module. That singleton is then accessed by different parts of your projects, whenever a Taskcluster service is needed.
Here is a sample usage:
project/__init__.py
, no call to Taskcluster is made at that point:from taskcluster.helper import Taskcluster config
tc = TaskclusterConfig('https://community-tc.services.mozilla.com')
project/boot.py
, we authenticate on Taskcuster with provided credentials, or environment variables, or taskcluster proxy (in that order):from project import tc
tc.auth(client_id='XXX', access_token='YYY')
from project import tc
def sync_usage():
queue = tc.get_service('queue')
queue.ping()
async def async_usage():
hooks = tc.get_service('hooks', use_async=True) # Asynchronous service class
await hooks.ping()
Supported environment variables are:
TASKCLUSTER_ROOT_URL
to specify your Taskcluster instance base url. You can either use that variable or instanciate TaskclusterConfig
with the base url.TASKCLUSTER_CLIENT_ID
& TASKCLUSTER_ACCESS_TOKEN
to specify your client credentials instead of providing them to TaskclusterConfig.auth
TASKCLUSTER_PROXY_URL
to specify the proxy address used to reach Taskcluster in a task. It defaults to http://taskcluster
when not specified.For more details on Taskcluster environment variables, here is the documentation.
Another available utility is taskcluster.helper.load_secrets
which allows you to retrieve a secret using an authenticated taskcluster.Secrets
instance (using TaskclusterConfig.get_service
or the synchronous class directly).
This utility loads a secret, but allows you to:
Let's say you have a secret on a Taskcluster instance named project/foo/prod-config
, which is needed by a backend and some tasks. Here is its content:
common:
environment: production
remote_log: https://log.xx.com/payload
backend:
bugzilla_token: XXXX
task:
backend_url: https://backend.foo.mozilla.com
In your backend, you would do:
from taskcluster import Secrets
from taskcluster.helper import load_secrets
prod_config = load_secrets(
Secrets({...}),
'project/foo/prod-config',
# We only need the common & backend parts
prefixes=['common', 'backend'],
# We absolutely need a bugzilla token to run
required=['bugzilla_token'],
# Let's provide some default value for the environment
existing={
'environment': 'dev',
}
)
# -> prod_config == {
# "environment": "production"
# "remote_log": "https://log.xx.com/payload",
# "bugzilla_token": "XXXX",
# }
In your task, you could do the following using TaskclusterConfig
mentionned above (the class has a shortcut to use an authenticated Secrets
service automatically):
from project import tc
prod_config = tc.load_secrets(
'project/foo/prod-config',
# We only need the common & bot parts
prefixes=['common', 'bot'],
# Let's provide some default value for the environment and backend_url
existing={
'environment': 'dev',
'backend_url': 'http://localhost:8000',
}
)
# -> prod_config == {
# "environment": "production"
# "remote_log": "https://log.xx.com/payload",
# "backend_url": "https://backend.foo.mozilla.com",
# }
To provide local secrets value, you first need to load these values as a dictionary (usually by reading a local file in your format of choice : YAML, JSON, ...) and providing the dictionary to load_secrets
by using the local_secrets
parameter:
import os
import yaml
from taskcluster import Secrets
from taskcluster.helper import load_secrets
local_path = 'path/to/file.yml'
prod_config = load_secrets(
Secrets({...}),
'project/foo/prod-config',
# We support an optional local file to provide some configuration without reaching Taskcluster
local_secrets=yaml.safe_load(open(local_path)) if os.path.exists(local_path) else None,
)
This library is co-versioned with Taskcluster itself. That is, a client with version x.y.z contains API methods corresponding to Taskcluster version x.y.z. Taskcluster is careful to maintain API compatibility, and guarantees it within a major version. That means that any client with version x.* will work against any Taskcluster services at version x.*, and is very likely to work for many other major versions of the Taskcluster services. Any incompatibilities are noted in the Changelog.
FAQs
Python client for Taskcluster
We found that taskcluster demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 3 open source maintainers collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
Research
The Socket Research Team breaks down a malicious wrapper package that uses obfuscation to harvest credentials and exfiltrate sensitive data.
Research
Security News
Attackers used a malicious npm package typosquatting a popular ESLint plugin to steal sensitive data, execute commands, and exploit developer systems.
Security News
The Ultralytics' PyPI Package was compromised four times in one weekend through GitHub Actions cache poisoning and failure to rotate previously compromised API tokens.