embulk-input-bigquery
This is an Embulk input plugin for BigQuery.
Install it yourself with:
$ embulk gem install embulk-input-bigquery
This plugin uses the google-cloud gem (Google Cloud Client Library for Ruby)
and queries data synchronously. The optional configuration items follow the options of the Google Cloud Client Library.
name | type | required? | default | description |
---|---|---|---|---|
max | integer | optional | null | The maximum number of rows of data to return per page of results. Setting this option to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies. |
cache | boolean | optional | true | Whether to look for the result in the query cache. The query cache is a best-effort cache that is flushed whenever tables in the query are modified. For more information, see query caching in the BigQuery documentation. |
standard_sql | boolean | optional | true | Whether to use BigQuery's standard SQL dialect for this query. If true, the query uses standard SQL rather than the legacy SQL dialect, and the values of large_results and flatten are ignored; the query runs as if large_results is true and flatten is false. |
legacy_sql | boolean | optional | false | Whether to use BigQuery's legacy SQL dialect for this query. If false, the query uses BigQuery's standard SQL, and the values of large_results and flatten are ignored; the query runs as if large_results is true and flatten is false. |
location | string | optional | null | The location of your data. You must specify it if your data is in a location other than the US or EU multi-region. See Dataset Locations in the BigQuery documentation. |
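The defaults in the table can be read as a configuration merge. The following is a minimal sketch, assuming the plugin merges user settings over these defaults and forwards only the options that are not nil to the client; the helper query_options and the constant QUERY_OPTION_DEFAULTS are hypothetical names, not part of the plugin.

```ruby
# Hypothetical defaults, copied from the option table (nil = no default).
QUERY_OPTION_DEFAULTS = {
  "max" => nil,
  "cache" => true,
  "standard_sql" => true,
  "legacy_sql" => false,
  "location" => nil
}.freeze

# Merge the user's settings over the defaults, ignore unrelated keys
# (such as "sql"), drop options that are still nil, and symbolize keys
# so they could be passed as keyword arguments. This sketch does NOT
# reconcile a conflicting standard_sql / legacy_sql pair.
def query_options(task)
  QUERY_OPTION_DEFAULTS
    .merge(task.slice(*QUERY_OPTION_DEFAULTS.keys))
    .reject { |_key, value| value.nil? }
    .transform_keys(&:to_sym)
end

opts = query_options({ "max" => 2000, "sql" => "SELECT 1" })
p opts
```

With max: 2000 and no location, the result keeps max, cache, standard_sql, and legacy_sql, and omits location entirely.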
in:
  type: bigquery
  project: 'project-name'
  keyfile: '/home/hogehoge/bigquery-keyfile.json'
  sql: 'SELECT price,category_id FROM [ecsite.products] GROUP BY category_id'
  columns:
    - {name: price, type: long}
    - {name: category_id, type: string}
  max: 2000
  # If your data is in a location other than the US or EU multi-region, you must specify the location.
  # location: asia-northeast1
out:
  type: stdout
If the table name changes (for example, one table per month), build the query from an ERB template with sql_erb and erb_params:
in:
  type: bigquery
  project: 'project-name'
  keyfile: '/home/hogehoge/bigquery-keyfile.json'
  sql_erb: 'SELECT price,category_id FROM [ecsite.products_<%= params["date"].strftime("%Y%m") %>] GROUP BY category_id'
  erb_params:
    date: "require 'date'; (Date.today - 1)"
  columns:
    - {name: price, type: long}
    - {name: category_id, type: long}
    - {name: month, type: timestamp, format: '%Y-%m', eval: 'require "time"; Time.parse(params["date"]).to_i'}
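To show how sql_erb and erb_params fit together, here is a minimal sketch using Ruby's standard ERB library. It assumes each erb_params value is a Ruby expression that is evaluated and then exposed to the template as params; the helper render_sql is hypothetical and not taken from the plugin's source.

```ruby
require "erb"
require "date"

# Evaluate each erb_params expression (config is trusted input here),
# then render the SQL template with the results available as `params`.
def render_sql(sql_erb, erb_params)
  params = erb_params.transform_values { |expr| eval(expr) }
  ERB.new(sql_erb).result_with_hash(params: params)
end

sql = render_sql(
  'SELECT price,category_id FROM [ecsite.products_<%= params["date"].strftime("%Y%m") %>] GROUP BY category_id',
  { "date" => "require 'date'; (Date.today - 1)" }
)
puts sql
```

Because the expressions are passed to eval, this approach is only appropriate when the configuration file comes from a trusted source.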
First, create a service account (client ID), download its JSON key, and make the key available to Embulk.
in:
  type: bigquery
  project: project_name
  keyfile: /path/to/keyfile.json
You can also embed the contents of the JSON key file directly in config.yml:
in:
  type: bigquery
  project: project_name
  keyfile:
    content: |
      {
        "type": "service_account",
        "project_id": "example-project",
        "private_key_id": "1234567890ABCDEFG",
        "private_key": "**************************************",
        "client_email": "example-project@hogehoge.gserviceaccount.com",
        "client_id": "12345678901234567890",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://accounts.google.com/o/oauth2/token",
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/hogehoge.gcp.iam.gserviceaccount.com"
      }
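The keyfile option therefore accepts either a path or a hash with an inline content entry. A minimal sketch of how such a value could be resolved to parsed credentials, assuming only those two shapes; the helper load_service_account is hypothetical and the plugin may handle this differently.

```ruby
require "json"

# Accept either a file path (String) or a hash with an inline "content"
# entry, as in the two configurations above, and return the parsed JSON.
def load_service_account(keyfile)
  json =
    if keyfile.is_a?(Hash) && keyfile.key?("content")
      keyfile["content"]
    else
      File.read(keyfile.to_s)
    end
  credentials = JSON.parse(json)
  # Reject keys that are not service-account keys.
  unless credentials["type"] == "service_account"
    raise ArgumentError, "expected a service account key, got #{credentials['type'].inspect}"
  end
  credentials
end

inline = { "content" => '{"type": "service_account", "project_id": "example-project"}' }
puts load_service_account(inline)["project_id"]
```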
The column schema can be determined automatically from the query results if no columns
definition is given.
Note that the plugin has to wait until the BigQuery query job completes to get the schema information.
in:
  type: bigquery
  project: project_name
  keyfile: /path/to/keyfile.json
  sql: 'SELECT price,category_id FROM [ecsite.products] GROUP BY category_id'
out:
  type: stdout
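Deriving columns from the result schema amounts to mapping each BigQuery field type to an Embulk column type. The sketch below assumes such a mapping, with unknown types falling back to string; the table BIGQUERY_TO_EMBULK, the Field struct, and columns_from_schema are hypothetical, and the plugin's actual mapping may differ.

```ruby
# Assumed mapping from BigQuery field types (legacy and standard SQL
# names) to Embulk column types.
BIGQUERY_TO_EMBULK = {
  "STRING" => "string",
  "INTEGER" => "long", "INT64" => "long",
  "FLOAT" => "double", "FLOAT64" => "double",
  "BOOLEAN" => "boolean", "BOOL" => "boolean",
  "TIMESTAMP" => "timestamp",
  "RECORD" => "json", "STRUCT" => "json"
}.freeze

# Stand-in for a schema field returned with the query results.
Field = Struct.new(:name, :type)

# Build an Embulk-style columns list; unmapped types fall back to string.
def columns_from_schema(fields)
  fields.map do |field|
    { name: field.name, type: BIGQUERY_TO_EMBULK.fetch(field.type.upcase, "string") }
  end
end

schema = [Field.new("price", "INTEGER"), Field.new("category_id", "STRING")]
p columns_from_schema(schema)
```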
embulk-input-bigquery sends real queries to BigQuery, which incurs charges. To keep costs down, you can use the following procedure instead:
$ embulk bundle install --path vendor/bundle
$ embulk run -X page_size=1 -b . -l trace example/example.yml
To release a new version, update the version in lib/embulk/input/bigquery/version.rb, then run:
$ bundle exec rake release