github.com/miku/solrbulk
Motivation:
Sometimes you need to index a bunch of documents really, really fast. Even with Solr 4.0 and soft commits, if you send one document at a time you will be limited by the network. The solution is two-fold: batching and multi-threading (see http://lucidworks.com/blog/high-throughput-indexing-in-solr/).
solrbulk expects a file with line-delimited JSON as input. Each line represents a single document. solrbulk takes care of reformatting the documents into the bulk JSON format that Solr understands.
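To illustrate the reformatting, assuming the standard Solr JSON update format (the exact payload solrbulk builds may differ): two input lines

{"id": "1", "state": "Alaska"}
{"id": "2", "state": "California"}

would be combined into a single JSON array and sent in one request, roughly equivalent to:

$ curl -X POST -H 'Content-Type: application/json' \
    --data-binary '[{"id": "1", "state": "Alaska"}, {"id": "2", "state": "California"}]' \
    'http://localhost:8983/solr/biblio/update'

Host, port and collection name above are placeholders.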
solrbulk will send documents in batches and in parallel. The number of documents per batch can be set via -size, the number of workers via -w.
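For example, to send batches of 10000 documents using eight parallel workers (batch size and worker count here are illustrative; tune them for your documents and hardware):

$ solrbulk -server http://localhost:8983/solr/biblio -size 10000 -w 8 file.ldj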
This tool has been developed for project finc at Leipzig University Library.
Installation via Go tools.
$ go install github.com/miku/solrbulk/cmd/solrbulk@latest
There are also DEB and RPM packages available at https://github.com/miku/solrbulk/releases/.
Flags.
$ solrbulk
Usage of solrbulk:
  -commit int
        commit after this many docs (default 1000000)
  -cpuprofile string
        write cpu profile to file
  -memprofile string
        write heap profile to file
  -no-final-commit
        omit final commit
  -optimize
        optimize index
  -purge
        remove documents from index before indexing (use purge-query to selectively clean)
  -purge-pause duration
        insert a short pause after purge (default 2s)
  -purge-query string
        query to use, when purging (default "*:*")
  -server string
        url to SOLR server, including host, port and path to collection,
        e.g. http://localhost:8983/solr/biblio
  -size int
        bulk batch size (default 1000)
  -update-request-handler-name string
        where solr.UpdateRequestHandler is mounted on the server,
        https://is.gd/s0eirv (default "/update")
  -v    prints current program version
  -verbose
        output basic progress
  -w int
        number of workers to use (default 4)
  -z    unzip gz'd file on the fly
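As an example of the purge flags above: to remove all documents before reindexing, or only a subset matched by a query (the query and URLs here are illustrative):

$ solrbulk -purge -server http://localhost:8983/solr/biblio file.ldj
$ solrbulk -purge -purge-query 'state:Oregon' -server http://localhost:8983/solr/biblio file.ldj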
Given a newline-delimited JSON file:
$ cat file.ldj
{"id": "1", "state": "Alaska"}
{"id": "2", "state": "California"}
{"id": "3", "state": "Oregon"}
...
$ solrbulk -verbose -server https://192.168.1.222:8085/collection1 file.ldj
The server parameter contains host, port and the path up to, but excluding, the default update route (since 0.3.4, this can be adjusted via the -update-request-handler-name flag).
For example, if you usually update via https://192.168.1.222:8085/solr/biblio/update
the server parameter would be:
$ solrbulk -server https://192.168.1.222:8085/solr/biblio file.ldj
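If the update handler is mounted somewhere other than the default /update, the path can be passed explicitly. A sketch, assuming a handler mounted at /update/json (adjust to whatever your solrconfig.xml actually defines):

$ solrbulk -server https://192.168.1.222:8085/solr/biblio -update-request-handler-name /update/json file.ldj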
Performance observations:
- Adjust autoCommit, autoSoftCommit and the transaction log settings in solrconfig.xml.
- Set -commit high to avoid frequent intermediate commits; solrbulk will issue a final commit request at the end of the processing anyway.
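A minimal sketch of the relevant solrconfig.xml section; the values are illustrative, not recommendations:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <autoCommit>
    <!-- hard commit at most every 60s, without opening a new searcher -->
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <!-- soft commit every 5s to make new documents visible -->
    <maxTime>5000</maxTime>
  </autoSoftCommit>
</updateHandler>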
Elasticsearch? Try esbulk: https://github.com/miku/esbulk