A powerful scanner to scan your Filesystem, S3, MongoDB, MySQL, PostgreSQL, Redis, Slack, Google Cloud Storage, and Firebase storage for PII and sensitive data using text and OCR analysis. Hawk-eye also supports most file types, such as docx, xlsx, pptx, pdf, jpg, png, gif, zip, tar, rar, etc.
Find PII & Secrets like never before across your entire infrastructure with the same tool!
Description • Installation • Features • Configuration • Acknowledgements
HAWK Eye is a robust, command-line tool built to safeguard against data breaches and cyber threats. Much like the sharp vision of a hawk, it quickly scans multiple data sources—S3, MySQL, PostgreSQL, MongoDB, CouchDB, Google Drive, Slack, Redis, Firebase, file systems, and Google Cloud buckets (GCS)—for Personally Identifiable Information (PII) and secrets. Using advanced text analysis and OCR techniques, HAWK Eye delves into various document formats like docx, xlsx, pptx, pdf, images (jpg, png, gif), compressed files (zip, tar, rar), and even video files to ensure comprehensive protection across platforms.
Like the keen vision of a hawk, this tool enables you to monitor and safeguard your data with precision and accuracy, ensuring data privacy and security.
For commercial support and help with HAWK Eye, please contact us on LinkedIn or Twitter.
See how this works on YouTube - https://youtu.be/LuPXE7UJKOY
pip3 install hawk-scanner
Example working command (use all/fs/s3/gcs, etc.):
hawk_scanner all --connection connection.yml --fingerprint fingerprint.yml --json output.json --debug
Pass connection data as CLI input via the --connection-json flag and get the output as JSON (helpful for CI/CD pipelines or automation):
hawk_scanner fs --connection-json '{"sources": {"fs": {"fs1": {"quick_scan": true, "path": "/Users/rohitcoder/Downloads/data/KYC_PDF.pdf"}}}}' --stdout --quiet --fingerprint fingerprint.yml
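For CI/CD automation, one option is to wrap the scanner in a small script and fail the build when findings are reported. The sketch below is illustrative only: it assumes the hawk_scanner CLI is on PATH and that output.json is empty when nothing is found; adjust the final check to the report schema your version actually produces.

import json
import subprocess
import sys

# Run a filesystem scan and write the report to output.json (flags as documented below).
subprocess.run(
    [
        "hawk_scanner", "fs",
        "--connection", "connection.yml",
        "--json", "output.json",
        "--quiet",
    ],
    check=True,
)

# Hypothetical gating rule: treat any content in the report as a finding.
with open("output.json") as fh:
    report = json.load(fh)

if report:
    print("HAWK Eye reported PII or secrets, failing the pipeline.")
    sys.exit(1)
print("No findings reported.")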
You can also import Hawk-eye into your own Python scripts and workflows for greater flexibility:
from hawk_scanner.internals import system
pii = system.scan_file("/Users/kumarohit/Downloads/Resume.pdf")
print(pii)
You can also import Hawk-eye with custom fingerprints in your own Python scripts like this:
from hawk_scanner.internals import system
pii = system.scan_file("/Users/kumarohit/Downloads/Resume.pdf", {
    "fingerprint": {
        "Email": '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}',
    }
})
print(pii)
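Because scan_file takes a single path, you can batch-scan a directory yourself with a few lines of Python. A minimal sketch, assuming scan_file returns a falsy value when nothing is found (check the return type in your version):

from pathlib import Path

from hawk_scanner.internals import system

# Hypothetical batch scan: walk a directory tree and keep findings per file.
results = {}
for path in Path("/Users/kumarohit/Downloads").rglob("*.pdf"):
    findings = system.scan_file(str(path))
    if findings:
        results[str(path)] = findings

for file_path, findings in results.items():
    print(file_path, findings)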
You may have to install some extra dependencies.
To scan a PostgreSQL source, this tool requires the psycopg2-binary
dependency. We can't ship it with the main package because psycopg2-binary does not work on many systems, especially Windows, so you have to install it manually:
pip3 install psycopg2-binary
On Red Hat systems, running the hawk-scanner command may fail with an error from the cv2 dependency. You need to install an extra system library:
yum install mesa-libGL
HAWK Eye is a Python-based CLI tool that can be installed using the following steps:
git clone https://github.com/rohitcoder/hawk-eye.git
cd hawk-eye
pip3 install -r requirements.txt
python3 hawk_scanner/main.py
The --debug flag enables printing of all debugging output for comprehensive troubleshooting. To unleash the power of HAWK Eye, simply follow the steps in the "Usage" section of this README.
Note: If you don't provide any command, it will run all commands (firebase, fs, gcs, mysql, text, couchdb, gdrive, gdrive_workspace, slack, postgresql, redis, s3) by default.
Option | Description |
---|---|
firebase | Scan Firebase profiles for PII and secrets data. |
fs | Scan filesystem profiles for PII and secrets data. |
gcs | Scan GCS (Google Cloud Storage) profiles for PII and secrets data. |
text | Scan text or string for PII and secrets data. |
mysql | Scan MySQL profiles for PII and secrets data. |
mongodb | Scan MongoDB profiles for PII and secrets data. |
couchdb | Scan CouchDB profiles for PII and secrets data. |
slack | Scan Slack profiles for PII and secrets data. |
postgresql | Scan PostgreSQL profiles for PII and secrets data. |
redis | Scan Redis profiles for PII and secrets data. |
s3 | Scan S3 profiles for PII and secrets data. |
gdrive | Scan Google Drive profiles for PII and secrets data. |
gdrive_workspace | Scan Google Drive Workspace profiles for PII and secrets data. |
--connection | Provide a path to a local connection YAML file, like --connection connection.yml; this file contains all credentials and configuration for the different sources. |
--connection-json | Provide the connection configuration as JSON on the CLI; helpful when you want to run this tool in a CI/CD pipeline or automation. |
--fingerprint | Provide a fingerprint file path like --fingerprint fingerprint.yml, this file will override default fingerprints. |
--debug | Enable Debug mode. |
--stdout | Print output on stdout or terminal. |
--quiet | Use --quiet flag if you want to hide all logs from your terminal. |
--json | Provide a file name to save the output as JSON, like --json output.json. |
--shutup | Use --shutup flag if you want to hide Hawk ASCII art from your terminal 😁 |
HAWK Eye uses a YAML file to store connection profiles for various data sources. The connection.yml file is located in the config directory. You can add new profiles to this file to enable HAWK Eye to scan additional data sources. The following sections describe the process for adding new profiles to the connection.yml file.
notify:
  redacted: True
  suppress_duplicates: True
  slack:
    webhook_url: https://hooks.slack.com/services/T0XXXXXXXXXXX/BXXXXXXXX/1CIyXXXXXXXXXXXXXXX
sources:
  redis:
    redis_example:
      host: YOUR_REDIS_HOST
      password: YOUR_REDIS_PASSWORD
  s3:
    s3_example:
      access_key: YOUR_S3_ACCESS_KEY
      secret_key: YOUR_S3_SECRET_KEY
      bucket_name: YOUR_S3_BUCKET_NAME
      cache: true
  gcs:
    gcs_example:
      credentials_file: /path/to/your/credential_file.json
      bucket_name: YOUR_GCS_BUCKET_NAME
      cache: true
      exclude_patterns:
        - .pdf
        - .docx
  firebase:
    firebase_example:
      credentials_file: /path/to/your/credential_file.json
      bucket_name: YOUR_FIREBASE_BUCKET_NAME
      cache: true
      exclude_patterns:
        - .pdf
        - .docx
  mysql:
    mysql_example:
      host: YOUR_MYSQL_HOST
      port: YOUR_MYSQL_PORT
      user: YOUR_MYSQL_USERNAME
      password: YOUR_MYSQL_PASSWORD
      database: YOUR_MYSQL_DATABASE_NAME
      limit_start: 0 # Specify the starting limit for the range
      limit_end: 500 # Specify the ending limit for the range
      tables:
        - table1
        - table2
      exclude_columns:
        - column1
        - column2
  postgresql:
    postgresql_example:
      host: YOUR_POSTGRESQL_HOST
      port: YOUR_POSTGRESQL_PORT
      user: YOUR_POSTGRESQL_USERNAME
      password: YOUR_POSTGRESQL_PASSWORD
      database: YOUR_POSTGRESQL_DATABASE_NAME
      limit_start: 0 # Specify the starting limit for the range
      limit_end: 500 # Specify the ending limit for the range
      tables:
        - table1
        - table2
  mongodb:
    mongodb_example:
      uri: YOUR_MONGODB_URI # Use either URI or the individual connection parameters below
      host: YOUR_MONGODB_HOST
      port: YOUR_MONGODB_PORT
      username: YOUR_MONGODB_USERNAME
      password: YOUR_MONGODB_PASSWORD
      database: YOUR_MONGODB_DATABASE_NAME
      limit_start: 0 # Specify the starting limit for the range
      limit_end: 500 # Specify the ending limit for the range
      collections:
        - collection1
        - collection2
  fs:
    fs_example:
      path: /path/to/your/filesystem/directory
      exclude_patterns:
        - .pdf
        - .docx
        - private
        - venv
        - node_modules
  gdrive:
    drive_example:
      folder_name:
      credentials_file: /Users/kumarohit/Downloads/client_secret.json ## this will be the OAuth app JSON file
      cache: true
      exclude_patterns:
        - .pdf
        - .docx
  gdrive_workspace:
    drive_example:
      folder_name:
      credentials_file: /Users/kumarohit/Downloads/client_secret.json ## this will be the service account JSON file
      impersonate_users:
        - usera@amce.org
        - userb@amce.org
      cache: true
      exclude_patterns:
        - .pdf
        - .docx
  text:
    profile1:
      text: "Hello World HHWPK6943Q"
  slack:
    slack_example:
      token: xoxp-XXXXXXXXXXXXXXXXXXXXXXXXX # give your Slack app these permissions: https://api.slack.com/methods/team.info and https://api.slack.com/methods/conversations.list
      channel_types: "public_channel,private_channel"
      # Optional: list of channel names to check
      # channel_names:
      #   - general
      #   - random
You can add or remove profiles from the connection.yml file as needed. You can also configure only one or two data sources if you don't need to scan all of them.
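Before a long run, it can also help to sanity-check connection.yml. The sketch below is not part of HAWK Eye; it uses PyYAML (pip3 install pyyaml), and the expected source names simply mirror the example configuration above.

import sys
import yaml

# Source types copied from the example connection.yml above.
EXPECTED_SOURCES = {
    "redis", "s3", "gcs", "firebase", "mysql", "postgresql",
    "mongodb", "fs", "gdrive", "gdrive_workspace", "text", "slack",
}

with open("connection.yml") as fh:
    config = yaml.safe_load(fh) or {}

sources = config.get("sources", {})
if not sources:
    sys.exit("connection.yml has no 'sources' section")

unknown = set(sources) - EXPECTED_SOURCES
if unknown:
    print("Warning: unrecognised source types:", ", ".join(sorted(unknown)))

for source_type, profiles in sources.items():
    for profile_name in (profiles or {}):
        print(f"Found profile '{profile_name}' under source '{source_type}'")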
HAWK Eye's extensibility empowers developers to contribute new security commands, and we welcome contributions from the open-source community to enhance its capabilities in securing data sources.
Join the HAWK Eye community and contribute to data source security worldwide. For any questions or assistance, feel free to open an issue on the repository.
If you find HAWK Eye useful and would like to support the project, please consider making a donation. 100% of donations will be distributed to charities focused on education and animal welfare.
We extend our heartfelt appreciation to all contributors who continuously improve this tool! Your efforts are essential in strengthening the security landscape. 🙏
Feel free to make a donation directly to the charities of your choice or send it to us, and we'll ensure it reaches the deserving causes. Just reach out to us on LinkedIn or Twitter to let us know about your contribution. Your generosity and support mean the world to us, and we can't wait to express our heartfelt gratitude.
Your donations will play a significant role in making a positive impact in the lives of those in need. Thank you for considering supporting our cause!