mysql_ch_replicator is a powerful and efficient tool designed for real-time replication of MySQL databases to ClickHouse.
With a focus on high performance, it relies heavily on batching and uses a C++ extension for faster execution. The tool supports migrations and schema changes and keeps the replicated data correct.
mysql_ch_replicator ensures physical removal of data.
[...] MaterializedMySQL, which replicates the log separately for each database.
To install mysql_ch_replicator, use the following command:
pip install mysql_ch_replicator
You may also need to compile the C++ components if they are not pre-built for your platform.
For realtime data sync from MySQL to ClickHouse:
Prepare a config file, using example_config.yaml as an example.
The MySQL server configuration file my.cnf should include the following settings (required to write the binary log in raw format and to enable password authentication):
[mysqld]
# ... other settings ...
gtid_mode = on
enforce_gtid_consistency = 1
binlog_expire_logs_seconds = 864000
max_binlog_size = 500M
binlog_format = ROW
For MariaDB, use the following settings:
[mysqld]
# ... other settings ...
gtid_strict_mode = ON
gtid_domain_id = 0
server_id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_expire_logs_seconds = 864000
max_binlog_size = 500M
binlog_format = ROW
For AWS RDS, you need to set the following settings in Parameter groups:
binlog_format                ROW
binlog_expire_logs_seconds   86400
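Before starting replication, it can help to confirm that a my.cnf file actually contains the binary log settings listed above. The following is a minimal sketch, not part of mysql_ch_replicator: the function name `check_mysqld_settings` and the `REQUIRED_MYSQLD` table are hypothetical, and the comparison is literal (it does not know that MySQL accepts equivalent spellings such as `ON` and `1`).

```python
import configparser

# Expected values copied from the MySQL [mysqld] example above (lowercased).
REQUIRED_MYSQLD = {
    "gtid_mode": "on",
    "enforce_gtid_consistency": "1",
    "binlog_format": "row",
}


def check_mysqld_settings(cnf_text, required=REQUIRED_MYSQLD):
    """Return {setting: actual_value} for settings that are missing or differ.

    An empty dict means every required setting was found with the expected value.
    """
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows bare flags without a value
        strict=False,          # tolerate repeated sections or keys
        interpolation=None,    # treat '%' in values literally
    )
    parser.read_string(cnf_text)
    section = parser["mysqld"] if parser.has_section("mysqld") else {}

    problems = {}
    for key, expected in required.items():
        actual = section.get(key)
        if actual is None or actual.strip().lower() != expected:
            problems[key] = actual
    return problems


if __name__ == "__main__":
    sample = "[mysqld]\ngtid_mode = on\nenforce_gtid_consistency = 1\nbinlog_format = ROW\n"
    print(check_mysqld_settings(sample))
```

For MariaDB or AWS RDS, pass a different `required` mapping built from the corresponding settings listed above. Files that use `!include` directives are outside what this sketch handles.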
The ClickHouse config override.xml should include the following settings (they make ClickHouse apply the final keyword automatically so that updates are handled correctly):
<clickhouse>
<!-- ... other settings ... -->
<profiles>
<default>
<!-- ... other settings ... -->
<final>1</final>
</default>
</profiles>
</clickhouse>
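As a quick sanity check on the file itself, the snippet below parses an override.xml document and reports whether `final` is set to 1 under `profiles/default`. This is only a sketch: `final_enabled_in_override` is a hypothetical helper, and it inspects the file contents only, so it cannot tell whether the running server has actually loaded the file.

```python
import xml.etree.ElementTree as ET


def final_enabled_in_override(xml_text, profile="default"):
    """Return True if <final>1</final> is present under profiles/<profile>."""
    root = ET.fromstring(xml_text)
    if root.tag != "clickhouse":
        return False
    # findtext returns None when the element does not exist.
    value = root.findtext(f"profiles/{profile}/final")
    return value is not None and value.strip() == "1"


if __name__ == "__main__":
    sample = """
    <clickhouse>
        <profiles>
            <default>
                <final>1</final>
            </default>
        </profiles>
    </clickhouse>
    """
    print(final_enabled_in_override(sample))
```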
!!! Double-check that the final setting is applied !!!
Execute the following command in ClickHouse:
SELECT name, value, changed FROM system.settings WHERE name = 'final'
The setting should be set to 1. If it is not, you should:
- double-check that override.xml is applied
- try modifying users.xml instead
Then start the replication:
mysql_ch_replicator --config config.yaml run_all
This will keep data in ClickHouse updating as you update data in MySQL. It will always be in sync.
If you just need to copy data once and don't need continuous synchronization of all changes, do the following:
Prepare a config file, using example_config.yaml as an example.
Run the one-time data copy:
mysql_ch_replicator --config config.yaml db_replicator --database mysql_db_name --initial_only=True
Here mysql_db_name is the name of the database you want to copy.
Don't be afraid to interrupt the process midway. It saves its state and continues copying after a restart.
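If several databases need a one-time copy, the command can be assembled programmatically. The sketch below only builds the argument list from the flags shown in the command for a one-time copy; `initial_copy_command` is a hypothetical helper, and the default config path `config.yaml` is an assumption.

```python
import shlex


def initial_copy_command(database, config_path="config.yaml"):
    """Build the argv list for a one-time copy of a single database."""
    return [
        "mysql_ch_replicator",
        "--config", config_path,
        "db_replicator",
        "--database", database,
        "--initial_only=True",
    ]


if __name__ == "__main__":
    # Print the command for each database instead of executing it.
    for db in ["mysql_db_name"]:
        print(shlex.join(initial_copy_command(db)))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to execute it. Because the copy saves its state, rerunning the same command after an interruption should resume it.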
mysql_ch_replicator can be configured through a configuration file. Here is an example config:
mysql:
  host: 'localhost'
  port: 8306
  user: 'root'
  password: 'root'

clickhouse:
  host: 'localhost'
  port: 8323
  user: 'default'
  password: 'default'
  connection_timeout: 30        # optional
  send_receive_timeout: 300     # optional

binlog_replicator:
  data_dir: '/home/user/binlog/'
  records_per_file: 100000

databases: 'database_name_pattern_*'
tables: '*'

# OPTIONAL SETTINGS

exclude_databases: ['database_10', 'database_*_42']    # optional
exclude_tables: ['meta_table_*']                        # optional

log_level: 'info'               # optional
optimize_interval: 86400        # optional
auto_restart_interval: 3600     # optional

indexes:                        # optional
  - databases: '*'
    tables: ['test_table']
    index: 'INDEX name_idx name TYPE ngrambf_v1(5, 65536, 4, 0) GRANULARITY 1'
- mysql - MySQL connection settings
- clickhouse - ClickHouse connection settings
- binlog_replicator.data_dir - create a new empty directory; the script uses it to store its state
- databases - database name pattern to replicate, e.g. db_* will match db_1, db_2, db_test; a list is also supported
- tables - tables to filter; a list is also supported
- exclude_databases - databases to exclude, string or list, e.g. 'table1*' or ['table2', 'table3*']. If the same database matches both databases and exclude_databases, the exclusion takes priority.
- exclude_tables - tables to exclude, string or list. If the same table matches both tables and exclude_tables, the exclusion takes priority.
- log_level - log level, default is info; set it to debug to get the most information (allowed values are debug, info, warning, error, critical)
- optimize_interval - interval in seconds between automatic OPTIMIZE table FINAL calls. Default 86400 (1 day). These calls guarantee that all merges are performed, which prevents storage use from growing and performance from degrading.
- auto_restart_interval - interval in seconds between automatic db_replicator restarts. Default 3600 (1 hour). This is done to reduce memory usage.
- indexes - you may want to add indexes to improve performance, e.g. an ngram index for full-text search. To apply indexes, you need to start replication from scratch.

A few more tables / databases examples:
databases: ['my_database_1', 'my_database_2']
tables: ['table_1', 'table_2*']
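The include and exclude rules described above can be illustrated with a short sketch. It assumes the patterns behave like shell-style wildcards and uses Python's `fnmatch` module; this is an illustration of the documented precedence (exclude wins over include), not the tool's actual matching code, and `is_selected` is a hypothetical name. Note that `fnmatch` also gives `?` and `[...]` special meaning.

```python
from fnmatch import fnmatchcase


def _as_list(patterns):
    """Accept None, a single pattern string, or a list of patterns."""
    if patterns is None:
        return []
    return [patterns] if isinstance(patterns, str) else list(patterns)


def is_selected(name, include, exclude=None):
    """Return True if name matches an include pattern and no exclude pattern."""
    # Exclusion is checked first because it has higher priority.
    if any(fnmatchcase(name, p) for p in _as_list(exclude)):
        return False
    return any(fnmatchcase(name, p) for p in _as_list(include))


if __name__ == "__main__":
    exclude = ["database_10", "database_*_42"]
    for db in ["database_1", "database_10", "database_7_42"]:
        print(db, is_selected(db, "database_*", exclude))
```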
mysql_ch_replicator supports the following:
In case of a failure or during the initial replication, mysql_ch_replicator will preserve old data and continue syncing new data seamlessly. You can remove the state and restart replication from scratch.
To contribute to mysql_ch_replicator, clone the repository and install the required dependencies:
git clone https://github.com/your-repo/mysql_ch_replicator.git
cd mysql_ch_replicator
pip install -r requirements.txt
To run the tests:
sudo docker compose -f docker-compose-tests.yaml up
sudo docker exec -w /app/ -it mysql_ch_replicator-replicator-1 python3 -m pytest -v -s test_mysql_ch_replicator.py
Contributions are welcome! Please open an issue or submit a pull request for any bugs or features you would like to add.
mysql_ch_replicator is licensed under the MIT License. See the LICENSE file for more details.
Thank you to all the contributors who have helped build and improve this tool.