
We want to set up a Patroni HA cluster with 3 coordinator nodes, 2 nodes in worker group 1, and 2 nodes in worker group 2.

I am getting a system ID mismatch when I start the 2nd coordinator. The first one is running fine; the issue only appears when the 2nd coordinator tries to join the cluster.
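For clarity, the intended node-to-group layout is roughly the following (the instance names below are placeholders for the per-node values substituted into patroni.yml, not the real inventory):

  # per-node values for ${INSTANCE_NAME} and ${CITUS_GROUP} (placeholder names)
  coordinator-1: { INSTANCE_NAME: coord1,   CITUS_GROUP: 0 }
  coordinator-2: { INSTANCE_NAME: coord2,   CITUS_GROUP: 0 }
  coordinator-3: { INSTANCE_NAME: coord3,   CITUS_GROUP: 0 }
  worker-1a:     { INSTANCE_NAME: worker1a, CITUS_GROUP: 1 }
  worker-1b:     { INSTANCE_NAME: worker1b, CITUS_GROUP: 1 }
  worker-2a:     { INSTANCE_NAME: worker2a, CITUS_GROUP: 2 }
  worker-2b:     { INSTANCE_NAME: worker2b, CITUS_GROUP: 2 }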

Below is my patroni.yml file.

I tried deleting the data directory and removing the cluster key from etcd before starting again, but I hit the same issue.

scope: demo
name: ${INSTANCE_NAME}
namespace: /service/
citus:
  group: ${CITUS_GROUP}  # 0 for coordinator and 1, 2, 3, etc for workers
  database: citus        # must be the same on all nodes
restapi:
  listen: 0.0.0.0:8008
  connect_address: ${INVENTORY_HOSTNAME}:8008
etcd3:
  hosts: ${PATRONI_ETCD_HOSTS}
bootstrap:
  method: initdb
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    master_start_timeout: 300
    synchronous_mode: false
    synchronous_mode_strict: false
    synchronous_node_count: 1
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 1000
        superuser_reserved_connections: 5
        password_encryption: md5
        max_locks_per_transaction: 512
        max_prepared_transactions: 0
        huge_pages: try
        shared_buffers: 512MB
        effective_cache_size: 1024MB
        work_mem: 512MB
        maintenance_work_mem: 256MB
        checkpoint_timeout: 15min
        checkpoint_completion_target: 0.9
        min_wal_size: 2GB
        max_wal_size: 8GB
        wal_buffers: 16MB
        default_statistics_target: 1000
        seq_page_cost: 1
        random_page_cost: 1.1
        effective_io_concurrency: 200
        synchronous_commit: on
        autovacuum: on
        autovacuum_max_workers: 5
        autovacuum_vacuum_scale_factor: 0.01
        autovacuum_analyze_scale_factor: 0.01
        autovacuum_vacuum_cost_limit: 500
        autovacuum_vacuum_cost_delay: 2
        autovacuum_naptime: 1s
        max_files_per_process: 4096
        archive_mode: on
        archive_timeout: 1800s
        archive_command: cd .
        wal_level: logical
        wal_keep_size: 2GB
        max_wal_senders: 10
        max_replication_slots: 10
        hot_standby: on
        wal_log_hints: on
        wal_compression: on
        shared_preload_libraries: pg_stat_statements,auto_explain
        pg_stat_statements.max: 10000
        pg_stat_statements.track: all
        pg_stat_statements.track_utility: false
        pg_stat_statements.save: true
        auto_explain.log_min_duration: 10s
        auto_explain.log_analyze: true
        auto_explain.log_buffers: true
        auto_explain.log_timing: false
        auto_explain.log_triggers: true
        auto_explain.log_verbose: true
        auto_explain.log_nested_statements: true
        auto_explain.sample_rate: 0.01
        track_io_timing: on
        log_lock_waits: on
        log_temp_files: 0
        track_activities: on
        track_activity_query_size: 4096
        track_counts: on
        track_functions: all
        log_checkpoints: on
        logging_collector: on
        log_truncate_on_rotation: on
        log_rotation_age: 1d
        log_rotation_size: 0
        log_line_prefix: '%t [%p-%l] %r %q%u@%d '
        log_filename: postgresql-%a.log
        log_directory: /var/log/postgresql
        hot_standby_feedback: on
        max_standby_streaming_delay: 30s
        wal_receiver_status_interval: 10s
        idle_in_transaction_session_timeout: 10min
        jit: off
        max_worker_processes: 24
        max_parallel_workers: 10
        max_parallel_workers_per_gather: 2
        max_parallel_maintenance_workers: 2
        tcp_keepalives_count: 10
        tcp_keepalives_idle: 300
        tcp_keepalives_interval: 30
        citus.enable_change_data_capture: 'on'
        citus.max_client_connections: 300
    slots:
      demo_slot:
        type: logical
        database: demo
        plugin: pgoutput
  initdb:
    - encoding: UTF8
    - locale: en_US.UTF-8
    - data-checksums
  pg_hba:
    - host replication ${PATRONI_REPLICATION_USERNAME} 127.0.0.1/32 md5
    - host all all 0.0.0.0/0 md5
postgresql:
  listen: 0.0.0.0:5432
  connect_address: ${INVENTORY_HOSTNAME}:5432
  use_unix_socket: true
  data_dir: /data/postgresql
  bin_dir: /usr/lib/postgresql/16/bin
  config_dir: /etc/postgresql/16/main
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: ${PATRONI_REPLICATION_USERNAME}
      password: ${PATRONI_REPLICATION_PASSWORD}
    superuser:
      username: ${PATRONI_SUPERUSER_USERNAME}
      password: ${PATRONI_SUPERUSER_PASSWORD}
  parameters:
    unix_socket_directories: /var/run/postgresql
  remove_data_directory_on_rewind_failure: false
  remove_data_directory_on_diverged_timelines: false
  create_replica_methods:
    - basebackup
  basebackup:
    max-rate: '1000M'
    checkpoint: fast
watchdog:
  mode: automatic
  device: /dev/watchdog
  safety_margin: 5
tags:
  nosync: false
  noloadbalance: false
  nofailover: false
  clonefrom: false

On the 2nd coordinator, I also tried removing method: initdb and its arguments, but that did not help either.
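My understanding (which may be wrong) is that only the node-specific values should differ on the 2nd coordinator, roughly like this (hostnames below are placeholders):

  # 2nd coordinator: same scope, namespace and citus group; only node-specific values change
  scope: demo
  namespace: /service/
  name: coord2                                 # unique per node
  citus:
    group: 0                                   # same group as the 1st coordinator
    database: citus
  restapi:
    connect_address: coord2.example.com:8008   # this node's own address
  postgresql:
    connect_address: coord2.example.com:5432   # this node's own address
  # everything else kept identical to the 1st coordinator's patroni.yml; as far as I
  # understand, the bootstrap/initdb section is only used by the node that initializes
  # the cluster, and joining nodes should clone from the leader via basebackup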

asked Jun 3, 2024 at 11:07
  • Are you configuring the slave node using initdb? If yes, that is not the correct way and you will get a system ID mismatch; you should take a backup from the master node instead. Commented Jun 3, 2024 at 11:20
  • No, while setting up the slave node I removed the method and initdb config from patroni.yml. I still get the same error. @manjunath Commented Jun 3, 2024 at 11:36
