
We want to set up a Patroni HA cluster with 3 coordinator nodes, 2 nodes in worker group 1, and 2 nodes in worker group 2.

I am getting a system ID mismatch when I start the 2nd coordinator. The first one is running fine; the issue only appears when the 2nd coordinator tries to join the cluster.
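For clarity, the intended node-to-group layout is roughly the following (the instance names below are placeholders for the per-node values substituted into patroni.yml, not the real inventory):

  # per-node values for ${INSTANCE_NAME} and ${CITUS_GROUP} (placeholder names)
  coordinator-1: { INSTANCE_NAME: coord1,   CITUS_GROUP: 0 }
  coordinator-2: { INSTANCE_NAME: coord2,   CITUS_GROUP: 0 }
  coordinator-3: { INSTANCE_NAME: coord3,   CITUS_GROUP: 0 }
  worker-1a:     { INSTANCE_NAME: worker1a, CITUS_GROUP: 1 }
  worker-1b:     { INSTANCE_NAME: worker1b, CITUS_GROUP: 1 }
  worker-2a:     { INSTANCE_NAME: worker2a, CITUS_GROUP: 2 }
  worker-2b:     { INSTANCE_NAME: worker2b, CITUS_GROUP: 2 }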

Below is my patroni.yml file.

I tried deleting the data directory and removing the cluster key from etcd before starting again, but I hit the same issue.

scope: demo
name: ${INSTANCE_NAME}
namespace: /service/
citus:
  group: ${CITUS_GROUP}  # 0 for coordinator and 1, 2, 3, etc for workers
  database: citus        # must be the same on all nodes
restapi:
  listen: 0.0.0.0:8008
  connect_address: ${INVENTORY_HOSTNAME}:8008
etcd3:
  hosts: ${PATRONI_ETCD_HOSTS}
bootstrap:
  method: initdb
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    master_start_timeout: 300
    synchronous_mode: false
    synchronous_mode_strict: false
    synchronous_node_count: 1
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 1000
        superuser_reserved_connections: 5
        password_encryption: md5
        max_locks_per_transaction: 512
        max_prepared_transactions: 0
        huge_pages: try
        shared_buffers: 512MB
        effective_cache_size: 1024MB
        work_mem: 512MB
        maintenance_work_mem: 256MB
        checkpoint_timeout: 15min
        checkpoint_completion_target: 0.9
        min_wal_size: 2GB
        max_wal_size: 8GB
        wal_buffers: 16MB
        default_statistics_target: 1000
        seq_page_cost: 1
        random_page_cost: 1.1
        effective_io_concurrency: 200
        synchronous_commit: on
        autovacuum: on
        autovacuum_max_workers: 5
        autovacuum_vacuum_scale_factor: 0.01
        autovacuum_analyze_scale_factor: 0.01
        autovacuum_vacuum_cost_limit: 500
        autovacuum_vacuum_cost_delay: 2
        autovacuum_naptime: 1s
        max_files_per_process: 4096
        archive_mode: on
        archive_timeout: 1800s
        archive_command: cd .
        wal_level: logical
        wal_keep_size: 2GB
        max_wal_senders: 10
        max_replication_slots: 10
        hot_standby: on
        wal_log_hints: on
        wal_compression: on
        shared_preload_libraries: pg_stat_statements,auto_explain
        pg_stat_statements.max: 10000
        pg_stat_statements.track: all
        pg_stat_statements.track_utility: false
        pg_stat_statements.save: true
        auto_explain.log_min_duration: 10s
        auto_explain.log_analyze: true
        auto_explain.log_buffers: true
        auto_explain.log_timing: false
        auto_explain.log_triggers: true
        auto_explain.log_verbose: true
        auto_explain.log_nested_statements: true
        auto_explain.sample_rate: 0.01
        track_io_timing: on
        log_lock_waits: on
        log_temp_files: 0
        track_activities: on
        track_activity_query_size: 4096
        track_counts: on
        track_functions: all
        log_checkpoints: on
        logging_collector: on
        log_truncate_on_rotation: on
        log_rotation_age: 1d
        log_rotation_size: 0
        log_line_prefix: '%t [%p-%l] %r %q%u@%d '
        log_filename: postgresql-%a.log
        log_directory: /var/log/postgresql
        hot_standby_feedback: on
        max_standby_streaming_delay: 30s
        wal_receiver_status_interval: 10s
        idle_in_transaction_session_timeout: 10min
        jit: off
        max_worker_processes: 24
        max_parallel_workers: 10
        max_parallel_workers_per_gather: 2
        max_parallel_maintenance_workers: 2
        tcp_keepalives_count: 10
        tcp_keepalives_idle: 300
        tcp_keepalives_interval: 30
        citus.enable_change_data_capture: 'on'
        citus.max_client_connections: 300
    slots:
      demo_slot:
        type: logical
        database: demo
        plugin: pgoutput
  initdb:
    - encoding: UTF8
    - locale: en_US.UTF-8
    - data-checksums
  pg_hba:
    - host replication ${PATRONI_REPLICATION_USERNAME} 127.0.0.1/32 md5
    - host all all 0.0.0.0/0 md5
postgresql:
  listen: 0.0.0.0:5432
  connect_address: ${INVENTORY_HOSTNAME}:5432
  use_unix_socket: true
  data_dir: /data/postgresql
  bin_dir: /usr/lib/postgresql/16/bin
  config_dir: /etc/postgresql/16/main
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: ${PATRONI_REPLICATION_USERNAME}
      password: ${PATRONI_REPLICATION_PASSWORD}
    superuser:
      username: ${PATRONI_SUPERUSER_USERNAME}
      password: ${PATRONI_SUPERUSER_PASSWORD}
  parameters:
    unix_socket_directories: /var/run/postgresql
  remove_data_directory_on_rewind_failure: false
  remove_data_directory_on_diverged_timelines: false
  create_replica_methods:
    - basebackup
  basebackup:
    max-rate: '1000M'
    checkpoint: fast
watchdog:
  mode: automatic
  device: /dev/watchdog
  safety_margin: 5
tags:
  nosync: false
  noloadbalance: false
  nofailover: false
  clonefrom: false

On the 2nd coordinator, I also tried removing method: initdb and its arguments, but that did not help either.
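My understanding (which may be wrong) is that only the node-specific values should differ on the 2nd coordinator, roughly like this (hostnames below are placeholders):

  # 2nd coordinator: same scope, namespace and citus group; only node-specific values change
  scope: demo
  namespace: /service/
  name: coord2                                 # unique per node
  citus:
    group: 0                                   # same group as the 1st coordinator
    database: citus
  restapi:
    connect_address: coord2.example.com:8008   # this node's own address
  postgresql:
    connect_address: coord2.example.com:5432   # this node's own address
  # everything else kept identical to the 1st coordinator's patroni.yml; as far as I
  # understand, the bootstrap/initdb section is only used by the node that initializes
  # the cluster, and joining nodes should clone from the leader via basebackup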

asked Jun 3, 2024 at 11:07
  • Are you configuring the slave node using initdb? If yes, that is not the correct way and you will get a system ID mismatch; you should take a backup from the master node instead. Commented Jun 3, 2024 at 11:20
  • No, while setting up the slave node I removed the method and initdb config from patroni.yml. I still get the same error. @manjunath Commented Jun 3, 2024 at 11:36
