Discussion:
on exiting maintenance mode
Ferenc Wagner
2014-08-22 00:37:44 UTC
Permalink
Hi,

While my Pacemaker cluster was in maintenance mode, resources were moved
(by hand) between the nodes as I rebooted each node in turn. In the end
the crm status output became perfectly empty, as the reboot of a given
node removed from the output the resources which were located on the
rebooted node at the time of entering maintenance mode. I expected full
resource discovery on exiting maintenance mode, but it probably did not
happen, as the cluster started up resources already running on other
nodes, which is generally forbidden. Given that all resources were
running (though possibly migrated during the maintenance), what would
have been the correct way of bringing the cluster out of maintenance
mode? This should have required no resource actions at all. Would
cleanup of all resources have helped? Or is there a better way?
--
Thanks,
Feri.
Andrew Beekhof
2014-08-26 07:40:50 UTC
Permalink
Post by Ferenc Wagner
Hi,
While my Pacemaker cluster was in maintenance mode, resources were moved
(by hand) between the nodes as I rebooted each node in turn. In the end
the crm status output became perfectly empty, as the reboot of a given
node removed from the output the resources which were located on the
rebooted node at the time of entering maintenance mode. I expected full
resource discovery on exiting maintenance mode,
Version and logs?

The discovery usually happens at the point the cluster is started on a node.
Maintenance mode just prevents the cluster from doing anything about it.
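For reference, maintenance mode is just a cluster property; something like
this (syntax from memory, so double-check against your version) toggles it:

# enter maintenance mode (cluster-wide property)
crm_attribute --type crm_config --name maintenance-mode --update true
# leave maintenance mode again
crm_attribute --type crm_config --name maintenance-mode --delete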
Post by Ferenc Wagner
but it probably did not
happen, as the cluster started up resources already running on other
nodes, which is generally forbidden. Given that all resources were
running (though possibly migrated during the maintenance), what would
have been the correct way of bringing the cluster out of maintenance
mode? This should have required no resource actions at all. Would
cleanup of all resources have helped? Or is there a better way?
Ferenc Wagner
2014-08-26 17:40:19 UTC
Permalink
Post by Andrew Beekhof
Post by Ferenc Wagner
While my Pacemaker cluster was in maintenance mode, resources were moved
(by hand) between the nodes as I rebooted each node in turn. In the end
the crm status output became perfectly empty, as the reboot of a given
node removed from the output the resources which were located on the
rebooted node at the time of entering maintenance mode. I expected full
resource discovery on exiting maintenance mode,
Version and logs?
(The more interesting part comes later; please skip ahead to the
theoretical part if you're short on time. :)

I left those out, as I don't expect the actual behavior to be a bug.
But I experienced this with Pacemaker version 1.1.7. I know it's old
and suffers from a crmd segfault on entering maintenance mode (cf.
http://thread.gmane.org/gmane.linux.highavailability.user/39121), but it
generally works well, so I have not got around to upgrading it yet. Now
that I've mentioned the crmd segfault: I noticed that it died on the DC
when I entered maintenance mode:

crmd: [7452]: info: te_rsc_command: Initiating action 64: cancel vm-tmvp_monitor_60000 on n01 (local)
crmd: [7452]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
crmd: [7452]: ERROR: get_lrm_resource: Could not add resource vm-tmvp to LRM
crmd: [7452]: ERROR: do_lrm_invoke: Invalid resource definition
crmd: [7452]: WARN: do_lrm_invoke: bad input <create_request_adv origin="te_rsc_command" t="crmd" version="3.0.6" subt="request" reference="lrm_invoke-tengine-1408517719-30820" crm_task="lrm_invoke" crm_sys_to="lrmd" crm_sys_from="tengine" crm_host_to="n01" >
crmd: [7452]: WARN: do_lrm_invoke: bad input <crm_xml >
crmd: [7452]: WARN: do_lrm_invoke: bad input <rsc_op id="64" operation="cancel" operation_key="vm-tmvp_monitor_60000" on_node="n01" on_node_uuid="n01" transition-key="64:20579:0:1b0a6e79-af5a-41e4-8ced-299371e7922c" >
crmd: [7452]: WARN: do_lrm_invoke: bad input <primitive id="vm-tmvp" long-id="vm-tmvp" class="ocf" provider="niif" type="TransientDomain" />
crmd: [7452]: info: te_rsc_command: Initiating action 86: cancel vm-wfweb_monitor_60000 on n01 (local)
crmd: [7452]: ERROR: lrm_add_rsc(870): failed to send a addrsc message to lrmd via ch_cmd channel.
crmd: [7452]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
corosync[6966]: [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x1dc6ea0, async-conn=0x1dc6ea0) left
pacemakerd: [7443]: WARN: Managed crmd process 7452 killed by signal 11 [SIGSEGV - Segmentation violation].
pacemakerd: [7443]: notice: pcmk_child_exit: Child process crmd terminated with signal 11 (pid=7452, rc=0)

However, it got restarted seamlessly, without the node being fenced, so
I did not even notice this until now. Should this have resulted in the
node being fenced?

But back to the issue at hand. The Pacemaker shutdown seemed normal,
apart from a bunch of messages like:

crmd: [13794]: ERROR: verify_stopped: Resource vm-web5 was active at shutdown. You may ignore this error if it is unmanaged.

appearing twice, and warnings like:

cib: [7447]: WARN: send_ipc_message: IPC Channel to 13794 is not connected
cib: [7447]: WARN: send_via_callback_channel: Delivery of reply to client 13794/bf6f43a2-70db-40ac-a902-eabc3c12e20d failed
cib: [7447]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
corosync[6966]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)

On reboot, corosync complained until some of the Pacemaker components
started:

corosync[8461]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
corosync[8461]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)

Pacemaker then probed the resources on the local node (all were inactive):

lrmd: [8946]: info: rsc:stonith-n01 probe[5] (pid 9081)
lrmd: [8946]: info: rsc:dlm:0 probe[6] (pid 9082)
[...]
lrmd: [8946]: info: operation monitor[112] on vm-fir for client 8949: pid 12015 exited with return code 7
crmd: [8949]: info: process_lrm_event: LRM operation vm-fir_monitor_0 (call=112, rc=7, cib-update=130, confirmed=true) not running
attrd: [8947]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
attrd: [8947]: notice: attrd_perform_update: Sent update 4: probe_complete=true

Then I cleaned up some resources running on other nodes, which resulted
in those showing up in the crm status output, producing log lines such as:

crmd: [8949]: WARN: status_from_rc: Action 4 (vm-web5_monitor_0) on n02 failed (target: 7 vs. rc: 0): Error

Finally, I exited maintenance mode, and Pacemaker started every resource
I did not clean up beforehand, concurrently with their already running
instances:

pengine: [8948]: notice: LogActions: Start vm-web9#011(n03)

I can provide more logs if this behavior is indeed unexpected, but it
looks more like I'm missing the exact concept of maintenance mode.
Post by Andrew Beekhof
The discovery usually happens at the point the cluster is started on a node.
A local discovery did happen, but it could not find anything, as the
cluster was started by the init scripts, well before any resource could
have been moved to the freshly rebooted node (manually, to free the next
node for rebooting).
Post by Andrew Beekhof
Maintenance mode just prevents the cluster from doing anything about it.
Fine. So I should have restarted Pacemaker on each node before leaving
maintenance mode, right? Or is there a better way? (Unfortunately, I
could not manage the rolling reboot through Pacemaker, as some DLM/cLVM
freeze made the cluster inoperable in its normal way.)
Post by Andrew Beekhof
Post by Ferenc Wagner
but it probably did not happen, as the cluster started up resources
already running on other nodes, which is generally forbidden. Given
that all resources were running (though possibly migrated during the
maintenance), what would have been the correct way of bringing the
cluster out of maintenance mode? This should have required no
resource actions at all. Would cleanup of all resources have helped?
Or is there a better way?
You say in the above thread that resource definitions can be changed:
http://thread.gmane.org/gmane.linux.highavailability.user/39121/focus=39437
Post by Andrew Beekhof
Post by Ferenc Wagner
I think it's a common misconception that you can modify cluster
No, you _should_ be able to. If that's not the case, it's a bug.
So the end of maintenance mode starts with a "re-probe"?
No, but it doesn't need to.
The policy engine already knows if the resource definitions changed
and the recurring monitor ops will find out if any are not running.
My experience shows that you may not *move around* resources while in
maintenance mode. That would indeed require a cluster-wide re-probe,
which does not seem to happen (unless forced somehow). Probably there
was some misunderstanding in the above discussion; I guess Ulrich meant
moving resources when he wrote "modifying cluster resources". Does this
make sense?
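For what it's worth, I suppose such a re-probe could be forced with
something along these lines (untested here; the crmsh command name may
differ between versions):

# ask the cluster to re-probe for resources it did not start itself
crm resource reprobe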
--
Thanks,
Feri.
Andrew Beekhof
2014-08-27 04:54:33 UTC
Permalink
Post by Ferenc Wagner
Post by Andrew Beekhof
Post by Ferenc Wagner
While my Pacemaker cluster was in maintenance mode, resources were moved
(by hand) between the nodes as I rebooted each node in turn. In the end
the crm status output became perfectly empty, as the reboot of a given
node removed from the output the resources which were located on the
rebooted node at the time of entering maintenance mode. I expected full
resource discovery on exiting maintenance mode,
Version and logs?
(The more interesting part comes later, please skip to the theoretical
part if you're short on time. :)
I left those out, as I don't expect the actual behavior to be a bug.
But I experienced this with Pacemaker version 1.1.7. I know it's old
No kidding :)
Post by Ferenc Wagner
and it suffers from crmd segfault on entering maintenance mode (cf.
http://thread.gmane.org/gmane.linux.highavailability.user/39121), but
works well generally so I did not get to upgrade it yet. Now that I
mentioned the crmd segfault: I noted that it died on the DC when I
crmd: [7452]: info: te_rsc_command: Initiating action 64: cancel vm-tmvp_monitor_60000 on n01 (local)
crmd: [7452]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
That looks like the lrmd died.
Post by Ferenc Wagner
crmd: [7452]: ERROR: get_lrm_resource: Could not add resource vm-tmvp to LRM
crmd: [7452]: ERROR: do_lrm_invoke: Invalid resource definition
crmd: [7452]: WARN: do_lrm_invoke: bad input <create_request_adv origin="te_rsc_command" t="crmd" version="3.0.6" subt="request" reference="lrm_invoke-tengine-1408517719-30820" crm_task="lrm_invoke" crm_sys_to="lrmd" crm_sys_from="tengine" crm_host_to="n01" >
crmd: [7452]: WARN: do_lrm_invoke: bad input <crm_xml >
crmd: [7452]: WARN: do_lrm_invoke: bad input <rsc_op id="64" operation="cancel" operation_key="vm-tmvp_monitor_60000" on_node="n01" on_node_uuid="n01" transition-key="64:20579:0:1b0a6e79-af5a-41e4-8ced-299371e7922c" >
crmd: [7452]: WARN: do_lrm_invoke: bad input <primitive id="vm-tmvp" long-id="vm-tmvp" class="ocf" provider="niif" type="TransientDomain" />
crmd: [7452]: info: te_rsc_command: Initiating action 86: cancel vm-wfweb_monitor_60000 on n01 (local)
crmd: [7452]: ERROR: lrm_add_rsc(870): failed to send a addrsc message to lrmd via ch_cmd channel.
crmd: [7452]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
corosync[6966]: [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x1dc6ea0, async-conn=0x1dc6ea0) left
pacemakerd: [7443]: WARN: Managed crmd process 7452 killed by signal 11 [SIGSEGV - Segmentation violation].
Which created a condition in the crmd that it couldn't handle, so it crashed too.
Post by Ferenc Wagner
pacemakerd: [7443]: notice: pcmk_child_exit: Child process crmd terminated with signal 11 (pid=7452, rc=0)
However, it got restarted seamlessly, without the node being fenced, so
I did not even notice this until now. Should this have resulted in the
node being fenced?
Depends how fast the node can respawn.
Post by Ferenc Wagner
But back to the issue at hand. The Pacemaker shutdown seemed normal,
crmd: [13794]: ERROR: verify_stopped: Resource vm-web5 was active at shutdown. You may ignore this error if it is unmanaged.
In maintenance mode, everything is unmanaged. So that would be expected.
Post by Ferenc Wagner
cib: [7447]: WARN: send_ipc_message: IPC Channel to 13794 is not connected
cib: [7447]: WARN: send_via_callback_channel: Delivery of reply to client 13794/bf6f43a2-70db-40ac-a902-eabc3c12e20d failed
cib: [7447]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
corosync[6966]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
On reboot, corosync complained until some of the Pacemaker components
corosync[8461]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
corosync[8461]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
lrmd: [8946]: info: rsc:stonith-n01 probe[5] (pid 9081)
lrmd: [8946]: info: rsc:dlm:0 probe[6] (pid 9082)
[...]
lrmd: [8946]: info: operation monitor[112] on vm-fir for client 8949: pid 12015 exited with return code 7
crmd: [8949]: info: process_lrm_event: LRM operation vm-fir_monitor_0 (call=112, rc=7, cib-update=130, confirmed=true) not running
attrd: [8947]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
attrd: [8947]: notice: attrd_perform_update: Sent update 4: probe_complete=true
Then I cleaned up some resources running on other nodes, which resulted
crmd: [8949]: WARN: status_from_rc: Action 4 (vm-web5_monitor_0) on n02 failed (target: 7 vs. rc: 0): Error
Finally, I exited maintenance mode, and Pacemaker started every resource
I did not clean up beforehand, concurrently with their already running
pengine: [8948]: notice: LogActions: Start vm-web9#011(n03)
I can provide more logs if this behavior is indeed unexpected, but it
looks more like I'm missing the exact concept of maintenance mode.
Post by Andrew Beekhof
The discovery usually happens at the point the cluster is started on a node.
A local discovery did happen, but it could not find anything, as the
cluster was started by the init scripts, well before any resource could
have been moved to the freshly rebooted node (manually, to free the next
node for rebooting).
That's your problem then: you've started resources outside the control of the cluster.
Two options: recurring monitor actions with role=Stopped would have caught this, or you can run crm_resource --cleanup after you've moved resources around.
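Roughly like this, as a sketch only (the resource name and agent below
are placeholders, not taken from your configuration):

# option 1: add a recurring monitor for the Stopped role
# (it needs an interval distinct from the Started monitor)
primitive vm-example ocf:heartbeat:Dummy \
    op monitor interval="60s" role="Started" \
    op monitor interval="61s" role="Stopped"

# option 2: re-probe a single resource after moving it by hand
crm_resource --cleanup --resource vm-example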
Post by Ferenc Wagner
Post by Andrew Beekhof
Maintenance mode just prevents the cluster from doing anything about it.
Fine. So I should have restarted Pacemaker on each node before leaving
maintenance mode, right? Or is there a better way?
See above
Post by Ferenc Wagner
(Unfortunately, I
could not manage the rolling reboot through Pacemaker, as some DLM/cLVM
freeze made the cluster inoperable in its normal way.)
Post by Andrew Beekhof
Post by Ferenc Wagner
but it probably did not happen, as the cluster started up resources
already running on other nodes, which is generally forbidden. Given
that all resources were running (though possibly migrated during the
maintenance), what would have been the correct way of bringing the
cluster out of maintenance mode? This should have required no
resource actions at all. Would cleanup of all resources have helped?
Or is there a better way?
http://thread.gmane.org/gmane.linux.highavailability.user/39121/focus=39437
Post by Andrew Beekhof
Post by Ferenc Wagner
I think it's a common misconception that you can modify cluster
No, you _should_ be able to. If that's not the case, it's a bug.
So the end of maintenance mode starts with a "re-probe"?
No, but it doesn't need to.
The policy engine already knows if the resource definitions changed
and the recurring monitor ops will find out if any are not running.
My experiences show that you may not *move around* resources while in
maintenance mode.
Correct
Post by Ferenc Wagner
That would indeed require a cluster-wide re-probe,
which does not seem to happen (unless forced some way). Probably there
was some misunderstanding in the above discussion, I guess Ulrich meant
moving resources when he wrote "modifying cluster resources". Does this
make sense?
No, I'm reasonably sure he meant changing their definitions in the CIB.
Or at least that's what I thought he meant at the time.
Ferenc Wagner
2014-08-27 17:09:45 UTC
Permalink
Post by Andrew Beekhof
Post by Ferenc Wagner
Post by Andrew Beekhof
Post by Ferenc Wagner
While my Pacemaker cluster was in maintenance mode, resources were moved
(by hand) between the nodes as I rebooted each node in turn. In the end
the crm status output became perfectly empty, as the reboot of a given
node removed from the output the resources which were located on the
rebooted node at the time of entering maintenance mode. I expected full
resource discovery on exiting maintenance mode,
I experienced this with Pacemaker version 1.1.7. I know it's old
and it suffers from crmd segfault on entering maintenance mode (cf.
http://thread.gmane.org/gmane.linux.highavailability.user/39121), but
works well generally so I did not get to upgrade it yet. Now that I
mentioned the crmd segfault: I noted that it died on the DC when I
crmd: [7452]: info: te_rsc_command: Initiating action 64: cancel vm-tmvp_monitor_60000 on n01 (local)
crmd: [7452]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
That looks like the lrmd died.
It did not die, at least not fully. After entering maintenance mode
crmd asked lrmd to cancel the recurring monitor ops for all resources:

08:40:18 crmd: [7452]: info: do_te_invoke: Processing graph 20578 (ref=pe_calc-dc-1408516818-30681) derived from /var/lib/pengine/pe-input-848.bz2
08:40:18 crmd: [7452]: info: te_rsc_command: Initiating action 17: cancel dlm:0_monitor_120000 on n04
08:40:18 crmd: [7452]: info: te_rsc_command: Initiating action 84: cancel dlm:0_cancel_120000 on n01 (local)
08:40:18 lrmd: [7449]: info: cancel_op: operation monitor[194] on dlm:0 for client 7452, its parameters: [...] cancelled
08:40:18 crmd: [7452]: info: te_rsc_command: Initiating action 50: cancel dlm:2_monitor_120000 on n02

The stream of monitor op cancellation messages ended with:

08:40:18 crmd: [7452]: info: te_rsc_command: Initiating action 71: cancel vm-mdssq_monitor_60000 on n01 (local)
08:40:18 lrmd: [7449]: info: cancel_op: operation monitor[329] on vm-mdssq for client 7452, its parameters: [...] cancelled
08:40:18 crmd: [7452]: info: process_lrm_event: LRM operation vm-mdssq_monitor_60000 (call=329, status=1, cib-update=0, confirmed=true) Cancelled
08:40:18 crmd: [7452]: notice: run_graph: ==== Transition 20578 (Complete=87, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-848.bz2): Complete
08:40:18 crmd: [7452]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
08:40:18 pengine: [7451]: notice: process_pe_message: Transition 20578: PEngine Input stored in: /var/lib/pengine/pe-input-848.bz2
08:41:28 crmd: [7452]: WARN: action_timer_callback: Timer popped (timeout=10000, abort_level=0, complete=true)
08:41:28 crmd: [7452]: WARN: action_timer_callback: Ignoring timeout while not in transition
[these two lines repeated several times]
08:41:28 crmd: [7452]: WARN: action_timer_callback: Timer popped (timeout=10000, abort_level=0, complete=true)
08:41:28 crmd: [7452]: WARN: action_timer_callback: Ignoring timeout while not in transition
08:41:38 crmd: [7452]: WARN: action_timer_callback: Timer popped (timeout=20000, abort_level=0, complete=true)
08:41:38 crmd: [7452]: WARN: action_timer_callback: Ignoring timeout while not in transition
08:48:05 cib: [7447]: info: cib_stats: Processed 159 operations (23207.00us average, 0% utilization) in the last 10min
08:55:18 crmd: [7452]: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (900000ms)
08:55:18 crmd: [7452]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
08:55:18 crmd: [7452]: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
08:55:19 pengine: [7451]: notice: stage6: Delaying fencing operations until there are resources to manage
08:55:19 crmd: [7452]: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
08:55:19 crmd: [7452]: info: do_te_invoke: Processing graph 20579 (ref=pe_calc-dc-1408517718-30802) derived from /var/lib/pengine/pe-input-849.bz2
08:55:19 crmd: [7452]: info: te_rsc_command: Initiating action 17: cancel dlm:0_monitor_120000 on n04
08:55:19 crmd: [7452]: info: te_rsc_command: Initiating action 84: cancel dlm:0_cancel_120000 on n01 (local)
08:55:19 crmd: [7452]: info: cancel_op: No pending op found for dlm:0:194
08:55:19 lrmd: [7449]: info: on_msg_cancel_op: no operation with id 194

Interestingly, monitor[194], the last operation mentioned by lrmd, was
the very first one to be cancelled.

08:55:19 crmd: [7452]: info: te_rsc_command: Initiating action 50: cancel dlm:2_monitor_120000 on n02
08:55:19 crmd: [7452]: info: te_rsc_command: Initiating action 83: cancel vm-cedar_monitor_60000 on n01 (local)
08:55:19 crmd: [7452]: ERROR: lrm_get_rsc(673): failed to receive a reply message of getrsc.
08:55:19 crmd: [7452]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
08:55:19 crmd: [7452]: ERROR: lrm_add_rsc(870): failed to send a addrsc message to lrmd via ch_cmd channel.
08:55:19 crmd: [7452]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
08:55:19 crmd: [7452]: ERROR: get_lrm_resource: Could not add resource vm-cedar to LRM
08:55:19 crmd: [7452]: ERROR: do_lrm_invoke: Invalid resource definition
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input <create_request_adv origin="te_rsc_command" t="crmd" version="3.0.6" subt="request" reference="lrm_invoke-tengine-1408517719-30807" crm_task="lrm_invoke" crm_sys_to="lrmd" crm_sys_from="tengine" crm_host_to="n01" >
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input <crm_xml >
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input <rsc_op id="83" operation="cancel" operation_key="vm-cedar_monitor_60000" on_node="n01" on_node_uuid="n01" transition-key="83:20579:0:1b0a6e79-af5a-41e4-8ced-299371e7922c" >
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input <primitive id="vm-cedar" long-id="vm-cedar" class="ocf" provider="niif" type="TransientDomain" />
08:55:19 crmd: [7452]: ERROR: log_data_element: Output truncated: available=727, needed=1374
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input <attributes CRM_meta_call_id="195" [really very long]
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input </rsc_op>
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input </crm_xml>
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input </create_request_adv>

Blocks of messages like the above repeat a couple of times for other
resources, then crmd kicks the bucket and gets restarted:

08:55:19 corosync[6966]: [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x1dc6ea0, async-conn=0x1dc6ea0) left
08:55:19 pacemakerd: [7443]: WARN: Managed crmd process 7452 killed by signal 11 [SIGSEGV - Segmentation violation].
08:55:19 pacemakerd: [7443]: notice: pcmk_child_exit: Child process crmd terminated with signal 11 (pid=7452, rc=0)
08:55:19 pacemakerd: [7443]: notice: pcmk_child_exit: Respawning failed child process: crmd
08:55:19 pacemakerd: [7443]: info: start_child: Forked child 13794 for process crmd
08:55:19 corosync[6966]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
08:55:19 crmd: [13794]: info: Invoked: /usr/lib/pacemaker/crmd

Anyway, there were no further logs from lrmd after this point until,
hours later, I rebooted the machine:

14:37:06 pacemakerd: [7443]: notice: stop_child: Stopping lrmd: Sent -15 to process 7449
14:37:06 lrmd: [7449]: info: lrmd is shutting down
14:37:06 pacemakerd: [7443]: info: pcmk_child_exit: Child process lrmd exited (pid=7449, rc=0)

So lrmd was alive all the time.
Post by Andrew Beekhof
Which created a condition in the crmd that it couldn't handle so it crashed too.
Maybe their connection got severed somehow.
Post by Andrew Beekhof
Post by Ferenc Wagner
However, it got restarted seamlessly, without the node being fenced, so
I did not even notice this until now. Should this have resulted in the
node being fenced?
Depends how fast the node can respawn.
You mean how fast crmd can respawn? How much time does it have to
respawn to avoid being fenced?
Post by Andrew Beekhof
Post by Ferenc Wagner
crmd: [13794]: ERROR: verify_stopped: Resource vm-web5 was active at shutdown. You may ignore this error if it is unmanaged.
In maintenance mode, everything is unmanaged. So that would be expected.
Is maintenance mode the same as unmanaging all resources? I think the
latter does not cancel the monitor operations here...
Post by Andrew Beekhof
Post by Ferenc Wagner
Post by Andrew Beekhof
The discovery usually happens at the point the cluster is started on a node.
A local discovery did happen, but it could not find anything, as the
cluster was started by the init scripts, well before any resource could
have been moved to the freshly rebooted node (manually, to free the next
node for rebooting).
That's your problem then: you've started resources outside the control of the cluster.
Some of them, yes, and moved the rest between the nodes. All this
circumventing the cluster.
Post by Andrew Beekhof
Two options... recurring monitor actions with role=Stopped would have
caught this
Even in maintenance mode? Wouldn't they have been cancelled just like
the ordinary recurring monitor actions?

I guess adding them would run a recurring monitor operation for every
resource on every node, only with different expectations, right?
Post by Andrew Beekhof
or you can run crm_resource --cleanup after you've moved resources around.
I actually ran some crm resource cleanups for a couple of resources, and
those really were not started on exiting maintenance mode.
Post by Andrew Beekhof
Post by Ferenc Wagner
Post by Andrew Beekhof
Maintenance mode just prevents the cluster from doing anything about it.
Fine. So I should have restarted Pacemaker on each node before leaving
maintenance mode, right? Or is there a better way?
See above
So crm_resource -r whatever -C is the way, for each resource separately.
Is there no way to do this for all resources at once?
Post by Andrew Beekhof
Post by Ferenc Wagner
http://thread.gmane.org/gmane.linux.highavailability.user/39121/focus=39437
Post by Andrew Beekhof
Post by Ferenc Wagner
I think it's a common misconception that you can modify cluster
No, you _should_ be able to. If that's not the case, it's a bug.
So the end of maintenance mode starts with a "re-probe"?
No, but it doesn't need to.
The policy engine already knows if the resource definitions changed
and the recurring monitor ops will find out if any are not running.
My experiences show that you may not *move around* resources while in
maintenance mode.
Correct
Post by Ferenc Wagner
That would indeed require a cluster-wide re-probe, which does not
seem to happen (unless forced some way). Probably there was some
misunderstanding in the above discussion, I guess Ulrich meant moving
resources when he wrote "modifying cluster resources". Does this
make sense?
No, I'm reasonably sure he meant changing their definitions in the CIB.
Or at least that's what I thought he meant at the time.
Nobody could blame you for that, because that's what it means. But then
he inquired about a "re-probe", which better fits the problem of changing
the status of resources, not their definition. Actually, I was so firmly
stuck in this mindset that at first I wanted to ask you to reconsider,
since your response felt so much out of place. That's all about history
for now...

After all this, I suggest clarifying this issue in the fine manual.
I've read it a couple of times and still got the wrong impression.
--
Regards,
Feri.
Andrew Beekhof
2014-08-27 22:57:26 UTC
Permalink
Post by Ferenc Wagner
Post by Andrew Beekhof
Post by Ferenc Wagner
However, it got restarted seamlessly, without the node being fenced, so
I did not even notice this until now. Should this have resulted in the
node being fenced?
Depends how fast the node can respawn.
You mean how fast crmd can respawn? How much time does it have to
respawn to avoid being fenced?
Until a new node can be elected DC, invoke the policy engine and start fencing.
Post by Ferenc Wagner
Post by Andrew Beekhof
Post by Ferenc Wagner
crmd: [13794]: ERROR: verify_stopped: Resource vm-web5 was active at shutdown. You may ignore this error if it is unmanaged.
In maintenance mode, everything is unmanaged. So that would be expected.
Is maintenance mode the same as unmanaging all resources? I think the
latter does not cancel the monitor operations here...
Right. One of them (maintenance mode) cancels monitor operations too.
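Roughly speaking, the two knobs look like this (sketch only; the
resource name is an example):

# unmanage a single resource: the cluster stops acting on it,
# but its recurring monitors keep running
crm_resource --resource vm-example --meta \
    --set-parameter is-managed --parameter-value false

# maintenance mode: nothing is managed and the recurring monitors
# are cancelled as well
crm_attribute --type crm_config --name maintenance-mode --update true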
Post by Ferenc Wagner
Post by Andrew Beekhof
Post by Ferenc Wagner
Post by Andrew Beekhof
The discovery usually happens at the point the cluster is started on a node.
A local discovery did happen, but it could not find anything, as the
cluster was started by the init scripts, well before any resource could
have been moved to the freshly rebooted node (manually, to free the next
node for rebooting).
That's your problem then: you've started resources outside the control of the cluster.
Some of them, yes, and moved the rest between the nodes. All this
circumventing the cluster.
Post by Andrew Beekhof
Two options... recurring monitor actions with role=Stopped would have
caught this
Even in maintenance mode? Wouldn't they have been cancelled just like
the ordinary recurring monitor actions?
Good point. Perhaps they wouldn't.
Post by Ferenc Wagner
I guess adding them would run a recurring monitor operation for every
resource on every node, only with different expectations, right?
Post by Andrew Beekhof
or you can run crm_resource --cleanup after you've moved resources around.
I actually ran some crm resource cleanups for a couple of resources, and
those really were not started on exiting maintenance mode.
Post by Andrew Beekhof
Post by Ferenc Wagner
Post by Andrew Beekhof
Maintenance mode just prevents the cluster from doing anything about it.
Fine. So I should have restarted Pacemaker on each node before leaving
maintenance mode, right? Or is there a better way?
See above
So crm_resource -r whatever -C is the way, for each resource separately.
Is there no way to do this for all resources at once?
I think you can just drop the -r
Post by Ferenc Wagner
Post by Andrew Beekhof
Post by Ferenc Wagner
http://thread.gmane.org/gmane.linux.highavailability.user/39121/focus=39437
Post by Andrew Beekhof
Post by Ferenc Wagner
I think it's a common misconception that you can modify cluster
No, you _should_ be able to. If that's not the case, it's a bug.
So the end of maintenance mode starts with a "re-probe"?
No, but it doesn't need to.
The policy engine already knows if the resource definitions changed
and the recurring monitor ops will find out if any are not running.
My experiences show that you may not *move around* resources while in
maintenance mode.
Correct
Post by Ferenc Wagner
That would indeed require a cluster-wide re-probe, which does not
seem to happen (unless forced some way). Probably there was some
misunderstanding in the above discussion, I guess Ulrich meant moving
resources when he wrote "modifying cluster resources". Does this
make sense?
No, I'm reasonably sure he meant changing their definitions in the CIB.
Or at least that's what I thought he meant at the time.
Nobody could blame you for that, because that's what it means. But then
he inquired about a "re-probe", which fits more the problem of changing
the status of resources, not their definition. Actually, I was so
firmly stuck in this mind set, that first I wanted to ask you to
reconsider, your response felt so much out of place. That's all about
history for now...
After all this, I suggest clarifying this issue in the fine manual.
I've read it a couple of times, and still got the wrong impression.
Which specific section do you suggest?
Ferenc Wagner
2014-08-29 00:54:57 UTC
Permalink
Post by Andrew Beekhof
Post by Ferenc Wagner
So crm_resource -r whatever -C is the way, for each resource separately.
Is there no way to do this for all resources at once?
I think you can just drop the -r
Unfortunately, that does not work under version 1.1.7:

$ sudo crm_resource -C
Error performing operation: The object/attribute does not exist
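So on this version a loop over the individual resources seems to be the
only option; an untested sketch using a few of my resource names:

for rsc in vm-web5 vm-web9 vm-fir; do    # ...and so on for every resource
    crm_resource --cleanup --resource "$rsc"
done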
Post by Andrew Beekhof
Post by Ferenc Wagner
Post by Andrew Beekhof
Post by Ferenc Wagner
My experiences show that you may not *move around* resources while in
maintenance mode.
Correct
Post by Ferenc Wagner
That would indeed require a cluster-wide re-probe, which does not
seem to happen (unless forced some way).
After all this, I suggest clarifying this issue in the fine manual.
I've read it a couple of times, and still got the wrong impression.
Which specific section do you suggest?
5.7.1. Monitoring Resources for Failure

Some points worth adding/emphasizing would be:
1. documentation of the role property (role=Master is mentioned later,
but role=Stopped never)
2. In maintenance mode, monitor operations don't run
3. If management of a resource is switched off, its role=Started monitor
operation continues running until failure, and then the role=Stopped
one kicks in (I'm guessing here; also, what about the other nodes?)
4. When management is enabled again, no re-probe happens; the cluster
expects the last known state and location to still be valid
5. so don't even move unmanaged resources
6. unless you started a resource somewhere before starting the cluster
on that node, or you cleaned up the resource
7. same is true for maintenance mode, but for all resources.

I have to agree that most of this is evident once you know it.
Unfortunately, it's also easy to get wrong while learning the ropes.
For example, hastexo has some good information online:
http://www.hastexo.com/resources/hints-and-kinks/maintenance-active-pacemaker-clusters
But from the sentence "in maintenance mode, you can stop or restart
cluster resources at will" I still miss the constraint of not moving
resources between the nodes. Also, setting enabled="false" behaves
oddly: it did not get rid of the monitor operation until I set the
resource back to managed, and neither deleting the setting nor changing
it to true brought the monitor back. I had to restart the resource to
get monitor ops again. Why?
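For clarity, this is the kind of operation definition I mean (names are
illustrative only):

primitive vm-example ocf:heartbeat:Dummy \
    op monitor interval="60s" enabled="false"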
--
Thanks,
Feri.
Andrew Beekhof
2014-08-29 02:32:50 UTC
Permalink
Post by Ferenc Wagner
Post by Andrew Beekhof
Post by Ferenc Wagner
So crm_resource -r whatever -C is the way, for each resource separately.
Is there no way to do this for all resources at once?
I think you can just drop the -r
You know what I'm going to say here, right?
Post by Ferenc Wagner
$ sudo crm_resource -C
Error performing operation: The object/attribute does not exist
Post by Andrew Beekhof
Post by Ferenc Wagner
Post by Andrew Beekhof
Post by Ferenc Wagner
My experiences show that you may not *move around* resources while in
maintenance mode.
Correct
Post by Ferenc Wagner
That would indeed require a cluster-wide re-probe, which does not
seem to happen (unless forced some way).
After all this, I suggest to clarify this issue in the fine manual.
I've read it a couple of times, and still got the wrong impression.
Which specific section do you suggest?
5.7.1. Monitoring Resources for Failure
Ok, I'll endeavour to improve that section :)
Post by Ferenc Wagner
1. documentation of the role property (role=Master is mentioned later,
but role=Stopped never)
2. In maintenance mode, monitor operations don't run
3. If management of a resource is switched off, its role=Started monitor
operation continues running until failure, then the role=Stopped
kicks in (I'm guessing here; also, what about the other nodes?)
4. When management is enabled again, no re-probe happens, the cluster
expects the last state and location to be still valid
5. so don't even move unmanaged resources
6. unless you started a resource somewhere before starting the cluster
on that node, or you cleaned up the resource
7. same is true for maintenance mode, but for all resources.
I have to agree that most of this is evident once you know it.
Unfortunately, it's also easy to get wrong while learning the ropes.
http://www.hastexo.com/resources/hints-and-kinks/maintenance-active-pacemaker-clusters
But from the sentence "in maintenance mode, you can stop or restart
cluster resources at will" I still miss the constraint of not moving the
resource between the nodes. Also, setting enabled="false" works funny,
it did not get rid of the monitor operation before I set the resource to
managed, and neither deleting the setting nor changing it to true brought it
back. I had to restart the resource to get monitor ops again. Why?