Post by Andrew Beekhof
Post by Ferenc Wagner
Post by Andrew Beekhof
Post by Ferenc Wagner
While my Pacemaker cluster was in maintenance mode, resources were moved
(by hand) between the nodes as I rebooted each node in turn. In the end
the crm status output became perfectly empty, as the reboot of a given
node removed from the output the resources which were located on the
rebooted node at the time of entering maintenance mode. I expected full
resource discovery on exiting maintenance mode, but that did not happen.
I experienced this with Pacemaker version 1.1.7. I know it's old
and it suffers from a crmd segfault on entering maintenance mode (cf.
http://thread.gmane.org/gmane.linux.highavailability.user/39121), but
it generally works well, so I have not got around to upgrading it yet.
Now that I mentioned the crmd segfault: I noted that it died on the DC
when I entered maintenance mode:
crmd: [7452]: info: te_rsc_command: Initiating action 64: cancel vm-tmvp_monitor_60000 on n01 (local)
crmd: [7452]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
That looks like the lrmd died.
It did not die, at least not fully. After entering maintenance mode
crmd asked lrmd to cancel the recurring monitor ops for all resources:
08:40:18 crmd: [7452]: info: do_te_invoke: Processing graph 20578 (ref=pe_calc-dc-1408516818-30681) derived from /var/lib/pengine/pe-input-848.bz2
08:40:18 crmd: [7452]: info: te_rsc_command: Initiating action 17: cancel dlm:0_monitor_120000 on n04
08:40:18 crmd: [7452]: info: te_rsc_command: Initiating action 84: cancel dlm:0_cancel_120000 on n01 (local)
08:40:18 lrmd: [7449]: info: cancel_op: operation monitor[194] on dlm:0 for client 7452, its parameters: [...] cancelled
08:40:18 crmd: [7452]: info: te_rsc_command: Initiating action 50: cancel dlm:2_monitor_120000 on n02
The stream of monitor op cancellation messages ended with:
08:40:18 crmd: [7452]: info: te_rsc_command: Initiating action 71: cancel vm-mdssq_monitor_60000 on n01 (local)
08:40:18 lrmd: [7449]: info: cancel_op: operation monitor[329] on vm-mdssq for client 7452, its parameters: [...] cancelled
08:40:18 crmd: [7452]: info: process_lrm_event: LRM operation vm-mdssq_monitor_60000 (call=329, status=1, cib-update=0, confirmed=true) Cancelled
08:40:18 crmd: [7452]: notice: run_graph: ==== Transition 20578 (Complete=87, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-848.bz2): Complete
08:40:18 crmd: [7452]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
08:40:18 pengine: [7451]: notice: process_pe_message: Transition 20578: PEngine Input stored in: /var/lib/pengine/pe-input-848.bz2
08:41:28 crmd: [7452]: WARN: action_timer_callback: Timer popped (timeout=10000, abort_level=0, complete=true)
08:41:28 crmd: [7452]: WARN: action_timer_callback: Ignoring timeout while not in transition
[these two lines repeated several times]
08:41:28 crmd: [7452]: WARN: action_timer_callback: Timer popped (timeout=10000, abort_level=0, complete=true)
08:41:28 crmd: [7452]: WARN: action_timer_callback: Ignoring timeout while not in transition
08:41:38 crmd: [7452]: WARN: action_timer_callback: Timer popped (timeout=20000, abort_level=0, complete=true)
08:41:38 crmd: [7452]: WARN: action_timer_callback: Ignoring timeout while not in transition
08:48:05 cib: [7447]: info: cib_stats: Processed 159 operations (23207.00us average, 0% utilization) in the last 10min
08:55:18 crmd: [7452]: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (900000ms)
08:55:18 crmd: [7452]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
08:55:18 crmd: [7452]: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
08:55:19 pengine: [7451]: notice: stage6: Delaying fencing operations until there are resources to manage
08:55:19 crmd: [7452]: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
08:55:19 crmd: [7452]: info: do_te_invoke: Processing graph 20579 (ref=pe_calc-dc-1408517718-30802) derived from /var/lib/pengine/pe-input-849.bz2
08:55:19 crmd: [7452]: info: te_rsc_command: Initiating action 17: cancel dlm:0_monitor_120000 on n04
08:55:19 crmd: [7452]: info: te_rsc_command: Initiating action 84: cancel dlm:0_cancel_120000 on n01 (local)
08:55:19 crmd: [7452]: info: cancel_op: No pending op found for dlm:0:194
08:55:19 lrmd: [7449]: info: on_msg_cancel_op: no operation with id 194
Interestingly, monitor[194], the last operation mentioned by lrmd, was
the very first one to be cancelled.
08:55:19 crmd: [7452]: info: te_rsc_command: Initiating action 50: cancel dlm:2_monitor_120000 on n02
08:55:19 crmd: [7452]: info: te_rsc_command: Initiating action 83: cancel vm-cedar_monitor_60000 on n01 (local)
08:55:19 crmd: [7452]: ERROR: lrm_get_rsc(673): failed to receive a reply message of getrsc.
08:55:19 crmd: [7452]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
08:55:19 crmd: [7452]: ERROR: lrm_add_rsc(870): failed to send a addrsc message to lrmd via ch_cmd channel.
08:55:19 crmd: [7452]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
08:55:19 crmd: [7452]: ERROR: get_lrm_resource: Could not add resource vm-cedar to LRM
08:55:19 crmd: [7452]: ERROR: do_lrm_invoke: Invalid resource definition
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input <create_request_adv origin="te_rsc_command" t="crmd" version="3.0.6" subt="request" reference="lrm_invoke-tengine-1408517719-30807" crm_task="lrm_invoke" crm_sys_to="lrmd" crm_sys_from="tengine" crm_host_to="n01" >
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input <crm_xml >
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input <rsc_op id="83" operation="cancel" operation_key="vm-cedar_monitor_60000" on_node="n01" on_node_uuid="n01" transition-key="83:20579:0:1b0a6e79-af5a-41e4-8ced-299371e7922c" >
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input <primitive id="vm-cedar" long-id="vm-cedar" class="ocf" provider="niif" type="TransientDomain" />
08:55:19 crmd: [7452]: ERROR: log_data_element: Output truncated: available=727, needed=1374
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input <attributes CRM_meta_call_id="195" [really very long]
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input </rsc_op>
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input </crm_xml>
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input </create_request_adv>
Blocks of messages like the above repeat a couple of times for other
resources, then crmd kicks the bucket and gets restarted:
08:55:19 corosync[6966]: [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x1dc6ea0, async-conn=0x1dc6ea0) left
08:55:19 pacemakerd: [7443]: WARN: Managed crmd process 7452 killed by signal 11 [SIGSEGV - Segmentation violation].
08:55:19 pacemakerd: [7443]: notice: pcmk_child_exit: Child process crmd terminated with signal 11 (pid=7452, rc=0)
08:55:19 pacemakerd: [7443]: notice: pcmk_child_exit: Respawning failed child process: crmd
08:55:19 pacemakerd: [7443]: info: start_child: Forked child 13794 for process crmd
08:55:19 corosync[6966]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
08:55:19 crmd: [13794]: info: Invoked: /usr/lib/pacemaker/crmd
Anyway, there were no further logs from lrmd after this point, until I
rebooted the machine hours later:
14:37:06 pacemakerd: [7443]: notice: stop_child: Stopping lrmd: Sent -15 to process 7449
14:37:06 lrmd: [7449]: info: lrmd is shutting down
14:37:06 pacemakerd: [7443]: info: pcmk_child_exit: Child process lrmd exited (pid=7449, rc=0)
So lrmd was alive all the time.
Post by Andrew Beekhof
Which created a condition in the crmd that it couldn't handle so it crashed too.
Maybe their connection got severed somehow.
Post by Andrew Beekhof
Post by Ferenc Wagner
However, it got restarted seamlessly, without the node being fenced, so
I did not even notice this until now. Should this have resulted in the
node being fenced?
Depends how fast the node can respawn.
You mean how fast crmd can respawn? How much time does it have to
respawn to avoid being fenced?
Post by Andrew Beekhof
Post by Ferenc Wagner
crmd: [13794]: ERROR: verify_stopped: Resource vm-web5 was active at shutdown. You may ignore this error if it is unmanaged.
In maintenance mode, everything is unmanaged. So that would be expected.
Is maintenance mode the same as unmanaging all resources? I think the
latter does not cancel the monitor operations here...
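For reference, these are the two knobs I am comparing (crm shell syntax;
I assume setting is-managed in rsc_defaults is the usual way to unmanage
everything at once):

  # cluster-wide maintenance mode, which also cancelled the recurring monitors here
  crm configure property maintenance-mode=true
  # versus only unmanaging all resources via resource defaults
  # (the monitors should keep running in this case, I think)
  crm configure rsc_defaults is-managed=false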
Post by Andrew Beekhof
Post by Ferenc Wagner
Post by Andrew Beekhof
The discovery usually happens at the point the cluster is started on a node.
A local discovery did happen, but it could not find anything, as the
cluster was started by the init scripts, well before any resource could
have been moved to the freshly rebooted node (manually, to free the next
node for rebooting).
That's your problem then: you've started resources outside of the control of the cluster.
Some of them, yes, and moved the rest between the nodes. All this was
done circumventing the cluster.
Post by Andrew Beekhof
Two options... recurring monitor actions with role=Stopped would have
caught this
Even in maintenance mode? Wouldn't they have been cancelled just like
the ordinary recurring monitor actions?
I guess adding them would run a recurring monitor operation for every
resource on every node, only with different expectations, right?
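Something like this is what I imagine (just a sketch in crm shell syntax;
the Dummy agent and the intervals are placeholders, not my real config):

  # a normal monitor while the resource is running, plus a role=Stopped
  # monitor that should catch copies running where the cluster thinks
  # the resource is stopped (the two need different intervals)
  crm configure primitive vm-example ocf:pacemaker:Dummy \
    op monitor interval=60s timeout=30s \
    op monitor interval=75s timeout=30s role=Stopped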
Post by Andrew Beekhof
or you can run crm_resource --cleanup after you've moved resources around.
I actually ran some crm resource cleanups for a couple of resources, and
those really were not started on exiting maintenance mode.
Post by Andrew Beekhof
Post by Ferenc Wagner
Post by Andrew Beekhof
Maintenance mode just prevents the cluster from doing anything about it.
Fine. So I should have restarted Pacemaker on each node before leaving
maintenance mode, right? Or is there a better way?
See above
So crm_resource -r whatever -C is the way, for each resource separately.
Is there no way to do this for all resources at once?
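Otherwise I guess I would have to loop over them, something like this
(assuming crm_resource --list-raw is available on 1.1.7; perhaps
crm_resource --reprobe would do a cluster-wide recheck in one go, but I
have not tried it):

  # clean up every configured resource one by one after moving things by hand
  for rsc in $(crm_resource --list-raw); do
      crm_resource --resource "$rsc" --cleanup
  done
  # or possibly force the cluster to re-check resource status everywhere:
  crm_resource --reprobe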
Post by Andrew Beekhof
Post by Ferenc Wagner
http://thread.gmane.org/gmane.linux.highavailability.user/39121/focus=39437
Post by Andrew Beekhof
Post by Ferenc Wagner
I think it's a common misconception that you can modify cluster
resources while in maintenance mode.
No, you _should_ be able to. If that's not the case, it's a bug.
So the end of maintenance mode starts with a "re-probe"?
No, but it doesn't need to.
The policy engine already knows if the resource definitions changed
and the recurring monitor ops will find out if any are not running.
My experience shows that you may not *move around* resources while in
maintenance mode.
Correct
Post by Ferenc Wagner
That would indeed require a cluster-wide re-probe, which does not
seem to happen (unless forced some way). Probably there was some
misunderstanding in the above discussion, I guess Ulrich meant moving
resources when he wrote "modifying cluster resources". Does this
make sense?
No, I'm reasonably sure he meant changing their definitions in the cib.
Or at least that's what I thought he meant at the time.
Nobody could blame you for that, because that's what it means. But then
he inquired about a "re-probe", which fits the problem of changing the
status of resources better than that of changing their definition.
Actually, I was so firmly stuck in this mindset that at first I wanted
to ask you to reconsider; your response felt so out of place. That's
all history for now...
After all this, I suggest clarifying this issue in the fine manual.
I've read it a couple of times and still got the wrong impression.
--
Regards,
Feri.