VCS event triggers let you invoke user-defined scripts for specified events in a cluster. Triggers fall into two broad categories:
1. Internal triggers: Internal triggers are non-configurable and always enabled. These triggers reside in the $VCS_HOME/bin/internal_triggers directory. By default, $VCS_HOME is /opt/VRTSvcs/.
# ls -l /opt/VRTSvcs/bin/internal_triggers/
-rwxr-x--- 1 root root 2301 Oct 17 2014 cpuusage
-rwxr-x--- 1 root root 2343 Oct 17 2014 dump_tunables
-rwxr-x--- 1 root root 2349 Oct 17 2014 globalcounter_not_updated
-rwxr-x--- 1 root root 7551 Oct 17 2014 violation
2. Custom triggers: Custom triggers are configurable at different levels. The VCS installation provides a sample Perl script for each event trigger in the $VCS_HOME/bin/sample_triggers/VRTSvcs directory.
# ls -l /opt/VRTSvcs/bin/sample_triggers/VRTSvcs/
-rwxr--r-- 1 root root 2471 Oct 17 2014 cpuusage
-rwxr--r-- 1 root root 3026 Oct 17 2014 injeopardy
-rwxr--r-- 1 root root 2836 Oct 17 2014 loadwarning
-rwxr--r-- 1 root root 2458 Oct 17 2014 nofailover
-rwxr--r-- 1 root root 2496 Oct 17 2014 postoffline
-rwxr--r-- 1 root root 2483 Oct 17 2014 postonline
-rwxr--r-- 1 root root 3324 Oct 17 2014 postonline_rhev
-rwxr--r-- 1 root root 5109 Oct 17 2014 preonline
-rwxr----- 1 root root 10099 Oct 17 2014 preonline_ipc
-rwxr--r-- 1 root root 2841 Oct 17 2014 preonline_rhev
-rwxr----- 1 root root 5377 Oct 17 2014 preonline_vvr
-rwxr--r-- 1 root root 2865 Oct 17 2014 resadminwait
-rwxr--r-- 1 root root 2605 Oct 17 2014 resfault
-rwxr--r-- 1 root root 2744 Oct 17 2014 resnotoff
-rwxr--r-- 1 root root 3264 Oct 17 2014 resrestart
-rwxr--r-- 1 root root 3226 Oct 17 2014 resstatechange
-rwxr--r-- 1 root root 2605 Oct 17 2014 sysjoin
-rwxr--r-- 1 root root 2846 Oct 17 2014 sysoffline
-rwxr--r-- 1 root root 2592 Oct 17 2014 sysup
-rwxr--r-- 1 root root 2690 Oct 17 2014 unable_to_restart_agent
-rwxr--r-- 1 root root 4037 Oct 17 2014 unable_to_restart_had
You can tailor these sample triggers to perform customized actions, or write your own Perl scripts. Some custom triggers are configurable (e.g. preonline), while others are non-configurable (e.g. injeopardy). Place the modified trigger script in $VCS_HOME/bin/triggers on each node. To enable a non-configurable custom trigger, place its script in the $VCS_HOME/bin/triggers directory; to disable it, remove the files associated with the trigger from that directory. For configurable custom triggers, also set any attributes (e.g. TriggersEnabled) that are required to enable the trigger.
Do not put customized trigger scripts in the $VCS_HOME/bin/sample_triggers/VRTSvcs directory or in the $VCS_HOME/bin/internal_triggers directory. If you install customized triggers in these directories, you might face issues while upgrading VCS.
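The deployment step described above can be sketched as a small shell helper. This is only an illustration under stated assumptions: install_trigger is a hypothetical function name, not a VCS command, and the directories are passed in as arguments so that nothing is hard-coded. It copies one sample trigger into the active triggers directory and makes sure the owner can execute it.

```shell
#!/bin/sh
# Hypothetical helper (not part of VCS): copy one sample trigger into the
# directory from which VCS invokes custom triggers.
# Usage: install_trigger <sample_dir> <triggers_dir> <trigger_name>
install_trigger() {
    sample_dir=$1
    triggers_dir=$2
    name=$3

    # Refuse to continue if the named sample does not exist.
    if [ ! -f "$sample_dir/$name" ]; then
        echo "no sample trigger named '$name' in $sample_dir" >&2
        return 1
    fi

    # Create the target directory if needed, then copy the script.
    mkdir -p "$triggers_dir" || return 1
    cp "$sample_dir/$name" "$triggers_dir/$name" || return 1

    # Make sure the owner can execute the copied script.
    chmod u+x "$triggers_dir/$name"
}

# Example call, using the default locations named in this article:
# install_trigger /opt/VRTSvcs/bin/sample_triggers/VRTSvcs \
#                 /opt/VRTSvcs/bin/triggers injeopardy
```

The helper would have to be run on each node, since the trigger script must be present on every node where it should be invoked.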
How are triggers enabled?
The TriggersEnabled attribute enables or disables triggers. Triggers are disabled by default. You can enable specific triggers on all nodes or only on selected nodes. The attribute is a string keylist and is available at both the resource level and the service group level. At the resource level, valid values are RESFAULT, RESNOTOFF, RESSTATECHANGE, RESRESTART, and RESADMINWAIT. At the service group level, valid values are VIOLATION, NOFAILOVER, PREONLINE, POSTONLINE, POSTOFFLINE, RESFAULT, RESSTATECHANGE, and RESRESTART. Because the same attribute is used at both levels, the steps for enabling and disabling triggers are similar for resources and service groups.
Enabling triggers using CLI
# hares -modify test_res TriggersEnabled RESFAULT RESNOTOFF RESSTATECHANGE RESADMINWAIT
# hares -display test_res -attribute TriggersEnabled
#Resource Attribute System Value
test_res TriggersEnabled localclus RESFAULT RESNOTOFF RESSTATECHANGE RESADMINWAIT
# hagrp -modify test_sg TriggersEnabled PREONLINE POSTONLINE POSTOFFLINE
# hagrp -display test_sg -attribute TriggersEnabled
#Group Attribute System Value
test_sg TriggersEnabled localclus PREONLINE POSTONLINE POSTOFFLINE
Enabling triggers using main.cf
Application test_res (
.
TriggersEnabled = { RESFAULT, RESNOTOFF, RESSTATECHANGE, RESADMINWAIT }
.
)
group test_sg (
.
TriggersEnabled = { PREONLINE, POSTONLINE, POSTOFFLINE }
.
)
Custom trigger location
If a trigger is enabled but the TriggerPath attribute is "" (the default), VCS invokes the trigger from the $VCS_HOME/bin/triggers directory. You can also relocate the triggers and update TriggerPath accordingly; if you specify an alternate directory, VCS invokes the trigger from that path.
How are triggers invoked?
Triggers are executed by the hatrigger script, located at $VCS_HOME/bin/hatrigger. VCS determines whether the event is enabled, then invokes hatrigger and passes it the name of the event trigger and the associated parameters.
For example, the preonline trigger is invoked before a service group is brought online. To execute the preonline trigger, VCS invokes the following command:
hatrigger preonline system service_group whyonlining [system_where_group_faulted]
The arguments are also described in each sample trigger. For example, here is a snippet from the sample preonline script:
# Usage:
# preonline <system> <group> <whyonlining> <systemwheregroupfaulted>
#
# <system>: is the name of the system where group is to be onlined.
# <group>: is the name of the group that is to be onlined.
# <whyonlining>: is "SYSFAULT" or "FAULT" or "MANUAL".
# "SYSFAULT" corresponds to failover when system is faulted;
# "FAULT" corresponds to failover;
# "MANUAL" corresponds to manual online and switch;
# <systemwheregroupfaulted>: When preonline is invoked due to failover
# this argument is the name of the system where group
# was online before.
# When preonline is invoked due to group online
# command issued with -checkpartial option,
# this argument is the name of system specified
# for this option.
#
You can use the arguments passed to the trigger to customize its actions. VCS does not wait for the trigger to complete execution; it calls the trigger and continues normal operation.
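As an illustration of using the passed arguments, here is a minimal sketch of a handler that turns the four preonline arguments into a one-line message. It is written in shell purely for illustration (the shipped samples are Perl), and describe_preonline is a hypothetical function name, not something VCS provides. The argument order follows the usage text quoted above.

```shell
#!/bin/sh
# Hypothetical handler: summarize the arguments that the preonline trigger
# receives: <system> <group> <whyonlining> [<systemwheregroupfaulted>]
describe_preonline() {
    system=$1
    group=$2
    why=$3
    other_system=${4:-}   # optional fourth argument

    # Map the documented whyonlining values to readable text.
    case $why in
        SYSFAULT) reason="failover after a system fault" ;;
        FAULT)    reason="group failover" ;;
        MANUAL)   reason="manual online or switch" ;;
        *)        reason="unrecognized reason '$why'" ;;
    esac

    msg="preonline: $group on $system ($reason)"
    if [ -n "$other_system" ]; then
        msg="$msg, related system: $other_system"
    fi
    echo "$msg"
}
```

A trigger script could call describe_preonline "$@" and append the result to a log file or send it to a notification channel of your choosing.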
List of Internal event triggers
violation trigger
This trigger is invoked only on the system that caused the concurrency violation. The default trigger takes the service group offline on the system where it was invoked. Note that this trigger applies to failover groups only.
Arguments:
system — represents the name of the system.
service_group — represents the name of the service group that was fully or partially online.
dumptunables trigger
The dumptunables trigger is invoked when HAD goes into the RUNNING state. When this trigger is invoked, it uses the HAD environment variables that it inherited, and other environment variables to process the event. Depending on the value of the to_log parameter, the trigger then redirects the environment variables to either stdout or the engine log. This trigger is not invoked when HAD is restarted by hashadow.
Arguments:
system—represents the name of the system on which the trigger is invoked.
to_log—represents whether the output is redirected to engine log (to_log=1) or stdout (to_log=0).
globalcounter_not_updated trigger
VCS periodically broadcasts an update of GlobalCounter from the system that has the lowest NodeId in the cluster. If a node does not receive the broadcast for an interval greater than CounterMissTolerance, it invokes the globalcounter_not_updated trigger, provided that CounterMissAction is set to Trigger. This event is considered critical because it indicates a problem with the underlying cluster communications or cluster interconnects. Use this trigger to notify administrators of the critical event.
Arguments:
system—represents the system which did not receive the update of GlobalCounter.
global_counter—represents the value of GlobalCounter.
cpuusage trigger
Invoked when the CPU usage of a system exceeds ActionThreshold continuously for the duration of ActionTimeLimit. The trigger is invoked on the node whose CPU usage exceeded the threshold. To turn this trigger off, specify Action = "NONE". Refer to the system-level attribute CPUUsageMonitoring for details of ActionThreshold, ActionTimeLimit, and Action.
Arguments:
system - is the name of the system where CPU usage exceeded the threshold.
cpuusage - is the CPU utilization of the system, as a percentage.
List of Custom event triggers
injeopardy trigger
Invoked when a system is in jeopardy. Specifically, this trigger is invoked when a system has only one remaining link to the cluster, and that link is a network link (LLT). This event is considered critical because if the system loses the remaining network link, VCS does not fail over the service groups that were online on the system. Use this trigger to notify the administrator of the critical event. The administrator can then take appropriate action to ensure that the system has at least two links to the cluster. This event trigger is non-configurable.
Arguments:
system — represents the name of the system.
system_state — represents the value of the State attribute.
loadwarning trigger
Invoked when a system becomes overloaded because the load of the system’s online groups exceeds the system’s LoadWarningLevel attribute for an interval exceeding the LoadTimeThreshold attribute. Use this trigger to notify the administrator of the critical event. The administrator can then switch some service groups to another system, ensuring that no one system is overloaded.
Arguments:
system — represents the name of the system.
available_capacity — represents the system’s AvailableCapacity attribute. (AvailableCapacity = Capacity - sum of Load for system’s online groups.)
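The AvailableCapacity formula above is simple arithmetic, and the following sketch works through it. The function name available_capacity and the sample numbers are illustrative assumptions; on a real cluster, VCS computes this value itself and passes it to the trigger.

```shell
#!/bin/sh
# Illustrative only: AvailableCapacity = Capacity - sum of Load for the
# system's online groups.
# Usage: available_capacity <capacity> [<load_of_group> ...]
available_capacity() {
    capacity=$1
    shift

    # Add up the Load values of all online groups.
    total_load=0
    for load in "$@"; do
        total_load=$((total_load + load))
    done

    echo $((capacity - total_load))
}

# Hypothetical numbers: Capacity 100, two online groups with Load 30 and 20.
available_capacity 100 30 20
```

With these hypothetical numbers, the call prints 50.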
nofailover trigger
Invoked from the lowest-numbered system in RUNNING state when a service group cannot fail over.
Arguments:
system — represents the name of the last system on which an attempt was made to bring the service group online.
service_group — represents the name of the service group.
postoffline trigger
Invoked on the system where the group went offline from a partial or fully online state. This trigger is invoked when the group faults, or is taken offline manually.
Arguments:
system — represents the name of the system.
service_group — represents the name of the service group that went offline.
preonline trigger
Invoked before bringing a service group online. If the trigger does not exist, or if the script returns 0, VCS continues to bring the group online. To enable the trigger, set the PreOnline attribute in the service group definition to 1 (set it to 0 to disable the trigger). You can set a local (per-system) value for the attribute to control behavior on each node in the cluster.
Arguments:
system - represents the name of the system.
service_group - represents the name of the service group for which the online operation was attempted.
whyonlining - takes one of three values:
FAULT: Indicates that the group was brought online in response to a group failover.
MANUAL: Indicates that the group was brought online or switched manually on the system that is represented by the variable system.
SYSFAULT: Indicates that the group was brought online in response to a system fault.
system_where_group_faulted - represents the name of the system on which the group has faulted or switched. This variable is optional and is set when the engine invokes the trigger during a failover or switch.
resadminwait trigger
Invoked when a resource enters ADMIN_WAIT state. When VCS sets a resource in the ADMIN_WAIT state, it invokes the resadminwait trigger according to the reason the resource entered the state.
Arguments:
system—represents the name of the system.
resource—represents the name of the faulted resource.
adminwait_reason—represents the reason the resource entered the ADMIN_WAIT state. Values range from 0-5:
0 = The offline function did not complete within the expected time.
1 = The offline function was ineffective.
2 = The online function did not complete within the expected time.
3 = The online function was ineffective.
4 = The resource was taken offline unexpectedly.
5 = The monitor function consistently failed to complete within the expected time.
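A resadminwait script usually needs to turn the numeric adminwait_reason into something readable before notifying anyone. The lookup below is a minimal sketch of that mapping, taken from the list above; adminwait_reason_text is a hypothetical function name, not a VCS utility.

```shell
#!/bin/sh
# Hypothetical lookup: translate the adminwait_reason code (0-5) that the
# resadminwait trigger receives into a readable description.
adminwait_reason_text() {
    case $1 in
        0) echo "the offline function did not complete within the expected time" ;;
        1) echo "the offline function was ineffective" ;;
        2) echo "the online function did not complete within the expected time" ;;
        3) echo "the online function was ineffective" ;;
        4) echo "the resource was taken offline unexpectedly" ;;
        5) echo "the monitor function consistently failed to complete within the expected time" ;;
        *) echo "unknown reason code: $1"
           return 1 ;;
    esac
}
```

Inside a trigger script, the third argument would be passed to this function, for example adminwait_reason_text "$3".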
resfault trigger
Invoked on the system where a resource has faulted. Note that when a resource is faulted, resources within the upward path of the faulted resource are also brought down.
Arguments:
system — represents the name of the system.
resource — represents the name of the faulted resource.
previous_state — represents the resource’s previous state.
resnotoff trigger
Invoked on the system if a resource in a service group does not go offline even after issuing the offline command to the resource.
Arguments:
system — represents the system on which the resource is not going offline.
resource — represents the name of the resource.
resrestart trigger
This trigger is invoked when an agent restarts a resource because the resource faulted and RestartLimit was greater than 0.
Arguments:
system—represents the name of the system.
resource—represents the name of the resource.
resstatechange trigger
This trigger is invoked under the following conditions:
Resource goes from OFFLINE to ONLINE.
Resource goes from ONLINE to OFFLINE.
Resource goes from ONLINE to FAULTED.
Resource goes from FAULTED to OFFLINE. (When fault is cleared on non-persistent resource.)
Resource goes from FAULTED to ONLINE. (When faulted persistent resource goes online or faulted non-persistent resource is brought online outside VCS control.)
Arguments:
system — represents the name of the system.
resource — represents the name of the resource.
previous_state — represents the resource’s previous state.
new_state — represents the resource’s new state.
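The five transitions listed for resstatechange can be written down as a small lookup, which a custom script might use to decide how to react. This is a sketch under two assumptions: is_listed_transition is a hypothetical function name, and the previous_state and new_state arguments are assumed to arrive as the uppercase state names used in the list above, which you should confirm against the sample resstatechange script.

```shell
#!/bin/sh
# Hypothetical check: succeed (exit status 0) if <previous_state> to
# <new_state> is one of the transitions listed for resstatechange.
# Assumes states are passed as uppercase names such as ONLINE or FAULTED.
is_listed_transition() {
    case "$1:$2" in
        OFFLINE:ONLINE)  return 0 ;;
        ONLINE:OFFLINE)  return 0 ;;
        ONLINE:FAULTED)  return 0 ;;
        FAULTED:OFFLINE) return 0 ;;  # fault cleared on a non-persistent resource
        FAULTED:ONLINE)  return 0 ;;  # faulted resource came back online
        *)               return 1 ;;
    esac
}

# Example: react only to a transition into the FAULTED state.
# if is_listed_transition "$3" "$4" && [ "$4" = "FAULTED" ]; then
#     echo "resource $2 faulted on $1"
# fi
```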
sysoffline trigger
Invoked from the lowest-numbered system in RUNNING state when a system leaves the cluster.
Arguments:
system — represents the name of the system.
system_state — represents the value of the State attribute.
sysup trigger
The sysup trigger is invoked when the first node joins the cluster.
Arguments:
system — represents the system name.
systemstate — represents the state of the system.
sysjoin trigger
The sysjoin trigger is invoked when a peer node joins the cluster.
Arguments:
system — represents the system name.
systemstate — represents the state of the system.
unable_to_restart_agent trigger
This trigger is invoked when an agent faults more than a predetermined number of times within an hour. When this occurs, VCS gives up trying to restart the agent. VCS invokes this trigger on the node where the agent faults. You can use this trigger to notify the administrators that an agent has faulted, and that VCS is unable to restart the agent. The administrator can then take corrective action.
Arguments:
system — represents the name of the system.
resource_type — represents the resource type associated with the agent.
unable_to_restart_had trigger
Invoked by hashadow when hashadow cannot restart HAD on a system. If HAD fails to restart after six attempts, hashadow invokes the trigger on the system. The default behavior of the trigger is to reboot the system. However, you can customize it as per your requirement. This event trigger is non-configurable and has no arguments.