Install and Configure: Configuring Notification
Lesson 10: Configuring Notification
Transcript:
BRAD WILLER: Welcome to VCS 6.0 Install and Config, Configuring
Notification.
Author's Original Notes:
*
Lesson 3: Preparing a Site for VCS
Lesson 4: Installing VCS
Lesson 5: VCS Operations
Lesson 7: Preparing Services for VCS
Lesson 8: Online Configuration
Lesson 9: Offline Configuration
Lesson 10: Configuring Notification
Lesson 13: Cluster Communications
Lesson 15: Coordination Point Server
Lesson introduction
Author's Original Notes:
*
Configuring notification
Using triggers for notification
Transcript:
In this lesson, we're going to talk about what the notification
process is, we're going to see how we set it up, and lastly we're
going to briefly talk about a really cool feature of VCS called
triggers.
Author's Original Notes:
*
After completing this topic, you will be able to describe how VCS
provides notification.
Notification overview
Author's Original Notes:
*
Notification overview
HAD sends a message to the notifier daemon when an event
occurs.
The notifier daemon:
Formats the event message
Sends an SNMP trap or e-mail message (or both) to designated
recipients
[Diagram: HAD on each node with a replicated message queue; notifier sends to SMTP and SNMP]
Transcript:
As you probably have a decent idea by now, since we've looked at the
engine log so many times, every time something happens in VCS, we
log it to the engine log. Each of those events has a severity level
associated with it. So, if something comes up where HAD needs to
notify someone, either by Email or by SNMP trap, it lets the
notifier daemon know that there's something to do. The notifier is
a daemon, managed by an agent, that runs in the ClusterService
service group. It formats the message as appropriate and then sends
it either to the Email server or to the SNMP console, depending on
which is configured. The queue of messages that the notifier has to
deal with is replicated across nodes, because again we want to make
sure it's highly available. So, if one node crashes, the
ClusterService service group fails over to another node, and the
notifier there, communicating with HAD on that node, deals with the
messages that are on the queue.
Author's Original Notes:
>>>
Send an e-mail message to designated recipients.
>>>
*
The notifier daemon is managed by a NotifierMngr type
resource.
The ClusterService group:
Is configured using the CLI, GUI, or during initial
configuration
Is online on one cluster node only
[Diagram: the ClusterService group, containing the notifier and csgnic resources, with HAD on each node]
Transcript:
Alright, the service group as I've mentioned where this happens is
going to be ClusterService. So, you're going to have a resource
actually called notifier, and it's going to be of type,
NotifierMngr. And the actual daemon out there, if you do a ps -ef, is
actually called notifier as well. It's actually not a strict
requirement that the notifier is in ClusterService service group,
but that's where we really want to see it. Because, usually this
gets set up at install time, and the installer is going to put it
in the ClusterService service group, as well as this is something
that really makes sense to be in that service group because it's
something directly related to HAD. Remember, don't ever put any of
your stuff in ClusterService because we bend all the rules with
this service group. For example, remember when we do an
hastop -all -force, and we force HAD down but leave all the other
applications running, i.e., all the other service groups?
ClusterService always goes down, because if HAD's not running,
there's no reason for ClusterService to be running because it's all
HAD-related things. Notification, which we're talking about now,
cluster IP address, which again is related to HAD, if you had a
global cluster, there's things in there related to that, again
related to HAD. So, no sense for that application, that service
group, to still be running if HAD's not running. In the
ClusterService service group, when notification is set up, we're
going to have a notifier resource which is going to actually kick
off the notifier daemon, and we're going to have a NIC resource
called csgnic, ClusterService group NIC, and that's for the public
network. Because, again, if we're doing Email notification, or,
well, really, any notification, but especially Email notification,
typically we have to communicate out to public network.
Author's Original Notes:
>>>
ClusterService is created automatically during cluster
configuration if either SMTP or SNMP notification is
selected.
>>>
After configuring notification, >>>ClusterService contains
a notifier resource to manage the notifier daemon,
*
[Diagram: HAD on each node passes events to notifier, which sends to SMTP and SNMP; severity label shown: Information]
Transcript:
When we do notification, there's severity levels, as I mentioned,
associated with this, and you really can see this when you look in
the engine log. Every single message in the engine log will have a
severity level associated with it. There's a subset of that that
are used with the notifier. And the four, there's actually five
levels, but the four that are used are shown, and that's Info,
Warning, Error, and SevereError. That's what the notifier calls
them. We'll see in the next slide how that translates to what
actually you see in the engine log. It's very similar. But the idea
is when an event occurs, like a resource faults, when we set up
notification, we'll basically say, at a certain severity level, we
want something to happen. And so, that something would be the
notification, and then the notifier would pick up that message and
send it to the appropriate place relative to that severity level.
We'll see that coming up.
Author's Original Notes:
>>>
and SevereError
*
From: Notifier
Entity Name: websg
Traps Origin: Veritas_Cluster_Server
System Name: S1
Entities Container Name: webclus
Entities Container Type: VCS
Notifier and log events
2011/07/06 15:19:37 VCS ERROR V-16-1-10205 Group websg is faulted on system s1

Notifier e-mail    Log file (engine_A.log)
Information        INFO
(none)             NOTICE
Warning            WARNING
Error              ERROR
SevereError        CRITICAL
Transcript:
Here's an example of the engine log, and you'll notice, in the
engine log, we have an error level message. So it says VCS Error,
and then it gives a unique number. And then, if we have
notification set up that's at least error level or above, that's
going to cause the notifier to kick that out, generate, in this
example, an Email message, and send that to the appropriate
recipients. You'll notice the notifier's levels, Information,
Warning, etc., versus what you see in the engine log. And again,
they directly map. The most unusual is that the notifier calls
critical level messages SevereError. And you'll also notice that
there isn't a severity level in the notifier for notice. Notice and
Info are very similar to each other, and it's extremely rare that
you would ever have the notifier deliver messages at that level, so
we didn't bother. When we added Notice in, we didn't bother to
enhance the notifier for that level. Usually, where you're looking
is at Warning, Error, and Critical. A lot of times, customers will
start with Warning as they're feeling out how things work with
VCS, what kind of notifications come out. And then, they may say,
you know that's a little bit too much, we'll go to Error and
Critical. But again, that'll be up to you.
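The mapping between the two sets of names can be written down as a small lookup table. This is only an illustrative Python sketch of the mapping described on the slide, not anything VCS ships; the function name notifier_severity is made up for the example.

```python
# Engine-log severity tags mapped to the names the notifier uses,
# as described in this lesson. NOTICE has no notifier equivalent.
LOG_TO_NOTIFIER = {
    "INFO": "Information",
    "WARNING": "Warning",
    "ERROR": "Error",
    "CRITICAL": "SevereError",
}


def notifier_severity(log_tag):
    """Return the notifier's name for an engine-log tag, or None if unmapped.

    Hypothetical helper for illustration only.
    """
    return LOG_TO_NOTIFIER.get(log_tag.upper())


print(notifier_severity("CRITICAL"))
print(notifier_severity("NOTICE"))
```

The lookup returns None for NOTICE, which mirrors the point above that the notifier has no level for it.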
Author's Original Notes:
>>>
This example shows a log entry for a VCS ERROR and the
corresponding e-mail message.
*
hamsg engine_A
VCS engine (had) started
Cluster logger started
Configuration must be ReadWrite
!
Transcript:
Remember our log file, the absolute number one log is the engine
log, and it's in /var/VRTSvcs/log/. You can use the hamsg command
to look at the logs. Not so common. Most customers tend to just
look at the log directly, maybe view the file, because usually you
want to jump around. Or maybe they'll grep for certain messages in
the file. It's really up to you. It's a time-stamped log file, and
so depending on what you're looking for, you'll access it in
different ways. When I'm doing troubleshooting, I actually like to
view the file, go to the end of the file, and then start searching
backwards in time for the name of the resource that had the issue.
And then I just keep doing an N and stepping through doing next,
next, and step through the different messages until I find the
specific information for that resource that I'm interested in. But
again, it's just a time-stamped log file just like syslog, and so
you deal with it however makes the most sense to you. This slide
also shows that the format of the messages is always the time
stamp, then the severity level, then you get what we call a Unique
Identifier, the V- numbers. And then after that comes the actual
error message. One of the things you can do that's nice is when you
get the error message, if you're not sure what it means, you always
could go out to SORT, remember, sort.symantec.com, and one of the
things you can do is search for that number, and it'll come back
and give you a definition of what that error message means. And
then, depending on which it is, it may actually have some solutions
for you as to how to solve that issue.
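If you'd rather pull those fields apart with a script than by eye, something like the following Python sketch could work. The pattern is inferred only from the single sample entry shown in this lesson, so treat it as an assumption; real logs may contain entries in other shapes, and parse_log_line is a made-up helper name.

```python
import re

# Pattern for one engine-log entry: time stamp, "VCS", severity,
# unique identifier (the V- number), then the message text.
# Assumption: inferred from the one sample line in this lesson.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"VCS (?P<severity>[A-Z]+) "
    r"(?P<umi>V-\d+-\d+-\d+) "
    r"(?P<message>.*)$"
)


def parse_log_line(line):
    """Split a log entry into its fields, or return None if it doesn't match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


entry = parse_log_line(
    "2011/07/06 15:19:37 VCS ERROR V-16-1-10205 Group websg is faulted on system s1"
)
print(entry["severity"], entry["umi"])
```

The umi field is the number you would search for on SORT.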
Author's Original Notes:
>>>
>>>
Each entry includes a text code indicating severity, from CRITICAL
entries to INFO entries with status information.
CRITICAL entries indicate problems requiring immediate
attention—contact Customer Support immediately
ERROR entries indicate exceptions that need to be
investigated
*
After completing this topic, you will be able to configure
notification using the NotifierMngr resource.
Configuring notification
Transcript:
In this topic, let's look at how we set this up.
Author's Original Notes:
*
Set attributes
VOM: go.symantec.com/vom
Transcript:
There's basically three ways. Probably the most common is just to
use the installer. As I mentioned in an earlier lesson, the
installer, when you run through it, one of the questions it asks
you is do you want to set up either Email notification or SNMP
traps. Really, there's no reason not to say yes. If you're going to
do notification at the VCS level, there's really no reason to say
no at that point like we did and then do it later. The reason we
purposely did that is so that we can have this lesson and actually
talk about things in more detail. But in your production
environment, you would just go ahead and let the installer set
things up, then answer the questions appropriately, and now you'd
have the ClusterService service group with the appropriate notifier
resources in it. Another very common way these days, and a lot of
-- more and more customers are actually moving towards that, is to
do notification at VOM. VOM actually has a quite sophisticated
notification environment that's above and beyond what VCS can do.
Because remember, one of the beauties of VOM is not just an
individual cluster. You could use VOM for the hundred clusters you
have so you could actually get notifications for all your clusters
from one central place. And then, the third way to do it is through
the command line, and that's what we're going to do in this lesson,
just talk about how we manually add these in.
Author's Original Notes:
!
Add a NotifierMngr resource to only one service group,
ClusterService.
Modify the SmtpServer and SmtpRecipients attributes.
Modify SnmpConsoles, if using SNMP notification.
Optionally, modify ResourceOwner and GroupOwner.
Configure the SNMP console to receive VCS traps.
Transcript:
Here's the process to set up notification. The first thing we have
to do is actually add a resource in, it's going to be of type
NotifierMngr. The name of the resource should be Notifier, and we
want to put it into the ClusterService service group. Again, you
actually could have it somewhere else, but don't -- put everything
HAD related into ClusterService. Then, if we're doing Email
notification, we need to set up the Email server. So, that resource
attribute is called SmtpServer. And then we're going to set up
SmtpRecipients, and that's going to be a pairing of information.
You're going to have an Email address or alias, and then you're
going to have a severity level, and then you can have any number of
those. So, for example, let's say I have an Email alias that is
first line support, and I want them to receive any messages that
are Warning and above. And then I could have an Email alias that
was second line support, and I want them to receive anything that's
Error and above. And so again, you can do any kind of variation like
that that makes sense for your environment. Once you've got that,
that's if you're going to do Email notification, you absolutely
have to do that part. This next part is completely optional. For
each resource, you actually could set up an owner of that resource,
and for each service group you could set up an owner of that
service group. Again, it's going to be an Email address or alias.
Then, if anything happens to that resource, any event like even
onlined, faulted, anything like that, an Email would get sent to
that alias, same thing for the service group. You could set those
up, but they only take effect if you've got Email notification
properly configured. Along the same lines, you can also do SNMP trap
notification, instead of or in addition to Email. Alright so, if I do want
to set that up, I have to say where's the console or consoles, and
then we'll see on another slide there's a MIB file that I need to
load in.
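To make the "address plus severity level" pairing concrete, here is a rough Python model of how the first line and second line support example would play out. It is a sketch of the idea only, not notifier code; the addresses and the recipients_for function are invented for illustration.

```python
# Notifier severity levels from lowest to highest, as named in this lesson.
SEVERITY_ORDER = ["Information", "Warning", "Error", "SevereError"]


def recipients_for(event_severity, smtp_recipients):
    """Return the addresses whose configured level is at or below the event's.

    Hypothetical model of SmtpRecipients; not part of VCS.
    """
    event_rank = SEVERITY_ORDER.index(event_severity)
    return [
        address
        for address, minimum in smtp_recipients.items()
        if event_rank >= SEVERITY_ORDER.index(minimum)
    ]


# Hypothetical aliases: first line gets Warning and above,
# second line gets Error and above.
smtp_recipients = {
    "first-line@example.com": "Warning",
    "second-line@example.com": "Error",
}

print(recipients_for("Warning", smtp_recipients))
print(recipients_for("Error", smtp_recipients))
```

With these pairs, a Warning goes only to the first line alias, while an Error or SevereError goes to both.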
Author's Original Notes:
>>>
You can configure SMTP or SNMP, or both types of notification,
by specifying applicable attributes.
Configure only one notifier manager resource in the cluster and
place the resource in the ClusterService group.
*
Both can be specified.
Resource definition
Transcript:
Alright, here's an example. Again, this is the details of what you
want to see. Alright so, it's going to be in ClusterService, you
want it called notifier, and then here's an example of an Email
server, and then here we're just having a single recipient. So, the
way that slide shows it is that's how it's going to look in the
main.cf file, is it'll say Email address equals severity level. But
when you set it up, for example, I would do an hares -modify. My
resource name is notifier, my attribute is SmtpRecipients, and then
I would say
[email protected] Error,
[email protected] SevereError,
that type of thing, and then it would match up appropriately.
Author's Original Notes:
The NotifierMngr agent starts and monitors the notifier daemon. If
the agent detects failure, the NotifierMngr resources
>>>
You must take the resource offline and bring it back online any
time you change attributes.
>>>
*
Writes an entry in the log file
Requires notifier to be configured
hares -modify resource ResourceOwner kim
2011/12/03 11:23:48 VCS INFO V-16-1-10304 Resource file1
(Owner=kim, Group=websg) is offline on s1
ResourceStateUnknown
CLI
Transcript:
Here's an example of resource owner. Again, every single resource
has this attribute. So, you could set it at any time, you can set
it or unset it at any time. But it's only when an event occurs, and
only if you have Email notification actually set up does it send an
actual Email. So, here's an example of the events that could occur,
and you can see it's anything related to that resource. So, it
faults, or it gets started, things like that. Down at the bottom,
it shows you again the example of how you're setting it. Now, one
thing I would recommend, though, is you'll notice, in this case,
it's saying ResourceOwner is kim. That actually should be an Email
address, and so I would recommend you fully qualify it. So, if you
don't, it's going to default to kim@ whatever your Email server is.
So, in most cases, I like to see fully qualified because then
there's no confusion as to where it's going. So, following with the
example we had here, it would be
[email protected].
Author's Original Notes:
>>>
*
Requires notifier to be configured
hagrp -modify group GroupOwner chris
From: Notifier
. . .
CLI
Transcript:
Same idea GroupOwner for a service group. So, this is a group level
attribute. So, if I were to do an hagrp -display for example, I
could see this guy. Again, it could be set at any time, or unset at
any time, and here's the events that will cause a message to get
sent. Again, as in the last slide, I would fully qualify the
GroupOwner attribute so that you know exactly where it's
going.
Author's Original Notes:
*
Transcript:
New in 6.0 is we took that idea we just talked about, and we took
it a little bit further. So now, we've added a number of other
attributes in that you can set just like the Email, the SMTP
recipients idea, except here, we're narrowing it down to specific
levels. So, you have, for the cluster, any cluster level messages
that occur that are Error and above would go to
[email protected].
At the same time, remember we had SmtpRecipients. That was
also Error and above. So, if a cluster level error occurred, both
of them are going to get notifications. However, if a resource had
an error, the SmtpRecipients addresses are going to get it, but
Jane Doe is not, because that's not a cluster level event. So,
we've added these in to give you a
little more granularity on how you send out your
notifications.
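Here's a small Python model of that scoping behavior, covering only the cluster level case discussed here. It is a sketch under assumptions: the addresses are invented, who_is_notified is a made-up function, and the real notifier may differ in details.

```python
# Notifier severity levels from lowest to highest, as named in this lesson.
SEVERITY_ORDER = ["Information", "Warning", "Error", "SevereError"]


def meets(event_severity, minimum):
    """True if the event's severity is at or above the configured minimum."""
    return SEVERITY_ORDER.index(event_severity) >= SEVERITY_ORDER.index(minimum)


def who_is_notified(scope, severity, smtp_recipients, cluster_recipients):
    """Model SmtpRecipients (all events) plus ClusterRecipients (cluster events).

    Hypothetical illustration; scope is "cluster" or "resource".
    """
    notified = [a for a, m in smtp_recipients.items() if meets(severity, m)]
    if scope == "cluster":
        notified += [a for a, m in cluster_recipients.items() if meets(severity, m)]
    return notified


# Hypothetical addresses, both configured at Error and above.
smtp_recipients = {"admin@example.com": "Error"}
cluster_recipients = {"janedoe@example.com": "Error"}

print(who_is_notified("cluster", "Error", smtp_recipients, cluster_recipients))
print(who_is_notified("resource", "Error", smtp_recipients, cluster_recipients))
```

A cluster level error reaches both addresses, while a resource level error reaches only the SmtpRecipients address.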
Author's Original Notes:
>>>Additional attributes are added to enable broader
specification of users to be notified
of resource and service group events. These attributes are
configured at the corresponding
object level. For example, the GroupRecipients attribute is
configured within a service group
definition.
>>>These attributes are specified by a list of email ids
of users along with the severity level.
The registered users would get only those events which have
severity equal to or greater than
the severity requested. For example, if janedoe is configured in
the ClusterRecipients attribute with
a severity level “Warning”, she would get events of severity
“Warning”, “Error” and “SevereError”
but would not get events with severity “Information”. A cluster
event, such as a cluster fault, which
is Error level, would be sent to janedoe.
*
Configuring the SNMP console
Load the MIB for VCS traps into the SNMP management console.
For HP OpenView Network Node Manager, merge events:
xnmevents -merge vcs_trapd
/etc/VRTSvcs/snmp/vcs.mib
/etc/VRTSvcs/snmp/vcs_trapd
Transcript:
That was Email notification, here's SNMP traps. As I mentioned, we
have to configure the appropriate attributes to say where the
console is, and then here's the files we provide so you have to
load these into the appropriate console. And so, for example, if
you have HP OpenView, that's the vcs_trapd file that you would load
in.
Author's Original Notes:
>>>
*
After completing this topic, you will be able to use triggers to
provide notification.
Using triggers for notification
Transcript:
So that's how you set up notification. Again, like I said, usually
you just let the installer take care of that, done in one fell
swoop. Let's now take a look at what triggers are.
Author's Original Notes:
*
An alternative method of notification
Useful for customizing VCS behavior in response to events
The TriggersEnabled attribute:
Can be used to enable a resource trigger
for all resources in a service group
Can be localized per system
Using triggers
New
Transcript:
Triggers is a very cool feature of VCS. We're just briefly going to
touch on it in this class. We really don't have a lot of time to
talk about it, but it is a topic that we tend to talk about in our
advanced classes. What a trigger is, is there are certain events
that happen with VCS, and there's a trigger associated with it.
What a trigger really is, is code. So, you actually are the one
that writes the triggers. And so, what happens is when an event
occurs, like a service group goes online, well once that occurs,
there could be a trigger such as POSTONLINE that then would get
executed. And what it actually does is completely up to you. When
the event occurs, VCS will go look to see if the trigger exists. If
it does, it'll execute it and it immediately returns. It doesn't
wait for the trigger to finish because we have no control over what
it actually does. This is your code. But this is a great way that
you can customize VCS to do certain things if the product itself
doesn't have the features that you need, or you have a really
unusual service group that needs some additional things to happen
at different times. You could use this for anything you needed to
do. You could even have it do HA commands. So, for example, you
could have a trigger that does one thing that then causes maybe a
service group to get switched to another node. It's completely up
to you on what you need. Now, there's a number of triggers that are
relative to resources, and there's a number of triggers that are
relative to service groups, and this slide kind of gives you an
idea of the different names of the triggers. There's a
TriggersEnabled attribute that you have to set that will allow
these triggers to run for a particular resource or a particular
service group. And this is actually a change in the way we've done
it in the past. The concept is exactly the same from previous
versions, but we've enhanced it to be a little more granular.
Author's Original Notes:
>>>
>>>
*
TriggerPath = "bin/test/websg/"
/opt/VRTSvcs
$VCS_HOME
Transcript:
The location of the triggers: first off, everything is based off
$VCS_HOME, which is /opt/VRTSvcs. And so, there's an attribute
called TriggerPath that basically says where the trigger is
located. So, this is something we've gotten much, much more
granular on. As I mentioned, it used to be that you could just have
one trigger for everything. So, for example, if a resource faulted,
there's a trigger called RESFAULT. And it used to be that you could
only have one of those, in /opt/VRTSvcs/bin/triggers. And so, any
time a resource faulted, if it was allowed to run the trigger, it
had to execute that one. So, what it ended up meaning is that you
had one trigger for an entire system, and it in essence had to be a
huge if-then-else statement. So, if it's resource A, go do this, if
it's resource B, go do that, etc. Now, with this whole TriggerPath
idea, we can get very granular. So we could have a specific trigger
for resource A, we could have a specific trigger for resource B.
So, the TriggerPath is going to be the relative path to where the
trigger is located, and again, it's relative to /opt/VRTSvcs. So,
as you can see in that example of a trigger called PREONLINE, we're
having it located in bin/test/websg. So, that's where we look to
find a file called /opt/VRTSvcs/bin/test/websg/preonline. If we see
that, then we're going to try to execute it. If TriggerPath is not
defined, then it defaults back to the pre-6.0 location, which as I
mentioned was /opt/VRTSvcs/bin/triggers. And so we look in there
for the RESFAULT trigger, for example.
Author's Original Notes:
Another enhancement to the trigger implementation is the
TriggerPath attribute,
which enables you to customize the location of trigger
scripts.
>>>If the TriggerPath attribute is specified at the
service group level, the full path
to the trigger script is
$VCS_HOME/path_in_TriggerPath_attribute/name_of_the_trigger.
As an example, if you enable the preonline trigger, and set
TriggerPath to bin/test/SG1, the path VCS uses to find the
trigger
script is $VCS_HOME/bin/test/SG1/preonline.
If you set the TriggerPath attribute for a resource, the full path
to the
trigger script is
$VCS_HOME/path_in_TriggerPath_attribute/resource_name/name_of_the_trigger.
For example, if you enable the resfault trigger and set TriggerPath
to bin/test/SG1, the path VCS uses to find the trigger
script is $VCS_HOME/bin/test/SG1/resource_name/resfault.
If TriggerPath is not defined, the legacy path of
$VCS_HOME/bin/triggers is used to locate the scripts.
This ensures VCS 6.0 is backward compatible to support any triggers
previously configured, as in the case
of an upgrade to 6.0.
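The three lookup rules in these notes can be summarized in a short Python sketch. It only models the path construction as the notes describe it; trigger_script_path is a made-up helper, and file1 is used purely as a sample resource name.

```python
import posixpath

# $VCS_HOME as given in this lesson.
VCS_HOME = "/opt/VRTSvcs"


def trigger_script_path(trigger, trigger_path=None, resource=None):
    """Build the location VCS is described as checking for a trigger.

    Hypothetical helper modeling the rules in the notes:
    - no TriggerPath: legacy $VCS_HOME/bin/triggers/<trigger>
    - group-level TriggerPath: $VCS_HOME/<TriggerPath>/<trigger>
    - resource-level TriggerPath: $VCS_HOME/<TriggerPath>/<resource>/<trigger>
    """
    if not trigger_path:
        return posixpath.join(VCS_HOME, "bin/triggers", trigger)
    parts = [VCS_HOME, trigger_path.strip("/")]
    if resource:
        parts.append(resource)
    parts.append(trigger)
    return posixpath.join(*parts)


print(trigger_script_path("preonline", "bin/test/SG1"))
print(trigger_script_path("resfault", "bin/test/SG1", resource="file1"))
print(trigger_script_path("resfault"))
```

The last call shows the fallback to the legacy location, which is what keeps triggers from an earlier version working after an upgrade.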
*
Can be copied and modified
Are located in /opt/VRTSvcs/bin/sample_triggers
# <system>: is the name of the system where resource faulted.
# <resource>: is the name of the resource that faulted.
# <oldstate>: is the previous state of the resource that
# faulted.
. . .
Transcript:
Here's a sample trigger. What we actually provide you -- again,
it's your code, so you can come up with completely whatever you
want it to do. But what we do provide is examples. So, we provide
examples of all the different triggers. And if you look at them,
what it'll do is, towards the top, and it's a Perl script by
default, towards the top it'll show you the trigger name, which in
this example is RESFAULT, and then it'll show you the attributes
that get passed to that trigger, or I should say arguments, since
it's a script. So, in this example you're
going to get three arguments passed to it. The first one is the
system it's being executed on, the second is the resource it's
being executed on, and then the third is the oldstate; so, what the
state was before it faulted. And again, it has the definitions of
all those. Then, as you go down, you'll find a section that says
put your code here. So, we basically give you a wrapper, and then
you put the code that does whatever this thing needs to.
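As a rough picture of what the "wrapper plus your code" idea looks like, here is a Python sketch that unpacks the three values in the order just described. The samples VCS provides are not in Python, so this is purely illustrative; parse_resfault_args is a made-up function, and the sample values are invented.

```python
def parse_resfault_args(argv):
    """Unpack the values passed to a resfault trigger.

    Assumes the order described in this lesson: system, resource, oldstate,
    preceded by the script name as in a normal argument list.
    Hypothetical helper for illustration only.
    """
    if len(argv) != 4:
        raise ValueError("expected: <system> <resource> <oldstate>")
    _, system, resource, oldstate = argv
    return {"system": system, "resource": resource, "oldstate": oldstate}


# Sample values only; a real trigger would receive these from VCS.
event = parse_resfault_args(["resfault", "s1", "file1", "ONLINE"])

# "Put your code here": whatever the trigger should do with the event.
print(f"{event['resource']} faulted on {event['system']} (was {event['oldstate']})")
```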
Author's Original Notes:
Sample trigger scripts are provided with VCS. These can be copied
to the triggers directory
and modified to your specifications.
You can use the hatrigger command to manually test a trigger before
copying it to all systems
in the cluster.
rc2.d, Tnumservicename
Scripts are executed in ascending order based on num
Example:
Scripts are named T01backup, T02setenv, and so on
ls /opt/VRTSvcs/bin/test/websg/preonline/
T01backup
T02setenv
T03online
Transcript:
Another really nice enhancement with 6.0 is, as I mentioned, prior
to 6.0, not only was there only one trigger for an event for the
entire system, but you only could have one thing. You couldn't have
more than one piece of code get executed. So now, you can. So now,
we have a model similar to rc scripts, except instead of S and K,
uppercase S, uppercase K, these have to be uppercase T, and a
number, and then the name of it. And so, we're going to actually
execute these. So again, they have to be in the appropriate path,
however that's defined in TriggerPath. But then, within it, they
have to be T and a number, so T01, T02, etc., and then its name.
And so, we're going to execute them in number sequence. So, in this
particular example for preonline, we're going to execute the
T01backup script first, then we're going to execute the T02setenv
script, and then lastly we're going to execute the T03online
script. So, it gives you a lot of flexibility on how you enhance
these.
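The ordering rule can be sketched in a few lines of Python. This models only the naming and sequencing described here; it assumes the numbers are compared numerically, which the lesson doesn't spell out (zero-padded names like T01 and T02 sort the same either way), and execution_order is a made-up helper.

```python
import re

# Uppercase T, a number, then the script's name, for example T01backup.
T_SCRIPT = re.compile(r"^T(\d+)(.+)$")


def execution_order(filenames):
    """Return the T-numbered scripts in ascending number order.

    Files that don't follow the T<num><name> pattern are ignored.
    Hypothetical model; assumes numeric comparison of the number part.
    """
    numbered = []
    for name in filenames:
        match = T_SCRIPT.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


print(execution_order(["T03online", "T01backup", "README", "T02setenv"]))
```

For the preonline example, that gives T01backup, then T02setenv, then T03online.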
Author's Original Notes:
The VCS 6.0 trigger implementation now supports the use of
multiple
scripts for a single trigger. This enables you to break the logic
of a trigger
into components, rather than having all trigger logic in one
monolithic
script.
*
Reference materials
Veritas Cluster Server User’s Guide
Symantec Operations Readiness Tools
Transcript:
So, in this lesson, we looked at notification, and then we took a
brief look at triggers.
Author's Original Notes:
You should now be able to configure notification and triggers
to enable VCS to alert you
when cluster events occur.
*