Veritas Cluster Server 6.0 for UNIX: Install and Configure
Lesson 10: Configuring Notification

Install and Configure: Configuring Notification
Lesson 10: Configuring Notification
Transcript:
BRAD WILLER: Welcome to VCS 6.0 Install and Config, Configuring Notification.
Author's Original Notes:
*
Lesson 3: Preparing a Site for VCS
Lesson 4: Installing VCS
Lesson 5: VCS Operations
Lesson 7: Preparing Services for VCS
Lesson 8: Online Configuration
Lesson 9: Offline Configuration
Lesson 10: Configuring Notification
Lesson 13: Cluster Communications
Lesson 15: Coordination Point Server
Lesson introduction
Author's Original Notes:
*
Configuring notification
Using triggers for notification
Transcript:
In this lesson, we're going to talk about what the notification process is, we're going to see how we set it up, and lastly we're going to briefly talk about a really cool feature of VCS called triggers.
Author's Original Notes:
*
After completing this topic, you will be able to describe how VCS provides notification.
Notification overview
Author's Original Notes:
*
Notification overview
HAD sends a message to the notifier daemon when an event occurs.
The notifier daemon:
Formats the event message
Sends an SNMP trap or e-mail message (or both) to designated recipients
[Diagram labels: replicated message queue, HAD (two nodes), notifier, SMTP, SNMP]
Transcript:
As you probably have a decent idea by now, since we've looked at the engine log so many times, every time something happens in VCS, we log it to the engine log. Now, with each of those events, there's a severity level associated with it. So, if something comes up where HAD needs to notify someone, either by Email or by SNMP traps, then what it does is let the notifier daemon know that, hey, there's something to do. The notifier is a daemon (or an agent, I guess you could call it) that runs in the ClusterService service group. It will format the message as appropriate, and then send it, either to the Email server or to the SNMP console depending on which it is. The queue of messages that the notifier has to deal with is actually replicated across nodes because again we want to make sure it's highly available. So, if one node crashes, the ClusterService service group would fail over to another node. The notifier agent would then deal with the notifications, the messages that are on that queue, communicating with that HAD.
Author's Original Notes:
>>>
Send an e-mail message to designated recipients.
>>>
*
The notifier daemon is managed by a NotifierMngr type resource.
The ClusterService group:
Is configured using the CLI, GUI, or during initial configuration
Is online on one cluster node only
[Diagram labels: ClusterService, HAD (two nodes), notifier daemon, csgnic and notifier resources]
Transcript:
Alright, the service group, as I've mentioned, where this happens is going to be ClusterService. So, you're going to have a resource actually called notifier, and it's going to be of type NotifierMngr. And the actual daemon out there, if you do ps -ef, is actually called notifier as well. It's actually not a strict requirement that the notifier is in the ClusterService service group, but that's where we really want to see it. Usually this gets set up at install time, and the installer is going to put it in the ClusterService service group, and it really makes sense for it to be in that service group because it's something directly related to HAD. Remember, don't ever put any of your stuff in ClusterService because we bend all the rules with this service group. For example, remember when we do an hastop -all -force, and we force HAD down but leave all the other applications running, i.e., all the other service groups? ClusterService always goes down, because if HAD's not running, there's no reason for ClusterService to be running because it's all HAD-related things. Notification, which we're talking about now, cluster IP address, which again is related to HAD, if you had a global cluster, there's things in there related to that, again related to HAD. So, no sense for that service group to still be running if HAD's not running. In the ClusterService service group, when notification is set up, we're going to have a notifier resource which is going to actually kick off the notifier daemon, and we're going to have a NIC resource called csgnic, ClusterService group NIC, and that's for the public network. Because, again, if we're doing Email notification, or, well, really, any notification, but especially Email notification, typically we have to communicate out to the public network.
Author's Original Notes:
>>>
ClusterService is created automatically during cluster configuration if either SMTP or SNMP notification is selected.
>>>
After configuring notification, >>>ClusterService contains a notifier resource to manage the notifier daemon,
*
Information
Warning
Error
SevereError
[Diagram labels: HAD (two nodes), notifier, SMTP, SNMP]
Transcript:
When we do notification, there's severity levels, as I mentioned, associated with this, and you really can see this when you look in the engine log. Every single message in the engine log will have a severity level associated with it. There's a subset of that that are used with the notifier. And the four, there's actually five levels, but the four that are used are shown, and that's Info, Warning, Error, and SevereError. That's what the notifier calls them. We'll see in the next slide how that translates to what actually you see in the engine log. It's very similar. But the idea is when an event occurs, like a resource faults, when we set up notification, we'll basically say, at a certain severity level, we want something to happen. And so, that something would be the notification, and then the notifier would pick up that message and send it to the appropriate place relative to that severity level. We'll see that coming up.
Author's Original Notes:
>>>
and SevereError
*
From: Notifier
Entity Name: websg
Traps Origin: Veritas_Cluster_Server
System Name: S1
Entities Container Name: webclus
Entities Container Type: VCS
Notifier and log events
2011/07/06 15:19:37 VCS ERROR V-16-1-10205 Group websg is faulted on system s1
Notifier e-mail    Log file (engine_A.log)
Information        INFO
(none)             NOTICE
Warning            WARNING
Error              ERROR
SevereError        CRITICAL
Transcript:
Here's an example of the engine log, and you'll notice, in the engine log, we have an error level message. So it says VCS ERROR, and then it gives a unique number. And then, if we have notification set up at Error level or below, that's going to cause the notifier to kick that out, generate, in this example, an Email message, and send that to the appropriate recipients. You'll notice the notifier's levels, Information, Warning, etc., versus what you see in the engine log. And again, they directly map. The most unusual is that the notifier calls critical level messages SevereError. And you'll also notice that there isn't a severity level in the notifier for Notice. Notice and Info are very similar to each other, and it's extremely rare that you would ever have the notifier deliver messages at that level, so when we added Notice in, we didn't bother to enhance the notifier for that level. Usually, where you're looking is at Warning, Error, and Critical. A lot of times, customers will start with Warning as they're feeling out how things work with VCS, what kind of notifications come out. And then, they may say, you know, that's a little bit too much, we'll go to Error and Critical. But again, that'll be up to you.
Author's Original Notes:
>>>
This example shows a log entry for a VCS ERROR and the corresponding e-mail message.
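The mapping on this slide can be written down as a small lookup. This is only an illustrative sketch in Python, not part of VCS: the dictionary and the function name notifier_level are made up for the example, and the level names come straight from the slide.

```python
# Engine-log severity tag -> notifier severity name, as shown on the slide.
# NOTICE maps to None because the notifier has no matching level.
LOG_TO_NOTIFIER = {
    "INFO": "Information",
    "NOTICE": None,
    "WARNING": "Warning",
    "ERROR": "Error",
    "CRITICAL": "SevereError",
}


def notifier_level(log_tag):
    """Return the notifier severity for an engine-log tag, or None if there is none."""
    return LOG_TO_NOTIFIER.get(log_tag.upper())


print(notifier_level("ERROR"))
print(notifier_level("CRITICAL"))
print(notifier_level("NOTICE"))
```

The one non-obvious row is CRITICAL, which the notifier reports as SevereError.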
*
hamsg engine_A
VCS engine (had) started
Cluster logger started
Configuration must be ReadWrite
Transcript:
Remember our log file, the absolute number one log is the engine log, and it's in /var/VRTSvcs/log/. You can use the hamsg command to look at the logs. Not so common. Most customers tend to just look at the log directly, maybe view the file, because usually you want to jump around. Or maybe they'll grep for certain messages in the file. It's really up to you. It's a time-stamped log file, and so depending on what you're looking for, you'll access it in different ways. When I'm doing troubleshooting, I actually like to view the file, go to the end of the file, and then start searching backwards in time for the name of the resource that had the issue. And then I just keep doing an N and stepping through doing next, next, and step through the different messages until I find the specific information for that resource that I'm interested in. But again, it's just a time-stamped log file just like syslog, and so you deal with it however makes the most sense to you. This slide also shows that the format of the messages is always the time stamp, then the severity level, then you get what we call a Unique Identifier, the V- numbers. And then after that comes the actual error message. One of the things you can do that's nice is when you get the error message, if you're not sure what it means, you always could go out to SORT, remember, sort.symantec.com, and one of the things you can do is search for that number, and it'll come back and give you a definition of what that error message means. And then, depending on which it is, it may actually have some solutions for you as to how to solve that issue.
Author's Original Notes:
>>>
>>>
Each entry includes a text code indicating severity, from CRITICAL entries to INFO entries with status information.
CRITICAL entries indicate problems requiring immediate attention—contact Customer Support immediately
ERROR entries indicate exceptions that need to be investigated
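Since every entry follows the same layout (time stamp, severity, unique identifier, message), it is easy to pull the fields apart in a script. The sketch below is an assumption-laden example rather than a VCS tool: the regular expression assumes each line carries the literal "VCS" before the severity, as in the sample entry from this lesson, and the function name is hypothetical.

```python
import re
from datetime import datetime

# Layout assumed from the sample entry in this lesson:
# <date> <time> VCS <SEVERITY> <V-number> <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"VCS (?P<severity>[A-Z]+) "
    r"(?P<umi>V-\d+-\d+-\d+) "
    r"(?P<message>.*)$"
)


def parse_engine_log_line(line):
    """Split one engine log line into its fields; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], "%Y/%m/%d %H:%M:%S")
    return entry


sample = "2011/07/06 15:19:37 VCS ERROR V-16-1-10205 Group websg is faulted on system s1"
entry = parse_engine_log_line(sample)
print(entry["severity"], entry["umi"], entry["message"])
```

The umi field is the number you would search for on SORT.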
*
After completing this topic, you will be able to configure notification using the NotifierMngr resource.
Configuring notification
Transcript:
This lesson, this topic now, let's look at how do we set this up.
Author's Original Notes:
*
Set attributes
VOM
go.symantec.com/vom
Transcript:
There's basically three ways. Probably the most common is just to use the installer. As I mentioned in an earlier lesson, the installer, when you run through it, one of the questions it asks you is do you want to set up either Email notification or SNMP traps. Really, there's no reason not to say yes. If you're going to do notification at the VCS level, there's really no reason to say no at that point like we did and then do it later. The reason we purposely did that is so that we can have this lesson and actually talk about things in more detail. But in your production environment, you would just go ahead and let the installer set things up, then answer the questions appropriately, and now you'd have the ClusterService service group with the appropriate notifier resources in it. Another very common way these days, and more and more customers are actually moving towards that, is to do notification at VOM. VOM actually has a quite sophisticated notification environment that's above and beyond what VCS can do. Because remember, one of the beauties of VOM is that it's not just for an individual cluster. You could use VOM for the hundred clusters you have so you could actually get notifications for all your clusters from one central place. And then, the third way to do it is through the command line, and that's what we're going to do in this lesson, just talk about how we manually add these in.
Author's Original Notes:
Add a NotifierMngr resource to only one service group, ClusterService.
Modify the SmtpServer and SmtpRecipients attributes.
Optionally, modify ResourceOwner and GroupOwner.
Modify SnmpConsoles, if using SNMP notification.
Configure the SNMP console to receive VCS traps.
Transcript:
Here's the process to set up notification. The first thing we have to do is actually add a resource in, it's going to be of type NotifierMngr. The name of the resource should be notifier, and we want to put it into the ClusterService service group. Again, you actually could have it somewhere else, but don't -- put everything HAD related into ClusterService. Then, if we're doing Email notification, we need to set up the Email server. So, that resource attribute is called SmtpServer. And then we're going to set up SmtpRecipients, and that's going to be a pairing of information. You're going to have an Email address or alias, and then you're going to have a severity level, and then you can have any number of those. So, for example, let's say I have an Email alias that is first line support, and I want them to receive any messages that are Warning and above. And then I could have an Email alias that was second line support, and I want them to receive anything that's Error and above. And so again, you can do any kind of variations like that that make sense for your environment. If you're going to do Email notification, you absolutely have to do that part. This next part is completely optional. For each resource, you actually could set up an owner of that resource, and for each service group you could set up an owner of that service group. Again, it's going to be an Email address or alias. Then, if anything happens to that resource, any event like even onlined, faulted, anything like that, an Email would get sent to that alias, same thing for the service group. You could set those up, but they only take effect if you've got Email notification properly configured. Along the same lines, you can also do SNMP trap notification, instead of or in addition to Email. Alright so, if I do want to set that up, I have to say where's the console or consoles, and then we'll see on another slide there's a MIB file that I need to load in.
Author's Original Notes:
>>>
You can configure SMTP or SNMP, or both types of notification,
by specifying applicable attributes.
Configure only one notifier manager resource in the cluster and
place the resource in the ClusterService group.
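The rules from this slide (SMTP, SNMP, or both; SmtpServer and SmtpRecipients go together; each recipient or console is paired with a severity) can be sanity-checked before anything is typed into the cluster. The following Python sketch is a hypothetical pre-flight check, not a VCS command or API. The attribute names are the ones from the slide; the function name, the host names, and the addresses are assumptions for illustration.

```python
VALID_SEVERITIES = {"Information", "Warning", "Error", "SevereError"}


def check_notifier_attrs(attrs):
    """Return a list of problems found in a planned NotifierMngr attribute set."""
    problems = []
    server = attrs.get("SmtpServer")
    recipients = attrs.get("SmtpRecipients", {})
    consoles = attrs.get("SnmpConsoles", {})

    # E-mail notification needs both the server and at least one recipient.
    if bool(server) != bool(recipients):
        problems.append("SmtpServer and SmtpRecipients must be set together")
    # At least one delivery method has to be configured.
    if not server and not recipients and not consoles:
        problems.append("configure SMTP, SNMP, or both")
    # Every recipient or console is paired with a notifier severity level.
    for attr, mapping in (("SmtpRecipients", recipients), ("SnmpConsoles", consoles)):
        for target, severity in mapping.items():
            if severity not in VALID_SEVERITIES:
                problems.append(f"{attr}: unknown severity {severity!r} for {target}")
    return problems


planned = {
    "SmtpServer": "smtp.example.com",  # hypothetical mail server
    "SmtpRecipients": {
        "first-line@example.com": "Warning",   # hypothetical aliases
        "second-line@example.com": "Error",
    },
}
print(check_notifier_attrs(planned))
```

An empty list means the planned values are at least consistent with each other; it says nothing about whether the mail server or console is reachable.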
*
Both can be specified.
Resource definition
Transcript:
Alright, here's an example. Again, this is the details of what you want to see. Alright so, it's going to be in ClusterService, you want it called notifier, and then here's an example of an Email server, and then here we're just having a single recipient. So, the way that slide shows it is how it's going to look in the main.cf file: it'll say Email address equals severity level. But when you set it up, for example, I would do an hares -modify. My resource name is notifier, my attribute is SmtpRecipients, and then I would say [email protected] Error, [email protected] SevereError, that type of thing, and then it would match up appropriately.
Author's Original Notes:
The NotifierMngr agent starts and monitors the notifier daemon. If the agent detects failure, the NotifierMngr resources
>>>
You must take the resource offline and bring it back online any time you change attributes.
>>>
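To make the two notations concrete, here is a small Python sketch that takes one set of recipients and renders it both ways: as a main.cf-style stanza (address equals severity) and as the argument list for hares -modify (address followed by severity). Treat it as an illustration only. The helper names, the mail server, and the addresses are hypothetical, and the exact hares argument order follows the description in the transcript, so check it against the hares manual page before relying on it.

```python
def main_cf_stanza(name, smtp_server, recipients):
    """Render a NotifierMngr resource roughly as it would appear in main.cf."""
    pairs = ", ".join(f'"{addr}" = {sev}' for addr, sev in recipients.items())
    return (
        f"NotifierMngr {name} (\n"
        f'    SmtpServer = "{smtp_server}"\n'
        f"    SmtpRecipients = {{ {pairs} }}\n"
        f"    )"
    )


def hares_modify_args(name, recipients):
    """Build the argument list for setting SmtpRecipients as key/value pairs."""
    args = ["hares", "-modify", name, "SmtpRecipients"]
    for addr, sev in recipients.items():
        args += [addr, sev]
    return args


# Hypothetical recipients for illustration.
recipients = {"admin@example.com": "Error", "oncall@example.com": "SevereError"}
print(main_cf_stanza("notifier", "smtp.example.com", recipients))
print(" ".join(hares_modify_args("notifier", recipients)))
```

As the notes for this slide say, the resource has to be taken offline and brought back online for attribute changes to take effect.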
*
Writes an entry in the log file
Requires notifier to be configured
hares -modify resource ResourceOwner kim
2011/12/03 11:23:48 VCS INFO V-16-1-10304 Resource file1 (Owner=kim, Group=websg) is offline on s1
ResourceStateUnknown
CLI
Transcript:
Here's an example of resource owner. Again, every single resource has this attribute. So, you can set it or unset it at any time. But it's only when an event occurs, and only if you have Email notification actually set up, that it sends an actual Email. So, here's an example of the events that could occur, and you can see it's anything related to that resource. So, it faults, or it gets started, things like that. Down at the bottom, it shows you again the example of how you're setting it. Now, one thing I would recommend, though, is you'll notice, in this case, it's saying ResourceOwner is kim. That actually should be an Email address, and so I would recommend you fully qualify it. If you don't, it's going to default to kim@ whatever your Email server is. So, in most cases, I like to see it fully qualified because then there's no confusion as to where it's going. So, following with the example we had here, it would be [email protected].
Author's Original Notes:
>>>
*
Requires notifier to be configured
hagrp -modify group GroupOwner chris
From: Notifier
. . .
CLI
Transcript:
Same idea with GroupOwner for a service group. So, this is a group level attribute. So, if I were to do an hagrp -display for example, I could see this guy. Again, it could be set at any time, or unset at any time, and here's the events that will cause a message to get sent. Again, as in the last slide, I would fully qualify the GroupOwner attribute so that you know exactly where it's going.
Author's Original Notes:
*
Transcript:
New in 6.0 is we took that idea we just talked about, and we took it a little bit further. So now, we've added a number of other attributes in that you can set just like the Email, the SMTP recipients idea, except here, we're narrowing it down to specific levels. So, you have, for the cluster, any cluster level messages that occur that are Error and above would go to [email protected]. At the same time, remember we had the SMTP recipients. That was also Error and above. So, if something at a cluster level, a cluster level error occurred, both of them are going to get notifications. However, if a resource had an error level, the SMTP recipients, it's going to get it, Jane Doe is not because that's not a cluster level action. So, we've added these in to give you a little more granularity on how you send out your notifications.
Author's Original Notes:
>>>Additional attributes are added to enable broader specification of users to be notified
of resource and service group events. These attributes are configured at the corresponding
object level. For example, the GroupRecipients attribute is configured within a service group
definition.
>>>These attributes are specified by a list of email ids of users along with the severity level.
The registered users would get only those events which have severity equal to or greater than
the severity requested. For example, if janedoe is configured in the ClusterRecipients attribute with
a severity level “Warning”, she would get events of severity “Warning”, “Error” and “SevereError”
but would not get events with severity “Information”. A cluster event, such as a cluster fault, which
is Error level, would be sent to janedoe.
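The interaction between SmtpRecipients and the new scoped attributes is easier to see with a worked example. The Python sketch below models only the rule described here: SmtpRecipients see every event at or above their severity, while ClusterRecipients see only cluster-level events at or above theirs. It is a simplified model, not how the notifier is implemented, and the function names and addresses are hypothetical.

```python
SEVERITY_ORDER = ["Information", "Warning", "Error", "SevereError"]


def at_or_above(event_severity, threshold):
    """True when the event severity is equal to or greater than the threshold."""
    return SEVERITY_ORDER.index(event_severity) >= SEVERITY_ORDER.index(threshold)


def notified(event_scope, event_severity, smtp_recipients, cluster_recipients):
    """Return the sorted addresses that would be mailed for one event."""
    out = {a for a, t in smtp_recipients.items() if at_or_above(event_severity, t)}
    # ClusterRecipients only apply to cluster-level events.
    if event_scope == "cluster":
        out |= {a for a, t in cluster_recipients.items() if at_or_above(event_severity, t)}
    return sorted(out)


smtp = {"admin@example.com": "Error"}        # hypothetical SmtpRecipients
cluster = {"janedoe@example.com": "Error"}   # hypothetical ClusterRecipients

print(notified("cluster", "Error", smtp, cluster))
print(notified("resource", "Error", smtp, cluster))
```

With these values, a cluster-level Error reaches both addresses, while a resource-level Error reaches only the SmtpRecipients entry.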
*
Configuring the SNMP console
Load the MIB for VCS traps into the SNMP management console.
For HP OpenView Network Node Manager, merge events:
xnmevents -merge vcs_trapd
/etc/VRTSvcs/snmp/vcs.mib
/etc/VRTSvcs/snmp/vcs_trapd
Transcript:
That was Email notification, here's SNMP traps. As I mentioned, we have to configure the appropriate attributes to say where the console is, and then here's the files we provide so you have to load these into the appropriate console. And so, for example, if you have HP OpenView, that's the vcs_trapd file that you would load in.
Author's Original Notes:
>>>
*
After completing this topic, you will be able to use triggers to provide notification.
Using triggers for notification
Transcript:
So that's how you set up notification. Again, like I said, usually you just let the installer take care of that, done in one fell swoop. Let's now take a look at what triggers are.
Author's Original Notes:
*
An alternative method of notification
Useful for customizing VCS behavior in response to events
The TriggersEnabled attribute:
Can be used to enable a resource trigger
for all resources in a service group
Can be localized per system
Using triggers
New
Transcript:
Triggers is a very cool feature of VCS. We're just briefly going to touch on it in this class. We really don't have a lot of time to talk about it, but it is a topic that we tend to talk about in our advanced classes. What a trigger is, is there are certain events that happen with VCS, and there's a trigger associated with it. What a trigger really is, is code. So, you actually are the one that writes the triggers. And so, what happens is when an event occurs, like a service group goes online, well once that occurs, there could be a trigger such as POSTONLINE that then would get executed. And what it actually does is completely up to you. When the event occurs, VCS will go look to see if the trigger exists. If it does, it'll execute it and it immediately returns. It doesn't wait for the trigger to finish because we have no control over what it actually does. This is your code. But this is a great way that you can customize VCS to do certain things if the product itself doesn't have the features that you need, or you have a really unusual service group that needs some additional things to happen at different times. You could use this for anything you needed to do. You could even have it do HA commands. So, for example, you could have a trigger that does one thing that then causes maybe a service group to get switched to another node. It's completely up to you on what you need. Now, there's a number of triggers that are relative to resources, and there's a number of triggers that are relative to service groups, and this slide kind of gives you an idea of the different names of the triggers. There's a TriggersEnabled attribute that you have to set that will allow these triggers to run for a particular resource or a particular service group. And this is actually a change in the way we've done it in the past. The concept is exactly the same from previous versions, but we've enhanced it to be a little more granular.
Author's Original Notes:
>>>
>>>
*
TriggerPath = "bin/test/websg/"
/opt/VRTSvcs
$VCS_HOME
Transcript:
The location of the triggers, first off, everything is based off $VCS_HOME, which is /opt/VRTSvcs. And so, there's an attribute called TriggerPath that basically says where the trigger is located. So, this is something we've gotten much, much more granular on. As I mentioned, it used to be that you could just have one trigger for everything. So, for example, if a resource faulted, there's a trigger called RESFAULT. And it used to be that you could only have one of those, in the directory /opt/VRTSvcs/bin/triggers. And then so, any time a resource faulted, if it was allowed to run the trigger, it had to execute that guy. So, what it ended up meaning is that you had one trigger for an entire system, and it in essence had to be a huge if-else statement. So, if it's resource A, go do this, if it's resource B, go do that, etc. Now, with this whole TriggerPath idea, we can get very granular. So we could have a specific trigger for resource A, we could have a specific trigger for resource B. So, the TriggerPath is going to be the relative path to where the trigger is located, and again, it's relative to /opt/VRTSvcs. So, if you can see in that example of a trigger called PREONLINE, we're having it located in bin/test/websg. So, that's where we look to find a file called /opt/VRTSvcs/bin/test/websg/preonline. If we see that, then we're going to try to execute it. If TriggerPath is not defined, then it defaults back to the pre-6.0 location, which as I mentioned was /opt/VRTSvcs/bin/triggers. And so we look in there for the RESFAULT trigger, for example.
Author's Original Notes:
Another enhancement to the trigger implementation is the TriggerPath attribute,
which enables you to customize the location of trigger scripts.
>>>If the TriggerPath attribute is specified at the service group level, the full path
to the trigger script is $VCS_HOME/path_in_TriggerPath_attribute/name_of_the_trigger.
As an example, if you enable the preonline trigger, and set TriggerPath to bin/test/SG1, the path VCS uses to find the trigger
script is $VCS_HOME/bin/test/SG1/preonline.
If you set the TriggerPath attribute for a resource, the full path to the
trigger script is $VCS_HOME/path_in_TriggerPath_attribute/resource_name/name_of_the_trigger.
For example, if you enable the resfault trigger and set TriggerPath to bin/test/SG1, the path VCS uses to find the trigger
script is $VCS_HOME/bin/test/SG1/resource_name/resfault.
If TriggerPath is not defined, the legacy path of $VCS_HOME/bin/triggers is used to locate the scripts.
This ensures VCS 6.0 is backward compatible to support any triggers previously configured, as in the case
of an upgrade to 6.0.
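The lookup rules in these notes can be summarized in a few lines of Python. This is a sketch of the path arithmetic only, under the rules as stated here; it does not call VCS, and the function name is hypothetical. The resource name file1 is borrowed from the ResourceOwner example earlier in the lesson.

```python
import posixpath

VCS_HOME = "/opt/VRTSvcs"


def trigger_script_path(trigger, trigger_path=None, resource=None, vcs_home=VCS_HOME):
    """Return where VCS would look for a trigger, per the rules in these notes."""
    if not trigger_path:
        # Legacy (pre-6.0) location when TriggerPath is not defined.
        return posixpath.join(vcs_home, "bin/triggers", trigger)
    # TriggerPath is relative to $VCS_HOME.
    parts = [vcs_home, trigger_path.lstrip("/")]
    if resource is not None:
        # Resource-level TriggerPath adds the resource name as a directory.
        parts.append(resource)
    parts.append(trigger)
    return posixpath.join(*parts)


print(trigger_script_path("preonline", "bin/test/websg/"))
print(trigger_script_path("resfault", "bin/test/SG1", resource="file1"))
print(trigger_script_path("resfault"))
```

The three calls cover the service group case, the resource case, and the fallback to the legacy directory.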
*
Can be copied and modified
Are located in /opt/VRTSvcs/bin/sample_triggers
# <system>: is the name of the system where resource faulted.
# <resource>: is the name of the resource that faulted.
# <oldstate>: is the previous state of the resource that
# faulted.
. . .
Transcript:
Here's a sample trigger. What we actually provide you -- again, it's your code, so you can come up with completely whatever you want it to do. But what we do provide is examples. So, we provide examples of all the different triggers. And if you look at them, what it'll do is, towards the top, and it's a Perl script by default, towards the top it'll show you the trigger name, which in this example is RESFAULT, and then it'll show you the attributes that get passed to that trigger. So, in this particular -- I should say variables since it's a Shell script -- in this example you're going to get three variables passed to it. The first one is the system it's being executed on, the second is the resource it's being executed on, and then the third is the oldstate; so, what the state was before it faulted. And again, it has the definitions of all those. Then, as you go down, you'll find a section that says put your code here. So, we basically give you a wrapper, and then you put the code that does whatever this thing needs to.
Author's Original Notes:
Sample trigger scripts are provided with VCS. These can be copied to the triggers directory
and modified to your specifications.
You can use the hatrigger command to manually test a trigger before copying it to all systems
in the cluster.
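To show the shape of a trigger, here is a minimal sketch of a resfault handler. The samples shipped with VCS are not written in Python, so this is an assumption made purely for illustration: it mirrors the three arguments documented in the sample header (system, resource, oldstate) and leaves the "put your code here" part as a function that just builds a message. All names in it are hypothetical.

```python
import sys


def handle_resfault(system, resource, oldstate):
    """Site-specific logic goes here; this sketch only builds a message."""
    return f"resfault: {resource} on {system} faulted (previous state: {oldstate})"


def main(argv):
    """Entry point: expects the script name plus system, resource, and oldstate."""
    if len(argv) != 4:
        print("usage: resfault <system> <resource> <oldstate>", file=sys.stderr)
        return 1
    print(handle_resfault(*argv[1:]))
    return 0


# Demonstration call with made-up values; a real trigger would pass sys.argv.
main(["resfault", "s1", "file1", "ONLINE"])
```

Keeping the logic in handle_resfault, separate from argument handling, makes it easy to exercise by hand before copying the script to every system in the cluster.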
rc2.d, Tnumservicename
Scripts are executed in ascending order based on num
Example:
Scripts are named T01backup, T02setenv, and so on
ls /opt/VRTSvcs/bin/test/websg/preonline/
T01backup
T02setenv
T03online
Transcript:
Another really nice enhancement with 6.0 is, as I mentioned, prior to 6.0, not only was there only one trigger for an event for the entire system, but you only could have one thing. You couldn't have more than one piece of code get executed. So now, you can. So now, we have a model similar to rc scripts, except instead of S and K, uppercase S, uppercase K, these have to be uppercase T, and a number, and then the name of it. And so, we're going to actually execute these. So again, they have to be in the appropriate path, however that's defined in TriggerPath. But then, within it, they have to be T and a number, so T01, T02, etc., and then its name. And so, we're going to execute them in number sequence. So, in this particular example for preonline, we're going to execute the T01backup script first, then we're going to execute the T02setenv script, and then lastly we're going to execute the T03online script. So, it gives you a lot of flexibility on how you enhance these.
Author's Original Notes:
The VCS 6.0 trigger implementation now supports the use of multiple
scripts for a single trigger. This enables you to break the logic of a trigger
into components, rather than having all trigger logic in one monolithic
script.
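The ordering rule can be checked with a short script before the files are deployed. This Python sketch assumes the number after the T is compared numerically and that files not matching the T-number-name pattern are ignored; both are assumptions for illustration, since the lesson only states that scripts run in ascending order based on num. The function name is hypothetical.

```python
import re

# T, then a number, then the script name, for example T01backup.
SCRIPT_RE = re.compile(r"^T(\d+)(.+)$")


def execution_order(filenames):
    """Return the T-numbered scripts in ascending order of their number."""
    numbered = []
    for name in filenames:
        match = SCRIPT_RE.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


listing = ["T03online", "T01backup", "README", "T02setenv"]
print(execution_order(listing))
```

For the directory shown on the slide, this gives T01backup, then T02setenv, then T03online.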
*
Reference materials
Veritas Cluster Server User’s Guide
Symantec Operations Readiness Tools
Transcript:
So, in this lesson, we looked at notification, and then we took a brief look at triggers.
Author's Original Notes:
You should now be able to configure notification and triggers to enable VCS to alert you
when cluster events occur.
*