17 February 2010

SCOM 2007 R2: monitor an Amazon EC2 Linux VM

What a strange idea, you may say... But having some VMs in the cloud doesn't mean they must be monitored separately. That said, it's quite a challenge:

  • Management Packs from Microsoft don't support Fedora
  • Authentication on Amazon VMs is certificate-based by default

When SCOM detects Fedora, discovery stops right away. To get the Linux version, SCOM executes a shell script, and the folder holding that script also contains the RPM package that gets installed next (the daemon). Reaching our goal will take some work...

To keep you motivated, here is one result when it works:

[Figure: SCOM performance view for an Amazon EC2 VM]

You got it: the trick is to make SCOM believe we have a Red Hat system. In fact, there are 2 ways:

  • Create our own Management Pack for Fedora.
  • Disguise Fedora as Red Hat by fooling SCOM.

The first one is more elegant, but it would take a lot of time to build and then to maintain. So I preferred the second one, as Fedora is not that far from Red Hat...

Changes on Linux Amazon EC2


Allow SSH connection without certificate
In /etc/ssh/sshd_config:
PasswordAuthentication yes
PermitRootLogin no
# Also remove the last line of the file, which allows root login without a password

Then you need to set a root password:

passwd root

Then reload configuration:

/etc/init.d/sshd reload
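
To double-check what sshd will read from the file, something like the following can help. It is only a minimal sketch, not part of the original procedure: `sshd_setting` is a hypothetical helper name, and it assumes a flat sshd_config without Match blocks, where sshd keeps the first value it finds for each keyword.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a keyword from an sshd_config-style file.
# Commented-out lines (e.g. "#PasswordAuthentication no") are skipped because
# their first field starts with "#" and so never equals the keyword.
sshd_setting() {
    keyword="$1"
    cfgfile="${2:-/etc/ssh/sshd_config}"
    # Keywords are case-insensitive; stop at the first match.
    awk -v key="$keyword" 'tolower($1) == tolower(key) { print $2; exit }' "$cfgfile"
}
```

With the changes above, `sshd_setting PasswordAuthentication` should print yes and `sshd_setting PermitRootLogin` should print no.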

Then create a SCOM account:

useradd scom
passwd scom

Hostname check

By default, the VM gets its internal IP as hostname. SCOM will deploy an RPM to get a daemon running, and will generate a certificate to authenticate.
The hostname must match the name SCOM uses to reach its agent. The easiest way is to change the hostname to a name reachable by SCOM, like www.mydomain.com:

hostname www.lotp.fr

To keep it after a reboot, update /etc/sysconfig/network by adding:
HOSTNAME=www.mydomain.com
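
If the file already contains a HOSTNAME= line, it is probably safer to replace it than to add a second one. Here is a hedged sketch of that idea; `set_persistent_hostname` is a hypothetical helper name, and the file path is passed as a parameter so the function can be tried on a copy first.

```shell
#!/bin/sh
# Hypothetical helper: make sure the file holds exactly one HOSTNAME= line.
set_persistent_hostname() {
    newname="$1"
    netfile="${2:-/etc/sysconfig/network}"
    newfile="$netfile.new.$$"
    # Keep every existing line except a previous HOSTNAME= entry.
    if [ -f "$netfile" ]; then
        grep -v '^HOSTNAME=' "$netfile" > "$newfile" || true
    else
        : > "$newfile"
    fi
    echo "HOSTNAME=$newname" >> "$newfile"
    # Replace the original file with the rewritten copy.
    mv "$newfile" "$netfile"
}
```

For example, `set_persistent_hostname www.mydomain.com` run as root would update /etc/sysconfig/network.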

Dependency

The RPM package from Microsoft depends on at least these 2 libraries:

  • /usr/lib/libssl.so.6
  • /usr/lib/libcrypto.so.6
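
A quick way to see which of the two files are missing before trying the RPM could look like this. It is a small sketch; `missing_scx_libs` is a hypothetical name, and the directory defaults to /usr/lib as in the list above.

```shell
#!/bin/sh
# Hypothetical pre-check: print the path of each required library that is
# missing from a directory. Prints nothing when both are present.
missing_scx_libs() {
    libdir="${1:-/usr/lib}"
    for lib in libssl.so.6 libcrypto.so.6; do
        [ -e "$libdir/$lib" ] || echo "$libdir/$lib"
    done
}
```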

If your system is more up to date (so.8 …), you will need to create symbolic links (ln -s realfile linkname):

If you don't, you will get error messages when installing the RPM (first for libssl, then for libcrypto once the first is resolved).

In the end, I have the following symbolic links:
lrwxrwxrwx  1 root root       16 Feb 16 16:19 libssl.so.6 -> libssl.so.0.9.8k
lrwxrwxrwx  1 root root       12 Feb 16 16:23 libcrypto.so.6 -> libcrypto.so
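
Links like these can be created with a small guard so that an existing file is never overwritten. This is a sketch under assumptions: `link_compat_lib` is a hypothetical helper name, and the real file names on your system may differ from the ones in the listing.

```shell
#!/bin/sh
# Hypothetical helper: create linkname -> realfile unless linkname already exists.
# -L catches a dangling symlink, which -e alone would report as absent.
link_compat_lib() {
    realfile="$1"
    linkname="$2"
    if [ -e "$linkname" ] || [ -L "$linkname" ]; then
        echo "skipped: $linkname already exists"
        return 0
    fi
    ln -s "$realfile" "$linkname"
}
```

Run as root from /usr/lib, `link_compat_lib libssl.so.0.9.8k libssl.so.6` and `link_compat_lib libcrypto.so libcrypto.so.6` would reproduce the listing.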

I recommend copying the RPM (scx-1.0.4-248.rhel.5.x86.rpm) over and installing it by hand to check that it installs correctly. Otherwise, SCOM does it during discovery, and its error messages are not that explicit.
Moreover, the manual install shows the name used to generate the certificate.

You can always remove it:

rpm -e scx-1.0.4-248.i386

TCP Ports

SCOM connects in 2 ways:
  • SSH
  • port 1270 (SCOM agent)

You will have to configure the Amazon firewall to make these 2 ports reachable. If port 1270 is not reachable, SCOM raises a timeout error, which doesn't help much. Traffic to these ports can be confirmed with a network trace.
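
Since the SCOM error is not explicit, it may be quicker to test both ports directly from a Linux machine outside Amazon. The following is a rough sketch: `port_open` and `check_scom_ports` are hypothetical names, SSH is assumed to listen on its default port 22, and the script relies on bash's /dev/tcp redirection and the coreutils `timeout` command being available.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a TCP connection to host:port can be opened
# within the given number of seconds (default 5).
port_open() {
    timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Report on the two ports SCOM needs: SSH (assumed 22) and the agent (1270).
check_scom_ports() {
    for p in 22 1270; do
        if port_open "$1" "$p"; then
            echo "$1:$p reachable"
        else
            echo "$1:$p NOT reachable"
        fi
    done
}
```

For example, `check_scom_ports www.mydomain.com` prints one line per port.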

/etc/redhat-release

By default, Fedora creates /etc/redhat-release as a symbolic link to /etc/fedora-release. You must break it and create a real redhat-release file with this text inside:

rm /etc/redhat-release
echo "Red Hat Enterprise Linux Server release 5" > /etc/redhat-release
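
The order of these two commands matters: writing to the path while it is still a symbolic link would overwrite fedora-release instead. The same steps can be wrapped in a function that takes the path as a parameter, so it can be tried on a copy first. This is only a sketch, and `disguise_release` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: replace a redhat-release symlink with a real file.
disguise_release() {
    relfile="${1:-/etc/redhat-release}"
    # Break the link first so the file it points to stays untouched.
    if [ -L "$relfile" ]; then
        rm "$relfile"
    fi
    echo "Red Hat Enterprise Linux Server release 5" > "$relfile"
}
```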

Changes on SCOM 2007 R2

Tasks to do:

  • Allow WinRM to use basic authentication
  • Import the generic Linux/Unix and Red Hat Management Packs
  • Add a Basic Authentication account to log on to Linux (not root)
  • Add a Basic Authentication account to log on to Linux (root)
  • Link them to the "Unix Action Account" and "Unix Privileged Account" profiles
  • Discover the VM and sign the certificate

Adding the VM is straightforward if you followed the previous tasks, so I will only cover the problems I met:

  • WinRM must be allowed to use basic authentication.
  • If the hostname is wrong, the certificate is refused when it's time to sign it.
  • If the Linux accounts are not added to the profiles, the workflows don't work.

At the beginning, the VM is not monitored and is shown as unknown. Then it switches to green and known.

Here we go, your VM is now monitored by SCOM 2007 R2.


1 comment:

Anonymous said...

Hi

I am trying to install the agent on Fedora 13.
But on all servers I receive the same error:

WS-Management cannot process the request. The operation failed because of an HTTP error. The HTTP error (12152) is: The server returned an invalid or unrecognized response .

The install fails.
Can you help?