Foreword
The following sections describe how I do Opensim Service Management for my Opensim regions running on my servers. I describe the directory structure I use, the updating process for Opensim and how I do service management using the Linux tool Monit (http://mmonit.com/monit/), as well as how I do regular backups.
Directory Structure
On my servers I have created a special user called “opensim”. Under this user’s home directory I have installed all Opensim files in a directory also called “opensim”. This folder contains a directory for each Opensim version installed.
The directory names use the following naming convention: “opensim_xxxx”, where “xxxx” is the Opensim SVN version number. I use a similar naming convention for the directory containing the search module: “ossearch_xxx” and similar naming conventions for other Opensim extensions I use. I compile the software in these directories and they also contain the standard configuration files.
The subdirectory “ServiceManagement” contains all scripts and pid (process id) files that I use for Opensim service management.
For each Opensim region that runs in its own Opensim server process, I have a separate subfolder under a subdirectory called “run”. The “run” subdirectory contains the Opensim version that is currently used. “run_old” contains the previous version, which I always keep to be able to do quick rollbacks. The directory “run_new” contains a new Opensim version while I am still configuring it.
The run subdirectories mostly contain symbolic links to files in the “opensim_xxxx/bin” directory. Only the OpenSim.ini and region files are real copies. The OpenSim.ini files are adjusted manually based on the default file copied over from the “opensim_xxxx/bin” folder. Each Opensim server instance has its own Regions and ScriptEngines folders, because they store region-specific files.
Besides these folders, I have a “tmp” directory for downloading new software versions before I rename them using the conventions mentioned above and move them to the main “opensim” directory. Additionally, I have a “doc” subdirectory for documentation and a “backup” subdirectory for backups of OpenSim.ini and region files, as well as for OAR backups.
~/opensim
|-doc
|-backup
|-opensim_xxxx
|  |-bin
|     |-Regions
|-ossearch_xxx
|  |-trunk
|-run
|  |-H1
|  |  |-Regions
|  |  |-ScriptEngines
|  |-H2
|  |  |-Regions
|  |  |-ScriptEngines
|  |-M3
|  |  |-Regions
|  |  |-ScriptEngines
|  |-M4
|  |  |-Regions
|  |  |-ScriptEngines
|  |-M5
|     |-Regions
|     |-ScriptEngines
|-run_old
|  |-H1
|  |  |-Regions
|  |  |-ScriptEngines
|  |-H2
|  |  |-Regions
|  |  |-ScriptEngines
|  |-M3
|  |  |-Regions
|  |  |-ScriptEngines
|  |-M4
|  |  |-Regions
|  |  |-ScriptEngines
|  |-M5
|     |-Regions
|     |-ScriptEngines
|-ServiceManagement
|-tmp
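If you want to reproduce this layout on a new server, a few lines are enough. The following Python 3 sketch is only an illustration, not one of my scripts: create_layout and the region list are hypothetical and should be replaced with your own names. It creates the fixed folders and the per-region Regions and ScriptEngines folders under “run”. The opensim_xxxx and ossearch_xxx folders are created later by the update process.

```python
import tempfile
from pathlib import Path

# Hypothetical region list; use your own internal region names.
REGIONS = ["H1", "H2", "M3", "M4", "M5"]

def create_layout(base, regions=REGIONS):
    """Create the top-level folders and one run/<region> folder per region."""
    base = Path(base)
    for name in ["doc", "backup", "ServiceManagement", "tmp"]:
        (base / name).mkdir(parents=True, exist_ok=True)
    for region in regions:
        # Each region process gets its own Regions and ScriptEngines folders.
        for sub in ["Regions", "ScriptEngines"]:
            (base / "run" / region / sub).mkdir(parents=True, exist_ok=True)
    # Return all created directories relative to base, for a quick check.
    return sorted(p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_dir())

if __name__ == "__main__":
    # Demo in a throw-away directory instead of the real home directory.
    with tempfile.TemporaryDirectory() as tmp:
        for d in create_layout(tmp):
            print(d)
```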
Internal Region Names
I use the following naming convention for directories, MySQL databases for the regions and process id files (pid files) on my Opensim servers. These names are independent of the real Opensim region names that you see within OSGrid:
- first a letter: H = high traffic, M = medium traffic, L = low traffic
- then the number of the region server process on that server, starting with 1
Examples: “H1” is the 1st Opensim process on that server, running a high traffic region. “M3” is the 3rd Opensim process on that server, running a medium traffic region.
In general, these naming conventions are just my personal preference. Of course, you can use any naming scheme you like.
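The convention is regular enough to be checked by a script, for example before creating directories or databases for a new region. The following Python 3 sketch is an illustration only; parse_region_name is a hypothetical helper that accepts exactly one traffic letter followed by a process number starting at 1.

```python
import re

TRAFFIC = {"H": "high", "M": "medium", "L": "low"}
# One traffic letter, then the process number (1, 2, ..., no leading zero).
NAME_RE = re.compile(r"([HML])([1-9][0-9]*)")

def parse_region_name(name):
    """Split an internal region name such as 'M3' into (traffic, process number)."""
    match = NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError("not a valid internal region name: %r" % name)
    return TRAFFIC[match.group(1)], int(match.group(2))

if __name__ == "__main__":
    print(parse_region_name("H1"))  # ('high', 1)
    print(parse_region_name("M3"))  # ('medium', 3)
```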
How to Update Opensim – Process and Scripts
I have developed a set of scripts and a process to be able to update many Opensim regions on a server efficiently, while being able to do quick rollbacks, if necessary. To update my Opensim installation I do the following steps:
1. I go into the “tmp” subdirectory and download the latest version of Opensim:
$ svn co http://opensimulator.org/svn/opensim/trunk opensim
Or I download a specific Opensim version, for example the latest recommended version:
$ svn co -r <version> http://opensimulator.org/svn/opensim/trunk opensim
2. I rename the resulting Opensim folder to “opensim_xxxx” and move it up:
$ mv opensim opensim_<opensim version>
$ mv opensim_<opensim version> ..
3. I download the latest Opensim Search (ossearch) version, rename the directory and move it up:
$ svn checkout http://forge.opensimulator.org/svn/ossearch
$ mv ossearch ossearch_<ossearch version>
$ mv ossearch_<ossearch version> ..
4. I go to the directory of the new Opensim version and clean it up:
$ cd ../opensim_<opensim version>
$ ./runprebuild.sh
$ nant clean
If you see an error message, run “nant clean” again.
5. I install the latest Opensim search module:
$ cp -r ../ossearch_<ossearch version>/trunk/* .
6. I compile Opensim:
$ ./runprebuild.sh
$ nant
If you see an error message, run “nant” again.
7. To configure Opensim you can use the OpenSim.ini.example file in the “bin” subdirectory to create a new, customized OpenSim.ini file from scratch. But usually you will prefer to use your previous version of the OpenSim.ini file to make the required changes for the new Opensim version.
I do such updates by comparing the OpenSim.ini.example files of the old and new version. Then I edit a copy of the old OpenSim.ini file to make the required changes for the new Opensim version. At the end I have a new, updated OpenSim.ini file in the “bin” subfolder.
$ cd bin
$ cp ../../opensim_<opensim old version>/bin/OpenSim.ini .
$ diff ../../opensim_<opensim old version>/bin/OpenSim.ini.example OpenSim.ini.example
$ vi OpenSim.ini
In the generic OpenSim.ini file in the bin subdirectory I use the following symbols. These symbols will later be replaced with the correct values for each region. This simplifies managing many Opensim regions, because you only need to update one master OpenSim.ini file and the individual OpenSim.ini files are automatically created by a script that I will describe later.
- REGION_NAME
- HTTP_PORT
- DATABASE_NAME
- DATABASE_PASSWORD
- SERVER_IP
- VOICE_IP
- AV_CAPSULE (only required for 64 bit servers)
8. I check whether some important files have been created properly, and then I go back to the main Opensim directory:
$ ls *.ini libode* *Sea*
$ cd ../..
9. Now I automatically create a “run_new” directory for the new Opensim version. This new directory is based on the given Opensim version and the region files of the regions in the current “run” directory. The very first time, you have to set up the “run” directory manually.
$ updateos opensim_<opensim version (without slash at the end!)>
This script creates the specific OpenSim.ini versions for each region automatically.
10. Then I stop all running Opensim processes (see the Service Management sections below), clean up pid and Mono files (see the “rmpidsos” and “clearos” scripts below) and switch to the new Opensim version using the following commands:
$ rm -fr run_old
$ mv run run_old
$ mv run_new run
11. After that I restart the Opensim processes (see the Monit section below), or I reboot the whole server after doing additional Linux software updates.
12. Finally I log in and check if all my regions work well. For that I check if my regions rez properly. Then I test various scripted objects on my land and I test sim border crossings and teleports between my regions and to/from OSGrid plazas.
I have found that it is sometimes necessary to reset certain scripts to get them working again. Usually the same scripts are affected after each Opensim update.
If there are serious problems with an Opensim version, I do a rollback by simply stopping all Opensim processes, renaming the directories “run” to “run_broken” and “run_old” to “run”, and then I restart all Opensim processes.
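The directory switch in step 10 and the rollback described above are nothing more than renames. The following Python 3 sketch performs both. switch_to_new and rollback are hypothetical helper names, the base directory is your main “opensim” directory, and all Opensim processes must already be stopped. Unlike the manual rollback, the sketch refuses to overwrite an existing run_broken folder.

```python
import shutil
from pathlib import Path

def switch_to_new(base):
    """Step 10: delete run_old, keep run as run_old, make run_new the active run."""
    base = Path(base)
    if not (base / "run_new").is_dir():
        raise FileNotFoundError("run_new is missing; create it with updateos first")
    if (base / "run_old").exists():
        # Like "rm -fr run_old": symbolic links are removed, not followed,
        # so the opensim_xxxx directories they point to stay untouched.
        shutil.rmtree(base / "run_old")
    (base / "run").rename(base / "run_old")  # mv run run_old
    (base / "run_new").rename(base / "run")  # mv run_new run

def rollback(base):
    """Keep the broken version as run_broken and reactivate run_old."""
    base = Path(base)
    if (base / "run_broken").exists():
        raise FileExistsError("remove or rename the old run_broken first")
    (base / "run").rename(base / "run_broken")
    (base / "run_old").rename(base / "run")
```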
If I need to change the OpenSim.ini file of the current Opensim version, I make these changes in the master OpenSim.ini file and run the following script, which updates all configuration files in the “run” subdirectory:
$ refreshos opensim_<opensim version (without slash at the end!)>
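Both updateos and refreshos (shown in the next section) boil down to replacing the symbols from step 7 in the master OpenSim.ini. As a Python 3 sketch of that substitution, assuming the region values are kept in a dictionary (render_ini is a hypothetical name): all symbols are replaced in one pass, and the function refuses to produce a file while a symbol has no value, whereas a sed call that lacks an expression simply leaves the symbol in place.

```python
import re

# The symbols used in the master OpenSim.ini (step 7).
SYMBOLS = ["REGION_NAME", "HTTP_PORT", "DATABASE_NAME", "DATABASE_PASSWORD",
           "SERVER_IP", "VOICE_IP", "AV_CAPSULE"]
PATTERN = re.compile("|".join(SYMBOLS))

def render_ini(template, values):
    """Replace all symbols in one pass; fail if a used symbol has no value."""
    missing = sorted(set(PATTERN.findall(template)) - set(values))
    if missing:
        raise KeyError("no value for: " + ", ".join(missing))
    return PATTERN.sub(lambda m: str(values[m.group(0)]), template)

if __name__ == "__main__":
    print(render_ini('PIDFile = "/tmp/REGION_NAME.pid"', {"REGION_NAME": "M1"}))
```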
Service Management Scripts
The process that I have described above uses some scripts that I have stored in the user’s ~/bin directory. You might want to use similar scripts.
The following two scripts replace the symbols used in the generic OpenSim.ini file (REGION_NAME, HTTP_PORT, DATABASE_NAME, DATABASE_PASSWORD, SERVER_IP, VOICE_IP and AV_CAPSULE). You need to adjust the following scripts to set the proper values for each region.
#!/bin/sh
# updateos
echo Updating OpenSim…
cd /home/opensim/opensim/
mkdir run_new
cd run_new
mkdir M1 M2 M3 M4 M5 M6
mkdir M1/Regions M2/Regions M3/Regions M4/Regions M5/Regions M6/Regions
mkdir M1/ScriptEngines M2/ScriptEngines M3/ScriptEngines M4/ScriptEngines M5/ScriptEngines M6/ScriptEngines
cp ../run/M1/Regions/* M1/Regions
cp ../run/M2/Regions/* M2/Regions
cp ../run/M3/Regions/* M3/Regions
cp ../run/M4/Regions/* M4/Regions
cp ../run/M5/Regions/* M5/Regions
cp ../run/M6/Regions/* M6/Regions
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M1/g' -e 's/HTTP_PORT/9010/g' -e 's/DATABASE_NAME/M1/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M1/OpenSim.ini
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M2/g' -e 's/HTTP_PORT/9011/g' -e 's/DATABASE_NAME/M2/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M2/OpenSim.ini
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M3/g' -e 's/HTTP_PORT/9012/g' -e 's/DATABASE_NAME/M3/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M3/OpenSim.ini
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M4/g' -e 's/HTTP_PORT/9013/g' -e 's/DATABASE_NAME/M4/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M4/OpenSim.ini
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M5/g' -e 's/HTTP_PORT/9014/g' -e 's/DATABASE_NAME/M5/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M5/OpenSim.ini
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M6/g' -e 's/HTTP_PORT/9015/g' -e 's/DATABASE_NAME/M6/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M6/OpenSim.ini
cd M1
ln -s ../../$1/bin/* .
ln -s ../../$1/bin/.* .
cd ../M2
ln -s ../../$1/bin/* .
ln -s ../../$1/bin/.* .
cd ../M3
ln -s ../../$1/bin/* .
ln -s ../../$1/bin/.* .
cd ../M4
ln -s ../../$1/bin/* .
ln -s ../../$1/bin/.* .
cd ../M5
ln -s ../../$1/bin/* .
ln -s ../../$1/bin/.* .
cd ../M6
ln -s ../../$1/bin/* .
ln -s ../../$1/bin/.* .
cd ../..
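In updateos, each region's OpenSim.ini file and its Regions and ScriptEngines folders are created before the links, so “ln -s” cannot overwrite them and the region keeps its real copies. ln reports these conflicts, and the “.*” pattern matching “.” and “..”, as errors that can be ignored. The following Python 3 sketch makes that rule explicit. link_region and the REAL_COPIES set are assumptions based on the description above, and the per-region OpenSim.ini is written separately, as in the script.

```python
import os
import shutil
from pathlib import Path

# Entries each region keeps as its own files instead of links (assumption
# based on the text: OpenSim.ini plus the Regions and ScriptEngines folders).
REAL_COPIES = {"OpenSim.ini", "Regions", "ScriptEngines"}

def link_region(bin_dir, new_region_dir, old_region_dir):
    """Fill run_new/<region> from opensim_xxxx/bin and the current run/<region>."""
    bin_dir, new_region_dir = Path(bin_dir), Path(new_region_dir)
    (new_region_dir / "ScriptEngines").mkdir(parents=True, exist_ok=True)
    # Real copy of the region files of the running version (cp ../run/X/Regions/*).
    shutil.copytree(Path(old_region_dir) / "Regions", new_region_dir / "Regions")
    # iterdir() also returns hidden files, but never "." or "..".
    for entry in bin_dir.iterdir():
        if entry.name in REAL_COPIES:
            continue
        # Relative link target, like ../../$1/bin/* in updateos.
        target = os.path.relpath(entry, new_region_dir)
        (new_region_dir / entry.name).symlink_to(target)
```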
#!/bin/sh
# refreshos
echo Refreshing OpenSim INI Files…
cd /home/opensim/opensim/run/
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M1/g' -e 's/HTTP_PORT/9010/g' -e 's/DATABASE_NAME/M1/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M1/OpenSim.ini
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M2/g' -e 's/HTTP_PORT/9011/g' -e 's/DATABASE_NAME/M2/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M2/OpenSim.ini
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M3/g' -e 's/HTTP_PORT/9012/g' -e 's/DATABASE_NAME/M3/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M3/OpenSim.ini
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M4/g' -e 's/HTTP_PORT/9013/g' -e 's/DATABASE_NAME/M4/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M4/OpenSim.ini
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M5/g' -e 's/HTTP_PORT/9014/g' -e 's/DATABASE_NAME/M5/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M5/OpenSim.ini
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M6/g' -e 's/HTTP_PORT/9015/g' -e 's/DATABASE_NAME/M6/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M6/OpenSim.ini
cd ..
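updateos and refreshos repeat one sed line per region, so every new region means editing both scripts. A table-driven alternative, as a Python 3 sketch: the region table and the shared values mirror the scripts above (ports 9010 to 9015, one database per region), while refresh_inis and the two dictionaries are hypothetical names.

```python
from pathlib import Path

# Per-region HTTP ports and shared values, taken from the scripts above.
REGIONS = {"M1": 9010, "M2": 9011, "M3": 9012, "M4": 9013, "M5": 9014, "M6": 9015}
SHARED = {"DATABASE_PASSWORD": "password", "SERVER_IP": "71.6.217.139",
          "VOICE_IP": "66.240.232.99", "AV_CAPSULE": "1700000"}

def refresh_inis(master_ini, run_dir, regions=REGIONS, shared=SHARED):
    """Write run_dir/<region>/OpenSim.ini for every region in the table."""
    template = Path(master_ini).read_text()
    for region, port in regions.items():
        values = dict(shared, REGION_NAME=region, HTTP_PORT=str(port), DATABASE_NAME=region)
        text = template
        for symbol, value in values.items():
            # One replacement per symbol, like the chained sed -e 's/X/y/g' calls.
            text = text.replace(symbol, value)
        target = Path(run_dir) / region / "OpenSim.ini"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)
```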
Service Management with Monit
For continuous service monitoring I use Monit (http://mmonit.com/monit/). For many Linux distributions, Monit is available as a software package that can be installed from a repository using the package manager of your distribution.
After installing Monit, configure it as root user in the file /etc/monit/monitrc (the location may differ depending on your Linux distribution). Then check your changes by executing “monit -t”. If the file is OK, restart Monit by executing “/etc/init.d/monit stop” and “/etc/init.d/monit start” as root user.
Monit needs pid (process id) files that store the Linux process ids of the processes it has to supervise. In OpenSim.ini you can define the location where Opensim stores its pid file. Whatever location you choose, the pid file paths in the Monit configuration and in the “stopos” and “rmpidsos” scripts must point to the same files. I use the following setting, which creates a separate pid file for each region in the /tmp directory:
PIDFile = "/tmp/REGION_NAME.pid"
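Monit decides whether a service is running by reading its pid file and checking whether a process with that id exists. A rough Python 3 equivalent can be useful for checking a region by hand. region_running is a hypothetical name, and the default /tmp directory follows the PIDFile setting above.

```python
import errno
import os

def region_running(region, pid_dir="/tmp"):
    """Return True if <pid_dir>/<region>.pid names a process that exists."""
    try:
        with open(os.path.join(pid_dir, region + ".pid")) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # no pid file, or it does not contain a number
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks whether the process exists
    except OSError as e:
        # EPERM: the process exists but belongs to another user.
        return e.errno == errno.EPERM
    return True
```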
The Monit user interface is web based. It can be accessed using the following URL: http://<server name>:2812/ Of course, port 2812 needs to be reachable from the Internet if you intend to provide Opensim service management to external users.
Monit is password protected and can also use SSL. Especially if you intend to manage servers over the Internet, you should use SSL. To set up SSL, execute the following commands as root user:
$ apt-get install ssl-cert
$ mkdir /etc/apache2/ssl
$ /usr/sbin/make-ssl-cert /usr/share/ssl-cert/ssleay.cnf /etc/apache2/ssl/apache.pem
Then make the following changes in /etc/monit/monitrc and restart Monit:
ssl enable
pemfile /etc/apache2/ssl/apache.pem
After this Monit can only be accessed using SSL: https://<server name>:2812/
All Opensim and Freeswitch processes run within a Screen environment (http://linux.die.net/man/1/screen), which lets Opensim processes run like a server while the user can still connect a terminal to a process to read its output and execute commands. “screen -ls” lists all sessions the user can connect to. “screen -r <session>” connects the current terminal to the given session. The session can be left without terminating the process by pressing ctrl-a ctrl-d. ctrl-c kills the process and should be avoided; use the Opensim command “shutdown” to shut down an Opensim process instead.
The following scripts are required by Monit and are stored in the “ServiceManagement” directory. You need to adjust the directory paths for your installation.
startbos – Script to start Opensim in the terminal window for testing (without Monit and Screen). The region name must be provided as parameter.
#!/bin/sh
cd /home/opensim/opensim/run/$1/
mono ./OpenSim.32BitLaunch.exe -gridmode=true -smtag=$1
startqos – Script used by Monit to start Opensim (creates pid file). The region name must be provided as parameter.
#!/bin/sh
export PATH="/home/opensim/bin/mono/bin:$PATH"
export PKG_CONFIG_PATH="/home/opensim/bin/mono/lib/pkgconfig:$PKG_CONFIG_PATH"
export MANPATH="/home/opensim/bin/mono/share/man:$MANPATH"
export MONO_THREADS_PER_CPU=80
cd /home/opensim/opensim/run/$1/
screen -S $1 -d -m mono ./OpenSim.32BitLaunch.exe -gridmode=true -smtag=$1 &
Comment: Because Monit starts the process with a minimal environment rather than from a login shell, the script has to set the Mono environment variables itself.
stopos – Script that is used by Monit to stop an Opensim process. The region name and the http port number must be provided as parameters. This script uses the “stopsoftos” script shown afterwards.
#!/bin/sh
echo $1: stopping process
[ -e /tmp/$1.pid ] || exit 0
OPID=`cat /tmp/$1.pid`
/home/opensim/opensim/ServiceManagement/stopsoftos $2 &
sleep 90
PID=`cat /tmp/$1.pid 2>/dev/null`
if [ "$PID" = "$OPID" ]; then
kill -KILL $PID
rm /tmp/$1.pid
fi
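Comment: After 90 seconds, the script kills the process hard only if the pid file still contains the same process id as when the script was invoked. If the process has been restarted in the meantime and has written a new pid, no hard kill is done.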
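The important detail in stopos is the comparison of the pid before and after the grace period. A Python 3 sketch of the same idea, for illustration only: stop_region, read_pid and the soft_stop callback are hypothetical, and the callback stands in for stopsoftos, which is also started in the background here.

```python
import os
import signal
import threading
import time

def read_pid(pid_file):
    """Return the pid stored in pid_file, or None if it is missing or garbled."""
    try:
        with open(pid_file) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

def stop_region(pid_file, soft_stop, grace_seconds=90):
    """Start a soft shutdown; kill hard only if the same pid is still recorded."""
    old_pid = read_pid(pid_file)
    if old_pid is None:
        return "not running"  # like: [ -e /tmp/$1.pid ] || exit 0
    # Run the soft stop in the background, like "stopsoftos $2 &".
    threading.Thread(target=soft_stop, daemon=True).start()
    time.sleep(grace_seconds)
    if read_pid(pid_file) != old_pid:
        return "stopped or restarted"  # pid file removed or rewritten: no hard kill
    try:
        os.kill(old_pid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # the process is already gone; only the stale pid file remains
    os.remove(pid_file)
    return "killed"
```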
stopsoftos – Script that is used by the “stopos” script to shutdown Opensim processes softly. The http port number must be provided as parameter. This script uses the “broadcastos” and “shutdownos” Python scripts to send warning messages to users and to shut down Opensim.
#!/bin/sh
/home/opensim/opensim/ServiceManagement/broadcastos -s http://localhost:$1 -p <password> -m "This region will restart in 1 minute! Please leave now!" &
sleep 30
/home/opensim/opensim/ServiceManagement/broadcastos -s http://localhost:$1 -p <password> -m "This region will restart in 30 seconds! Please leave now!" &
sleep 30
/home/opensim/opensim/ServiceManagement/shutdownos -s http://localhost:$1 -p <password> &
broadcastos – Python script that sends a message to all users of an Opensim region through the remote admin interface, which must be enabled in OpenSim.ini. The server URI (including the http port), the remote admin password and the message must be provided as parameters.
#!/usr/bin/python
# -*- encoding: utf-8 -*-
# broadcastos: send a message to all users of a region (remote admin admin_broadcast)
from __future__ import print_function
import optparse

try:
    import xmlrpclib  # Python 2
except ImportError:
    import xmlrpc.client as xmlrpclib  # Python 3

if __name__ == '__main__':
    parser = optparse.OptionParser()
    parser.add_option('-s', '--server', dest='server', help='URI of the region server', metavar='SERVER')
    parser.add_option('-p', '--password', dest='password', help='password of the region server', metavar='PASSWD')
    parser.add_option('-m', '--message', dest='message', help='message to broadcast', metavar='MSG')
    (options, args) = parser.parse_args()
    server = options.server
    password = options.password
    message = options.message
    gridServer = xmlrpclib.ServerProxy(server)
    res = gridServer.admin_broadcast({'password': password, 'message': message})
    # Accept success both as the string 'true' and as an XML-RPC boolean.
    if res.get('success') in (True, 'true'):
        print('message was sent to %s' % server)
    else:
        print('sending message to %s failed' % server)
shutdownos – Python script that shuts down an Opensim server process through the remote admin interface. The server URI (including the http port) and the remote admin password must be provided as parameters.
#!/usr/bin/python
# -*- encoding: utf-8 -*-
# shutdownos: shut down a region server (remote admin admin_shutdown)
from __future__ import print_function
import optparse

try:
    import xmlrpclib  # Python 2
except ImportError:
    import xmlrpc.client as xmlrpclib  # Python 3

if __name__ == '__main__':
    parser = optparse.OptionParser()
    parser.add_option('-s', '--server', dest='server', help='URI of the region server', metavar='SERVER')
    parser.add_option('-p', '--password', dest='password', help='password of the region server', metavar='PASSWD')
    (options, args) = parser.parse_args()
    server = options.server
    password = options.password
    gridServer = xmlrpclib.ServerProxy(server)
    res = gridServer.admin_shutdown({'password': password})
    # Accept success both as the string 'true' and as an XML-RPC boolean.
    if res.get('success') in (True, 'true'):
        print('shutdown of %s initiated' % server)
    else:
        print('shutdown of %s failed' % server)
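To try broadcastos and shutdownos without a live region, you can point them at a small local XML-RPC server that imitates the two remote admin methods. This Python 3 test helper is not part of Opensim. It assumes the region answers with a struct containing success = 'true', as the scripts above expect, and the password and function names are arbitrary.

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer

PASSWORD = "secret"  # hypothetical remote admin password

def admin_broadcast(params):
    """Imitate the remote admin broadcast: check the password, print the message."""
    ok = params.get("password") == PASSWORD
    if ok:
        print("broadcast:", params.get("message"))
    return {"success": "true" if ok else "false"}

def admin_shutdown(params):
    """Imitate the remote admin shutdown: only the password is checked."""
    ok = params.get("password") == PASSWORD
    return {"success": "true" if ok else "false"}

def start_fake_region(port=0):
    """Serve both methods on 127.0.0.1 in a background thread; port 0 picks a free port."""
    server = SimpleXMLRPCServer(("127.0.0.1", port), logRequests=False)
    server.register_function(admin_broadcast)
    server.register_function(admin_shutdown)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, "http://127.0.0.1:%d" % server.server_address[1]
```

For example, start it from an interactive Python session, then run broadcastos with “-s” set to the returned URI and “-p secret”; it should report that the message was sent.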
rmpidsos – Script to clean up pid files after shutting down all Opensim server processes.
#!/bin/sh
rm -f /tmp/*.pid
clearos – Script to clean the ~/.wapi/ directory and the ScriptEngines caches of the current Opensim installation. This fixes problems with Mono and cached scripts. It is good practice to execute this command after each update.
#!/bin/sh
rm -rf /home/opensim/.wapi/
rm -r /home/opensim/opensim/run/*/ScriptEngines/*
Monit Configuration File
Finally, here is an example of the /etc/monit/monitrc file that I use for monitoring Opensim and Freeswitch processes.
If you change /etc/monit/monitrc, always run “monit -t” afterwards to check the file for errors. If the file is correct, restart Monit by executing “/etc/init.d/monit restart”.
The memory limits depend on the kind of region (high, medium or low traffic). Besides processor utilization and memory consumption, each Opensim process is checked regularly by sending requests to the http port of that Opensim server process. Only in rare cases does a crash go undetected this way.
If all limits are tuned for each sim, the regions should run very smoothly and restart automatically only about every 3 or 4 days, most often because the memory limit has been reached. This way Opensim service monitoring is done mostly automatically.
As you can also see, Opensim runs on my servers under a special user “opensim”. This is good practice to reduce security risks.
###############################################################################
## Monit control file
###############################################################################
##
## Comments begin with a '#' and extend through the end of the line. Keywords
## are case insensitive. All paths MUST BE FULLY QUALIFIED, starting with '/'.
##
## Below you will find examples of some frequently used statements. For information
## about the control file, a complete list of statements and options please
## have a look in the monit manual.
##
##
###############################################################################
## Global section
###############################################################################
##
## Start monit in background (run as daemon) and check the services at 1-minute
## intervals.
#
set daemon 60
#
#
## Set syslog logging with the 'daemon' facility. If the FACILITY option is
## omitted, monit will use 'user' facility by default. You can specify the
## path to the file for monit native logging.
#
# set logfile syslog facility log_daemon
#
#
## Set list of mailservers for alert delivery. Multiple servers may be
## specified using comma separator. By default monit uses port 25 – it is
## possible to override it with the PORT option.
#
# set mailserver mail.bar.baz, # primary mailserver
# backup.bar.baz port 10025, # backup mailserver on port 10025
# localhost # fallback relay
#
#
## By default monit will drop the event alert, in the case that there is no
## mailserver available. In the case that you want to keep the events for
## later delivery retry, you can use the EVENTQUEUE statement. The base
## directory where undelivered events will be stored is specified by the
## BASEDIR option. You can limit the maximal queue size using the SLOTS
## option (if omitted then the queue is limited just by the backend filesystem).
#
# set eventqueue
# basedir /var/monit # set the base directory where events will be stored
# slots 100 # optionally limit the queue size
#
#
## Monit by default uses the following alert mail format:
##
## --8<--
## From: monit@$HOST # sender
## Subject: monit alert -- $EVENT $SERVICE # subject
##
## $EVENT Service $SERVICE #
## #
## Date: $DATE #
## Action: $ACTION #
## Host: $HOST # body
## Description: $DESCRIPTION #
## #
## Your faithful employee, #
## monit #
## --8<--
##
## You can override the alert message format or its parts such as subject
## or sender using the MAIL-FORMAT statement. Macros such as $DATE, etc.
## are expanded on runtime. For example to override the sender:
#
# set mail-format { from: monit@foo.bar }
#
#
## You can set the alert recipients here, which will receive the alert for
## each service. The event alerts may be restricted using the list.
#
# set alert sysadm@foo.bar # receive all alerts
# set alert manager@foo.bar only on { timeout } # receive just service-
# # timeout alert
#
#
## Monit has an embedded webserver, which can be used to view the
## configuration, actual services parameters or manage the services using the
## web interface.
#
set httpd port 2812
ssl enable
pemfile /etc/apache2/ssl/apache.pem
allow admin:password # require user 'admin' with password 'password'
#
#
###############################################################################
## Services
###############################################################################
##
## Check the general system resources such as load average, cpu and memory
## usage. Each rule specifies the tested resource, the limit and the action
## which will be performed in the case that the test failed.
#
check system ubuntu823294.aspadmin.net
if loadavg (1min) > 4 then alert
if loadavg (5min) > 2 then alert
if memory usage > 75% then alert
if cpu usage (user) > 70% then alert
if cpu usage (system) > 30% then alert
if cpu usage (wait) > 20% then alert
#
#
## Check a file for existence, checksum, permissions, uid and gid. In addition
## to the recipients in the global section, customized alert will be sent to
## the additional recipient. The service may be grouped using the GROUP option.
#
# check file apache_bin with path /usr/local/apache/bin/httpd
# if failed checksum and
# expect the sum 8f7f419955cefa0b33a2ba316cba3659 then unmonitor
# if failed permission 755 then unmonitor
# if failed uid root then unmonitor
# if failed gid root then unmonitor
# alert security@foo.bar on {
# checksum, permission, uid, gid, unmonitor
# } with the mail-format { subject: Alarm! }
# group server
#
#
## Check that a process is running, responding on the HTTP and HTTPS request,
## check its resource usage such as cpu and memory, number of children.
## In the case that the process is not running, monit will restart it by
## default. In the case that the service was restarted very often and the
## problem remains, it is possible to disable the monitoring using the
## TIMEOUT statement. The service depends on another service (apache_bin) which
## is defined in the monit control file as well.
#
# Monitor Apache 2 Service
#check process apache with pidfile /var/run/apache2.pid
#start program = "/etc/init.d/apache2 start"
#stop program = "/etc/init.d/apache2 stop"
#if cpu > 60% for 2 cycles then alert
#if cpu > 80% for 5 cycles then restart
#if totalmem > 200.0 MB for 5 cycles then restart
#if children > 250 then restart
#if loadavg(5min) greater than 10 for 8 cycles then stop
#if failed host metaverse.getmyip.com port 80 protocol http
# then restart
#if failed port 443 type tcpssl protocol http
# with timeout 15 seconds
#then restart
#if 3 restarts within 5 cycles then timeout
#group server
#
# Monitor MySQL Service
check process mysql with pidfile /var/run/mysqld/mysqld.pid
group database
start program = "/etc/init.d/mysql start"
stop program = "/etc/init.d/mysql stop"
if failed host 127.0.0.1 port 3306 then restart
if 5 restarts within 5 cycles then timeout
#
# Monitor ssh Service
check process sshd with pidfile /var/run/sshd.pid
start program = "/etc/init.d/ssh start"
stop program = "/etc/init.d/ssh stop"
if failed port 22 protocol ssh then restart
if 5 restarts within 5 cycles then timeout
#
# Freeswitch
check process freeswitch with pidfile /usr/local/freeswitch/log/freeswitch.pid
start program = "/usr/bin/screen -S freeswitch -d -m /usr/local/freeswitch/bin/freeswitch -nf"
stop program = "/usr/local/freeswitch/bin/freeswitch -stop"
if totalmem > 40.0 MB then alert
if totalmem > 50.0 MB for 3 cycles then restart
# Checks sip port on localhost, not always suitable
# if failed port 5060 type UDP then restart
# Checks mod_event_socket on localhost. Maybe more suitable
if failed port 8021 type TCP then restart
if 5 restarts within 5 cycles then timeout
#
# Monitor mono opensim Service for H1
check process opensim_H1 with pidfile /home/opensim/opensim/ServiceManagement/H1.pid
start program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/startqos H1"
stop program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/stopos H1 9010"
if totalmem > 900 Mb then alert
if totalmem > 1100 Mb then restart
if cpu usage > 20% then alert
if cpu usage > 24% for 3 cycles then restart
if failed host localhost port 9010 send "GET /SStats/ HTTP/1.0\r\nHost: localhost\r\n\r\n" expect "<!DOCTYPE html .*" within 5 cycles then restart
if 5 restarts within 5 cycles then timeout
#
# Monitor mono opensim Service for H2
check process opensim_H2 with pidfile /home/opensim/opensim/ServiceManagement/H2.pid
start program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/startqos H2"
stop program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/stopos H2 9011"
if totalmem > 800 Mb then alert
if totalmem > 1000 Mb then restart
if cpu usage > 20% then alert
if cpu usage > 24% for 3 cycles then restart
if failed host localhost port 9011 send "GET /SStats/ HTTP/1.0\r\nHost: localhost\r\n\r\n" expect "<!DOCTYPE html .*" within 5 cycles then restart
if 5 restarts within 5 cycles then timeout
#
# Monitor mono opensim Service for M3
check process opensim_M3 with pidfile /home/opensim/opensim/ServiceManagement/M3.pid
start program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/startqos M3"
stop program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/stopos M3 9012"
if totalmem > 700 Mb then alert
if totalmem > 900 Mb then restart
if cpu usage > 20% then alert
if cpu usage > 24% for 3 cycles then restart
if failed host localhost port 9012 send "GET /SStats/ HTTP/1.0\r\nHost: localhost\r\n\r\n" expect "<!DOCTYPE html .*" within 5 cycles then restart
if 5 restarts within 5 cycles then timeout
#
# Monitor mono opensim Service for M4
check process opensim_M4 with pidfile /home/opensim/opensim/ServiceManagement/M4.pid
start program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/startqos M4"
stop program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/stopos M4 9013"
if totalmem > 600 Mb then alert
if totalmem > 800 Mb then restart
if cpu usage > 20% then alert
if cpu usage > 24% for 3 cycles then restart
if failed host localhost port 9013 send "GET /SStats/ HTTP/1.0\r\nHost: localhost\r\n\r\n" expect "<!DOCTYPE html .*" within 5 cycles then restart
if 5 restarts within 5 cycles then timeout
#
# Monitor mono opensim Service for M5
check process opensim_M5 with pidfile /home/opensim/opensim/ServiceManagement/M5.pid
start program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/startqos M5"
stop program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/stopos M5 9014"
if totalmem > 900 Mb then alert
if totalmem > 1100 Mb then restart
if cpu usage > 20% then alert
if cpu usage > 24% for 3 cycles then restart
if failed host localhost port 9014 send "GET /SStats/ HTTP/1.0\r\nHost: localhost\r\n\r\n" expect "<!DOCTYPE html .*" within 5 cycles then restart
if 5 restarts within 5 cycles then timeout
#
# Monitor mono opensim Service for M6
# check process opensim_M6 with pidfile /home/opensim/opensim/ServiceManagement/M6.pid
# start program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/startqos M6"
# stop program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/stopos M6 9015"
# if totalmem > 600 Mb then alert
# if totalmem > 800 Mb then restart
# if cpu usage > 20% then alert
# if cpu usage > 24% for 3 cycles then restart
# if failed host localhost port 9015 send "GET /SStats/ HTTP/1.0\r\nHost: localhost\r\n\r\n" expect "<!DOCTYPE html .*" within 5 cycles then restart
# if 5 restarts within 5 cycles then timeout
#
## Check the device permissions, uid, gid, space and inode usage. Other
## services such as databases may depend on this resource and automatic
## graceful stop may be cascaded to them before the filesystem will become
## full and the data will be lost.
#
# check device datafs with path /dev/sdb1
# start program = "/bin/mount /data"
# stop program = "/bin/umount /data"
# if failed permission 660 then unmonitor
# if failed uid root then unmonitor
# if failed gid disk then unmonitor
# if space usage > 80% for 5 times within 15 cycles then alert
# if space usage > 99% then stop
# if inode usage > 30000 then alert
# if inode usage > 99% then stop
# group server
#
#
## Check a file's timestamp: when it becomes older than 15 minutes, the
## file is not updated and something is wrong. In the case that the size
## of the file exceeded given limit, perform the script.
#
# check file database with path /data/mydatabase.db
# if failed permission 700 then alert
# if failed uid data then alert
# if failed gid data then alert
# if timestamp > 15 minutes then alert
# if size > 100 MB then exec "/my/cleanup/script"
#
#
## Check the directory permission, uid and gid. An event is triggered
## if the directory does not belong to the user with the uid 0 and
## the gid 0. In the addition the permissions have to match the octal
## description of 755 (see chmod(1)).
#
# check directory bin with path /bin
# if failed permission 755 then unmonitor
# if failed uid 0 then unmonitor
# if failed gid 0 then unmonitor
#
#
## Check the remote host network services availability and the response
## content. One of three pings, a successful connection to a port and
## application level network check is performed.
#
# check host myserver with address 192.168.1.1
# if failed icmp type echo count 3 with timeout 3 seconds then alert
# if failed port 3306 protocol mysql with timeout 15 seconds then alert
# if failed url
# http://user:password@www.foo.bar:8080/?querystring
# and content == 'action="j_security_check"'
# then alert
#
#
###############################################################################
## Includes
###############################################################################
##
## It is possible to include the configuration or its parts from other files or
## directories.
#
# include /etc/monit.d/*
#
#
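The Opensim entries in the configuration above differ only in region name, http port and memory limits, so they can be generated from a table. A Python 3 sketch that reproduces the structure of those entries: monit_stanza and LIMITS are hypothetical names, and the CPU limits, the /SStats/ check and the pid file path are copied from the configuration above rather than recommended values.

```python
SM = "/home/opensim/opensim/ServiceManagement"
SEND = r"GET /SStats/ HTTP/1.0\r\nHost: localhost\r\n\r\n"  # literal \r\n for Monit

# Region -> (http port, alert limit in MB, restart limit in MB), from the entries above.
LIMITS = {"H1": (9010, 900, 1100), "H2": (9011, 800, 1000), "M3": (9012, 700, 900),
          "M4": (9013, 600, 800), "M5": (9014, 900, 1100)}

def monit_stanza(region, port, alert_mb, restart_mb, sm_dir=SM):
    """Return a Monit 'check process' entry for one Opensim region."""
    lines = [
        "# Monitor mono opensim Service for %s" % region,
        "check process opensim_%s with pidfile %s/%s.pid" % (region, sm_dir, region),
        '  start program = "/usr/bin/sudo -u opensim %s/startqos %s"' % (sm_dir, region),
        '  stop program = "/usr/bin/sudo -u opensim %s/stopos %s %d"' % (sm_dir, region, port),
        "  if totalmem > %d MB then alert" % alert_mb,
        "  if totalmem > %d MB then restart" % restart_mb,
        "  if cpu usage > 20% then alert",
        "  if cpu usage > 24% for 3 cycles then restart",
        '  if failed host localhost port %d send "%s" expect "<!DOCTYPE html .*"'
        " within 5 cycles then restart" % (port, SEND),
        "  if 5 restarts within 5 cycles then timeout",
        "#",
    ]
    return "\n".join(lines)

if __name__ == "__main__":
    for region, (port, alert_mb, restart_mb) in LIMITS.items():
        print(monit_stanza(region, port, alert_mb, restart_mb))
```

The output could be written to a separate file that is pulled in with the include statement at the end of the configuration; check the result with “monit -t” before restarting Monit.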
Practical Hints using Monit
To shut down a region, I usually disable monitoring of that region in the Monit user interface. Then I open a console window and connect to that Opensim server process using “screen -r <region name>”. Next I check whether anyone is in the region using “show users”. If there are users, I send warning messages using “alert general <message>” before I finally shut down the region with the “shutdown” command. That also closes the Screen session.
This gives me more control over the shutdown process and is faster if nobody is in the region, whereas Monit always sends warning messages and waits before shutting down to give people time to leave. Of course, you can also use the Monit “stop” and “restart” buttons, which is definitely more convenient.
To restart a region I simply click the “start” button in Monit. Often I check how Opensim restarts by opening the corresponding Screen session in a terminal window. At the end I disconnect from the Screen session by pressing ctrl-a ctrl-d.
If Freeswitch runs as root user, you need to use Screen as root user to be able to connect to it.
Database Backups using AutoMySQLBackup
I use AutoMySQLBackup (http://www.debianhelp.co.uk/mysqlscript.htm) for daily database backups covering the last 7 days. In the script that this tool uses, you need to specify the MySQL user name, password and the names of the databases to back up. I use the directory ~/backups of my opensim user to store database backups. Finally, add that script to your user’s crontab:
$ crontab -e
If a region has serious problems and it looks as if its database contents have been damaged, I restore the database of that region. Of course, the corresponding region needs to be shut down while a database backup is restored.