How to Keep Physics Engine Load Low

What is ODE?

Today most OpenSim installations use the Open Dynamics Engine (ODE) as their physics engine. It is delivered as an integrated part of current OpenSim versions and is used to simulate a (more or less) realistic physical environment in the virtual world. ODE calculates the physical behavior of avatars and of physical objects, such as some vehicle types.

Under certain circumstances the physics engine can be very busy, and that physics engine load can be experienced as server lag. This article describes how to identify physics engine lag and how to improve your regions so that they cause less of it.

In addition to the server lag you experience, heavy physics engine activity might cause OpenSim to consume more and more memory over time, due to memory leaks. This may lower stability and lead to more frequent restarts of your OpenSim installation.

How to identify physics engine lag?

Enter “show stats” in your OpenSim console window. After that you will see output like this:

ASSET STATISTICS
Asset cache contains        0 assets
Latest asset request time after cache miss: 0s
Blocked client requests for missing textures: 0
Asset service request failures: 0

CONNECTION STATISTICS
Abnormal client thread terminations: 0

INVENTORY STATISTICS
Initial inventory caching failures: 0

FRAME STATISTICS
Dilatn  SimFPS  PhyFPS  AgntUp  RootAg  ChldAg  Prims   AtvPrm  AtvScr  ScrLPS
  1.00      55    45.5     0.0       0       0    3516       0     419      14

PktsIn  PktOut  PendDl  PendUl  UnackB  TotlFt  NetFt   PhysFt  OthrFt  AgntFt  ImgsFt
    27      10       0       0       0     5.3     0.0     4.3     0.0     0.0     0.0

MEMORY STATISTICS
Allocated to OpenSim : 152 MB

Have a look at the value shown under the label PhysFt. This number is the so-called Physics Frame Time. If that value is high, above 20, your physics engine is quite busy. It is best to run “show stats” multiple times and check whether you always get more or less the same PhysFt value or whether you see peaks of physics engine activity.

If you see such peaks even when nobody is currently visiting your region (use the “show users” command on the OpenSim console to check that), then you probably have moving objects causing physics engine lag. Otherwise it is possible that moving avatars cause the lag, for example by walking on prim surfaces or by colliding with other objects or avatars.

The PhyFPS value shows the Physics Engine Frame Rate. Values above 45 fps are good. Usually high PhysFt values cause low PhyFPS values.
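The check described above can be sketched in a few lines of Python. This is only an illustration, assuming the “show stats” output has been captured as text somehow (for example copied from the console); the function names parse_frame_stats and physics_is_busy are made up for this example and are not part of OpenSim. The limits of 20 for PhysFt and 45 for PhyFPS are the values from this article.

```python
# Sample text in the layout of the "show stats" output shown above.
SAMPLE = """FRAME STATISTICS
Dilatn  SimFPS  PhyFPS  AgntUp  RootAg  ChldAg  Prims   AtvPrm  AtvScr  ScrLPS
  1.00      55    45.5     0.0       0       0    3516       0     419      14

PktsIn  PktOut  PendDl  PendUl  UnackB  TotlFt  NetFt   PhysFt  OthrFt  AgntFt  ImgsFt
    27      10       0       0       0     5.3     0.0     4.3     0.0     0.0     0.0

MEMORY STATISTICS
Allocated to OpenSim : 152 MB
"""


def parse_frame_stats(text):
    """Return a dict mapping each FRAME STATISTICS label to its value."""
    stats = {}
    in_frame_section = False
    header = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "FRAME STATISTICS":
            in_frame_section = True
            continue
        if not in_frame_section or not stripped:
            continue
        if stripped.endswith("STATISTICS"):
            break  # the next section has started
        if header is None:
            header = stripped.split()  # a line of labels
        else:
            values = [float(v) for v in stripped.split()]
            stats.update(zip(header, values))  # the matching line of numbers
            header = None
    return stats


def physics_is_busy(stats, max_frame_time=20.0, min_fps=45.0):
    """Apply the rule of thumb: PhysFt above 20 or PhyFPS below 45."""
    return stats["PhysFt"] > max_frame_time or stats["PhyFPS"] < min_fps


stats = parse_frame_stats(SAMPLE)
print("PhysFt:", stats["PhysFt"], "PhyFPS:", stats["PhyFPS"],
      "busy:", physics_is_busy(stats))
```

Running such a check several times in a row makes it easier to tell a constantly high PhysFt value from occasional peaks.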

You can also display the Physics Engine Frame Rate in the viewer. Select the menu item “View > Statistics Bar”. The Statistics window opens and shows the Physics FPS value under the label Simulator. If you cannot see it, simply click on the label Simulator.

Reasons for physics engine lag

In the following situations the physics engine has to do calculations:

  • Movements of physical objects
    • unscripted physical objects (e.g. a ball)
    • scripted physical objects (e.g. vehicles implemented as physical objects)
  • Movements of avatars
    • avatar walking over terrain
    • avatar walking over a prim surface (normal prims or sculpties)
  • Collisions between non phantom objects and avatars
    • avatars colliding with each other
    • non phantom objects colliding
    • avatar colliding with a non phantom object

Many physical movements require the detection of collisions. Collisions are detected based on so-called Bounding Boxes: rectangular boxes aligned with the X, Y and Z axes that contain the whole object, avatar or segment of terrain.
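To illustrate the idea, here is a minimal Python sketch of an overlap test between two axis-aligned Bounding Boxes. It is not the code ODE uses; the class BoundingBox and the function boxes_overlap are hypothetical names chosen for this example, and the coordinates are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box given by its minimum and maximum corner."""
    min_x: float
    min_y: float
    min_z: float
    max_x: float
    max_y: float
    max_z: float


def boxes_overlap(a, b):
    """Two boxes overlap only if their ranges overlap on all three axes."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x and
            a.min_y <= b.max_y and b.min_y <= a.max_y and
            a.min_z <= b.max_z and b.min_z <= a.max_z)


# A large hollow prim: its box covers the whole prim, including the hollow part.
hollow_prim = BoundingBox(0, 0, 0, 10, 10, 10)
# A small vehicle moving around inside the hollow prim.
vehicle = BoundingBox(4, 4, 4, 5, 5, 5)
# An object somewhere else in the region.
far_object = BoundingBox(50, 50, 20, 51, 51, 21)

print(boxes_overlap(hollow_prim, vehicle))
print(boxes_overlap(hollow_prim, far_object))
```

In this sketch the box of the hollow prim always overlaps the box of anything inside it, which is one way to picture why objects moving inside hollow prims keep the collision detection busy.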

You can toggle the display of these internal Bounding Boxes in the viewer using the following menu item in the Advanced menu (which you can open using Ctrl+Alt+D): Advanced > Rendering > Info Displays > BBoxes

Bounding Boxes

How to reduce physics engine lag?

To reduce physics engine lag you should do the following:

  • Try to avoid physical objects (e.g. use vehicles that are not physical objects)
  • Never create huge physical objects, because they easily cause many collision events
  • Make objects phantom if possible (collision detection ignores phantom objects)
  • Make moving/rotating objects phantom
  • Use llTargetOmega and llSetTextureAnim for viewer side rotations/animations
  • Do not use hollow prims with the intention to let other objects or avatars move inside
  • Do not use too complex, rough sculpty prims and terrains as walking surface for avatars

The general rule is to try to avoid collisions of Bounding Boxes as much as possible to reduce physics engine load.

By the way, that is also the reason why avatar attachments are always phantom objects.

One of my renters was able to reduce the Physics Frame Time (PhysFt) values from about 50, with peaks up to 150, to much lower values of about 4. The main reason was a hollow prim with a scripted vehicle constantly moving inside. This caused physics engine lag. But a bigger issue was the constantly growing memory consumption (memory leak) and thus more frequent region restarts than necessary.

Appendix

This is a short description of the more cryptic labels of the “show stats” console output:

  • Dilatn: time dilation
  • SimFPS: sim FPS
  • PhyFPS: physics FPS
  • AgntUp: # of agent updates
  • RootAg: # of root agents
  • ChldAg: # of child agents
  • Prims: # of total prims
  • AtvPrm: # of active prims
  • AtvScr: # of active scripts
  • ScrLPS: # of script lines per second
  • PktsIn: # of in packets per second
  • PktOut: # of out packets per second
  • PendDl: # of pending downloads
  • PendUl: # of pending uploads
  • UnackB: # of unacknowledged bytes
  • TotlFt: total frame time
  • NetFt: net frame time
  • PhysFt: physics frame time
  • OthrFt: other frame time
  • AgntFt: agent frame time
  • ImgsFt: image frame time

Snoopy Pfeffer
Founder and CEO of Dreamland Metaverse

snoopy.pfeffer@yahoo.com
http://www.3dmetaverse.com/

Pros and Cons of Virtual Private Servers (VPS) for OpenSim

When you have a look at the OpenSim hosting market today, you quickly recognize two kinds of hosting providers. One group of providers offers OpenSim based on dedicated server hardware, while the other group offers OpenSim on so-called Virtual Private Servers (VPS), where many such virtual servers run on a single hardware server. The second group of providers mainly tries to sell OpenSim through low prices.

The question is whether Virtual Private Servers (VPS) can be a cheap alternative to dedicated servers for OpenSim, and in which situations.

Virtual Private Servers

VPS is a technology that was developed for cheap web hosting: expensive hardware resources are shared by as many web servers as possible. Compared to basic web hosting services, it additionally allows the web site provider to use some advanced web hosting features, like server-side scripts or database storage. These are useful features for better looking and more feature-rich web sites.

VPS is based on the assumption that web sites of small and medium-sized institutions do not get traffic most of the time. During these times the processes of such web sites can be put to sleep and their data temporarily offloaded to slow hard drives to free main memory for currently active web sites. This allows many web sites to share the same physical resources (CPU, memory, bandwidth), which keeps costs low.

VPS providers apply a so-called Fair Use Principle. If one hosting customer uses too many resources (CPU, memory or bandwidth), they take measures to ensure that the overuse by one customer does not affect the service quality of other customers. This can mean that the available bandwidth of that customer is cut back or that the processing power is reduced to ensure nobody else is affected negatively by that overuse.

Now the question is whether VPS is suitable for OpenSim, or in which cases VPS might be an alternative for hosting OpenSim.

OpenSim processes, like the processes of other real-time applications such as voice over IP or video conferencing, never sleep the way web server processes do. Even if there is no visitor in an OpenSim region, a typical OpenSim process constantly consumes between 0.5 and 3% of the processing power of a processor core.

This permanent use of processing power is atypical for the web server applications VPS was mainly developed for. Depending on the kind of virtualization software used, such processes are nevertheless put to sleep from time to time to avoid a negative impact on other VPS instances, although they should be kept running. This is done to enforce the Fair Use Principle and to ensure that other VPS instances on the same hardware server get enough computing power and physical main memory.

As a consequence OpenSim users experience lag, because data is regularly stored to and loaded from slow hard drives and because of the overhead and delays of constantly having to stop and start processes.

These negative effects increase the more users are on one hardware server providing VPS services. And it gets even worse if there are more such processes overusing resources on the same server hardware. Processes with permanent resource demands begin to fight for the scarce hardware resources, because the basic assumption that most (web server) processes can sleep and offload data most of the time is no longer true.

Bandwidth can be another issue, because OpenSim creates peaks of data traffic. Depending on the virtualization technology used, this also causes issues, if the peak bandwidth is limited. That can cause crashes and instabilities during logins and teleports.

Because VPS tries to store data on hard drives as often as possible to save main memory, the additional time needed to load data from that slow storage causes additional lag. The probability of such delays increases the more memory an OpenSim process uses. Bigger regions with many prims, scripts and visitors quickly become unusable because of the permanent swapping or paging of memory needed by these OpenSim processes.

This effect also increases the probability that Mono threads block or crash completely, if OpenSim runs on top of Mono. This causes instabilities and more frequent crashes of OpenSim regions.

What a VPS Hosting Company Says

I had a long discussion with the CTO of a big VPS hosting company, and he clearly said: “VPS is not suitable for any real time application [voice, video streaming or virtual worlds].”

He said they once hosted a voice conference on a VPS server for a customer, but that this was just done as a marketing show. In reality that customer was the only one on the hardware server used, to ensure high service quality. So in fact that customer used a dedicated server for that marketing campaign, although it was said to be a virtual server.

Summary

VPS is not suitable for regions with a lot of content (prims and scripts) or if many visitors need to be supported. Besides that, the service quality is substantially lower: visitors experience more lag, less stability and more frequent region crashes.

On the other hand, if someone just needs a small region with not too many prims and scripts, and if 2 or at most 3 visitors are enough, then VPS offers a cheap alternative for beginners. But most region owners will quickly reach the limit and will want to add more content to their region without experiencing a quickly degrading service quality.

For professional users, like corporations, shop owners, dance club owners, etc., VPS is definitely no alternative to professional OpenSim hosting on dedicated server hardware.

Dedicated Servers

Dedicated servers are the best choice for bigger regions and regions that have to support many visitors.

The prices of OpenSim regions hosted on dedicated servers very much depend on the processing power, main memory and network bandwidth provided. Good offers for beginners start with 512 MB main memory per region, while the typical average users need up to 1 GB. High end users that use many prims, scripts and have many visitors need up to 2 GB of memory for one region.

General Advice

In any case be careful if an OpenSim hosting provider promises you very high numbers of prims, scripts and visitors, especially if no information about the underlying hardware platform is given or if you see that the amount of memory, the included network bandwidth or the processor does not seem to be sufficient for what they promise.

In the end, what counts is only what you get and not what they promised you, no matter how low their price is!

You should also carefully check what is included in the hosting service package. Do they ensure high availability with professional 7×24 service monitoring? What kind of web based or in-world management tools are provided? What about backups? Are optional features like voice, search, groups, off-line messages, money, HyperGrid and mega regions included?

Data backups are a must and should include daily backups and on demand backups to protect you from data loss. Database backups are the most important, because they store all important data, but customers should also be able to get backups of their region contents as OAR archive files.

Customers should also get web based and/or in-world tools to manage their regions (display status, restart, etc.) and in-world monitoring panels for regions.

Finally, and most importantly, check whether they have good references for their hosting services and for their customer service. Service quality very much depends on how professionally OpenSim is managed and run. Good references are an indication that a hosting provider does its job well.

Currently many OpenSim hosting companies out there have problems delivering what they promise. With good references the chances are higher that you will be a happy customer, using good OpenSim hosting services for a reasonable price.

Snoopy Pfeffer

CEO & Founder of Dreamland Metaverse
snoopy.pfeffer@yahoo.com
http://www.dreamlandmetaverse.com/

“Life 2.0” Documentary

PalmStar Entertainment and Andrew Lauren Productions announced that “Life 2.0” (life2movie.com), directed by Jason Spingarn-Koff, has been accepted into the 2010 Sundance Film Festival. This film is a documentary about the impact of virtual communities on the world.

While most of the publications about virtual worlds focus on technology and business, this documentary puts a spotlight on the social and personal impacts virtual world communities can have on people.

“This feature length documentary follows a group of people whose lives are dramatically transformed by a virtual world, Second Life – reshaping relationships, identities, and ultimately the very notion of reality.”

Movie trailer:
http://www.life2movie.com/

Further information:
http://www.palmstar.com/2009/12/03/life-20-accepted-into-sundance-film-festival-2010/

Film Credits:
Presented by Andrew Lauren Productions and PalmStar Entertainment
Produced, Directed, and Edited by Jason Spingarn-Koff
Producers: Andrew Lauren and Stephan Paternot
Co-Producer: Jonathan Shukat
Director of Photography and Consulting Producer: Dan Krauss
Music by Justin Melland
Additional Editor: Shannon Kennedy

PayPal Money Module



After Adam Frisby announced his DTL PayPal Money Module for OpenSim about a month ago, which got much positive feedback, I tested it intensively and extended its functionality. Last week Adam and I decided that it is best if I continue to work on it using a fork of his Git repository.

The Git repository with the latest version of the PayPal money module for OpenSim is at http://github.com/SnoopyPfeffer/Mod-PayPal (this fork has been renamed from DTL-PayPal to Mod-PayPal to avoid confusion).

Functional Extensions and Bug Fixes

  • User2User pay, User2Object pay, User2Object purchases and User2Land purchases now work properly
  • User2User payments are confirmed with instant messages to senders and receivers
  • Support for group owned objects and land can be enabled (default: off); requirement: PayPal email accounts have to be defined per group
  • Underscores in email addresses do not cause error messages during OpenSim startup anymore
  • Object and land purchases for US$ 0 do not initiate a PayPal transaction anymore
  • Object purchases now properly transfer the purchased contents to the buyer
  • User email addresses are loaded from the local OpenSim.ini file; invalid email addresses are ignored, but loading is not aborted anymore
  • User email addresses can also be fetched from the Users grid service, if that feature is enabled (default: off); fetched email addresses are cached by the region server until region restart; local email addresses always take precedence, to ensure an acceptable level of security
  • Locally defined user email addresses should be used for all users that receive larger amounts of money within a region (e.g. shop owners), because security is much higher if these email addresses are not fetched from the Users grid service; besides that, it is possible to locally use a special PayPal micropayment account for all shop transactions to save fees, instead of the standard PayPal account used elsewhere
  • Group email addresses can be defined per group UUID in the local ini file to support group owned objects and land as mentioned before
  • Non existent user or group email addresses are handled by showing warning messages, instead of error messages for crashed code
  • Added start and success log messages for all kinds of PayPal transactions
  • Rounding problems made it impossible to pay certain amounts (like US$ 1.23); now all amounts work and are displayed nicely
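The rounding item at the end of the list above can be illustrated with a small sketch. The module itself is not written in Python and the article does not describe how the problem was actually fixed, so this only shows one common way to avoid such problems: do the arithmetic in whole cents using decimal numbers instead of binary floating point. The function names to_cents and format_usd are made up for this example.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_cents(amount):
    """Convert a US$ amount given as a string (e.g. "1.23") to whole cents."""
    cents = (Decimal(amount) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)


def format_usd(cents):
    """Format a whole number of cents for display, always with two decimals."""
    return f"US$ {cents // 100}.{cents % 100:02d}"


for text in ("1.23", "1.15", "0.50", "12"):
    cents = to_cents(text)
    print(text, "->", cents, "cents ->", format_usd(cents))
```

Because the amount never passes through a binary floating point value, every two-decimal amount maps to exactly one number of cents and is displayed the same way it was entered.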

Interesting findings

  • PayPal can send money to all email accounts, even to email accounts not registered at PayPal yet; thus it is not necessary to handle that case
  • PayPal micropayment accounts are not available in all countries worldwide

To Do List

  • Web pages that are shown after successful or cancelled PayPal transactions
  • Locking for PayPal transactions while buying land or original objects
  • Events or functions that allow vendors to be locked while a PayPal transaction is in progress
  • Amounts too small to cover the PayPal transaction fees produce a misleading error message on the PayPal web page; maybe it would be good to be able to define a minimum amount globally and/or per email address to avoid that
  • It would be good if a scripter using llGiveMoney got a warning message that this function is not supported by the PayPal money module
  • User2User payments are confirmed with instant messages to the sender and receiver of money, but only if these users are online at that time; maybe it is possible to improve that, so that even offline users get such confirmation messages as stored offline messages

Installation of the PayPal Money Module

  • Install the addon-modules/mod-paypal under OpenSim/Region/OptionalModules/
  • Install the files-for-bin-dir in the OpenSim bin folder
  • Add the additional config settings in the config-include subfolder or in the OpenSim.ini file in your bin folder
  • After that compile OpenSim

If you find bugs or if you have ideas for the future development of this module, please send me an email (snoopy.pfeffer@yahoo.com). Thanks!

Hosted Regions in OSGrid

My company Dreamland Metaverse offers high quality regions in OSGrid and standalone regions based on open source OpenSim virtual world servers. Our goal is to provide our renters a good service for a fair price.

For more than 1.5 years our regions in OSGrid have been known for their high reliability, using the latest stable OpenSim versions. Our good references show that we are able to maintain high service quality.

Many residents have chosen our regions (like the shopping mall regions Samsara and Snoopies) as their login locations, because our regions run more reliably than the main OSGrid plazas.

Because we focus on quality, we have chosen the following server configuration:

  • We focus on best quality regions for OSGrid and as standalone regions
  • Dedicated, high performance Intel Quad-Core Xeon servers optimized for OpenSim
  • Enough computing power, memory and bandwidth for a low lag experience
  • Servers located close to the central OSGrid servers for higher performance
  • Service monitoring tools for 7×24 availability
  • Daily data backups and redundant RAID 1 hard drives
  • Import and export of region contents (OAR archives)
  • All region owners get estate manager rights
  • Land can be split, joined, subleased and sold
  • Web portal with service management tool for our customers
  • Search, groups, offline messages, voice and money (PayPal), HyperGrid

We offer the following region types, where the minimum renting duration is 3 months.

  • Basic – OSGrid or Standalone Region
    Regions mainly intended for landscapes and regions with low building density
    8’000 prims, 10 visitors, shared Intel quadcore, 512 MB memory
    no setup costs; monthly rent: US$ 30
  • Residential – OSGrid or Standalone Region
    Best solution for residential land – with the capacity you need for a typical region
    12’000 prims, 40 visitors, shared Intel quadcore, 1 GB memory
    no setup costs; monthly rent: US$ 45
  • Professional – OSGrid or Standalone Region
    Regions for business and event areas – allows many prims, scripts and visitors
    16’000 prims, 80+ visitors, dedicated Intel core, 2 GB memory
    no setup costs; monthly rent: US$ 90

If you are interested, you can order a region on our web site. I am happy to answer any questions.

Dreamland Metaverse
Region Hosting & Consulting

Snoopy Pfeffer
snoopy.pfeffer@yahoo.com
www.dreamlandmetaverse.com

The Costs of Free Resources

We all like it when we get something for free!

But the absence of costs also has disadvantages: we tend to ignore the resources associated with what we use. This is simply because there are no costs for us that would otherwise provide an incentive to use resources more economically.

In OSGrid users like that it is possible to upload textures, animations and sound files for free. Additionally the users do not have to pay for activities like creating new groups. To make it clear: I fully back these policies, because they support the development of OSGrid.

And at this point we should all thank the donors who cover the costs of the central infrastructure of OSGrid and who help keep it running so well every day. You surely all know about the new “donate” button on the new Elgg based OSGrid web site.

But the missing incentives to save resources have negative effects that, over time, unnecessarily increase the costs of central grid services, reduce performance and lessen usability in certain areas.

In the following paragraphs I pick two examples to describe these issues and how they could be solved by implementing policies and tools. With this article I want to initiate a discussion about these issues, before we experience rapid future growth of free OpenSim grids.

A)  Asset Usage

People tend to upload textures, animations and sound files even if they already exist within the world. From the user's point of view the cost of checking whether, for example, a texture is already available (the time needed) is higher than the cost of simply uploading it once more. Such duplicates inflate the space required for the asset database and asset caches. Additionally, duplicates increase the amount of data that needs to be transferred and thus reduce OpenSim performance.

Additionally people tend to upload multiple versions of an asset to test it in world until they are happy with the result. In the end they just use the last version. Such unused assets, assets that are not referenced anymore, still consume space in the asset database that could be freed.

Possible Solution

One solution would be to introduce incentives for a more economical use of these resources. Second Life does this with their L$10 uploading fee.

But it would be even better if OpenSim were able to find duplicates and unused assets itself. This could be done either on the fly, using some OpenSim extension, or by database scripts that are run regularly as part of an asset database maintenance task. The goal is to remove all unreferenced assets and to replace duplicate assets with a single copy, which might include replacing the corresponding UUIDs.

To find duplicates efficiently, data block sizes and hash values can be used. Of course, only assets that cannot be changed separately by individual users can be replaced by a single asset copy. This is especially true for uploaded textures, animations and sound files, which cannot be changed within the viewer. Notecards and scripts, which can be changed in world, and other changeable assets cannot be replaced by a single asset database copy.
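A minimal Python sketch of this idea follows. It is not how OSGrid or OpenSim actually implements it; the data layout (a list of UUID, type and data triples), the type names and the function find_duplicates are assumptions made for this example. Assets are grouped by type, data size and a 256 bit SHA-256 hash, and only asset types that users cannot change in world are considered.

```python
import hashlib
from collections import defaultdict

# Asset types assumed to be unchangeable in world (hypothetical type names).
IMMUTABLE_TYPES = {"texture", "animation", "sound"}


def find_duplicates(assets):
    """Map the UUID of each duplicate asset to the UUID of the copy to keep.

    `assets` is an iterable of (uuid, asset_type, data) triples, with the
    data given as bytes.
    """
    groups = defaultdict(list)
    for uuid, asset_type, data in assets:
        if asset_type not in IMMUTABLE_TYPES:
            continue  # notecards, scripts etc. must stay separate copies
        key = (asset_type, len(data), hashlib.sha256(data).digest())
        groups[key].append(uuid)

    replacements = {}
    for uuids in groups.values():
        keeper = uuids[0]  # keep the first copy that was seen
        for duplicate in uuids[1:]:
            replacements[duplicate] = keeper
    return replacements


example_assets = [
    ("a1", "texture", b"wood grain"),
    ("a2", "texture", b"wood grain"),   # same data uploaded a second time
    ("a3", "notecard", b"wood grain"),  # changeable asset, never merged
    ("a4", "sound", b"ping"),
]
print(find_duplicates(example_assets))
```

The resulting mapping is what a maintenance task would need in order to replace the UUIDs of duplicates with the UUID of the single copy that is kept.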

In practice OSGrid now uses Fragstore to do dupe compression based on 256 bit hash values for nearly 3 million assets. The compression achieved is about 25 percent. Currently no deletion of unused assets is done. The detection of unused assets seems to be impossible, or at least very difficult, with the current asset server architecture OpenSim uses.

B) Creating New Groups

Groups were introduced with OpenSim revision 9215, but already today nearly 600 groups exist in OSGrid. I have had a closer look at the existing groups and saw the following:

  • Most groups just have one member
  • There are even many groups with no members
  • Some groups have very similar names, and it looks like most of them have already been abandoned for newer groups with a slightly improved title

If the development continues like this, we will soon see many thousands of groups, where it is difficult to identify which groups are trash and which are still used. For certain keywords, group search already returns many groups with similar names (search for “star” for example).

Possible Solution

To prevent an excessive use of groups, which also reduces usability, policies and tools that support these policies should be used.

In Second Life people have to pay a fee to create a new group. Additionally the group owner has to recruit more group members within 3 days, otherwise the group gets deleted. And a group must maintain a membership of at least two members all the time in order to remain active.

In OSGrid I think we do not need a fee, but a policy for active groups similar to the one in Second Life. This means that empty groups and groups with fewer than 2 members would get deleted at the latest 3 days after creation. This is the case for most groups that currently exist in OSGrid.

Additionally it might be useful to check when all group members have logged in the last time. If no group member has logged in during the last 90 days, it can be assumed, that all group members are not active OSGrid residents anymore.

A database script could check all existing groups regularly and delete groups that do not fulfill the conditions mentioned above. Of course, OpenSim needs to behave properly if it encounters a group that was deleted, and if such a group is used by land or objects shared with or deeded to that group.
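The decision such a script would have to make for each group can be sketched as follows. This is only an illustration of the proposed policy in Python, not an existing tool; the function should_delete_group and its parameters are hypothetical, and a real script would read the creation date and the last login times from the groups and user databases.

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(days=3)       # time a new group has to find members
INACTIVITY_LIMIT = timedelta(days=90)  # no member login for this long
MIN_MEMBERS = 2


def should_delete_group(created, member_last_logins, now):
    """Return True if the group violates the proposed policy.

    `created` is the creation time of the group, `member_last_logins` is a
    list with the last login time of every member, `now` is the current time.
    """
    if now - created < GRACE_PERIOD:
        return False  # still within the 3 days after creation
    if len(member_last_logins) < MIN_MEMBERS:
        return True   # empty group or group with a single member
    # Delete only if no member at all has logged in during the last 90 days.
    return all(now - last > INACTIVITY_LIMIT for last in member_last_logins)


now = datetime(2009, 7, 1)
print(should_delete_group(datetime(2009, 6, 30), [], now))
print(should_delete_group(datetime(2009, 5, 1), [datetime(2009, 6, 28)], now))
print(should_delete_group(datetime(2009, 1, 1),
                          [datetime(2009, 6, 28), datetime(2009, 2, 1)], now))
```

Only groups for which this check returns True would then go through the cleanup of shared and deeded land and objects.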

In the case of shared objects and land, the group set on them should simply be removed, making them unshared objects or land again. Deeded objects and land are trickier, because in that case the group also needs to be removed and the owner has to be reset to the owner the object or land had before it was deeded. All this should only happen if the group really was deleted. A disabled group module (maybe by accident) should not reset all group memberships.

Well, the last part was a bit technical, but it shows that implementing such a policy is something that needs to be well thought through.

Finally a little bit of history: As we all know, teleporting between regions also uses resources, especially processing power and bandwidth to transfer data. That is why, during the Second Life “stone age”, they tried to introduce a fee for teleporting between regions. But understandably it did not take long until users demanded that this teleporting fee be abolished.

I am sure that we will find a solution to prevent an excessive use of resources with policies and tools, without having to use money as an incentive.

Group Permissions for Land and Objects

A while ago, mcortez and his flotsam project provided the source code to add group functionality to OpenSim. After a short interruption, caused by security issues that needed to be fixed, it is now part of all newer OpenSim versions and available in most OSGrid regions.

The Groups Module allows people to define and join groups. So today we can all use fancy group tags like in Second Life, which are defined as part of the group roles. In addition it is possible to send and receive group instant messages and group notices. Voice conferences can be used if all attendees are on voice enabled regions using the same voice server.

What was still missing were group permissions for land and objects. These allow land and object owners to define the rights other people have to change the contents of their parcels of land or to change the objects themselves.

This functionality is essential for collaborative building projects, where you want to allow just a specified group of builders to rez or change objects. Until now, you had to allow all people to build on your land and to modify common objects (by selecting “allow anyone to move”). This was quite risky, because griefers could rez objects or even destroy what you have built together.

With today’s Opensim trunk version 9817 group permissions for land and objects have been added to Opensim.

To get group permissions working, I had to add group permission checks to the OpenSim Permissions Module. This new functionality is very similar to that in Second Life, but currently there are still limitations, because some related functionality has not been implemented in OpenSim yet.

For example it is not possible to define object inventory items as being shared. And currently it is not possible to allow group members to edit certain settings of land they do not own in the About Land window. Thus currently it is not wise to deed land to a group, but land shared by a group, with the corresponding rights set in About Land, is a very useful new feature.

Where to get Support for Opensim

Support for Opensim

These web pages describe where you can get support for Opensim:

The best sources for help are the Q&A Sessions each Saturday in OSGrid, the IRC channels #osgrid and #opensim, and the mailing list opensim-users.

Many of the people you will meet in OSGrid, on the IRC channels or via the mailing lists will surely be glad to help you get your own OpenSim regions running in OSGrid.

Finally you can find the latest OSGrid announcements on Twitter at http://twitter.com/osgrid, including planned down times, required Opensim updates and other urgent news for OSGrid users.

Samsara

Contact

I hope that this blog helps many people! Comments on how to improve these articles and requests for additional topics to cover in the future are welcome. Additional articles will be added over time.

Snoopy Pfeffer
OSGrid and Second Life resident
snoopy.pfeffer@yahoo.com

p.s. I also offer professional services for Opensim and 3D Metaverse development for corporations. This includes strategic consulting, business development, marketing in 3D worlds, edutainment solutions, content creation and IT advisory services.

Viewer for OSGrid and Opensim Worlds

I personally prefer the Hippo 0.5.1 viewer for OSGrid, because it does not have some restrictions the standard Second Life viewer has and it allows you to easily connect to different Opensim and Second Life grids. You can find more details on the Hippo web site (http://opensim-viewer.sourceforge.net/).

If you are interested in using the standard Second Life viewer for OSGrid, you can find information about this topic on the OSGrid web site (http://osgrid.org/) under “Instructions”.

Opensim Service Management

Foreword

The following sections describe how I do Opensim Service Management for my Opensim regions running on my servers. I describe the directory structure I use, the updating process for Opensim and how I do service management using the Linux tool Monit (http://mmonit.com/monit/), as well as how I do regular backups.

Directory Structure

On my servers I have created a special user called “opensim”. Under this user’s home directory I have installed all Opensim files in a directory also called “opensim”. This folder contains a directory for each Opensim version installed.

The directory names use the following naming convention: “opensim_xxxx”, where “xxxx” is the Opensim SVN version number. I use a similar naming convention for the directory containing the search module: “ossearch_xxx” and similar naming conventions for other Opensim extensions I use. I compile the software in these directories and they also contain the standard configuration files.

The subdirectory “ServiceManagement” contains all scripts and pid (process id) files that I use for Opensim service management.

For each Opensim region that uses its own Opensim server process, I have a separate subfolder under a subdirectory called “run”. The “run” subdirectory contains the Opensim version that is currently used. “run_old” is the previous version that I always keep to be able to do quick rollbacks. The directory “run_new” contains new Opensim versions, while I am still configuring them.

The run subdirectories mostly contain symbolic links to files in the “opensim_xxxx/bin” directory. Only the OpenSim.ini and region files are real copies. The OpenSim.ini files are adjusted manually based on the default file copied over from the “opensim_xxxx/bin” folder. There are independent Regions and ScriptEngines folders for each Opensim server instance, because they store region specific files.
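
The layout of a single region directory can be sketched in Python as follows. This is only an illustration of the idea described above, not one of the scripts from this article: the function name build_region_dir and its parameters are assumptions made for this example, and it creates absolute symbolic links for simplicity.

```python
import os


def build_region_dir(bin_dir, region_dir,
                     real_files=("OpenSim.ini",),
                     private_dirs=("Regions", "ScriptEngines")):
    """Create region_dir with symlinks into bin_dir.

    Items listed in real_files are NOT linked, because each region gets its
    own real copy. Items listed in private_dirs are created as real, empty
    directories, because they hold region specific files.
    """
    os.makedirs(region_dir, exist_ok=True)
    for name in private_dirs:
        os.makedirs(os.path.join(region_dir, name), exist_ok=True)

    skip = set(real_files) | set(private_dirs)
    for name in sorted(os.listdir(bin_dir)):
        if name in skip:
            continue
        link = os.path.join(region_dir, name)
        if not os.path.lexists(link):
            # absolute link target (assumption; relative links work as well)
            os.symlink(os.path.join(os.path.abspath(bin_dir), name), link)
    return region_dir
```

A call such as build_region_dir("opensim_xxxx/bin", "run_new/M1") would leave only OpenSim.ini and the region files to be copied in afterwards.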

Besides the folders mentioned above, I have a “tmp” directory for downloading new software versions, before I rename the directory using the conventions mentioned above and move it to the main “opensim” directory. Additionally I have a “doc” subdirectory for documentation and one called “backup” for backups of OpenSim.ini and region files, as well as for OAR backups.

.opensim
|-doc
|-backup
|-opensim_xxxx
|---bin
|-----Regions
|-ossearch_xxx
|---trunk
|-run
|---H1
|-----Regions
|-----ScriptEngines
|---H2
|-----Regions
|-----ScriptEngines
|---M3
|-----Regions
|-----ScriptEngines
|---M4
|-----Regions
|-----ScriptEngines
|---M5
|-----Regions
|-----ScriptEngines
|-run_old
|---H1
|-----Regions
|-----ScriptEngines
|---H2
|-----Regions
|-----ScriptEngines
|---M3
|-----Regions
|-----ScriptEngines
|---M4
|-----Regions
|-----ScriptEngines
|---M5
|-----Regions
|-----ScriptEngines
|-ServiceManagement
|-tmp

Internal Region Names

I use the following naming convention for directories, MySQL databases for the regions and process id files (pid files) on my Opensim servers. These names are independent of the real Opensim region names that you see within OSGrid:

  • first a letter: H = high traffic, M = medium traffic, L = low traffic
  • then the number of the region server process on that server, starting with 1

Examples: “H1” is the 1st Opensim process on that server, running a high traffic region. “M3” is the 3rd Opensim process on that server, running a medium traffic region.

These naming conventions are just my personal preference. Of course you can use any naming you like.
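
If you want to process such internal names in your own tools, the convention can be expressed as a small parser. This is a minimal sketch; the function name parse_internal_name and the returned tuple layout are assumptions for this example.

```python
import re

# first letter of the internal name -> traffic class
TRAFFIC_CLASSES = {"H": "high", "M": "medium", "L": "low"}

# one letter (H, M or L) followed by a process number starting with 1
_NAME = re.compile(r"^([HML])([1-9][0-9]*)$")


def parse_internal_name(name):
    """Split an internal region name like 'M3' into ('medium', 3)."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError("not an internal region name: %r" % name)
    return TRAFFIC_CLASSES[match.group(1)], int(match.group(2))
```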

How to Update Opensim – Process and Scripts

I have developed a set of scripts and a process to be able to update many Opensim regions on a server efficiently, while being able to do quick rollbacks, if necessary. To update my Opensim installation I do the following steps:

1. I go to the “tmp” subdirectory and download the latest version of Opensim:

$ svn co http://opensimulator.org/svn/opensim/trunk opensim

Or I download a specific Opensim version, for example the latest recommended version:

$ svn co -r <version> http://opensimulator.org/svn/opensim/trunk opensim

2. I rename the resulting Opensim folder to “opensim_xxxx” and move it up:

$ mv opensim opensim_<opensim version>
$ mv opensim_<opensim version> ..

3. I download the latest Opensim Search (ossearch) version, rename the directory and move it up:

$ svn checkout http://forge.opensimulator.org/svn/ossearch
$ mv ossearch ossearch_<ossearch version>
$ mv ossearch_<ossearch version> ..

4. I go to the directory of the new Opensim version and clean it up:

$ cd ../opensim_<opensim version>
$ ./runprebuild.sh
$ nant clean

If you see an error message, start “nant clean” again.

5. I install the latest Opensim search module:

$ cp -r ../ossearch_<ossearch version>/trunk/* .

6. I compile Opensim:

$ ./runprebuild.sh
$ nant

If you see an error message, start “nant” again.

7. To configure Opensim you can use the OpenSim.ini.example file in the “bin” subdirectory to create a new, customized OpenSim.ini file from scratch. But usually you will prefer to use your previous version of the OpenSim.ini file to make the required changes for the new Opensim version.

I do such updates by comparing the OpenSim.ini.example files of the old and new version. Then I edit a copy of the old OpenSim.ini file to make the required changes for the new Opensim version. At the end I have a new, updated OpenSim.ini file in the “bin” subfolder.

$ cd bin
$ cp ../../opensim_<opensim old version>/bin/OpenSim.ini .
$ diff ../../opensim_<opensim old version>/bin/OpenSim.ini.example OpenSim.ini.example
$ vi OpenSim.ini
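
The comparison of the two example files can also be sketched with the difflib module of the Python standard library. The function name changed_settings is an assumption for this example, and the setting names in the usage note are purely illustrative.

```python
import difflib


def changed_settings(old_text, new_text):
    """Return only the added (+) and removed (-) lines between two
    versions of OpenSim.ini.example, given as strings."""
    diff = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile="old/OpenSim.ini.example",
        tofile="new/OpenSim.ini.example",
        lineterm="")
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]
```

Called with the contents of the old and the new example file, it lists only the lines that need attention when editing the copied OpenSim.ini, for example a newly introduced line such as "+new_setting = 1".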

In the generic OpenSim.ini file in the bin subdirectory I use the following symbols. These symbols will later be replaced with the correct values for each region. This simplifies managing many Opensim regions, because you only need to update one master OpenSim.ini file and the individual OpenSim.ini files are automatically created by a script that I will describe later.

  • REGION_NAME
  • HTTP_PORT
  • DATABASE_NAME
  • DATABASE_PASSWORD
  • SERVER_IP
  • VOICE_IP
  • AV_CAPSULE (only required for 64 bit servers)
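
The symbol replacement itself is a plain text substitution. A minimal Python sketch of it could look like this; the function name render_ini is an assumption for this example, and the scripts shown later in this article do the same with sed.

```python
# the symbols used in the generic (master) OpenSim.ini file
SYMBOLS = ("REGION_NAME", "HTTP_PORT", "DATABASE_NAME", "DATABASE_PASSWORD",
           "SERVER_IP", "VOICE_IP", "AV_CAPSULE")


def render_ini(master_text, values):
    """Replace every symbol in master_text with the value for one region.

    values maps symbol names to their replacement. A KeyError is raised if
    the master file uses a symbol for which no value was given, so that no
    unreplaced symbol ends up in a region's OpenSim.ini.
    """
    missing = [s for s in SYMBOLS if s in master_text and s not in values]
    if missing:
        raise KeyError("no value given for: " + ", ".join(missing))
    for symbol in SYMBOLS:
        if symbol in values:
            master_text = master_text.replace(symbol, str(values[symbol]))
    return master_text
```

For example, render_ini(text, {"REGION_NAME": "M1", "HTTP_PORT": 9010}) turns a line containing “/tmp/REGION_NAME.pid” into one containing “/tmp/M1.pid”.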

8. Finally I check if some important files have been created properly and then I go back to the main Opensim directory:

$ ls *.ini libode* *Sea*
$ cd ../..

9. Now, I automatically create a “run_new” directory for the new Opensim version. This new directory is based on the given Opensim version and the region files of the regions in the current “run” directory. The very first time you have to set up a “run” directory manually.

$ updateos opensim_<opensim version (without slash at the end!)>

This script creates the specific OpenSim.ini versions for each region automatically.

10. Finally I stop all running Opensim processes (see Service Management section), clean pid and Mono files (see “rmpidsos” and “clearos” scripts in next section) and switch to the new Opensim version using the following commands:

$ rm -fr run_old
$ mv run run_old
$ mv run_new run

11. After that I restart the Opensim processes (see Monit in next section) or I reboot the whole server after doing additional Linux software updates.

12. Finally I log in and check if all my regions work well. For that I check if my regions rez properly. Then I test various scripted objects on my land and I test sim border crossings and teleports between my regions and to/from OSGrid plazas.

I have found that it is sometimes necessary to reset certain scripts to get them working again. Usually the same scripts are affected after each Opensim update.

If there are serious problems with an Opensim version, I do a rollback by simply stopping all Opensim processes, renaming the directories “run” to “run_broken” and “run_old” to “run”, and then I restart all Opensim processes.
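
The directory switch and the rollback are simple renames. The following Python sketch shows both operations under the assumption that all Opensim processes have already been stopped; the function names switch_to_new and rollback are made up for this example.

```python
import os
import shutil


def switch_to_new(base):
    """Rotate the directories: run -> run_old, run_new -> run."""
    run, old, new = (os.path.join(base, d)
                     for d in ("run", "run_old", "run_new"))
    if not os.path.isdir(new):
        raise RuntimeError("run_new does not exist")
    if os.path.isdir(old):
        shutil.rmtree(old)          # same as: rm -fr run_old
    if os.path.isdir(run):
        os.rename(run, old)         # same as: mv run run_old
    os.rename(new, run)             # same as: mv run_new run


def rollback(base):
    """Undo a switch: run -> run_broken, run_old -> run."""
    run, old, broken = (os.path.join(base, d)
                        for d in ("run", "run_old", "run_broken"))
    if not os.path.isdir(old):
        raise RuntimeError("run_old does not exist, cannot roll back")
    if os.path.lexists(broken):
        # do not silently delete an earlier broken version (assumption)
        raise RuntimeError("run_broken already exists")
    os.rename(run, broken)
    os.rename(old, run)
```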

If I need to change the OpenSim.ini file of the current Opensim version, I do these changes in the master OpenSim.ini file and run the following script, that updates all configuration files in the run subdirectory.

$ refreshos opensim_<opensim version (without slash at the end!)>

Service Management Scripts

The process described above uses some scripts that I have stored in the user’s ~/bin directory. You might like to use similar scripts.

The following two scripts replace the symbols used in the generic OpenSim.ini file (REGION_NAME, HTTP_PORT, DATABASE_NAME, DATABASE_PASSWORD, SERVER_IP, VOICE_IP and AV_CAPSULE). You need to adjust the following scripts to set the proper values for each region.

#!/bin/sh
# updateos
echo Updating OpenSim…
cd /home/opensim/opensim/
mkdir run_new
cd run_new
mkdir M1 M2 M3 M4 M5 M6
mkdir M1/Regions M2/Regions M3/Regions M4/Regions M5/Regions M6/Regions
mkdir M1/ScriptEngines M2/ScriptEngines M3/ScriptEngines M4/ScriptEngines M5/ScriptEngines M6/ScriptEngines
cp ../run/M1/Regions/* M1/Regions
cp ../run/M2/Regions/* M2/Regions
cp ../run/M3/Regions/* M3/Regions
cp ../run/M4/Regions/* M4/Regions
cp ../run/M5/Regions/* M5/Regions
cp ../run/M6/Regions/* M6/Regions
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M1/g' -e 's/HTTP_PORT/9010/g' -e 's/DATABASE_NAME/M1/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M1/OpenSim.ini
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M2/g' -e 's/HTTP_PORT/9011/g' -e 's/DATABASE_NAME/M2/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M2/OpenSim.ini
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M3/g' -e 's/HTTP_PORT/9012/g' -e 's/DATABASE_NAME/M3/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M3/OpenSim.ini
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M4/g' -e 's/HTTP_PORT/9013/g' -e 's/DATABASE_NAME/M4/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M4/OpenSim.ini
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M5/g' -e 's/HTTP_PORT/9014/g' -e 's/DATABASE_NAME/M5/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M5/OpenSim.ini
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M6/g' -e 's/HTTP_PORT/9015/g' -e 's/DATABASE_NAME/M6/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M6/OpenSim.ini
cd M1
ln -s ../../$1/bin/* .
ln -s ../../$1/bin/.* .
cd ../M2
ln -s ../../$1/bin/* .
ln -s ../../$1/bin/.* .
cd ../M3
ln -s ../../$1/bin/* .
ln -s ../../$1/bin/.* .
cd ../M4
ln -s ../../$1/bin/* .
ln -s ../../$1/bin/.* .
cd ../M5
ln -s ../../$1/bin/* .
ln -s ../../$1/bin/.* .
cd ../M6
ln -s ../../$1/bin/* .
ln -s ../../$1/bin/.* .
cd ../..

#!/bin/sh
# refreshos
echo Refreshing OpenSim INI Files…
cd /home/opensim/opensim/run/
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M1/g' -e 's/HTTP_PORT/9010/g' -e 's/DATABASE_NAME/M1/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M1/OpenSim.ini
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M2/g' -e 's/HTTP_PORT/9011/g' -e 's/DATABASE_NAME/M2/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M2/OpenSim.ini
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M3/g' -e 's/HTTP_PORT/9012/g' -e 's/DATABASE_NAME/M3/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M3/OpenSim.ini
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M4/g' -e 's/HTTP_PORT/9013/g' -e 's/DATABASE_NAME/M4/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M4/OpenSim.ini
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M5/g' -e 's/HTTP_PORT/9014/g' -e 's/DATABASE_NAME/M5/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M5/OpenSim.ini
cat ../$1/bin/OpenSim.ini | sed -e 's/REGION_NAME/M6/g' -e 's/HTTP_PORT/9015/g' -e 's/DATABASE_NAME/M6/g' -e 's/DATABASE_PASSWORD/password/g' -e 's/SERVER_IP/71.6.217.139/g' -e 's/VOICE_IP/66.240.232.99/g' -e 's/AV_CAPSULE/1700000/g' > M6/OpenSim.ini
cd ..

Service Management with Monit

For continuous service monitoring I use Monit (http://mmonit.com/monit/). For many Linux versions Monit is available as a software package that can be installed from a repository using the package manager of your Linux distribution.

After installing Monit, configure it as root user in the file /etc/monit/monitrc. Then check your changes by executing “monit -t”. If the new file is OK, restart Monit by executing “/etc/init.d/monit stop” and “/etc/init.d/monit start” as root user.

Monit needs pid (process id) files that store the Linux process numbers of the processes it has to supervise. In OpenSim.ini you can define the location where Opensim stores a pid file. I use the following setting, that creates different pid files for each region in the /tmp directory:

PIDFile = "/tmp/REGION_NAME.pid"
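
To illustrate what a pid file is good for, here is a small Python sketch that reads such a file and checks whether the process still exists. It is not how Monit works internally; the function names read_pid and is_running are assumptions for this example, and the check only works on Linux and other Unix systems.

```python
import os


def read_pid(pid_file):
    """Return the process id stored in pid_file, or None if the file is
    missing or does not contain a number."""
    try:
        with open(pid_file) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def is_running(pid):
    """Return True if a process with this pid exists."""
    if pid is None:
        return False
    try:
        os.kill(pid, 0)             # signal 0 only tests, it kills nothing
    except ProcessLookupError:
        return False                # no such process
    except PermissionError:
        return True                 # exists, but belongs to another user
    return True
```

For example, is_running(read_pid("/tmp/M1.pid")) tells you whether the region process M1 is still alive.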

The Monit user interface is web based. It can be accessed using the following URL: http://<server name>:2812/ Of course port 2812 needs to be reachable from the Internet if you intend to provide the Opensim service management to external users.

Monit is password protected and can also use SSL. You should use SSL especially if you intend to manage servers over the Internet. To set up SSL, execute the following commands as root user:

$ apt-get install ssl-cert
$ mkdir /etc/apache2/ssl
$ /usr/sbin/make-ssl-cert /usr/share/ssl-cert/ssleay.cnf /etc/apache2/ssl/apache.pem

Then do the following changes in /etc/monit/monitrc and restart Monit:

ssl enable
pemfile /etc/apache2/ssl/apache.pem

After this Monit can only be accessed using SSL: https://<server name>:2812/

All Opensim and Freeswitch processes run within a Screen environment (http://linux.die.net/man/1/screen). Screen allows Opensim processes to run like a server, while the user is still able to connect a terminal to the process to read outputs and to execute commands. “screen -ls” lists all sessions the user can connect to. “screen -r <session>” connects the current terminal to the given session. The session can be left without terminating the process by pressing ctrl-a ctrl-d. ctrl-c kills the process and should be avoided. Use the Opensim command “shutdown” to shut down an Opensim process instead.

The following scripts are required by Monit and are stored in the “ServiceManagement” directory. You need to adjust the directory paths for your installation.

startbos – Script to start Opensim in the terminal window for testing (without Monit and Screen). The region name must be provided as parameter.

#!/bin/sh
cd /home/opensim/opensim/run/$1/
mono ./OpenSim.32BitLaunch.exe -gridmode=true -smtag=$1

startqos – Script used by Monit to start Opensim (creates pid file). The region name must be provided as parameter.

#!/bin/sh
export PATH="/home/opensim/bin/mono/bin:$PATH"
export PKG_CONFIG_PATH="/home/opensim/bin/mono/lib/pkgconfig:$PKG_CONFIG_PATH"
export MANPATH="/home/opensim/bin/mono/share/man:$MANPATH"
export MONO_THREADS_PER_CPU=80
cd /home/opensim/opensim/run/$1/
screen -S $1 -d -m mono ./OpenSim.32BitLaunch.exe -gridmode=true -smtag=$1 &

Comment: Because Monit does not start the process using a bash shell, it is necessary to specify the Mono settings.

stopos – Script that is used by Monit to stop an Opensim process. The region name and the http port number must be provided as parameters. This script uses the “stopsoftos” script shown afterwards.

#!/bin/sh
echo $1: stopping process
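# nothing to do if there is no pid file for this region
[ -e /tmp/$1.pid ] || exit 0
# remember the pid of the process that is to be stopped
OPID=`cat /tmp/$1.pid`
/home/opensim/opensim/ServiceManagement/stopsoftos $2 &
sleep 90
# kill hard only if the same process is still registered in the pid file
PID=`cat /tmp/$1.pid 2>/dev/null`
if [ "$PID" = "$OPID" ]; then
  kill -KILL $PID
  rm /tmp/$1.pid
fi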

Comment: The hard process kill is skipped if the process has already been restarted since the script was invoked.
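
The decision made at the end of “stopos” can be written down in Python as follows. This is only a sketch of the logic for Linux systems; the function name force_kill_if_unchanged and the replaceable kill parameter (useful for testing without real processes) are assumptions for this example.

```python
import os
import signal


def force_kill_if_unchanged(pid_file, original_pid, kill=os.kill):
    """Kill the process hard, but only if pid_file still names the process
    that was running when the stop was requested.

    Returns True if a kill signal was sent, otherwise False.
    """
    try:
        with open(pid_file) as handle:
            current_pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False                # pid file gone or unreadable
    if current_pid != original_pid:
        return False                # process was restarted in the meantime
    kill(current_pid, signal.SIGKILL)
    os.remove(pid_file)
    return True
```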

stopsoftos – Script that is used by the “stopos” script to shutdown Opensim processes softly. The http port number must be provided as parameter. This script uses the “broadcastos” and “shutdownos” Python scripts to send warning messages to users and to shut down Opensim.

#!/bin/sh
/home/opensim/opensim/ServiceManagement/broadcastos -s http://localhost:$1 -p <password> -m "This region will restart in 1 minute! Please leave now!" &
sleep 30
/home/opensim/opensim/ServiceManagement/broadcastos -s http://localhost:$1 -p <password> -m "This region will restart in 30 seconds! Please leave now!" &
sleep 30
/home/opensim/opensim/ServiceManagement/shutdownos -s http://localhost:$1 -p <password> &
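
The same warn, wait and shut down sequence can be sketched as a single Python 3 function. It uses the admin_broadcast and admin_shutdown XML-RPC methods that the Python scripts in this article call. The function names connect and soft_stop, the replaceable sleep parameter and the acceptance of a boolean result in addition to the string 'true' are assumptions for this example.

```python
import time
import xmlrpc.client

# (message, seconds to wait afterwards), as in the stopsoftos script
WARNINGS = (
    ("This region will restart in 1 minute! Please leave now!", 30),
    ("This region will restart in 30 seconds! Please leave now!", 30),
)


def connect(port, host="localhost"):
    """Create an XML-RPC proxy for the region server on the given port."""
    return xmlrpc.client.ServerProxy("http://%s:%d" % (host, port))


def soft_stop(server, password, warnings=WARNINGS, sleep=time.sleep):
    """Warn the users, wait, then ask the region server to shut down.

    server is any object offering admin_broadcast and admin_shutdown,
    for example the proxy returned by connect().
    """
    for message, delay in warnings:
        server.admin_broadcast({"password": password, "message": message})
        sleep(delay)
    result = server.admin_shutdown({"password": password})
    return result.get("success") in (True, "true")
```

A call like soft_stop(connect(9010), "<password>") would then replace the three background commands of the shell script.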

broadcastos – Python script that sends messages to Opensim users. The server URI (including the http port), the password and the message must be provided as options.

#!/usr/bin/python
# -*- encoding: utf-8 -*-
# broadcastos: send a message to all users of a region server via XML-RPC
import optparse
try:
    import xmlrpclib                      # Python 2
except ImportError:
    import xmlrpc.client as xmlrpclib     # Python 3

if __name__ == '__main__':
    parser = optparse.OptionParser()
    parser.add_option('-s', '--server', dest='server', help='URI of the region server', metavar='SERVER')
    parser.add_option('-p', '--password', dest='password', help='password of the region server', metavar='PASSWD')
    parser.add_option('-m', '--message', dest='message', help='message to broadcast', metavar='MSG')
    (options, args) = parser.parse_args()
    server = options.server
    password = options.password
    message = options.message
    # call the admin_broadcast method of the region server
    gridServer = xmlrpclib.Server(server)
    res = gridServer.admin_broadcast({'password': password, 'message': message})
    if res['success'] == 'true':
        print('message was sent to %s' % server)
    else:
        print('sending message to %s failed' % server)

shutdownos – Python script that shuts down Opensim server processes. The server URI (including the http port) and the password must be provided as options.

#!/usr/bin/python
# -*- encoding: utf-8 -*-
# shutdownos: ask a region server to shut down via XML-RPC
import optparse
try:
    import xmlrpclib                      # Python 2
except ImportError:
    import xmlrpc.client as xmlrpclib     # Python 3

if __name__ == '__main__':
    parser = optparse.OptionParser()
    parser.add_option('-s', '--server', dest='server', help='URI of the region server', metavar='SERVER')
    parser.add_option('-p', '--password', dest='password', help='password of the region server', metavar='PASSWD')
    (options, args) = parser.parse_args()
    server = options.server
    password = options.password
    # call the admin_shutdown method of the region server
    gridServer = xmlrpclib.Server(server)
    res = gridServer.admin_shutdown({'password': password})
    if res['success'] == 'true':
        print('shutdown of %s initiated' % server)
    else:
        print('shutdown of %s failed' % server)

rmpidsos – Script to clean up pid files after shutting down all Opensim server processes.

#!/bin/sh
rm -f /tmp/*.pid

clearos – Script to clean the ~/.wapi/ directory and the ScriptEngines caches of the current Opensim installation. This fixes problems with Mono and cached scripts. It is a good practice to execute this command after each update.

#!/bin/sh
rm -r /home/opensim/.wapi/
rm -r /home/opensim/opensim/run/*/ScriptEngines/*

Monit Configuration File

Finally here is an example of my /etc/monit/monitrc file that I use for monitoring Opensim and Freeswitch processes.

If you change /etc/monit/monitrc, always run “monit -t” afterwards to check the file for errors. If the file is correct, restart Monit by executing “/etc/init.d/monit restart”.

The memory limits depend on the kind of region (high, medium or low traffic). Besides processor utilization and memory consumption, each Opensim process is checked regularly by sending requests to the http port of that Opensim server process. Only in rare cases can crashes not be detected this way.
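
The http check can also be tried out by hand. The following Python 3 sketch requests the /SStats/ page that the Monit rules in the configuration file query and looks for the same “<!DOCTYPE html” marker. The function name sstats_ok and its default values are assumptions for this example, and Monit itself performs the check differently (with a raw send/expect rule).

```python
import http.client


def sstats_ok(port, host="localhost", timeout=5.0,
              marker=b"<!DOCTYPE html"):
    """Return True if GET /SStats/ on the region's http port answers with
    a page that contains the expected marker."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/SStats/")
        body = conn.getresponse().read(4096)   # the beginning is enough
    except (OSError, http.client.HTTPException):
        return False            # refused, timed out or invalid response
    finally:
        conn.close()
    return marker in body
```

For example, sstats_ok(9010) should return True while the region on port 9010 responds normally.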

If all limits are optimized for each sim, the regions should run very smoothly and restart only about every 3 or 4 days automatically, most often because the memory limit has been reached. This way Opensim service monitoring is done mostly automatically.

As you can see, Opensim runs on my servers under a special user “opensim”. This is good practice to reduce security risks.

###############################################################################
## Monit control file
###############################################################################
##
## Comments begin with a '#' and extend through the end of the line. Keywords
## are case insensitive. All path's MUST BE FULLY QUALIFIED, starting with '/'.
##
## Bellow is the example of some frequently used statements. For information
## about the control file, a complete list of statements and options please
## have a look in the monit manual.
##
##
###############################################################################
## Global section
###############################################################################
##
## Start monit in background (run as daemon) and check the services at 1-minute
## intervals.
#
set daemon  60
#
#
## Set syslog logging with the 'daemon' facility. If the FACILITY option is
## omitted, monit will use 'user' facility by default. You can specify the
## path to the file for monit native logging.
#
# set logfile syslog facility log_daemon
#
#
## Set list of mailservers for alert delivery. Multiple servers may be
## specified using comma separator. By default monit uses port 25 – it is
## possible to override it with the PORT option.
#
# set mailserver mail.bar.baz,               # primary mailserver
#                backup.bar.baz port 10025,  # backup mailserver on port 10025
#                localhost                   # fallback relay
#
#
## By default monit will drop the event alert, in the case that there is no
## mailserver available. In the case that you want to keep the events for
## later delivery retry, you can use the EVENTQUEUE statement. The base
## directory where undelivered events will be stored is specified by the
## BASEDIR option. You can limit the maximal queue size using the SLOTS
## option (if omited then the queue is limited just by the backend filesystem).
#
# set eventqueue
#     basedir /var/monit  # set the base directory where events will be stored
#     slots 100           # optionaly limit the queue size
#
#
## Monit by default uses the following alert mail format:
##
## --8<--
## From: monit@$HOST                         # sender
## Subject: monit alert --  $EVENT $SERVICE  # subject
##
## $EVENT Service $SERVICE                   #
##                                           #
##      Date:        $DATE                   #
##      Action:      $ACTION                 #
##      Host:        $HOST                   # body
##      Description: $DESCRIPTION            #
##                                           #
## Your faithful employee,                   #
## monit                                     #
## --8<--
##
## You can override the alert message format or its parts such as subject
## or sender using the MAIL-FORMAT statement. Macros such as $DATE, etc.
## are expanded on runtime. For example to override the sender:
#
# set mail-format { from: monit@foo.bar }
#
#
## You can set the alert recipients here, which will receive the alert for
## each service. The event alerts may be restricted using the list.
#
# set alert sysadm@foo.bar                       # receive all alerts
# set alert manager@foo.bar only on { timeout }  # receive just service-
#                                                # timeout alert
#
#
## Monit has an embedded webserver, which can be used to view the
## configuration, actual services parameters or manage the services using the
## web interface.
#
set httpd port 2812
ssl enable
pemfile /etc/apache2/ssl/apache.pem
allow admin:password      # require user 'admin' with password 'password'
#
#
###############################################################################
## Services
###############################################################################
##
## Check the general system resources such as load average, cpu and memory
## usage. Each rule specifies the tested resource, the limit and the action
## which will be performed in the case that the test failed.
#
check system ubuntu823294.aspadmin.net
if loadavg (1min) > 4 then alert
if loadavg (5min) > 2 then alert
if memory usage > 75% then alert
if cpu usage (user) > 70% then alert
if cpu usage (system) > 30% then alert
if cpu usage (wait) > 20% then alert
#
#
## Check a file for existence, checksum, permissions, uid and gid. In addition
## to the recipients in the global section, customized alert will be send to
## the additional recipient. The service may be grouped using the GROUP option.
#
#  check file apache_bin with path /usr/local/apache/bin/httpd
#    if failed checksum and
#       expect the sum 8f7f419955cefa0b33a2ba316cba3659 then unmonitor
#    if failed permission 755 then unmonitor
#    if failed uid root then unmonitor
#    if failed gid root then unmonitor
#    alert security@foo.bar on {
#           checksum, permission, uid, gid, unmonitor
#        } with the mail-format { subject: Alarm! }
#    group server
#
#
## Check that a process is running, responding on the HTTP and HTTPS request,
## check its resource usage such as cpu and memory, number of childrens.
## In the case that the process is not running, monit will restart it by
## default. In the case that the service was restarted very often and the
## problem remains, it is possible to disable the monitoring using the
## TIMEOUT statement. The service depends on another service (apache_bin) which
## is defined in the monit control file as well.
#
# Monitor Apache 2 Service
#check process apache with pidfile /var/run/apache2.pid
#start program "/etc/init.d/apache2 start"
#stop program "/etc/init.d/apache2 stop"
#if cpu > 60% for 2 cycles then alert
#if cpu > 80% for 5 cycles then restart
#if totalmem > 200.0 MB for 5 cycles then restart
#if children > 250 then restart
#if loadavg(5min) greater than 10 for 8 cycles then stop
#if failed host metaverse.getmyip.com port 80 protocol http
# then restart
#if failed port 443 type tcpssl protocol http
# with timeout 15 seconds
#then restart
#if 3 restarts within 5 cycles then timeout
#group server
#
# Monitor MySQL Service
check process mysql with pidfile /var/run/mysqld/mysqld.pid
group database
start program "/etc/init.d/mysql start"
stop program "/etc/init.d/mysql stop"
if failed host 127.0.0.1 port 3306 then restart
if 5 restarts within 5 cycles then timeout
#
# Monitor ssh Service
check process sshd with pidfile /var/run/sshd.pid
start program "/etc/init.d/ssh start"
stop program "/etc/init.d/ssh stop"
if failed port 22 protocol ssh then restart
if 5 restarts within 5 cycles then timeout
#
# Freeswitch
check process freeswitch with pidfile "/usr/local/freeswitch/log/freeswitch.pid"
start program "/usr/bin/screen -S freeswitch -d -m /usr/local/freeswitch/bin/freeswitch -nf"
stop program "/usr/local/freeswitch/bin/freeswitch -stop"
if totalmem > 40.0 MB then alert
if totalmem > 50.0 MB for 3 cycles then restart
# Checks sip port on localhost, not always suitable
# if failed port 5060 type UDP then restart
# Checks mod_event_socket on localhost. Maybe more suitable
if failed port 8021 type TCP then restart
if 5 restarts within 5 cycles then timeout
#
# Monitor mono opensim Service for H1
check process opensim_H1 with pidfile /home/opensim/opensim/ServiceManagement/H1.pid
start program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/startqos H1"
stop program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/stopos H1 9010"
if totalmem > 900 Mb then alert
if totalmem > 1100 Mb then restart
if cpu usage > 20% then alert
if cpu usage > 24% for 3 cycles then restart
if failed host localhost port 9010 send "GET /SStats/ HTTP/1.0\r\nHost: localhost\r\n\r\n" expect "<!DOCTYPE html .*" within 5 cycles then restart
if 5 restarts within 5 cycles then timeout
#
# Monitor mono opensim Service for H2
check process opensim_H2 with pidfile /home/opensim/opensim/ServiceManagement/H2.pid
start program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/startqos H2"
stop program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/stopos H2 9011"
if totalmem > 800 Mb then alert
if totalmem > 1000 Mb then restart
if cpu usage > 20% then alert
if cpu usage > 24% for 3 cycles then restart
if failed host localhost port 9011 send "GET /SStats/ HTTP/1.0\r\nHost: localhost\r\n\r\n" expect "<!DOCTYPE html .*" within 5 cycles then restart
if 5 restarts within 5 cycles then timeout
#
# Monitor mono opensim Service for M3
check process opensim_M3 with pidfile /home/opensim/opensim/ServiceManagement/M3.pid
start program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/startqos M3"
stop program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/stopos M3 9012"
if totalmem > 700 Mb then alert
if totalmem > 900 Mb then restart
if cpu usage > 20% then alert
if cpu usage > 24% for 3 cycles then restart
if failed host localhost port 9012 send "GET /SStats/ HTTP/1.0\r\nHost: localhost\r\n\r\n" expect "<!DOCTYPE html .*" within 5 cycles then restart
if 5 restarts within 5 cycles then timeout
#
# Monitor mono opensim Service for M4
check process opensim_M4 with pidfile /home/opensim/opensim/ServiceManagement/M4.pid
start program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/startqos M4"
stop program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/stopos M4 9013"
if totalmem > 600 Mb then alert
if totalmem > 800 Mb then restart
if cpu usage > 20% then alert
if cpu usage > 24% for 3 cycles then restart
if failed host localhost port 9013 send "GET /SStats/ HTTP/1.0\r\nHost: localhost\r\n\r\n" expect "<!DOCTYPE html .*" within 5 cycles then restart
if 5 restarts within 5 cycles then timeout
#
# Monitor mono opensim Service for M5
check process opensim_M5 with pidfile /home/opensim/opensim/ServiceManagement/M5.pid
start program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/startqos M5"
stop program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/stopos M5 9014"
if totalmem > 900 Mb then alert
if totalmem > 1100 Mb then restart
if cpu usage > 20% then alert
if cpu usage > 24% for 3 cycles then restart
if failed host localhost port 9014 send "GET /SStats/ HTTP/1.0\r\nHost: localhost\r\n\r\n" expect "<!DOCTYPE html .*" within 5 cycles then restart
if 5 restarts within 5 cycles then timeout
#
# Monitor mono opensim Service for M6
# check process opensim_M6 with pidfile /home/opensim/opensim/ServiceManagement/M6.pid
# start program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/startqos M6"
# stop program = "/usr/bin/sudo -u opensim /home/opensim/opensim/ServiceManagement/stopos M6 9015"
# if totalmem > 600 Mb then alert
# if totalmem > 800 Mb then restart
# if cpu usage > 20% then alert
# if cpu usage > 24% for 3 cycles then restart
# if failed host localhost port 9015 send "GET /SStats/ HTTP/1.0\r\nHost: localhost\r\n\r\n" expect "<!DOCTYPE html .*" within 5 cycles then restart
# if 5 restarts within 5 cycles then timeout
#
## Check the device permissions, uid, gid, space and inode usage. Other
## services such as databases may depend on this resource and automatical
## graceful stop may be cascaded to them before the filesystem will become
## full and the data will be lost.
#
#  check device datafs with path /dev/sdb1
#    start program  = "/bin/mount /data"
#    stop program  = "/bin/umount /data"
#    if failed permission 660 then unmonitor
#    if failed uid root then unmonitor
#    if failed gid disk then unmonitor
#    if space usage > 80% for 5 times within 15 cycles then alert
#    if space usage > 99% then stop
#    if inode usage > 30000 then alert
#    if inode usage > 99% then stop
#    group server
#
#
## Check a file’s timestamp: when it becomes older then 15 minutes, the
## file is not updated and something is wrong. In the case that the size
## of the file exceeded given limit, perform the script.
#
#  check file database with path /data/mydatabase.db
#    if failed permission 700 then alert
#    if failed uid data then alert
#    if failed gid data then alert
#    if timestamp > 15 minutes then alert
#    if size > 100 MB then exec "/my/cleanup/script"
#
#
## Check the directory permission, uid and gid.  An event is triggered
## if the directory does not belong to the user with the  uid 0 and
## the gid 0.  In addition, the permissions have to match the octal
## description of 755 (see chmod(1)).
#
#  check directory bin with path /bin
#    if failed permission 755 then unmonitor
#    if failed uid 0 then unmonitor
#    if failed gid 0 then unmonitor
#
#
## Check the remote host network services availability and the response
## content.  One of three pings, a successful connection to a port and
## application level network check is performed.
#
#  check host myserver with address 192.168.1.1
#    if failed icmp type echo count 3 with timeout 3 seconds then alert
#    if failed port 3306 protocol mysql with timeout 15 seconds then alert
#    if failed url
#       http://user:password@www.foo.bar:8080/?querystring
#       and content == 'action="j_security_check"'
#       then alert
#
#
###############################################################################
## Includes
###############################################################################
##
## It is possible to include the configuration or its parts from other files or
## directories.
#
#  include /etc/monit.d/*
#
#
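
Before relying on the HTTP test in the per-region checks above, it can help to run the same request by hand. The following is only a minimal sketch: it assumes that curl is installed and that the region answers on port 9014 as in the M5 example, and the function names are made up for this illustration.

```shell
#!/bin/sh
# Sketch only: mimics the Monit test "send GET /SStats/ ... expect <!DOCTYPE html".
# The function names are hypothetical; port 9014 is the M5 port from the listing.

# Succeeds if the text on stdin starts with an HTML doctype.
looks_like_sstats() {
    head -n 1 | grep -q '^<!DOCTYPE html'
}

# Fetches /SStats/ from a local region and applies the same test as Monit.
check_region_port() {
    port="$1"
    curl -s --max-time 10 "http://localhost:${port}/SStats/" | looks_like_sstats
}

# Example call: check_region_port 9014 && echo "M5 answers"
```

After editing the control file, "monit -t" checks its syntax and "monit reload" makes a running Monit read it again.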

Practical Hints using Monit

To shut down a region, I usually first disable monitoring of that region in the Monit user interface so that Monit does not restart it. Then I open a console window and attach to that OpenSim server process with “screen -r <region name>”. I check whether anyone is in the region with “show users”. If there are users, I send warning messages with “alert general <message>” before finally shutting the region down with the “shutdown” command. That also closes the Screen session.

This gives me more control over the shutdown process and is faster when nobody is in the region. If Monit stops the region instead, it sends warning messages and delays the shutdown to give people time to leave. Of course you can also use the Monit “stop” and “restart” buttons, which is definitely more convenient.

To restart a region I simply click the “start” button in Monit. I often watch how OpenSim restarts by attaching to the corresponding Screen session in a terminal window. When I am done, I detach from the Screen session by pressing ctrl-a ctrl-d.
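
The manual shutdown and restart steps can also be scripted. The following is only a rough sketch under several assumptions: the Monit service is named opensim_<region> and the Screen session is named after the region, as in the M5 example; the function names and the 60 second grace period are made up. It skips the interactive “show users” check, so it always waits before shutting down.

```shell
#!/bin/sh
# Sketch only: scripted version of the manual steps. Assumes Monit services
# are named opensim_<region> and Screen sessions are named after the region.
# All function names are hypothetical.

# Maps a region name such as M5 to its Monit service name.
monit_service() {
    echo "opensim_$1"
}

# Types one console command into the region's Screen session.
send_console() {
    region="$1"; shift
    screen -S "$region" -p 0 -X stuff "$*$(printf '\r')"
}

# Disables monitoring, warns users, waits, then shuts the region down.
stop_region() {
    region="$1"
    warning="$2"
    monit unmonitor "$(monit_service "$region")"   # keep Monit from restarting it
    send_console "$region" "alert general $warning"
    sleep 60                                       # arbitrary grace period
    send_console "$region" "shutdown"
}

# Starts the region again through Monit.
start_region() {
    monit start "$(monit_service "$1")"
}
```

For example, stop_region M5 "Region restarts in one minute" followed later by start_region M5.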

If FreeSWITCH runs as the root user, you need to run Screen as root as well to be able to attach to its session.

Database Backups using AutoMySQLBackup

I use AutoMySQLBackup (http://www.debianhelp.co.uk/mysqlscript.htm) to keep daily database backups for the last 7 days. In the script that this tool uses, you need to specify the MySQL user name, the password, and the names of the databases to back up. I store the database backups in the directory ~/backups of my opensim user. Finally, add the script to your user’s crontab:

$ crontab -e
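
A crontab entry for a nightly run might look like the sketch below. The script path and the 03:30 schedule are assumptions for illustration only; use the location where you actually saved the AutoMySQLBackup script.

```shell
#!/bin/sh
# Sketch only: the script location and the schedule are assumptions.
BACKUP_SCRIPT="$HOME/bin/automysqlbackup.sh"

# Prints a crontab line that runs the backup script every day at 03:30.
backup_cron_line() {
    echo "30 3 * * * $BACKUP_SCRIPT"
}

# Appends the line to the current user's crontab without opening an editor.
install_backup_cron() {
    ( crontab -l 2>/dev/null; backup_cron_line ) | crontab -
}
```

Calling install_backup_cron twice adds the line twice, so check the result with "crontab -l" afterwards.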

If a region has serious problems and its database contents appear to be damaged, I restore the database contents of that region from a backup. Of course the corresponding region needs to be shut down while the backup is being restored.
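
A restore could be sketched as follows. This assumes that the backups are gzip-compressed SQL dumps and that each file name starts with the database name; the function names, database name, and user are hypothetical, so adapt them to what your backup directory really contains.

```shell
#!/bin/sh
# Sketch only: assumes gzip-compressed SQL dumps whose file names start with
# the database name. Function names and layout are hypothetical.

# Prints the most recently modified dump for a database in a directory.
latest_backup() {
    dir="$1"; db="$2"
    ls -1t "$dir"/"$db"*.sql.gz 2>/dev/null | head -n 1
}

# Loads a dump into the database; shut the region down first.
# mysql prompts for the password because of the bare -p option.
restore_backup() {
    dump="$1"; db="$2"; user="$3"
    gunzip -c "$dump" | mysql -u "$user" -p "$db"
}

# Example call (hypothetical names):
#   restore_backup "$(latest_backup "$HOME/backups" opensim_M5)" opensim_M5 opensim
```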

