Platform Open Cluster Stack (OCS) User Guide


Version 4.1.1-2.0

October 25 2006

Platform Computing



What is Platform OCS?

Building a Linux® cluster is a challenging and time-consuming task. There are many tools in the community and on the Internet for building, configuring, and managing Linux clusters. However, these tools typically assume a familiarity with Beowulf clusters and the concepts of Linux clusters.

Platform Open Cluster Stack (OCS) is a pre-integrated, vendor certified, software stack that enables the consistent delivery of scale-out application clusters. Platform OCS enables a new class of users by simplifying Linux® cluster application, deployment and management. Backed by global 24x7 enterprise support, Platform OCS is a modular and hybrid stack that transparently integrates open source and commercial software into a single consistent cluster operating environment.

This product includes software developed by the Rocks Cluster Group at the San Diego Supercomputer Center at the University of California, San Diego and its contributors. For more information, visit http://www.rocksclusters.org.

Platform OCS is fully supported by Platform Computing Corporation and requires a Red Hat® based operating system such as Red Hat® Enterprise Linux or CentOS Enterprise Linux.

Where to get Platform OCS?

Platform OCS 4.1.1 is released in two editions: Enterprise and Standard. Before installing, make sure you have the following documentation for your edition and have reviewed it:

If you plan on installing other third-party rolls, obtain the CD or DVD containing those rolls.

You can download Platform OCS Standard Edition from the Platform web site at http://my.platform.com/products/platform-ocs.

Contact Platform Computing to purchase Platform OCS Enterprise Edition.



Pre-installation

The following steps summarize the Platform OCS pre-installation process:

  1. Check the hardware configuration
  2. Check the network configuration

Check the hardware configuration

Before Platform OCS is installed, a set of minimal hardware requirements must be satisfied. A typical Platform OCS cluster uses a Beowulf-type cluster setup consisting of the following types of hosts:

Frontend node

The frontend node (or head node) is responsible for the following:

Minimal hardware requirements for a frontend are as follows:

Compute nodes

One or more compute nodes are responsible for the following:

Minimal hardware requirements for compute nodes are:

Optional hardware for compute nodes:

Cluster setup

The following is a diagram that illustrates the cluster setup:

Check the network configuration

In the figure above, the frontend node connects to the private network through the Ethernet interface mapped to eth0, and to the public network through the Ethernet interface mapped to eth1. The public network is the main network in your company or organization. A network switch connects the frontend and compute nodes to form a completely private network. Other cluster configurations are possible, such as connecting the compute nodes directly to the public network instead of hiding them behind the frontend node; however, this type of configuration is not supported at install time.

The private network connecting the frontend and compute nodes is typically a Gigabit or 100Mb Ethernet network. In this simple setup, the private network serves three purposes:

However, it is common practice to perform message passing over a much faster network using a high-speed interconnect such as Myrinet or Infiniband. A fast interconnect provides benefits such as higher throughput and lower latency. For more information about a particular interconnect, please contact the appropriate interconnect vendor.

Testing network configuration

To ensure a successful Platform OCS installation, the Ethernet switches need to be configured properly. There are some installation issues caused by specific switch configurations.

  1. Check whether spanning tree is enabled on the switch. When it is enabled, PXE installation slows down dramatically because each port on the switch first tries to determine where it fits in the spanning tree to avoid loops in the network. A Platform OCS cluster with a single network switch does not need spanning tree because there is no possibility of loops in the Ethernet network. However, if the cluster requires multiple switches, spanning tree is needed to ensure that no loops are created in the Ethernet network topology, so use caution when changing the spanning tree options on your switch. Platform recommends disabling spanning tree.
  2. Check whether PortFast is disabled on the switch. PortFast (other switch manufacturers may use a different name) is the forwarding scheme the switch uses. For the best installation performance, the switch should begin forwarding packets as it receives them, which speeds up PXE booting. Platform recommends enabling PortFast if the switch supports it.
  3. Check whether multicasting is disabled on the switch. Certain switches must be configured to allow multicast traffic on the private network. Some tools in Platform OCS, such as Ganglia (the cluster monitoring tool), require multicasting to collect information correctly, so configure the switch or switches to pass multicast traffic.
  4. Run diagnostics on the switch to ensure that it is connected properly and that there are no bad ports or cables in the configuration.

Network information

Information about your network is required during installation. Collect the following items from your company or organization's IT department:



Frontend node installation

The following steps summarize the installation of Platform OCS on your frontend node:

  1. Start the Platform OCS installer
  2. Configure your frontend
  3. Partition your frontend
  4. Test the frontend node

Start the Platform OCS installer

Perform the following steps to start the Platform OCS installer:

  1. Insert the Platform OCS DVD into your frontend

    After your hardware is set up and connected, you are ready to start installing your frontend.

  2. Power up the frontend node with the Platform OCS 4.1.1 DVD. If the DVD does not boot, you must configure the frontend's BIOS to boot from the DVD drive.
  3. You will see a splash screen, accompanied by a boot prompt. Type frontend and press Enter. You need to be quick because the installer will start automatically if you do not type anything in the boot prompt within 10 seconds. If you miss typing frontend, the installer assumes you are installing a compute node, and not a frontend. Simply power down the frontend and start again.

    After the splash screen, the installer loads the kernel and initial ramdisk. You can abort the loading process by pressing Ctrl-C when you see "Loading vmlinuz..." or "Loading initrd.img...". This returns you to the boot prompt.

  4. When you see the Available Rolls dialog, you are ready to configure your frontend, as specified in Configure your frontend.

Optional steps for booting:

The Platform OCS installer can be booted with optional parameters. Some common boot parameters include:

dd
    Prompts the user to enter a Driver Disk. When you have hardware that is not supported by the Linux kernel used by the Platform OCS installer, use a Driver Disk to load the kernel drivers for your hardware. Consult your hardware vendor for the Driver Disk.

Other boot parameters
    Other boot parameters can be used to alter the boot process. Examples:
      • Use the mem=XXXM parameter to specify the amount of physical memory to use for the installation (where XXX is the amount in MB).
      • Use the noacpi parameter to disable ACPI.
    For a full list of boot parameters, refer to the Red Hat Enterprise Linux documentation.

User input

Subsequent installer screens require you to input some values. The following is a list of general tips for navigating between screen elements:

If you encounter an issue during the installation, you can look for more information in the following locations:

Configure your frontend

At the Available Rolls dialog, select the rolls to install on your frontend, and add your cluster information.

About rolls

A roll groups together packages and configuration scripts that are used to install a specific component for a Platform OCS cluster. For example, a roll can install a batch job scheduler, a driver for an interconnect, or a cluster monitoring package. The DVD contains all of the Rolls you need to install a frontend. There are two types of Rolls:

Selecting the rolls to install

Complete the following steps to select the rolls to install on your frontend node:

  1. In the Available Rolls dialog, select all of the rolls you want to install on your frontend. We recommend selecting the Lava Roll (batch job scheduler). The table below is a summary of what rolls are included on the DVD, grouped by category. Choose what you require for your new cluster. Note that your DVD may contain other rolls depending on what Platform OCS edition you have (either Enterprise or Standard edition).
    Category                        Rolls
    Required Platform OCS rolls     Base, HPC, Kernel, OS, Platform
    Batch job scheduling systems    Lava, LSF HPC
    Interconnects                   Cisco™ Topspin®, Myrinet
    Cluster monitoring systems      Clumon, Ganglia, Ntop
    Parallel file systems           PVFS2
    Vendor customizations           Dell, Intel, HP
  2. Press OK when you have finished selecting the rolls you wish to install.
  3. Select Yes to install more rolls on the frontend or No if you are finished adding Rolls.

    The installer displays the list of rolls selected for installation from the DVD, and prompts you to insert an additional CD or DVD to install more rolls. In most cases, the DVD contains all of the rolls you need for installation. However, if you have more rolls to install, select Yes. Otherwise, select No.

    Links to additional rolls can be found on the Platform web site at: http://my.platform.com/products/platform-ocs.

  4. Insert the CDs or DVDs containing the additional Rolls:

    Skip this step if you selected No in the previous step. Otherwise, perform the following:

    1. Select Ok when prompted to insert a CD/DVD.
    2. A roll selection screen similar to the one for the boot DVD is displayed if the CD/DVD you inserted has multiple rolls. A roll selection screen is not shown if a CD/DVD contains one roll or has only required rolls. If so, the installer automatically selects all of the rolls.
    3. The installer will continue to prompt you for more CD/DVDs. Select No when you have added all the rolls.
  5. In the Cluster Information dialog, specify the details of your Platform OCS cluster.

    Enter a Fully Qualified Domain Name (FQDN) for the hostname. The domain name should match your company or organization's domain name.

  6. When you see the Disk Partitioning Setup dialog, you are ready to partition the hard disk in your frontend, as described in Partition your frontend.

Partition your frontend

Partition the hard disk in your frontend. You need to decide whether to auto-partition your hard disk or manually partition your hard disk.

Auto-partitioning quickly partitions the first disk on your frontend using a default Platform OCS partition scheme. You can select an alternate disk to partition. Auto-partition uses the following partition scheme:

Partition    Mountpoint           Filesystem type    Minimum size    Default size
Root         /                    ext3               6 GB            10 GB
Swap         None                 swap               1 GB            4 GB
Export       /state/partition1    ext3               10 GB           Rest of disk
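
The minimum sizes in the table above add up to 17 GB. As a rough sketch of that arithmetic (not an installer feature), the following shell functions report whether a given amount of free disk space covers the three minimums. The function and variable names are hypothetical, and the sketch assumes the minimums in the table are the only space the installer needs; the Install checklist remains the authoritative source.

```shell
#!/bin/sh
# Minimum partition sizes in GB, taken from the auto-partition table.
MIN_ROOT_GB=6
MIN_SWAP_GB=1
MIN_EXPORT_GB=10

# Print the combined minimum size in GB.
min_required_gb() {
    echo $(( $MIN_ROOT_GB + $MIN_SWAP_GB + $MIN_EXPORT_GB ))
}

# Succeed (exit status 0) if the free space, given in whole GB as $1,
# is at least the combined minimum.
has_enough_space() {
    [ "$1" -ge "$(min_required_gb)" ]
}

min_required_gb    # prints 17
```

For example, has_enough_space 16 fails, while has_enough_space 40 succeeds.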

Manual partitioning requires you to manually set up the partition scheme. This includes setting the correct mount-points and specifying appropriate partition sizes.


We recommend Auto-partitioning for most users. Select Manual partitioning (Disk Druid) only if you want more control over how the disk is partitioned.

At the Disk Partitioning Setup dialog, choose to auto-partition or manually partition your disk:

Auto-partition your hard disk

Auto-partition your hard disk using the following steps:

  1. At the Disk Partitioning Setup dialog, select Autopartition.
  2. Select the disk to partition and specify whether you want to preserve existing partitions.

    The installer supports three options for preserving partitions on the disk in which Platform OCS is installed.

    • Remove all Linux partitions.

      This preserves any non-Linux partitions, such as Windows partitions (e.g. FAT, FAT32, and NTFS partitions), and any data on those partitions.

    • Remove all partitions.

      This wipes out all partitions on the disk. All data on those partitions will be lost. Note: On Dell systems the Dell utility partition will be preserved.

    • Keep all partitions.

      This preserves all existing partitions, including the data on those partitions. Partitions for Platform OCS are added in the available free space.

      If you choose the option to preserve non-Linux partitions, or all partitions, the amount of free space on the disk must satisfy the minimum required disk space to install a Platform OCS frontend. Refer to the Install checklist for the minimum requirements.

      If there isn't enough space left on the disk, the Platform OCS installer will display an error message to indicate it "Could not allocate requested partitions". The installer will not let you proceed. You have to select Ok to reboot the machine.

      You can select only one disk for the installation. You can select any disk currently attached to the machine, including externally attached disks. If your machine has SCSI or SATA disks, the first disk is named "sda". If your machine has IDE disks, the first disk is named "hda". To partition more than one disk, you must use Manual Partitioning: select Back to return to the Disk Partitioning Setup dialog, select Manual Partitioning, and proceed to Manually partition your hard disk.

      Select Ok to proceed to the next screen.

  3. Specify the sizes for the default Platform OCS partitions.

    The default partition scheme creates a root, swap, and export partition. These partitions are required for Platform OCS to function correctly. The root partition is where the Linux OS is installed, and the export partition is used to store the Platform OCS distribution and the Roll files that it uses.

    You must set the partition sizes in Megabytes (MB). You have the option of setting the export partition size to a fixed size, or make it grow to fill the remaining space on the disk.

    Select Back to return to the Prepare Disk dialog. Select Ok to proceed to the next dialog.

  4. Review the automatically created partition layout
    • If you select No to review the partitions, you will advance to the Boot Loader Configuration dialog. Proceed to Manually partition your hard disk, but skip the first step.
    • If you select Yes, you are taken to the Disk Druid dialog to verify the partition scheme, and make changes if necessary. Proceed to Manually partition your hard disk.

Manually partition your hard disk

In the Disk Druid dialog, verify the partitioning scheme on your hard disk. If you did not choose to auto-partition your hard disk, you need to manually configure the partition scheme in this dialog.

  1. Update the partitioning layout with Disk Druid.

    There are two possible paths that can bring you to this screen:

    • You chose Auto-partitioning, and elected to review the partitioning scheme created
    • You chose Manual partitioning

    Disk Druid allows you to create, delete or modify partitions. If you are Auto-partitioning, you can augment the default scheme by creating new partitions. If you are Manually partitioning, you must create the minimum set of partitions required by Platform OCS. This includes the root, swap and export partitions. When you are satisfied with the partition layout, select Ok.

    Do not select RAID as Platform OCS does not support Software RAID partitioning.

  2. Select the default partition to boot for the GRUB boot loader

    The Platform OCS installer automatically adds boot entries to the GRUB boot menu for any operating systems it finds in any partitions that are preserved on the disk. This only occurs if you chose to preserve partitions. Only entries for non-Linux operating systems are added. If you would like to add entries to the GRUB menu for Linux operating systems, you must add them manually after the frontend is installed.

    To change the default partition to boot, select the partition and press F2. You can also change the label for a partition by selecting it and pressing Edit.

Completing the Installation

  1. In the Network Configuration for eth0 and eth1 dialogs, specify the IP address for the Private (eth0) and Public (eth1) Ethernet interfaces.

    A Platform OCS frontend requires two Ethernet interfaces to work correctly. The next two screens ask the user to enter the IP address and Netmask for the private and public interfaces.

    For the private interface, only class-based networks are supported. Classless Inter-Domain Routing (CIDR), such as subnetting or supernetting, is not supported. The following table lists the valid Netmask values and the number of hosts each Netmask value supports. Choose the Netmask value that is appropriate for your cluster size.

    Class    Netmask value    Number of hosts in the network
    A        255.0.0.0        16777214
    B        255.255.0.0      65534
    C        255.255.255.0    254

    For the public interface, you need to contact your IT department to obtain a static IP address for the frontend, and the corresponding Netmask value. You cannot configure the frontend to use an IP address obtained via DHCP.

  2. In the Miscellaneous Network Settings dialog, specify your gateway and DNS IP addresses.

    You may need to contact your IT department to obtain these addresses for your network.

  3. In the Time Configuration dialog, select your time zone from the list and specify your network time server. If your node uses UTC time, select System clock uses UTC.
  4. In the Root Password dialog, select a root password that you will remember.
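
The host counts in the Netmask table in step 1 follow from the number of host bits that the mask leaves free: a mask with h host bits supports 2^h - 2 hosts, because the network and broadcast addresses are reserved. The sketch below reproduces the table values. The function name hosts_for_netmask is hypothetical, and the code assumes a valid dotted-quad netmask.

```shell
#!/bin/sh
# Print the number of usable host addresses for a dotted-quad netmask ($1).
hosts_for_netmask() {
    # Split the netmask on "." into the four positional parameters.
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    mask=$(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
    # Inverting the 32-bit mask gives the host part with all bits set,
    # which equals 2^h - 1 for h host bits.
    host_part=$(( mask ^ 4294967295 ))
    # Subtract one more for the reserved network address: 2^h - 2.
    echo $(( host_part - 1 ))
}

for m in 255.0.0.0 255.255.0.0 255.255.255.0; do
    echo "$m $(hosts_for_netmask "$m")"
done
# prints:
# 255.0.0.0 16777214
# 255.255.0.0 65534
# 255.255.255.0 254
```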

The installer will format the disk, copy the rolls from the DVD (and any other CD/DVDs you inserted) onto the disk, and install the packages.

After package installation completes, the boot loader is installed and the post-installation is executed. The machine then reboots. You have completed your installation, and are ready to test your frontend node as described in Test the frontend node.

Test the frontend node

Before installing the compute nodes, perform the following tests to verify that your frontend is operational. Log in to your frontend as root with the password you set during the installation and perform the following steps:

  1. Check for hardware issues

    In some cases, you might have hardware that is not detected by the running kernel, or you have a kernel driver that fails to load. Look through the following logs to identify any hardware issues:

    1. Check the kernel logs for any hardware driver issues or other errors:
      # dmesg
      
    2. Check the system logs for any startup issues or other errors:
      # less /var/log/messages
      
  2. Check that the Ethernet network is working:
    1. Check that both eth0 and eth1 interfaces are up:
      # ifconfig
      
    2. Verify the routing table is correct.
      # route
      

      When verifying the routing table, pay careful attention to the following:

      • Traffic for the private network is routed over eth0, while traffic for the public network is routed over eth1.
      • The default route will go through the gateway server you specified during installation.
      • Multicast packets will be routed over eth0 (the 224.0.0.0 network).
      • External hosts can be reached with the ping command
  3. Check that the High Performance Interconnect is working

    If you installed an interconnect, you should verify the driver for the interconnect hardware was loaded correctly. In addition, the interconnect vendor may provide diagnostic tools to determine if the interconnect is working. We suggest you refer to the documentation for your particular interconnect.

  4. Check the required services

    The frontend runs many services that are essential for cluster administration and installing compute nodes. You need to ensure all of the services listed below are running:

    Service           How to check
    Web Server        service httpd status
    DHCP              service dhcpd status
    DNS               service named status
    Xinetd            service xinetd status
    MySQL database    service mysqld status
    NFS               service nfs status
    AutoFS            service autofs status

  5. Check the Platform OCS infrastructure

    Run some basic Platform OCS commands, seen below, to verify the infrastructure is working. The commands should execute successfully.

    1. Log in as root, start insert-ethers, select compute node, and then press F11 to exit.
      # insert-ethers
      


      Important: If you run "insert-ethers", you might see a message that says "Rocks Distribution is not ready. Please wait for rocks-dist to complete". This is normal when you log into a frontend for the first time. A startup script runs rocks-dist in the background during the first boot-up. You have to wait for "rocks-dist" to finish running before you can run "insert-ethers".

    2. Test rebuilding the Platform OCS distribution
      # cd /home/install ; rocks-dist dist
      
  6. Check the added rolls

    Verify that all of the rolls you selected during the frontend installation are added to the frontend:

    # rollops -l
    

    You can use the "rollops" command to install other rolls from the DVD. Simply insert the DVD, and run the following command. This command will display a menu from which you can select the roll you want to install:

    # rollops -a
    
  7. Start up X Windows

    Run the following command to start X Windows:

    # startx
    

    This command will automatically probe for your video card, configure the settings for it, and start up X. It may be necessary to run system-config-display to configure the display correctly. You can configure Platform OCS to automatically start X every time you log in by changing the runlevel on the initdefault line from 3 to 5 in the /etc/inittab file.

  8. Check the Platform OCS Cluster home page

    Verify that you can access the cluster home page. The page loads automatically when you start the browser. The home page gives you access to all of the cluster monitoring tools and to the Platform OCS documentation for all of the installed rolls.

    Follow the link near the bottom of the home page to register your Platform OCS cluster.

When all the above tests pass, you are ready to proceed with compute node installation. If you experience any issues or errors, contact Platform Support at support@platform.com.
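
Steps 2 and 4 of the tests above lend themselves to scripting. The following is a minimal sketch, not a Platform OCS tool: check_services loops over the services from the table in step 4 using the same "service <name> status" command, and default_route_iface reads routing table output on standard input and prints the interface that carries the default route. Both function names are hypothetical.

```shell
#!/bin/sh
# Services listed in the "Check the required services" table.
SERVICES="httpd dhcpd named xinetd mysqld nfs autofs"

# Print one OK/FAILED line per service.
# Returns non-zero if any status check fails.
check_services() {
    failed=0
    for svc in $SERVICES; do
        if service "$svc" status >/dev/null 2>&1; then
            echo "OK      $svc"
        else
            echo "FAILED  $svc"
            failed=1
        fi
    done
    return $failed
}

# Read the output of "route" or "route -n" on standard input and print
# the interface (last column) of the first default route found.
default_route_iface() {
    awk '$1 == "0.0.0.0" || $1 == "default" { print $NF; exit }'
}
```

After loading these functions in a root shell on the frontend, run check_services to get one line per service, and run route -n | default_route_iface to print the default route interface. In the setup described in this guide, that would normally be the public interface, eth1.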



Compute node and appliance installation

Different types of nodes can be installed in a Platform OCS cluster. These different node types are referred to as appliances. The most common type is a compute node. The other appliances are listed in the table below. The set of available appliances will depend on what rolls you install. You can view the list of available appliances by running the insert-ethers command.

Appliance type       Installed by    Description
Compute              Base Roll       Creates a standard compute node. Other appliance types are based on this basic Compute appliance type.
LSF HPC Master       LSF HPC Roll    Creates an LSF HPC Master Candidate Host for fail-over of the master host in an LSF HPC cluster.
Pvfs2-meta-server    PVFS2 Roll      Creates a PVFS2 Meta Server that maintains the distributed file system index for PVFS2, and a PVFS2 Data Server.
Ethernet Switches    Base Roll       Use this if you have a managed Ethernet switch. It is used to assign an IP address to a managed switch. This is done so that DHCP requests from a managed switch are not confused with DHCP requests from a compute node.

Note: Platform OCS provides an optional method to install compute nodes that involves pre-loading host information into the Platform OCS database to speed up the compute node installation process. This also allows system administrators to pre-configure the cluster naming and IP scheme making it independent of the order in which nodes are installed. This method requires a list of MAC addresses. To take advantage of this feature, you must obtain a list of MAC addresses for your compute nodes before installing the compute nodes.

The following steps summarize the installation of Platform OCS on your compute nodes and appliances:

  1. Prepare your compute node
  2. Install compute nodes
  3. Install other appliance types
  4. Test compute nodes and appliances
  5. Test the cluster installation

There are two methods for installing compute nodes and other appliance types: using the insert-ethers tool, or using the add-hosts tool. Choose the method that is appropriate for your cluster.

About insert-ethers

Insert-ethers is a tool you run on your frontend to capture the DHCP requests broadcast by the compute nodes. For each DHCP request, insert-ethers generates a hostname and IP address for the node and adds the new information to the Platform OCS database.

The system is then updated to reflect the addition of the new host. Various system configuration files are updated, and DHCP and DNS services are restarted. Once DHCP is updated, a compute node can obtain an IP address, allowing it to network boot and start the install process. Insert-ethers should be used if you are deploying a small to medium sized cluster (fewer than 128 nodes). Insert-ethers uses a node naming convention based on the assumption that your nodes are assembled in racks. The convention is:

<appliance type>-<rack>-<rank>

where:

For example:

The insert-ethers command assigns IP addresses to nodes starting from the top-most IP address for your subnet, and iterates through the address space in descending order.

For example, given a frontend address of 10.1.1.1, and a netmask of 255.0.0.0, the first node is assigned 10.255.255.254, the second node is assigned 10.255.255.253, and so on.
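
The descending assignment described above can be sketched as follows. This only illustrates the address arithmetic and is not how insert-ethers is implemented. The function names are hypothetical, and the sketch assumes the first node receives the address just below the subnet's broadcast address, as in the example.

```shell
#!/bin/sh
# Convert a dotted-quad address ($1) to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Convert a 32-bit integer ($1) back to a dotted-quad address.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print the address of the Nth installed node.
# $1 = frontend address, $2 = netmask, $3 = N (1 for the first node).
nth_node_ip() {
    ip=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    # Broadcast address: network bits from the frontend address,
    # with every host bit set.
    broadcast=$(( (ip & mask) | (mask ^ 4294967295) ))
    # Count down from the address just below the broadcast address.
    int_to_ip $(( broadcast - $3 ))
}

nth_node_ip 10.1.1.1 255.0.0.0 1    # prints 10.255.255.254
nth_node_ip 10.1.1.1 255.0.0.0 2    # prints 10.255.255.253
```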

About add-hosts

Add-hosts is a tool that pre-populates the Platform OCS database with host information. The tool enables the user to define their own hostnames and IP addresses for the compute nodes using an XML configuration file. This alleviates the need to run insert-ethers to capture DHCP requests, and auto-assign hostnames and IP addresses.

After the information is loaded into the database, the system is updated to reflect the addition of the new hosts, in the same way that insert-ethers updates the system. Add-hosts should be used if you are deploying a large cluster of more than 128 nodes. Add-hosts requires a list of MAC addresses for your compute nodes. If you are purchasing new hardware for the cluster, the hardware vendor can supply a list of MAC addresses for all nodes.

Prepare your compute node

Before installing your compute nodes, consider customizing them to suit your requirements. The most common customizations are:

To customize your compute nodes, you need to update the Platform OCS distribution. Customizations are specified using XML files. Every change to an XML file requires a rebuild of the Platform OCS distribution.

The Platform OCS distribution is located in /home/install/rocks-dist. To rebuild it, log in as root and run:

# cd /home/install ; rocks-dist dist


Important: Always rebuild the distribution in the /home/install directory. Rebuilding the distribution in other directories may result in corruption of the permissions in the /home/install directory.

The XML files for compute node customization are located in /home/install/site-profiles/4.1.1/nodes. The XML files can be generated manually or generated using automated tools included with Platform OCS. Details are described in the next section.

  1. Changing the default partition layout

    You can change the default partition sizes, or create your own partition layout to override the default Platform OCS partition layout. The default partition layout for compute nodes is the same as the layout for the frontend. Only the first disk is partitioned; other disks are left as is.

    Partition Mountpoint Filesystem Type Minimum size Default size
    Root
    /
    Ext3
    6 GB
    10 GB
    Swap
    None
    Swap
    1 GB
    4 GB
    Export
    /state/partition1
    Ext3
    10 GB
    Rest of disk

    1. Changing the default partition sizes

      If you're satisfied with the default layout, but want to change the root and swap partition sizes, use the custom-partition tool:

      # custom-partition -r <root partition size in MB> -s <swap partition size in MB> -b
      

      For example, to change the root partition size to 20 GB, and swap partition size to 2 GB, run the following command:

      # custom-partition -r 20000 -s 2000 -b
      

      The "custom-partition" tool creates the /export/home/install/site-profiles/4.1.1/nodes/extend-auto-partition.xml file and rebuilds the Platform OCS distribution.

      For more information about the custom-partition tool, refer to the manpage or the Readme for Platform OCS Rolls.

    2. Replacing the default partition layout

      To set up more complex partitioning, manually create a replace-auto-partition.xml file that replaces the default layout, and then rebuild the Platform OCS distribution.

      Run the following commands:

      # cd /home/install/site-profiles/4.1.1/nodes
      # cp skeleton.xml replace-auto-partition.xml
      

      Open replace-auto-partition.xml with a text editor and:

      1. Delete the <package> and <post> sections.
      2. For each partition you want to define, create a line with the <part> tag between the <main> and </main> tags.
      3. Between the <part> and </part> tags, specify the parameters for your partition. The parameters are the same as those used for the Red Hat Kickstart "part" directive. For more information on the different partition parameters, refer to the Advanced Partitioning section of this guide.
      4. Save the file and rebuild the Platform OCS distribution:
        # cd /home/install ; rocks-dist dist
        

      For example:

      Suppose you want to create a partition layout on the first SCSI disk consisting of a 15 GB root partition, 2 GB swap partition, 5 GB /var partition, and a /data partition that takes up the rest of the disk. Here is what the XML file will look like:

      <?xml version="1.0" standalone="no"?>
      <kickstart>
      
      <description>
      </description>
      
      <changelog>
      </changelog>
      
      <main>
      <!-- Put your partitioning directives here -->
        <part> / --size 15000 --ondisk sda </part>
        <part> swap --size 2000 --ondisk sda </part>
        <part> /var --size 5000 --ondisk sda </part>
        <part> /data --size 1 --grow --ondisk sda </part>
      </main>
      
      </kickstart>
      
      
  2. Adding additional RPM packages

    The rocks-compute tool can be used to update the Platform OCS distribution with the user's own RPM packages. The tool allows you to add, list, or remove packages. The tool creates the /export/home/install/site-profiles/4.1.1/nodes/extend-compute.xml file and rebuilds the Platform OCS distribution. You can add as many packages as needed.

    • To add a custom package to the Platform OCS distribution and rebuild the distribution, run the following:
      # rocks-compute -a -p <path to the RPM package> -b
      


      Important: The rocks-compute tool does not check for RPM package dependencies for a given package. Ensure that you also run the command above for any package dependencies.

    • To list all of the packages you added:
      # rocks-compute -l p
      

      The above command will list a unique ID for each package. This ID is used to remove a package from the distribution.

    • To remove a package from the Platform OCS distribution and rebuild the distribution, run:
      # rocks-compute -d -p <package ID> -b
      

    Example: Adding your own RPM package

    # rocks-compute -a -p /myshare/package-1.0.0.x86_64.rpm -b
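Because rocks-compute does not resolve dependencies, a package and the packages it depends on each need their own command. The sketch below only prints the commands so that the list can be reviewed first. The function name print_add_commands and the libdep package are hypothetical.

```shell
#!/bin/bash
# print_add_commands RPM...: print one rocks-compute command per package
# file. Nothing is added to the distribution; the output is for review.
print_add_commands() {
    local rpm
    for rpm in "$@"; do
        echo "rocks-compute -a -p $rpm -b"
    done
}

# Hypothetical package followed by one hypothetical dependency.
print_add_commands /myshare/package-1.0.0.x86_64.rpm /myshare/libdep-1.0.0.x86_64.rpm
```

To find out what a package requires, you can run rpm -qpR <path to the RPM package> and add any RPM that is not already installed on the compute nodes.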
    

    Example: Adding an RPM from the OS roll

    The OS roll contains RPMs for the Linux operating system. There may be RPMs in the OS roll that you want on the compute nodes but that were not installed by default. The steps are:

    1. Look for the RPM you want to install. Let's install the "ncompress" package:
      # find /home/install/ftp.rocksclusters.org/pub/rocks/rocks-4.1.1/rocks-dist/rolls/os/4.1.1/x86_64/RedHat/RPMS/ -name 'ncompress*'
      
    2. Add the ncompress package to the Platform OCS distribution:
      # rocks-compute -a -p /home/install/ftp.rocksclusters.org/pub/rocks/rocks-4.1.1/rocks-dist/rolls/os/4.1.1/x86_64/RedHat/RPMS/ncompress-4.2.4-40.x86_64.rpm -b
      
  3. Adding additional post-installation configuration scripts

    You may want to add your own post-installation scripts to configure a compute node. Examples include turning services on or off, creating or updating configuration files, and creating init scripts. The scripts are executed during post-installation, after all RPM packages have been installed. Create the script in a text editor, and save it to a file. The script must be a bash shell script.

    The rocks-compute tool can be used to update the Platform OCS distribution with the user's post-installation scripts. The tool allows you to add, list, or remove post-installation scripts. The tool creates the /export/home/install/site-profiles/4.1.1/nodes/extend-compute.xml file and rebuilds the Platform OCS distribution. You can add as many scripts as needed.

    • To add a post-installation script and rebuild the Platform OCS distribution:
      # rocks-compute -a -s <path to script> -b
      
    • To list the post-installation script(s) you added:
      # rocks-compute -l s
      

    The above command will list a unique ID for each script. This ID is used to remove a script from the distribution.

    • To delete a post-installation script and rebuild the Platform OCS distribution:
      # rocks-compute -d -s <script ID> -b
      

      Example: Adding a post-install script

      Suppose you want to append a library path to the /etc/ld.so.conf file. You can create a Bash script that looks as follows:

      #!/bin/bash
      echo "Appending /mypath/lib to /etc/ld.so.conf" >> /root/compute.log
      echo "/mypath/lib" >> /etc/ld.so.conf
      
      echo "Running ldconfig" >> /root/compute.log
      /sbin/ldconfig >> /root/compute.log 2>&1
      

      Let's assume the script is saved in /home/user/myscript.sh. You can add the script by running:

      # rocks-compute -a -s /home/user/myscript.sh -b
      

      For more information about the rocks-compute tool, refer to the manpage or the Readme for Platform OCS Rolls.
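A post-installation script that appends to a configuration file adds a duplicate line every time it is run by hand on an installed node. One way to guard against that is sketched below. The helper name append_once is hypothetical, and the demonstration writes to a scratch file rather than to /etc/ld.so.conf.

```shell
#!/bin/bash
# append_once FILE LINE: append LINE to FILE unless FILE already
# contains an identical line.
append_once() {
    local file="$1" line="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# Demonstration against a scratch file.
conf=$(mktemp)
append_once "$conf" "/mypath/lib"
append_once "$conf" "/mypath/lib"   # second call changes nothing
cat "$conf"
rm -f "$conf"
```

The demonstration prints /mypath/lib only once. In a real script, the helper would be called with /etc/ld.so.conf as the file and followed by /sbin/ldconfig, as in the example above.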

Install compute nodes

Install compute nodes using either insert-ethers or add-hosts as follows:

Installing compute nodes using insert-ethers
  1. Log in to the Platform OCS frontend as root, and run insert-ethers.
  2. If you have a managed Ethernet switch that sends out DHCP requests, select Ethernet Switches. If you do not, proceed to the next step.

    Choosing Ethernet Switches will assign an IP address to the switch. You may need to wait several minutes for the switch to broadcast a DHCP request. When done, press F9 to quit.

  3. If you are installing a small cluster and you are not worried about assigning hostnames and IP addresses in the same order as the physical host layout, just run insert-ethers.

    If you do care about order, you need to tell the insert-ethers command which rack you are installing by specifying the rack number on the command-line. Let's assume you want to start with the first rack:

    # insert-ethers --cabinet=0
    

    The nodes will be named compute-0-0, compute-0-1, compute-0-2, and so on.

  4. Choose Compute from the list of appliances.
  5. Once insert-ethers is waiting for the compute node, you can PXE boot the node either by physically rebooting it from the console or by remotely logging in to the console using vendor IPMI or management tools.

    To make sure that your compute nodes are assigned hostnames and IP addresses in the correct order, you will need to PXE boot each machine, one at a time, in order of their physical location in the current rack you are installing. In other words, power up the bottom-most node in the rack, then work your way up, one node at a time.

  6. If the node is successfully detected, installation will begin and you should see the MAC address and compute node name on the insert-ethers screen.

    An asterisk (*) indicates that a kickstart file was requested by the compute node and installation should proceed normally. If there is no (*), Platform OCS will not install properly on the node and you should see an error on the compute node.

    Once a node has a (*), you can PXE boot the next node in the rack. Repeat the process for the rest of the nodes in the rack.

    If you see a (503) status, the frontend is too busy to serve a Kickstart file to the node. In this case, try PXE booting the compute node again. If you see a (500) status, an error occurred when generating the Kickstart file for the node. In this case, verify whether the Kickstart file can be generated locally on the frontend.

  7. You can monitor the installation of a compute node by switching to its console with a KVM switch or by using management tools supplied by the hardware vendor. If you do not have a KVM switch, or you have not configured the hardware management utilities, you can still monitor the installation progress by opening a secure shell connection to the compute node:
    # ssh compute-0-0 -p 2200
    
  8. You should see the install progress on the compute node.
  9. Once installation is complete, the node will automatically reboot and join the cluster.
  10. Once you have finished installing all of the compute nodes in rack 0, exit insert-ethers and run it again, incrementing the cabinet number. For example:
    # insert-ethers --cabinet=1
    

    Return to Step 4 and repeat the installation process for the rack.
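When the boot order matters, it can be useful to print the names that insert-ethers is expected to hand out for a rack and tick them off as each node is powered on, bottom-most node first. The sketch below only reproduces the compute-<cabinet>-<rank> naming shown above. The function name expected_names is hypothetical.

```shell
#!/bin/bash
# expected_names CABINET COUNT: print the hostnames expected for COUNT
# compute nodes in the given cabinet, in the order they should be booted.
expected_names() {
    local cabinet="$1" count="$2" rank=0
    while [ "$rank" -lt "$count" ]; do
        echo "compute-${cabinet}-${rank}"
        rank=$((rank + 1))
    done
}

# Three nodes in the first rack.
expected_names 0 3
```

For the first rack this prints compute-0-0, compute-0-1, and compute-0-2, one per line.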

Installing compute nodes using add-hosts

For small clusters, insert-ethers is the quickest and easiest way to install Platform OCS. However, for larger clusters of 128 nodes and beyond, the add-hosts tool provides better configuration management. A large cluster requires planning out the layout of the network, switches, racks, and nodes in the cluster. The add-hosts tool is an easy way to plan out the cluster layout when a list of all of the MAC addresses is provided by the hardware vendor.

The steps are as follows:

  1. Obtain a list of the MAC addresses for all nodes in the cluster. Save the addresses in a text file, for example /opt/rocks/etc/mac.txt. The MAC addresses must be listed in the order in which you plan to add the hosts. In other words, the first MAC address corresponds to the first node in the first rack, the second MAC address to the second node in the first rack, and so on.
  2. Create an XML configuration file in /opt/rocks/etc/add-hostsrc to define the names, IP addresses, and appliances that you will be installing.
  3. For brevity, we will show you a sample add-hostsrc file and MAC address file, and ask that you refer to the Advanced Administration section of this guide for more information on setting up the add-hostsrc file.

    Suppose that you are installing five compute nodes, located in the same rack, in a class B network (that is, the netmask is 255.255.0.0). Assume that you want to assign IP addresses starting from 10.1.1.5. Your MAC address file and add-hostsrc file will contain:

    MAC Address file:

    00:11:22:33:44:55        # first compute node
    00:11:22:33:44:56        # second compute node
    00:11:22:33:44:57        # and so on
    00:11:22:33:44:58
    00:11:22:33:44:59
    

    Add-hostsrc file:

    <?xml version="1.0" standalone="yes"?>
    
    <add-hosts>
      <mac_addr_file value = "/opt/rocks/etc/mac.txt" />
      <num_hosts_per_rack value = "10" />
      <order_by_rack value = "yes" />
      <netmask value = "255.255.0.0" />
    
      <subnet>
        <host_prefix value = "compute" />
        <baseip value = "10.1.1.5" />
        <num_hosts_in_subnet value = "5" />
        <appliance value = "compute" />
      </subnet>
    
    </add-hosts>
    
  4. Run add-hosts to populate the database based on the information in the XML file above:
    # add-hosts
    

    The following information is added to the Platform OCS database:

    Hostname      IP Address   MAC Address
    compute-0-0   10.1.1.5     00:11:22:33:44:55
    compute-0-1   10.1.1.6     00:11:22:33:44:56
    compute-0-2   10.1.1.7     00:11:22:33:44:57
    compute-0-3   10.1.1.8     00:11:22:33:44:58
    compute-0-4   10.1.1.9     00:11:22:33:44:59

  5. PXE boot your compute nodes. Note that the order in which your nodes are PXE booted is not important since the node information is already in the database. You can PXE boot several hosts at the same time.
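The hostname and IP address assignments listed in step 4 can be previewed before running add-hosts. The sketch below is not how add-hosts is implemented. It only reproduces the sample table under two stated simplifications: all hosts are in one rack, and only the last octet of the address is incremented. The function name plan_hosts is hypothetical.

```shell
#!/bin/bash
# plan_hosts PREFIX RACK NET FIRST_OCTET: read MAC addresses on stdin,
# one per line, and print "hostname  IP  MAC" in the order given.
# Simplification: only the last octet is incremented, so the sketch is
# valid only while FIRST_OCTET plus the number of hosts stays below 255.
plan_hosts() {
    local prefix="$1" rack="$2" net="$3" first="$4" rank=0 mac rest
    while read -r mac rest; do
        case "$mac" in ''|'#'*) continue ;; esac   # skip blank and comment lines
        printf '%s-%d-%d  %s.%d  %s\n' "$prefix" "$rack" "$rank" "$net" $((first + rank)) "$mac"
        rank=$((rank + 1))
    done
}

# Sample MAC address list, matching the table in step 4.
plan_hosts compute 0 10.1.1 5 <<'EOF'
00:11:22:33:44:55        # first compute node
00:11:22:33:44:56        # second compute node
00:11:22:33:44:57
00:11:22:33:44:58
00:11:22:33:44:59
EOF
```

Comparing this preview with the vendor's MAC address list is a quick way to catch a node that is out of order before the database is populated.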

Install other appliance types

In addition to compute nodes, you can install other appliance types:

Install an LSF HPC master candidate host

If you installed the LSF HPC roll, you can install LSF HPC master nodes so that the LSF HPC master host can fail over to another host. This increases cluster uptime and availability. We recommend installing one or more LSF HPC master nodes if you are setting up a large cluster.

Install an LSF HPC master candidate host using the following steps:

  1. Log into the frontend as root
  2. Run insert-ethers and select the LSF HPC Master appliance type.
  3. Install one or more of the LSF HPC Master nodes using PXE boot.
  4. Exit insert-ethers by pressing F9 to update the lsf.cluster.lsfhpc file.
  5. Create an NFS shared path on another NFS server, and make sure that this NFS path can be mounted on the new LSF HPC master node.
  6. On the frontend, run the following:
    # cd /home/install/upgrades/lsfhpc
    # config-lsf-master
    
  7. Answer the dialog questions when prompted by the script.

Install a PVFS2 meta server

If you installed the PVFS2 roll, you can install this appliance type. The PVFS2 appliance installs a server that acts as both a PVFS2 Meta Server and Data Server. It will create a sample PVFS2 filesystem that is mounted under /mnt/pvfs2.

Install a PVFS2 meta server using the following steps:

  1. Log in to the frontend as root.
  2. Run insert-ethers and select the Pvfs2-meta-server appliance type.
  3. Install the PVFS2 meta server using PXE boot.
  4. Repeat the process until all the nodes to be used as Data Servers are installed.
  5. Exit insert-ethers by pressing F9.
  6. Follow the instructions in the PVFS2 Roll section under Production Cluster Configuration to complete the configuration.

Test compute nodes and appliances

You can test the compute nodes and appliances as follows:

  1. Check if you can log into the compute node without a password:
    # ssh <compute node name>
    
  2. Check DNS by resolving the frontend's hostname:
    # host <frontend's local name>
    
  3. Check if /home/install is auto-mounted:
    # ls /home/install/
    
  4. Check if 411 can update all of the files on the compute node:
    # 411get --all
    
  5. If you installed an LSF HPC master candidate host, perform the following tests:
    1. Make sure the license is installed.
    2. Run the compute node tests.
    3. Run the lsid command to check that the cluster is up.
    4. Run lsadmin ckconfig and badmin ckconfig. There should be no errors.
  6. If you installed a PVFS2 meta server, check that the /mnt/pvfs2 path is mounted. Test that other compute nodes can also mount the /mnt/pvfs2 path.
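The checks above can be wrapped in a small script so that each one reports a clear result. This is a minimal sketch. The names run_check and node_checks are hypothetical, and it assumes that each command exits with a non-zero status when it fails.

```shell
#!/bin/bash
# run_check DESCRIPTION COMMAND [ARGS...]: run COMMAND quietly and
# report PASS or FAIL according to its exit status.
run_check() {
    local desc="$1"
    shift
    if "$@" > /dev/null 2>&1; then
        echo "PASS: $desc"
    else
        echo "FAIL: $desc"
        return 1
    fi
}

# node_checks FRONTEND: checks 2 to 4 from the list above, meant to be
# run on a compute node. FRONTEND is the frontend's local name.
node_checks() {
    local frontend="$1"
    run_check "DNS resolves $frontend" host "$frontend"
    run_check "/home/install is auto-mounted" ls /home/install/
    run_check "411 can update files" 411get --all
}

# Harmless demonstration that works on any machine.
run_check "shell can run a command" true
```

On a compute node, you would call node_checks with the frontend's local name. The passwordless login in step 1 can be tested from the frontend in the same way, for example with run_check and ssh -o BatchMode=yes <compute node name> true, which fails instead of prompting for a password.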

Test the cluster installation

Before proceeding further, make sure you have completed the post-install tests for the frontend and compute nodes. When done, run the following tests to ensure that your cluster is functioning properly.

  1. Run cluster-fork to verify that all nodes can be reached:
    # cluster-fork hostname
    
  2. Test 411 to verify that 411 broadcasts can be sent out to all nodes:
    # make -C /var/411 force
    
  3. Check Ganglia (if installed). Point your browser to http://localhost/ganglia and verify that all nodes appear on the webpage.
  4. Check Clumon (if installed). Point your browser to http://localhost/clumon and verify that all nodes appear on the webpage.
  5. Check the Lava cluster (if installed) to see if all nodes appear in the cluster:
    # lsid
    # lsload
    # bhosts
    
  6. Check the LSF HPC cluster (if installed) to see if all nodes appear in the cluster:
    # lsid
    # lsload
    # bhosts
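On a large cluster, the bhosts listing is long, and a node that is missing or unavailable is easy to overlook. The sketch below filters the listing down to hosts that need attention. It assumes the usual bhosts column layout, with the host name in the first column and the status in the second. The function name hosts_not_ok and the sample output are made up for illustration.

```shell
#!/bin/bash
# hosts_not_ok: read bhosts output on stdin, skip the header line, and
# print the name of every host whose STATUS column is not "ok".
hosts_not_ok() {
    awk 'NR > 1 && $2 != "ok" { print $1 }'
}

# Made-up sample in the assumed layout. On a live cluster you would run
# "bhosts | hosts_not_ok" instead.
hosts_not_ok <<'EOF'
HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV
compute-0-0        ok              -      2      0      0      0      0      0
compute-0-1        unavail         -      2      0      0      0      0      0
EOF
```

For the sample this prints compute-0-1. A host can also be listed with a status such as closed while it is healthy but fully loaded, so treat the result as a list of hosts to look at rather than a list of failures.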
    



Basic Administration

The following topics describe basic tasks for administering your Platform OCS cluster:

Online documentation

Documentation is provided online by the frontend node. Start a browser on the frontend. It will default to the Cluster page, which contains links to the available guides.

Roll-specific documentation is available from the Installed Rolls link, by following the Guide or Readme links beside the roll of interest.

Clumon

Clumon is a cluster job-monitoring tool that allows the administrator to see the states of jobs, job queues, load information, resource usage, and process information, and whether a node's scheduler daemons are up or down.

Clumon represents the system load by colours on a bar for each compute node. Icons representing nodes have indicators denoting the load of the node: red indicates a high or heavy load, and various levels of blue indicate a lighter load. If a node is experiencing problems or is down, its icon becomes a black and red crossbones icon. If you move your mouse over a node icon, a popup note appears with summary information about the node.

To view Clumon information, go to the main cluster webpage and click the "Cluster Status (CluMon)" link, or point your browser to http://localhost/clumon (on the frontend).

On a screen with running jobs, you can examine each job's state by clicking the job number or by examining the queues.

Platform Lava GUI

The Lava web GUI is a frontend to the Lava batch scheduling system. Users can submit jobs and perform actions such as suspending, resuming or killing jobs.

To submit or modify jobs, use the Lava GUI web interface. Go to the main cluster webpage and click "Lava GUI". You will need to log in to the interface. The root user is not permitted to log in. Log in using an existing user account or the lavaadmin account.

The following is a Lava GUI dialog window for submitting a job:

Ganglia

Ganglia is a cluster statistics collector which monitors node availability, displays system load, network usage, and other resource information over a period of time. Data is collected for each metric and is stored on the frontend. The data is stored for up to one year.

Ganglia displays detailed information regarding the usage of each node and provides the administrator with a guide to the day-to-day functioning of the cluster.

To view Ganglia information, go to the main cluster webpage and click the "Cluster Status (Ganglia)" link.

The following is a Ganglia display showing the overview of a cluster:

Ntop

Ntop is a network traffic analyzer designed to show the administrator the different protocol traffic passing through the frontend. Ntop can also show network traffic patterns to help diagnose network problems and network utilization issues.

By default, Ntop is configured for both public and private traffic with one interface always listening. You can switch which network interface ntop should listen on by clicking the Admin menu option and selecting "Switch NIC". From there a new screen will appear and you can then select which network interface to listen on.

Ntop provides several plug-ins that can be enabled or disabled for further analysis of the traffic. See the plugins page within Ntop for more details.

To view the Ntop page, go to the main cluster webpage and click the "Ntop Cluster Monitoring (SSL)" link.

The following is an Ntop display showing active TCP and UDP sessions connected to a frontend on a private network:

SSH

By default, the OpenSSH daemon is configured to enable X11 forwarding. This can sometimes slow down connecting to nodes. To skip X11 forwarding for a single connection, use the -x option when connecting to a node.

To disable X11 forwarding permanently, edit the /etc/ssh/ssh_config file and change the line ForwardX11 yes to ForwardX11 no.
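The permanent change can also be scripted. The sketch below is one way to do it, assuming GNU sed and a line of the form ForwardX11 yes. The function name disable_x11_forwarding is hypothetical, and the demonstration edits a scratch file rather than /etc/ssh/ssh_config.

```shell
#!/bin/bash
# disable_x11_forwarding FILE: change an active "ForwardX11 yes" line in
# FILE to "ForwardX11 no". Commented lines and other keywords, such as
# ForwardX11Trusted, are left alone.
disable_x11_forwarding() {
    sed -i 's/^\([[:space:]]*ForwardX11\)[[:space:]][[:space:]]*[Yy]es[[:space:]]*$/\1 no/' "$1"
}

# Demonstration against a scratch copy.
cfg=$(mktemp)
printf 'Host *\n    ForwardX11 yes\n    ForwardX11Trusted yes\n' > "$cfg"
disable_x11_forwarding "$cfg"
cat "$cfg"
rm -f "$cfg"
```

After the call, the scratch file contains ForwardX11 no, and the ForwardX11Trusted line is unchanged. Keep a copy of the original file before editing the real configuration.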

An SSH connection from one node to another may be slow in setting up. This is usually because of a name resolution failure, and subsequent timeout. This can occur if the frontend was installed with an invalid DNS server.


Note: this will also slow MPI jobs.

Adding, removing, or upgrading rolls

Platform OCS provides a tool that allows the user to do roll maintenance on their frontend.

Adding a roll

Using the rollops tool, you can add a roll to the frontend. To do this, you need a CD/DVD roll or a downloaded ISO image.

  1. Insert the CD/DVD roll into the drive, or use the -i option of rollops to specify an ISO image.
  2. Run either of the following:
    • For a regular CD/DVD roll:
      # rollops -a
      
    • For an ISO image roll:
      # rollops -a -i isoimage
      

    Example output:

    rollops: Copying Roll: ntop
    Copying roll from media (directory "/tmp/tmprcwC0V") into mirror
    Copying "ntop" (4.1.1,x86_64) roll...
    7645 blocks
    chmod a+rx /home/install/ftp.rocksclusters.org
    Installing Roll: ntop, please wait...
    <Roll installation output>
    rollops: The 'ntop' roll has been successfully installed!
    


    Note: If the CD/DVD roll or the ISO image is a meta-roll (a roll that contains many rolls in one), you will see a list of rolls to install.

    rollops: Autodetecting CD-ROM/DVD roll...
    
    Rolls found
    
    1) clumon
    2) extras
    3) ganglia
    4) lsfhpc
    5) modules
    6) myrinet
    7) ntop
    8) pvfs2
    9) ts_ib
    
    q) Quit
    
    To install a roll, type the number or type "q" to quit>
    
Upgrading a roll
  1. Insert the CD/DVD roll into the drive, or use the -i option of rollops to specify an ISO image.
  2. Run either of the following:
    • For a regular CD/DVD roll:
      # rollops -u
      
    • For an ISO roll:
      # rollops -u -i isoimage
      

    Example output:

    rollops: Copying Roll: dell
    Copying roll from media (directory "/tmp/tmprcwC0V") into mirror
    Copying "dell" (4.1.1,x86_64) roll...
    7645 blocks
    chmod a+rx /home/install/ftp.rocksclusters.org
    Installing Roll: dell, please wait...
    <Roll installation output>
    rollops: The 'dell' roll has been successfully upgraded!
    


    Note: If the CD/DVD roll or the ISO image is a meta-roll (a roll that contains many rolls in one), you will see a list of rolls that can be upgraded.
    You can upgrade to a roll with the same version or a newer version, but you cannot roll back to an older roll.

    rollops: Autodetecting CD-ROM/DVD roll...
    
    Rolls found
    
    1) myrinet
    2) dell
    3) intel_mpirt
    4) ts_ib
    
    q) Quit
    
    To upgrade a roll, type the number or type "q" to quit>
    
Removing a roll

To remove the roll from the frontend, run the following command:

# rollops -e <roll_name>

Example output:

rollops: Removing Roll: 'ntop', please wait...
<Roll removal output>
rollops: The 'ntop' roll has been removed successfully!

Disabling a roll

To prevent a roll from being installed on compute nodes, run the following command:

# rollops -p no -r <roll_name>

Example output:

rollops: Setting permissions for the 'ntop' roll. Please wait...
rollops: Completed updating permissions for the 'ntop' roll.

Adding or removing users

To add a user or delete a user, you must be logged into the frontend as root. After a user is added or removed, 411 automatically updates the user information on all of the nodes in the cluster.

Adding a user

To add a user, run the following command:

# useradd <user_name>

Removing a user

To remove a user, run the following command:

# userdel <user_name>

Firewall/iptables

The frontend is installed with firewalling software (iptables). It is configured with some basic forwarding rules. From a network security standpoint the frontend and nodes are not secure. Evaluate the security risks at your site and create appropriate firewall rules to secure the cluster.


Warning: The frontend should never be connected to the Internet without first restricting the type of packets allowed by customizing the iptables rules.

By default, services are only visible to the private network. However, you may choose to enable HTTP and HTTPS over the public network. Please note that this will expose your cluster homepage and Platform OCS database to the external network. To open HTTP and HTTPS access, edit the /etc/sysconfig/iptables file and uncomment the following lines:

# Uncomment the lines below to activate web access to the cluster.
-A INPUT -m state --state NEW -p tcp --dport https -j ACCEPT
-A INPUT -m state --state NEW -p tcp --dport www -j ACCEPT

Then, restart iptables:

# service iptables restart
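If you prefer to script the edit, the two rules can be uncommented with sed. This is a sketch only. It assumes GNU sed and that the rules are commented out with a leading # character, which may differ on your system. The function name enable_web_access is hypothetical, and the demonstration works on a scratch file rather than /etc/sysconfig/iptables.

```shell
#!/bin/bash
# enable_web_access FILE: remove the leading "#" from the https and www
# ACCEPT rules in FILE. The original file is kept as FILE.bak.
enable_web_access() {
    sed -i.bak \
        -e 's/^#[[:space:]]*\(-A INPUT .*--dport https -j ACCEPT\)[[:space:]]*$/\1/' \
        -e 's/^#[[:space:]]*\(-A INPUT .*--dport www -j ACCEPT\)[[:space:]]*$/\1/' \
        "$1"
}

# Demonstration against a scratch file.
rules=$(mktemp)
printf '%s\n' \
    '# Uncomment the lines below to activate web access to the cluster.' \
    '#-A INPUT -m state --state NEW -p tcp --dport https -j ACCEPT' \
    '#-A INPUT -m state --state NEW -p tcp --dport www -j ACCEPT' > "$rules"
enable_web_access "$rules"
cat "$rules"
rm -f "$rules" "$rules.bak"
```

The explanatory comment line is left in place and only the two rules lose their comment marker. As with a manual edit, the change takes effect only after iptables is restarted.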

For details on customizing your firewall, see http://www.netfilter.org.

The default routing of Platform OCS is to use eth0 for private and eth1 for public traffic.

Platform OCS services and utilities

411

Platform OCS provides a service called 411, which is very similar to NIS and is used to synchronize files across a cluster. The frontend multicasts a change notification, and the nodes then download the file over an encrypted channel. Users and groups are one example of information passed over 411. Whenever you run useradd or userdel, 411 will update the user information on all nodes in the cluster. The diagram below depicts the process.

By default the following files are propagated throughout the cluster by 411:

If you have made any changes to the files listed above, run make -C /var/411 to push the updated files to the cluster.

You can also have compute nodes pull any 411 synchronized files by running 411get --all on the compute node to retrieve all files. To update all compute nodes, run the following command:

# cluster-fork 411get --all

See the Advanced Administration section for how to customize 411.

Rocks-grub

The rocks-grub service is a tool that forces an appliance such as a compute node to reinstall if the node is powered off incorrectly, such as a power outage. If the service is turned ON, the node will be reinst