HCL cluster/heterogeneous.ucd.ie install log
- Basic installation of Debian Squeeze
Networking
Interfaces
- Edit /etc/network/interfaces. Note that at some point eth1 should be configured by DHCP; it is on the UCD LAN and must be registered correctly (update the MAC address with services). eth0 is the internal network.
# The loopback network interface
auto lo eth0 eth1
iface lo inet loopback
# The primary network interface
allow-hotplug eth0
iface eth0 inet static
address 192.168.21.254
netmask 255.255.255.0
gateway 192.168.21.1
iface eth1 inet static
address 193.1.132.124
netmask 255.255.252.0
gateway 193.1.132.1
- Install the non-free Linux firmware for the network interface (eth0). This should allow Gigabit operation on eth0 with the tg3 hardware (I think). Edit /etc/apt/sources.list to include the lines:
deb http://ftp.ie.debian.org/debian/ squeeze main contrib non-free
deb-src http://ftp.ie.debian.org/debian/ squeeze main contrib non-free
- Install firmware-linux:
apt-get update && apt-get install firmware-linux
DNS / BIND
We will run our own DNS server for the cluster. First set resolv.conf:
nameserver 127.0.0.1
nameserver 137.43.116.19
nameserver 137.43.116.17
nameserver 137.43.105.22
domain ucd.ie
search ucd.ie
Now install bind9 (apt-get install bind9). Edit /etc/bind/named.conf.local and set the domain zones for the cluster (forward and reverse). We have two subnets for which reverse lookups have to be specified: 192.168.20 and 192.168.21.
//
// Do any local configuration here
//
// Consider adding the 1918 zones here, if they are not used in your
// organization
//include "/etc/bind/zones.rfc1918";
include "/etc/bind/rndc.key";
controls {
inet 127.0.0.1 allow { localhost; } keys { "rndc-key"; };
};
zone "heterogeneous.ucd.ie" {
type master;
file "db.heterogeneous.ucd.ie";
};
zone "21.168.192.in-addr.arpa" {
type master;
file "db.192.168.21";
};
zone "20.168.192.in-addr.arpa" {
type master;
file "db.192.168.20";
};
Also edit the options file, /etc/bind/named.conf.options. Note the subnet we define in the allow sections, 192.168.20.0/23: it permits access from 192.168.20.* and 192.168.21.* addresses.
options {
directory "/var/cache/bind";
// If there is a firewall between you and nameservers you want
// to talk to, you may need to fix the firewall to allow multiple
// ports to talk. See http://www.kb.cert.org/vuls/id/800113
// If your ISP provided one or more IP addresses for stable
// nameservers, you probably want to use them as forwarders.
// Uncomment the following block, and insert the addresses replacing
// the all-0's placeholder.
forwarders {
137.43.116.19;
137.43.116.17;
137.43.105.22;
};
recursion yes;
version "REFUSED";
allow-recursion {
127.0.0.1;
192.168.20.0/23;
};
allow-query {
127.0.0.1;
192.168.20.0/23;
};
auth-nxdomain no; # conform to RFC1035
listen-on-v6 { any; };
};
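To see why a single /23 entry covers both cluster subnets, the membership test can be worked through in shell arithmetic. This is only an illustrative sketch; the helper names ip_to_int and in_subnet are made up for this example and are not part of BIND.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return 0 if address $1 lies inside network $2 with prefix length $3.
in_subnet() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

for ip in 192.168.20.1 192.168.21.254 192.168.22.1; do
    if in_subnet "$ip" 192.168.20.0 23; then
        echo "$ip is inside 192.168.20.0/23"
    else
        echo "$ip is outside 192.168.20.0/23"
    fi
done
```

A /23 mask leaves the lowest bit of the third octet free, so 20 and 21 fall in the same network while 22 does not.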
Now work on the zone files specified above: db.heterogeneous.ucd.ie and the reverse maps db.192.168.20 and db.192.168.21. Populate them with all nodes of the cluster.
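Because named.conf.options sets directory "/var/cache/bind", the relative zone file names are resolved there. Typing records for every node by hand is error-prone, so a small generator may help. The following is a rough sketch, assuming the two test nodes and addresses from the DHCP section of this log and assuming the nodes are named under heterogeneous.ucd.ie; the function names are hypothetical. It prints only the A and PTR records; each zone file still needs its own SOA and NS records.

```shell
#!/bin/sh
# Host table: "name address", taken from the DHCP entries in this log.
hosts="hcl03 192.168.21.5
hcl07 192.168.21.9"

# Print A records for the forward zone (names relative to the zone origin).
forward_records() {
    echo "$hosts" | while read -r name addr; do
        printf '%s\tIN\tA\t%s\n' "$name" "$addr"
    done
}

# Print PTR records for one reverse zone, e.g. reverse_records 192.168.21
reverse_records() {
    prefix=$1
    echo "$hosts" | while read -r name addr; do
        case $addr in
            "$prefix".*)
                # The record owner is the last octet of the address.
                printf '%s\tIN\tPTR\t%s.heterogeneous.ucd.ie.\n' "${addr##*.}" "$name"
                ;;
        esac
    done
}

forward_records
reverse_records 192.168.21
```

The output can be pasted below the SOA and NS records of the matching zone file, and the result checked with named-checkzone before reloading BIND.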
IP Tables
- Set up iptables. We want to implement NAT between the internal network (eth0) and the external one (eth1). Add a script named 00iptables to the /etc/network/if-up.d directory. All scripts in this directory are executed after network interfaces are brought up, so the rules will persist across reboots:
#!/bin/sh
PATH=/usr/sbin:/sbin:/bin:/usr/bin
IF_INT=eth0
IF_EXT=eth1
#
# delete all existing rules.
#
iptables -F
iptables -t nat -F
iptables -t mangle -F
iptables -X
# Always accept loopback traffic
iptables -A INPUT -i lo -j ACCEPT
# Allow established connections, and those not coming from the outside
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -m state --state NEW ! -i $IF_EXT -j ACCEPT
iptables -A FORWARD -i $IF_EXT -o $IF_INT -m state --state ESTABLISHED,RELATED -j ACCEPT
# Allow outgoing connections from the LAN side.
iptables -A FORWARD -i $IF_INT -o $IF_EXT -j ACCEPT
# Masquerade.
iptables -t nat -A POSTROUTING -o $IF_EXT -j MASQUERADE
# Don't forward from the outside to the inside.
iptables -A FORWARD -i $IF_EXT -o $IF_INT -j REJECT
# Enable routing.
echo 1 > /proc/sys/net/ipv4/ip_forward
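run-parts only executes files that are marked executable, so the script needs chmod +x /etc/network/if-up.d/00iptables. ifupdown also runs every script in if-up.d once for each interface that comes up, exporting the interface name in IFACE. A guard along the following lines (an assumption, not part of the original script; is_external_up is a made-up helper name) would limit the work to the external interface.

```shell
#!/bin/sh
# Sketch of a guard for the top of 00iptables (an assumption, not in the
# original script): ifupdown exports IFACE to if-up.d scripts.
IF_EXT=eth1

# Return 0 only when the interface that just came up is the external one.
is_external_up() {
    [ "${IFACE:-}" = "$IF_EXT" ]
}

if is_external_up; then
    echo "external interface $IFACE is up, loading firewall rules"
    # The iptables commands from the script would run here.
else
    echo "skipping firewall setup for interface ${IFACE:-unknown}"
fi
```

Since the script flushes all rules before adding them again, running it for every interface is harmless; the guard only avoids repeated work.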
Clonezilla
Firstly, Clonezilla is probably going to pollute a lot of your server configuration when it sets itself up. Be prepared to lose your iptables configuration, NFS (if any) and DHCP settings, and maybe more.
- Follow the Clonezilla installation guide. Essentially:
- add repository key
wget -q http://drbl.sourceforge.net/GPG-KEY-DRBL -O- | apt-key add -
- add the line to /etc/apt/sources.list:
deb http://drbl.sourceforge.net/drbl-core drbl stable
- run:
apt-get update && apt-get install drbl && /opt/drbl/sbin/drbl4imp
- accept default options to drbl4imp.
- After Clonezilla has installed, edit /etc/dhcpd3/dhcpd.conf, adding all entries for the test nodes hcl07 and hcl03. Also ensure these nodes have been removed from the existing heterogeneous.ucd.ie server so that they are only served by one machine.
default-lease-time 300;
max-lease-time 300;
option subnet-mask 255.255.255.0;
option domain-name-servers 137.43.116.19,137.43.116.17,137.43.105.22;
option domain-name "ucd.ie";
ddns-update-style none; # brett had ad-hoc ...?
server-name drbl;
filename = "pxelinux.0";
subnet 192.168.21.0 netmask 255.255.255.0 {
option subnet-mask 255.255.255.0;
option routers 192.168.21.1;
next-server 192.168.21.254;
pool {
# allow members of "DRBL-Client";
range 192.168.21.200 192.168.21.212;
}
host hcl03 {
option host-name "hcl03.ucd.ie";
hardware ethernet 00:14:22:0A:22:6C;
fixed-address 192.168.21.5;
}
host hcl03_eth1 {
option host-name "hcl03_eth1.ucd.ie";
hardware ethernet 00:14:22:0A:22:6D;
fixed-address 192.168.21.105;
}
host hcl07 {
option host-name "hcl07.ucd.ie";
hardware ethernet 00:14:22:0A:20:E2;
fixed-address 192.168.21.9;
}
host hcl07_eth1 {
option host-name "hcl07_eth1.ucd.ie";
hardware ethernet 00:14:22:0A:20:E3;
fixed-address 192.168.21.109;
}
default-lease-time 21600;
max-lease-time 43200;
}
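When more nodes are added, the host entries could be generated from a table instead of being typed by hand. The following is a minimal sketch under that assumption; host_stanza is a hypothetical helper, and the names, MAC addresses and IP addresses are the ones from the configuration in this log.

```shell
#!/bin/sh
# Print one dhcpd.conf host stanza: host_stanza NAME MAC ADDRESS
host_stanza() {
    printf 'host %s {\n' "$1"
    printf '    option host-name "%s.ucd.ie";\n' "$1"
    printf '    hardware ethernet %s;\n' "$2"
    printf '    fixed-address %s;\n' "$3"
    printf '}\n'
}

# Node table: name, MAC and address, copied from the entries in this log.
while read -r name mac addr; do
    host_stanza "$name" "$mac" "$addr"
done <<EOF
hcl03 00:14:22:0A:22:6C 192.168.21.5
hcl03_eth1 00:14:22:0A:22:6D 192.168.21.105
hcl07 00:14:22:0A:20:E2 192.168.21.9
hcl07_eth1 00:14:22:0A:20:E3 192.168.21.109
EOF
```

The generated stanzas belong inside the subnet block, next to the pool declaration.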
Install DHCP
Install the DHCP server package with apt-get install dhcp3-server. When you install Clonezilla it will probably pollute your DHCP server setup, so make
Install NIS
Copy the users from the passwd, group and shadow files in /etc on hcl01.
Install nis.
Edit /etc/defaultdomain so that it contains:
heterogeneous.ucd.ie
Edit /etc/default/nis so that it contains:
# Are we a NIS server and if so what kind (values: false, slave, master)
NISSERVER=master
Edit /etc/ypserv.securenets so that it contains:
# allow connects from local
255.0.0.0 127.0.0.0
# allow connections from heterogeneous subnets .20 and .21
255.255.254.0 192.168.20.0
The NIS host is also a client of itself, so do the client set up as follows:
Edit /etc/hosts and ensure the NIS master is listed:
192.168.21.254 heterogeneous.ucd.ie heterogeneous
Edit /etc/yp.conf and ensure that it contains:
domain heterogeneous.ucd.ie server localhost
Edit /etc/passwd, adding a line to the end that reads +::::::. Edit /etc/group, adding the line +::: at the end.
The NIS Makefile will not export user IDs and group IDs that are lower than a certain value; we must set this to 500 in /var/yp/Makefile:
MINUID=500
MINGID=500
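To preview which accounts pass the MINUID threshold, the passwd file can be filtered on the UID field. This sketch only approximates what the NIS Makefile does (the real rules may apply further filtering), and exported_users is a made-up helper name.

```shell
#!/bin/sh
# List the accounts in a passwd-format file whose UID is at least MINUID.
MINUID=500

exported_users() {
    # Skip comment lines and NIS "+"/"-" entries, then compare field 3 (UID).
    awk -F: -v min="$MINUID" '!/^[-+#]/ && ($3 + 0) >= (min + 0) { print $1 }' "$1"
}

exported_users /etc/passwd
```

Running the same filter on field 3 of /etc/group with MINGID gives the corresponding list of groups.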
Start the ypbind and yppasswd services. Then initialise the NIS database:
/usr/lib/yp/ypinit -m
Accept defaults at prompts.
Now start the other NIS services:
service nis start
Installing Ganglia Frontend
Install the packages gmetad and ganglia-webfrontend.
Configure the front end by appending the following to /etc/apache2/apache2.conf:
Include /etc/ganglia-webfrontend/apache.conf
Configure gmetad by adding the following line to /etc/ganglia/gmetad.conf:
data_source "HCL Cluster" 192.168.20.1 192.168.20.16
This means that the gmetad collector connects to hcl01 and hcl16 on the .20 subnet to gather data for the frontend to use.
After all packages are configured execute:
service apache2 restart
service gmetad restart
Pointing your browser at the Ganglia web frontend should then display the monitoring page for HCL Cluster. gmond must also be installed and configured on the cluster nodes.
Hardware Monitoring & Backup
Disk Monitoring
Install smartmontools. Briefly:
apt-get install smartmontools
Edit /etc/default/smartmontools so that it contains:
# List of devices you want to explicitly enable S.M.A.R.T. for
# Not needed (and not recommended) if the device is monitored by smartd
enable_smart="/dev/sda"
# uncomment to start smartd on system startup
start_smartd=yes
# uncomment to pass additional options to smartd on startup
smartd_opts="--interval=1800"
Open /etc/smartd.conf and edit the first line that begins with DEVICESCAN (all lines after the first instance of DEVICESCAN are ignored). Have it read something like:
DEVICESCAN -d removable -n standby -m root -m robert_higgins@iol.ie -M exec /usr/share/smartmontools/smartd-runner
Then start the service:
/etc/init.d/smartmontools start
Note: consider installing this on all nodes, as it would be useful to have advance notice of any failing disks.
Torque - PBS
Allow all users to see all queued jobs:
qmgr -c 'set server query_other_jobs=TRUE'