SciDB on a cluster.
SciDB is designed to run on a shared-nothing cluster of commodity servers, each with local storage (for instance in a JBOD configuration), interconnected by an Ethernet network. One server in the cluster is selected to be the 'coordinator'; the others are 'workers' that participate in data storage and query processing. SciDB currently supports a single coordinator node. The coordinator is where external client applications connect, and it is responsible for parsing, planning and scheduling query operations over the collection of SciDB worker nodes.
Our software uses a kind of distributed file system internally, except that the 'files' are logical 'arrays' whose pieces are distributed evenly over the physical nodes. To keep track of all of this information (which nodes are where, how data is partitioned, and even which physical operators each instance has available), we use a centralized PostgreSQL DBMS.
In a single-node SciDB setup, the same node runs as both coordinator and worker.
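To make "distributed evenly" concrete, here is a minimal sketch that assigns numbered chunks to nodes in round-robin order. Round-robin is an assumption chosen purely for illustration; this page does not say which placement policy SciDB actually uses, and `assign_chunks` is a hypothetical helper, not part of SciDB.

```shell
#!/bin/sh
# Illustration only: round-robin placement is an assumption, not SciDB's
# documented policy. assign_chunks is a hypothetical helper.
# Usage: assign_chunks NUM_NODES NUM_CHUNKS
assign_chunks() {
    nodes=$1
    chunks=$2
    i=0
    while [ "$i" -lt "$chunks" ]; do
        # Chunk i goes to node (i mod nodes); node 0 stands for the coordinator.
        echo "chunk $i -> node $((i % nodes))"
        i=$((i + 1))
    done
}

# Four nodes (one coordinator plus three workers) and eight chunks:
assign_chunks 4 8
```

With four nodes and eight chunks, every node ends up holding two chunks, which is the kind of balance the coordinator needs in order to spread query work across the workers.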
Installing SciDB
Install SciDB on the coordinator as described in the SciDB Quick Install Guide.
Prepare the cluster
- Create a user account, say scidb on the cluster.
- Configure password-less ssh logins from the coordinator (the 'master') into each of the nodes for this account.
- Get sudo privileges for this account. Configuring SciDB requires a PostgreSQL instance with a database in it to hold the SciDB catalog, and sudo privileges are needed to create a postgres account for SciDB and to initialize the catalog database.
- Check that postgres is running on the coordinator.
sudo /etc/init.d/postgresql-8.4 status
sudo /etc/init.d/postgresql-8.4 start
If you cannot obtain sudo privileges for the scidb account, ask your system administrator to run this script as the 'postgres' user:
/opt/scidb-0.7.5/bin/scidb-prepare-db.sh
This script creates a new role or account (scidb_user1) with a password (scidb_passwd1) and a database for testing SciDB (say test1 or cluster1).
Additionally, postgres may need to be configured to accept connections from remote worker nodes.
- Install and configure pdsh on the master node. pdsh is a Linux utility to execute commands on the cluster and gather up the results.
sudo apt-get install pdsh
# Configure pdsh by setting this in your .bashrc:
export PDSH_RCMD_TYPE=ssh
pdsh -w node[2-4] "uptime"
- Export /opt via NFS or Samba from the master to the worker nodes, and mount it on each worker. This file system is used to share binaries, scripts and configuration files between the coordinator and workers, so it must be available as /opt on every node in the cluster.
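The ssh and shared-file-system steps above can be sketched as a small pre-flight script. This is one possible approach, not an official SciDB tool: `SCIDB_USER`, `WORKERS`, `DRY_RUN` and the function names are hypothetical, and the addresses simply mirror the cluster1 example. It relies only on the standard `ssh-keygen`, `ssh-copy-id` and `pdsh -w` commands, and by default it prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only. SCIDB_USER, WORKERS and the helper names are hypothetical;
# the addresses mirror the worker_ip values of the cluster1 example.
SCIDB_USER=${SCIDB_USER:-scidb}
WORKERS=${WORKERS:-"10.0.0.2 10.0.0.3 10.0.0.4"}
DRY_RUN=${DRY_RUN:-1}          # 1 = only print the commands, 0 = execute them
export PDSH_RCMD_TYPE=ssh      # make pdsh use ssh

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create a key pair on the coordinator (if missing) and copy the public key
# to each worker so that later ssh and pdsh calls need no password.
setup_ssh_keys() {
    [ -f "$HOME/.ssh/id_rsa" ] || run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
    for host in $WORKERS; do
        run ssh-copy-id "$SCIDB_USER@$host"
    done
}

# Confirm that every worker sees the shared /opt tree.
check_shared_opt() {
    hosts=$(echo "$WORKERS" | tr ' ' ',')
    run pdsh -w "$hosts" "ls -d /opt/scidb-0.7.5"
}

setup_ssh_keys
check_shared_opt
```

Run it from the coordinator as the scidb user. Review the printed commands first, then repeat with `DRY_RUN=0` to execute them.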
Configuring SciDB
- Edit the config file /opt/scidb-0.7.5/etc/config.ini for your cluster. Two canned config files are provided for testing. config.ini is a generic config file that you must customize for your environment. config.ini.planet is a config file for the SciDB planet cluster.
[cluster1]
master_ip=10.0.0.1
worker_ip=10.0.0.2,10.0.0.3,10.0.0.4
db_user=scidb_user1
db_passwd=scidb_passwd1
install_root=/opt/scidb-0.7.5
metadata=/opt/scidb-0.7.5/share/scidb/meta.sql
pluginsdir=/opt/scidb-0.7.5/lib/scidb/plugins
logconf=/opt/scidb-0.7.5/share/scidb/log4cxx.properties
master_data_dir=/mnt/master
worker_data_dir1=/mnt/slave
master_port=1340
interface=eth0
The config file format is described in the SciDB Quick Install Guide.
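When scripting around the cluster it can help to read values back out of config.ini rather than hard-coding them. The sketch below assumes the file consists of a `[section]` header followed by plain `key=value` lines, as in the cluster1 example; `ini_get` is a hypothetical helper, not something SciDB ships.

```shell
#!/bin/sh
# ini_get FILE SECTION KEY
# Hypothetical helper: print the value of KEY from the [SECTION] block of an
# ini-style file such as config.ini. Assumes plain key=value lines.
ini_get() {
    awk -F= -v section="[$2]" -v key="$3" '
        # A line starting with "[" opens a new section.
        /^\[/ { in_section = ($0 == section); next }
        in_section && $1 == key {
            sub(/^[^=]*=/, "")    # keep everything after the first "="
            print
            exit
        }
    ' "$1"
}

# Demonstrate on a throwaway copy of part of the cluster1 example.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
[cluster1]
master_ip=10.0.0.1
worker_ip=10.0.0.2,10.0.0.3,10.0.0.4
master_port=1340
EOF

echo "coordinator: $(ini_get "$cfg" cluster1 master_ip)"
echo "port:        $(ini_get "$cfg" cluster1 master_port)"
rm -f "$cfg"
```

Against the real file the call would look like `ini_get /opt/scidb-0.7.5/etc/config.ini cluster1 master_port`.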
Launch SciDB
The command scidb-0.7.5 allows a SciDB user to initialize, start, stop and monitor nodes in the SciDB cluster. This command is run from the coordinator.
- Add these to your path.
export PATH=/opt/scidb-0.7.5/bin:/opt/scidb-0.7.5/share/scidb:$PATH
export LD_LIBRARY_PATH=/opt/scidb-0.7.5/lib:$LD_LIBRARY_PATH
- Initialize the master node using 'minit'. Then initialize the worker nodes using 'winit'.
scidb-0.7.5 minit cluster1
scidb-0.7.5 winit cluster1
- Start all nodes, master and workers. 'mstart' starts the master. 'wstart' starts all workers in the cluster.
scidb-0.7.5 mstart cluster1
scidb-0.7.5 wstart cluster1
- Get status of the SciDB service instance. Note that this only reads the metadata in the catalog. It does not indicate if the processes are currently alive.
scidb-0.7.5 status cluster1
- Stop all nodes, master and workers using 'mstop' and 'wstop'.
scidb-0.7.5 mstop cluster1
scidb-0.7.5 wstop cluster1
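Because each phase is a master command followed by a worker command, the pairs are easy to wrap so they always run in the order listed above. The following is a minimal sketch: `scidb_cluster` and `DRY_RUN` are hypothetical names, and only the subcommands shown on this page are used. It prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical wrapper around the scidb-0.7.5 subcommands shown above.
# By default it only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

# Usage: scidb_cluster init|start|stop CLUSTER
scidb_cluster() {
    action=$1
    cluster=$2
    case "$action" in
        init)  steps="minit winit" ;;
        start) steps="mstart wstart" ;;
        stop)  steps="mstop wstop" ;;
        *)
            echo "usage: scidb_cluster init|start|stop CLUSTER" >&2
            return 2
            ;;
    esac
    for step in $steps; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "+ scidb-0.7.5 $step $cluster"
        else
            # Stop at the first failing step so workers are not started
            # against a master that did not come up.
            scidb-0.7.5 "$step" "$cluster" || return 1
        fi
    done
}

scidb_cluster start cluster1
```

A full first-time bring-up would then be `scidb_cluster init cluster1` followed by `scidb_cluster start cluster1`.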
Running queries
- As before, test by running iquery.
iquery -aq "list('arrays')"
By default iquery contacts SciDB on the standard port number (1239). If you used a non-default port number, as in the cluster1 example above, use:
iquery -p 1340 -aq "list('arrays')"
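Since the status command only reads catalog metadata, a trivial query is a practical way to confirm that the coordinator is actually answering. The sketch below wraps the two iquery invocations above in one helper with an optional port. `scidb_query` is a hypothetical name, and the script assumes, without confirmation from this page, that iquery exits with a non-zero status when it cannot reach SciDB.

```shell
#!/bin/sh
# scidb_query QUERY [PORT]
# Hypothetical helper: run a query through iquery using only the -p and -aq
# flags shown above. PORT defaults to the standard 1239.
scidb_query() {
    iquery -p "${2:-1239}" -aq "$1"
}

# Only run the smoke test where iquery is installed.
if command -v iquery >/dev/null 2>&1; then
    # Assumption: iquery returns non-zero when the coordinator is unreachable.
    if scidb_query "list('arrays')" 1340; then
        echo "coordinator answered on port 1340"
    else
        echo "no answer on port 1340" >&2
    fi
fi
```

Omit the second argument to use the default port, for example `scidb_query "list('arrays')"`.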
Congratulations! You are done! Other AFL and AQL commands are described in the SciDB documentation.