Contents:
- Architectural Overview of SciDB
Architectural Overview of SciDB
This page is an overview of the SciDB architecture. It contains a high level description of the engine's features and functionality, a description of how these features is implemented, and breaks the code base down into it's various modules and the API's between them. This is intended to be the first point of reference for new users and developers.
We begin with an overview of SciDB; our system's motivations and design goals. In brief, SciDB is a data management system (DBMS) and massively parallel computational platform intended to be used for the management and analysis in large scientific applications. In Section 1 we present an overview of SciDB's architecture, showing how the various moving parts are organized. SciDB is an example of a shared-nothing architecture, consisting of a large number of worker nodes where data is stored, and local computation is performed. In Section 2 we briefly describe our client APIs; a C/C++ library used to implement our command line tool iquery, and a Python API. In Section 3 we introduce our data model and query language(s). Both of these features are described in more detail on other pages.
Starting in Section 4 we present a more detailed review of the internals of SciDB. We describe the internal modules of the SciDB engine, and describe how each works. We begin by reviewing how a SciDB engine starts up; how it gets configuration parameters, etc. Then we present a detailed walk-through showing how SciDB processes a user query.
Finally we present a detailed call-graph. The motivation for this section is to provide new SciDB engine developers with a code-level road-map.
1. Introduction
SciDB is a database management system (DBMS) and parallel computation platform that provides users with an array data model and declarative query language(s). The following figure illustrates what a running SciDB installation will look like.
Figure 1. SciDB Installation Architecture
Key to Figure 1:
| 1 | SciDB is a client/server system. External applications connect to a SciDB Coordinator Node on a well-known host, at a well known port number. External applications send queries to the SciDB installation, and receive data. SciDB supports both C/C++ and Python client APIs. |
| 2 | Each SciDB client application connects to a SciDB Coordinator Node. All SciDB engine instances--Coordinator and Worker nodes--are simply instances of the same SciDB Engine image running as a daemon process. The Coordinator Node to which the user connects is responsible for query parsing, query planning, coordinating the work being performed by the set of Worker Nodes, and communicating query results (and/or errors) back to the external client application. |
| 3 | Currently SciDB uses a PostgreSQL instance to manager metadata about the overall SciDB installation. The PostgreSQL catalogs contain information about the set of SciDB nodes, the arrays being stored in the system, the set of operators supported in the query language(s), and the external libraries which implement user-defined extensions to the SciDB core. Note that all SciDB instances--Coordinator and Worker nodes--connect to PostgreSQL, which manages all global (installation-wide) meta-data. |
| 4 | The actual storage and computational workload is performed by a (possibly quite large) number of SciDB Worker nodes. Each Worker node is a completely independent instance of SciDB, responsible for local storage management, and query execution. Queries which engage more than one Worker node typically require that the SciDB Coordinator schedule data movement between Worker nodes. |
| 5 | Each SciDB node--either a Coordinator node, or a Worker node--is a multi-threaded server process. It listens on a (well known) port for instructions (from the Coordinator node) or data (from another Worker node). Locally, each Worker node's engine applies localized operations to the data in it's local store. |
| 6 | SciDB Worker nodes each manage a portion of the data that makes up the arrays which are the primary organizational unit of the SciDB data model. SciDB divides a single logical array into independent partitions (sub-arrays) and distributes the partitions over the available Worker nodes. Part of the task of the Persistent System Catalog facility (PostgreSQL) is to maintain information about the mapping between the logical array's components and how they correspond to the physical data stored on each Worker node. |
The general pattern in a SciDB instance is to have a network file-system containing the SciDB images--engine, client libraries, etc--available on all nodes in the same filesystem location. Then on a per-node basis, each Worker node is assigned its own storage area. On startup (which as we will see is controlled centrally), on each of the physical computers over which SciDB is to run, a (number of) SciDB engine processes are initiated. Each reads their configuration from a well known location (on the networked filesystem). Each connected to the PostgreSQL persistent system catalog facility to register their presence.
2. Client Interface
SciDB follows a simple client/server application connectivity model. Applications are programs written in languages like C/C++ or Python. They run as independent programs, and connect to the SciDB engine, which meets the application's data management and manipulation needs. That is, SciDB is not an embedded system.
3. Query Language(s)
SciDB provides users with an Array data model. The basic idea is that data in a SciDB DBMS is organised around rectangular arrays. To manipulate data stored in SciDB arrays, users can employ one of two query languages; either the Array Functional Language (AFL) or the Array Query Language (AQL).
3.1 Logical Data Model
Each array has:
- A name, which uniquely identifies the array in the SciDB installation.
- A list of dimensions. The number of dimensions is called the rank of the array, and the number of possible values in each dimensions together with the start and end points are called the size of the array.
- A set of attributes, each of which consists of a name, a data type and some constraints.
For example:
CREATE ARRAY Raw_Load_Array_A < label : string, seconds : int64, value : double > [ Offset=0:99,100,0 ]; CREATE ARRAY A < value : double > [ label(string)=20,20,0, seconds=0:4,5,0 ]; CREATE ARRAY Night_Sky < Red : double, Green : double, Blue : double > [ RA=0:180000000,18000,180, DECL=0:360000000,36000,360 ];
3.2 Array Functional Language - AFL
AFL is a functional language. An AFL expression is an "tree" of operators which transform their inputs (typically an array or a pair of arrays) into another array of some other form.
3.3 Array Query Language - AQL
AQL is a declarative language based on SQL--which is 'inter-galactic database speak' in the Relational DBMS world.
4. Engine Startup
SciDB is an example of a massively parallel DBMS engine. This means that each instance of the SciDB server process runs independently of all of the other instances. The collection of server instances communicates using i. the persistent system catalog facilities, and ii. by exchanging messages that contain data and instructions directly.
5. Query Planner and Optimizer
6. Query Execution
6.1 Distributed Query Execution
6.2 Local Executor Design
7. Storage Manager
7.1 Distributed Storage Management
7.2 Local Storage Management
8. Call Graph
In this section we present a "call graph" overview of the SciDB engine. The idea is to illustrate the relationships between the principle components of SciDB.
Startup and Initialisation of a SciDB Engine
| main() | |||
| Config() | Get the instance's configuration details from the command line, environment variables, and configuration files, and register them. The Config object provides a API for setting and getting these per-instance parameters. The Config object is a Singleton. If at any point you want to access one of the global Singleton objects in SciDB (for example, the TypeLibrary, or more generally any Singleton of class T ) you would use T *cfg = T::getInstance(); | ||
| log4cxx... | Configure and create event loggers. | ||
| ? deamon() | SciDB engine (by default) runs as a daemon (background) process. | ||
| NetworkManager()->run() | Register the instance engine with the global meta-data (the Postgres DBMS). Then set up the engine to listen on a well known port (1239) and allocate a pool of threads to handle tasks as they arrive. The NetworkManager organises a collection of objects to listens on its configured port(s), and schedules Jobs to Threads in response to messages either from clients (iquery) or from other SciDB instances. | ||
| Config() | Get network configuration details; port, etc. | ||
| SystemCatalog() | Establish connection from this instance to Postgres DBMS catalog manager. The SystemCatalog object is another Singleton. It manages information about the SciDB cluster, arrays, and library modules. | ||
| StorageManager() | Initialize the (local) Storage Manager. The StorageManager object is another Singleton. If this instance of SciDB is being re-started after being stopped, then the StorageManager contains information about the instance, such as the NodeID. Open the local instance's storage.header file and populate the StorageManager with vector<Segments> describing each data file. If the header does not exist (because this is the first time the instance has started) then the StorageManager create it and initialise the first storage.data file. Each instance's storage management consists of a header file named storage.header, and a (collection of) data files named storage.dataN. The directory location of these (per-instance) files is configured in config.ini, and is kept in the global Config object. | ||
| (Optional) SystemCatalog::addNode() | Register this node with the Postgres system catalog. Initially a local instance doesn't know its own NodeID. The return result of the SystemCatalog::addNode() call is the instance's NodeID, assigned by the Postgres DBMS, and subsequently written to the instance's disk through the StorageManager. | ||
| PluginManager() | Initialise the instance's plugin manager. The PluginManager() tracks the status of the shared libraries containing user-defined types, functions and operators. Internally, this initialisation interrogates the system catalogs for a list of library names, and tries to load each of them. | ||
| NetworkManager::startAccept() | Initialize the asynchronous IO service (asio) handlers which associate a Connector object with the socket. Creates a Connection object, and then instructs SciDB to invoke NetworkManager::handleAccept(), passing it the Connection object and some context when a client application connects to the SciDB engine. | ||
| boost::asio::io_service.run() | Having created the handler for external messages, wait for one The SciDB engine suspends, waiting for external messages. At this point the SciDB engine has completed it's startup and initialisation. | ||
When an external process connects to the socket on which the SciDB instance is listening, the boost::asio::io_service facilities call back to NetworkManager::handleAccept(). This method handles any errors and exceptions generated by the boost::asio facilities before passing the message on to be handled by a SciDB Connection. Once the message has been accepted, parsed and handled, SciDB goes back to listening on the socket (or to handling the next message).
Message Parsing and MessageDesc
SciDB processes--including the client library--communicate by passing messages over TCP/IP connections. The base class for message passing is the MessageDesc class. Different flavors of messages convey things like status and control messages, query plans, chunks of data, and so on. When a SciDB instance receives a message (when a client connects to a SciDB coordinator node, or when a coordinator node sends a message to a worker node, or a worker node sends a message or data to another worker node) the first thing the SciDB instance does is to invoke the Connection object associated with the IO handler to create a MessageDesc object and populate it.
| Connection::start() | Initialize message handling. | ||
| Connection::readMessage() | Read the message header from the asio handler, and put the message data into a MessageDesc object. Within this class, there are a series of nested calls to read the contents of the message: its header, the message body, and (if necessary) whatever binary data is associated with the message. | ||
| NetworkManager::handleMessage() | Having populated the MessageDesc object, call back to the NetworkManager with instructions to handle it. |
Message Handling (Where the Real Work Begins)
The MessageDesc class encapsulates the various kinds of messages SciDB processes can send to one another.
| NetworkManager::handleMessage() | Create the appropriate MessageHandlerJob object for SciDB to assign to a thread, and then push the new job into the NetworkManager JobQueue. |
Attachments
-
Slide1.jpg
(86.3 KB) -
added by plumber 14 months ago.
SciDB Architectural Picture - as of Version 1.0
