SciDB Client API for Python
SciDB uses an ODBC/JDBC-like interface to connect to the SciDB server and execute commands. This interface is available from multiple programming languages; this page documents the Python version of the SciDB API.
To get access to the API, add the following to your Python file:
import sys
sys.path.append('/opt/scidb/11.06/lib') # or location appropriate to your installed version
import scidbapi as scidb
This imports the scidbapi module (scidbapi.py), which defines the Python interface to SciDB. That interface is built on several Python and C++ libraries:
- libscidbpython.py, generated by the SWIG compiler, contains Python classes that act as proxies for C++ classes.
- _libscidbpython.so, also generated by SWIG, performs part of the conversion between the Python API and C++.
- libscidbclient.so implements the C++ remote client library for SciDB.
You can then list information about the scidbapi module with the Python statement:
help(scidb)
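If scripts must run against more than one SciDB installation, the path setup above can be wrapped in a small helper. This is a minimal sketch. The names add_scidb_lib_path and try_import_scidbapi are hypothetical, and the /opt/scidb/<version>/lib layout is assumed from the example path above.

```python
import importlib
import os
import sys


def add_scidb_lib_path(version="11.06", prefix="/opt/scidb"):
    """Append <prefix>/<version>/lib to sys.path once and return that path.

    Hypothetical helper; the directory layout is assumed from the
    /opt/scidb/11.06/lib example, so adjust prefix/version as needed.
    """
    lib_dir = os.path.join(prefix, version, "lib")
    if lib_dir not in sys.path:  # avoid duplicate entries on repeated calls
        sys.path.append(lib_dir)
    return lib_dir


def try_import_scidbapi():
    """Return the scidbapi module, or None if it cannot be imported."""
    try:
        return importlib.import_module("scidbapi")
    except ImportError:  # module or one of its shared libraries not found
        return None
```

For example, call add_scidb_lib_path("11.06") and then try_import_scidbapi(). A None result means the library directory or its shared libraries could not be found.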
Example Python Application
Two sample Python applications are provided in the /opt/scidb/11.06/shared/scidb/examples/python directory of a server installation. They are also located in the src/capi/pythonexamples directory of the SciDB sources, which are available to registered SciDB developers. The example files are:
- README
- sample.py is a program that creates and loads an array, executes a select AQL statement, and drops the array.
- sample2.py reads a list of AFL/AQL queries from one or more files and executes them. This example shows the use of additional data types and of the empty flag for queries with filter predicates.
- simplearray.data is read by sample.py
- sample2.csv is read by sample2.py
NOTE: You may find other contributed examples in src/capi/pythonexamples. However, they are not maintained for correctness as carefully as sample.py and sample2.py, and they may be written against older versions of the API.
NOTE: The Python API will probably move out of the capi directory into its own directory in the source tree in the near future.
Example: Connect and Execute a Simple Query
You connect to SciDB using connect() and execute queries using executeQuery(). connect() takes the server address and the port number of the SciDB coordinator.
db = scidb.connect("localhost", 1239) # connect to the SciDB coordinator.
result = db.executeQuery("drop array simplearray", 'aql') # execute an AQL query
You then iterate over result to obtain the result data. See the section on Array and Chunk Iterators, below.
Create and Load Queries
executeQuery (statement, type, result, handle)
| Arg | Description |
|---|---|
| statement | Valid AQL or AFL statement. |
| type | scidb.AQL or scidb.AFL. |
| result | Each query requires a new QueryResult structure on the client. QueryResult is described in the section below on Data Access. |
| handle | connection handle |
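The examples in this section pass the language as a string, in both lower case ("aql") and upper case ("AQL"). Below is a minimal sketch of a wrapper that checks this argument before calling executeQuery(). The name execute is hypothetical, and the wrapper assumes that executeQuery() accepts the lowercase strings 'aql' and 'afl'. If your version expects scidb.AQL or scidb.AFL instead, pass those through unchanged.

```python
VALID_LANGUAGES = ("aql", "afl")


def execute(db, statement, lang="aql"):
    """Run one statement through db.executeQuery() after checking the language.

    Hypothetical wrapper: db is the object returned by scidb.connect(), and the
    lowercase 'aql'/'afl' strings are assumed to be accepted by executeQuery().
    """
    lang_norm = lang.lower()  # the examples use both "aql" and "AQL"
    if lang_norm not in VALID_LANGUAGES:
        raise ValueError("query language must be 'aql' or 'afl', got %r" % (lang,))
    if not statement.strip():
        raise ValueError("empty query statement")
    return db.executeQuery(statement, lang_norm)
```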
The examples below show how to execute create and load queries.
Create an array named simplearray:
db.executeQuery("CREATE immutable ARRAY simplearray < foo:int32, bar:char, baz:string > [row=0:99,10,0, col=0:9,10,0]", "aql")
Load data into this array. The data file must be visible on the server's file system, not just the client's. A relative path is interpreted relative to the working directory of the server, which is appropriate if the data were saved from the same server. In other cases, an absolute path to the file to be loaded is usually more appropriate.
db.executeQuery("load simplearray from 'simplearray.data'", "AQL")
result = db.executeQuery("select * from simplearray", "AQL") # result is a QueryResult; see Data Access below
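The create, load, and select steps above can be combined with the drop that sample.py performs at the end, so the array is dropped even if loading fails. This is a sketch only. load_and_select and its handle_result callback are hypothetical. The result is processed inside the callback because it is not clear that a result remains readable after its array has been dropped.

```python
def load_and_select(db, name, schema, data_path, handle_result):
    """Create an array, load a server-side file into it, select it, then drop it.

    handle_result is a caller-supplied (hypothetical) callback that receives the
    QueryResult of the select; the array is dropped afterwards, even on error.
    """
    db.executeQuery("CREATE immutable ARRAY %s %s" % (name, schema), "aql")
    try:
        # data_path must be visible on the server's file system
        db.executeQuery("load %s from '%s'" % (name, data_path), "aql")
        result = db.executeQuery("select * from %s" % name, "aql")
        return handle_result(result)
    finally:
        db.executeQuery("drop array %s" % name, "aql")  # always clean up
```

For simplearray, the schema argument would be "< foo:int32, bar:char, baz:string > [row=0:99,10,0, col=0:9,10,0]".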
Data Access
Query Result
The schema of the result set is accessed through a set of Python objects reachable from the QueryResult.
Use the following help command for more information:
help(scidb.swig.QueryResult)
| Field | Description |
|---|---|
| array | Handle to the array object, its iterators and descriptors returned by the server. |
| queryID | Query ID as known to the server. It is valid after the successful execution of a statement and may not be re-used. |
| selective | Indicates whether a data retrieval command was executed. |
| executionTime | Execution time of this query. |
| explainLogical | Logical plan used. |
| explainPhysical | Physical plan used. |
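Here is a small sketch that gathers the scalar QueryResult fields listed in the table into a dictionary, for logging or debugging. summarize_query_result is a hypothetical helper. It reads the fields as Python attributes, which assumes the SWIG proxy exposes them that way; check with help(scidb.swig.QueryResult).

```python
QUERY_RESULT_FIELDS = ("queryID", "selective", "executionTime",
                       "explainLogical", "explainPhysical")


def summarize_query_result(result):
    """Collect the scalar QueryResult fields from the table into a dict.

    Hypothetical helper: uses getattr with a None default, so a field missing
    from a particular build shows up as None instead of raising AttributeError.
    """
    return dict((name, getattr(result, name, None)) for name in QUERY_RESULT_FIELDS)
```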
Array, Attribute, and Dimension Descriptors
Additional information, such as the dimensions and attributes of the result array, is accessible from descriptor objects of the array. For more information, use help on the following classes:
help(scidb.swig.ArrayDesc)
help(scidb.swig.AttributeDesc)
help(scidb.swig.DimensionDesc)
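To illustrate the descriptor classes, the sketch below collects attribute and dimension information from an ArrayDesc. The accessor names are assumed to match those used in sample.py: getAttributes, getDimensions, getId, getName, getType, getStartMin, getEndMax, getChunkInterval, and getChunkOverlap. Confirm them with help() for your version. describe_schema itself is hypothetical.

```python
def describe_schema(desc):
    """Return (attributes, dimensions) summaries from an ArrayDesc.

    desc is assumed to come from result.array.getArrayDesc(); the accessor
    names follow sample.py and should be checked with help() for your version.
    The returned vectors are SWIG wrappers, so size() and indexing are used.
    """
    attrs = desc.getAttributes()
    dims = desc.getDimensions()
    attributes = [(attrs[i].getId(), attrs[i].getName(), attrs[i].getType())
                  for i in range(attrs.size())]
    dimensions = [(dims[i].getStartMin(), dims[i].getEndMax(),
                   dims[i].getChunkInterval(), dims[i].getChunkOverlap())
                  for i in range(dims.size())]
    return attributes, dimensions
```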
Array and Chunk Iterators
The data access API is based on the nested array data model of SciDB.
A SciDB array is returned to the caller as a collection of chunks that together represent the array. Array and chunk iterators must be used to scan all cells of the array.
Each attribute of the array is accessed through a separate set of iterators. Iteration happens at two levels: an outer iteration over the chunks of the array and an inner iteration over the cells of a chunk. Both the array iterator and the chunk iterator proceed in dimension-major order. Arrays and chunks are both multidimensional, and data is returned in first-to-last dimension order (e.g., row-major order for a 2-D array).
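The iteration order described above can be modeled without a server. The sketch below uses itertools.product, whose last argument varies fastest, to list chunk origins and the cells of one chunk in first-to-last dimension order. It ignores overlaps and empty cells and assumes chunks are aligned to the dimension start. chunk_origins and cell_coordinates are hypothetical helpers, not part of scidbapi.

```python
import itertools


def chunk_origins(dims):
    """Chunk origins in first-to-last dimension order (row-major for 2-D).

    dims is a list of (start, end, chunk_interval) tuples, e.g. taken from a
    schema such as [row=0:99,10,0, col=0:9,10,0]; overlaps are ignored.
    """
    ranges = [range(start, end + 1, interval) for start, end, interval in dims]
    return list(itertools.product(*ranges))  # last dimension varies fastest


def cell_coordinates(origin, dims):
    """Cell coordinates of the chunk starting at origin, in the same order."""
    ranges = [range(o, min(o + interval, end + 1))  # clip the last chunk at end
              for o, (_start, end, interval) in zip(origin, dims)]
    return list(itertools.product(*ranges))
```

For simplearray ([row=0:99,10,0, col=0:9,10,0]), this gives 10 chunks with origins (0, 0), (10, 0), ..., (90, 0). Within each chunk, the cells run (r, 0), (r, 1), ..., (r, 9) before moving on to row r + 1.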
The data access API includes the following objects.
- Array
- ArrayDesc
- AttributeDesc
- DimensionDesc
- ConstArrayIterator
- ConstChunkIterator
- ConstChunk
- Coordinates
- Value
The following example shows how to iterate over all chunks of an array, and all elements of each chunk.
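desc = result.array.getArrayDesc()   # result is the QueryResult returned by executeQuery()
attrs = desc.getAttributes()         # one AttributeDesc per attribute
iters = [result.array.getConstIterator(attrs[i].getId()) for i in range(attrs.size())]  # one array iterator per attribute, as in sample.py
flags = (scidb.swig.ConstChunkIterator.IGNORE_EMPTY_CELLS |
         scidb.swig.ConstChunkIterator.IGNORE_OVERLAPS)
for i in range(attrs.size()):
    nc = -1
    while not iters[i].end():        # outer loop: chunks of attribute i
        nc += 1
        chunkiter = iters[i].getChunk().getConstIterator(flags)
        print("Chunk iterator %d loaded." % nc)
        while not chunkiter.end():   # inner loop: cells of the current chunk
            dataitem = chunkiter.getItem()
            item = scidb.getTypedValue(dataitem, attrs[i].getType())  # convert to the matching Python type
            print("Data: %s" % item)
            chunkiter.increment_to_next()  # advance to the next cell; without this the loop never ends
        iters[i].increment_to_next()       # advance to the next chunk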
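The end()/getItem()/increment_to_next() protocol used above can be wrapped in a Python generator so that the nested loops read like ordinary for loops. In this sketch, iterate and attribute_values are hypothetical helpers. Each value is converted before the iterator advances, in case the item refers to memory owned by the iterator.

```python
def iterate(it, accessor="getItem"):
    """Yield successive values from a SciDB-style iterator.

    'it' is any object with end() and increment_to_next(); accessor names the
    method returning the current value ("getItem" for chunk iterators,
    "getChunk" for array iterators).
    """
    get = getattr(it, accessor)
    while not it.end():
        yield get()
        it.increment_to_next()  # runs only when the consumer asks for the next value


def attribute_values(array_iter, flags, convert, type_name):
    """Yield converted cell values of one attribute, chunk by chunk."""
    for chunk in iterate(array_iter, "getChunk"):
        for dataitem in iterate(chunk.getConstIterator(flags)):
            yield convert(dataitem, type_name)  # converted before the iterator advances
```

With the objects from the example above, iterating over attribute_values(iters[i], scidb.swig.ConstChunkIterator.IGNORE_EMPTY_CELLS | scidb.swig.ConstChunkIterator.IGNORE_OVERLAPS, scidb.getTypedValue, attrs[i].getType()) should visit the same cells as the explicit loops.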
Items
The internal method needed to extract a value from a dataitem depends on the type of its attribute, which is returned by AttributeDesc.getType(). For built-in types, the utility function scidb.getTypedValue() calls the appropriate per-type method for you and returns an object of the correct Python type. For example:
scidb.getTypedValue(dataItem, attrs[i].getType()) # attrs is a per-attribute array of AttributeDesc
Special methods are available to detect if an element (or array position) has special significance, such as:
dataitem.isEmpty()
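If a chunk iterator is opened without IGNORE_EMPTY_CELLS, it may also return empty positions. The sketch below is a per-chunk loop that skips them using dataitem.isEmpty(), as described above. non_empty_values is a hypothetical helper, and convert would normally be scidb.getTypedValue.

```python
def non_empty_values(chunkiter, convert, type_name):
    """Collect converted values of one chunk, skipping empty items.

    Relies on dataitem.isEmpty() as described above; convert is normally
    scidb.getTypedValue and type_name the result of AttributeDesc.getType().
    """
    values = []
    while not chunkiter.end():
        dataitem = chunkiter.getItem()
        if not dataitem.isEmpty():  # skip positions flagged as empty
            values.append(convert(dataitem, type_name))
        chunkiter.increment_to_next()
    return values
```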
Cleanup
A query previously started may be canceled using cancelQuery(). See above for a description of the query ID.
db.cancelQuery(result.queryID) # queryID is a field of the QueryResult returned by executeQuery()
The client can disconnect from the server using:
db.disconnect()
disconnect() is also called when the db object is deleted (typically by garbage collection). To be sure the connection resources are released for reuse by the SciDB server, call db.disconnect() explicitly rather than waiting for garbage collection to delete the db object at some indeterminate point in the future.
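Because disconnect() should be called explicitly, a context manager is a convenient way to make sure it runs even when a query raises an exception. This is a minimal sketch that relies only on the connect() and disconnect() calls described on this page. scidb_session is a hypothetical name. The connect function is passed in (normally scidb.connect) so that the sketch does not require a server.

```python
import contextlib


@contextlib.contextmanager
def scidb_session(connect, host="localhost", port=1239):
    """Open a connection and always disconnect it when the block exits.

    connect is normally scidb.connect; it is a parameter here (an assumption
    of this sketch) so the code can run without a SciDB server.
    """
    db = connect(host, port)
    try:
        yield db
    finally:
        db.disconnect()  # release server-side resources deterministically
```

Usage: inside `with scidb_session(scidb.connect, "localhost", 1239) as db:`, run db.executeQuery(...) as usual. The connection is closed when the block ends, whether normally or through an exception.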
Exception Handling
If the connector encounters an error, or if the server returns an error during query execution, an exception is raised to the Python application. These exceptions can be handled using the standard Python try/except mechanism.
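Here is a sketch of the try/except pattern. It is useful for a script such as sample2.py, which runs many queries and should report a failing one without stopping. try_execute is hypothetical. The exception class raised by scidbapi is not named here, so the sketch catches Exception, which you may want to narrow for your version.

```python
def try_execute(db, statement, lang="aql"):
    """Run a query; return (result, None) on success or (None, message) on failure.

    The exact exception class raised by scidbapi is not documented on this page,
    so this sketch catches Exception; narrow it once you know the class.
    """
    try:
        return db.executeQuery(statement, lang), None
    except Exception as exc:  # connector or server-side query error
        return None, str(exc)
```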