wiki:Docs/Development/ArrayFunctionalLanguage_trunk

AFL: Array Functional Language Reference

Contents:

  1. AFL: Array Functional Language Reference
    1. Array Definition Operators
      1. Create a SciDB Array : create array
      2. Adding and Deleting dimensions : adddim and deldim
      3. Remove an array : remove
      4. Rename an array : rename
    2. Basic Array Operators
      1. Compute new values : apply
      2. Select all the data in an array : scan
    3. Sampling Operators
      1. Load data from a file into a SciDB array : load
      2. Unload array data to a file : save
      3. Building array data : build
      4. Build a sparse array : build_sparse
      5. Input : input
      6. Store data in an array : store
      7. Change the attribute names, or dimension names of an array : cast
      8. Change the name of an attribute : attribute_rename
    4. DML
      1. Unary DML Operators
        1. Statistical Properties : aggregate
        2. Selecting a subset of data within a specified region : between
        3. Selecting a subset of data by boolean expression : filter
        4. Projecting (selecting) array attributes : project
        5. Reverse array elements : reverse
        6. Compute aggregates for a sub-grid : regrid
        7. Compute window aggregates : window
        8. Sort array by attribute : sort
      2. Binary DML Operators
        1. Concatenate two arrays : concat
        2. Full cross product join of two arrays : cross
        3. Cross product join with equality predicates : cross_join
        4. Joining two arrays : join
        5. Select from an array based on a pattern : lookup
        6. Merge attributes : merge
        7. Random sample of cells/chunks from an array : sample and bernoulli
        8. Substitute NULL values : substitute
      3. DML Operators that Change Array Shapes
        1. Selecting subset of data by dimension range : subarray
        2. Reshape an array : reshape
        3. Re-partition an array : repart
        4. Transform attributes to dimensions : redimension and redimension_store
        5. Expanding an element into a grid : xgrid
        6. Select a subplane of the array : slice
        7. Transform into a single dimensional array : unpack
        8. Transpose an array : transpose
        9. Select elements from an array dimension : thin
    5. Matrix and Vector Operations
      1. Multiply two matrices : multiply
      2. Normalize vector elements : norm
    6. Debug and Internal commands
      1. Cancel a running query : cancel
      2. Output a string : echo
      3. Explain plan - explain_logical and explain_physical
      4. Get the signature of an operator : help
      5. Scatter/Gather an array : sg
      6. Reduce the distribution of a replicated array : reduce_distro
    7. Metadata Operators
      1. List contents of SciDB database : list
      2. Getting dimensions of an array : dimensions
      3. Getting attributes of an array : attributes
      4. Get public format of SciDB array : show
      5. Getting versions of an array : versions
    8. Loading and Unloading Plugins
      1. load_library
      2. unload_library

SciDB provides two programming interfaces:

  • AQL, an array query language that is similar to SQL.
  • AFL, a functional language that provides the similar capabilities as AQL but with a functional syntax. AFL also provides support for querying array metadata and additional array operators.

This page describes the features of AFL supported in this release. While AFL may be revised over time as the implementation evolves and matures, the AQL interface will remain fixed with a commitment to backward compatibility.

AFL divides up into several areas:

  • Array definition operators allow users to create arrays and change array structures
  • Basic array operators allow users to query the contents of arrays in a SciDB instance.
  • Sampling operators
  • Combining two arrays
  • Metadata operators provide information about the state of the database.
  • Internal command operators allow you to debug SciDB transactions.
  • Linear Algebra operators provide basic linear algebra functionality

Array Definition Operators

Create a SciDB Array : create array

Use the create array operator to create SciDB arrays.

Syntax: CREATE [array_type] ARRAY

array_name < attribute_name : type_name [ NULL | NOT NULL ] [ RESERVE reserve ] [, ...] > [ dimension_name = start: end|*, chunk_size, chunk_overlap [, ...] ]

Example: Create a 2-dimensional array of size 100-by-100 with two attributes per cell. The dimensions should be named i and j and the attributes should be named name and val:

create array my_array <name:string, val:double> [ i=0:99,10,0, j=0:99,10,0]

Adding and Deleting dimensions : adddim and deldim

Change the structure of an array by adding or deleting a dimension.

The adddim operator prepends the existing dimensions for an array with a new dimension, whose name is supplied as the second argument to the operator.

The new dimension will have start = 0, length = 1, chunkInterval = 1, overlap = 0.

Deldim deletes the left-most dimension from the array. Deleted dimension must have size 1.

Signature:

adddim( array, new_dimension : dimension_name )

deldim( array )

Examples:

adddim(matrix, timestamp)
deldim(subarray(matrix, 1, 100, 1, 200))

Remove an array : remove

Removes an array from the SciDB instance.

Signature:

remove( array_identifier )

The only argument to the remove operator is the name of the array to be removed. The base array and all its versions are removed. If mapping arrays were created for any non-integer dimensions, they are also removed.

For example:

list ('arrays')
["Load_and_Append"),("Load_Two"),("Example_One"),("Example_Four"),("Build_Example_INT")]

remove (Load_and_Append)
Query was executed successfully

list ('arrays')
[("Load_Two"),("Example_One"),("Example_Four"),("Build_Example_INT")]

NOTES:

  1. Deleted array names can be re-used.
  2. See below for AFL operators to interrogate the SciDB instance about the shape, attributes and contents of an array.
  3. The remove operator currently does not reclaim storage.

Rename an array : rename

Rename an existing array in the database.

Signature:

rename( old_name : array_identifier, new_name : array_identifier )

The first argument is the name of the array to be renamed, and the second argument is the new array name.

Example:

 store ( build (< A: int32 > [ X=0:4,5,0, Y=0:4,5,0 ], iif(X=Y,1,0)), rename_example);
 list ('arrays');
 rename (rename_example, new_name);
 project (new_name, A);

[[(1),(0),(0),(0),(0)],
[(0),(1),(0),(0),(0)],
[(0),(0),(1),(0),(0)],
[(0),(0),(0),(1),(0)],
[(0),(0),(0),(0),(1)]]

NOTE:

  1. Once a rename is done, the old array name can be re-used. Any mapping arrays are also renamed.
  2. remove and rename are unsafe when concurrently used with other queries on the same arrays.

Basic Array Operators

Compute new values : apply

Use the apply operator to compute new values from attributes and indexes of input arrays. The value(s) computed in the apply are appended to the attributes in the input array.

Signature:

apply( array, new_attribute : attribute_identifier, expr_to_apply : expression )

For example, the query below will compute a new attribute C, whose value is given by the expression A + x. The resulting array will have all the old attributes, as well as the new attribute C.

 apply ( subarray ( my_array, 4, 4, 6, 6 ), C, A + 20 )
y\x012
0(44,44.0,64.0)(45,45.0,65.0)(46,46.0,66.0)
1(54,54.0,74.0)(55,55.0,75.0)(56,56.0,76.0)
2(64,64.0,84.0)(65,65.0,85.0)(66,66.0,86.0)

Queries that need more than one apply must nest them.

 apply ( apply ( subarray ( my_array, 4, 4, 6, 6 ), C, A + 20 ), D, A + X)
y\x012
0(44,44.0,64.0,44.0)(45,45.0,65.0,45.0)(46,46.0,66.0,46.0)
1(54,54.0,74.0,55.0)(55,55.0,75.0,56.0)(56,56.0,76.0,57.0)
2(64,64.0,84.0,66.0)(65,65.0,85.0,67.0)(66,66.0,86.0,68.0)

Select all the data in an array : scan

Scan returns the contents of the array whose name is passed in as an argument.

Signature:

scan(array_identifier [, timestamp] )

Example:

scan(my_array)

The 'scan' operator expects an array identifier as its first argument, and returns an array as a result of operation. This operator is only used to get a complete listing of the contents of an array stored in the SciDB database. To see a particular version of an array, use the optional timestamp argument:

scan(my_array,now())

Sampling Operators

Load data from a file into a SciDB array : load

Use the load operator to load data into an existing SciDB array.

Unload array data to a file : save

Use the save operator to unload the data to an external file on the file system. The format of the file is the same as the load file format described above. If the internal chunk storage format of the array is dense, this file is unloaded into a dense file format. And if the array chunk storage format is sparse, the save output is in sparse file format.

Signature:

save( ''array'', ''file_path : string'' )

Array name refers to the array to be unloaded and saved. File path is the path to a file to use -- if a relative path name is used this is assumed to be relative to the working directory of the SciDB server.

In a cluster configuration, save operates as follows. For each node in a cluster save creates a file with the same file name containing only the chunks stored on that node. The entire array data is obtained by concatenating saved files from all nodes.


Building array data : build

Return a dense single-attribute array, assigning values to its attributes using an expression. Expressions can refer to one of the functions supported in SciDB, and to dimension values.

Signature:

build( array : array_identifier | anonymous_schema, expression )

The build operator's first argument is:

  • the name of an array and its schema to be used as the template for the operator's result, or
  • an array schema ("anonymous_schema") to be used as the template for the operator result

build proceeds through the array, cell by cell, using the value of the exp : expression to compute the value of each cell. The expression can use any of the array's dimension values. On a cluster configuration, build works in parallel on all nodes.

Example 1: Create an identity matrix called Build_Example_INT:

CREATE ARRAY Build_Example_INT < A: int32 > [ X=0:4,5,0, Y=0:4,5,0 ]

build (Build_Example_INT, iif(X=Y, 1, 0))
[
[(1),(0),(0),(0),(0)],
[(0),(1),(0),(0),(0)],
[(0),(0),(1),(0),(0)],
[(0),(0),(0),(1),(0)],
[(0),(0),(0),(0),(1)]
]

Example 2: Build an array of monotonically increasing values from an anonymous schema:

build (<val:int64> [x=1:4,4,0,y=1:4,4,0], x*4+y-5)"
[
[(0),(1),(2),(3)],
[(4),(5),(6),(7)],
[(8),(9),(10),(11)],
[(12),(13),(14),(15)]
]

NOTE: The build operator does not store data and does not change the given array - it only uses the given shape to generate and return a result. To store the result of the build operator you will need to use the store operator (see below).


Build a sparse array : build_sparse

build_sparse works similarly to build but returns an EMPTY type array and accepts a second expression that must be boolean.

Signature:

build_sparse( array : array_identifier | anonymous_schema, exp : expression, bexp : expression )

The build_sparse operator's first argument is:

  • the name of an array and its schema to be used as the template for the operator's result, or
  • an array schema ("anonymous_schema") to be used as the template for the operator result.

The output of build_sparse contains empty cells wherever bexp evaluates to false. Both expressions can access any of the array coordinates.

Example: Build a diagonal matrix called sparse_diagonal:

CREATE ARRAY sparse_diagonal <a1: double> [x=0:2,3,0, y=0:2,3,0]
build_sparse(sparse_diagonal, 1.0*x+100.0*y, x=y)

Build the same matrix template with an anonymous schema:

build_sparse(<a1: double> [x=0:2,3,0, y=0:2,3,0], 1.0*x+100.0*y, x=y)

Input : input

Input is used to read a file from the filesystem, interpret the contents as cells in an array and return the result. Input has the exact same signature as load.

Signature:

input (array_name : array_identifier, data_file : string [, nodeid : int] )

Input works exactly the same way as load, except it does NOT store the data. The store operator can be used to persist the output of input, as in:

store ( input (a, 'somefile' ), a) -- equivalent to -- load ( a, 'somefile' )

or even

store ( input (a, 'somefile' ), b)

Store data in an array : store

The store operator takes the result of an AFL expression, and stores the result in the target array. If the target array does not exist, a new array is created using the schema of the input.

Signature:

store( array, target_array : array_identifier )

The store operator takes the input array and stores the result in the target_array. If target_array already exists and the schema of the input matches that of the target, the result is stored to it. Otherwise, target_array is created using the schema inferred from the input.

For example:

store ( build ( < A: int32 > [ X=0:4,5,0, Y=0:4,5,0 ] , 5*X + Y ), New_Build_Array)

Change the attribute names, or dimension names of an array : cast

This operator allows renaming an array or any of its attributes and dimensions. A single cast invocation can be used to rename multiple items at once -- one or more attribute names and/or one or more dimension names. The input array and template arrays should have the same numbers and types of attributes and the same numbers and types of dimensions.

Note that the template-array need not have actual data and is only used as a template from which attribute names and dimension names are used. Note that the input array is not modified, rather, a new array is created as a result of the cast operator. To store this newly created array use the store operator.

Signature:

cast ( array, template : array_identifier | anonymous_schema )

The cast operator's second argument is:

  • the name of an array and its schema to be used as the template for the operator's result, or
  • an array schema ("anonymous_schema") to be used as the template for the operator result

Example 1:

  1. Create an array called source with a attribute called val:
    create array source <val:double> [x=0:5,3,0]
    
  2. Use an anonymous schema to change the attribute name to value,and the dimension name to i. Store the result in an array called target:
    store(cast(source, <value:int64>[i=0:5,3,0]),target)
    

target has the new names while source maintains the old:

show(target);
[("target<value:double NOT NULL> [i=0:5,3,0]")]
show(source);
[("source<val:double NOT NULL> [x=0:5,3,0]")]

Example 2: One important application of cast is the resolution of naming conflicts:

store(build (<a:int64> [x=0:10,1,0], x), vector)
[(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)]

store(build (<a:int64> [x=0:10,1,0], -x), vector2)
[(0),(-1),(-2),(-3),(-4),(-5),(-6),(-7),(-8),(-9),(-10)]

apply(join(vector,vector2), b, a+a)
-- ERROR: a is ambiguous

apply(join(vector, cast (vector2, <b:int64>[x=0:10,1,0])), c, a+b)
[(0,0,0),(1,-1,0),(2,-2,0),(3,-3,0),(4,-4,0),(5,-5,0),(6,-6,0),(7,-7,0),(8,-8,0),(9,-9,0),(10,-10,0)]

Change the name of an attribute : attribute_rename

Works similarly to cast but only changes the name of an attribute. The new name is used in the result array, and the source array is not changed.

Signature:

attribute_rename ( array, old_name : attribute_identifier, new_name : attribute_identifier )

Example:

attribute_rename(m3x3,val,foo)

DML

In this section we review the syntax and the operators that make up the AFL Data Manipulation Language.

The DML portion of SciDB AFL adopts a functional programming language syntax over an underlying algebra of array operators. The general form of a DML expression in AFL DML will look like this:

  operator_B ( operator_A ( array_name ), constants and expressions )

for unary operations, like this:

   operator_B (operator_A (array_1, array_2), constants and expressions )

for binary operations.

AFL query expressions should be read from the inside out. In this example, the first operator executed is the inner-most--operator_A--and it is applied to an array stored in the SciDB database named 'array_name'. In this example, the output of 'operator_A ( array_name )' is another logical array that becomes the input to the outermost operator, operator_B. The output of B is then returned to the client issuing the query. Whenever an operator signature uses an array input - it implies that the input to the operator can be a stored array (using an array identifier) or the output of another operator. In addition to arrays, some operators in AFL can take other arguments - constants, expressions, or identifiers.

NOTE: DML operators leave the database unchanged. Only STORE and LOAD operators may add stored data to new or existing arrays.


Unary DML Operators


Statistical Properties : aggregate

The aggregate operator takes an array as input, groups the array by the specified dimension and computes a given statistic for each group. The statistics available are:

Name Operation Performed
avg Average value
count Number of nonempty elements
max Largest number
min Smallest number
sum Sum of all the elements
stdev Standard deviation
var Variance

The aggregate operator returns a scalar value for each statistic.

Signature:

aggregate ( array, aggregate_name_1(attribute) [, aggregate_name_2(attribute),... aggregate_name_N(attribute)] [, dimension_1, dimension_2,... dimension_M] )

Example:

  1. Create a 3-by-3 array called m3x3:
    store ( build ( <val:double> [x=0:2,3,0,y=0:2,3,0], x*3+y), m3x3);
    [
    [(0),(1),(2)],
    [(3),(4),(5)],
    [(6),(7),(8)]
    ]
    
  2. Find the sums of each column:
    aggregate(m3x3,sum(val),y)
    [(9),(12),(15)]
    
  3. Find the average value, number of nonempty elements, largest element, smallest element, sum of all elements, variance, and standard deviation:
    aggregate(m3x3,avg(val),count(val),max(val),min(val),sum(val),var(val),stdev(val))
    [(4,9,8,0,36,7.5,2.73861)]
    

You can also use the built-in aggregate operations, which have the syntax:

Arithmetic mean : avg

Calculate the average value of the specified attribute in the array. The result is an array with single element containing average value. If the input array contains only one attribute, then attribute name can be omitted.

Signature:

avg( array [ , attribute-name : attribute_identifier [, dimension1 : dimension_identifier [, dimension2 : dimension_identifier ]]] )

The first argument is the array to be aggregated over. The second argument is the name of the attribute to use. Additional dimension arguments are optional. If present, the list of dimensions specified in the avg operator is used to perform a group-by average. The result is organized as an array with the remaining dimensions from the source array, after grouping has been performed based on the group-by dimensions.

Example: Support my_array is a 2 D array with attributes a: int32 and b: double, and dimensions x and y. The following command returns the average value of a over all elements of array. The second example below calculates the average value of b after grouping over all values of the dimension x. The second example below results in a one-dimensional array whose dimension is the remaining dimension y from the input.

avg(m3x3, val)
[(4)]

avg(m3x3, val, y)
[(3),(4),(5)]
Count of cells : count

Counts non-empty cells of the input array. When dimensions are provided they are used to do a group-by and a count per resulting group is returned.

Signature:

count( array [, dimension1 : dimension_identifier [, dimension2 : dimension_identifier]] )

Example:

count(m3x3)
[(9)]

count(m3x3,x)
[(3),(3),(3)]

Note that cells of value 0 or null are not considered empty:

create array A <a:int32 null> [x=0:0,1,0];
count(A)
[(0)]

store(build(A,x),A)
count(A)
[(1)]

store(build(A,null),A)
count(A)
[(1)]

This is different than the behavior of the aggregate operator with the count option:

Maximum value : max

Calculate maximum of the specified attribute in the array. Result is an array with single element containing maximum of specified attribute.

If input array contains only one attribute, then attribute name can be omitted. Again, if dimensions are provided, they are used to produce groups and the maximum of each group is returned.

Signature:

max( array [ , attribute-name : attribute_identifier [, dimension1 : dimension_identifier [, dimension2 : dimension_identifier ]]] )

Example:

max(m3x3)
[(8)]

max(m3x3,val,x)
[(2),(5),(8)]
Minimum value : min

Calculates the minimum value of the specified attribute in the array. Result is an array with single element containing minimum of specified attribute. If input array contains only one attribute, then attribute name can be omitted. If an attribute list is specified, the result is an array with the remaining dimensions from the source array, and the minimum is evaluated over groups, where each group is the set of all elements matching the group by dimension(s).

Signature:

min( array [ , attribute-name : attribute_identifier [, dimension1 : dimension_identifier [, dimension2 : dimension_identifier ]]] )

Example:

min(m3x3)
[(0)]

min(m3x3,val,y)
[(0),(1),(2)]
Sum of attribute values : sum

Calculate sum of the specified attribute in the array. Result is an array with single element containing sum of specified attribute. If input array contains only one attribute, then attribute name can be omitted. Again, similar group by semantics as with the other aggregates, if a dimension list is present.

Signature:

sum( array [ , attribute-name : attribute_identifier [, dimension1 : dimension_identifier [, dimension2 : dimension_identifier ]]] )

Example:

sum(m3x3)
[(36)]

sum(m3x3,val,x)
[(3),(12),(21)]
Variance of attribute values : var

Calculate the variance of the specified attribute in the array. Result is an array with single element containing the variance of specified attribute. If input array contains only one attribute, then attribute name can be omitted. Again, similar group by semantics as with the other aggregates is supported, if a dimension list is present.

Signature:

var( array [ , attribute-name : attribute_identifier [, dimension1 : dimension_identifier [, dimension2 : dimension_identifier ]]] )

Example:

var(m3x3)
[(7.5)]

var(m3x3,val,x)
[(1),(1),(1)]

Variance accepts numeric attributes only. The result is the sample variance of the attribute values.

Standard deviation of attribute values : stdev

Calculate the standard deviation of the specified attribute in the array. Result is an array with single element containing the standard deviation of specified attribute. If input array contains only one attribute, then attribute name can be omitted. Again, similar group by semantics as with the other aggregates is supported, if a dimension list is present.

Signature:

stdev( array [ , attribute-name : attribute_identifier [, dimension1 : dimension_identifier [, dimension2 : dimension_identifier ]]] )

Example:

stdev(m3x3)
[(2.73861)]

stdev(m3x3,val,y)
[(3),(3),(3)]

Standard deviation accepts numeric attributes only. The result is the population standard deviation of the attribute values.

Selecting a subset of data within a specified region : between

The between operator accepts an input array and a set of coordinates specifying a region within the array. The output is an array of the same shape as input, where all cells outside of the given region are marked empty.

Signature:

between ( array, low-boundary-coordinate-1: value, ..., low-boundary-coordinate-N: value, high-boundary-coordinate-1: value, ..., high-boundary-coordinate-N: value )

Note that between is very similar to subarray, except that between does not change the shape of the array, and does not change the starting coordinates.

For example, consider the following four-by-four array of integers:

scan(fbf)
[[(0),(1)],[(4),(5)]];[[(2),(3)],[(6),(7)]];[[(8),(9)],[(12),(13)]];[[(10),(11)],[(14),(15)]]

Between can be used to set all cells outside of the (1,1)->(2,2) region to empty:

between(fbf,1,1,2,2)
[[(),()],[(),(5,true)]];[[(),()],[(6,true),()]];[[(),(9,true)],[(),()]];[[(10,true),()],[(),()]]

Note that a new boolean attribute referred to as the emptyTag is appended to each element of the between result. This boolean attribute is set to true for all elements that are included in the between result and false for the rest of the elements of the array.

If the input to subarray or between is an array with non-integer dimension, the range must be specified in terms of the dimension value. The following example shows the use of subarray with float dimensions.

create immutable empty array B <b:float>[a(float)=3,3,0];

-- Use redimension_store to add data --
...
-- Now, use unpack to inspect the coordinate values -- 
unpack(B,row);
[(1.05,5.08,true),(2.05,6.08,true),(3.05,7.08,true)]

subarray(B,float(1),float(3));
[(5.08,true),(6.08,true)]

Selecting a subset of data by boolean expression : filter

The filter operator 'filters' out data in the array based on an expression over the attribute and dimension values. Suppose we want to filter out all the data from my_array, where the value of attribute A doesn't equal 75. We would use the following query:

 filter( my_array, A <> 75)

The filter operator marks all cells in the input which do not satisfy the predicate expression to 'empty'.

Signature:

filter ( array, condition : expression )

where 'condition' is a boolean expression.

The following examples illustrate both the functionality of the filter operator, and also how operators in AFL can be combined into query expressions.

  filter ( subarray ( my_array, 4, 4, 6, 6), A > 46 )
y\x012
0(0,0.0,false)(0,0.0,false)(0,0.0,false)
1(54,54.0,true)(55,55.0,true)(56,56.0,true)
2(64,64.0,true)(65,65.0,true)(66,66.0,true)
filter ( subarray ( my_array, 4, 4, 6, 6), A > 46 and A < 56)

y\x012
0(0,0.0,false)(0,0.0,false)(0,0.0,false)
1(54,54.0,true)(55,55.0,true)(56,56.0,true)
2(0,0.0,false)(0,0.0,false)(0,0.0,false)

The output of the filter is an array with the EMPTY attribute.


Projecting (selecting) array attributes : project

Project the input array on the specified attributes, in the specified order. Attributes that are not specified are excluded from the output.

Signature:

project ( array, attribute1 : attribute_identifier [, attribute-2 : attribute_identifier ...] )

Example:

 project( subarray ( my_array, 4, 4, 6, 6 ), A)
y\x012
0(44)(45)(46)
1(54)(55)(56)
2(64)(65)(66)
 project( apply ( subarray ( my_array, 4, 4, 6, 6 ), C, A + 20 ), C)
y\x012
0(64.0)(65.0)(66.0)
1(74.0)(75.0)(76.0)
2(84.0)(85.0)(86.0)


Reverse array elements : reverse

"Invert" or reverse the elements of an array by reversing the values of each dimension.

Signature:

reverse( array )

Assume that dense is a 2-dimensional array with the following definition and contents:

CREATE ARRAY dense <a: int32>[x=0:5,3,0, y=0:5,3,0]

scan(dense)
[
[(66),(65),(64)],
[(56),(55),(54)],
[(46),(45),(44)]
];
[
[(63),(62),(61)],
[(53),(52),(51)],
[(43),(42),(41)]
];
[
[(36),(35),(34)],
[(26),(25),(24)],
[(16),(15),(14)]
];
[
[(33),(32),(31)],
[(23),(22),(21)],
[(13),(12),(11)]
]

reverse returns an array in which each dimension is the reverse of the corresponding dimension of the source array.

reverse(dense)
[
[(11),(12),(13)],
[(21),(22),(23)],
[(31),(32),(33)]
];
[
[(14),(15),(16)],
[(24),(25),(26)],
[(34),(35),(36)]
];
[
[(41),(42),(43)],
[(51),(52),(53)],
[(61),(62),(63)]
];
[
[(44),(45),(46)],
[(54),(55),(56)],
[(64),(65),(66)]
]

Compute aggregates for a sub-grid : regrid

Partition (divide) the cells in the input array into blocks, and for each block, apply a specific aggregate operation over the value(s) of some attribute in each block.

Regrid does not allow grids to span array chunks and requires the chunk size to be a multiple of the grid size in each dimension.

Signature:

regrid ( array, grid_1, grid_2,.. grid_N, aggregate_call_1 [, aggregate_call_2,...aggregate_call_N] )

Example:

store ( build ( <val:double> [x=0:3,4,0,y=0:3,4,0], x*4+y), m4x4);
[
[(0),(1),(2),(3)],
[(4),(5),(6),(7)],
[(8),(9),(10),(11)],
[(12),(13),(14),(15)]
]

regrid(m4x4, 2,2, sum(val));
[[(10),(18)],[(42),(50)]]

regrid(m4x4, 2,2, sum(val),max(val))
[[(10,5),(18,7)],[(42,13),(50,15)]]

Compute window aggregates : window

Compute one or more aggregates of any of an array's attributes over a moving window.

Signature:

window ( array, grid_1, grid_2,.. grid_N, aggregate_call_1 [, aggregate_call_2,...aggregate_call_N] )

Each aggregate_call argument consists of a call to an aggregate and an optional alias. For example:

sum(val) as output

will sum the attribute called val and place it in output.

Example:

create array m4x4 <val1:double,val2:int32> [x=0:3,4,0,y=0:3,4,0];
load(m4x4, '/tmp/m4x4_2attr');
[
[(0,100),(1,99),(2,98),(3,97)],
[(4,96),(5,95),(6,94),(7,93)],
[(8,92),(9,91),(10,90),(11,89)],
[(12,88),(13,87),(14,86),(15,85)]
]
window(m4x4,2,2, max(val1),sum(val2));
[
[(0,100),(1,199),(2,197),(3,195)],
[(4,196),(5,390),(6,386),(7,382)],
[(8,188),(9,374),(10,370),(11,366)],
[(12,180),(13,358),(14,354),(15,350)]]

The resulting schema for the output of window is:

 <val1_sum:double NULL, val2_max:int32 NULL>[x=0:3,4,0, y=0:3,4,0]

You can also use an alias with the aggregate call:

window(m4x4,2,2, max(val1) as z, sum(val2) as w);

The window is defined by a size in each dimension, for example, the window in the above example is 3-by-3.

The starting position of the window centroid is the first element of the array. At the edges of the array the window aggregate only contains elements that are included in the array. The centroid of the window moves in stride-major order from lowest to highest value in each dimension. The output array has the same shape (no overlap) as the input array. Each element of the output array contains the aggregates computed over the corresponding window location over the input array.

SciDB uses array overlap to perform the window operation on each chunk locally. If necessary a repart operator is implicitly included in the execution plan.

Sort array by attribute : sort

Sort a one-dimensional array by one or more attributes. The sort attributes are specified using a 1-based attribute number.

Signature:

sort( array, [-] attribute-number-1 : integer [ , [-] attribute-number-I : integer ] )

Example:

show(sd)
[("sd<num:double NOT NULL,label:string NOT NULL> [x=0:8,8,0]")]

scan(sd)
[(1,"a"),(2,"z"),(0,"b"),(3,"x"),(2,"f"),(6,"o"),(1.5,"e"),(2.7,"l")]

sort(sd,1)
[(0,"b"),(1,"a"),(1.5,"e"),(2,"z"),(2,"f"),(2.7,"l"),(3,"x"),(6,"o")]

Unadorned attributes imply ascending order. Pre-fixing an attribute's position with a '-' symbol tells the sort operator to sort in descending order.

sort(sd,-2)
[(2,"z"),(3,"x"),(6,"o"),(2.7,"l"),(2,"f"),(1.5,"e"),(0,"b"),(1,"a")]

Binary DML Operators

Concatenate two arrays : concat

Concatenate two arrays. Array should have the name number of dimensions. Concatenation is performed by the left-most dimension. All other dimensions of the input arrays must match. The left-most dimension of both arrays must have a fixed size (not unbounded) and same chunking schema. Both inputs must have the same attributes.

Signature:

concat( left: array, right : array )

Example:

store(build(<val:int64>[x=0:3,100,0,y=0:4,100,0],x*5+y),four_by_five)
[
[(0),(1),(2),(3),(4)],
[(5),(6),(7),(8),(9)],
[(10),(11),(12),(13),(14)],
[(15),(16),(17),(18),(19)]
]

store(build(<val:int64>[x=0:1,100,0,y=0:4,100,0],20+x*2+y),two_by_five)
[
[(20),(21),(22),(23),(24)],
[(22),(23),(24),(25),(26)]
]

concat(four_by_five,two_by_five)
[
[(0),(1),(2),(3),(4)],
[(5),(6),(7),(8),(9)],
[(10),(11),(12),(13),(14)],
[(15),(16),(17),(18),(19)],
[(20),(21),(22),(23),(24)],
[(22),(23),(24),(25),(26)]
]

Note that concat can be combined with transpose, adddim, deldim, subarray, slice and other ops for more complex transformations.


Full cross product join of two arrays : cross

Calculates the full cross product join of two arrays, say A (m-dimensional array) and B (n-dimensional array) such that the result is an m+n dimensional array in which each cell is computed as the concatenation of the attribute lists from corresponding cells in arrays A and B. For example, consider a 2-dimensional array A with dimensions i, j, and a 1-dimensional array B with dimension k. The cell at coordinate position {i, j, k} of the output is computed as the concatenation of cells {i, j} of A with cell at coordinate {k} of B.

Signature:

cross( left : array, right : array )

Example:

create array a<a1: double>[i=1:3,3,0, j=1:3,3,0];
create array b<b1: double>[k=1:2,2,0];

store(build(a, i+j), a);
[[(2),(3),(4)],
[(3),(4),(5)],
[(4),(5),(6)]]

store(build(b, 1.0/k), b);
[(1),(0.5)]

cross(a, b);
[[
[(2,1),(2,0.5)],
[(3,1),(3,0.5)],
[(4,1),(4,0.5)]
],
[
[(3,1),(3,0.5)],
[(4,1),(4,0.5)],
[(5,1),(5,0.5)]
],
[
[(4,1),(4,0.5)],
[(5,1),(5,0.5)],
[(6,1),(6,0.5)]
]]

The output is a 3-dimensional array with the following schema:

<a1:double NOT NULL, b1:double NOT NULL>[i=1:3,3,0, j=1:3,3,0, k=1:2,2,0]

Cross product join with equality predicates : cross_join

Calculates the cross product join of two arrays, say A (m-dimensional array) and B (n-dimensional array) with equality predicates applied to pairs of dimensions, one from each input. Predicates can only be computed along dimension pairs that are aligned in their type, size, and chunking.

Assume p such predicates in the cross_join, then the result is an m+n-p dimensional array in which each cell is computed by concatenating the as follows.

For example, consider a 2-dimensional array A with dimensions i, j, and a 1-dimensional array B with dimension k. The result of a cross_join(A, B, j, k) results in a 2-dimensional array with coordinates {i, j} in which the cell at coordinate position {i, j} of the output is computed as the concatenation of cells {i, j} of A with cell at coordinate {k=j} of B.

Signature:

cross_join( left : array, right : array, left_dim_1: dimension_identifier, right_dim_1 : dimension_identifier [, left_dim_2: dimension_identifier, right_dim_2: dimension_identifier,... ] )

Example:

create array a<a1: double>[i=1:3,3,0, j=1:3,3,0];
create array b<b1: double>[k=1:3,3,0];

store(build(a, i+j), a);
[[(2),(3),(4)],
[(3),(4),(5)],
[(4),(5),(6)]]

store(build(b, 1.0/k), b);
[(1),(0.5), (0.333)]

cross_join(a, b, j, k);
[
[(2,1),(3,0.5),(4,0.333333)],
[(3,1),(4,0.5),(5,0.333333)],
[(4,1),(5,0.5),(6,0.333333)]
]

The result has schema:

<a:double NOT NULL, b:double NOT NULL>[i=1:3,3,0, j=1:3,3,0]

Joining two arrays : join

Join combines the attributes of two input arrays at matching dimension values.

Signature:

join ( left : array, right : array )

The following example illustrates a join between two different subarray operators applied to the same underlying array. Note that join combines attributes from cells at dimension addresses on its immediate input arrays; not in the original inputs:

 subarray ( my_array AS M1, 4, 4, 6, 6 )
y\x012
0(44,44.0)(45,45.0)(46,46.0)
1(54,54.0)(55,55.0)(56,56.0)
2(64,64.0)(65,65.0)(66,66.0)
 subarray ( my_array AS M2, 6, 6, 8, 8 )
y\x012
0(66,66.0)(67,67.0)(68,68.0)
1(76,76.0)(77,77.0)(78,78.0)
2(86,86.0)(87,87.0)(88,88.0)
 join ( subarray ( my_array AS M1, 4, 4, 6, 6 ), subarray ( my_array AS M2, 6, 6, 8, 8 ) )
y\x012
0(44,44.0,66,66.0)(45,45.0,67,67.0)(46,46.0,68,68.0)
1(54,54.0,76,76.0)(55,55.0,77,77.0)(56,56.0,78,78.0)
2(64,64.0,86,86.0)(65,65.0,87,87.0)(66,66.0,88,88.0)

The join result has the same dimension names as the first input. The left and right arrays must have the same shape. If a cell in either the left or right array is empty, the corresponding cell in the result is also empty.


Select from an array based on a pattern : lookup

Lookup maps elements from the second array using the attributes of the first array as coordinates into the second array. The result array has the same shape as first array and the same attributes as second array.

Signature:

lookup ( pattern : array, source: array )

Example: suppose the array foo has coordinates 0 to 7 and the values 1 to 8:

scan(foo)
[(1),(2),(3),(4),(5),(6),(7),(8)]

Suppose the array bar ranges from -10 to 10 and has values from 40 to 60:

scan(bar)
[(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55),(56),(57),(58),(59),(60)]

Then lookup(foo,bar) returns 8 values, corresponding to bar positions from 1 to 8 :

lookup(foo,bar)
[(51),(52),(53),(54),(55),(56),(57),(58)]

Merge attributes : merge

Merge two arrays. The two input arrays should have the same shape as one another: that is, the same attribute list and dimensions.

Merge combines elements from the input array the following way: for each cell in the two inputs, if the cell of first (left) array is non-empty, then the attributes from that cell are selected and placed in the output. If the cell in the first array is marked as empty, then the attributes of the corresponding cell in the second array are taken. If the cell is empty in both input arrays, the output's cell is set to empty.

Signature:

merge( left : array, right : array )

Example:

scan(vec1)
[[(1,true)],[()],[(3,true)],[()],[(5,true)]]

scan(vec2)
[[()],[(5,true)],[()],[(7,true)],[()]]

merge(vec1,vec2)
[[(1,true)],[(5,true)],[(3,true)],[(7,true)],[(5,true)]]

Random sample of cells/chunks from an array : sample and bernoulli

SciDB supports two methods to sample a subset of elements from an array. They are the sample and bernoulli operators. In both, the user provides a sampling probability, which is the probability of inclusion.

In bernoulli, we evaluate each cell by generating a random number and seeing if it lies in the range (0, probability). If it does, the cell is included.

In sample, the selection is chunk-based. We evaluate each chunk for inclusion based on the probability.

Users often want samples with repeatable results. To achieve this, we use a seed value for the underlying random number generator.

Signature:

sample( array, probability : double [, seed : int64 ] )

bernoulli( array, probability : double [, seed : int64 ] )

Example:

bernoulli(m4x4,0.3)
[
[(0),(1),(0),(3)],
[(0),(0),(0),(0)],
[(8),(0),(0),(0)],
[(12),(13),(0),(0)]
]

Substitute NULL values : substitute

Substitute null values in the input array, using their missing reason code, if using the "?code" format, as index in the second (substitution) array.

Signature:

substitute( array, mask: array )

For example, array foo contains two nulls (which is equivalent to ?0) and one ?1:

scan(foo)
[(1),(2),(null),(4),(5),(null),(7),(?1)]

We can use substitute to replace ?0 with 20 and ?1 with 30, for instance. We build an array where 0 maps to 20 and 1 maps to 30:

build(<val:double> [x=0:1,2,0], iif( x = 0, 20, 30))"
[(20),(30)]

We can use this build expression as part of substitute directly - without having to store an array:

substitute(foo,build(<val:double> [x=0:1,2,0], iif( x = 0, 20, 30)))"
[(1),(2),(20),(4),(5),(20),(7),(30)]

Both inputs to substitute must be single-attribute arrays.


DML Operators that Change Array Shapes

List of AFL Operators that change the shape or dimensionality of an array.

In the case of some shape changing operators that require a fully qualified array schema, SciDB supports two ways to specify this schema. The first is by creating a new array with the desired output schema and referring to it - useful for storing the changed shape. The second method is to specify an inline anonymous schema. The inline schema is a string that specifies the type, attributes, and dimensions of the result array.

Example:

show(src);
[("a1<a:double NOT NULL> [i=1:3,3,0,j=1:3,3,0]")]

reshape(src, foo)
-- or --
reshape(src, <a: double>[k=1:9,9,0]);

Selecting subset of data by dimension range : subarray

Subarray selects a block of cells from an input array. The result is an array whose shape is defined by the 'bounding box' specified by the subarray.

Signature:

subarray ( array, low-boundary-coordinate-1: value, ..., low-boundary-coordinate-N: value, high-boundary-coordinate-1: value, ..., high-boundary-coordinate-N: value )

The first argument to the subarray operator is the array from which the block is to be sampled. The second parameter section specifies which block is to be extracted. There are as many pairs of array index values in the parameter block as the input : array has dimensions. For each dimension, the operator requires the minimum index value for each dimension first, followed by the maximum index value for each dimension. In other words, if the input : array has two dimensions, then the subarray operator requires four start and end values.

Example:

Suppose 'my_array' is a 2-dimensional array with two integer attributes and dimensions x and y. The following table illustrates the contents of this array. This result would be produced by the scan('my_array') query introduced in the previous section.

[
[(1,3),(5,7),(9,11),(13,15),(17,19)],
[(21,23),(25,27),(29,31),(33,35),(37,39)],
[(41,43),(45,47),(49,51),(53,55),(57,59)],
[(61,63),(65,67),(69,71),(73,75),(77,79)],
[(81,83),(85,87),(89,91),(93,95),(97,99)]
]

The first input to the subarray operator is the source array. The rest of the inputs are the indexes of the subarray window. First, the lower bound indexes for all dimensions in the input array, are given, then the upper ones. The result of subarray is an array of smaller or equal size, whose dimensions always start at 0 and span only the length of the subarray region.

We can use the subarray operator as shown here:

  subarray( my_array, 1, 1, 2, 2)

This query will return a block of the input array where the initial (top-left) cell is [ 1,1 ] and the final (bottom-right) cell is at [ 2, 2 ].

y\x01
0(11,11.0)(12,12.0)
1(21,21.0)(22,22.0)

This query will return a block of the input array where the initial (top-left) cell is [ 4,4 ] and the final (bottom-right) cell is at [ 6,6 ].

 subarray ( my_array, 4, 4, 6, 6)
y\x012
0(44,44.0)(45,45.0)(46,46.0)
1(54,54.0)(55,55.0)(56,56.0)
2(64,64.0)(65,65.0)(66,66.0)
 subarray ( my_array, 4, 3, 6, 7)
y\x01234
0(43,43.0)(44,44.0)(45,45.0)(46,46.0)(47,47.0)
1(53,53.0)(54,54.0)(55,55.0)(56,56.0)(57,57.0)
2(63,63.0)(64,64.0)(65,65.0)(66,66.0)(67,67.0)

Reshape an array : reshape

Change the shape of an array to the rank and dimensions of another array.

Signature:

reshape( array, target : array_identifier | anonymous_schema )

The reshape operator's second argument is:

  • the name of an array and its schema to be used as the template for the operator's result, or
  • an array schema ("anonymous_schema") to be used as the template for the operator result

NOTES:

  1. The target array (or anonymous array schema) should have the same number of attributes as the input array.
  2. The arrays must have fixed size dimensions. That is, reshape for unbounded arrays is not supported.
  3. Size of the input and target array should be the same. That is, both arrays should have the same number of cells. For example, it is possible to reshape 2x2x2 array to 4x2, but not 3x3 to 2x2.

Example:

  1. Create a 3-by-4 array called source:
    create array source <val:double> [x=0:2,1,0,y=0:3,1,0];
    
  2. Add numerical values to source:
    store(build(source,x*3+y),source)
    
  3. Reshape the array to 4-by-3 and store the result in an array called target:
    store(reshape(source, <values:double> [x=0:3,1,0, y=0:2,1,0]),target)
    show(target);
    [("target<val:double NOT NULL> [x=0:3,1,0,y=0:2,1,0]")]
    

Re-partition an array : repart

Change partitioning (chunking) of the array. Target array must have the same attributes and dimensions, but chunk size may be different. Repart returns an array whose attributes are taken from the input array, with the dimensions of the target.

Signature:

repart( array, target : array_identifier | anonymous_schema )

Example:

  1. Create a 4-by-4 array with chunk size of 1 called source:
    create array source <val:double> [x=0:3,1,0,y=0:3,1,0];
    
  2. Add numerical values to source:
    store(build(source,x*3+y),source);
    
  3. Repartition the array into 2-by-2 chunks and store the result in an array called target:
    store(repart(source, <values:double> [x=0:3,2,0, y=0:3,2,0]),target);
    show(target);
    [("target<val:double NOT NULL> [x=0:3,2,0,y=0:3,2,0]")]
    

Transform attributes to dimensions : redimension and redimension_store

The redimension and redimension_store operators convert array attributes to dimensions. Redimension the array and apply aggregates to duplicate cells.

Signature:

redimension( array, target : array_identfier )
redimension_store( array, target : array_identfier )
redimension_store ( source_array, target_array [, (aggregate_call_1 as alias_1 ) ,... (aggregate_call_N as alias_N ) ] )

redimension does not create or update array storage or metadata, and returns the transformed array result. redimension also only works when transforming int64 attributes into dimensions.

redimension_store updates the target_array storage and created additional mapping arrays if necessary.

For both variants of redimension, the target array must be empty-able.

The input and target arrays must have compatible schemas, and both commands determine the list of transformations (attribute to dimension) by matching names in the attribute and dimension lists of the two arrays. It is possible to omit certain attributes and dimensions from the target array.

The key differences betwen redimension and redimension_store are:

  • redimension does not accept an anonymous schema. Create the target array before calling redimension
  • redimension supports non-integer dimensions

Example 1: Count the even and odd values in an array.

  1. Create an array that has a single attribute "a".
    create array A <a:double>[x=0:5,3,0,y=0:5,3,0];
    store(build(A, x*3+y), A);
    
  2. Apply a synthetic attribute "even_or_odd" that is 0 when a is even and 1 when a is odd. Turn the even_or_odd attribute into a dimension with the redimension command and perform a count aggregate:
    redimension(apply(A, even_or_odd, iif(int64(a)%2=0,0,1)),<count:uint64 null,empty_tag:indicator>[even_or_odd=0:1,2,0], count(a) as count)
    

SciDB returns:

[{0}(18),{1}(18)]

This tells us that there are 18 even and 18 odd values in the array.

Note: the target array schema must contain the empty_tag attribute and the datatype of the new count attribute must match the output of the aggregate (uint64 and nullable).

Example 2: Maintain one of the original dimensions and use the aggregates sum and avg to find the sum and average of the even and odd elements:

redimension(apply(A, even_or_odd, iif(int64(a)%2=0,0,1)), 
  <sum:double null, avg:double null, empty_tag:indicator>[x=0:5,3,0,even_or_odd=0:1,2,0], 
  sum(a) as sum, avg(a) as avg)

SciDB returns:

[[{0,0}(6,2),{0,1}(9,3),{1,0}(18,6),{1,1}(15,5),{2,0}(24,8),{2,1}(27,9)]];
[[{3,0}(36,12),{3,1}(33,11),{4,0}(42,14),{4,1}(45,15),{5,0}(54,18),{5,1}(51,17)]]

This means that for x=0, the sum of the even numbers is 6 and the average is 2; the sum of the odd numbers is 9 and average is 3.

Example 3: Instead of creating a schema, the target schema can come from a pre-existing array:

create empty array dense_redim <a:double>[x=0:5,3,0]
redimension(dense,dense_redim)

In this case, we don't use any aggregates and condense values. In the original array, there are 6 values for x=0, and the result of the query at x=0 will contain an unspecified value, chosen from one of the 6 possible candidates. This command does not change any data for the array dense_redim - just reads the schema from it.

The two statements

create empty array dense_redim <a:double>[x=0:5,3,0]

and

create array dense_redim <a:double, empty_tag:indicator>[x=0:5,3,0]

are equivalent.


Expanding an element into a grid : xgrid

Use this operator to scale an input array by repeating cells of the original array specified number of times. This operator can be considered as the inverse of the regrid operator. While regrid splits an input array into "hypercubes" and calculates an aggregate function over them, xgrid produces a hypercube from the single element of the input array by repeating that element.

Signature:

xgrid( array, scale-1, ..., scale-N )

Example:

scan(m3x3)
[
[(0),(1),(2)],
[(3),(4),(5)],
[(6),(7),(8)]
]

xgrid(m3x3,2,2)"
[
[(0),(0),(1),(1),(2),(2)],
[(0),(0),(1),(1),(2),(2)],
[(3),(3),(4),(4),(5),(5)],
[(3),(3),(4),(4),(5),(5)],
[(6),(6),(7),(7),(8),(8)],
[(6),(6),(7),(7),(8),(8)]
]

Select a subplane of the array : slice

Get a slice of the array. Result is a slice of the input array corresponding to the given coordinate value(s). Number of dimensions of the result array is equal to the number of dimensions of input array minus number of specified dimension, and the coordinate value should be a valid dimension value of the input array.

Signature:

slice( array, dimension1 : dimension_identifier, coordinate1 : value [ ,dimensionN : dimension_identifier, coordinate-value-N : value ] )

Example:

scan(m3x3)
[
[(0),(1),(2)],
[(3),(4),(5)],
[(6),(7),(8)]
]

slice(m3x3,x,1)
[(3),(4),(5)]

Transform into a single dimensional array : unpack

Unpack array into a single-dimensional array creating new attributes to represent source array dimension values. Result array has a single zero-based dimension and arguments combining attributes of the input array. The name for the new single dimension is passed to the operator as the second argument.

Signature:

unpack( array, attribute_name )

Example:

show(m3x3)
[("m3x3<val:double NOT NULL> [x=0:2,3,0,y=0:2,3,0]")]

scan(m3x3)
[
[(0),(1),(2)],
[(3),(4),(5)],
[(6),(7),(8)]
]

unpack(m3x3,i)
[(0,0,0),(0,1,1),(0,2,2),(1,0,3),(1,1,4),(1,2,5),(2,0,6),(2,1,7),(2,2,8)]

The format for the output of the unpack operator is to begin by enumerating the dimensions of the input, and then to append the attribute values. Chunk size -- ie., the number of elements per chunk is preserved from input to output. The resulting one-dimensional array has chunk dimension equal to the count of cells in each input chunk.

NOTE: unpack only supports arrays whose size is evenly divisible by chunk size in every dimension. You cannot unpack an array A <a:int32> [x=0:100,30,0]; but you can unpack an array A <a:int32> [x=0:100,20,0].


Transpose an array : transpose

Transpose an array by inverting the order of its dimensions.

Signature:

transpose( array )

scan(m3x3)
[
[(0),(1),(2)],
[(3),(4),(5)],
[(6),(7),(8)]
]

transpose(m3x3)
[
[(0),(3),(6)],
[(1),(4),(7)],
[(2),(5),(8)]
]

Select elements from an array dimension : thin

Signature:

thin( array, start-1, step-1, start-2, step-2, ... )

Select regularly spaced elements of the array in each dimension. The selection criteria are specified by the starting dimension value start-i and the number of cells to skip using step-i for each dimension of the input array.

Example:

  1. Create a 2-dimensional array:
    create array A <a:int32> [x=0:8,6,0,y=0:8,8,1]
    
  2. Store numerical values in the array:
    store(build(A,(x*8+y),A)
    
  3. Select every other element from the first dimension:
    thin(A,1,3,0,2)
    

Note: The position specified by start-1 must be within the actual, rather than the relative position of the array indices. For example, if the range of dimension x were 10:18, the thin command would throw an out-of-range error.


Matrix and Vector Operations

Some operators accept 2-dimensional arrays as inputs and interpret them like matrices. As such, the term matrix refers to an array that has two dimensions and one numeric attribute. For all matrix operators the first declared dimension is rows and the second declared dimension is columns.

Multiply two matrices : multiply

Perform a matrix multiply. Matrices should have two dimensions and a single numerical attribute of precision double. The number of columns of the first array must be equal to the number of rows in the second array.

Signature: multiply( double : array, double : array )

Example:

create array A <a:double>[x=0:2,1,0,y=0:1,1,0]
store(build(A,x*3+y),A)

create array B <b:double>[x=0:1,1,0,y=0:2,1,0]
store(build(B,((x+2)*(3+y))),B)

multiply(A,B)

remove(A)
remove(B)

Normalize vector elements : norm

Calculate the L2-norm of a vector by dividing each element by the square root of the sum of squares of the elements.

Signature: norm( array, attribute : attribute_identifier )

Example:

scan(v10)
[(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)]

norm(v10)
[(0.0509647),(0.101929),(0.152894),(0.203859),(0.254824),(0.305788),(0.356753),(0.407718),(0.458682),(0.509647)]

Debug and Internal commands

Cancel a running query : cancel

Cancel a currently running query by query id.

Signature: cancel( query_id : integer )

Example:

cancel(12345)

The query id can be obtained from the scidb log or via the list() command. SciDB maintains query context information for each completed and in-progress query in the server. If the user issues a "ctrl-C" or abort from the client, the query is cancelled and its context is removed from the server.

In-progress queries are aborted by issuing a cancel() command with the query ID. Context information is also removed by this command.

This query context can also be removed for completed queries by issuing a cancel() command.

Query context information maintained by the server is not persistent and is lost on server restart.


Output a string : echo

Simply accepts a string and returns a single-element array containing the string.

Signature:

echo( string )

Example:

echo('Hello world')
[("Hello world")]

Explain plan - explain_logical and explain_physical

The operators explain_logical and explain_physical can be used to emit a human-readable plan string for a particular query without running the query itself. SciDB first constructs a logical plan, optimizes it and then translates it into a physical plan.

Signature:

explain_logical( query: string, language: string ) explain_physical ( query: string, language: string )

where language corresponds to the language of the query string and is either afl or aql.

Example:

explain_physical('store(join(a, b), c)','afl')

Note that the output of the explain operators is a single-element array that contains the plan string. The plan text format is intended for SciDB developers and experts and is subject to change in the future.


Get the signature of an operator : help

Accepts an operator name and returns an array containing a human-readable signature for that operator.

Signature: help ( operator_name: string )

Example:

help('multiply')
[("Operator: multiply
Usage: multiply(<input>, <input>)")]

Scatter/Gather an array : sg

The SG operator is used internally to redistribute or repartition the elements of an array between SciDB processing nodes. SG by itself does not perform any computation and its use in user queries can lead to inefficiencies and is discouraged.

sg redistributes chunks of the input array across cluster nodes according to a specified policy. sg is not a user-visible operator, but it is heavily used by the system internally. It is a physical operator for data movement within the system and users are not expected to invoke it directly. For example, the scidb optimizer inserts an sg operator after load to redistribute array data using round robin distribution. The optimizer may also insert sg operators to assist with aggregations, operators that don't preserve shape or operators that don't preserve distribution.

The signature of 'sg' is: sg ( array, [ partitioning schema [, node id [, result array name [, store_flag [, offset_x, offset_y, ... ] ] ] ] ])

The following table describes the arguments to sg:

arg name Description Comment
input_array array to be redistributed. May also be the output of another operator like scan, filter, etc.
partitioning schema integer constant specifying partitioning schema. Possible values are shown in the table below.
node number a node number to send all chunks. Used only with psLocalNode. Ignored for other partitioning schemas.
result array a name of result array. If omitted, the result of sg is nameless and temporary with query duration.
store_flag a boolean indicator used to specify whether sg should store the result array. By default, true if result array name is given. If store_flag is true, the result of sg is persisted as a new array with given name. If store_flag is false, the result is a temporary named array that can be referenced elsewhere in the query.
offset a distribution offset vector. For internal use.

The partition schema is one of the following types.

partition schema Description Comment
0 psReplication replicate array on every node of cluster
1 psRoundRobin distribute chunks by round robin
2 psLocalNode send every local chunk to specified node
3 psByRow distribute every chunk to node according to chunk coordinate of dimension 0
4 psByCol distribute every chunk to node according to chunk coordinate of dimension 1

Examples: Query to distribute array by round robin onto every node:

sg(seq_info, 1, -1, seq_info_sg)

Query to gather whole array on node 0 in memory array (will not be stored on disk):

sg(seq_info_sg, 2, 0)

Reduce the distribution of a replicated array : reduce_distro

Given an array that is distributed with psReplication - quickly reduce the distribution to a different schema by masking certain chunks. For internal use.

Signature:

reduce_distro( array, partitioning_schema: integer )

Example:

reduce_distro(\"target@1:label\",2)
[("a"),("b"),("e"),("f"),("l"),("o"),("x"),("z")]

Metadata Operators

AFL provides a number of mechanisms for getting information about the database it is managing. These facilities all interrogate the SciDB internal meta-data, and return an array object which can be subject to further query expressions.

List contents of SciDB database : list

The list operator allows users to get a list of operators, types or the names of arrays in the SciDB instance.

Signature:

list ( code : string )

The code : string parameter value is one of the following: 'aggregates', 'arrays', 'functions', 'libraries', 'operators', 'types', or 'queries'. If called without any parameters, list will return a list of all arrays created in SciDB.

Example:

list() 
list('arrays')
[("Reads"),("Load_and_Append"),("Load_Two"),("Example_One"),("Example_Four"),("Build_Example_INT")]
list('functions')

enumerate the list of functions available for use in expressions.

FunctionsDescription
<=, <>, <, =, >=, >, -, /, *, %, +, and, or, notarithmetic and logic operators
abs, acos, asin, atan, ceil, cos, exp, floor, log, log10, pow, sin, sqrt, tanmathematical functions
double, float, int16, int32, int64, int8, uint8, uint16, uint32, uint64, stringtype conversion
random random number generation
nowget current time (local time zone)
first, last, length, lowdimension interrogation
nodeidget current nodeid, useful in determining array distribution
regex, strchar, strftime, strlen, substrstring manipulation functions
list('operators')

enumerates the list of operators available on this server. User-defined operators appear along with built-in operators in this list.

list('types')

enumerates the list of known types. User-defined types registered with the server appear along with standard built-in types.

[("bool"),("char"),("double"),("float"),("indicator"),("int16"),("int32"),("int64"),("int8"),("string"),("uint16"),("uint32"),("uint64"),("uint8")]
list('queries')
name,query_id
"create array A <a:int64> [x=0:2,3,0]",3338107939
"remove(B)",3338227944
"select * from A a1 join A a2 on a1.a=a2.a",3338257950
"drop array simplearray",3352140153
"list('arrays')",3352150179
"list('arrays')",3352150185
"list('queries')",3371118518

Note that list returns a SciDB array. So it can be used as the input to other data management operators like filter, project.

The following query returns all of the functions in SciDB named 'regex', together with their function signatures:

filter(list ('functions'), name='regex');

The following query uses the SciDB regex() function to filter the functions to find only those that take as input a pair of strings:

filter (list ('functions'), regex(profile, '(.*)(string,string)(.*)'));

And finally, the following query filters out only functions that take a pair of strings as input and compute some boolean result (like "regex"). Then it sorts that result by the function's name (the first attribute produced by list('functions')), and finally projects out only the function's signature:

project (
    sort (
        filter (
            list ('functions'),
            regex(profile, '(.*)bool(.*)(string,string)(.*)')
        ),
    1),
    profile
);

Getting dimensions of an array : dimensions

Get a list of array dimensions.

Signature:

dimensions ( array_identifier )

The argument to the dimensions operator is the name of the array. It returns an array with the following attributes: name, start, length, chunk-interval, chunk-overlap, low-boundary, high-boundary.

Examples:

dimensions (Example_Three)

[("X",0,10,10,0,0,10),("Y",0,10,10,0,0,10),("Z",0,100,100,0,0,10)]

For a regular array defined using a create array statement with fixed dimensions the start and length are set to the values specified in the CREATE ARRAY statement.

The low and high boundaries represent the current occupancy of cells within the array. Low-boundary represents the lowest dimension value occupied by any element in the array and high-boundary represents the highest dimension value occupied by an array element.

NOTE:

  • The length field for an unbounded dimension is set to the maximal signed 64-bit integer, namely, 9223372036854775807.
  • The low and high boundaries for a newly created array that does not yet have any data is also set to Undefined values recorded as the following special constants (MAXINT64 - 1) and -(MAXINT64-1) respectively. These constants are interpreted as "Undefined" for array dimension values into which no data has been loaded.

Getting attributes of an array : attributes

Get a list of array attributes. Returns an array with the following attributes: name, type, and boolean flag. The boolean flag represents nullability of the attribute and is True if the attribute can be set to NULL.

Signature:

attributes ( array_identifier )

The argument to the attributes operator is the name of the array.

Examples:

 CREATE ARRAY Attr_Example_One < A: int32, B: string > [ I=0:9,10,0 ]
 attributes ( Attr_Example_One )
 [("A","int32",false),("B","string",false)]

CREATE ARRAY Attr_Example_Two < A: string, B: double, C: int32 > [ I=0:9,10,0 ]
attributes ( Attr_Example_Two )
[("A","string",false),("B","double",false),("C","int32",false)]

Attributes are only available for stored arrays. Result arrays returned by an AFL command may not be used as input to this operator.


Get public format of SciDB array : show

The show operator combines the information an array's attributes and dimensions. The result is formated as it would appear in a create array ... statement.

Signature:

show ( array_identifier )

Example:

CREATE ARRAY Attr_Example_Two < A: string, B: double, C: int32 > [ I=0:9,10,0 ];
show ( Attr_Example_Two )
[("Attr_Example_Two 8<A:string NOT NULL,B:double NOT NULL,C:int32 NOT NULL> [I=0:9,10,0]")]

Getting versions of an array : versions

Get a list of array versions for an array which has been declared as updatable.

Signature:

versions ( array_identifier )

Example:

versions(updarr)

[(1,"12/31/10 11:03:10"),(2,"12/31/10 12:22:17"),(3,"12/31/10 12:22:19")]

The output of the versions command is a list of versions, each of which has a version ID and a datestamp which is the date and time of creation of that version.


Loading and Unloading Plugins

SciDB supports user-defined types, functions, and operators using SciDB's extension architecture. Operators are written using the UDO array API. For details on how to write a user-defined function or type for SciDB, please refer to the "SciDB Plugins" documentation for this release. Plugins are built as shared libraries and are loaded from the plugin directory specified in the config.ini file.

load_library

load_library( 'librational.so' );

Loading a plugin registers the library in the SciDB system catalog. Plugins registered with the SciDB instance will be loaded at system start time. Plugins will be re-loaded from the .so file on system restart.

unload_library

To unload a library,

unload_library( 'librational' )

This command will unregister the library in the system catalog. The library will not be loaded on next restart, but will remain available for reload.