Create array_type


#1

Hi,

First, thank you for the much-needed array-oriented approach.

But I have a question associated with the user-defined types (UDT):

Doesn’t SciDB need an additional statement CREATE ARRAY_TYPE?

For me, this would address several conceptual and practical issues.
At this time, the Array instance looks like a singleton whose id is used
simultaneously as a type identifier and an instance identifier. An ArrayType would make it possible to

  • distinguish these two concepts

  • maintain extents of array types (e.g. streams from multiple devices of the same family)

  • use a type identifier for supporting nested arrays

  • consolidate the array and UDT type systems
    

Actually, the ArrayType would be consistent with the present SciDB conceptual array model
(as well as other run-time type systems), the earlier paper “Requirements for Science Data Bases
and SciDB” (define vs create), and the SciDB class model (Array::getHandle just needs
to return its own identifier).
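
To make the proposal concrete, here is a minimal Python sketch of keeping the type identifier separate from the instance identifier. Every name in it (ArrayType, ArrayInstance, extent, SensorStream, DeviceFamily) is hypothetical and chosen for illustration only; none of it is SciDB API or query syntax.

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)  # hypothetical id generator shared by types and instances


@dataclass(frozen=True)
class ArrayType:
    """Hypothetical array type: a schema that carries its own identifier."""
    name: str
    attributes: tuple  # (attribute name, attribute type) pairs
    dimensions: tuple  # (dimension name, low, high) triples
    type_id: int = field(default_factory=lambda: next(_next_id))


@dataclass(frozen=True)
class ArrayInstance:
    """Hypothetical array instance: refers to a type, has a separate id."""
    name: str
    array_type: ArrayType
    instance_id: int = field(default_factory=lambda: next(_next_id))


def extent(instances, array_type):
    """Return the names of all instances created from the given type."""
    return [a.name for a in instances
            if a.array_type.type_id == array_type.type_id]


# One type, two instances (e.g. two devices of the same family).
sensor_stream = ArrayType("SensorStream",
                          (("reading", "double"),),
                          (("t", 0, 999),))
device_a = ArrayInstance("device_a", sensor_stream)
device_b = ArrayInstance("device_b", sensor_stream)

# A nested array: the attribute type is itself an array type.
device_family = ArrayType("DeviceFamily",
                          (("stream", sensor_stream),),
                          (("device", 0, 9),))

print(extent([device_a, device_b], sensor_stream))
```

In this sketch the two instances share one type_id but have distinct instance_ids, and the nested type refers to SensorStream through the type object rather than through any particular array.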

Please let me know if this is so or if I missed something.

Nikolay Malitsky


#2

Hi Nikolay!

The short answer to your question is “No”. We don’t have any plans for this feature at this time. The longer answer is as follows.

First, several DBMS products have this kind of functionality today. Postgres supported CREATE TABLE syntax that took as input a ROW TYPE. This functionality found its way into the Illustra and INFORMIX engines. Oracle supports object tables. DB2 supports typed tables.

Now, supporting this kind of feature is a fair bit of development work. You need a catalog to hold the types, and to maintain the relationship between types and arrays. You need infrastructure that allows developers to modify types (and their corresponding arrays). On the other hand, when you visit conferences and talk to developers who use these SQL engines, almost no one actually uses the typed table feature.
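
As a rough illustration of the bookkeeping involved, the sketch below models a type catalog in Python. It is only an assumption about what such a catalog might track; TypeCatalog and its methods are hypothetical names and do not describe SciDB’s actual system catalog.

```python
class TypeCatalog:
    """Hypothetical in-memory catalog of array types and typed arrays."""

    def __init__(self):
        self.types = {}   # type name -> {attribute name: attribute type}
        self.arrays = {}  # array name -> type name

    def create_type(self, type_name, attributes):
        if type_name in self.types:
            raise ValueError(f"type {type_name!r} already exists")
        self.types[type_name] = dict(attributes)

    def create_array(self, array_name, type_name):
        if type_name not in self.types:
            raise KeyError(f"unknown type {type_name!r}")
        self.arrays[array_name] = type_name

    def arrays_of(self, type_name):
        """Return the arrays that were created from the given type."""
        return sorted(a for a, t in self.arrays.items() if t == type_name)

    def add_attribute(self, type_name, attr_name, attr_type):
        """Modify a type and return the arrays that would need migrating."""
        self.types[type_name][attr_name] = attr_type
        return self.arrays_of(type_name)


catalog = TypeCatalog()
catalog.create_type("SensorStream", {"reading": "double"})
catalog.create_array("device_a", "SensorStream")
catalog.create_array("device_b", "SensorStream")

# Changing the type touches every array built from it.
affected = catalog.add_attribute("SensorStream", "quality", "int32")
print(affected)
```

Even in this toy form, a single type change fans out to every dependent array, which is where much of the implementation cost would come from.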

So given a) the complexity of the implementation, b) the fact that the people we talked to wanted lots of other features (UDTs, aggregates, user-defined operators, concurrency control, complete support for a variety of query operations such as joins, regrid, various external APIs, etc.) and c) the fact that we’re still only 4 or 5 engineers, we kinda punted on structured arrays for now.

All of this said, while it isn’t a priority for us (yet), if it is for you, we welcome your patches and design docs! Feel free to implement a prototype.

KR
Pb