I’m new to SciDB (13.11) and I’m constantly inserting data into a 3D array. I’m not updating any data, just inserting into empty regions of the array. Now I’m wondering:

  • Is it possible that each time I insert new data the new chunks get larger?
  • How can I avoid the creation of a new version each time I insert new data?
  • How can I remove all of the array’s versions except the last one?

I’d like to know the answers to the last two questions regardless of the answer to the first one.

First question … not at this time. The partitioning of arrays into chunks, and the mapping of chunks to SciDB instances, is currently fixed statically at CREATE ARRAY time. This isn’t perfect: it is a problem especially with skewed data, and it makes adding new instances difficult. We know this, and we have a plan to improve things; we want the system to adapt more dynamically. Watch this space …
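To make the first answer concrete, here is a minimal Python sketch of static chunking. It assumes the array was created with a fixed start coordinate and a fixed chunk interval per dimension. The function names (`chunk_of`, `chunk_bounds`, `instance_of`) are hypothetical, and the hash-based instance assignment is an illustrative stand-in, not SciDB's actual placement scheme. The point is that a chunk's logical boundaries depend only on the schema, not on how much data has been inserted so far.

```python
from typing import Tuple

Coords = Tuple[int, ...]


def chunk_of(coords: Coords, starts: Coords, intervals: Coords) -> Coords:
    """Return the origin of the chunk containing a cell.

    Assumption: each dimension starts at starts[d] and is cut into
    fixed-size chunk intervals[d], both fixed when the array is created.
    """
    return tuple(s + ((c - s) // i) * i for c, s, i in zip(coords, starts, intervals))


def chunk_bounds(origin: Coords, intervals: Coords) -> Tuple[Tuple[int, int], ...]:
    """Inclusive (low, high) range covered by a chunk along each dimension."""
    return tuple((o, o + i - 1) for o, i in zip(origin, intervals))


def instance_of(origin: Coords, num_instances: int) -> int:
    """Hypothetical chunk-to-instance mapping (NOT SciDB's real function).

    Hashing a tuple of ints is deterministic in CPython, so the same chunk
    always lands on the same instance for a fixed instance count.
    """
    return hash(origin) % num_instances


if __name__ == "__main__":
    starts, intervals = (0, 0, 0), (100, 100, 10)
    origin = chunk_of((250, 30, 7), starts, intervals)
    print(origin, chunk_bounds(origin, intervals), instance_of(origin, 4))
```

Note that `instance_of` takes the instance count as an argument: change it and most chunks map somewhere else, which hints at why adding instances to a statically partitioned array is awkward.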

Second, … the short answer to the last two questions is that you can’t.

We want SciDB to support multiple concurrent readers and writers, all sharing a consistent “view” of the data in our arrays. Each “write” operation is isolated from “read” operations: a write’s changes aren’t visible to readers until the write commits, and each read sees the state of the database as of the moment it started. To achieve this, we use MVCC (multi-version concurrency control), which is why every committed write produces a new version of the array.
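The following toy Python model illustrates the MVCC idea described above. It is an assumption-laden sketch, not SciDB's storage engine, and `VersionedArray` is a hypothetical class. Each committed insert appends a new immutable snapshot, and a reader that grabbed a snapshot earlier keeps seeing that state even after later writes commit.

```python
from types import MappingProxyType
from typing import Dict, List, Mapping, Optional, Tuple

Cell = Tuple[int, int, int]  # a cell in a 3D array


class VersionedArray:
    """Toy MVCC store: every committed write creates a new version.

    Hypothetical illustration only. Version 0 is the empty array.
    """

    def __init__(self) -> None:
        self._versions: List[Dict[Cell, float]] = [{}]

    @property
    def latest(self) -> int:
        return len(self._versions) - 1

    def read(self, version: Optional[int] = None) -> Mapping[Cell, float]:
        """Return a read-only snapshot; the latest version unless one is named."""
        v = self.latest if version is None else version
        return MappingProxyType(self._versions[v])

    def insert(self, cells: Dict[Cell, float]) -> int:
        """Commit new cells as a new version; older versions are never modified."""
        new = dict(self._versions[-1])  # toy copy-on-write: copies the whole map
        new.update(cells)
        self._versions.append(new)
        return self.latest


if __name__ == "__main__":
    arr = VersionedArray()
    arr.insert({(0, 0, 0): 1.0})           # commits version 1
    snapshot = arr.read()                   # a reader starts here
    arr.insert({(1, 0, 0): 2.0})           # a writer commits version 2
    print(len(snapshot), len(arr.read()))  # the old snapshot is unaffected
```

A real system shares unchanged data between versions rather than copying everything as this toy does, but the consequence for the question is the same: isolation between readers and writers is achieved by keeping versions around, so each insert naturally yields a new one.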

We try to make this as lightweight as possible, but to be honest, our goal is not to be an OLTP engine. Our design assumes large, periodic appends, usually along one dimension (such as the temporal dimension, or a symbol).

Which raises the question … unless you explicitly name the array version you want, none of this should be visible to you: you just run queries against your array and get all of the data. What is it about the existing functionality that’s causing you problems?