remove_versions: VersionNo vs. version_id


#1

Hi,

I’ve been playing around with versions and remove_versions and noticed a small discrepancy between the docs and the output of “versions”:
The remove_versions doc talks about a version number; the output from “versions” has both a VersionNo and a version_id. The input that remove_versions actually expects (in 14.8, as far as I could test) is the version_id.

I hope this helps somebody.


#2

Thanks!

I have pointed our Crack Documentation Team cough Joe cough at your post.


#3

“Upon further review … it has been determined that the bug call on the previous post was called in error. There was no discrepancy. Replay the down.”

Here’s the longer explanation. For those playing along in the SciDB fantasy league at home, cut-n-paste each of these and apply them using iquery to see what’s going on.

Let’s start with a simple example:

DROP ARRAY Version_Test;
CREATE ARRAY Version_Test
< a1 : double >
[ I=0:*,100,0 ];

Now, let’s populate it.

SET NO FETCH;
SELECT a1
  INTO Version_Test
  FROM build ( < a1 : double > [ I=0:99,100,0 ],
               double(random()%10000)/1000.0
             );

If you want to, you can always have a look at the state of this array by running:

SELECT * FROM Version_Test;

Anyway, at this point, we have the following versions of the array. (Note that your mileage will vary: the timestamps in my output are unlikely to match the ones you get.)

SET FETCH;
SELECT * FROM versions ( Version_Test );
--
--  {VersionNo} version_id,timestamp
--  {1} 1,'2014-10-09 19:33:53'

So, let’s create a few more versions by updating the contents of the array a few times. After each UPDATE, we’ll check the state of the array’s version information.

SET NO FETCH;
UPDATE Version_Test
   SET a1 = a1 + 100
 WHERE I%20 = 0;
--
SET FETCH;
SELECT * FROM versions ( Version_Test );
--
-- {VersionNo} version_id,timestamp
-- {1} 1,'2014-10-09 19:33:53'
-- {2} 2,'2014-10-09 19:34:22'
--
SET NO FETCH;
UPDATE Version_Test
   SET a1 = a1 + 200
 WHERE I%19 = 0;
--
SET FETCH;
SELECT * FROM versions ( Version_Test );
--
-- {VersionNo} version_id,timestamp
-- {1} 1,'2014-10-09 19:33:53'
-- {2} 2,'2014-10-09 19:34:22'
-- {3} 3,'2014-10-09 19:35:08'
--
SET NO FETCH;
UPDATE Version_Test
   SET a1 = a1 + 300
 WHERE I%21 = 0;
--
SET FETCH;
SELECT * FROM versions ( Version_Test );
--
--  {VersionNo} version_id,timestamp
--  {1} 1,'2014-10-09 19:01:40'
--  {2} 2,'2014-10-09 19:01:42'
--  {3} 3,'2014-10-09 19:01:43'
--  {4} 4,'2014-10-09 19:01:45'

So at this point, we have 4 versions. Note that the result of the versions ( Array_Name ) query is a 1D array: its dimension is named “VersionNo”, and it has two attributes, “version_id” and “timestamp”. The dimension is simply something like the “row number” of the query result. Having a “row number” or “dimension index” is handy when you want to ask questions like, “What’s the longest period between version updates?” But the VersionNo tells us nothing about the array’s versions (other than where each one fits in the sequence). Each version’s identity is in the version_id.
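To make the “row number” point concrete, here is a minimal sketch in Python (not SciDB code) that answers the “longest period between version updates” question from the rows printed above. The function name longest_gap and the tuple layout are made up for illustration; the only thing borrowed from SciDB is the output of versions(). The position of each row is used only to pair neighbours, while the version_id is what gets reported.

```python
from datetime import datetime

# Rows copied from the versions() output above: (VersionNo, version_id, timestamp).
rows = [
    (1, 1, "2014-10-09 19:01:40"),
    (2, 2, "2014-10-09 19:01:42"),
    (3, 3, "2014-10-09 19:01:43"),
    (4, 4, "2014-10-09 19:01:45"),
]


def longest_gap(rows):
    """Return (seconds, older_version_id, newer_version_id) for the widest gap.

    Hypothetical helper: the row order (VersionNo) only decides which rows
    are neighbours; the identity of each version comes from version_id.
    """
    stamps = [
        (vid, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"))
        for _, vid, ts in rows
    ]
    gaps = [
        ((later - earlier).total_seconds(), vid_a, vid_b)
        for (vid_a, earlier), (vid_b, later) in zip(stamps, stamps[1:])
    ]
    # On ties, max() keeps the first widest gap it encounters.
    return max(gaps, key=lambda gap: gap[0])


seconds, older, newer = longest_gap(rows)
print(f"Longest gap: {seconds:.0f}s between version_id {older} and {newer}")
```

With these four rows there are two gaps of two seconds each, so the tie-breaking rule decides which pair is reported.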

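You can see what goes on as we remove a couple of versions. (Judging by the output below, remove_versions ( Array_Name, N ) drops every version older than version_id N and keeps N itself.)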

SET LANG AFL;
remove_versions ( Version_Test, 2 );
--
SET LANG AQL;
SELECT * FROM versions ( Version_Test );
--
--  {VersionNo} version_id,timestamp
--  {1} 2,'2014-10-09 19:01:42'
--  {2} 3,'2014-10-09 19:01:43'
--  {3} 4,'2014-10-09 19:01:45'
--
SET LANG AFL;
remove_versions ( Version_Test, 3 );
--
SET LANG AQL;
SELECT * FROM versions ( Version_Test );
--
--  {VersionNo} version_id,timestamp
--  {1} 3,'2014-10-09 19:01:43'
--  {2} 4,'2014-10-09 19:01:45'

The VersionNo in the result array is simply a handle to the “n-th version in the system”. It doesn’t contain a value you can use to address the array’s version. So, as you can see, removing versions means that the “VersionNo” gets out of sync with the array’s “version_id”. If you remove a version, we keep the max version_id, but the number of versions goes down.

Each cell in the array produced by versions(…) represents one version of the array that we are storing (and that you can address). The VersionNo refers to the number (but not the identity!) of each active version.
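For anyone who prefers to see the renumbering without a SciDB instance to hand, here is a toy model in Python. It is a sketch under assumptions, not how SciDB is implemented: toy_versions and toy_remove_versions are hypothetical stand-ins, and the “drop everything older than the given version_id” behaviour is inferred from the output shown above rather than taken from the documentation.

```python
def toy_versions(version_ids):
    """Mimic the shape of versions(): number the surviving version_ids from 1.

    Returns (VersionNo, version_id) pairs. VersionNo is just the position
    in the list; version_id is the identity that survives removals.
    """
    return list(enumerate(sorted(version_ids), start=1))


def toy_remove_versions(version_ids, oldest_to_keep):
    """Toy stand-in for remove_versions(): drop ids older than oldest_to_keep.

    Assumption: this mirrors the behaviour seen in the thread's output.
    """
    return [vid for vid in version_ids if vid >= oldest_to_keep]


ids = [1, 2, 3, 4]                       # four versions after the UPDATEs

ids = toy_remove_versions(ids, 2)
after_first = toy_versions(ids)          # version_ids 2, 3, 4 remain

ids = toy_remove_versions(ids, 3)
after_second = toy_versions(ids)         # version_ids 3, 4 remain

for label, rows in (("after 2", after_first), ("after 3", after_second)):
    print(label, rows)
```

The surviving version_ids never change, but their VersionNo slides down each time an older version is removed, which is exactly the out-of-sync effect described above.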

Hope this helps!


#4

Paul’s explanation should clear things up nicely.

However, in the spirit of scrupulous accuracy, the documentation for remove_versions() will be changed slightly. The second argument of the operator signature, called “version_number” in the 14.8 documentation, will in a subsequent release be called “version_id.”


#5

Man, a 650 word post to tell me that I was wrong and then go on explaining in elaborate detail exactly what I told you in the first place - that’s… interesting.

But hey, it seems jmaguire understood what I was hoping for, so my inner nitpicker can rest in peace :smile:


#6

No, no!

The 650 word post was just me explaining to myself that I was wrong! I thought (at first) what you were telling us about was a bug. It wasn’t.

But at least now, if anyone googles some combination of SciDB Version Num version_id, they’ll get an explanation of what’s going on.