String Plugin


#1

Gang:

I’ve been playing with strings (https://github.com/tshead/scidb-string) as a way to get familiar with plugins. Along the way, I’ve run into some oddities:

First, there’s an undocumented “length” function that seems like it might return the length of a string:

AQL% select * from list('functions') where name='length'; [(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),("length","int64 length(string)",true,"scidb"),("length","int64 length(string,string)",true,"scidb"),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),()]

… but it doesn’t:

AQL% select length(model) from carbig;
SystemException in file: src/system/catalog/SystemCatalog.cpp function: getArrayDesc line: 760
Error id: scidb::SCIDB_SE_SYSCAT::SCIDB_LE_ARRAY_DOESNT_EXIST
Error description: System catalog error. Array 'chevrolet chevelle malibu' does not exist.
Failed query id: 1100866347035

… so I created my own “len” function, but I’m wondering what “length” does (either version).

Second, I tried creating a function to concatenate two strings together. If I call it “concatenate”, everything works fine and I can use it in the expected way:

AQL% select concatenate(origin, model) from carbig;
[{0}("USAchevrolet chevelle malibu"),{1}("USAbuick skylark 320"),{2}("USAplymouth satellite"),{3}("USAamc rebel sst"),{4}("USAford torino"),{5}("USAford galaxie 500"),{6}("USAchevrolet impala"),{7}("USAplymouth fury iii"),{8}("USApontiac catalina"),{9}("USAamc ambassador dpl"),{10}("Francecitroen ds-21 pallas"),{11}("USAchevrolet chevelle concours (sw)"),{12}("USAford torino (sw)"),{13}("USAplymouth satellite (sw)"),{14}("USAamc rebel sst (sw)"),{15}("USAdodge challenger se"),{16}("USAplymouth 'cuda 340"),{17}("USAford mustang boss ...

… but if I follow the UDT examples and call it “+”, the query fails:

AQL% select origin + model from carbig;
SystemException in file: src/util/Job.cpp function: execute line: 55
Error id: scidb::SCIDB_SE_EXECUTION::SCIDB_LE_UNKNOWN_ERROR
Error description: Error during query execution. Unknown error: vector::_M_range_insert.
Failed query id: 1100866597657

Putting some output to stderr in my code and watching the logs, it looks as if this failure happens before my function is ever called. Any thoughts on debugging this sort of problem?

Cheers,
Tim


#2

Not documented? But there is all of that lovely source code to read…

length turns out to return the length of a dimension. It takes two arguments, an array name and a dimension name, and returns an int64. If an array has only one dimension, there is a special form of the function that requires only the array name.

strlen is the function you were looking for.

It would be helpful if functions supported a one-line description, like operators do. That way you could find out the semantics as well as the function syntax.

As for the final problem, it would help to see more of your code, but it is possible to run the server in a debugger. A bit tedious since it is multithreaded, but very illuminating.


#3

I don’t suppose you have a concrete example of length (either version) in use? I’ve tried several permutations of AQL and AFL without luck.

It turns out that the crash I’m seeing with “+” happens with my SciDB built from source on CentOS 5 … a build on Ubuntu 12.04 works fine, so I guess I’ll be looking closely at software versions. I’ve gotten backtraces from the server’s core dumps, but haven’t looked into running it in the debugger yet.

Cheers,
Tim


#4

Tim,

Out of my own interest, I looked up the length function – I hadn’t heard of it before. Here’s a usage example. The function reaches into the catalog and looks up how long the array’s dimension is. I can see it being useful in some computations, e.g. “compute how far the current cell is from the edge”.

apoliakov@daitanto:~/workspace/scidb_trunk$ iquery -aq "show(foo)"
[("foo<val:int64> [x=1:10,10,0]")]
apoliakov@daitanto:~/workspace/scidb_trunk$ iquery -aq "scan(foo)"
[(0),(0),(2),(0),(0),(2),(2),(1),(1),(0)]
apoliakov@daitanto:~/workspace/scidb_trunk$ iquery -aq "show(bar)"
[("bar<x:int64> [val=0:2,3,0,synth=0:9,10,0]")]
apoliakov@daitanto:~/workspace/scidb_trunk$ iquery -aq "scan(bar)"
[[(1),(2),(4),(5),(10),(),(),(),(),()],[(8),(9),(),(),(),(),(),(),(),()],[(3),(6),(7),(),(),(),(),(),(),()]]
apoliakov@daitanto:~/workspace/scidb_trunk$ iquery -aq "apply(foo, foolen, length('foo','x'))"
[(0,10),(0,10),(2,10),(0,10),(0,10),(2,10),(2,10),(1,10),(1,10),(0,10)]
apoliakov@daitanto:~/workspace/scidb_trunk$ iquery -aq "apply(foo, barlen_val, length('bar','val'))"
[(0,3),(0,3),(2,3),(0,3),(0,3),(2,3),(2,3),(1,3),(1,3),(0,3)]
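
The results above are consistent with length returning high - low + 1 for the named dimension. Here is a small Python model of that arithmetic, working from the schema strings that show() printed. It is only a sketch of the semantics, not how SciDB implements it; parse_dimensions and dimension_length are made-up helper names, and the regex assumes the name=low:high,chunk,overlap layout seen in the output above.

```python
import re

def parse_dimensions(schema):
    """Return {dimension_name: (low, high)} from a show()-style schema string."""
    # Text between the square brackets, e.g. "val=0:2,3,0,synth=0:9,10,0".
    dims_text = schema[schema.index("[") + 1 : schema.rindex("]")]
    # Assumed layout per dimension: name=low:high,chunk_size,chunk_overlap
    pattern = re.compile(r"(\w+)=(-?\d+):(-?\d+),\d+,\d+")
    return {name: (int(low), int(high))
            for name, low, high in pattern.findall(dims_text)}

def dimension_length(schema, dim):
    """Model of length('array','dim'): number of coordinates in the dimension."""
    low, high = parse_dimensions(schema)[dim]
    return high - low + 1

FOO = "foo<val:int64> [x=1:10,10,0]"
BAR = "bar<x:int64> [val=0:2,3,0,synth=0:9,10,0]"

print(dimension_length(FOO, "x"))      # 10, matches length('foo','x')
print(dimension_length(BAR, "val"))    # 3, matches length('bar','val')
print(dimension_length(BAR, "synth"))  # 10
```

Note that foo starts at 1 and bar's dimensions start at 0, yet the model gives the same kind of answer for both: a count of coordinates, not the highest coordinate.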

#5

I also find it useful with between, when I wish to limit some dimensions but not others, and I don’t feel like looking up the dimension size via show.

Using Alex’s bar array:

Of course, you still need to remember if the array start is 0 or 1 based.

The other key thing to remember is that, as a function, it takes the array and dimension names as strings, which is inconsistent with SciDB operators, where they are identifiers, i.e. not quoted.
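
To make the 0-based versus 1-based point concrete, here is a rough Python sketch of the bound arithmetic. This is not the query from this post, only an illustration: last_coordinate and between_query are hypothetical helpers, and the sketch assumes between takes all low coordinates first, then all high coordinates.

```python
def last_coordinate(low, length):
    """Highest valid coordinate of a dimension that starts at `low`."""
    return low + length - 1

def between_query(array, bounds):
    """Build a between() call from one (low, high) pair per dimension."""
    lows = ", ".join(str(lo) for lo, _ in bounds)
    highs = ", ".join(str(hi) for _, hi in bounds)
    return f"between({array}, {lows}, {highs})"

# bar has val=0:2 and synth=0:9, so length('bar','synth') would be 10.
# Limit val to 1 and leave synth unrestricted.
synth_high = last_coordinate(0, 10)
print(between_query("bar", [(1, 1), (0, synth_high)]))  # between(bar, 1, 0, 1, 9)

# foo has x=1:10: same length, but the last coordinate is 10, not 9.
print(last_coordinate(1, 10))  # 10
```

Using length - 1 as the upper bound is only right for a dimension that starts at 0; for a dimension starting at 1, the length itself is the last coordinate.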


#6

I’m a newbie, so I’m going to have to look into the sources you have mentioned and hope that gets me on my way. Thanks for the info.