Working with more_math library


#1

I’m trying to access any of the functions from the more_math library (checkisprime, for example) from AQL/AFL but there’s nothing in the documentation other than how to load the library. Could you please let me know how to use these additional math functions?


#2

Hello,

Looks like the library adds a few functions and here’s how we can find their signatures:

apoliakov@scalpel:~/scidb$ iquery -o csv+ -aq "list('functions')" | grep more_math
181,"fact","string fact(int64)",true,"more_math"
195,"isprime","string isprime(int64)",true,"more_math"
196,"lasso","double lasso(double,double,double)",true,"more_math"
207,"mylog","double mylog(double,double)",true,"more_math"
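One caveat with grep: it matches the text anywhere in the row, so a function from another library whose name or signature happens to contain "more_math" would show up too. Here is a minimal sketch of a stricter filter, assuming the library name is always the last column of the csv+ output (as it is in the rows above). The name functions_in_library is made up for this example and is not part of SciDB.

```shell
# Keep only rows whose last CSV column (assumed to be the library name) equals $1.
# Reads csv+ rows, such as the list('functions') output, on stdin.
functions_in_library() {
  if [ $# -ne 1 ]; then
    echo "usage: functions_in_library <library_name>" >&2
    return 1
  fi
  # Splitting on commas is only safe because we look at the LAST field;
  # the signature column can contain commas inside its quotes.
  awk -F, -v lib="\"$1\"" '$NF == lib'
}
```

It would be used in place of grep, for example: iquery -o csv+ -aq "list('functions')" | functions_in_library more_math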

So here’s how we can use the functions isprime() and mylog() in a query:

apoliakov@scalpel:~/scidb$ iquery -aq "create array foo<val:int64> [x=1:10,10,0]"
Query was executed successfully
apoliakov@scalpel:~/scidb$ iquery -aq "store(build(foo,x),foo)"
[(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)]

apoliakov@scalpel:~/scidb$ iquery -o csv+ -aq "apply(foo, val_is_prime, isprime(val), val_log, mylog(val,3))"
x,val,val_is_prime,val_log
1,1,"1 :not prime",0
2,2,"2 :prime",0.63093
3,3,"3 :prime",1
4,4,"4 :not prime",1.26186
5,5,"5 :prime",1.46497
6,6,"6 :not prime",1.63093
7,7,"7 :prime",1.77124
8,8,"8 :not prime",1.89279
9,9,"9 :not prime",2
10,10,"10 :not prime",2.0959

Does this help?
– Alex Poliakov


#3

It does help, thank you.

While the list(); command is helpful, I didn’t see anything in your documentation that differentiated between libraries, functions, and plugins. In fact, the doc only discusses libmore_math in the plugins section, so I would have expected list(plugins); to tell me something rather than list(functions);.


#4

Hello,

We do have list('libraries') that returns all of the plugins that are currently loaded:

apoliakov@scalpel:~/scidb$ iquery -aq "list('libraries')"
[("libmore_math.so",0,0,0,0),("librational.so",0,0,0,0)]

We are sort of assuming that everyone who loads a plugin already knows (or has documentation for) all the extra functionality that the plugin adds. But suppose you see librational.so and you have no idea what’s in that plugin. Well, you could try running list commands with grep to figure that out:

apoliakov@scalpel:~/scidb$ iquery -o csv+ -aq "list('operators')" | grep rational
apoliakov@scalpel:~/scidb$ iquery -o csv+ -aq "list('types')" | grep rational
12,"rational","rational"
apoliakov@scalpel:~/scidb$ iquery -o csv+ -aq "list('functions')" | grep rational
14,"*","rational *(rational,rational)",true,"rational"
26,"+","rational +(rational,rational)",true,"rational"
47,"-","rational -(rational,rational)",true,"rational"
62,"/","rational /(rational,int64)",true,"rational"
63,"/","rational /(rational,rational)",true,"rational"
78,"<","bool <(rational,rational)",true,"rational"
94,"<=","bool <=(rational,rational)",true,"rational"
125,"=","bool =(rational,rational)",true,"rational"
141,">","bool >(rational,rational)",true,"rational"
157,">=","bool >=(rational,rational)",true,"rational"
187,"getdenom","int64 getdenom(rational)",true,"rational"
188,"getnumer","int64 getnumer(rational)",true,"rational"
214,"rational","rational rational(int64)",true,"rational"
215,"rational","rational rational(int64,int64)",true,"rational"
216,"rational","rational rational(string)",true,"rational"
222,"str","string str(rational)",true,"rational"
apoliakov@scalpel:~/scidb$ iquery -o csv+ -aq "list('aggregates')" | grep rational

So from this I can surmise that the rational library adds a new datatype “rational” and adds the above functions for it. The library does NOT add any aggregates or operators.
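The four lookups above can be wrapped in a small helper. This is only a sketch: describe_plugin and the IQUERY override variable are names invented here, and it relies on nothing beyond the list() categories and the grep approach already shown, so it has the same limitation of matching the name anywhere in a row.

```shell
# Hypothetical helper: show what a plugin contributes in each list() category.
# IQUERY may be set to another command (e.g. a stub for testing);
# it defaults to the real iquery client.
describe_plugin() {
  if [ $# -ne 1 ]; then
    echo "usage: describe_plugin <name>" >&2
    return 1
  fi
  local category
  for category in operators types functions aggregates; do
    echo "== $category =="
    # grep exits non-zero when nothing matches; print a marker in that case
    "${IQUERY:-iquery}" -o csv+ -aq "list('$category')" | grep -- "$1" || echo "(none)"
  done
}
```

Running describe_plugin rational would then print one section per category, with "(none)" for the categories where the plugin adds nothing.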

Does this help? Is our doc not very clear on this point?


#5

So there’s no way that I could write a function and document it in SciDB so that other people using my SciDB install could discover it and use it themselves?

This is helpful, thanks. I’m curious: is it the norm for most SciDB users to use the interactive mode non-interactively? By which I mean, is it usual for people to work from the Linux command line rather than the iquery CL?

Your documentation is not very clear on many points. What would have helped me in this case is an example that goes from start (C++ code) to finish (AFL/AQL query) for adding new analytical processing. That’s why I ended up examining the libmore_math examples rather than the rational data type discussed in the document.

Thanks!


#6

Well, feedback like this allows us to improve our product, so thanks for the feedback.

We’ll try to

  • improve the doc with an example like you mentioned
  • build more doc capabilities into the software so that tasks like “describe to me all the user-defined elements from this plugin” become much easier

Here’s my take on how to interact with SciDB in general. At this point, different people definitely do very different things. Some say the CLI is sufficient. Some folks write their own clients altogether (e.g. via our Python API). For most of the work that I’ve been doing, I prefer using AFL, and I prefer to always view the results in csv+ form. Sometimes the result sets are huge and I like to save them to temporary files. So I’ve actually written a little bash thing that I use:

#this is in my .bashrc 
afl()
{
  if [ $# -ne 1 ]; then
   echo "You need to give me a query"
   return 1;
  fi
  iquery -o csv+ -aq "$1" | less
}

Then when I say

$ afl "scan(foo)"

It brings up a less window with the query result. I can scroll through the output and search through it. And, at other times

$ afl "scan(foo)" > foo.csv

allows me to save things to an output file.

And of course, sometimes you’re running a store() query and you don’t want to see the output at all. Sometimes non-integer dimensions come into play and you want to examine them. But generally, after I came up with this little afl helper, I found it was pretty comfortable to use and did the job in a lot of cases for me.
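A possible variant of the helper, offered as a sketch rather than a replacement: less already passes data straight through when its output is not a terminal, which is why the redirect above works, but the pipeline then reports the exit status of less instead of iquery. The version below only starts the pager when stdout is a terminal. The IQUERY override variable is an invention for this example.

```shell
# Variant of the afl helper: page only when writing to a terminal.
# IQUERY may be set to another command (e.g. a stub); defaults to iquery.
afl() {
  if [ $# -ne 1 ]; then
    echo "You need to give me a query" >&2
    return 1
  fi
  if [ -t 1 ]; then
    # stdout is a terminal: scroll and search the result in less
    "${IQUERY:-iquery}" -o csv+ -aq "$1" | less
  else
    # stdout is a file or a pipe: write the result directly,
    # so the caller sees the exit status of the query command itself
    "${IQUERY:-iquery}" -o csv+ -aq "$1"
  fi
}
```

With this version, something like afl "scan(foo)" > foo.csv || echo "query failed" reacts to a failed query, which the piped version cannot do.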


#7

Thanks for the bash script, that’s really nice. I have been mostly using AFL since the doc says AQL doesn’t have all the functionality so this will be handy.

I notice all your stuff is for single commands. How do you handle complex processing? If I need to do multiple steps, do I break it into multiple functions? I assume that if I have one function handle everything then I’m not able to debug in SciDB? If I go the route of multiple functions and multiple intermediate arrays, is there any way to send all the processing commands at once, or do I need to do them one at a time?

Also, I’d heard there was a way to make a change to an intermediate product cascade automatically so that the later values are updated; is that in the documentation?


#8

I believe you’re referring to either provenance or materialized views. These features are not implemented yet.

As far as “how to do complex processing” goes, again there is a variety of options here. In the use cases I’ve encountered, I’ve relied primarily on little shell scripts and found them quite adequate. At times, they do become a bit painful to edit, but overall, once you get a script to work, it becomes a fairly stable “procedure” that you can store and reuse. Just as easily as shell scripts, you could write Perl scripts or Python apps or even C++ code that sends queries to SciDB directly. It does depend on the use case.
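On the narrower question of sending several commands at once, one simple pattern is to keep the AFL statements in a text file and feed them to iquery one by one, stopping at the first failure. The following is a rough sketch under that assumption. The name run_afl_file, the one-statement-per-line file layout, and the IQUERY override variable are all invented for the example; only the iquery -anq invocation comes from this thread.

```shell
# Hypothetical helper: run every AFL statement in a file, one per line,
# stopping at the first failure. Blank lines and lines starting with # are skipped.
run_afl_file() {
  if [ $# -ne 1 ] || [ ! -r "$1" ]; then
    echo "usage: run_afl_file <readable_file>" >&2
    return 1
  fi
  local query output
  while IFS= read -r query || [ -n "$query" ]; do
    case "$query" in
      ''|'#'*) continue ;;
    esac
    # stdin is redirected so the query command cannot swallow the rest of the file
    if ! output=$("${IQUERY:-iquery}" -anq "$query" 2>&1 </dev/null); then
      echo "Query failed: $query: $output" >&2
      return 1
    fi
  done < "$1"
  return 0
}
```

Because each statement runs as its own query, intermediate arrays written with store() remain available for inspection if a later step fails.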

Here’s a little shell script I wrote to demonstrate. It’s completely standalone and you should be able to run it on your setup. It creates a sample CSV file of American presidents, loads it into SciDB and performs some simple calculations. After the script is finished, you have three new arrays in the system: us_presidents, president_stats and yio_histogram.

This is the script:

#!/bin/bash

#Function cleanup. This function is executed whenever the user issues ctrl+c or kills the script.
#The command "kill -9" is special and won't be handled by this function
trap cleanup 1 2 3 6 15
cleanup()
{
  echo "Caught signal. Exiting"
  #Note: you may or may not want to remove temp arrays here. 
  #One problem is that remove might block. So it's safer not doing it.
  exit 1;
}

#A function that executes some AFL and does not return the output
afl_nr()
{
  if [ $# -ne 1 ]; then
   echo "You need to give me a query"
   return 1;
  fi
  OUTPUT=`iquery -anq "$1" 2>&1`
  #If there was an error, kill the script and show the error
  if [ $? -ne 0 ]; then
   echo "There was an error running query $1: $OUTPUT"
   exit 1;
  fi
}

#Let's create a CSV file of US presidents. The columns in this file are:
#first_name:string, middle_name:string null, last_name:string, years_in_office:int8 null, died_in_office:bool null
#Barack Obama is the current president, so he has null for "years_in_office" and "died_in_office".
echo '
george,,washington,8,false
john,,adams,4,false
thomas,,jefferson,8,false
james,,madison,8,false
james,,monroe,8,false
john,quincy,adams,4,false
andrew,,jackson,8,false
martin,,vanburen,4,false
william,henry,harrison,0,true
john,,tyler,4,false
james,knox,polk,4,false
zachary,,taylor,1,false
millard,,fillmore,3,false
franklin,,pierce,4,false
james,,buchanan,4,false
abraham,,lincoln,4,true
andrew,,johnson,4,false
ulysses,simpson,grant,8,false
rutherford,birchard,hayes,4,false
james,abram,garfield,0,true
chester,alan,arthur,4,false
grover,,cleveland,4,false
benjamin,,harrison,4,false
grover,,cleveland,4,false
william,,mckinley,4,true
theodore,,roosevelt,8,false
william,howard,taft,4,false
woodrow,,wilson,8,false
warren,gamaliel,harding,2,true
calvin,,coolidge,6,false
herbert,clark,hoover,4,false
franklin,delano,roosevelt,12,true
harry,s,truman,8,false
dwight,david,eisenhower,8,false
john,fitzgerald,kennedy,2,true
lyndon,baines,johnson,6,false
richard,milhous,nixon,5,false
gerald,rudolph,ford,3,false
james,earl,carter,4,false
ronald,wilson,reagan,8,false
george,herbertwalker,bush,4,false
william,jefferson,clinton,8,false
george,walker,bush,8,false
barack,hussein,obama,,
' >/tmp/presidents.csv

#Let's remove all the other arrays that were created during past runs:
echo "Removing old arrays..."
#these queries will throw an error if the array does not exist. So we don't use afl_nr() here.
iquery -anq "remove(us_presidents)" >/dev/null 2>&1
iquery -anq "remove(president_stats)"  >/dev/null 2>&1
iquery -anq "remove(yio_histogram)"  >/dev/null 2>&1


#Load the CSV into SciDB
echo "Loading presidents..."

#Create the array that we're loading into. When loading CSVs, we use one-dimensional arrays
afl_nr "create empty array us_presidents <first_name: string, middle_name:string null, last_name:string, years_in_office:int8 null, died_in_office: bool null> [president_number=1:50,10,0]"

#this method works best for very large CSV files. First we create a fifo and convert from csv to scidb format, piping output to the FIFO. Then we tell scidb to load from the FIFO.
PIPE=/tmp/load.pipe.presidents
rm -f $PIPE
mkfifo $PIPE

#skip 0 lines from the beginning; start the first value at coordinate 1 (will correspond to president_number 1); use chunk size of 10. 
#format explanation:
#S - non-nullable string-like field
#s - nullable string-like field
#N - numeric field (may be nullable)
#Run this in the background and allow 2 seconds for it to start up
csv2scidb -s 0 -f 1 -c 10 -p "SsSNs" < /tmp/presidents.csv > $PIPE &
sleep 2

afl_nr "load(us_presidents, '$PIPE')"

echo "Computing statistics on presidents..."
afl_nr "
store(
 aggregate(
  us_presidents, 
  count(*) as total_presidents,
  max(years_in_office), 
  avg(years_in_office)
 ),
 president_stats
)"
echo "Created president_stats array..."

echo "There were `iquery -o csv -aq "project(president_stats, total_presidents)" | tail -n 1` presidents in the US"
echo "The average present spent `iquery -o csv -aq "project(president_stats, years_in_office_avg)" | tail -n 1` years in office"

MAX_YEARS_IN_OFFICE=`iquery -o csv -aq "project(president_stats, years_in_office_max)" | tail -n 1`
echo "President `iquery -o csv -aq "project(filter(us_presidents, years_in_office=$MAX_YEARS_IN_OFFICE), last_name, first_name)" | tail -n 1` was one of the longest serving with $MAX_YEARS_IN_OFFICE years in office"

echo "Computing a histogram by years in office..."
afl_nr "create empty array yio_histogram <num_presidents:uint64 null, list_of_names:string null> [years=0:$MAX_YEARS_IN_OFFICE, 10, 0]"
afl_nr "
redimension_store(
 apply(
  substitute( 
   us_presidents, 
   build(<val:int8> [x=0:0,1,0], 0),
   years_in_office
  ),
  last_name_with_comma, last_name+', ',
  years, int64(years_in_office)
 ),
 yio_histogram, 
 0, 
 count(*) as num_presidents,
 sum(last_name_with_comma) as list_of_names
)"

echo "There were `iquery -o csv -aq "project(filter(yio_histogram, years=8), num_presidents)" | tail -n 1` presidents who served for 8 years. They are: `iquery -o csv -aq "project(filter(yio_histogram, years=8), list_of_names)" | tail -n 1`"

And on my machine it outputs the following:

apoliakov@scalpel:~/workspace/scidb_trunk$ ./sample_script.sh 
Removing old arrays...
Loading presidents...
Computing statistics on presidents...
Created president_stats array...
There were 44 presidents in the US
The average present spent 5.11628 years in office
President "roosevelt","franklin" was one of the longest serving with 12 years in office
Computing a histogram by years in office...
There were 13 presidents who served for 8 years. They are: "jefferson, madison, monroe, jackson, washington, clinton, bush, grant, roosevelt, wilson, truman, eisenhower, reagan, "
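A side note on the FIFO loading step in the script (mkfifo, csv2scidb in the background, sleep 2, then load): on a FIFO the writer blocks in open() until a reader opens the other end, so the same pattern can be written without a fixed sleep and with a private temporary path. The function below is a generic sketch of that idea, not something taken from the SciDB docs. The name stream_via_fifo and the two-callback interface are invented for illustration, and it assumes the reader opens the FIFO on the same machine.

```shell
# Hypothetical helper: stream data from a producer to a consumer through a FIFO.
#   $1 = command that writes the data to stdout (csv2scidb in the script above)
#   $2 = command that takes the FIFO path as its only argument (the load query)
stream_via_fifo() {
  if [ $# -ne 2 ]; then
    echo "usage: stream_via_fifo <producer> <consumer>" >&2
    return 1
  fi
  local dir pipe producer_pid status
  dir=$(mktemp -d) || return 1     # private directory, so the path cannot collide
  pipe="$dir/load.pipe"
  mkfifo "$pipe" || { rm -rf "$dir"; return 1; }

  # The background writer blocks until a reader opens the FIFO.
  "$1" > "$pipe" &
  producer_pid=$!

  status=0
  "$2" "$pipe" || status=$?
  if [ "$status" -ne 0 ]; then
    # The consumer may never have opened the FIFO; do not leave the writer blocked.
    kill "$producer_pid" 2>/dev/null
  fi
  wait "$producer_pid" 2>/dev/null
  rm -rf "$dir"
  return "$status"
}
```

In the script above, the producer would be a small function wrapping the csv2scidb command and the consumer a function that runs afl_nr "load(us_presidents, '$1')".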

apoliakov@scalpel:~/workspace/scidb_trunk$ iquery -o csv -aq "list()"
name
"president_stats"
"us_presidents"
"yio_histogram"