[SOLVED] About cross_join


#1

I have a question about cross_join between 2-D sparse arrays.

For example, given A[latitude, longitude] and B[latitude, longitude],

A and B are joined only at cells that have the same latitude and longitude and where both arrays have a value.

I would like them to be joined even where only one of the two has a value.

For example,

A is
[
[( ),(1),( )],
[(3),( ),(5)],
[( ),(7),( )]
]

B is
[
[(7),( ),(7)],
[(7),(7),(7)],
[(7),( ),(7)]
]

I want the following result:

[
[(null, 7),(1, null),(null, 7)],
[(3,7),(null, 7),(5,7)],
[(null, 7),(7, null),(null, 7)]
]

But the actual result is:

[
[( ),( ),( )],
[(3,7),( ),(5,7)],
[( ),( ),( )]
]
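To make the observed behaviour concrete: join matches cells by their coordinates and keeps only the cells where both inputs are non-empty, like an inner join. The minimal Python sketch below models each 2-D sparse array as a dict from (row, column) to a value, with empty cells simply absent. This model and the `inner_join` helper are hypothetical illustrations, not SciDB's storage format or API. Run on the A and B above, it reproduces the actual result: only the cells at (1, 0) and (1, 2) survive.

```python
# Hypothetical model of a 2-D sparse array: dict (row, col) -> value.
# Empty cells are simply absent. Illustration only, not SciDB's API.

# The arrays A and B from the question.
A = {(0, 1): 1, (1, 0): 3, (1, 2): 5, (2, 1): 7}
# B is 7 everywhere except the empty cells (0, 1) and (2, 1).
B = {(r, c): 7 for r in range(3) for c in range(3) if not (c == 1 and r != 1)}


def inner_join(left, right):
    """Keep a cell only where BOTH inputs have a value (inner-join semantics)."""
    return {cell: (left[cell], right[cell]) for cell in left if cell in right}


joined = inner_join(A, B)
print(joined)  # {(1, 0): (3, 7), (1, 2): (5, 7)}
```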

What should I do?


#2

Use merge. Merge requires its inputs to have matching attributes, so you can construct three arrays that all have the attributes <reflectance, elevation> (here A has reflectance and B has elevation) and then merge them together.
Use apply to add the missing attribute as null, and then project to reorder the attributes.

This is a pretty common pattern:

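merge(
  join(A, B),                                   -- cells where both A and B have values; merge prefers earlier inputs
  apply(A, elevation, double(null)),            -- all of A with a null elevation; not sure if elevation is double or what
  project(apply(B, reflectance, double(null)), reflectance, elevation)  -- all of B with a null reflectance, attributes reordered
)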
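The merge pattern can be modelled the same way. In this hypothetical Python sketch, each sparse array is a dict from (row, column) to a value, `None` stands in for null, and `join`/`merge` are illustrative stand-ins for the AFL operators, not SciDB code. The sketch assumes, as with SciDB's merge, that where several inputs have the same cell, the earliest input wins. That is why join(A, B) comes first: cells present in both arrays keep both values instead of picking up a null.

```python
# Hypothetical model: dict (row, col) -> value; empty cells are absent,
# None stands in for null. A carries "reflectance", B carries "elevation".

A = {(0, 1): 1, (1, 0): 3, (1, 2): 5, (2, 1): 7}
B = {(r, c): 7 for r in range(3) for c in range(3) if not (c == 1 and r != 1)}


def join(left, right):
    """Inner join: cells present in both, as (reflectance, elevation)."""
    return {cell: (left[cell], right[cell]) for cell in left if cell in right}


def merge(*arrays):
    """For each cell, keep the value from the first input that has it,
    so earlier arguments take precedence."""
    out = {}
    for arr in arrays:
        for cell, value in arr.items():
            out.setdefault(cell, value)
    return out


# apply(A, elevation, null): every cell of A gets a null elevation.
a_with_null = {cell: (v, None) for cell, v in A.items()}
# project(apply(B, reflectance, null), reflectance, elevation):
# every cell of B gets a null reflectance, placed first.
b_with_null = {cell: (None, v) for cell, v in B.items()}

result = merge(join(A, B), a_with_null, b_with_null)

for r in range(3):
    print([result.get((r, c)) for c in range(3)])
```

Printing the rows gives `[(None, 7), (1, None), (None, 7)]`, `[(3, 7), (None, 7), (5, 7)]` and `[(None, 7), (7, None), (None, 7)]`, which is the result asked for in the question. Swapping the first two merge arguments would turn (1, 0) into `(3, None)`, which shows why the order matters.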

#3

Thank you, Mr. apoliakov.

I tried it the way you suggested.

It worked very well.

Thank you very much.


#4

[quote=“myfeelup”]Thank you, Mr. apoliakov.

Thank you very much.[/quote]

I am a graduate student, and I recently ran into some problems while analysing the SciDB source code. I saw your question about cross_join, and I think you may know something that can help me. I have been confused about this for a long time >_<

1. I used Graphviz and Doxygen to generate the call relationships for the whole source tree, but I found no connection between the Join/Cross_Join files and any other files. Why do they have no callers or callees? If that is the case, how are they used from AQL or AFL?

2. The Join/Cross_Join files also contain many classes and functions, but there are few relations between them. I don't understand why they have no callers or callees either; if so, how are they connected to each other? o(╯□╰)o

3. I have read all the functions in the Join/Cross_Join files, but I still find them hard to piece together. Are there any papers, algorithm descriptions, or other materials you could share to help me understand these functions? %>_<%

Thank you very very very very very very very much!O(∩_∩)O···

#5

Hi,

Yeah, join and cross_join are separate operators with separate code paths, at least for now. They definitely have a caller, but your code analysis tools are probably not resolving it. The system is organized so that you can write and add your own operators, like Join, and the engine invokes them through the generic PhysicalOperator interface rather than through direct calls. This necessarily makes for a more sophisticated callback structure than a static call graph can easily follow.
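Roughly speaking, this is why a call-graph tool finds no caller: an operator is looked up by the name written in the query and invoked through a common base class, so no source line calls PhysicalJoin::execute directly. The Python sketch below models that idea. The class names echo the ones in the backtrace later in this post, but the registry, the `register_operator` decorator and `run_operator` are hypothetical; SciDB's actual registration mechanism is C++ and differs in detail.

```python
from abc import ABC, abstractmethod

# Hypothetical registry: operator name (as written in AFL) -> class.
# Illustrates dynamic dispatch only; not SciDB's real mechanism.
OPERATOR_REGISTRY = {}


def register_operator(name):
    """Class decorator that records an operator class under an AFL name."""
    def wrap(cls):
        OPERATOR_REGISTRY[name] = cls
        return cls
    return wrap


class PhysicalOperator(ABC):
    @abstractmethod
    def execute(self, input_arrays):
        ...


@register_operator("join")
class PhysicalJoin(PhysicalOperator):
    def execute(self, input_arrays):
        left, right = input_arrays
        # Keep cells present in both inputs.
        return {cell: (left[cell], right[cell]) for cell in left if cell in right}


def run_operator(name, input_arrays):
    # The caller only knows the base class. Which execute() runs is decided
    # at run time by the name in the query, so no static call edge points
    # at PhysicalJoin.execute.
    op = OPERATOR_REGISTRY[name]()
    return op.execute(input_arrays)


foo = {1: 0.5, 2: 0.25}  # toy one-dimensional array
print(run_operator("join", [foo, foo]))
```

A static tool sees `run_operator` calling `PhysicalOperator.execute`; the edge to `PhysicalJoin.execute` only exists at run time, which is consistent with the missing links in the Doxygen/Graphviz output.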

Here’s an experiment using iquery and gdb:

  1. Launch a small SciDB configuration from a debug build, e.g. using run.py. Find the PID of a scidb process (the child process, 498 in the listing below), attach gdb to it, and set a breakpoint:
[apoliakov@localhost linear_algebra]$ ps axf | grep scidb | head -n 4
  560 pts/2    S+     0:00  |   \_ grep scidb
20973 pts/8    S+     0:00      \_ vim /home/apoliakov/workspace/scidb_trunk/stage/install/etc/config.ini
  477 ?        S      0:00 /home/apoliakov/workspace/scidb_trunk/stage/DB-mydb/000/0/SciDB-000-0-mydb -i localhost -p 1239 -k -l /home/apoliakov/workspace/scidb_trunk/stage/install/share/scidb/log1.properties -s /home/apoliakov/workspace/scidb_trunk/stage/DB-mydb/000/0/storage.cfg --install_root=/home/apoliakov/workspace/scidb_trunk/stage/install --network-buffer=64 --merge-sort-buffer=64 --mem-array-threshold=64 --pluginsdir=/home/apoliakov/workspace/scidb_trunk/stage/install/lib/scidb/plugins --smgr-cache-size=64 -c host=localhost port=5432 dbname=mydb user=mydb password=mydb
  498 ?        Sl     0:00  \_ /home/apoliakov/workspace/scidb_trunk/stage/DB-mydb/000/0/SciDB-000-0-mydb -i localhost -p 1239 -k -l /home/apoliakov/workspace/scidb_trunk/stage/install/share/scidb/log1.properties -s /home/apoliakov/workspace/scidb_trunk/stage/DB-mydb/000/0/storage.cfg --install_root=/home/apoliakov/workspace/scidb_trunk/stage/install --network-buffer=64 --merge-sort-buffer=64 --mem-array-threshold=64 --pluginsdir=/home/apoliakov/workspace/scidb_trunk/stage/install/lib/scidb/plugins --smgr-cache-size=64 -c host=localhost port=5432 dbname=mydb user=mydb password=mydb

[apoliakov@localhost linear_algebra]$ sudo gdb scidb 498
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
...
...
0x00000030dfee8e63 in epoll_wait () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install apr-1.3.9-5.el6_2.x86_64 apr-util-1.3.9-3.el6_0.1.x86_64 blas-3.2.1-4.el6.x86_64 bzip2-libs-1.0.5-7.el6_0.x86_64 cyrus-sasl-lib-2.1.23-13.el6_3.1.x86_64 db4-4.7.25-18.el6_4.x86_64 expat-2.0.1-11.el6_2.x86_64 glibc-2.12-1.132.el6_5.4.x86_64 keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.10.3-10.el6_4.6.x86_64 lapack-3.2.1-4.el6.x86_64 libcom_err-1.41.12-18.el6.x86_64 libcsv-3.0.3-2.el6.x86_64 libgcc-4.4.7-11.el6.x86_64 libgfortran-4.4.7-11.el6.x86_64 libicu-4.2.1-9.1.el6_2.x86_64 libpqxx-3.1-1.x86_64 libselinux-2.0.94-5.3.el6_4.1.x86_64 libstdc++-4.4.7-11.el6.x86_64 libuuid-2.17.2-12.14.el6.x86_64 log4cxx-0.10.0-13.el6.x86_64 nspr-4.10.2-1.el6_5.x86_64 nss-3.15.3-3.el6_5.x86_64 nss-softokn-freebl-3.14.3-9.el6.x86_64 nss-util-3.15.3-1.el6_5.x86_64 openldap-2.4.23-32.el6_4.1.x86_64 openssl-1.0.1e-16.el6_5.4.x86_64 postgresql-libs-8.4.20-1.el6_5.x86_64 protobuf-2.4.1-2.x86_64 zlib-1.2.3-29.el6.x86_64

(gdb) br scidb::PhysicalJoin::execute
Breakpoint 1 at 0x14af33a: file /home/apoliakov/workspace/scidb_trunk/src/query/ops/join/PhysicalJoin.cpp, line 161.

(gdb) c
Continuing.

Leave gdb running, and in a separate session run a join query:

[apoliakov@localhost scidb_trunk]$ iquery -aq "create array foo<val:double> [x=1:10,10,0]"
Query was executed successfully
[apoliakov@localhost scidb_trunk]$ iquery -aq "store(build(foo,random()), foo)"
{x} val
{1} 2.0889e+09
{2} 4.62191e+08
{3} 4.85781e+08
{4} 6.07215e+08
{5} 6.09186e+08
{6} 3.01323e+08
{7} 1.03422e+09
{8} 1.57188e+09
{9} 1.69347e+09
{10} 5.91685e+07
[apoliakov@localhost scidb_trunk]$ iquery -aq "join(foo,foo)"

When the query executes, the gdb session stops at the breakpoint inside the join code. You can then use gdb’s backtrace command, “bt”, to see exactly who called it:

[Switching to Thread 0x7fbb69822700 (LWP 517)]

Breakpoint 1, scidb::PhysicalJoin::execute (this=0x7fbb54010940, inputArrays=std::vector of length 2, capacity 2 = {...}, query=...)
    at /home/apoliakov/workspace/scidb_trunk/src/query/ops/join/PhysicalJoin.cpp:161
161	        assert(inputArrays.size() == 2);
(gdb) bt
#0  scidb::PhysicalJoin::execute (this=0x7fbb54010940, inputArrays=std::vector of length 2, capacity 2 = {...}, query=...)
    at /home/apoliakov/workspace/scidb_trunk/src/query/ops/join/PhysicalJoin.cpp:161
#1  0x00000000010a67fd in scidb::PhysicalOperator::executeWrapper (this=0x7fbb54010940, arrays=std::vector of length 2, capacity 2 = {...}, query=...)
    at /home/apoliakov/workspace/scidb_trunk/src/query/OperatorProfiling.cpp:46
#2  0x0000000000fe7ddd in scidb::QueryProcessorImpl::execute (this=0x7fbb5400d4e0, node=..., query=..., depth=0)
    at /home/apoliakov/workspace/scidb_trunk/src/query/QueryProcessor.cpp:315
#3  0x0000000000fe81ba in scidb::QueryProcessorImpl::execute (this=0x7fbb5400d4e0, query=...)
    at /home/apoliakov/workspace/scidb_trunk/src/query/QueryProcessor.cpp:339
#4  0x0000000001075553 in scidb::SciDBExecutor::executeQuery (this=0x2065e90, queryString="join(foo,foo)", afl=true, queryResult=..., connection=0x0)
    at /home/apoliakov/workspace/scidb_trunk/src/query/executor/SciDBExecutor.cpp:276
#5  0x0000000000f4ca0b in scidb::ClientMessageHandleJob::executeClientQuery (this=0x2802f80)
    at /home/apoliakov/workspace/scidb_trunk/src/network/ClientMessageHandleJob.cpp:558
#6  0x0000000000f5d159 in boost::_mfi::mf0<void, scidb::ClientMessageHandleJob>::operator() (this=0x2803020, p=0x2802f80)
    at /opt/scidb/14.12/3rdparty/boost/include/boost/bind/mem_fn_template.hpp:49
#7  0x0000000000f5cbb8 in boost::_bi::list1<boost::_bi::value<scidb::ClientMessageHandleJob*> >::operator()<boost::_mfi::mf0<void, scidb::ClientMessageHandleJob>, boost::_bi::list0> (this=0x2803030, f=..., a=...) at /opt/scidb/14.12/3rdparty/boost/include/boost/bind/bind.hpp:253
#8  0x0000000000f5c125 in boost::_bi::bind_t<void, boost::_mfi::mf0<void, scidb::ClientMessageHandleJob>, boost::_bi::list1<boost::_bi::value<scidb::ClientMessageHandleJob*> > >::operator() (this=0x2803020) at /opt/scidb/14.12/3rdparty/boost/include/boost/bind/bind_template.hpp:20
#9  0x0000000000f5ad07 in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, scidb::ClientMessageHandleJob>, boost::_bi::list1<boost::_bi::value<scidb::ClientMessageHandleJob*> > >, void>::invoke (function_obj_ptr=...)
    at /opt/scidb/14.12/3rdparty/boost/include/boost/function/function_template.hpp:153
#10 0x0000000000ea4177 in boost::function0<void>::operator() (this=0x2803018) at /opt/scidb/14.12/3rdparty/boost/include/boost/function/function_template.hpp:767
...

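Reading the backtrace from the bottom up: ClientMessageHandleJob::executeClientQuery receives the query from the client, SciDBExecutor::executeQuery runs it, QueryProcessorImpl::execute walks the physical plan, and PhysicalOperator::executeWrapper finally calls PhysicalJoin::execute. GDB is an old, arcane tool. It has a lot of options and is well suited to this kind of investigation: it shows you what code is actually being run, as opposed to what code your code analysis tool thinks is being run. Of course, it has its own problems: if you’re working on optimized code (i.e. built with RelWithDebInfo or Release), you will get less information and it will be trickier to find out where things are.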
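Frames #0 to #3 of the backtrace show the shape of query execution: QueryProcessorImpl::execute walks the physical plan (note its depth argument) and calls PhysicalOperator::executeWrapper, which calls the operator's execute() with a vector of already-built input arrays. The Python sketch below is a simplified, hypothetical model of that walk, assuming children are executed before their parent so the parent receives their outputs. `PlanNode`, `execute_wrapper` and the `scan_foo` leaf are illustrative names, and the timing in `execute_wrapper` is only a guess at what a wrapper living in OperatorProfiling.cpp might do.

```python
import time


class PlanNode:
    """Hypothetical physical-plan node: an operator plus child nodes."""

    def __init__(self, name, operator, children=()):
        self.name = name
        self.operator = operator  # callable taking a list of input arrays
        self.children = list(children)


def execute_wrapper(node, inputs, timings):
    """Stand-in for a profiling wrapper around the operator's execute()."""
    start = time.perf_counter()
    result = node.operator(inputs)
    timings.append((node.name, time.perf_counter() - start))
    return result


def execute(node, timings, depth=0):
    """Execute children first (at depth + 1), then this node's operator."""
    inputs = [execute(child, timings, depth + 1) for child in node.children]
    return execute_wrapper(node, inputs, timings)


def scan_foo(_inputs):
    # Hypothetical leaf standing in for reading the stored array foo (toy data).
    return {1: 0.5, 2: 0.25}


def join(inputs):
    left, right = inputs
    return {k: (left[k], right[k]) for k in left if k in right}


plan = PlanNode("join", join, [PlanNode("scan", scan_foo), PlanNode("scan", scan_foo)])
timings = []
print(execute(plan, timings))
print([name for name, _ in timings])
```

For this toy join(foo, foo) plan it prints `{1: (0.5, 0.5), 2: (0.25, 0.25)}` and then `['scan', 'scan', 'join']`: both inputs are produced before the join operator runs, matching the two-element inputArrays seen in frame #0.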

Hope it helps get you started.


#6

[quote=“apoliakov”]Hi,

Yeah, join and cross_join are separate operators, separate code paths. That is the case at least for now. They definitely have a caller but it’s possible your code analysis tools aren’t properly resolving it. The system is organized in such a way that you can write and add your own Operators like Join. This necessarily makes for a more sophisticated callback structure.

Hope it helps get you started.[/quote]

Thanks a lot for your help! O(∩_∩)O! I really appreciate your patience and practical explanation. I hope I will be able to join you one day! :arrow_upper_left:(^ω^):arrow_upper_right: Thank you again for your reply!