Mysterious crash of scidb 13.2


#1

The crash happens after the first query, but startall does not seem to report any errors.
After the error message (connection refused) I looked at the stderr log on one of many instance nodes.
It reported that a library could not be loaded.

mas-nn is the coordinator, mas-dn01 is one of many instance nodes.

The worker node could not load one of the boost libraries. I ldd’d scidb and looked at which one it wanted to load. They all seemed to be there.
I am probably missing some very simple thing. By now I have very little pride left, so please don’t hesitate to tell me where I went stupid.

BTW: 12.10 and earlier versions all worked, and in all cases I built from source, so in principle I am not doing anything differently than before.
The only things different were the boost libraries. Ver 13.2 was looking for 1.46, which I just built with bjam and bootstrap (as directed by boost).

[scidb@mas-nn gfekete]$ iquery -a
AFL% show(MSNOW);
SystemException in file: src/network/NetworkManager.cpp function: abortMessageQuery line: 860
Error id: scidb::SCIDB_SE_NETWORK::SCIDB_LE_CONNECTION_ERROR2
Error description: Network error. Connection error while sending.
Failed query id: 1100879800118
[scidb@mas-nn gfekete]$ ssh mas-dn01
...
[scidb@mas-dn01 1]$ cat scidb-stderr.log 
/bigdata/scidb/merra/001/1/SciDB-001-1-merra: error while loading shared libraries: libboost_system.so.1.46.1: cannot open shared object file: No such file or directory
[scidb@mas-dn01 1]$ ls -l /bigdata/scidb/merra/001/1/SciDB-001-1-merra
lrwxrwxrwx. 1 scidb scidb 25 Apr 10 09:57 /bigdata/scidb/merra/001/1/SciDB-001-1-merra -> /opt/scidb/13.2/bin/scidb
[scidb@mas-dn01 1]$ ldd /opt/scidb/13.2/bin/scidb | grep libboost_system.so.1.46.1
	libboost_system.so.1.46.1 => /opt/scidb/13.2/local/lib/libboost_system.so.1.46.1 (0x00007fd6e5b71000)
[scidb@mas-dn01 1]$ ls -l /opt/scidb/13.2/local/lib/libboost_system.so.1.46.1
-rwxr-xr-x. 1 scidb scidb 17025 Apr  9 14:27 /opt/scidb/13.2/local/lib/libboost_system.so.1.46.1
[scidb@mas-dn01 1]$ 
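
One way to narrow this down is to compare what ldd resolves in the login shell with what it resolves in an empty environment. This rests on an assumption not confirmed in this thread: that the login shell sets LD_LIBRARY_PATH (or similar) and the instance is launched without it. The helper name unresolved_libs is made up for illustration; the paths in the usage comments come from the session above.

```shell
#!/bin/sh
# Hypothetical helper: print the names of libraries that ldd could not resolve.
# Reads ldd output on stdin; unresolved entries look like
#   libfoo.so.1 => not found
unresolved_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage sketch (run on an instance node):
#   ldd /opt/scidb/13.2/bin/scidb | unresolved_libs
#       login-shell environment
#   env -i /usr/bin/ldd /opt/scidb/13.2/bin/scidb | unresolved_libs
#       empty environment, so no LD_LIBRARY_PATH is inherited
```

If the second command lists libboost_system.so.1.46.1 and the first does not, the library is only being found through the login shell's environment.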

Any ideas?

George


#2

I “hacked” a solution by physically copying the libboost files into /usr/lib64 on all instance nodes, so I do have a working SciDB installation,
but I’d like to find out why the non-root solution doesn’t work. I am in an environment where sudo bash or sudo anything is seriously questioned,
and I had to fight hard to get root access, because the machine cluster is part of a bigger organization with heavy security.

The SciDB developers have already done a very good job of not insisting on a root install (except for initializing postgres and creating the userid for scidb, of course),
but I could not have done this hack without root access, so I would still like some help, please …

Cheers, George


#3

Hey George,

Sorry about the trouble. Let me tell you what happened. “Nobody ever tells you what happened; here’s what happened”:

SciDB needs Boost to build from source and the Boost .so libraries at runtime. Boost is nice. But there are some problems:
Problem 1: Boost has many versions. Boost code and libraries differ from version to version, without backwards compatibility.
Problem 2: SciDB is not the only program that uses Boost.

So we ran into some customers that already had Boost Version X on their machine and they were actively using it for some application A. In order to put SciDB on that machine, we needed access to Boost Version Y. But we couldn’t tell the customer to erase their Boost X and install Boost Y - because they want to have their A running smoothly. So what do we do?

Well, the boost people say explicitly, on their website, “in situations like this, consider shipping the version of boost you need with your application”. So that’s what we did. Now there’s a special package that’s called “scidb-boost” that installs from the scidb repository. For example, on CentOS these packages are:

scidb-boost-13.3-date-time.x86_64 : Runtime component of boost date-time library
scidb-boost-13.3-debuginfo.x86_64 : Debug information for package scidb-boost-13.3
scidb-boost-13.3-devel.x86_64 : The Boost C++ headers and shared development libraries
scidb-boost-13.3-doc.noarch : HTML documentation for the Boost C++ libraries
scidb-boost-13.3-filesystem.x86_64 : Runtime component of boost filesystem library
scidb-boost-13.3-graph.x86_64 : Runtime component of boost graph library
scidb-boost-13.3-iostreams.x86_64 : Runtime component of boost iostreams library
scidb-boost-13.3-math.x86_64 : Stub that used to contain boost math library
scidb-boost-13.3-program-options.x86_64 : Runtime component of boost program_options library
scidb-boost-13.3-python.x86_64 : Runtime component of boost python library
scidb-boost-13.3-random.x86_64 : Runtime component of boost random library
scidb-boost-13.3-regex.x86_64 : Runtime component of boost regular expression library
scidb-boost-13.3-serialization.x86_64 : Runtime component of boost serialization library
scidb-boost-13.3-signals.x86_64 : Runtime component of boost signals and slots library
scidb-boost-13.3-static.x86_64 : The Boost C++ static development libraries
scidb-boost-13.3-system.x86_64 : Runtime component of boost system support library
scidb-boost-13.3-test.x86_64 : Runtime component of boost test library
scidb-boost-13.3-thread.x86_64 : Runtime component of boost thread library
scidb-boost-13.3-wave.x86_64 : Runtime component of boost C99/C++ preprocessing library
scidb-boost-13.3.x86_64 : The free peer-reviewed portable C++ source libraries

They are installed into /opt/scidb/13.3/include/boost and /opt/scidb/13.3/lib/boost. The new SciDB therefore requires this package to be installed before you can build and run SciDB. Installing it, of course, needs root access, just like installing a compiler or protobuf or any other such thing. That’s the problem you seem to have run into.

We also wrote the script deployment/deploy.sh to simplify all this stuff. For documentation, run:

./deployment/deploy.sh help

Does this make more sense now? Does it help?


#4

Alex, yes it makes sense.

One thing though: deploy uses /etc/issue to try to establish the distro.
As I said previously, /etc/issue can change, and at our site it contains a WARNING MESSAGE mandated by institutional policy.
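
Since the login banner can hold arbitrary site text, a distro check could instead read files that exist to identify the release. The following is only a sketch of that idea, not what deploy.sh does: detect_distro and its optional root-prefix argument are hypothetical, and /etc/os-release is only present on newer systems, so the older release files serve as fallbacks.

```shell
#!/bin/sh
# Hypothetical helper, not part of deploy.sh: guess the distro
# without reading the login banner.
# The optional first argument is a root prefix, used only for testing.
detect_distro() {
    root="${1:-}"
    if [ -r "$root/etc/os-release" ]; then
        # os-release is a shell-compatible list of KEY=value pairs;
        # ID names the distro. Source it in a subshell to avoid
        # leaking its variables into the caller.
        ( . "$root/etc/os-release" && printf '%s\n' "$ID" )
    elif [ -r "$root/etc/redhat-release" ]; then
        # Present on Red Hat family systems such as CentOS.
        printf 'redhat-family\n'
    elif [ -r "$root/etc/debian_version" ]; then
        # Present on Debian family systems such as Ubuntu.
        printf 'debian-family\n'
    else
        printf 'unknown\n'
        return 1
    fi
}
```

Calling detect_distro with no argument inspects the running system; none of these files is affected by a policy-mandated banner.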

Thanks for the info !

George


#5

Roger that. We are working on fixing the /etc/issue vulnerability.