Errors when redimensioning large (~100 GB) flat array


#1

Hi all,

We’re having difficulty re-dimensioning a flat array of ~100 GB. We’re using the “SciDB_14.8_2 (ami-eef47286)” AWS AMI on a node with 32 cores and 60 GB of RAM. Using the 14.8 configurator web interface, we reconfigured the default SciDB config.ini in the AMI for 8 instances with per-instance /tmp/ directories:

[lux]
server-0=localhost,7
install_root=/opt/scidb/14.8
metadata=/opt/scidb/14.8/share/scidb/meta.sql
pluginsdir=/opt/scidb/14.8/lib/scidb/plugins
logconf=/opt/scidb/14.8/share/scidb/log4cxx.properties
db_user=dualinstance
db_passwd=dualinstance
base-port=1239
base-path=/mnt/scidb/
redundancy=0
mem-array-threshold=1056
smgr-cache-size=1056
merge-sort-buffer=264
network-buffer=528
execution-threads=4
result-prefetch-threads=4
result-prefetch-queue-size=2
operator-threads=2

The /mnt directory has ~630 GB of space available for the SciDB data:


scidb@ip-XX-XX-XX-XX:~$ df -h
Filesystem             Size  Used Avail Use% Mounted on
/dev/xvda1             197G   84G  104G  45% /
udev                    30G   12K   30G   1% /dev
tmpfs                  5.9G  196K  5.9G   1% /run
none                   5.0M     0  5.0M   0% /run/lock
none                    30G     0   30G   0% /run/shm
/dev/mapper/vg0-raid0  630G   70G  529G  12% /mnt

We’ve observed a number of failure modes depending on the exact SciDB configuration and on whether we do the load and re-dimension simultaneously or as separate commands. Sometimes all SciDB instances drop to 0% CPU usage, but the loadcsv.py command doesn’t return and iquery is still unresponsive 12+ hours later. Most recently, with the above configuration, we loaded the full 100 GB .csv file into a flat array (PulseArrayflat) and then tried to re-dimension it into PulseArray. We received the following error after ~3 hours of runtime:

SystemException in file: src/network/BaseConnection.h function: receive line: 310
Error id: scidb::SCIDB_SE_NETWORK::SCIDB_LE_CANT_SEND_RECEIVE
Error description: Network error. Cannot send or receive network messages.
Command exited with non-zero status 1
0.00user 0.02system 2:53:59elapsed 0%CPU (0avgtext+0avgdata 29408maxresident)k
9832inputs+0outputs (42major+2470minor)pagefaults 0swaps

The full re-dimension command, including the schema for the target PulseArray, is shown here:


/usr/bin/time iquery -aq "store(redimension(apply(PulseArrayflat,d_luxstamp_samples,luxstamp_samples,d_dpc,int64(dpc),d_pulse_num,int64(pulse_num),d_pulse_classification,int64(pulse_classification)), <pulse_start_samples:int32 null, pulse_end_samples:int32 null, index_kept_sumpods:int32 null, hft_t0_samples:int32 null, hft_t10l_samples:int32 null, hft_t50l_samples:int32 null, hft_t1_samples:int32 null, hft_t50r_samples:int32 null, hft_t10r_samples:int32 null, hft_t2_samples:int32 null, pulse_std_phe_per_sample:float null, aft_t0_samples:int32 null, aft_t05_sample:int32 null, aft_tlx_samples:int32 null, aft_t25_samples:int32 null, aft_t1_samples:int32 null, aft_t75_samples:int32 null, aft_trx_samples:int32 null, aft_t95_samples:int32 null, aft_t2_samples:int32 null, pulse_area_phe:float null, skinny_pulse_area_phe:float null, pulse_height_phe_per_sample:float null, prompt_fraction:float null, prompt_fraction_tlx:float null, top_bottom_ratio:float null, top_bottom_asymmetry:float null, exp_fit_amplitude_phe_per_sample:float null, exp_fit_tau_fall_samples:float null, exp_fit_time_offset_samples:float null, exp_fit_tau_rise_samples:float null, exp_fit_chisq:float null, exp_fit_dof:float null, gaus_fit_amplitude_phe_per_sample:float null, gaus_fit_mu_samples:float null, gaus_fit_sigma_samples:float null, gaus_fit_chisq:float null, gaus_fit_dof:float null, s2filter_max_area_diff:float null, s2filter_max_s2_area:float null, s2filter_max_s1_area:float null, rms_width_samples:float null, pulse_length_samples:float null, s1s2_pairing:float null, z_drift_samples:float null, golden:int32 null, selected_s1_s2:int32 null, multiple:int32 null, cor_x_cm:float null, cor_y_cm:float null, x_cm_tmplt:float null, y_cm_tmplt:float null, xy_sigma_cm:float null, xy_chisq_p:float null, reconstructed:int32 null, x_cm:float null, y_cm:float null, rec_dof:float null, chi2:float null, s2_rec:float null, sd_radius_inf:float null, sd_radius_sup:float null, sd_phiXR:float null, x_corrected:float null, y_corrected:float null, x_tmplt_corrected:float null, y_tmplt_corrected:float null, xy_sigma_corrected:float null, z_corrected_pulse_area_all_phe:float null, xyz_corrected_pulse_area_all_phe:float null, z_corrected_pulse_area_bot_phe:float null, xyz_corrected_pulse_area_bot_phe:float null, correction_electron_lifetime:float null, correction_s1_z_dependence:float null, correction_s1_xyz_dependence:float null, correction_s2_xy_dependence:float null, correction_s1_z_dependence_bot:float null, correction_s1_xyz_dependence_bot:float null, correction_s2_xy_dependence_bot:float null, num_electrons:float null, num_electrons_bot:float null, num_photons:float null, vuv_corrected_pulse_area_all_phe:float null,vuv_corrected_pulse_area_bot_phe:float null, vuv_xyz_corrected_pulse_area_all_phe:float null, vuv_xyz_corrected_pulse_area_bot_phe:float null, pulse_spike_phe_top:float null, pulse_spike_phe_bot:float null, corrected_pulse_cspike_all_phe_s1:float null, corrected_pulse_cspike_bot_phe_s1:float null, xyz_corrected_pulse_cspike_all_phe_s1:float null, xyz_corrected_pulse_cspike_bot_phe_s1:float null>[d_luxstamp_samples=0:4611686018427387902,144000000000,0,d_dpc=0:1000000,1,0,d_pulse_num=1:10,11,0,d_pulse_classification=1:5,5,0]),PulseArray)" >/dev/null

The d_luxstamp_samples dimension is very sparse and the chunk size is tuned to yield ~1e6 entries / chunk. Could we benefit from a more sophisticated determination of the chunk size from the PulseArrayflat array? If so, what are the best utilities?
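
For context, the current chunk length came from a back-of-the-envelope estimate along these lines - a rough Python sketch where the cell count and luxstamp span are placeholder numbers (in practice they would come from an aggregate over PulseArrayflat), and which ignores the other three dimensions, so it is only a first guess:

# Rough estimate of the d_luxstamp_samples chunk length for a sparse dimension.
# Placeholder inputs; the real values would come from something like
#   aggregate(PulseArrayflat, count(*), min(luxstamp_samples), max(luxstamp_samples))
total_cells = 1.0e9          # hypothetical row count of the flat array
stamp_span = 1.5e14          # hypothetical max(luxstamp_samples) - min(luxstamp_samples)
target_cells_per_chunk = 1.0e6

cells_per_sample = total_cells / stamp_span
chunk_length = target_cells_per_chunk / cells_per_sample
print 'estimated d_luxstamp_samples chunk length: %.3g' % chunk_length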

It seems that this issue is very similar to those described in this forum post: viewtopic.php?f=11&t=1474. I think we’ve avoided the shared /tmp/ directory issue by removing the relevant line from the config.ini file. Is the best course of action to split the initial flat array into several smaller arrays, re-dimension each of the smaller arrays, and then combine them back into a larger array? If this is the preferred strategy for large arrays, are there any existing utilities to perform this action? Also, does this mean that we’d be limited in future operations on the final large array - subsequent re-dimensions, etc.?

Thanks,
James Verbus


#2

NOTE: As a test, I loaded the first 25% (20-25 GB) of the .csv file into a flat array and performed the re-dimension successfully. So the full array is only a factor of a few beyond a size at which the re-dimension still works.

scidb@ip-xx-xx-xx-xx:~/data$ /usr/bin/time iquery -aq "store(redimension(apply(PulseArray25pctflat,d_luxstamp_samples,luxstamp_samples,d_dpc,int64(dpc),d_pulse_num,int64(pulse_num),d_pulse_classification,int64(pulse_classification)), <pulse_start_samples:int32 null, pulse_end_samples:int32 null, index_kept_sumpods:int32 null, hft_t0_samples:int32 null, hft_t10l_samples:int32 null, hft_t50l_samples:int32 null, hft_t1_samples:int32 null, hft_t50r_samples:int32 null, hft_t10r_samples:int32 null, hft_t2_samples:int32 null, pulse_std_phe_per_sample:float null, aft_t0_samples:int32 null, aft_t05_sample:int32 null, aft_tlx_samples:int32 null, aft_t25_samples:int32 null, aft_t1_samples:int32 null, aft_t75_samples:int32 null, aft_trx_samples:int32 null, aft_t95_samples:int32 null, aft_t2_samples:int32 null, pulse_area_phe:float null, skinny_pulse_area_phe:float null, pulse_height_phe_per_sample:float null, prompt_fraction:float null, prompt_fraction_tlx:float null, top_bottom_ratio:float null, top_bottom_asymmetry:float null, exp_fit_amplitude_phe_per_sample:float null, exp_fit_tau_fall_samples:float null, exp_fit_time_offset_samples:float null, exp_fit_tau_rise_samples:float null, exp_fit_chisq:float null, exp_fit_dof:float null, gaus_fit_amplitude_phe_per_sample:float null, gaus_fit_mu_samples:float null, gaus_fit_sigma_samples:float null, gaus_fit_chisq:float null, gaus_fit_dof:float null, s2filter_max_area_diff:float null, s2filter_max_s2_area:float null, s2filter_max_s1_area:float null, rms_width_samples:float null, pulse_length_samples:float null, s1s2_pairing:float null, z_drift_samples:float null, golden:int32 null, selected_s1_s2:int32 null, multiple:int32 null, cor_x_cm:float null, cor_y_cm:float null, x_cm_tmplt:float null, y_cm_tmplt:float null, xy_sigma_cm:float null, xy_chisq_p:float null, reconstructed:int32 null, x_cm:float null, y_cm:float null, rec_dof:float null, chi2:float null, s2_rec:float null, sd_radius_inf:float null, sd_radius_sup:float null, sd_phiXR:float null, x_corrected:float null, y_corrected:float null, x_tmplt_corrected:float null, y_tmplt_corrected:float null, xy_sigma_corrected:float null, z_corrected_pulse_area_all_phe:float null, xyz_corrected_pulse_area_all_phe:float null, z_corrected_pulse_area_bot_phe:float null, xyz_corrected_pulse_area_bot_phe:float null, correction_electron_lifetime:float null, correction_s1_z_dependence:float null, correction_s1_xyz_dependence:float null, correction_s2_xy_dependence:float null, correction_s1_z_dependence_bot:float null, correction_s1_xyz_dependence_bot:float null, correction_s2_xy_dependence_bot:float null, num_electrons:float null, num_electrons_bot:float null, num_photons:float null, vuv_corrected_pulse_area_all_phe:float null,vuv_corrected_pulse_area_bot_phe:float null, vuv_xyz_corrected_pulse_area_all_phe:float null, vuv_xyz_corrected_pulse_area_bot_phe:float null, pulse_spike_phe_top:float null, pulse_spike_phe_bot:float null, corrected_pulse_cspike_all_phe_s1:float null, corrected_pulse_cspike_bot_phe_s1:float null, xyz_corrected_pulse_cspike_all_phe_s1:float null, xyz_corrected_pulse_cspike_bot_phe_s1:float null>[d_luxstamp_samples=0:4611686018427387902,144000000000,0,d_dpc=0:1000000,1,0,d_pulse_num=1:10,11,0,d_pulse_classification=1:5,5,0]),PulseArray25pct)" >/dev/null


5855.68user 17.23system 2:33:23elapsed 63%CPU (0avgtext+0avgdata 7645232maxresident)k
7968inputs+0outputs (38major+1415248minor)pagefaults 0swaps

#3

Hi James,

Yes, in 14.8 the redimension memory usage was excessive. The crashes you are seeing are most likely OOMs. There were cases of memory fragmentation caused by large numbers of small allocations. Indeed, one of the strategies is a piece-wise redimension, which you can achieve easily with between(). Assuming you load into a 1-D form:

insert(redimension(between(source, 0, 9999999), target), target)
insert(redimension(between(source, 10000000, 19999999), target), target)
… and so on
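
If you want to script that loop, here is a minimal sketch (Python driving iquery via subprocess; the array names 'source' and 'target', the window size, and the total row count are placeholders, and it assumes the target array already exists and the 1-D dimension starts at 0):

#!/usr/bin/python
# Piece-wise redimension: slide a window over the 1-D source array
# and insert each piece into the target array.
import subprocess

window = 10000000      # rows per piece (placeholder)
total = 100000000      # total rows in source (placeholder; get it from an aggregate count)

for lo in range(0, total, window):
    hi = lo + window - 1
    afl = "insert(redimension(between(source, %d, %d), target), target)" % (lo, hi)
    subprocess.check_call(['iquery', '-anq', afl])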

We made a lot of improvements in this area in 14.12 - the memory usage is much smoother and, in fact, the configurator gives better advice. I would advise trying that - and letting us know how it goes.

Looks like very sophisticated data, by the way - I’m also curious what it is.
Sorry for the trouble!


#4

Hi Alex,

Thanks for the suggestion! It doesn’t look like there are any 14.12 AMIs on AWS; do you have any plans to create one? I can do the upgrade manually and save my own, but it is nice to have an “official” image to start from when doing these tests.

I’ll let you know how it goes with 14.12.

PS: Check your PMs regarding information about the data.


#5

Hi James,

Yes, we’re working on a blank 14.12 AMI right now. ETA is before the end of the week.


#6

More thoughts, James:

In 14.12, we’ve built into the product a relatively sophisticated mechanism for figuring out your per-dimension chunk lengths. Check out the documentation of the USING clause in 14.12: paradigm4.com/HTMLmanual/14. … 03s01.html

  1. The USING clause will figure out for you what good chunking looks like. Note that it doesn’t build the actual array: the engine just tells you what it thinks are reasonable per-dimension chunk lengths. And you can specify some dimensions’ chunk lengths as part of the target array’s dimension specifications to nail down certain constants.

  2. Alex P. is quite right that reducing the size of each redimension(…) by clipping out regions of the target array and filtering the source array is a very good idea. I would experiment a bit: get a handle on how big a gulp of the fire-hose to take, check that your data doesn’t include any collisions, etc.

PLEASE report back. I would love to hear what you learn.


#7

The AMI is ready: SciDB_14.12_02_04_2015.


#8

Thanks for all of the suggestions and the new AMI. I’ll try it out and report back this week.


#9

I was able to get it working after implementing some of your suggestions. I’ll try to give some detailed notes - hopefully they’re helpful. I started to use the 14.12 AMI as suggested. Here is my current config.ini:

[lux]
server-0=localhost,7
install_root=/opt/scidb/14.12
pluginsdir=/opt/scidb/14.12/lib/scidb/plugins
logconf=/opt/scidb/14.12/share/scidb/log4cxx.properties
db_user=pguser
db_passwd=pguserpwd
base-port=1239
base-path=/mnt/scidb
redundancy=0

### Threading: max_concurrent_queries=2, threads_per_query=2
# max_concurrent_queries + 2:
execution-threads=4
# max_concurrent_queries * threads_per_query:
result-prefetch-threads=4
# threads_per_query:
result-prefetch-queue-size=2
operator-threads=2

### Memory: 6250MB per instance, 4688MB reserved
# network: 1875MB per instance assuming 5MB average chunks
# in units of chunks per query:
sg-send-queue-size=94
sg-receive-queue-size=94
# caches: 1875MB per instance
smgr-cache-size=938
mem-array-threshold=938
# sort: 938MB per instance (specified per thread)
merge-sort-buffer=234
# NOTE: Uncomment the following line to set a hard memory limit;
# NOTE: queries exceeding this cap will fail:
# max-memory-limit=6250
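
The threading values above follow the configurator’s formulas quoted in the comments; for my own bookkeeping, the arithmetic looks like this (a small Python sketch - the ~50 GB SciDB budget out of the 60 GB node is my assumption, not something the configurator reports):

# Threading settings per the configurator's rules quoted in the comments above.
max_concurrent_queries = 2
threads_per_query = 2

execution_threads = max_concurrent_queries + 2                         # -> 4
result_prefetch_threads = max_concurrent_queries * threads_per_query   # -> 4
result_prefetch_queue_size = threads_per_query                         # -> 2
operator_threads = threads_per_query                                   # -> 2

# Memory: assume ~50 GB of the 60 GB node is left for SciDB, split over 8 instances,
# which gives the 6250 MB per-instance budget used in the comments above.
scidb_budget_mb = 50000
instances = 8
per_instance_mb = scidb_budget_mb / instances                           # -> 6250

print execution_threads, result_prefetch_threads, per_instance_mb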

First, I tried a much smaller chunk size for the d_luxstamp_samples dimension, as recommended by the USING clause, while keeping all other dimensions the same as before. USING recommended a chunk size of 18181, compared to our original chunk size of 1440000000000.

AFL% show(PulseArray2);
{i} schema
{0} 'PulseArray2<pulse_start_samples:int32 NULL DEFAULT null,pulse_end_samples:int32 NULL DEFAULT null,index_kept_sumpods:int32 NULL DEFAULT null,hft_t0_samples:int32 NULL DEFAULT null,hft_t10l_samples:int32 NULL DEFAULT null,hft_t50l_samples:int32 NULL DEFAULT null,hft_t1_samples:int32 NULL DEFAULT null,hft_t50r_samples:int32 NULL DEFAULT null,hft_t10r_samples:int32 NULL DEFAULT null,hft_t2_samples:int32 NULL DEFAULT null,pulse_std_phe_per_sample:float NULL DEFAULT null,aft_t0_samples:int32 NULL DEFAULT null,aft_t05_sample:int32 NULL DEFAULT null,aft_tlx_samples:int32 NULL DEFAULT null,aft_t25_samples:int32 NULL DEFAULT null,aft_t1_samples:int32 NULL DEFAULT null,aft_t75_samples:int32 NULL DEFAULT null,aft_trx_samples:int32 NULL DEFAULT null,aft_t95_samples:int32 NULL DEFAULT null,aft_t2_samples:int32 NULL DEFAULT null,pulse_area_phe:float NULL DEFAULT null,skinny_pulse_area_phe:float NULL DEFAULT null,pulse_height_phe_per_sample:float NULL DEFAULT null,prompt_fraction:float NULL DEFAULT null,prompt_fraction_tlx:float NULL DEFAULT null,top_bottom_ratio:float NULL DEFAULT null,top_bottom_asymmetry:float NULL DEFAULT null,exp_fit_amplitude_phe_per_sample:float NULL DEFAULT null,exp_fit_tau_fall_samples:float NULL DEFAULT null,exp_fit_time_offset_samples:float NULL DEFAULT null,exp_fit_tau_rise_samples:float NULL DEFAULT null,exp_fit_chisq:float NULL DEFAULT null,exp_fit_dof:float NULL DEFAULT null,gaus_fit_amplitude_phe_per_sample:float NULL DEFAULT null,gaus_fit_mu_samples:float NULL DEFAULT null,gaus_fit_sigma_samples:float NULL DEFAULT null,gaus_fit_chisq:float NULL DEFAULT null,gaus_fit_dof:float NULL DEFAULT null,s2filter_max_area_diff:float NULL DEFAULT null,s2filter_max_s2_area:float NULL DEFAULT null,s2filter_max_s1_area:float NULL DEFAULT null,rms_width_samples:float NULL DEFAULT null,pulse_length_samples:float NULL DEFAULT null,s1s2_pairing:float NULL DEFAULT null,z_drift_samples:float NULL DEFAULT null,golden:int32 NULL DEFAULT null,selected_s1_s2:int32 NULL DEFAULT null,multiple:int32 NULL DEFAULT null,cor_x_cm:float NULL DEFAULT null,cor_y_cm:float NULL DEFAULT null,x_cm_tmplt:float NULL DEFAULT null,y_cm_tmplt:float NULL DEFAULT null,xy_sigma_cm:float NULL DEFAULT null,xy_chisq_p:float NULL DEFAULT null,reconstructed:int32 NULL DEFAULT null,x_cm:float NULL DEFAULT null,y_cm:float NULL DEFAULT null,rec_dof:float NULL DEFAULT null,chi2:float NULL DEFAULT null,s2_rec:float NULL DEFAULT null,sd_radius_inf:float NULL DEFAULT null,sd_radius_sup:float NULL DEFAULT null,sd_phiXR:float NULL DEFAULT null,x_corrected:float NULL DEFAULT null,y_corrected:float NULL DEFAULT null,x_tmplt_corrected:float NULL DEFAULT null,y_tmplt_corrected:float NULL DEFAULT null,xy_sigma_corrected:float NULL DEFAULT null,z_corrected_pulse_area_all_phe:float NULL DEFAULT null,xyz_corrected_pulse_area_all_phe:float NULL DEFAULT null,z_corrected_pulse_area_bot_phe:float NULL DEFAULT null,xyz_corrected_pulse_area_bot_phe:float NULL DEFAULT null,correction_electron_lifetime:float NULL DEFAULT null,correction_s1_z_dependence:float NULL DEFAULT null,correction_s1_xyz_dependence:float NULL DEFAULT null,correction_s2_xy_dependence:float NULL DEFAULT null,correction_s1_z_dependence_bot:float NULL DEFAULT null,correction_s1_xyz_dependence_bot:float NULL DEFAULT null,correction_s2_xy_dependence_bot:float NULL DEFAULT null,num_electrons:float NULL DEFAULT null,num_electrons_bot:float NULL DEFAULT null,num_photons:float NULL DEFAULT null,vuv_corrected_pulse_area_all_phe:float NULL DEFAULT 
null,vuv_corrected_pulse_area_bot_phe:float NULL DEFAULT null,vuv_xyz_corrected_pulse_area_all_phe:float NULL DEFAULT null,vuv_xyz_corrected_pulse_area_bot_phe:float NULL DEFAULT null,pulse_spike_phe_top:float NULL DEFAULT null,pulse_spike_phe_bot:float NULL DEFAULT null,corrected_pulse_cspike_all_phe_s1:float NULL DEFAULT null,corrected_pulse_cspike_bot_phe_s1:float NULL DEFAULT null,xyz_corrected_pulse_cspike_all_phe_s1:float NULL DEFAULT null,xyz_corrected_pulse_cspike_bot_phe_s1:float NULL DEFAULT null> [d_luxstamp_samples=0:4611686018427387902,18181,0,d_dpc=0:1000000,1,0,d_pulse_num=1:10,11,0,d_pulse_classification=1:5,5,0]'

When re-dimensioning the full 100 GB PulseArray with the 18181 chunk size, the coordinator’s scidb.log shows the problem still occurs after “PHASE 2” of the re-dimension. I think this indicates that the individual instances are crashing during the query - maybe out of memory?

2015-02-09 00:55:05,338 [0x7fd87289c700] [DEBUG]: [SortArray] Found 8 runs to merge
2015-02-09 00:56:00,486 [0x7fd87279b700] [DEBUG]: [SortArray] Found 8 runs to merge
2015-02-09 00:57:30,206 [0x7fd872599700] [DEBUG]: [SortArray] Found 2 runs to merge
2015-02-09 00:59:08,609 [0x7fd87e9d6800] [DEBUG]: Disconnected
2015-02-09 00:59:13,697 [0x7fd87e9d6800] [ERROR]: Network error in handleSendMessage #32('Broken pipe'), instance 6 (127.0.0.1)
2015-02-09 00:59:13,697 [0x7fd87e9d6800] [DEBUG]: Recovering connection to instance 6
2015-02-09 00:59:13,698 [0x7fd87e9d6800] [DEBUG]: Connected to instance 6 (127.0.0.1), localhost:1245
2015-02-09 00:59:38,371 [0x7fd873529700] [DEBUG]: [SortArray] merge sorted chunks complete took 487681 ms, or 8 minutes 7 seconds 681 milliseconds
2015-02-09 00:59:38,371 [0x7fd873529700] [DEBUG]: [RedimensionArray] PHASE 2A: redimensioned sort pass 1 took 487687 ms, or 8 minutes 7 seconds 687 milliseconds
2015-02-09 00:59:38,371 [0x7fd873529700] [DEBUG]: [RedimensionArray] PHASE 2: complete took 0 ms, or 0 millisecond
2015-02-09 01:00:20,198 [0x7fd87e9d6800] [DEBUG]: Disconnected
2015-02-09 01:00:25,198 [0x7fd87e9d6800] [ERROR]: Network error in handleSendMessage #32('Broken pipe'), instance 5 (127.0.0.1)
2015-02-09 01:00:25,198 [0x7fd87e9d6800] [DEBUG]: Recovering connection to instance 5
2015-02-09 01:00:25,198 [0x7fd87e9d6800] [DEBUG]: Connected to instance 5 (127.0.0.1), localhost:1244
2015-02-09 01:01:04,807 [0x7fd87e9d6800] [DEBUG]: Disconnected
2015-02-09 01:01:10,199 [0x7fd87e9d6800] [ERROR]: Network error in handleSendMessage #32('Broken pipe'), instance 2 (127.0.0.1)
2015-02-09 01:01:10,199 [0x7fd87e9d6800] [DEBUG]: Recovering connection to instance 2
2015-02-09 01:01:10,199 [0x7fd87e9d6800] [DEBUG]: Connected to instance 2 (127.0.0.1), localhost:1241
2015-02-09 01:02:01,888 [0x7fd87e9d6800] [DEBUG]: Disconnected
2015-02-09 01:02:06,883 [0x7fd87e9d6800] [ERROR]: Network error in handleSendMessage #32('Broken pipe'), instance 7 (127.0.0.1)
2015-02-09 01:02:06,883 [0x7fd87e9d6800] [DEBUG]: Recovering connection to instance 7
2015-02-09 01:02:06,883 [0x7fd87e9d6800] [DEBUG]: Connected to instance 7 (127.0.0.1), localhost:1246
2015-02-09 01:03:58,303 [0x7fd87e9d6800] [DEBUG]: Disconnected
2015-02-09 01:04:02,053 [0x7fd87e9d6800] [ERROR]: Network error in handleSendMessage #104('Connection reset by peer'), instance 4 (127.0.0.1)
2015-02-09 01:04:02,053 [0x7fd87e9d6800] [DEBUG]: Recovering connection to instance 4
2015-02-09 01:04:02,053 [0x7fd87e9d6800] [DEBUG]: Connected to instance 4 (127.0.0.1), localhost:1243
2015-02-09 01:05:18,126 [0x7fd873529700] [WARN ]: RedimensionCommon::redimensionArray: Data collision is detected at cell position {12182160426870784, 215, 1, 3} for attribute ID = 92. Add log4j.logger.scidb.array.RedimensionCommon=TRACE to the log4cxx config file for more
2015-02-09 01:06:47,195 [0x7fd87e9d6800] [DEBUG]: Disconnected
2015-02-09 01:06:52,058 [0x7fd87e9d6800] [ERROR]: Network error in handleSendMessage #104('Connection reset by peer'), instance 3 (127.0.0.1)
2015-02-09 01:06:52,058 [0x7fd87e9d6800] [DEBUG]: Recovering connection to instance 3
2015-02-09 01:06:52,059 [0x7fd87e9d6800] [DEBUG]: Connected to instance 3 (127.0.0.1), localhost:1242
2015-02-09 01:12:38,093 [0x7fd87e9d6800] [INFO ]: Start SciDB instance (pid=21273). SciDB Version: 14.12.8739. Build Type: RelWithDebInfo. Copyright (C) 2008-2014 SciDB, Inc.
2015-02-09 01:12:38,096 [0x7fd87e9d6800] [INFO ]: Configuration:

I wrote the following quick-and-dirty script to re-dimension the 1-D PulseArrayflat by breaking it up into ~2 GB sections using “between”, re-dimensioning each section, and then inserting them into the final result array.

#!/usr/bin/python
"""
This script will re-dimension the large O(100 GB) PulseArrayflat by splitting
it up into O(2 GB) chunks and looping over the redimension and inserts.

2015-02-09 - JRV - Created
"""

import time

import numpy as np

from scidbpy import connect

iteration_size = 4000000
target_array = 'PulseArray'

apply_string = 'd_luxstamp_samples,luxstamp_samples,d_dpc,int64(dpc),d_pulse_num,int64(pulse_num),d_pulse_classification,int64(pulse_classification)'
dimensions_string = '[d_luxstamp_samples=0:4611686018427387902,1440000000000,0,d_dpc=0:1000000,1,0,d_pulse_num=1:10,11,0,d_pulse_classification=1:5,5,0]'
attributes_string = '<pulse_start_samples:int32 null, pulse_end_samples:int32 null, index_kept_sumpods:int32 null, hft_t0_samples:int32 null, hft_t10l_samples:int32 null, hft_t50l_samples:int32 null, hft_t1_samples:int32 null, hft_t50r_samples:int32 null, hft_t10r_samples:int32 null, hft_t2_samples:int32 null, pulse_std_phe_per_sample:float null, aft_t0_samples:int32 null, aft_t05_sample:int32 null, aft_tlx_samples:int32 null, aft_t25_samples:int32 null, aft_t1_samples:int32 null, aft_t75_samples:int32 null, aft_trx_samples:int32 null, aft_t95_samples:int32 null, aft_t2_samples:int32 null, pulse_area_phe:float null, skinny_pulse_area_phe:float null, pulse_height_phe_per_sample:float null, prompt_fraction:float null, prompt_fraction_tlx:float null, top_bottom_ratio:float null, top_bottom_asymmetry:float null, exp_fit_amplitude_phe_per_sample:float null, exp_fit_tau_fall_samples:float null, exp_fit_time_offset_samples:float null, exp_fit_tau_rise_samples:float null, exp_fit_chisq:float null, exp_fit_dof:float null, gaus_fit_amplitude_phe_per_sample:float null, gaus_fit_mu_samples:float null, gaus_fit_sigma_samples:float null, gaus_fit_chisq:float null, gaus_fit_dof:float null, s2filter_max_area_diff:float null, s2filter_max_s2_area:float null, s2filter_max_s1_area:float null, rms_width_samples:float null, pulse_length_samples:float null, s1s2_pairing:float null, z_drift_samples:float null, golden:int32 null, selected_s1_s2:int32 null, multiple:int32 null, cor_x_cm:float null, cor_y_cm:float null, x_cm_tmplt:float null, y_cm_tmplt:float null, xy_sigma_cm:float null, xy_chisq_p:float null, reconstructed:int32 null, x_cm:float null, y_cm:float null, rec_dof:float null, chi2:float null, s2_rec:float null, sd_radius_inf:float null, sd_radius_sup:float null, sd_phiXR:float null, x_corrected:float null, y_corrected:float null, x_tmplt_corrected:float null, y_tmplt_corrected:float null, xy_sigma_corrected:float null, z_corrected_pulse_area_all_phe:float null, xyz_corrected_pulse_area_all_phe:float null, z_corrected_pulse_area_bot_phe:float null, xyz_corrected_pulse_area_bot_phe:float null, correction_electron_lifetime:float null, correction_s1_z_dependence:float null, correction_s1_xyz_dependence:float null, correction_s2_xy_dependence:float null, correction_s1_z_dependence_bot:float null, correction_s1_xyz_dependence_bot:float null, correction_s2_xy_dependence_bot:float null, num_electrons:float null, num_electrons_bot:float null, num_photons:float null, vuv_corrected_pulse_area_all_phe:float null,vuv_corrected_pulse_area_bot_phe:float null, vuv_xyz_corrected_pulse_area_all_phe:float null, vuv_xyz_corrected_pulse_area_bot_phe:float null, pulse_spike_phe_top:float null, pulse_spike_phe_bot:float null, corrected_pulse_cspike_all_phe_s1:float null, corrected_pulse_cspike_bot_phe_s1:float null, xyz_corrected_pulse_cspike_all_phe_s1:float null, xyz_corrected_pulse_cspike_bot_phe_s1:float null>'

# Connect to SciDB instance

sdb = connect('http://localhost:8080')

# Wrap pulse array and get length

PAf = sdb.wrap_array('PulseArrayflat')
res = sdb.afl.aggregate(PAf, 'count(*)').toarray()
paf_length = int(res)

# Calculate number of iterations
iteration_range = range(0, paf_length, iteration_size)
iteration_range.append(paf_length)

# Do first store
start = time.time()
res = sdb.afl.store(
    sdb.afl.redimension(
        sdb.afl.apply(
            sdb.afl.between(PAf, iteration_range[0], iteration_range[1]),
            apply_string),
        attributes_string + dimensions_string),
    target_array)
res.eval()
end = time.time()

TA = sdb.wrap_array(target_array)
res = sdb.afl.aggregate(TA, 'count(*)').toarray()
ta_length = int(res)

print 'Finished first store of %s in %2.1f seconds, current length %d entries' \
% (target_array, end-start, ta_length)

for idx in range(1, len(iteration_range) - 1):
    start = time.time()
    res = sdb.afl.insert(
        sdb.afl.redimension(
            sdb.afl.apply(
                sdb.afl.between(PAf, iteration_range[idx] + 1, iteration_range[idx + 1]),
                apply_string),
            attributes_string + dimensions_string),
        target_array)
    res.eval()
    end = time.time()

    TA = sdb.wrap_array(target_array)
    res = sdb.afl.aggregate(TA, 'count(*)').toarray()
    ta_length = int(res)

    print 'Finished insert to %s for range %d to %d in %2.1f seconds, current length %d entries' \
        % (target_array, iteration_range[idx] + 1, iteration_range[idx + 1], end-start, ta_length)
    sdb.reap()

sdb.reap()

This didn’t crash, but the runtime was prohibitively long: not a single iteration had completed when I stopped it after 10-20 minutes. I reverted to the old chunk size of 1440000000000 (a back-of-the-envelope estimate), which sped things up considerably (3-6 minutes per iteration). The rate of progress stayed consistent across all 50 sections, so the full re-dimension only took several hours. It looks like the USING clause may have recommended a chunk size that is too low - presumably because, with such a sparse dimension, an 18181-length chunk holds very few cells, so the engine ends up managing an enormous number of nearly empty chunks across all of the attributes.

In any case, the full array is now loaded and re-dimensioned. I’ve successfully been able to perform some test queries. Thanks for all of the help!

On a related note, I’m currently trying to do an opaque save to avoid having to do this re-dimension every time a new AWS instance is spun up. It seems the opaque save is crashing on the PulseArray - any ideas why or suggestions?

scidb@ip-XX-XX-XX-XX:/mnt/data$ scidb_backup.py --save opaque DDBackup
Archiving array EventArray
Query was executed successfully
Archiving array EventArrayflat
Query was executed successfully
Archiving array FileArrayflat
Query was executed successfully
Archiving array PulseArray
Traceback (most recent call last):
  File "/opt/scidb/14.12/bin/scidb_backup.py", line 1790, in <module>
    optsFilePath
  File "/opt/scidb/14.12/bin/scidb_backup.py", line 1712, in saveArrays
    self._runSave(fmtArrayInfo,inputArrayExp,saveFmts,baseFolder)
  File "/opt/scidb/14.12/bin/scidb_backup.py", line 999, in _runSave
    True
  File "/opt/scidb/14.12/bin/scidb_backup.py", line 360, in waitForProcesses
    exits_and_outs = map(lambda proc: self.waitForProcess(proc,raiseOnError),procs)
  File "/opt/scidb/14.12/bin/scidb_backup.py", line 360, in <lambda>
    exits_and_outs = map(lambda proc: self.waitForProcess(proc,raiseOnError),procs)
  File "/opt/scidb/14.12/bin/scidb_backup.py", line 230, in wrapper
    raise Exception('\n'.join(msg_list))
Exception: Abnormal return encountered while running command:
iquery -c localhost -p 1239 -naq save(PulseArray,'/mnt/data/DDBackup/PulseArray',0, 'opaque')

#10

Can you manually run the command and post the output?


#11

Sure, here is the output:

scidb@ip-XX-XX-XX-XX:~$ iquery -c localhost -p 1239 -naq "save(PulseArray,'/mnt/data/DDBackup/PulseArray',0, 'opaque')"


SystemException in file: src/network/BaseConnection.h function: receive line: 321
Error id: scidb::SCIDB_SE_NETWORK::SCIDB_LE_CANT_SEND_RECEIVE
Error description: Network error. Cannot send or receive network messages.

#12

Thanks for the update, and sorry it’s not quite as easy as you’d expect :frowning:

  1. There may be some issues with the chunk size selector script, and I wonder if it gave you the right numbers. We can check by doing the following:
    iquery -aq "filter(list('arrays'), name='PulseArray')"
    Locate the UAID of the array (second column), then run:

iquery -aq "aggregate(filter(list('chunk map'), uaid = [UAID]), min(nelem), max(nelem), avg(nelem), count(*))"

We want the average to be about 100K - 1M, and we don’t want the max to be above 5-10M. There’s a summarized per-array version of this query here: viewtopic.php?f=18&t=1091

  2. DO NOT do this:
    iquery -aq "insert(…, target)" > /dev/null
    DO this:
    iquery -anq "insert(…, target)"
    That makes a huge difference because you’re no longer outputting all that data - that’s exactly what the -n option is for:
$ time iquery -aq "scan(TCGA_2014_06_14_GENE_STD)" > /dev/null

real	0m1.233s
user	0m1.028s
sys	0m0.028s

$ time iquery -anq "scan(TCGA_2014_06_14_GENE_STD)"
Query was executed successfully

real	0m0.064s
user	0m0.004s
sys	0m0.008s
  3. There’s an unfortunate issue in save that first copies all the data to the coordinator instance (and writes it to disk), and then writes it to disk again to the output file. So the coordinator’s /tmp partition must be big enough to accommodate it. That’s a “must fix” item for the next release. To confirm this is indeed the issue, check the scidb log for the famous errno 28 ('No space left on device').

You can use the between pattern again to save data in pieces. For example, maybe along the d_pulse_num dimension:
iquery -anq "save(between(PulseArray, null, null, 1, null, null, null, 1, null), '/path/to/file_1.scidb', 0, 'opaque')"
iquery -anq "save(between(PulseArray, null, null, 2, null, null, null, 2, null), '/path/to/file_2.scidb', 0, 'opaque')"

You might know better than myself as far as which dimension to save along. Whole chunks are preferred.
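
If it helps, here is a quick sketch of that slice-by-slice save along d_pulse_num (Python calling iquery via subprocess; the output directory is a placeholder):

#!/usr/bin/python
# Save PulseArray in opaque pieces, one file per d_pulse_num value (1..10).
import subprocess

for n in range(1, 11):
    afl = ("save(between(PulseArray, null, null, %d, null, null, null, %d, null), "
           "'/path/to/file_%d.scidb', 0, 'opaque')" % (n, n, n))
    subprocess.check_call(['iquery', '-anq', afl])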

You can also do a distributed save:
iquery -anq "save(PulseArray, 'relative_file_name.scidb', -1, 'opaque')"
This is the fastest method. It will create the file 'relative_file_name.scidb' inside the data directory on every instance. You can then load the data back with a symmetric load:
iquery -anq "load(PulseArray, 'relative_file_name.scidb', -1, 'opaque')"
Or, to append:
iquery -anq "insert(input(PulseArray, 'relative_file_name.scidb', -1, 'opaque'), PulseArray)"
In fact, if you create N pieces using N instances, you can simultaneously reload the N pieces into any M instances as long as M >= N.

Make sense?


#13

Oh dear, oh dear.

From your post …

AFL% show(PulseArray2);
{i} schema
{0} 'PulseArray2<pulse_start_samples:int32 NULL DEFAULT null,... ,xyz_corrected_pulse_cspike_bot_phe_s1:float NULL DEFAULT null> [d_luxstamp_samples=0:4611686018427387902,18181,0,d_dpc=0:1000000,1,0,d_pulse_num=1:10,11,0,d_pulse_classification=1:5,5,0]'

Woah. That’s bizarre.

It thinks the data’s completely dense. The logical chunk size it’s come up with is ~1,000,000; ( 18,181 x 11 x 5 = 999,955 ). That’s the default when the data’s completely dense.

I will file a bug. That’s not what it should be doing.


#14

Thanks for the quick replies!

In response to Alex:

  1. With my 1440000000000 chunk size for the first dimension, the average is about a factor of 2 higher than ideal and the max is right at your target. I’ll reduce the chunk size the next time the data is loaded - halving it to ~720000000000 should bring nelem_avg from ~2.1e6 down to roughly 1e6.
scidb@ip-XXXX:~$ iquery -aq "aggregate(filter(list('chunk map'), uaid = 189), min(nelem), max(nelem), avg(nelem), count(*))"
{i} nelem_min,nelem_max,nelem_avg,count
{0} 21694,5029510,2.13385e+06,12276
  2. Understood regarding -n vs. > /dev/null.

  3. In addition to the error I reported last time, the opaque save sometimes fails with an out-of-memory error instead. I didn’t see any errno 28 in the coordinator log.

scidb@ip-XXXX:~$ iquery -c localhost -p 1239 -naq "save(PulseArray,'/mnt/data/DDBackup/PulseArray',0, 'opaque')"
SystemException in file: src/array/MemChunk.cpp function: reallocate line: 228
Error id: scidb::SCIDB_SE_NO_MEMORY::SCIDB_LE_CANT_REALLOCATE_MEMORY
Error description: Not enough memory. Failed to reallocate memory.

In any case, the parallel opaque save/load does seem to work (and it is fast!), so it looks like all is well. Thanks again.


#15

Ok understood.

Yeah, I would reduce the chunk sizes the next time there’s an opportunity. I would go 4x smaller, only because you have a lot of attributes. Under normal conditions we live up to the “column store” promise: attributes are stored separately, and if a query doesn’t access a particular attribute, that attribute isn’t read. But there are a few cases where we open up the “first row of chunks” for all attributes. The text save is definitely one of those cases; I am not sure how the opaque save handles it.

We will try to do some more investigations on our end.


#16

James, sorry some previous comments from us about the problem being fixed in 14.12 weren’t completely accurate. We’re investigating the issue.


#17

No problem. I think the temporary solution I have in place works for our current tests, but I am looking forward to the integrated solution for these redimensions after your bugfix.

By the way, it seems the “SciDB_14.12_02_04_2015” AMI has disappeared. Is there any way this can be reinstated - or do you recommend going back to using 14.8?


#18

Sorry - I had to fix something in the AMI and resave it under a slightly different name: SciDB_14.12_02_13_2015. However, even when I search for that name it does not show up, but searching by its ID works: ami-3cace654. Give this a try.


#19

Yes, I can see it. Thanks!

I’m going to be using this AMI to give a live demonstration of SciDB on AWS to a group of 30-40 collaborators later this week. If you need to replace the image again, could you please make sure a 14.12 AMI remains available at all times?


#20

Sure. This image will be there for you.