Input parser expected literal error; issue loading arrays


Hello All,

I created a one dimensional array using:

AQL% CREATE ARRAY raw1new <X:double,Y:double,filename: string>[i];

and I then attempted to load my .scidb file into this array, and I received this error:

AQL% LOAD raw1new FROM '/home/scidb/Desktop/Final/output/graphs/NewSciDB/2.scidb';
UserException in file: src/query/ops/input/InputArray.cpp function: end line: 49
Error description: Import error. Import from file '/home/scidb/Desktop/Final/output/graphs/NewSciDB/2.scidb' (instance 0) to array 'raw1new' failed at line 603, column 1, offset 14773, value='2.dat': Input parser expected literal.
Failed query id: 1100853483146

It’s a 3-column by 600-row file, where the 3rd column consists of strings; rows have the form “(#,#,1.dat)”.

It seems like a very simple sort of issue, but I can’t figure out how to get this file to upload properly. Any help would be greatly appreciated.



Anthony -

A couple of things:

  1. When you use an unadorned dimension name in the CREATE ARRAY ... statement, SciDB is obliged to make a number of assumptions about the dimension. The engine assumes that your dimension values start at zero and have an unbounded size. The engine also guesses at the chunking - 500,000 cells per chunk, or about 4 Meg per chunk.

$ iquery -aq "CREATE ARRAY foo < a1 : int64 > [I]"
Query was executed successfully
$ iquery -aq "show ( foo )"
[("foo<a1:int64> [I=0:*,500000,0]")]

  2. I suspect that you have a malformed load file. What do lines 600 through to the end of the load file look like?
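For reference, this is the kind of check meant here. The snippet below is a self-contained sketch: the sample file is made up, so substitute the real load file path and line range from the error message (line 603 in the original post).

```shell
# Build a tiny stand-in load file: two well-formed rows, then a row
# whose string attribute is unquoted (the shape the error reports).
cat > /tmp/sample.scidb <<'EOF'
(1.10,2.20,"1.dat"),
(1.30,2.40,"1.dat"),
(1.50,2.60,2.dat)
EOF
# Print the suspect region; for the real file that would be
# something like: sed -n '598,606p' 2.scidb
sed -n '3p' /tmp/sample.scidb
```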





Could you post a few lines of the SciDB load file?
I assume you have put the CSV file through csv2scidb first, and you are trying to load the output of csv2scidb.
Do the string attributes appear with quotes (double quotes)?
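A quick way to spot unquoted string attributes is a grep over the load file. This is only a sketch: the sample file and the `.dat` filename pattern are assumptions taken from the original post.

```shell
# Sample load file: line 2 has an unquoted string attribute.
cat > /tmp/check.scidb <<'EOF'
(1.0,2.0,"1.dat"),
(3.0,4.0,2.dat)
EOF
# Flag rows where the third field starts with a digit instead of a
# quote, i.e. filenames that did not get wrapped in quotes.
grep -n ',[0-9][0-9]*\.dat)' /tmp/check.scidb
```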



I am learning how to use SciDB, and I have the same problem as you do.

I created a one-dimensional array:

AQL% CREATE ARRAY A <…lots of attributes…>[ i=0:*,100000,0 ]

and I have put the CSV file through csv2scidb first, but the string attributes do not appear with quotes. Why?


What’s the precise command you’re using to generate the SciDB load format data from the .csv? Here’s an example of something very similar that just ran.

$ ./genTS 2 10 
AAC,1/2/2011 8:0:1,111.52,51
AAC,1/2/2011 8:0:2,110.52,82
AAC,1/2/2011 8:0:3,110.52,78
AAC,1/2/2011 8:0:4,106.52,34
AAC,1/2/2011 8:0:5,110.52,83
AAC,1/2/2011 8:0:6,109.52,79
AAC,1/2/2011 8:0:7,110.52,41
AAC,1/2/2011 8:0:8,112.52,67
AAC,1/2/2011 8:0:9,112.52,41
AAC,1/2/2011 8:0:10,110.52,44
AACC,1/2/2011 8:0:1,54.50,338
AACC,1/2/2011 8:0:2,53.50,389
AACC,1/2/2011 8:0:3,54.50,680
AACC,1/2/2011 8:0:4,52.50,613
AACC,1/2/2011 8:0:5,53.50,269
AACC,1/2/2011 8:0:6,52.50,618
AACC,1/2/2011 8:0:7,53.50,405
AACC,1/2/2011 8:0:8,51.50,449
AACC,1/2/2011 8:0:9,52.50,676
AACC,1/2/2011 8:0:10,52.50,440

Don’t worry too much about genTS. It’s an internal tool we use to generate fake timeseries data to test things. But the key point is that its format is a string, then a datetime (which we will load as a string), then a double, then an integer. So, to load that:

$ ./genTS 2 10 | csv2scidb -c 10 -p SSN
("AAC","1/2/2011 8:0:1",67.59,404),
("AAC","1/2/2011 8:0:2",65.59,328),
("AAC","1/2/2011 8:0:3",67.59,406),
("AAC","1/2/2011 8:0:4",66.59,438),
("AAC","1/2/2011 8:0:5",67.59,303),
("AAC","1/2/2011 8:0:6",67.59,350),
("AAC","1/2/2011 8:0:7",65.59,234),
("AAC","1/2/2011 8:0:8",65.59,188),
("AAC","1/2/2011 8:0:9",66.59,201),
("AAC","1/2/2011 8:0:10",67.59,295)
("AACC","1/2/2011 8:0:1",74.39,677),
("AACC","1/2/2011 8:0:2",71.39,379),
("AACC","1/2/2011 8:0:3",75.39,915),
("AACC","1/2/2011 8:0:4",76.39,670),
("AACC","1/2/2011 8:0:5",74.39,856),
("AACC","1/2/2011 8:0:6",74.39,625),
("AACC","1/2/2011 8:0:7",72.39,538),
("AACC","1/2/2011 8:0:8",74.39,376),
("AACC","1/2/2011 8:0:9",75.39,502),
("AACC","1/2/2011 8:0:10",74.39,945)

That’s plonked the quotation marks around the two strings, and it’s identified the two numerical values.

I suspect that there’s something malformed in the file.

Also: pro tip. You don’t have to create a copy of the original .csv file, and name it something like foo.scidb. Internally, when we’re loading data, we always use a named pipe, rather than a file name, because that avoids a disk bounce.

#  Generate a stream of appropriately formatted SciDB data, and pipe
# it into a named pipe. Then in the SciDB load() command, read 
# from the named pipe. 
rm -rf /tmp/load_pipe
mkfifo /tmp/load_pipe
./genTS 2000 2000000 | csv2scidb -c 100000 -p SSNN > /tmp/load_pipe &
time iquery -aq "load ( Signal_Timeseries_Raw, '/tmp/load_pipe' )" -r /dev/null


When using csv2scidb, you can supply an optional format string. That format string looks like “NssNNSS”: it contains one character per field, and the meanings are:
N: do not put the field in quotes; convert an empty field (,,) to ,null,
S: put the field in quotes; convert an empty field to ,"",
s: put the field in quotes; convert an empty field to ,null,

So if you want the tool to put the quotes and nulls in, supply this string.
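To make that concrete for the original three-column file (double, double, filename), the format string would be “NNS”: quote only the third field. The snippet below merely mimics that quoting with awk so it can run anywhere; it is not csv2scidb itself.

```shell
# CSV row: double, double, string. With -p NNS, only the third field
# gets quotes. awk stands in for csv2scidb here.
echo '1.23,4.56,1.dat' | awk -F, '{printf "(%s,%s,\"%s\")\n", $1, $2, $3}'
# -> (1.23,4.56,"1.dat")
```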
Another tip - when debugging it’s useful to run with these options:

 -M                    Create Intermediate DLF Files (not FIFOs)
 -L                    Leave Intermediate DLF Files

Then you can examine the load error from SciDB, go to the offset in the file in the SciDB data directory, and figure out what went wrong.
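As a sketch of “go to the offset”: tail -c can seek to the byte offset the error reports (14773 in the original post). The file below is synthetic padding purely to demonstrate the seek; substitute the real intermediate file, and note tail -c +N is 1-based, so adjust by one if the reported offset is 0-based.

```shell
# 20 bytes of filler, then a malformed row starting at byte 21.
printf 'x%.0s' $(seq 1 20) > /tmp/sample.dlf
printf '(1.0,2.0,2.dat)\n' >> /tmp/sample.dlf
# Show the text at the offset; for the real error this would be
# something like: tail -c +14774 <file> | head -c 80
tail -c +21 /tmp/sample.dlf | head -c 40
```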


Thank you very much!