Error when loading an array from a CSV file


#1

When I load the array:
[scidb@scidb01 ~]$ loadcsv.py -a 'PhotoObjAlltest' -i 'test.csv'
Retrieving load array schema from SciDB and parsing it to determine load array chunk size.
Getting SciDB configuration information.

ERROR

int() argument must be a string or a number, not 'NoneType'
##############################

Failure: Error Encountered.

Addendum:
The CSV file was exported from SQL Server with BCP, using tab as the delimiter.

Sorry to bother you!


#2

I am getting exactly the same error in a similar situation - attempting a parallel load with the following options:

loadcsv.py -a "track" -t NNNNNNNNNNNNSNS -c 4000000000 -i '/home/data/track.csv' -m -M

This leads to:

ERROR

int() argument must be a string or a number, not 'NoneType'
##############################

Loading a tab-delimited or a comma-delimited file gives the same error. "csv2scidb", however, produces the expected result and allows the load to proceed with the same file.

Can someone at least point out whether this is a user error, a bug, or a feature?
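For what it's worth, the error text is Python's own TypeError message for calling int() on None, which suggests loadcsv.py is handed a missing value somewhere while parsing its inputs. A minimal reproduction of just the message (nothing SciDB-specific):

```python
# The reported message is exactly what Python prints when int() is
# given None, e.g. a missing field from a parsed CSV row.
try:
    int(None)
except TypeError as e:
    msg = str(e)

# msg mentions 'NoneType', matching the loadcsv.py error output.
print(msg)
```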


#3

[quote="mvolkov"]I am getting exactly same error in in similar situation - attempting to use parallel load with following options. [...][/quote]

I don't know if it is a bug, but I solved it.


#4

Is it possible for you to post (some part of) your CSV file that led to the failure?


#5

Here is a CSV sample from my load - the file contains 1 million rows, with homogeneous fields and no missing data or nulls:

12,0,0,1197,0,792,110,51,0,38,1446524293,1,2013-11-12 13:20:00,7978,2013-11-12 13:20:01
12,0,0,1176,0,764,26,61,0,32,1446524639,1,2013-11-12 13:20:00,111832,2013-11-12 13:20:01
12,0,0,1196,0,792,110,51,0,38,1446524297,1,2013-11-12 13:20:00,19628,2013-11-12 13:20:01
7,0,6.45,708,4539,724,97,61,39487,33,1446525869,1,2013-11-12 13:20:00,26259,2013-11-12 13:20:01
9,0,0,1220,6376,818,26,60,45497,39,1446518805,1,2013-11-12 13:20:00,127609,2013-11-12 13:20:01
9,0,0,1224,3849,76,79,61,47122,39,1446497938,1,2013-11-12 13:20:00,235278,2013-11-12 13:20:01
8,0,1.33,1177,3596,818,26,50,33482,32,1446525985,1,2013-11-12 13:20:00,228944,2013-11-12 13:20:01
12,0,0,1176,0,276,37,51,0,32,1446521181,1,2013-11-12 13:20:00,47510,2013-11-12 13:20:01
9,0,0,2892,3596,682,26,50,33482,39,1446509134,1,2013-11-12 13:20:00,323279,2013-11-12 13:20:01
12,0,0,1224,0,380,26,50,0,32,1446523861,1,2013-11-12 13:20:00,349244,2013-11-12 13:20:01

Additional information:
SciDB: 13.12
Distro: CentOS (fresh install for the purpose of SciDB evaluation)
Python: 2.6.6 (not altered stock RPM from CentOS)

config.ini:
server-0=localhost,5
install_root=/opt/scidb/13.12
metadata=/opt/scidb/13.12/share/scidb/meta.sql
pluginsdir=/opt/scidb/13.12/lib/scidb/plugins
logconf=/opt/scidb/13.12/share/scidb/log4cxx.properties
base-path=/home/scidb/data
tmp-path=/tmp
base-port=1239
interface=eth0
network-buffer=1024
mem-array-threshold=1024
smgr-cache-size=1024
execution-threads=12
result-prefetch-queue-size=4
result-prefetch-threads=8
chunk-segment-size-in-mb=128
merge-sort-buffer = 2000
result-prefetch-threads = 4
mem-array-threshold = 7000
smgr-cache-size = 16384
max-memory-limit=28000

Array structure:
CREATE ARRAY tracking
<
action_id: uint8 default uint8(0),
detail: uint64 default uint64(0),
seconds: float default float(0.0),
product_id: uint64 default uint64(0),
offer_id: uint64 default uint64(0),
country_id: uint16 default uint16(0),
language_id: uint8 default uint8(0),
os_id: uint8 default uint8(0),
instance_id: uint64 default uint64(0),
client_version_id: uint8 default uint8(0),
session_id: uint64 default uint64(0),
pipe_id: uint64 default uint64(0),
apache_when: datetime default datetime('1970-1-1 00:00:00'),
apache_mcs: uint64 default uint64(0),
changed_when: datetime default datetime('1970-1-1 00:00:00')
>
[i=0:*,4000000000,0];

Appreciate any lead.

Thank you for all your work.


#6

[quote="mvolkov"]Here is the CSV example from my load - file contains 1 million rows; homogeneous fields with no missing data or null(s): [...][/quote]
You can try editing the loadcsv.py file (in /opt/scidb/[scidb version]/bin). In my case, I found that configCsv outputs an extra line, "Query execution time:", which ends up in the instance list that loadcsv.py parses.
I modified loadcsv.py by changing the function "getInstances" as below:

for item in reader:
    if item["instance_id"] is not None:
        item["name"] = item["name"].replace("'", "")
        item["csv_fragment"] = "%s_%04d" % (outputBase, int(item["instance_id"]))
        item["dlf_fragment"] = "%s/%s" % (item["instance_path"].replace("'", ""), dlfFragmentName)
        instances.append(item)
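To see why filtering out None helps, here is a small standalone sketch (the field names mirror those in getInstances, but the sample text is invented, not actual loadcsv.py output): when csv.DictReader hits a stray short line such as "Query execution time:", it fills the missing columns with None, and the original code then crashes calling int() on one of them.

```python
import csv
import io

# Invented sample of the instance list loadcsv.py parses, with a stray
# trailing "Query execution time" line that has too few columns.
raw = (
    "name,instance_id,instance_path\n"
    "'server-0',0,'/home/scidb/data/0/0'\n"
    "Query execution time: 12ms\n"
)

rows = list(csv.DictReader(io.StringIO(raw)))

# DictReader pads the short stray line with None for its missing columns,
# so int(rows[1]["instance_id"]) would raise the reported TypeError.
print(rows[1]["instance_id"])          # None

# The patched getInstances skips such rows before calling int():
good = [r for r in rows if r["instance_id"] is not None]
print(len(good))                       # 1
```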


#7

This fix works! Thanks for sharing it!