SciDB connection from Python


#1

Hello.
I’m using SciDB 18.1.
To connect to SciDB from Python I use scidbpy.connect().

I have a question: when a connection is no longer needed, how do I disconnect and release its resources?
Sometimes, when the number of connections is large, the following error is raised and no new connection can be established for a while:

File "/usr/lib64/python3.6/weakref.py", line 624, in _exitfunc
f()
File "/usr/lib64/python3.6/weakref.py", line 548, in call
return info.func(*info.args, **(info.kwargs or {}))
File "/usr/lib/python3.6/site-packages/scidbpy/db.py", line 56, in _shim_release_session
req.raise_for_status()
File "/usr/lib/python3.6/site-packages/requests/models.py", line 939, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Session not found for url: ttp://localhost:8080/release_session?id=tg7un5fy24esrs4uvacz38ywnls1qrzz

Thank you.


#2

Hi Igor,

Would you mind providing a bit more information about the issue you are experiencing?

The shim session created by db = scidbpy.connect(...) is cleaned up by means of Python's weakref.finalize being called when the object is garbage collected.
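
For reference, the cleanup mechanism looks roughly like this sketch (illustrative only, not the exact scidbpy code; the URL, class, and helper names here are made up):

import weakref
import requests

def _release_session(base_url, session_id):
    # Ask shim to free the session slot. A 404 here means the session
    # was already reclaimed on the server side (e.g. by timeout).
    requests.get(base_url + '/release_session', params={'id': session_id})

class Connection:
    def __init__(self, base_url, session_id):
        # weakref.finalize registers a callback that runs when this
        # object is garbage collected or at interpreter shutdown.
        self._finalizer = weakref.finalize(
            self, _release_session, base_url, session_id)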

How many active simultaneous shim sessions are you attempting to use?

Shim sessions can also time out and be reclaimed, based on the timeout configuration setting.

The maximum number of sessions in shim is configurable by the max_sessions configuration setting.

It appears from the backtrace that the session with which your connection was established may have timed out and been destroyed prior to the garbage collection of the scidbpy connection object.

Do you have a Python code snippet that demonstrates the issue you are encountering?


#3

Hello.
My program loads data into SciDB and redimensions arrays using multithreading: for each input file, 8 processes are created, and each process uses one SciDB connection. The number of files is quite large (around 400) and they are loaded one by one without any delay. The loading time for each file is about 5 seconds.
The settings for Shim:
timeout = 10
max_session_count = 5000


#4

Hi Igor,

Just a thought: you may want to consider the AIO multiple-file input option.
Multiple writes to the same array will bottleneck on the array's transaction lock. The operator aio_input can read data from multiple files (at most one per instance) and is usually quite good at parsing tab-delimited text, so it may help.
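
For example, something along these lines from Python (a sketch only; the file path, attribute count, and target array are placeholders, and the exact aio_input parameters depend on your accelerated_io_tools version):

import scidbpy

db = scidbpy.connect()

# Parse a tab-delimited file with the AIO operator and store the result.
# Parameter names are illustrative; check the accelerated_io_tools docs.
db.iquery("store(aio_input('/data/part1.tsv', 'num_attributes=3'), tmp_load)")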


#5

Hello.
Thank you for the suggestion, but currently I cannot change the logic to use aio_input. I would like to understand why the connection problem happens and find possible ways to fix it without changing the threading logic.


#6

Currently there is no issue with loading performance. The problem is that the Python program crashes with the call stack I provided above, and it becomes impossible to connect to SciDB without restarting the shim service.


#7

Shim maintains a fixed-size array of sessions; by default the size is 50.
Once 50 sessions exist and a 51st is requested, shim drops the least recently used session (call it X) to serve the new one. If X then tries to run another query, it gets the 404 error you saw. That is the most likely root cause. There is also a timeout: if all sessions ran their last query less than timeout seconds ago, the new request gets an error instead.

To drop a connection explicitly in Python, delete the "db" object. When "db" is deleted, its finalizer releases the session, for example:
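
A minimal sketch (the connect() arguments are omitted and depend on your setup):

import scidbpy

db = scidbpy.connect()   # opens a new shim session
# ... run queries with db ...
del db                   # dropping the last reference triggers the
                         # weakref.finalize cleanup, which releases
                         # the shim session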

You can also change shim settings to increase the number of connections. Usually the config file for shim is in /var/lib/shim/conf. So you can do something like:

max_sessions=2048
timeout=1200

And that gives you a lot more room. Hope this helps.


#8

Thank you for the fast reply. I tried to set the shim parameters as you recommended. Now the error call stack looks like this:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/background_task/tasks.py", line 43, in bg_runner
    func(*args, **kwargs)
  File "/home/igor/DJANGO/SIVP/BackgroundProcessing/views.py", line 106, in FileMonitoring
    SciDBInterface.UploadFileToSciDB(metadata_mgr.ARRAY_NAME, file_path)
  File "/home/igor/DJANGO/SIVP/BackgroundProcessing/SciDBInterface.py", line 313, in UploadFileToSciDB
    connection_array[i] = scidbpy.connect()
  File "/usr/lib/python3.6/site-packages/scidbpy/db.py", line 164, in __init__
    admin=admin_shim).text
  File "/usr/lib/python3.6/site-packages/scidbpy/db.py", line 522, in _shim
    req.raise_for_status()
  File "/usr/lib/python3.6/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 503 Server Error: Out of resources for url: http://localhost:8080/new_session?admin=0
Rescheduling task BackgroundProcessing.views.FileMonitoring for 0:00:21 later at 2019-05-17 04:12:08.066335+00:00

Memory on the server is fine.
I initialize the connection array for each file upload as follows:

import multiprocessing
import numpy as np
import scidbpy

# One SciDB connection (and thus one shim session) per worker process
num_cores = multiprocessing.cpu_count()
connection_array = np.empty(num_cores, dtype=object)

for i in range(num_cores):
    connection_array[i] = scidbpy.connect()

num_cores is 8, so for each file upload 8 new sessions are created, as I understand it.
Is there anything else I can try?
Thank you very much.


#9

And when I try to restart my Python program, I get an exception like this:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 354, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib64/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.6/http/client.py", line 964, in send
    self.connect()
  File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 181, in connect
    conn = self._new_conn()
  File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 168, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fb146940f28>: Failed to establish a new connection: [Errno 111] Connection refused

After the shim service is restarted manually, connections can be established again.


#10

Just to clarify a few details:

  • Shim frees a session if the following conditions are all true:
    • A new session is requested
    • All the session slots are allocated
    • One of the allocated session slots reached its timeout
  • Shim returns HTTP 404 Session not found if you are trying to use a session that was freed (as per conditions above)
  • Shim returns HTTP 503 Out of resources if the following conditions are all true:
    • A new session is requested
    • All sessions are allocated
    • None of the allocated sessions can be freed (i.e., none has reached its timeout)

I’m not sure why you get Connection refused, but I would guess that your server is overloaded and the HTTP server embedded in Shim can’t accept any new HTTP connections. That request does not even reach Shim.

Based on the code you provided, opening num_cores = 8 sessions should work well. Each time you call scidbpy.connect(), a Shim session slot is used. You should not run out of sessions and get HTTP 503 Out of resources with just this code.
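
If sessions do accumulate across files, one option is to drop the references once a file is done, so each slot is released right away instead of waiting for the timeout. A sketch based on the loop you posted (where the load step happens is an assumption):

for i in range(num_cores):
    connection_array[i] = scidbpy.connect()

# ... load the current file using these connections ...

for i in range(num_cores):
    connection_array[i] = None   # drop the reference; the finalizer
                                 # then releases the shim session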

Restarting Shim will free all the sessions. max_sessions seems to be capped at 100, so max_sessions=2048 will still only allow 100 sessions.


#11

Thank you very much. Your explanation helped me a lot and I solved the problem.