Fast queries of the data through the web using SciDB, a parallelized database for high performance computing, make this process operate quickly. By using scripting containers, such as IPython or Jupyter, to analyze the data, scientists can utilize a wide variety of freely available graphing, statistics, and information management resources [… namely SciPy, NumPy, Pandas etc.].
SciDB is a multidimensional array database that can manage massive datasets (terabytes to petabytes) across a hardware cluster. The analytical Python stack you mention above does not do this. What the paper quoted above did (and what many of our customers do) is use SciDB to store the data, run certain computations in the database, and then select and download smaller chunks of data for processing with the Python analytical stack. SciDB is very fast at the ‘select’ and in-database operations, but not all operations of the Python analytical stack are available within SciDB. Separating storage, in-database computations, and out-of-database computations therefore makes a lot of sense for many people who use SciDB.
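To make the three-way split concrete, here is a minimal sketch of the workflow. It does not use SciDB or its client library: `ToyArrayStore`, `row_means`, and `select` are hypothetical names for an in-memory stand-in, chosen only so the example runs anywhere. In a real setup the first two steps would be queries executed by the database, and the last step would typically use NumPy or Pandas rather than the standard-library `statistics` module.

```python
import statistics


class ToyArrayStore:
    """Hypothetical in-memory stand-in for an array database (not SciDB's API)."""

    def __init__(self, n_rows, n_cols):
        # Cell (i, j) holds the value i * n_cols + j.
        self._data = [[float(i * n_cols + j) for j in range(n_cols)]
                      for i in range(n_rows)]

    def row_means(self):
        """'In-database' computation: reduce the full array where it is stored."""
        return [sum(row) / len(row) for row in self._data]

    def select(self, row_lo, row_hi, col_lo, col_hi):
        """'Select': return only the requested window (bounds inclusive)."""
        return [row[col_lo:col_hi + 1] for row in self._data[row_lo:row_hi + 1]]


# Storage: the full array stays in the store.
store = ToyArrayStore(n_rows=200, n_cols=200)

# Step 1: in-database computation over the whole array.
means = store.row_means()

# Step 2: download only a small window of cells.
chunk = store.select(10, 12, 0, 3)

# Step 3: out-of-database analysis on the client side.
flat = [value for row in chunk for value in row]
chunk_median = statistics.median(flat)
```

The point of the split is that only `chunk` (3 x 4 cells here) ever leaves the store, while the reduction over all 40,000 cells happens where the data lives.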