This post takes a quick survey of the recent developments in our community. We aim to share our findings among SciDB users to promote collaboration.
I. Tutorial Blogs
We're excited and grateful to see folks starting to write instructional blogs on SciDB.
Thinking in SciDB
We'd like to draw attention once again to this very informative website by Dr. Rares Vernica. It offers a superb tutorial on data loading, with hopefully many more topics to come. Complete with amazing pictures!
Scalable Earth Observation Analytics with R and SciDB
By Marius Appel and Dr. Edzer Pebesma. The authors have also significantly extended SciDB with the scidb4gdal and scidb4geo packages. Together, these make up an entire approach for scaling up Earth observation analytics using SciDB and R.
II. Docker Containers
Packaging SciDB and its clients in Docker containers looks like a very popular thing to do, and perhaps folks can start reusing each other's work. Here's a sampling of repositories, in no particular order:
- https://github.com/rvernica/docker-library (along with a few other open source tools)
- https://github.com/cerbo/scidb-python-client (a dockerized Python client)
- https://github.com/cerbo/scidb-iquery (a dockerized iquery)
III. UD* Extensions
These are various User-Defined Functions, Aggregates, Data Types and Operators for SciDB. They are quite useful, both directly and as a starting point for someone writing their own. Note that a lot of the existing work targets older versions of SciDB. Here's a list of a few repositories that have been updated recently:
- Operators by Yiqun Zhang investigating high-performance linear algebra in SciDB, with some references to GPUs:
- Work by Alber Sánchez on running boost::geometry in SciDB: https://github.com/albhasan/geosdb
- Marius Appel’s work, see his blog link above: https://github.com/mappl/scidb4geo
- Work by Dr. Douglas Slotta:
  - VCF genotypes as a SciDB User-Defined Type: https://github.com/slottad/scidb-genotypes
  - Some extensions to summarize for stored arrays: https://github.com/slottad/summarize
IV. Plugins from P4
You can find many other plugins on the P4 GitHub page. Many of these are exploratory prototypes that are candidates for future productization. So far, the plugins we use most often are as follows:
- dev_tools: simply, a plugin to easily install other plugins
- accelerated_io_tools: fast and error-tolerant text loading
- streaming: run various programs, such as R scripts, invoked on SciDB data in parallel
- equi_join: easily join large arrays by attributes and/or dimensions
- grouped_aggregate: easy aggregation grouped by attributes and/or dimensions
- limit: return the first K cells of an array, just like the SQL LIMIT clause
- summarize: very quick chunk density and size statistics
- superfunpack: a few miscellaneous UDFs: Fisher’s exact test, regular expressions,…
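For readers new to these plugins, here is a rough, plain-Python picture of what limit, grouped_aggregate and equi_join do conceptually. This is only a toy sketch: it does not call SciDB or use the plugins' actual syntax, and the sample data and function signatures are made up for illustration. The real operators run in parallel across SciDB instances on chunked arrays.

```python
from collections import defaultdict

# Hypothetical toy data: each "cell" is a dict of attribute values.
samples = [
    {"gene": "A", "value": 2.0},
    {"gene": "B", "value": 3.0},
    {"gene": "A", "value": 5.0},
]
genes = [
    {"gene": "A", "chrom": "1"},
    {"gene": "B", "chrom": "2"},
]


def limit(cells, k):
    """Return the first k cells, like the SQL LIMIT clause."""
    return cells[:k]


def grouped_aggregate(cells, key, attr):
    """Sum `attr` over cells that share the same value of `key`."""
    totals = defaultdict(float)
    for cell in cells:
        totals[cell[key]] += cell[attr]
    return dict(totals)


def equi_join(left, right, key):
    """Pair up cells from both inputs whose `key` values are equal."""
    index = defaultdict(list)
    for cell in right:
        index[cell[key]].append(cell)
    # Merge each left cell with every matching right cell.
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


first_two = limit(samples, 2)
totals_by_gene = grouped_aggregate(samples, "gene", "value")
joined = equi_join(samples, genes, "gene")
```

The join builds a lookup table on one input and probes it with the other, which is one common way to do an equality join; we make no claim that this is how the equi_join plugin is implemented internally.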
V. Additional work
Some folks have also built extra R packages, interfaces, connectors and so on:
- SciETL - extract, transform, load of Geospatial Data: https://github.com/e-sensing/scietl
- scidbst - an R package for additional geospatial functionality: https://github.com/flahn/scidbst
- SciDB_Manager - a GUI management toolbox: https://github.com/slottad/SciDB_Manager
VI. Whom did we miss?
Apologies to anyone we didn't notice. We're always excited to hear about folks' work as it relates to SciDB. Please feel free to share your results here. Cheers!