Do you know how SciDB performs (benchmarks and capacity) against any of these?
Cassandra, Aerospike, HyperDex, MonetDB (generic DB)
Druid, InfluxDB (time series)


We are not aware of any such comparisons. There is some ongoing work comparing SciDB against other systems, though, such as Vertica.


I could try to run a benchmark myself. I’ve found two well-known benchmarks: YCSB and HammerDB.
Do you have “scripts” to add SciDB to those benchmarks?


Regarding benchmarks …

It bears pointing out that you are what you measure. If you pick up a collection of data and generic SQL-like queries, what you’ll find is that SciDB does OK on some stuff (scalable OLAP style workloads), and not so well on other stuff (OLTP style data processing).

If you want benchmarks that show SciDB in a better light, I suggest you start with workloads and queries that are very hard to do with SQL … image processing, scalable linear algebra challenges like all-pairs correlation or singular value decomposition, time-series and signal analysis, whole graph analytics. We didn’t design SciDB as a replacement for SQL (though we have kept a flavor of SQL as a query language). We designed SciDB to address workloads characterized by queries that are very awkward to express in SQL, and we adopted a physical design that includes features SQL engines don’t consider interesting.

The point being that generic benchmarks will inform you about performance on features and functionality all systems share. But the world of data management is fragmenting. It’s no longer true that everyone’s building systems for business data processing, or parallel log analysis. One size doesn’t fit all.

So rather than picking a generic benchmark only to learn something about the suitability of technologies for a problem you don’t actually have, a more agile practice would be to build a prototype of the application(s) you have in mind (nothing fancy … build a mini-benchmark: data + 10 queries in proportion to your expected workload) and assess both ( a ) how hard it is to implement this with technologies X, Y and Z, and ( b ) once implemented, how well technologies X, Y and Z perform. Doing this will teach you a lot about your problem, and about the relative strengths and weaknesses of the technologies you’re assessing.
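
To make the mini-benchmark idea a bit more concrete, here is a rough sketch of what such a harness might look like in Python. Everything in it is an assumption for illustration: the function name run_mini_benchmark, the "readings" table, the two queries and their 70/30 weights are all made up, and SQLite only stands in for "technology X" because it ships with Python. For a real comparison you would swap the query callables for calls into each system you are assessing (SciDB included) and use your own data and query mix.

```python
import random
import sqlite3
import statistics
import time
from typing import Callable, Dict, List, Tuple

# A query is (name, weight, callable). The weight approximates the share of
# the expected workload that this query represents.
Query = Tuple[str, float, Callable[[], object]]


def run_mini_benchmark(queries: List[Query], n_ops: int = 1000,
                       seed: int = 0) -> Dict[str, Dict[str, float]]:
    """Run n_ops operations drawn in proportion to the weights and
    return per-query latency statistics in seconds."""
    rng = random.Random(seed)  # fixed seed so each backend sees the same mix
    names = [q[0] for q in queries]
    weights = [q[1] for q in queries]
    funcs = {q[0]: q[2] for q in queries}
    timings: Dict[str, List[float]] = {name: [] for name in names}

    for _ in range(n_ops):
        name = rng.choices(names, weights=weights, k=1)[0]
        start = time.perf_counter()
        funcs[name]()
        timings[name].append(time.perf_counter() - start)

    report: Dict[str, Dict[str, float]] = {}
    for name, samples in timings.items():
        if samples:  # a low-weight query may never be drawn in a short run
            report[name] = {
                "count": len(samples),
                "mean_s": statistics.mean(samples),
                "max_s": max(samples),
            }
    return report


if __name__ == "__main__":
    # Hypothetical stand-in backend: an in-memory SQLite table of readings.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor INTEGER, t INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?, ?)",
        [(s, t, float(s * t % 97)) for s in range(10) for t in range(1000)],
    )
    queries: List[Query] = [
        ("point_lookup", 0.7, lambda: conn.execute(
            "SELECT value FROM readings WHERE sensor = 3 AND t = 500").fetchall()),
        ("per_sensor_avg", 0.3, lambda: conn.execute(
            "SELECT sensor, AVG(value) FROM readings GROUP BY sensor").fetchall()),
    ]
    for name, stats in run_mini_benchmark(queries, n_ops=200).items():
        print(name, stats)
```

The timing numbers only cover point ( b ). Point ( a ), how hard each query was to write for each technology, is something you would have to note down yourself while building the callables.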

Hope this helps!


OK, thank you for sharing your thoughts.
At the moment I’m not trying to implement anything, just trying to decide which DB to learn next, preferably something powerful and with job possibilities.