 # Dynamic Time Warping

#1

Hello,

I am having trouble conceptualising how to implement Dynamic Time Warping (DTW) as a query. (See http://en.wikipedia.org/wiki/Dynamic_time_warping for additional information.) For example, using AQL, execute:

```
create array cos <cos_x:float> [i=1:10,10,0];
select * into cos from build(cos,cos(i));
create array sin <sin_x:float> [j=1:10,10,0];
select * into sin from build(sin,sin(j));
```

Starting with the traditional example, the goal is to create the following matrix:

```
create array dtw <dtw_x:float> [i=0:10,11,0, j=0:10,11,0];
```

such that, in pseudo-code,

```
// Boundary conditions
dtw(0,0) = 0
dtw(i,0) = inf
dtw(0,j) = inf
// Algorithm
dtw(i,j) = abs(cos(i) - sin(j)) + min(dtw(i,j-1), dtw(i-1,j), dtw(i-1,j-1))
// Result
dtw(10,10)
```

Although the definition above is recursive, the process is iterative: start with dtw(1,1) and work forward to dtw(10,10), given the boundary conditions above. For example, one could simulate this example in a spreadsheet.
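To make the target concrete, here is the same recurrence in plain Python (an illustrative sketch, useful for checking values against the SciDB result; not SciDB code):

```python
import math

def dtw(f, g):
    """Classic O(n*m) DTW, filled iteratively from dtw[1][1]
    forward, exactly as in the pseudo-code above."""
    n, m = len(f), len(g)
    # Boundary conditions: dtw[0][0] = 0, first row and column = inf.
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(f[i - 1] - g[j - 1])
            d[i][j] = cost + min(d[i][j - 1], d[i - 1][j], d[i - 1][j - 1])
    return d[n][m]

# The 10-point cos-versus-sin example from the question:
f = [math.cos(i) for i in range(1, 11)]
g = [math.sin(j) for j in range(1, 11)]
print(dtw(f, g))
```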

What I am having difficulty seeing is how to create this iterative process as a query, in AQL or AFL:

```
diff = abs(cos(i) - sin(j))
+-----------+------------------------------+
| (i-1,j-1) | (i,j-1)                      |
+-----------+------------------------------+
| (i-1,j)   | diff + min of previous cells |
+-----------+------------------------------+
```

I have read the user guide a few times and worked with every operator and function at least once so please feel free to give me a cryptic pointer, and I will do the rest.

Thanks in advance for the help.

#2

For the record, I was not fully appreciating the utility of redimension. Section 7.2 of the 14.7 user manual entitled “Creating a sparse array” is worth careful consideration. Attached is a DTW.bash script that generates a crude and unoptimized query for a 10x10, cos versus sin, traditional Dynamic Time Warping example.

I had two problems: one with redimension under AQL and one with insert under AFL that prevented me from creating an all AQL or an all AFL example. See
- http://www.scidb.org/forum/viewtopic.php?f=11&t=1422
- http://www.scidb.org/forum/viewtopic.php?f=11&t=1424

Here is the simple example DTW.bash just for fun:

```
#!/bin/bash
#
# Tested with 14.7.
#

# Setup:
# f(x) = cos(x) for 0 to 10.
# g(x) = sin(x) for 0 to 10.
# dst  = cross product of abs(f(x) - g(x)).
# dtw = Accumulates the Dynamic Time Warping boundary conditions,
#       calculation, and result.
cat > DTW.qy <<EOF
set no fetch;

set lang afl;

store
(
build(<f_x:double NULL DEFAULT null> [i=0:10, 11, 0], cos(i)),
f
);

store
(
build(<g_x:double NULL DEFAULT null> [j=0:10, 11, 0], sin(j)),
g
);

store
(
project
(
apply(cross_join(f,g), x, iif(i=0 and j=0, 0, iif(i=0 or j=0, inf, abs(f_x - g_x)))),
x
)
, dst
);

create array dtw <x:double NULL DEFAULT null> [i=0:*, 100000, 0,  j=0:*, 100000, 0];

EOF

# Boundary conditions:
# dtw(0,0) = 0
# dtw(i,0) = inf
# dtw(0,j) = inf
cat >> DTW.qy <<EOF
store
(
redimension
(
join
(
apply
(
build(<i:int64> [I=0:10, 11, 0], I),
j,
0
),
slice(dst, j, 0)
),
dtw
),
dtw
);

insert
(
redimension
(
join
(
apply
(
build(<i:int64> [J=0:10, 11, 0], 0),
j,
J
),
slice(dst, i, 0)
),
dtw
),
dtw
);

EOF

# loop and generate DTW.qy: 10x10 example
#
# This is the easy, and slow, approach.  The question is whether or
# not there is a method with only SciDB primitives to accomplish the
# same construction of DTW.
for j in 1 2 3 4 5 6 7 8 9 10; do
for i in 1 2 3 4 5 6 7 8 9 10; do
cat >> DTW.qy <<EOF
store
(
redimension
(
join
(
join
(
build(<i:int64>[I=0:0,1,0],\${i}),
build(<j:int64>[I=0:0,1,0],\${j})
),
cast
(
aggregate
(
filter(dtw, (i=\$((i-1)) and j=\$((j-1))) or (i=\${i} and j=\$((j-1))) or (i=\$((i-1)) and j=\${j})),
min(x)
),
<x:double NULL DEFAULT null> [I=0:0,1,0]
)
),
dst
),
min
);

set lang aql;
insert into dtw select (dst.x + min.x) from filter(dst, i=\${i} and j=\${j}), min;
set lang afl;

EOF
done
done

# dst and min no longer needed.
# Print result
cat >> DTW.qy <<EOF
remove(dst);
remove(min);

set fetch; filter(dtw, i=10 and j=10);

EOF

# Print instructions
cat <<EOF
# Run this command:
iquery -f DTW.qy > /dev/null

# Run the following to print the results:
set lang aql;
select * from dtw where i=10 and j=10;

# Run the following queries to clean up:
set lang aql;
drop array f;
drop array g;
drop array dtw;
EOF
```

It’s slow.
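One observation on the slowness: the generated script issues one store/insert pair per cell (100 for the 10x10 case), but every cell on the anti-diagonal i + j = k depends only on the two previous anti-diagonals, so the table can in principle be filled in 2n - 1 bulk steps instead of n^2 single-cell steps. A NumPy sketch of the wavefront idea (illustrative only, not SciDB; mapping each step onto a single AFL query is not worked out here):

```python
import numpy as np

def dtw_wavefront(f, g):
    """Fill the DTW table one anti-diagonal (i + j = k) at a time.
    Each cell on an anti-diagonal depends only on the two previous
    anti-diagonals, so every step is one bulk array operation
    rather than a per-cell query."""
    n, m = len(f), len(g)
    d = np.full((n + 1, m + 1), np.inf)
    d[0, 0] = 0.0
    cost = np.abs(np.subtract.outer(f, g))  # cost[i-1, j-1] feeds cell (i, j)
    for k in range(2, n + m + 1):           # k = i + j
        i = np.arange(max(1, k - m), min(n, k - 1) + 1)
        j = k - i
        prev = np.minimum.reduce([d[i, j - 1], d[i - 1, j], d[i - 1, j - 1]])
        d[i, j] = cost[i - 1, j - 1] + prev
    return d[n, m]
```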

#3

You’ll forgive the tardiness of this reply, but I didn’t even know what Dynamic Time Warping was before google opened my eyes …

So …

1. As I look at it on the Internet’s most authoritative source (http://en.wikipedia.org/wiki/Dynamic_time_warping), I’m struck by the observation that this is an O(N^2) operation. Which means it’s going to be slow. Very slow. And slower as the data gets larger. In fact, a lot slower as the data gets even slightly larger.

2. Now, there are faster methods like FastDTW (http://cs.fit.edu/~pkc/papers/tdm04.pdf). These look promising from a performance point of view. And the whole “regrid to summarize regional values and recursively drill in” approach suggests to me that this is something SciDB can do. The trick is wrapping each of the iterative “recursive” steps into a convergent framework.

3. Looking at it … I suspect, given that the complexity of the problem is the same as the complexity of basic linear algebra operations such as “multiply()”, that there might be a linear algebraic approach to the problem. But I’ve not been able to find one, or construct one. I suspect a linear algebraic approach to DTW might be one of those “open research questions” we’re not in a position to answer.

Anyway … if I were you, I would look at implementing FastDTW: use SciDB to do the roll-up of the detailed data, handle queries at (say) a 10 x 10 or 100 x 100 scale, and then recursively descend into each “interesting” region.
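The coarse-to-fine idea can be sketched in Python. This is a simplified FastDTW-style illustration, not the published algorithm: the names `dtw_path` and `fast_dtw`, the pairwise-average downsampling, and the `radius` parameter are all assumptions of this sketch.

```python
import math

def dtw_path(f, g, window=None):
    """O(n*m) DTW restricted to an optional set of allowed (i, j)
    cells; returns (distance, warp path)."""
    n, m = len(f), len(g)
    if window is None:
        window = {(i, j) for i in range(1, n + 1) for j in range(1, m + 1)}
    d = {(0, 0): 0.0}
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if (i, j) not in window:
                continue
            prev = min(d.get((i, j - 1), math.inf),
                       d.get((i - 1, j), math.inf),
                       d.get((i - 1, j - 1), math.inf))
            if prev < math.inf:
                d[(i, j)] = abs(f[i - 1] - g[j - 1]) + prev
    # Backtrack the warp path from (n, m) to (0, 0).
    path, (i, j) = [], (n, m)
    while (i, j) != (0, 0):
        path.append((i, j))
        i, j = min(((i, j - 1), (i - 1, j), (i - 1, j - 1)),
                   key=lambda c: d.get(c, math.inf))
    return d[(n, m)], path[::-1]

def fast_dtw(f, g, radius=1):
    """Coarse-to-fine refinement: solve on half-resolution signals,
    then re-solve at full resolution only inside a band around the
    projected coarse path."""
    if len(f) <= radius + 2 or len(g) <= radius + 2:
        return dtw_path(f, g)
    coarse_f = [(f[k] + f[k + 1]) / 2 for k in range(0, len(f) - 1, 2)]
    coarse_g = [(g[k] + g[k + 1]) / 2 for k in range(0, len(g) - 1, 2)]
    _, coarse_path = fast_dtw(coarse_f, coarse_g, radius)
    # Project each coarse cell onto the fine grid, padded by radius.
    window = set()
    for ci, cj in coarse_path:
        for i in range(2 * ci - 1 - radius, 2 * ci + 1 + radius):
            for j in range(2 * cj - 1 - radius, 2 * cj + 1 + radius):
                if 1 <= i <= len(f) and 1 <= j <= len(g):
                    window.add((i, j))
    return dtw_path(f, g, window)
```

Because the band restricts the search, `fast_dtw` can only over-estimate the exact DTW distance; the roll-up step (here a pairwise average) is what SciDB's regrid would do.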

#4

Thanks for the feedback. Not to worry, I am aware of FastDTW along with other DTW methods, for example, R supports some standard ones.

I chose the traditional case because it can still be useful for small data sets in signal processing where the signals come from well-understood physical systems, e.g., physiological systems, and therefore the small data sets can be chosen in advance using other knowledge and methods. Small means 10x10, 100x100, or 1000x1000 data points. I also wanted to see how SciDB could handle this recursive case. It was a learning exercise.

For example, my current SciDB attempt posted here takes minutes, compared to a fraction of a second in Haskell. Thus, I feel that I am not fully understanding something. I will look at this further in the near future.