Combining multiple depth-based descriptors for hand gesture recognition
Fabio Dominio, Mauro Donadeo, Pietro Zanuttigh
Multimedia Technology and Telecommunications Laboratory, Department of Engineering, University of Padova, Italy
This page contains additional material for the paper "Combining multiple depth-based descriptors for hand gesture recognition" by Fabio Dominio, Mauro Donadeo and Pietro Zanuttigh, submitted to Pattern Recognition Letters.
In particular, it contains additional experimental results that could not be included in the paper because of the limits on the number of figures and pages.
This webpage contains:
· Sample images of the gestures included in the two databases
· The recognition accuracies of the various descriptors on the various gestures in our database
· Some examples of how the proposed approach can be applied to dynamic gestures
The dataset from Ren et al. [1] contains the 10 gestures shown in the following image:
These instead are the gestures in the dataset we acquired:
An interesting aspect is how well each feature is suited to recognizing each gesture. This page presents a detailed report of the accuracy of the proposed approach, given as a confusion matrix for each descriptor and each combination of descriptors. Each row corresponds to an input gesture and each column to the corresponding output of the system.
The database of Ren et al. [1] contains 10 gestures made by 10 people, each repeated 10 times, i.e., there are 100 executions of each gesture for a total of 1000 data samples. The results reported here are for the most difficult case, i.e., generic training with users different from the ones using the system.
Distance features alone (accuracy 92.5%)
Input | G1 | G2 | G3 | G4 | G5 | G6 | G7 | G8 | G9 | G10 |
G1 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
G2 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
G3 | 0 | 1 | 19 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
G4 | 0 | 0 | 0 | 16 | 1 | 3 | 0 | 0 | 0 | 0 |
G5 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 |
G6 | 0 | 0 | 0 | 0 | 1 | 19 | 0 | 0 | 0 | 0 |
G7 | 0 | 0 | 0 | 1 | 0 | 0 | 17 | 0 | 2 | 0 |
G8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0 |
G9 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 2 | 17 | 0 |
G10 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 17 |
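The accuracy quoted for each descriptor can be checked from its confusion matrix: it is the sum of the diagonal (correctly recognized samples) divided by the sum of all entries. The following is a minimal sketch of that arithmetic for the distance features table above; the function name `overall_accuracy` is an illustrative choice, not something taken from the paper.

```python
# Confusion matrix for the distance features on the dataset with 10 gestures.
# Rows: input gesture G1..G10. Columns: gesture returned by the system.
distance_cm = [
    [20, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 20, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 19, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 16, 1, 3, 0, 0, 0, 0],
    [0, 0, 0, 0, 20, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 19, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 17, 0, 2, 0],
    [0, 0, 0, 0, 0, 0, 0, 20, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 2, 17, 0],
    [0, 0, 0, 1, 2, 0, 0, 0, 0, 17],
]


def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


print(f"{100 * overall_accuracy(distance_cm):.1f}%")  # prints 92.5%
```

Here 185 of the 200 test samples lie on the diagonal, which matches the 92.5% reported in the table heading.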
Elevation features alone (accuracy 43.5%)
Input | G1 | G2 | G3 | G4 | G5 | G6 | G7 | G8 | G9 | G10 |
G1 | 13 | 0 | 0 | 0 | 2 | 0 | 3 | 0 | 2 | 0 |
G2 | 0 | 16 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
G3 | 0 | 7 | 9 | 0 | 0 | 0 | 0 | 4 | 0 | 0 |
G4 | 0 | 2 | 0 | 8 | 3 | 3 | 0 | 4 | 0 | 0 |
G5 | 3 | 3 | 2 | 1 | 4 | 3 | 0 | 2 | 0 | 2 |
G6 | 3 | 0 | 0 | 0 | 6 | 0 | 0 | 3 | 0 | 8 |
G7 | 4 | 5 | 0 | 1 | 0 | 1 | 6 | 3 | 0 | 0 |
G8 | 0 | 2 | 3 | 0 | 0 | 6 | 0 | 8 | 0 | 1 |
G9 | 6 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 10 | 0 |
G10 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 5 | 13 |
Curvature features alone (accuracy 92%)
Input | G1 | G2 | G3 | G4 | G5 | G6 | G7 | G8 | G9 | G10 |
G1 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
G2 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
G3 | 0 | 0 | 16 | 0 | 0 | 0 | 2 | 2 | 0 | 0 |
G4 | 0 | 0 | 0 | 19 | 1 | 0 | 0 | 0 | 0 | 0 |
G5 | 0 | 0 | 0 | 0 | 17 | 3 | 0 | 0 | 0 | 0 |
G6 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 |
G7 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 |
G8 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 18 | 0 | 0 |
G9 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 0 |
G10 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 19 |
Area features alone (accuracy 60%)
Input | G1 | G2 | G3 | G4 | G5 | G6 | G7 | G8 | G9 | G10 |
G1 | 11 | 1 | 0 | 0 | 0 | 8 | 0 | 0 | 0 | 0 |
G2 | 5 | 8 | 1 | 0 | 0 | 2 | 3 | 0 | 1 | 0 |
G3 | 2 | 2 | 8 | 1 | 2 | 0 | 4 | 0 | 0 | 1 |
G4 | 1 | 0 | 0 | 14 | 3 | 2 | 0 | 0 | 0 | 0 |
G5 | 0 | 0 | 0 | 0 | 11 | 9 | 0 | 0 | 0 | 0 |
G6 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 |
G7 | 1 | 1 | 0 | 0 | 0 | 0 | 17 | 1 | 0 | 0 |
G8 | 8 | 6 | 0 | 0 | 1 | 0 | 2 | 2 | 1 | 0 |
G9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 | 10 |
G10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 19 |
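To see which gestures a single descriptor handles well, it can help to read each row of a confusion matrix as a per-gesture recognition rate and to look for the largest off-diagonal entry. The sketch below does this for the area features table above; the helper names `per_gesture_rate` and `most_confused_pair` are hypothetical and only serve this illustration.

```python
# Confusion matrix for the area features on the dataset with 10 gestures.
# Rows: input gesture G1..G10. Columns: gesture returned by the system.
area_cm = [
    [11, 1, 0, 0, 0, 8, 0, 0, 0, 0],
    [5, 8, 1, 0, 0, 2, 3, 0, 1, 0],
    [2, 2, 8, 1, 2, 0, 4, 0, 0, 1],
    [1, 0, 0, 14, 3, 2, 0, 0, 0, 0],
    [0, 0, 0, 0, 11, 9, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 20, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 17, 1, 0, 0],
    [8, 6, 0, 0, 1, 0, 2, 2, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 10, 10],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 19],
]


def per_gesture_rate(cm):
    """Fraction of each input gesture (row) that was recognized correctly."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]


def most_confused_pair(cm):
    """Return (input_index, output_index, count) of the largest off-diagonal entry."""
    best = (0, 0, -1)
    for i, row in enumerate(cm):
        for j, count in enumerate(row):
            if i != j and count > best[2]:
                best = (i, j, count)
    return best


rates = per_gesture_rate(area_cm)
worst = min(range(len(rates)), key=lambda i: rates[i])
print(f"hardest gesture: G{worst + 1} ({rates[worst]:.2f})")  # hardest gesture: G8 (0.10)

i, j, count = most_confused_pair(area_cm)
print(f"G{i + 1} -> G{j + 1}: {count} samples")  # G9 -> G10: 10 samples
```

Read this way, the area features recognize G6 in all 20 cases but G8 in only 2, which is the kind of per-gesture difference these tables are meant to expose.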
Distance and curvature features together (accuracy 98.5%)
Input | G1 | G2 | G3 | G4 | G5 | G6 | G7 | G8 | G9 | G10 |
G1 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
G2 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
G3 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
G4 | 0 | 0 | 0 | 18 | 2 | 0 | 0 | 0 | 0 | 0 |
G5 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 |
G6 | 0 | 0 | 0 | 0 | 1 | 19 | 0 | 0 | 0 | 0 |
G7 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 |
G8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0 |
G9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 |
G10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 |
Distance, curvature and area features together (accuracy 99%)
Input | G1 | G2 | G3 | G4 | G5 | G6 | G7 | G8 | G9 | G10 |
G1 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
G2 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
G3 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
G4 | 0 | 0 | 0 | 19 | 1 | 0 | 0 | 0 | 0 | 0 |
G5 | 0 | 0 | 0 | 0 | 19 | 1 | 0 | 0 | 0 | 0 |
G6 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 |
G7 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 |
G8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0 |
G9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 |
G10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 |
All the 4 features (accuracy 99%)
Input | G1 | G2 | G3 | G4 | G5 | G6 | G7 | G8 | G9 | G10 |
G1 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
G2 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
G3 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
G4 | 0 | 0 | 0 | 19 | 1 | 0 | 0 | 0 | 0 | 0 |
G5 | 0 | 0 | 0 | 0 | 19 | 1 | 0 | 0 | 0 | 0 |
G6 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 |
G7 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 |
G8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0 |
G9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 |
G10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 |
Our database contains 12 gestures made by 14 people, each repeated 10 times, i.e., there are 140 executions of each gesture for a total of 1680 data samples. The results reported here are for the most difficult case, i.e., generic training with users different from the ones using the system.
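A user-independent evaluation of this kind keeps every sample of the test users out of the training set. The page does not state how many users are held out at a time, so the following is only a sketch under the assumption that one user is held out per round; the tuple layout `(user_id, gesture_id, features)` and the function name `leave_one_user_out` are hypothetical.

```python
def leave_one_user_out(samples):
    """Yield (held_out_user, train, test) with no user shared between train and test.

    Each sample is assumed to be a (user_id, gesture_id, features) tuple.
    """
    users = sorted({sample[0] for sample in samples})
    for held_out in users:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


# Placeholder data with the sizes quoted above:
# 14 users x 12 gestures x 10 repetitions (features left empty).
samples = [(user, gesture, None)
           for user in range(14)
           for gesture in range(12)
           for _ in range(10)]

for held_out, train, test in leave_one_user_out(samples):
    train_users = {s[0] for s in train}
    assert held_out not in train_users  # the test user never appears in training

print(len(samples))  # prints 1680
```

Splitting by user rather than by sample is what makes this the difficult case: the classifier never sees the hand of the person it is tested on.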
Distance features alone (accuracy 70.4%)
Input | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
1 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 1 | 19 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 0 | 17 | 0 | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 16 | 0 | 0 | 2 | 1 | 1 | 0 | 0 | 0 |
5 | 0 | 0 | 12 | 1 | 6 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
6 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 17 | 0 | 0 | 0 |
7 | 0 | 0 | 0 | 0 | 5 | 0 | 11 | 4 | 0 | 0 | 0 | 0 |
8 | 0 | 0 | 0 | 4 | 0 | 0 | 3 | 13 | 0 | 0 | 0 | 0 |
9 | 1 | 3 | 1 | 4 | 1 | 0 | 0 | 0 | 10 | 0 | 0 | 0 |
10 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 18 | 0 | 0 |
11 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 18 | 0 |
12 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 18 |
Elevation features alone (accuracy 47.5%)
Input | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
1 | 16 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 2 |
2 | 0 | 8 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 10 | 0 |
3 | 0 | 0 | 5 | 1 | 2 | 0 | 5 | 6 | 0 | 1 | 0 | 0 |
4 | 0 | 2 | 0 | 12 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 5 |
5 | 1 | 0 | 1 | 0 | 1 | 0 | 8 | 1 | 0 | 5 | 2 | 1 |
6 | 6 | 0 | 0 | 1 | 0 | 11 | 0 | 0 | 0 | 1 | 1 | 0 |
7 | 0 | 1 | 1 | 2 | 2 | 0 | 10 | 3 | 0 | 1 | 0 | 0 |
8 | 0 | 1 | 0 | 0 | 1 | 0 | 8 | 5 | 0 | 0 | 1 | 4 |
9 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 3 | 13 | 1 | 0 | 0 |
10 | 2 | 0 | 0 | 2 | 0 | 6 | 0 | 0 | 0 | 8 | 2 | 0 |
11 | 0 | 2 | 0 | 2 | 0 | 0 | 0 | 5 | 0 | 0 | 8 | 3 |
12 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 17 |
Curvature features alone (accuracy 88.3%)
Input | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
1 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 1 | 19 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 0 | 13 | 0 | 0 | 3 | 0 | 0 | 4 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 12 | 0 | 0 | 0 | 8 | 0 | 0 | 0 | 0 |
5 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
6 | 0 | 0 | 0 | 0 | 0 | 19 | 0 | 0 | 1 | 0 | 0 | 0 |
7 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 |
8 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 17 | 0 | 0 | 0 | 0 |
9 | 0 | 0 | 3 | 0 | 0 | 1 | 0 | 0 | 16 | 0 | 0 | 0 |
10 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 18 | 0 | 0 |
11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 19 | 0 |
12 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19 |
Area features alone (accuracy 54.2%)
Input | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
1 | 12 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 3 |
2 | 2 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 |
3 | 8 | 2 | 5 | 0 | 2 | 0 | 0 | 1 | 2 | 0 | 0 | 0 |
4 | 0 | 3 | 1 | 15 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
5 | 0 | 0 | 0 | 0 | 18 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
6 | 0 | 0 | 0 | 1 | 0 | 10 | 0 | 0 | 1 | 6 | 0 | 2 |
7 | 0 | 0 | 0 | 0 | 3 | 0 | 6 | 10 | 1 | 0 | 0 | 0 |
8 | 0 | 0 | 0 | 0 | 1 | 0 | 4 | 12 | 1 | 0 | 0 | 2 |
9 | 2 | 1 | 8 | 0 | 1 | 1 | 1 | 2 | 3 | 1 | 0 | 0 |
10 | 0 | 1 | 3 | 0 | 1 | 0 | 1 | 0 | 0 | 13 | 0 | 1 |
11 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 4 | 2 | 11 | 0 |
12 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 17 |
Distance and curvature features together (accuracy 89.6%)
Input | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
1 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 3 | 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 14 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0 |
5 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
6 | 0 | 0 | 0 | 0 | 0 | 15 | 0 | 0 | 5 | 0 | 0 | 0 |
7 | 0 | 0 | 0 | 1 | 0 | 0 | 14 | 5 | 0 | 0 | 0 | 0 |
8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 |
9 | 0 | 0 | 2 | 0 | 1 | 0 | 0 | 0 | 17 | 0 | 0 | 0 |
10 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 19 | 0 | 0 |
11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 19 | 0 |
12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 |
Distance, curvature and area features together (accuracy 92.9%)
Input | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
1 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 3 | 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 0 | 16 | 0 | 3 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 19 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
5 | 0 | 3 | 0 | 0 | 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
6 | 0 | 0 | 0 | 0 | 0 | 19 | 0 | 0 | 1 | 0 | 0 | 0 |
7 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 |
8 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 19 | 0 | 0 | 0 | 0 |
9 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 17 | 0 | 0 | 0 |
10 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 19 | 0 | 0 |
11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 |
12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 |
All the 4 features (accuracy 93.8%)
Input | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
1 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 0 | 18 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
6 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 |
7 | 0 | 0 | 0 | 1 | 0 | 0 | 16 | 3 | 0 | 0 | 0 | 0 |
8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 |
9 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 17 | 0 | 0 | 0 |
10 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 18 | 0 | 0 |
11 | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 16 | 0 |
12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 |
The proposed approach can also be used to follow the trajectory and orientation of the hand during the execution of dynamic gestures. The figure shows the trajectory of the hand during 3 different repetitions of the gesture in the first example. Notice how the proposed approach is able to reliably identify the hand and to compute the three-dimensional trajectory associated with the gesture. The following videos instead show the output of the hand detection phase of the algorithm on two sample dynamic gestures. Note that the videos show only the first detection step, i.e., simple circle fitting and PCA; the main axis is reversed with respect to the notation used in the paper.
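As a rough illustration of the two quantities mentioned here, the sketch below computes a hand trajectory as the per-frame centroid of the 3D hand points and a hand orientation as the first principal axis of the 2D hand points. It is not the authors' implementation: the function names, the input layout (lists of coordinate tuples) and the use of the closed-form 2D PCA angle are all assumptions made for this example, and the circle-fitting step is not covered.

```python
import math


def centroid(points):
    """Mean of a list of equally sized coordinate tuples."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[k] for p in points) / n for k in range(dims))


def hand_trajectory(frames):
    """One 3D centroid per frame; each frame is a list of (x, y, z) hand points."""
    return [centroid(frame) for frame in frames]


def main_axis_angle(points_2d):
    """Angle in radians of the first principal axis of 2D points.

    Uses the closed form for a 2x2 covariance matrix. The result lies in
    (-pi/2, pi/2]: PCA gives a line, so the direction along it is ambiguous.
    """
    cx, cy = centroid(points_2d)
    sxx = sum((x - cx) ** 2 for x, _ in points_2d)
    syy = sum((y - cy) ** 2 for _, y in points_2d)
    sxy = sum((x - cx) * (y - cy) for x, y in points_2d)
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


# Two toy frames of 3D hand points (made-up values).
frames = [
    [(0, 0, 1), (2, 0, 1)],
    [(1, 1, 1), (3, 1, 1)],
]
print(hand_trajectory(frames))  # [(1.0, 0.0, 1.0), (2.0, 1.0, 1.0)]

# Points along the diagonal have their main axis at 45 degrees.
print(round(math.degrees(main_axis_angle([(0, 0), (1, 1), (2, 2)])), 6))  # 45.0
```

Because PCA only determines an undirected axis, a separate rule is needed to decide which way the axis points, which is consistent with the note above that the axis in the videos is reversed with respect to the paper.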
[1] Ren, Z., Yuan, J., Zhang, Z., "Robust hand gesture recognition based on finger-earth mover's distance with a commodity depth camera", in: Proc. of the 19th ACM International Conference on Multimedia, ACM, New York, NY, USA, 2011, pp. 1093-1096.