Combining multiple depth-based descriptors for hand gesture recognition

Fabio Dominio, Mauro Donadeo, Pietro Zanuttigh

Multimedia Technology and Telecommunications Laboratory, Department of Engineering, University of Padova, Italy

 

This page contains additional material for the paper "Combining multiple depth-based descriptors for hand gesture recognition" by Fabio Dominio, Mauro Donadeo and Pietro Zanuttigh, submitted to Pattern Recognition Letters. In particular, it contains additional data on the experimental results that could not be included in the paper because of the limits on the number of figures and pages.

This webpage contains:

 

·         Sample images of the gestures included in the two databases

·         The recognition accuracies of the various descriptors on the various gestures on our database

·         Some examples of how the proposed approach can be applied to dynamic gestures

 

Datasets

The dataset from Ren et al. [1] contains the 10 gestures shown in the following image:

These, instead, are the gestures in the dataset we acquired:

 

 

 Results

An interesting question is how well each feature set is suited to recognizing each gesture. This page presents a detailed report of the accuracy of the proposed approach: a confusion matrix is reported for each descriptor and for each combination of descriptors. Each row corresponds to an input gesture and each column to the corresponding output of the system.
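As a minimal sketch of how such a matrix can be read, the snippet below computes the overall accuracy (sum of the diagonal over the total number of samples) and the per-gesture recognition rate (diagonal entry over the row sum). The function names and the 3-gesture toy matrix are hypothetical and are not taken from the paper.

```python
def overall_accuracy(confusion):
    """Fraction of samples lying on the diagonal of a square confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def per_gesture_rate(confusion):
    """Recognition rate of each input gesture: diagonal entry divided by the row sum."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


# Hypothetical toy matrix: rows are input gestures, columns are system outputs.
toy = [
    [18, 2, 0],
    [1, 19, 0],
    [0, 3, 17],
]

print(f"overall accuracy: {overall_accuracy(toy):.3f}")
print("per-gesture rates:", per_gesture_rate(toy))
```

The same two functions can be applied to any of the matrices reported on this page after copying the rows into a list of lists.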

Results on the database of Ren et al. [1]

The database contains 10 gestures performed by 10 people, each repeated 10 times, i.e., there are 100 executions of each gesture, for a total of 1000 data samples. The results reported here refer to the most difficult case, i.e., generic training, in which the users in the training set are different from the ones using the system.
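The following is a rough sketch of a user-independent split of this kind, assuming that every sample is indexed by a (user, gesture, repetition) triple. The function names and the choice of held-out users are hypothetical. Holding out two users is only an assumption, made because each row of the matrices below sums to 20, which is what 2 users with 10 repetitions each would give.

```python
from itertools import product


def build_samples(n_users, n_gestures, n_repetitions):
    """Enumerate all (user, gesture, repetition) index triples."""
    return list(product(range(n_users), range(n_gestures), range(n_repetitions)))


def split_by_user(samples, test_users):
    """Put every sample of the held-out users in the test set, the rest in training."""
    held_out = set(test_users)
    train = [s for s in samples if s[0] not in held_out]
    test = [s for s in samples if s[0] in held_out]
    return train, test


samples = build_samples(n_users=10, n_gestures=10, n_repetitions=10)
# Hypothetical choice: the last two users are never seen during training.
train, test = split_by_user(samples, test_users=[8, 9])

print(len(samples), len(train), len(test))
```

With this split no user contributes to both sets, so the classifier is always evaluated on people it has not been trained on.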

 

Single descriptors

Distance features alone (accuracy 92.5%)

  G1 G2 G3 G4 G5 G6 G7 G8 G9 G10
G1 20 0 0 0 0 0 0 0 0 0
G2 0 20 0 0 0 0 0 0 0 0
G3 0 1 19 0 0 0 0 0 0 0
G4 0 0 0 16 1 3 0 0 0 0
G5 0 0 0 0 20 0 0 0 0 0
G6 0 0 0 0 1 19 0 0 0 0
G7 0 0 0 1 0 0 17 0 2 0
G8 0 0 0 0 0 0 0 20 0 0
G9 0 0 0 1 0 0 0 2 17 0
G10 0 0 0 1 2 0 0 0 0 17

 

 

Elevation features alone (accuracy 43.5%)

  G1 G2 G3 G4 G5 G6 G7 G8 G9 G10
G1 13 0 0 0 2 0 3 0 2 0
G2 0 16 4 0 0 0 0 0 0 0
G3 0 7 9 0 0 0 0 4 0 0
G4 0 2 0 8 3 3 0 4 0 0
G5 3 3 2 1 4 3 0 2 0 2
G6 3 0 0 0 6 0 0 3 0 8
G7 4 5 0 1 0 1 6 3 0 0
G8 0 2 3 0 0 6 0 8 0 1
G9 6 0 0 0 0 2 2 0 10 0
G10 0 0 2 0 0 0 0 0 5 13

 

 

 

Curvature features alone (accuracy 92%)

  G1 G2 G3 G4 G5 G6 G7 G8 G9 G10
G1 20 0 0 0 0 0 0 0 0 0
G2 0 20 0 0 0 0 0 0 0 0
G3 0 0 16 0 0 0 2 2 0 0
G4 0 0 0 19 1 0 0 0 0 0
G5 0 0 0 0 17 3 0 0 0 0
G6 0 0 0 0 0 20 0 0 0 0
G7 0 0 0 0 0 0 20 0 0 0
G8 0 0 2 0 0 0 0 18 0 0
G9 0 5 0 0 0 0 0 0 15 0
G10 0 0 1 0 0 0 0 0 0 19

 

Area features alone (accuracy 60%)

  G1 G2 G3 G4 G5 G6 G7 G8 G9 G10
G1 11 1 0 0 0 8 0 0 0 0
G2 5 8 1 0 0 2 3 0 1 0
G3 2 2 8 1 2 0 4 0 0 1
G4 1 0 0 14 3 2 0 0 0 0
G5 0 0 0 0 11 9 0 0 0 0
G6 0 0 0 0 0 20 0 0 0 0
G7 1 1 0 0 0 0 17 1 0 0
G8 8 6 0 0 1 0 2 2 1 0
G9 0 0 0 0 0 0 0 0 10 10
G10 0 0 0 0 0 0 0 1 0 19

 

 

Combination of multiple descriptors 

 

Distance and curvature features together (accuracy 98.5%)

  G1 G2 G3 G4 G5 G6 G7 G8 G9 G10
G1 20 0 0 0 0 0 0 0 0 0
G2 0 20 0 0 0 0 0 0 0 0
G3 0 0 20 0 0 0 0 0 0 0
G4 0 0 0 18 2 0 0 0 0 0
G5 0 0 0 0 20 0 0 0 0 0
G6 0 0 0 0 1 19 0 0 0 0
G7 0 0 0 0 0 0 20 0 0 0
G8 0 0 0 0 0 0 0 20 0 0
G9 0 0 0 0 0 0 0 0 20 0
G10 0 0 0 0 0 0 0 0 0 20
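As a worked check of the numbers above, the diagonals of the distance, curvature and combined matrices can be compared gesture by gesture. The sketch below copies the three diagonals from the tables for the database of Ren et al., recomputes the accuracies, and lists the gestures for which the combination does better than both single descriptors. The helper names are illustrative only.

```python
# Correct recognitions out of 20 test samples per gesture (G1..G10),
# copied from the diagonals of the confusion matrices above.
distance = [20, 20, 19, 16, 20, 19, 17, 20, 17, 17]
curvature = [20, 20, 16, 19, 17, 20, 20, 18, 15, 19]
combined = [20, 20, 20, 18, 20, 19, 20, 20, 20, 20]  # distance + curvature

SAMPLES_PER_GESTURE = 20


def accuracy(diagonal):
    """Overall accuracy given the diagonal of a confusion matrix."""
    return sum(diagonal) / (SAMPLES_PER_GESTURE * len(diagonal))


def improved_gestures(single_a, single_b, together):
    """1-based gestures where the combination beats both single descriptors."""
    return [
        gesture
        for gesture, (a, b, c) in enumerate(zip(single_a, single_b, together), start=1)
        if c > max(a, b)
    ]


for name, diagonal in [("distance", distance), ("curvature", curvature), ("combined", combined)]:
    print(f"{name}: {100 * accuracy(diagonal):.1f}%")
print("improved by the combination:", improved_gestures(distance, curvature, combined))
```

The recomputed values match the accuracies in the headings (92.5%, 92% and 98.5%), and the combination is strictly better than both single descriptors on G3, G9 and G10.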

 

 

Distance, curvature and area features together (accuracy 99%)

  G1 G2 G3 G4 G5 G6 G7 G8 G9 G10
G1 20 0 0 0 0 0 0 0 0 0
G2 0 20 0 0 0 0 0 0 0 0
G3 0 0 20 0 0 0 0 0 0 0
G4 0 0 0 19 1 0 0 0 0 0
G5 0 0 0 0 19 1 0 0 0 0
G6 0 0 0 0 0 20 0 0 0 0
G7 0 0 0 0 0 0 20 0 0 0
G8 0 0 0 0 0 0 0 20 0 0
G9 0 0 0 0 0 0 0 0 20 0
G10 0 0 0 0 0 0 0 0 0 20

 

 

All the 4 features (accuracy 99%)

  G1 G2 G3 G4 G5 G6 G7 G8 G9 G10
G1 20 0 0 0 0 0 0 0 0 0
G2 0 20 0 0 0 0 0 0 0 0
G3 0 0 20 0 0 0 0 0 0 0
G4 0 0 0 19 1 0 0 0 0 0
G5 0 0 0 0 19 1 0 0 0 0
G6 0 0 0 0 0 20 0 0 0 0
G7 0 0 0 0 0 0 20 0 0 0
G8 0 0 0 0 0 0 0 20 0 0
G9 0 0 0 0 0 0 0 0 20 0
G10 0 0 0 0 0 0 0 0 0 20

 

Results on our database

The database contains 12 gestures performed by 14 people, each repeated 10 times, i.e., there are 140 executions of each gesture, for a total of 1680 data samples. The results reported here refer to the most difficult case, i.e., generic training, in which the users in the training set are different from the ones using the system.

 

Single descriptors

Distance features alone (accuracy 70.4%)

  G1 G2 G3 G4 G5 G6 G7 G8 G9 G10 G11 G12
G1 20 0 0 0 0 0 0 0 0 0 0 0
G2 1 19 0 0 0 0 0 0 0 0 0 0
G3 0 0 17 0 1 0 0 0 2 0 0 0
G4 0 0 0 16 0 0 2 1 1 0 0 0
G5 0 0 12 1 6 0 0 0 1 0 0 0
G6 0 0 0 0 0 3 0 0 17 0 0 0
G7 0 0 0 0 5 0 11 4 0 0 0 0
G8 0 0 0 4 0 0 3 13 0 0 0 0
G9 1 3 1 4 1 0 0 0 10 0 0 0
G10 0 0 0 0 0 2 0 0 0 18 0 0
G11 0 0 0 1 0 1 0 0 0 0 18 0
G12 0 0 2 0 0 0 0 0 0 0 0 18
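To see which gestures a single descriptor tends to mix up, the largest off-diagonal entries of a confusion matrix can be listed. The sketch below does this for the "Distance features alone" matrix on our database, with the values copied from the table above. The function name `top_confusions` is hypothetical.

```python
# "Distance features alone" on our database: rows are input gestures 1..12,
# columns are the outputs of the system.
distance_matrix = [
    [20, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 17, 0, 1, 0, 0, 0, 2, 0, 0, 0],
    [0, 0, 0, 16, 0, 0, 2, 1, 1, 0, 0, 0],
    [0, 0, 12, 1, 6, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 3, 0, 0, 17, 0, 0, 0],
    [0, 0, 0, 0, 5, 0, 11, 4, 0, 0, 0, 0],
    [0, 0, 0, 4, 0, 0, 3, 13, 0, 0, 0, 0],
    [1, 3, 1, 4, 1, 0, 0, 0, 10, 0, 0, 0],
    [0, 0, 0, 0, 0, 2, 0, 0, 0, 18, 0, 0],
    [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 18, 0],
    [0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 18],
]


def top_confusions(confusion, k=3):
    """Return the k largest off-diagonal entries as (count, input, output), 1-based."""
    pairs = [
        (count, i + 1, j + 1)
        for i, row in enumerate(confusion)
        for j, count in enumerate(row)
        if i != j and count > 0
    ]
    return sorted(pairs, reverse=True)[:k]


for count, true_gesture, predicted in top_confusions(distance_matrix):
    print(f"gesture {true_gesture} recognized as gesture {predicted}: {count} of 20")
```

For this matrix the dominant errors are gesture 6 recognized as gesture 9 (17 samples) and gesture 5 recognized as gesture 3 (12 samples), which account for most of the gap from the other descriptors.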

 

 

Elevation features alone (accuracy 47.5%)

  G1 G2 G3 G4 G5 G6 G7 G8 G9 G10 G11 G12
G1 16 0 0 0 0 2 0 0 0 0 0 2
G2 0 8 0 2 0 0 0 0 0 0 10 0
G3 0 0 5 1 2 0 5 6 0 1 0 0
G4 0 2 0 12 0 0 0 1 0 0 0 5
G5 1 0 1 0 1 0 8 1 0 5 2 1
G6 6 0 0 1 0 11 0 0 0 1 1 0
G7 0 1 1 2 2 0 10 3 0 1 0 0
G8 0 1 0 0 1 0 8 5 0 0 1 4
G9 0 0 0 0 2 0 1 3 13 1 0 0
G10 2 0 0 2 0 6 0 0 0 8 2 0
G11 0 2 0 2 0 0 0 5 0 0 8 3
G12 0 0 0 0 0 1 0 0 0 0 2 17

 

 

Curvature features alone (accuracy 88.3%)

  G1 G2 G3 G4 G5 G6 G7 G8 G9 G10 G11 G12
G1 20 0 0 0 0 0 0 0 0 0 0 0
G2 1 19 0 0 0 0 0 0 0 0 0 0
G3 0 0 13 0 0 3 0 0 4 0 0 0
G4 0 0 0 12 0 0 0 8 0 0 0 0
G5 0 0 0 0 20 0 0 0 0 0 0 0
G6 0 0 0 0 0 19 0 0 1 0 0 0
G7 0 0 0 0 0 0 20 0 0 0 0 0
G8 0 0 0 3 0 0 0 17 0 0 0 0
G9 0 0 3 0 0 1 0 0 16 0 0 0
G10 0 0 1 0 0 0 0 0 1 18 0 0
G11 0 0 0 0 0 0 0 0 0 1 19 0
G12 0 0 0 1 0 0 0 0 0 0 0 19

 

Area features alone (accuracy 54.2%)

  G1 G2 G3 G4 G5 G6 G7 G8 G9 G10 G11 G12
G1 12 1 0 0 0 0 0 0 4 0 0 3
G2 2 8 0 0 0 0 0 0 0 0 0 10
G3 8 2 5 0 2 0 0 1 2 0 0 0
G4 0 3 1 15 0 1 0 0 0 0 0 0
G5 0 0 0 0 18 1 0 0 0 1 0 0
G6 0 0 0 1 0 10 0 0 1 6 0 2
G7 0 0 0 0 3 0 6 10 1 0 0 0
G8 0 0 0 0 1 0 4 12 1 0 0 2
G9 2 1 8 0 1 1 1 2 3 1 0 0
G10 0 1 3 0 1 0 1 0 0 13 0 1
G11 0 2 0 0 0 1 0 0 4 2 11 0
G12 1 2 0 0 0 0 0 0 0 0 0 17

 

Combination of multiple descriptors 

 

Distance and curvature features together (accuracy 89.6%)

  G1 G2 G3 G4 G5 G6 G7 G8 G9 G10 G11 G12
G1 20 0 0 0 0 0 0 0 0 0 0 0
G2 3 17 0 0 0 0 0 0 0 0 0 0
G3 0 0 20 0 0 0 0 0 0 0 0 0
G4 0 0 0 14 0 0 0 6 0 0 0 0
G5 0 0 0 0 20 0 0 0 0 0 0 0
G6 0 0 0 0 0 15 0 0 5 0 0 0
G7 0 0 0 1 0 0 14 5 0 0 0 0
G8 0 0 0 0 0 0 0 20 0 0 0 0
G9 0 0 2 0 1 0 0 0 17 0 0 0
G10 0 0 0 0 0 1 0 0 0 19 0 0
G11 0 0 0 0 0 0 0 0 0 1 19 0
G12 0 0 0 0 0 0 0 0 0 0 0 20

 

 

Distance, curvature and area features together (accuracy 92.9%)

  G1 G2 G3 G4 G5 G6 G7 G8 G9 G10 G11 G12
G1 20 0 0 0 0 0 0 0 0 0 0 0
G2 3 17 0 0 0 0 0 0 0 0 0 0
G3 0 0 16 0 3 0 0 0 1 0 0 0
G4 0 0 0 19 0 0 0 1 0 0 0 0
G5 0 3 0 0 17 0 0 0 0 0 0 0
G6 0 0 0 0 0 19 0 0 1 0 0 0
G7 0 0 0 0 0 0 20 0 0 0 0 0
G8 0 0 0 1 0 0 0 19 0 0 0 0
G9 0 0 3 0 0 0 0 0 17 0 0 0
G10 0 0 0 0 0 1 0 0 0 19 0 0
G11 0 0 0 0 0 0 0 0 0 0 20 0
G12 0 0 0 0 0 0 0 0 0 0 0 20

 

 

All the 4 features (accuracy 93.8%)

  G1 G2 G3 G4 G5 G6 G7 G8 G9 G10 G11 G12
G1 20 0 0 0 0 0 0 0 0 0 0 0
G2 0 20 0 0 0 0 0 0 0 0 0 0
G3 0 0 18 0 0 0 0 0 2 0 0 0
G4 0 0 0 20 0 0 0 0 0 0 0 0
G5 0 0 0 0 20 0 0 0 0 0 0 0
G6 0 0 0 0 0 20 0 0 0 0 0 0
G7 0 0 0 1 0 0 16 3 0 0 0 0
G8 0 0 0 0 0 0 0 20 0 0 0 0
G9 0 0 3 0 0 0 0 0 17 0 0 0
G10 0 0 0 0 0 2 0 0 0 18 0 0
G11 0 0 0 2 0 0 2 0 0 0 16 0
G12 0 0 0 0 0 0 0 0 0 0 0 20

 

 

Recognition of dynamic gestures

The proposed approach can also be used to follow the trajectory and orientation of the hand during the execution of dynamic gestures. The figure shows the trajectory of the hand during 3 different repetitions of the gesture in the first example. Notice how the proposed approach is able to reliably identify the hand and to compute the three-dimensional trajectory associated with the gesture. The following videos instead show the output of the hand detection phase of the algorithm on two sample dynamic gestures. Note that the videos show only the first detection step (i.e., simple circle fitting and PCA; the main axis is reversed with respect to the notation used in the paper).

 

Example 1 (video of the gesture)

Example 1 (output of the proposed approach)

Example 2 (video of the gesture)

Example 2 (output of the proposed approach)
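The PCA step mentioned above can be illustrated with a small sketch. It is a simplified 2D version written with the Python standard library only, not the implementation used in the paper: it assumes the hand is given as a list of (x, y) points and returns their centroid and the direction of largest variance. The function name `principal_axis` and the sample points are hypothetical.

```python
import math


def principal_axis(points):
    """Return (centroid, unit direction) of the main axis of 2D points via PCA."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix of the centered points.
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Closed-form angle of the eigenvector with the largest eigenvalue.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (cx, cy), (math.cos(theta), math.sin(theta))


# Hypothetical points lying on the line y = 2x.
hand_points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
centroid, direction = principal_axis(hand_points)
print("centroid:", centroid)
print("main axis direction:", direction)
```

PCA only determines the axis up to its sign, so an additional convention is needed to decide which way the axis points.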

 

[1] Ren, Z., Yuan, J., Zhang, Z., "Robust hand gesture recognition based on finger-earth mover's distance with a commodity depth camera", in: Proc. of the 19th ACM International Conference on Multimedia, ACM, New York, NY, USA, 2011, pp. 1093-1096.