This work investigates methods for performing automatic Sign Language Recognition (SLR) from video streams. To this end, novel machine learning models and algorithms are proposed to produce classifiers that can detect and recognise segments of a video sequence that contain a meaningful sign.
Automated Sign Language Recognition remains a challenging problem to this day. Like spoken languages, sign languages feature thousands of signs, some of which differ only by subtle changes in hand motion, shape or position. Compounded by differences in signing style and physiology between individuals, this makes SLR an intricate challenge.
Our approach to automated SLR uses discriminative spatio-temporal patterns called Sequential Patterns (SPs). SPs are ordered sequences of feature subsets that allow explicit spatio-temporal feature selection and do not require dynamic time warping for temporal alignment. We further improved SPs by integrating temporal constraints into the pattern, resulting in Sequential Interval Patterns (SIPs). An SP or SIP is detected in an input sequence by finding its ordered events (feature subsets) in that sequence and, for SIPs, checking that the matched events also satisfy the temporal constraints. A multi-sign detector and recogniser can be built by combining a set of SPs or SIPs into a hierarchical structure called a Hierarchical Sequential Pattern Tree (HSP-Tree). An example of an HSP-Tree is shown in the accompanying figure.
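To make the detection step more concrete, the following is a minimal Python sketch of matching a single SIP against a sequence, written under assumptions that are not taken from the paper: each video frame is represented as a set of active binary feature indices, each temporal constraint is a hypothetical (min_gap, max_gap) range in frames between consecutive events, and a plain SP is treated as a SIP whose gaps are only required to be at least one frame. The class name `SequentialIntervalPattern` and its methods are illustrative, not the authors' implementation, and the HSP-Tree combination step is not modelled.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence, Tuple

# Assumption: a frame is the set of indices of binary features active in it.
Frame = FrozenSet[int]
# Assumption: (min_gap, max_gap) in frames between consecutive events; None = unbounded.
Interval = Tuple[int, Optional[int]]


@dataclass
class SequentialIntervalPattern:
    events: List[FrozenSet[int]]   # ordered feature subsets (must be non-empty)
    intervals: List[Interval]      # one gap constraint per consecutive event pair

    @classmethod
    def plain_sp(cls, events: Sequence[FrozenSet[int]]) -> "SequentialIntervalPattern":
        # Hypothetical simplification: an SP only requires events to occur in order.
        return cls(list(events), [(1, None)] * (len(events) - 1))

    def match_ends(self, frames: Sequence[Frame]) -> List[int]:
        """Return the frame indices at which a complete match of the pattern ends."""
        # reachable: frames where a valid partial match of the events so far ends.
        reachable = [t for t, f in enumerate(frames) if self.events[0] <= f]
        for event, (lo, hi) in zip(self.events[1:], self.intervals):
            nxt = []
            for t, f in enumerate(frames):
                if not event <= f:  # the event's feature subset must be active
                    continue
                # Keep t if some earlier partial match ends within the allowed gap.
                if any(lo <= t - s and (hi is None or t - s <= hi) for s in reachable):
                    nxt.append(t)
            reachable = nxt
            if not reachable:
                break
        return reachable

    def detect(self, frames: Sequence[Frame]) -> bool:
        return bool(self.match_ends(frames))


# Toy sequence of five frames (invented data for illustration only).
frames = [frozenset({0}), frozenset({1, 2}), frozenset(), frozenset({3}), frozenset({3, 4})]
events = [frozenset({0}), frozenset({1}), frozenset({3})]

sip = SequentialIntervalPattern(events, [(1, 1), (1, 2)])
print(sip.match_ends(frames))  # [3]

tight = SequentialIntervalPattern(events, [(1, 1), (1, 1)])
print(tight.detect(frames))  # False

sp = SequentialIntervalPattern.plain_sp(events)
print(sp.match_ends(frames))  # [3, 4]
```

The sketch keeps every frame at which a partial match can end, rather than greedily taking the earliest occurrence of each event, because with an upper gap bound the earliest occurrence of one event is not always the one that lets the next event satisfy its constraint.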
We evaluated the method on a dataset of continuous sign sentences, demonstrating that it can cope with co-articulation. Experimentally, the proposed approach yielded significantly higher performance than SP-Trees for unsegmented SLR on both signer dependent (49% improvement) and signer independent (12% improvement) datasets. Additionally, comparisons with HMMs show that the proposed method equals or exceeds the accuracy of HMMs in both the signer dependent (71% HSP vs 63% HMMs) and signer independent (54% HSP vs 49% HMMs) settings. This is achieved at a much lower processing time: 2 minutes for HSP vs 20 minutes for HMMs to process 981 signs.
Eng-Jon Ong, Nicolas Pugeault, Oscar Koller, Richard Bowden. Sign Spotting using Hierarchical Sequential Patterns with Temporal Intervals. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2014.
This work was part of the EPSRC project “Learning to Recognise Dynamic Visual Content from Broadcast Footage” (grant EP/I011811/1).