GEZR

Authors
Baisan Razvan (razvanb135@gmail.com)
Florean Oana-Lavinia (florean.lavinia@gmail.com)
Gafitescu Petru-Marian (pmgafitescu@gmail.com)
Bugs & Feedback
Issues and PRs welcome!
Stages of project development
The development stages and the tasks completed by each member can be seen on GitHub.

Abstract

Gezr is a web application that builds a small knowledge structure out of webcam-captured video streams by detecting, classifying and comparing users' hand/arm gestures, and it also provides information about the rules triggered by the input gestures. For this analysis, the data is modeled using a Resource Description Framework (RDF) schema. Regular users connect to the application with their camera on; on request, they can see the gesture data collected up to that point and, eventually, start a quiz game.

Introduction

This report presents the considerations about the internal data structures/models used and the external APIs managed by the application. It starts by describing the main classes, presented through an RDF data model, together with a graphical representation of how they interact within the application. This is followed by a description of the external libraries and models used in the gesture-analysis process.

Internal data structure and models

For representing the internal data of the application and its linking structure, a Resource Description Framework (RDF) schema was created. The main classes are User, Gesture (specialised by subclasses such as Wave) and Rule (specialised by subclasses such as CloseCamera).

Since MediaPipe was used for extracting metadata about the analysed hand, additional structures were created for handling such data. In the end we did not use them in the application, but they are still kept as part of the ontology. The main classes are:

RDF schema

The following code section provides a shortened version of our schema, with only one subclass of each of the Gesture and Rule classes. For better understanding, a visual representation is provided afterwards.


    <?xml version="1.0"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
              xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
              xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
              xmlns:owl="http://www.w3.org/2002/07/owl#"
              xml:base="http://fiigezr.org/fiiGezr.owl"
              xmlns="http://fiigezr.org/fiiGezr.owl#">
    
    <owl:Ontology rdf:about="http://fiigezr.org/fiiGezr.owl"/>
    
    <owl:ObjectProperty rdf:about="#is_caused_by">
      <rdfs:domain rdf:resource="#Rule"/>
      <rdfs:range rdf:resource="#Gesture"/>
      <owl:inverseOf rdf:resource="#causes_rule"/>
    </owl:ObjectProperty>
    
    <owl:ObjectProperty rdf:about="#causes_rule">
      <rdfs:domain rdf:resource="#Gesture"/>
      <rdfs:range rdf:resource="#Rule"/>
      <owl:inverseOf rdf:resource="#is_caused_by"/>
    </owl:ObjectProperty>
    
    <owl:ObjectProperty rdf:about="#makes_gesture">
      <rdfs:domain rdf:resource="#User"/>
      <rdfs:range rdf:resource="#Gesture"/>
    </owl:ObjectProperty>
    
    <owl:DatatypeProperty rdf:about="#has_gesture_time">
      <rdfs:domain rdf:resource="#Gesture"/>
      <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#dateTime"/>
    </owl:DatatypeProperty>
    
    <owl:DatatypeProperty rdf:about="#has_rule_time">
      <rdfs:domain rdf:resource="#Rule"/>
      <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#dateTime"/>
    </owl:DatatypeProperty>
    
    <owl:DatatypeProperty rdf:about="#has_gesture_name">
      <rdfs:domain rdf:resource="#Gesture"/>
      <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
    </owl:DatatypeProperty>
    
    <owl:DatatypeProperty rdf:about="#has_gesture">
      <rdfs:domain rdf:resource="#Rule"/>
      <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
    </owl:DatatypeProperty>
    
    <owl:Class rdf:about="#User">
      <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
    </owl:Class>
    
    <owl:Class rdf:about="#Gesture">
      <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
    </owl:Class>
    
    <owl:Class rdf:about="#Wave">
      <rdfs:subClassOf rdf:resource="#Gesture"/>
    </owl:Class>
    
    <owl:Class rdf:about="#Rule">
      <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
    </owl:Class>
    
    <owl:Class rdf:about="#CloseCamera">
      <rdfs:subClassOf rdf:resource="#Rule"/>
    </owl:Class>
    
    </rdf:RDF>
              
            

RDF schema diagram

Evolution of the model

The architecture of the initially proposed model has not changed much. In general, the focus shifted from the data that would form a gesture to the gesture itself and what it causes. Following this idea, classes like Webcam or Data were removed, and more specific subclasses of Gesture and Rule (such as Wave and CloseCamera) were added.

Use of concepts

The web application creates and registers Gesture objects from the webcam-captured video streams, together with the time at which each gesture was detected. The gestures are continuously queried through SPARQL and, based on the most frequent gestures in the last 10 seconds, Rule instances are created whenever a certain threshold is exceeded and are bound to the gesture that caused them, for future reference.
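
As a rough, purely illustrative sketch of this windowing logic (the threshold value, the window length and the function names are assumptions; the real application keeps the data in the RDF graph and counts gestures through the SPARQL query shown later):

    import datetime
    from collections import Counter

    WINDOW = datetime.timedelta(seconds=10)   # length of the sliding window
    THRESHOLD = 5                             # illustrative threshold value

    # (gesture_name, creation_time) pairs registered from the video stream
    registered = []

    def register(name):
        registered.append((name, datetime.datetime.now()))

    def dominant_gesture():
        """Return the most frequent gesture of the last window if it exceeds
        THRESHOLD; in that case a Rule instance is created and bound to it.
        In the application this counting is done via SPARQL."""
        now = datetime.datetime.now()
        recent = Counter(n for n, t in registered if now - t <= WINDOW)
        if recent:
            name, count = recent.most_common(1)[0]
            if count >= THRESHOLD:
                return name
        return None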

External data sources

For extracting data and gestures from the video streams, we use an external model specialized in gesture control systems.

On-Device, Real-Time Hand Tracking with MediaPipe

MediaPipe Hands is a high-fidelity hand and finger tracking solution. It employs machine learning (ML) to infer 21 3D landmarks of a hand from a single frame. Compared to other solutions, MediaPipe gives good results on both desktop environments and mobile devices, and it also scales to multiple hands.

MediaPipe Hands utilizes an ML pipeline consisting of multiple models working together: a palm detection model that finds a bounding box around the areas of interest (the hands), and a hand landmark model that receives the cropped image as input and returns high-fidelity 3D hand keypoints.

MediaPipe hand crops with ground truth annotation
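
As an illustration, a minimal sketch of extracting the 21 landmarks from webcam frames with the MediaPipe Hands Python API could look as follows (the parameter values are illustrative, not the ones used in Gezr):

    import cv2
    import mediapipe as mp

    mp_hands = mp.solutions.hands
    capture = cv2.VideoCapture(0)          # default webcam

    with mp_hands.Hands(max_num_hands=2, min_detection_confidence=0.5) as hands:
        while capture.isOpened():
            ok, frame = capture.read()
            if not ok:
                break
            # MediaPipe expects RGB input, while OpenCV captures BGR frames
            results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.multi_hand_landmarks:
                for hand in results.multi_hand_landmarks:
                    # 21 normalised 3D landmarks (x, y, z) per detected hand
                    wrist = hand.landmark[mp_hands.HandLandmark.WRIST]
                    print(len(hand.landmark), wrist.x, wrist.y, wrist.z)

    capture.release()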

Owlready2

Owlready2 is a package for ontology-oriented programming in Python. It supports various operations on ontologies as Python objects. We mainly used it for automatically creating the schema through simple Python classes and for saving changes.


              
    import datetime
    from owlready2 import *

    # Create the ontology that will hold the schema
    onto = get_ontology("http://fiigezr.org/fiiGezr.owl")

    with onto:
        class Gesture(Thing): pass
        class Wave(Gesture): pass

        class Rule(Thing): pass
        class CloseCamera(Rule): pass

        class has_gesture_time(DataProperty):
            domain = [Gesture]
            range = [datetime.datetime]

        class is_caused_by(ObjectProperty):
            domain = [Rule]
            range = [Gesture]

        class causes_rule(ObjectProperty):
            domain = [Gesture]
            range = [Rule]
            inverse_property = is_caused_by

    # Save the schema (and later changes) to a file
    onto.save(file="fiiGezr.owl")

RDFLib

RDFLib is a Python library for working with RDF. We used it for creating SPARQL queries.

The following SPARQL query counts all instances of the Gesture class, or of its subclasses, that were registered in a specific time interval. The START_TIME and CURRENT_TIME placeholders are interpolated from Python datetime values formatted with strftime("%Y-%m-%dT%H:%M:%S.%f").

                PREFIX gezr: <http://fiigezr.org/fiiGezr.owl#>
                PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
                PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
                PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
                SELECT ?name (COUNT(DISTINCT ?x) AS ?count)
                WHERE {
                    ?x rdf:type/rdfs:subClassOf* gezr:Gesture .
                    ?x gezr:has_gesture_time ?data .
                    ?x gezr:has_gesture_name ?name .
                    FILTER (?data > "START_TIME"^^xsd:dateTime &&
                            ?data < "CURRENT_TIME"^^xsd:dateTime)
                }
                GROUP BY ?name
                ORDER BY DESC(?count)
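
As an illustration, this query might be executed from Python with RDFLib roughly as follows; the ontology file name fiiGezr.owl and the 10-second window are assumptions rather than the exact values used in the application:

    import datetime
    from rdflib import Graph

    # Load the ontology previously saved by Owlready2 (file name assumed)
    graph = Graph()
    graph.parse("fiiGezr.owl", format="xml")

    current_time = datetime.datetime.now()
    start_time = current_time - datetime.timedelta(seconds=10)

    # The SELECT query shown above, with the two dateTime literals
    # replaced by %s placeholders for the actual time bounds
    gesture_count_query = """
    PREFIX gezr: <http://fiigezr.org/fiiGezr.owl#>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
    SELECT ?name (COUNT(DISTINCT ?x) AS ?count)
    WHERE {
        ?x rdf:type/rdfs:subClassOf* gezr:Gesture .
        ?x gezr:has_gesture_time ?data .
        ?x gezr:has_gesture_name ?name .
        FILTER (?data > "%s"^^xsd:dateTime && ?data < "%s"^^xsd:dateTime)
    }
    GROUP BY ?name
    ORDER BY DESC(?count)
    """ % (start_time.strftime("%Y-%m-%dT%H:%M:%S.%f"),
           current_time.strftime("%Y-%m-%dT%H:%M:%S.%f"))

    for name, count in graph.query(gesture_count_query):
        print(name, count)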
              

The next example forms triples of the form ?x gezr:causes_rule ?r through a CONSTRUCT query; these triples are afterwards added to the OWL graph.


                PREFIX gezr: <http://fiigezr.org/fiiGezr.owl#>
                PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
                PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
                CONSTRUCT {
                  ?x gezr:causes_rule ?r
                }
                WHERE {
                  ?x rdf:type/rdfs:subClassOf* gezr:Gesture .
                  ?r rdf:type/rdfs:subClassOf* gezr:Rule .
                  ?x gezr:has_gesture_name ?name .
                  ?r gezr:has_gesture ?name .
                }
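
Continuing the RDFLib sketch above, the constructed triples can be added back to the graph and the enriched graph saved again (the file name is, again, an assumption):

    # `graph` is the rdflib Graph loaded in the previous sketch
    rule_query = """
    PREFIX gezr: <http://fiigezr.org/fiiGezr.owl#>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    CONSTRUCT { ?x gezr:causes_rule ?r }
    WHERE {
        ?x rdf:type/rdfs:subClassOf* gezr:Gesture .
        ?r rdf:type/rdfs:subClassOf* gezr:Rule .
        ?x gezr:has_gesture_name ?name .
        ?r gezr:has_gesture ?name .
    }
    """

    # Add the new (?x, gezr:causes_rule, ?r) triples to the graph
    for triple in graph.query(rule_query):
        graph.add(triple)

    # Persist the enriched graph (file name assumed)
    graph.serialize(destination="fiiGezr.owl", format="xml")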
              

Trivia API

The Open Trivia Database provides a free JSON API for generating questions, configurable by category, difficulty and type. Inside the application it was used only for True/False questions.
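
For illustration, a minimal sketch of fetching True/False questions from the Open Trivia Database (the amount of 10 questions is an arbitrary example):

    import html
    import json
    import urllib.request

    # Request 10 True/False questions (type=boolean) from the Open Trivia DB
    url = "https://opentdb.com/api.php?amount=10&type=boolean"
    with urllib.request.urlopen(url) as response:
        data = json.load(response)

    for item in data["results"]:
        # Questions are returned HTML-encoded by default
        print(html.unescape(item["question"]), "->", item["correct_answer"])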

References

  1. An Ontology for Reasoning on Body-based Gestures, by Mehdi Ousmer, Jean Vanderdonckt, Sabin Buraga.
  2. Defining N-ary Relations on the Semantic Web.
  3. On-Device, Real-Time Hand Tracking with MediaPipe.
  4. Google open-sources gesture tracking AI for mobile devices.
  5. MediaPipe Hands.
  6. Open Trivia Database.
  7. Welcome to Owlready2’s documentation!
  8. rdflib 5.0.0.