Gezr is a web application that builds a small knowledge structure out of webcam-captured video streams by detecting, classifying, and comparing the hand/arm gestures of its users, and also reports the rules derived from the input gestures. For this analysis, the data is modeled using a Resource Description Framework (RDF) schema. Users connect to the application with their camera on; on request, they can view the gesture data gathered up to that point and, optionally, start a quiz game.
This report presents the internal data structures/models used by the application and the external APIs it relies on. It starts by describing the main classes, expressed as an RDF schema, together with a graphical representation of how they interact within the application. This is followed by a description of the external libraries and models used for gesture analysis.
To represent the application's internal data and its linking structure, a Resource Description Framework (RDF) schema was created. The main classes are User, Gesture (with subclasses such as Wave), and Rule (with subclasses such as CloseCamera).
The following code section provides a shortened version of our schema, containing only one subclass each of Gesture and Rule. Also, for better understanding, a visual representation is added afterwards.
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xml:base="http://fiigezr.org/fiiGezr.owl"
xmlns="http://fiigezr.org/fiiGezr.owl#">
<owl:Ontology rdf:about="http://fiigezr.org/fiiGezr.owl"/>
<owl:ObjectProperty rdf:about="#is_caused_by">
<rdfs:domain rdf:resource="#Rule"/>
<rdfs:range rdf:resource="#Gesture"/>
<owl:inverseOf rdf:resource="#causes_rule"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:about="#causes_rule">
<rdfs:domain rdf:resource="#Gesture"/>
<rdfs:range rdf:resource="#Rule"/>
<owl:inverseOf rdf:resource="#is_caused_by"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:about="#makes_gesture">
<rdfs:domain rdf:resource="#User"/>
<rdfs:range rdf:resource="#Gesture"/>
</owl:ObjectProperty>
<owl:DatatypeProperty rdf:about="#has_gesture_time">
<rdfs:domain rdf:resource="#Gesture"/>
<rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#dateTime"/>
</owl:DatatypeProperty>
<owl:DatatypeProperty rdf:about="#has_rule_time">
<rdfs:domain rdf:resource="#Rule"/>
<rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#dateTime"/>
</owl:DatatypeProperty>
<owl:DatatypeProperty rdf:about="#has_gesture_name">
<rdfs:domain rdf:resource="#Gesture"/>
<rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
</owl:DatatypeProperty>
<owl:DatatypeProperty rdf:about="#has_gesture">
<rdfs:domain rdf:resource="#Rule"/>
<rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
</owl:DatatypeProperty>
<owl:Class rdf:about="#User">
<rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
</owl:Class>
<owl:Class rdf:about="#Gesture">
<rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
</owl:Class>
<owl:Class rdf:about="#Wave">
<rdfs:subClassOf rdf:resource="#Gesture"/>
</owl:Class>
<owl:Class rdf:about="#Rule">
<rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
</owl:Class>
<owl:Class rdf:about="#CloseCamera">
<rdfs:subClassOf rdf:resource="#Rule"/>
</owl:Class>
</rdf:RDF>
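For illustration, instance data conforming to this schema could look as follows in Turtle notation (the identifiers gesture1 and rule1, and the literal values, are invented for this sketch):

```turtle
@prefix gezr: <http://fiigezr.org/fiiGezr.owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# A detected wave gesture, with its registration time and name.
gezr:gesture1 a gezr:Wave ;
    gezr:has_gesture_name "wave" ;
    gezr:has_gesture_time "2022-01-01T12:00:05.000000"^^xsd:dateTime ;
    gezr:causes_rule gezr:rule1 .

# The rule triggered by that gesture.
gezr:rule1 a gezr:CloseCamera ;
    gezr:has_rule_time "2022-01-01T12:00:10.000000"^^xsd:dateTime .
```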
The architecture of the initially proposed model has not changed much. In general, the focus shifted from the data that would form a gesture to the gesture itself and what it causes. Following this idea, classes like Webcam or Data were removed, and subclasses of the Gesture and Rule classes were added.
The web application creates and registers Gesture objects from the webcam-captured video streams, together with the time at which each gesture was detected. The gestures are continuously queried through SPARQL; based on the most frequent gestures in the last 10 seconds, if a certain threshold is exceeded, a Rule instance is created and bound to the gesture that caused it, for future reference.
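The thresholding step can be sketched in plain Python. Note that `most_frequent_gesture`, the `THRESHOLD` value, and the event format are illustrative assumptions, not the application's actual code:

```python
import datetime
from collections import Counter

THRESHOLD = 5                                # assumed value; the report does not state the exact one
WINDOW = datetime.timedelta(seconds=10)      # the 10-second sliding window

def most_frequent_gesture(events, now):
    """events: list of (gesture_name, timestamp) pairs registered so far.

    Returns (name, count) for the most frequent gesture in the last
    10 seconds, or None if no gesture reaches the threshold.
    """
    recent = [name for name, t in events if now - WINDOW < t <= now]
    if not recent:
        return None
    name, count = Counter(recent).most_common(1)[0]
    return (name, count) if count >= THRESHOLD else None

# Synthetic example: six "Wave" gestures registered in the last 6 seconds.
now = datetime.datetime.now()
events = [("Wave", now - datetime.timedelta(seconds=s)) for s in range(6)]
print(most_frequent_gesture(events, now))  # -> ('Wave', 6)
```

When this returns a gesture, the application would create the corresponding Rule instance and bind it to the triggering gesture.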
To extract data and gestures from the video streams, we use an external model specialized in gesture control systems.
MediaPipe Hands is a high-fidelity hand and finger tracking solution. It employs machine learning (ML) to infer 21 3D landmarks of a hand from just a single frame. Compared to other solutions, MediaPipe performs well on both desktop environments and mobile devices, and it also scales to multiple hands.
MediaPipe Hands utilizes an ML pipeline consisting of multiple models working together: a palm detection model that finds a bounding box around the areas of interest (the hands), and a hand landmark model that receives the cropped image as input and returns high-fidelity 3D hand keypoints.
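A gesture classifier can then be built on top of the 21 landmarks. The following is a minimal sketch of such a heuristic, assuming landmarks are given as normalized (x, y, z) tuples with y growing downwards, as MediaPipe reports them; `is_raised_open_hand` is an illustrative classifier, not the application's actual one:

```python
# Indices into the 21 MediaPipe Hands landmarks used by this heuristic.
WRIST = 0
FINGERTIPS = [4, 8, 12, 16, 20]   # thumb, index, middle, ring, pinky tips

def is_raised_open_hand(landmarks):
    """landmarks: 21 (x, y, z) tuples in normalized image coordinates.

    Heuristic: the hand counts as raised and open if every fingertip
    lies above the wrist (i.e. has a smaller y coordinate).
    """
    wrist_y = landmarks[WRIST][1]
    return all(landmarks[tip][1] < wrist_y for tip in FINGERTIPS)

# Synthetic example: all fingertips above the wrist.
lm = [(0.5, 0.9, 0.0)] * 21
for tip in FINGERTIPS:
    lm[tip] = (0.5, 0.3, 0.0)
print(is_raised_open_hand(lm))  # -> True
```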
Owlready2 is a package for ontology-oriented programming in Python. It supports various operations on ontologies as Python objects. We mainly used it for automatically creating the schema through simple class definitions and for persisting changes.
import datetime
from owlready2 import Thing, DataProperty, ObjectProperty, get_ontology

onto = get_ontology("http://fiigezr.org/fiiGezr.owl")

with onto:
    class Gesture(Thing): pass
    class Wave(Gesture): pass
    class Rule(Thing): pass
    class CloseCamera(Rule): pass

    class has_gesture_time(DataProperty):
        domain = [Gesture]
        range = [datetime.datetime]

    class is_caused_by(ObjectProperty):
        domain = [Rule]
        range = [Gesture]

    class causes_rule(ObjectProperty):
        domain = [Gesture]
        range = [Rule]
        inverse_property = is_caused_by
RDFLib is a Python library for working with RDF. We used it for creating and executing SPARQL queries.
The following SPARQL query counts, grouped by gesture name, all instances of the class Gesture, or of its subclasses, that were registered in a specific interval of time.
PREFIX gezr: <http://fiigezr.org/fiiGezr.owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?name (COUNT(DISTINCT ?x) AS ?count)
WHERE {
    ?x rdf:type/rdfs:subClassOf* gezr:Gesture .
    ?x gezr:has_gesture_time ?data .
    ?x gezr:has_gesture_name ?name .
    FILTER (?data > "{start}"^^xsd:dateTime && ?data < "{now}"^^xsd:dateTime)
}
GROUP BY ?name
ORDER BY DESC(?count)
Here {start} and {now} stand for the bounds of the interval, interpolated from Python via start_time.strftime("%Y-%m-%dT%H:%M:%S.%f") and current_time.strftime("%Y-%m-%dT%H:%M:%S.%f").
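Since the interval bounds come from Python datetime objects, the query string is built at runtime. A self-contained sketch of that step (build_gesture_count_query is a hypothetical helper name):

```python
import datetime

def build_gesture_count_query(start_time, current_time):
    """Build the gesture-counting SPARQL query, interpolating the interval
    bounds as xsd:dateTime literals."""
    fmt = "%Y-%m-%dT%H:%M:%S.%f"
    return f"""
PREFIX gezr: <http://fiigezr.org/fiiGezr.owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?name (COUNT(DISTINCT ?x) AS ?count)
WHERE {{
    ?x rdf:type/rdfs:subClassOf* gezr:Gesture .
    ?x gezr:has_gesture_time ?data .
    ?x gezr:has_gesture_name ?name .
    FILTER (?data > "{start_time.strftime(fmt)}"^^xsd:dateTime
         && ?data < "{current_time.strftime(fmt)}"^^xsd:dateTime)
}}
GROUP BY ?name
ORDER BY DESC(?count)
"""

# Example: the 10-second window ending at a fixed point in time.
current = datetime.datetime(2022, 1, 1, 12, 0, 10)
q = build_gesture_count_query(current - datetime.timedelta(seconds=10), current)
print(q)
```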
When a rule is triggered, we form triples of the form ?gesture gezr:causes_rule ?rule; these are afterwards added to the OWL graph.
The Open Trivia API provides a free JSON API for generating questions, configurable by category, difficulty, and question type. Inside the application it is used only for True/False questions.
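A minimal sketch of how such a request can be assembled (trivia_url is a hypothetical helper; the parameter names follow the Open Trivia DB documentation, and the actual HTTP call is left out):

```python
from urllib.parse import urlencode
# from urllib.request import urlopen  # the actual HTTP fetch, omitted here

API_URL = "https://opentdb.com/api.php"

def trivia_url(amount=10, difficulty=None, category=None):
    """Build a request URL for True/False questions (type=boolean)."""
    params = {"amount": amount, "type": "boolean"}
    if difficulty:
        params["difficulty"] = difficulty   # "easy", "medium" or "hard"
    if category:
        params["category"] = category       # numeric category id
    return f"{API_URL}?{urlencode(params)}"

print(trivia_url(amount=5, difficulty="easy"))
# -> https://opentdb.com/api.php?amount=5&type=boolean&difficulty=easy
```

The JSON response contains a response_code and a list of question results, which the quiz game presents to the user.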