Tuesday, May 31, 2016

Final post

Today we handed in the final code, the report and a video of the software in action. Overall, the project was very successful, as shown in the video below. Note that the clips in the video show typical interactions with the software and were not hand-selected to show particularly successful test runs. In fact, our software detects the correct gesture in the overwhelming majority of cases.


Thanks for a fun and very instructive course. /Sverrir, Pablo and Ismael.

UPDATE: Our project has now been formally assessed. We received the highest grade and the following feedback:

Excellent work comparing different machine learning-based algorithms. Comprehensive description of all steps involved in building the classification systems. Critical analysis of the results. Very clear and organized report. Very nice and informative blog, nice video. Well done! 


Tuesday, May 24, 2016

User Interface

One of the features we wanted to include in our final functional prototype was a usable and coherent user interface that displays the currently selected options and also gives the user feedback about what the Leap Motion is actually capturing or detecting.


To that end, we have sketched this first version of the user interface:


In the black placeholder we intend to display a 3D rendering of the user's hands. As soon as a gesture is detected, the corresponding option will be selected automatically. Using a text-to-speech toolkit, the selected options will also be spoken aloud so that the user can confirm them.
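
As a minimal sketch of the spoken confirmation (the post does not name a specific text-to-speech toolkit; pyttsx3 is assumed here purely as an example):

# Minimal sketch of spoken confirmation, assuming the pyttsx3 toolkit
# (the actual text-to-speech library used in the project may differ).
import pyttsx3

def confirm_selection(option):
    """Speak the selected option aloud so the user can confirm it."""
    engine = pyttsx3.init()
    engine.say("You selected %s. Is that correct?" % option)
    engine.runAndWait()

confirm_selection("pasta")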



Sunday, May 22, 2016

Definition of the gestures and their meanings [UPDATE 31/05]

At this point we can describe the gestures that form the main building block of our system. As Pablo and Sverrir showed before, there is no fixed protocol or sequence of steps that must be followed to compose the final order, because the back end of the program identifies each gesture and classifies it according to what the user actually did.

We are now using only 8 different gestures, the minimum number described in the requirements document of this project. It is also important to remember that all gestures begin in the same position, because the Leap Motion module sometimes fails to detect the real position of the hand and fingers.

The initial position is:

Gesture      Description
INITIAL      All gestures begin with this same static gesture.


From this position, we can perform the following gestures:


Gesture      Description
CIRCLE       Select pasta as a meal
ROCK         Select hamburger as a meal
PISTOL       Drink cola
SCISSORS     Drink cocktail
PINKY        Pay with cash
COME         Pay with card
ROLL         Cancel the last selection
STOP         Accept the order (confirmation)
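
As a small illustration of how this table can be represented in code, a mapping from gesture labels to order actions might look as follows (the dictionary layout is our sketch, not necessarily the exact structure used in the final program):

# Sketch of the gesture-to-action mapping from the table above.
# The labels mirror the table; the dictionary layout is illustrative.
GESTURE_ACTIONS = {
    "circle":   ("meal",    "pasta"),
    "rock":     ("meal",    "hamburger"),
    "pistol":   ("drink",   "cola"),
    "scissors": ("drink",   "cocktail"),
    "pinky":    ("payment", "cash"),
    "come":     ("payment", "card"),
    "roll":     ("control", "cancel last selection"),
    "stop":     ("control", "accept and confirm the order"),
}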


While recording the samples for the neural network, we discovered a problem with the Leap Motion module. When the user's hand enters the module's range (as shown in the 'Initial Position' picture), the reported roll value is sometimes wrong: where we expect roughly 0°, we read approximately 180°. Unless the system detects this problematic initial reading, it is impossible to continue with the complete gesture and the result will not be correct.
Furthermore, finger detection is another point of failure, because the module sometimes misidentifies which finger is extended.
These undesirable scenarios changed our recording procedure, because a preprocessing step, combining the visualizer and the raw input data, is now needed before the program trains the neural network.
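
A preprocessing check of this kind could look roughly as follows (a sketch that assumes each recorded frame stores the hand roll in degrees; the field name and the tolerance are illustrative choices):

# Sketch of the preprocessing check for the flipped initial roll value.
# Assumes each recorded frame is a dict with a 'roll' entry in degrees;
# the field name and the 90-degree tolerance are illustrative choices.
def initial_roll_is_valid(frames, expected_roll=0.0, tolerance=90.0):
    """Return True if the first frame's roll is close to the expected value."""
    if not frames:
        return False
    return abs(frames[0]["roll"] - expected_roll) < tolerance

# A recording whose first frame reports roughly 180 degrees is rejected.
print(initial_roll_is_valid([{"roll": 178.3}]))   # False
print(initial_roll_is_valid([{"roll": 2.1}]))     # True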

Thursday, May 19, 2016

Latest developments

There have been two group meetings since the weekend update, which I will summarize in this blog post:
Recording the rock'n'roll gesture with a helpful friend.
  1. As mentioned in the previous post, we planned a meeting on Tuesday in order to use the Leap Motion controller to record the hand gestures of some people unacquainted with our work. I invited two friends to come and we recorded both of them doing ten different gestures ten times each. In addition to that, we also asked a female student in the building who graciously gave us some of her time in order to record the gestures. This gave us a total of 300 recordings and I would like to thank all of them for the help.
Another friend attempts the scissors gesture.
However, in spite of our efforts, we found out that our Leap Motion controller is not a terribly accurate device. After analyzing the data, we saw that the controller often confused the user's right and left hands and was not able to accurately detect some of the gestures we had intended to use, even some relatively simple ones. We nevertheless trained a neural network on the data and saw, not surprisingly, that it did not perform as well as the neural network we had used last week. There can be various reasons for this, but it is probably a lot harder to train a neural network to recognize 9-10 gestures than 5 gestures (which is what we did last week).

After this slight disappointment, we decided to simplify our gestures a little bit and to work with only eight gestures instead of ten. We also decided that in future recording sessions we would use the Leap Motion visualizer to see whether the recording is happening successfully (a sketch of a simple automatic check along these lines is shown after this list). The final version of our software should also have a video output to show the user how their gesture is being recorded, so that the user can cancel the recording if the Leap Motion appears to be misbehaving. We are also going to see whether a different framework, library or machine learning technique works better for our data than PyBrain's artificial neural network.
  2. The third feedback session happened today, Thursday. We presented the current state of our project and got some feedback from the instructors; they suggested trying a different Leap Motion controller to see if it worked better, and perhaps trying some new machine learning algorithms. We also saw the project work of another group which is also using the Leap Motion controller; in case the reader is interested, their blog is at cvml1.wordpress.com. Our work is a little bit different though, since they are only working with static gestures while we expect the user to perform some motion. One aspect of their work particularly interested me, though: they said that they are planning to use a weighted k-nearest neighbour classifier, which might not be as naive a method as I thought at first.
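
As mentioned above, a simple automatic sanity check could flag recordings where the reported handedness flips mid-gesture. Below is a sketch; the 'hand_type' field name is an illustrative assumption, not necessarily the exact key used in our JSON recordings:

# Sketch of a sanity check that flags recordings in which the Leap Motion
# switches between left and right hand mid-gesture. Assumes each frame is
# a dict with a 'hand_type' field ("left" or "right"); the field name is
# an illustrative assumption.
def handedness_is_consistent(frames):
    """Return True if every frame reports the same hand type."""
    hand_types = set(frame["hand_type"] for frame in frames if "hand_type" in frame)
    return len(hand_types) <= 1

recording = [{"hand_type": "right"}, {"hand_type": "right"}, {"hand_type": "left"}]
print(handedness_is_consistent(recording))  # False: discard and re-record
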
To conclude this post: today, we are going to redo some of the recordings that we did on Tuesday while making sure that the Leap Motion controller is working properly. Then we will see whether our machine learning algorithm works better with the new data. We will be sure to post an update soon and report whether it finally works correctly, after which we can start working on the user interface and the other parts of the final version of the software.

Saturday, May 14, 2016

Experimenting with the Leap Motion controller (bottom right corner)
Yesterday, we presented the results we have achieved so far to half of the class at the second feedback session. It was valuable for two reasons:
  • to get some feedback from the course instructors (more on that later).
  • to listen to the other groups, particularly another group that is also doing a Bartender project with the Leap Motion controller.
We found out that we are doing quite well compared to the other students (we have a "head start", as one instructor said), although there is a lot of work that remains to be done in the next two weeks. At a meeting we had after the feedback session, we decided that the next immediate step should be to finish all the necessary data collection, so we plan to ask some friends to attend a meeting on Tuesday where we will collect video clips of them performing hand gestures on the Leap Motion controller. Then we will use that data to train the final version of our artificial neural network (see the previous blog post), which is probably the crucial part of our project.

At the feedback session, there was some discussion about what one needs to do in order to achieve a high course grade. The head teacher said that for the machine learning groups it is important to compare two different techniques, which means that we need to implement an additional learning method besides the neural network. A simple method to implement (just for the sake of doing a comparison) would be the k-nearest neighbor algorithm, although we might also do something a little more complicated. However, the neural network appears to be working so well at the moment (at least with the data we have collected so far) that it might be difficult to find a method that performs better. Possibly we can find a method that performs equally well but has some advantages over neural networks, such as simplicity or ease of implementation. In addition to that, one instructor proposed that we try to go deeper into the inner workings of our neural network through visualization of its last layer.
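
As a minimal sketch of the kind of k-nearest-neighbor baseline we have in mind (the use of scikit-learn and the toy feature vectors below are our own illustrative assumptions, not part of the current code):

# Sketch of a weighted k-nearest-neighbor baseline for comparison with the
# neural network. scikit-learn is assumed purely for illustration; the toy
# feature vectors only mimic the shape of our real datasets.
from sklearn.neighbors import KNeighborsClassifier

# Each row: six hand/finger features; each label: a gesture name.
X_train = [
    [-78.4, -11.6, -2.7, -79.6, -73.0, -65.2],   # rnr
    [-10.2,  55.1,  3.4, -12.9,  60.3,  58.7],   # scissors
]
y_train = ["rnr", "scissors"]

knn = KNeighborsClassifier(n_neighbors=1, weights="distance")
knn.fit(X_train, y_train)
print(knn.predict([[-75.0, -10.0, -3.0, -80.0, -70.0, -66.0]]))  # ['rnr']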

Thursday, May 12, 2016

Data analysis, formatting and neural network

Once we managed to successfully receive data frames from the Leap Motion, we decided to format the data so that we could use it for later processing, analysis and feature extraction.

This parsing process involved creating our own data structure during the reading process from the Leap Motion in order to output a JSON file we could store in the end.

This is an example of what our data structure looks like for a gesture over time:
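
(The original example was embedded as a picture. As a purely illustrative stand-in, a recorded gesture might be stored along these lines; the field names below are hypothetical and not our exact schema.)

# Purely illustrative stand-in for the per-gesture structure we store as JSON.
# The field names are hypothetical, not our exact schema.
example_gesture = {
    "gesture": "rnr",
    "frames": [
        {
            "timestamp": 0.0,
            "hand_type": "right",
            "pitch": -12.4,
            "roll": 3.1,
            "yaw": 7.8,
            "fingers_extended": [True, True, False, False, True],
        },
    ],
}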



With this format, we created a list of the first five gestures we would use for training when building the neural network. For each of these gestures, we captured up to 10 repetitions, so that we had enough data to see differences and to use for training.

During the data analysis, we decided to base our machine learning algorithm on the difference in specific features of the hand and its fingers. You can read further about the features selected in future blog posts.

Using this principle of feature differences, we created another parser to turn our JSON input structure into the final datasets we were going to use for the neural network. This is an example of a dataset line referring to a specific gesture used for training or testing:

[1, -78.42485301863498, -11.56365593923438, -2.682151848258158, -79.59669543723484, -72.97278368984328, -65.2333200146023]

Once the data was ready to be fed to a neural network, we decided to use PyBrain for the network and the machine learning algorithm. The first trials were successful, and we could distinguish and recognize the gestures we had recorded and trained on with great confidence.
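
As a rough sketch of how such a PyBrain setup can be wired together (the layer sizes, the epoch count and the single example sample below are illustrative assumptions, not the exact configuration of our application):

# Rough sketch of a PyBrain setup for gesture classification. Layer sizes,
# epoch count and the single training sample are illustrative assumptions.
from pybrain.datasets import ClassificationDataSet
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.structure import SoftmaxLayer

GESTURES = ["pistol", "rnr", "rock", "scissors", "yaw"]  # the five first gestures

# One sample per dataset line: six features plus the class index.
dataset = ClassificationDataSet(6, nb_classes=len(GESTURES))
dataset.addSample([-78.4, -11.6, -2.7, -79.6, -73.0, -65.2], [1])  # "rnr"
dataset._convertToOneOfMany()  # one output unit per gesture class

net = buildNetwork(dataset.indim, 10, dataset.outdim, outclass=SoftmaxLayer)
trainer = BackpropTrainer(net, dataset=dataset)
trainer.trainEpochs(50)

# Activating the trained network yields one confidence value per gesture.
confidences = net.activate([-78.4, -11.6, -2.7, -79.6, -73.0, -65.2])
print(max(zip(confidences, GESTURES)))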

After all of this, we decided to create the first usable command-line application, which performs the complete process (feeding the network from our structured files, training the network and testing it against files of gestures).



|||||||||||||||||
Choose an option:
|||||||||||||||||

1. Feed from folder
2. Train
3. Test from file


> 3
File path: test/test_rnr.txt
 -> Results:
     pistol: 7%
     rnr: 85%
     rock: 3%
     scissors: 1%
     yaw: 2%

 -> Match: rnr



Monday, May 9, 2016

First contact with the Leap Motion Module

The day after our meeting at Café Storken, Pablo and I decided to start working with the module. The aims of our meeting were:

  1. Install the module drivers and the Python libraries needed to run it on Windows and Mac OS. This wasn't too complicated on Mac, but on Windows it was a special challenge to solve.

  2. Create a program to see what is going on in the module and in which format it delivers data. The first idea was to develop a program with the help of the API. Because the OS could not recognize the module, we decided to divide the work: Pablo continued developing our program while I looked for an alternative. Finally, I found another possibility among the examples shipped with the module [link]. This example program reads from the input port (for the Leap Motion, the standard USB) and prints a line with the captured data.

  3. Save the example data into a file in order to analyse it, and then develop a method to process it and pass it to the machine learning algorithm. Here you can take a look at it [link]. A sketch of this kind of frame reader is shown below.
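
The sketch below follows the structure of the Sample.py example bundled with the Leap Motion SDK; the output file name and the exact values written are our own illustrative choices:

# Sketch of a Leap Motion frame reader that appends hand angles to a file,
# following the structure of the SDK's bundled Sample.py. The output file
# name and the values written are illustrative choices.
import Leap

class RecordingListener(Leap.Listener):
    def on_frame(self, controller):
        frame = controller.frame()
        for hand in frame.hands:
            pitch = hand.direction.pitch * Leap.RAD_TO_DEG
            roll = hand.palm_normal.roll * Leap.RAD_TO_DEG
            yaw = hand.direction.yaw * Leap.RAD_TO_DEG
            with open("capture.txt", "a") as out:
                out.write("%f %f %f\n" % (pitch, roll, yaw))

listener = RecordingListener()
controller = Leap.Controller()
controller.add_listener(listener)
raw_input("Press Enter to stop recording...")  # the Leap Python bindings target Python 2
controller.remove_listener(listener)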

Tuesday, May 3, 2016



The aim of this blog is to document our project work in the course Intelligent Interactive Systems at Uppsala University. Our project is about creating software for a bartender robot that can recognize hand gestures. We will familiarize ourselves with state-of-the-art technology and will therefore work with  (among other things):
  • (hardware) a Leap Motion controller for recognizing user input
  • (software) an artificial neural network library, such as Caffe or Google's TensorFlow, in order for our robot to distinguish between different gestures.
Today the weather in Uppsala was beautiful, so Pablo, Ismael and I met at Café Storken to create a timetable for our project. We realized that we have just about three weeks left until the final deadline, so we decided to split the work at hand into more manageable subtasks. The first order of business was to book a time slot with the Leap Motion controller so that we can get a better feeling for the format of the data that we will need to work with. We agreed that Pablo and Ismael will take a look at the controller tomorrow and that on Saturday the three of us will meet and experiment with the TensorFlow library. At this time, we believe that the most challenging part of the project will be to train the artificial neural network, so it is important to start as early as possible.

Other things left to do include the text-to-audio module (see the System Architecture diagram) and generating data in the form of video clips of people showing different hand gestures. However, we believe that those parts of the project will take a relatively short time to implement, so we will put them on hold for the time being.