Project: Convolutional Neural Networks

I got interested in neural networks because of their ability to solve complex problems that were considered unsolvable or extremely hard, and, honestly, because they are cool. This article will focus solely on CNNs, neural networks used mainly for detection, recognition and image processing.

I won't go into the implementation details of my CNN models (the code and discussion are on my GitHub).

Whenever I get interested in something, the first thing I do is try to gain a general understanding of the topic. I watched a lot of videos and read lots of articles. When I wanted to look at the low-level programming of neural networks, I read the book Make Your Own Neural Network by Tariq Rashid, which explains the math and basic principles behind them.

Time flew by, and other things in my life got more of my attention, whether it was 3D designing/modeling, programming microcontrollers, school, etc. But I finally dove a bit deeper, and I'm very glad I did.

To learn the TensorFlow workflow and to understand the topic in general, I got help from programmers/educators such as Harrison Kinsley (Sentdex) and 3Blue1Brown.

My main goal for the summer holidays was to learn what the options were and to be able to build some basic neural network models. I discovered that the field of neural networks is very diverse and is getting more diverse every day; if you have a problem to solve, there are many solutions on offer.

Right now (at the time of writing) I have more or less settled on CNNs, but I plan to play with other architectures as well.

I had some foundation in OpenCV from previous projects, so some things were easier when working with neural networks, especially with Keras, the open-source high-level API in TensorFlow.

With TensorFlow's Keras I didn't have to worry about the mathematics and low-level structure of neural networks, which sped up the model-building part. I know the mathematics behind NNs is important for training successful models, but I have a general knowledge of the functions involved and can use or change them accordingly, and that is sufficient for me to build something. Besides, even if you knew everything about neural networks, you would still have to find your own problem to solve, find or create a dataset, fit the dataset to your requirements, train the model and then apply it to your problem.
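To show how little code Keras actually requires, here is a minimal sketch; the layer sizes here are arbitrary and just illustrative, not from any of my models.

```python
# A minimal Keras model: no hand-written math, just stacked layers.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(16, activation="relu", input_shape=(4,)),  # hidden layer
    Dense(3, activation="softmax"),                  # 3 output classes
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()  # prints the layer stack and parameter counts
```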

As they say, repetition is the mother of learning, so I tried to create a few models that could be interesting:

Hand recognition

This model was created with transfer learning using the TensorFlow Object Detection API. I had to create my own dataset, spending 2-3 hours downloading images and labeling each hand in them, and a couple more hours debugging. The model isn't very robust, but it is a proof of concept.

I used the Paperspace cloud GPU service for retraining, which really sped things up, as I don't have an Nvidia GPU. If you want to try the service, here is a referral that gives you some credit.

When I first tested the model it wasn't very good, just okayish, but when I tested it on HD video it was impressive, to say the least. I "just" gave it over 550 pictures and it can recognize a hand. I thought I would pair it with another NN to recognize rock, paper and scissors and play a game, but I moved on to building other models, telling myself it wasn't good enough; maybe that was because of the bad lighting in my room.

Face gender recognition

I was searching for models to build and found a great dataset with thousands of pictures of people's faces of different ages, genders, races, etc. (dataset/code on GitHub).

The Paperspace cloud GPU service again helped a lot, and I could test different variations of the model. I finally settled on the one shown in the video, which I also tested on some other pictures.

You can notice that it is sometimes wrong when the fe/male probability is around 50/50, as I didn't give it any threshold. And yes, sometimes it is just plain wrong, but overall I would say it performs well. My intention wasn't to perfect the network as much as I could, but to build a model that recognizes faces without mismatching too often. I built it for educational purposes only.

Facial emotion detection

I was looking for something slightly more complex and not binary, so I decided to detect people's emotions. I found an amazing dataset from a previous Kaggle challenge (the link is on my GitHub, but you can also find it by searching for the FER Kaggle challenge). I extracted and cropped people's faces using OpenCV (with imported haarcascades for face and eye detection). For the emotion classification I first tried transfer learning with VGG16: I removed the top layers of the VGG16 CNN and added my own dense layers at the end. Then I ran VGG16 over my whole training and testing dataset to get the last-layer values (the features VGG16 extracted up to the point where I cut it off) and saved them in a separate file.

Then I loaded this data into my added dense layers and trained just those layers on the VGG16 features. After training and testing I decided I would be better off training my own CNN, because the results weren't that impressive; the last-layer output from VGG16 often contained a lot of empty values. Maybe I just chose a bad model or did something wrong.
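The two-step setup described above can be sketched roughly like this; the layer sizes and the 7 output classes (the FER emotion labels) are my assumptions, and I pass `weights=None` here just to keep the sketch runnable offline (in practice you would use `weights="imagenet"` for the pretrained features).

```python
# Sketch of transfer learning with a truncated VGG16 feature extractor.
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

# Step 1: VGG16 without its top layers acts as a fixed feature extractor.
# Use weights="imagenet" in practice; None avoids the weight download here.
base = VGG16(weights=None, include_top=False, input_shape=(48, 48, 3))
# features = base.predict(train_images)   # run once over the dataset...
# np.save("train_features.npy", features) # ...and save them to a file

# Step 2: train only a small dense head on the saved features.
head = Sequential([
    Flatten(input_shape=base.output_shape[1:]),
    Dense(128, activation="relu"),
    Dense(7, activation="softmax"),  # 7 FER emotion classes (my assumption)
])
head.compile(optimizer="adam", loss="categorical_crossentropy")
# head.fit(np.load("train_features.npy"), train_labels, epochs=10)
```

The point of saving the features is that VGG16 only has to run once over the dataset; the cheap dense head can then be retrained many times.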

Training my own CNN went rather quickly because my data was already prepared; I only had to change a couple of lines, add my own convolutions, etc. Because of the time it took to train one model, I again trained on the cloud GPU service (training on a CPU really is unbearable). After training dozens of models, I settled on the one I am using right now.

For those who care about the model itself:

  • Input shape is (48, 48, 1).
  • 3 convolutional layers, each with 256 filters of size (3, 3).
  • The activation function is always ReLU, except in the last layer, where it is sigmoid.
  • After each convolution and activation there is a max-pooling layer of size (2, 2).
  • At the end there are two dense layers of 128 units.
  • Batch size was 25, and I used dropout of 0.2 throughout.
  • The loss function is categorical_crossentropy.
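Put together in Keras, the bullet points above look roughly like this; the number of output classes (7, for the FER emotion labels) and the exact placement of the dropout layers are my assumptions, not taken from the original code.

```python
# Reconstruction of the emotion CNN from the spec above (a sketch).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten,
                                     Dense, Dropout)

model = Sequential([
    Conv2D(256, (3, 3), activation="relu", input_shape=(48, 48, 1)),
    MaxPooling2D((2, 2)),
    Dropout(0.2),
    Conv2D(256, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Dropout(0.2),
    Conv2D(256, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Dropout(0.2),
    Flatten(),
    Dense(128, activation="relu"),
    Dropout(0.2),
    Dense(128, activation="relu"),
    Dense(7, activation="sigmoid"),  # sigmoid on the last layer, as listed
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, batch_size=25, epochs=...)
```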

Facial emotion detection and extraction from video

To put my model into practice I first tested it on my own webcam feed. The results were admirable, at least to me. Getting a neural network to work never gets old.

I ran the video through OpenCV face detection. To improve the emotion detection success rate I also used eye detection, because my model was trained on frontal faces with eyes, nose, mouth, etc. It doesn't matter that five eyes are sometimes detected in one face; this mostly happens when the person's face is well aligned, and the extra detections mostly land around the mouth/nose area, which is desirable. I could tweak the detection a bit, but it isn't necessary.

These detectors (eye and face) are XML files, called haarcascades, that can be imported into OpenCV; they are basically trained classifiers. They come under Intel's license, so you can't use them commercially, but you can search the internet for other detection models; the Python library face_recognition has a permissive license, I think, so you might want to look at that. To find these classifiers, just search for "OpenCV haarcascades GitHub".

After recognizing the emotions, which is often chaotic, I saved all the data from the video in CSV format. For each frame I saved the frame sequence number, the number of faces in the frame, the emotions detected, the name of the file and, if there are faces, the face coordinates appended at the end.
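The per-frame log might look like this; the column names and file layout here are my own illustration of the fields listed above, not the exact format I used.

```python
# Sketch of the per-frame CSV log described above.
import csv

def write_log(rows, path="emotion_log.csv"):
    """Write one CSV row per video frame."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["frame", "num_faces", "emotions",
                         "filename", "face_coords"])
        writer.writerows(rows)

write_log([
    (0, 1, "happy", "movie.mp4", "(120, 80, 64, 64)"),  # one face found
    (1, 0, "", "movie.mp4", ""),                         # no face this frame
])
```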

Once I had the data, I could do some interesting things with it. I wrote another Python file that read all the data from the CSV and saved the cropped faces into another directory, based on the emotion and a smoothing algorithm.

Certainly, anybody who has paused a video of a person talking knows the frozen face looks a little crazy at times, and I don't think another person shown that single frame would correctly interpret the emotion. The same goes for neural networks.

To improve the success rate of extracting correct emotions, I added something I call a repeater: the program checks whether the actor's emotion stays the same for a number of consecutive frames and assumes that during that time the person is not talking or quickly changing emotions. Classification was more successful when I required the emotion to stay the same for 8 frames rather than, say, 2.
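The repeater idea can be sketched in pure Python; the function name and return format are my own, the logic is just "keep a label only if it persists for n consecutive frames".

```python
# Sketch of the "repeater": keep an emotion label only if it stays
# unchanged for at least n consecutive frames.
def repeater(labels, n=8):
    """Return (start_frame, label) pairs for runs of length >= n."""
    stable = []
    run_start = 0
    for i in range(1, len(labels) + 1):
        # A run ends at the end of the list or when the label changes.
        if i == len(labels) or labels[i] != labels[run_start]:
            if i - run_start >= n:
                stable.append((run_start, labels[run_start]))
            run_start = i
    return stable

frames = ["happy"] * 10 + ["sad"] * 3 + ["happy"] * 8
print(repeater(frames, n=8))  # → [(0, 'happy'), (13, 'happy')]
```

The short "sad" run in the example is dropped, which is exactly the behavior I wanted: brief flickers while the person talks are ignored.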

Also, for simplicity, I extracted emotions only from frames containing a single face.

The result can be seen here:

Without repeater:

With repeater:

Facial emotion detection and extraction based on persona

To go a step further I added a face recognition algorithm into the equation (the Python face_recognition library), and now I can find the desired emotion of the person or people I choose to recognize. For example, say I want all the happy faces of Robert Downey Jr. from the Iron Man trilogy. If I have the whole movie in an mp4 file, I can simply crop a photo of his face, put it in a folder of known faces, run recognition through the whole movie (the speed depends on your Nvidia GPU and on whether you downscale the movie; I would expect a couple of hours if you process every frame, which you don't have to) and then run the program to extract those faces.

Optimally, you would end up with all the pictures of Iron Man's face from the movie where he is happy. Of course, there would be a lot of misclassified images, but that is mostly taken care of by the repeater (checking how long the emotion stays unchanged, meaning the person is not talking).

In the end, this is not a finalized, production-ready model, just a proof of concept with some application on top. I didn't intend to release this code, but it is in the same directory on GitHub, so feel free to play with it.

The emotion detection with facial recognition is a lot slower than I expected; the model with recognition takes several times longer to generate all the data, but at least it more or less works.