Project: Proto

I started this project out of excitement and to build up confidence that I could do anything. I had also been watching a lot of Iron Man at the time and wanted to build something similar to J.A.R.V.I.S. My plans for the project weren’t very clear, but I learned a lot from it. I wanted Proto (his name) to be able to move, hear, speak and listen to my commands. It was a rather big project. Proto was meant to be something close to an assistant. I was still a beginner in electronics and programming: I had known Arduino and microcontroller programming for no more than 4-5 months.

I started with motors and movement. I used a pair of NEMA 17 stepper motors connected to a CNC shield for Arduino. The motors were powered by an ATX power supply from an older computer, which I would consider rather noisy. The Arduino and the PC communicated over a serial port. The mounted camera was connected to the PC by a USB cable. That was one of the problems I later had with this project: too many cables. You can’t design a head that rotates 360 degrees without making cable management one of your top priorities.

I continued by designing the case, the rotating part of the case and a rotating mount for the camera, which sat at the top. This design went through 3 iterations. This was the first project and the first design I had done in Fusion 360, and in 3D modeling in general.

After installing everything into one case, I had the camera working together with the movement, which could be controlled from a G-code sender program, either with buttons or with commands.

After this I decided that it would be great to give Proto the ability to track me in my room. I did this with the OpenCV library and C++. I used an already trained face detector and merged it with the movement control, so once I had the coordinates of my head in the image I could easily steer the movement toward it.
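
The step from head coordinates to movement can be sketched roughly as follows. This is a minimal Python illustration, not the original C++ code: the function name, the degrees-per-pixel factor, the dead zone and the axis mapping (X for pan, Y for tilt, positive Y meaning up) are all assumptions made for the example. It takes a face bounding box as a detector would report it and produces a relative G-code move (G91 G0) that a G-code sender could pass on to the CNC shield.

```python
def tracking_move(face_box, frame_size, deg_per_px=0.05, dead_zone_px=20):
    """Return a relative G-code move that turns the camera toward the face.

    face_box is (x, y, width, height) in pixels, frame_size is (width, height).
    Returns None when the face is already close enough to the frame centre.
    deg_per_px and dead_zone_px are hypothetical tuning values.
    """
    x, y, w, h = face_box
    frame_w, frame_h = frame_size

    # Offset of the face centre from the frame centre, in pixels.
    dx = (x + w / 2) - frame_w / 2
    dy = (y + h / 2) - frame_h / 2

    # Ignore small offsets so the motors do not jitter around the target.
    if abs(dx) <= dead_zone_px and abs(dy) <= dead_zone_px:
        return None

    pan = dx * deg_per_px if abs(dx) > dead_zone_px else 0.0
    # Image y grows downward, so the sign is flipped (assuming positive Y tilts up).
    tilt = -dy * deg_per_px if abs(dy) > dead_zone_px else 0.0

    # G91 selects relative positioning, G0 is a rapid move.
    return f"G91 G0 X{pan:.2f} Y{tilt:.2f}"


if __name__ == "__main__":
    print(tracking_move((420, 220, 40, 40), (640, 480)))
```

The dead zone is there because a detector's box shifts slightly from frame to frame; without it the head would keep making tiny corrections even when I was sitting still.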

After this I paused the main project for a while to learn more about computer vision and how it really works: how several frames can be compared to detect and point out movement, how to implement a simple line detection program, and how color or face detection can be used to replace a head in the image with another picture or to blur it. I had fun with this for a while, but I don’t have most of the media, because I didn’t think I would need it later.
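
As an illustration of the "more frames" idea, here is a small frame differencing sketch in plain Python. It is not the code I wrote back then; the function names and the threshold are assumptions, and frames are represented as simple lists of rows of grayscale values (0-255) instead of OpenCV images so that the example runs on its own.

```python
def motion_mask(prev_frame, next_frame, threshold=25):
    """Mark pixels whose brightness changed by more than `threshold`."""
    return [
        [abs(b - a) > threshold for a, b in zip(row_prev, row_next)]
        for row_prev, row_next in zip(prev_frame, next_frame)
    ]


def motion_bounding_box(mask):
    """Return (x, y, width, height) around all changed pixels, or None."""
    points = [
        (x, y)
        for y, row in enumerate(mask)
        for x, moved in enumerate(row)
        if moved
    ]
    if not points:
        return None
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1


if __name__ == "__main__":
    # Two tiny 4x4 "frames": a bright blob appears in the second one.
    before = [[0] * 4 for _ in range(4)]
    after = [row[:] for row in before]
    after[1][1] = after[1][2] = after[2][2] = 200
    print(motion_bounding_box(motion_mask(before, after)))
```

The bounding box is what would be drawn on the image to point out where something moved between the two frames.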

The next thing I worked on was a text-to-speech program. This wasn’t really a hard part after all, because Visual Studio had a built-in library for it. The harder part was speech to text, in other words converting speech word by word into digital form. Speech to text has long been a hard computer science problem, and I didn’t have the capacity to solve any of it myself, so I used the speech recognition built into Visual Studio. Its limitation was that it could not recognize free-form sentences, only the entries I gave the computer in the form of a dictionary, which was basically a txt file with one word or one sentence per row. The words of a sentence entry were recognized only when the whole sentence was spoken.

For example, I wanted Proto to play some of my favorite music from YouTube. First, I created a wake command so that Proto would not constantly try to recognize me, other people or noises in my room. I had the command “Hey, Proto” and a couple more. After hearing it, Proto played a quiet sound and was ready for recognition. I also had to add if statements for the different kinds of commands. For starting a browser, searching for a song on YouTube and then playing it, I had “Play”. After this command, Proto started listening to my voice word by word and building up a longer phrase (the song name). If the band name was rather unusual, let’s say ACDC, I had to add that word to my dictionary. If I noticed a mistake in the song name, I used a command (“Delete” or “Reset”) to remove the last word or every word of the song name. After I had said the whole name, I told Proto to “Search” for the song, and the song started playing.
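
The command flow above is essentially a small state machine. Below is a rough Python sketch of it, not the original Visual Studio code: the class name, the method name and the way recognized phrases are passed in as plain strings are assumptions for the example. The YouTube results URL with a `search_query` parameter is used here only to show where the collected song name ends up; opening the browser is left out.

```python
from urllib.parse import urlencode


class SongCommandSession:
    """Tracks the wake word, the "Play" command and the song name being built."""

    def __init__(self):
        self.awake = False       # set after the wake command
        self.collecting = False  # set after "Play"
        self.words = []          # song name, one recognized word at a time

    def hear(self, phrase):
        """Process one recognized phrase; return a search URL after "Search"."""
        phrase = phrase.replace(",", "").strip().lower()

        if not self.awake:
            # Everything except the wake command is ignored.
            if phrase == "hey proto":
                self.awake = True
            return None

        if not self.collecting:
            if phrase == "play":
                self.collecting = True
                self.words = []
            return None

        if phrase == "delete":
            if self.words:
                self.words.pop()      # drop only the last word
        elif phrase == "reset":
            self.words.clear()        # drop the whole song name
        elif phrase == "search":
            query = urlencode({"search_query": " ".join(self.words)})
            self.awake = self.collecting = False
            self.words = []
            return "https://www.youtube.com/results?" + query
        else:
            self.words.append(phrase)
        return None


if __name__ == "__main__":
    session = SongCommandSession()
    heard = ["Hey, Proto", "Play", "acdc", "back", "in", "blue",
             "Delete", "black", "Search"]
    for phrase in heard:
        url = session.hear(phrase)
        if url:
            print(url)
```

Keeping the wake state, the collecting state and the word list in one object replaces a long chain of if statements spread across the program, and "Delete" and "Reset" become one-line operations on the list.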

I implemented a couple more commands, e.g. for opening and closing the browser, opening and closing a specific web page, triggering different Easter eggs (e.g. you could re-enact the famous scene between Dave and HAL 9000 from 2001: A Space Odyssey; HAL 9000 was Proto’s desktop picture), starting different programs on my PC, telling jokes and shutting down my PC.

It was kind of cool and useful. But I realized it wasn’t even close to the intelligent computer I had fantasized about. It might seem to have some form of intelligence to people who hadn’t programmed every command themselves, but not to me.

This project wasn’t very realistic, but it was fun to do and I learned a lot about different topics. After I decided that the project had no future, I stopped working on it. The main problems were that I struggled to combine the different program parts (Visual Studio, Python and the G-code sender) and that the result wasn’t very practical. There were also some minor problems, such as bad cable management, a noisy power supply and a camera rotation design that wasn’t good enough.

If I were to do this kind of project again, I would preferably use only one horizontal rotational axis with a less powerful and less noisy motor, and a fisheye camera lens to capture more of the room. I would set specific goals for the assistant to fulfill, and I would use a dedicated microcomputer so the assistant could be standalone (e.g. a Raspberry Pi, with no more USB camera cable or COM cable). A minimalistic design would be great, and a simple portable power supply too. I should have had smaller ambitions for this project, because they outgrew me in the end. But after all, we learn from our mistakes and experiences.