For this week’s image and video processing project, I decided to explore face recognition tools in live video capture. Conceptually, I wanted to write code that could detect a moving face and substitute it with a random image. I came across a library called OpenCV that makes face detection easy. To mark the area where the person’s face is, I drew a rectangle approximately over the face and substituted it with an image file drawn from an array of many.
One of my biggest challenges was scaling the images to the size of the face captured in the live video to generate a smoother, more accurate final output. I scaled the images manually, but I am wondering if there are ways to adapt the code based on the size of the captured face.
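The scaling could be automated from the detected face rectangle. Here is a minimal Python sketch of the idea (the project itself uses OpenCV in a sketch; `fit_to_face` is a hypothetical helper, not a library function): given the overlay image’s size and the detected face’s bounding box, it returns dimensions that cover the face while preserving the image’s aspect ratio.

```python
def fit_to_face(img_w, img_h, face_w, face_h):
    """Scale overlay dimensions so the image covers the detected face
    rectangle while preserving the image's aspect ratio."""
    scale = max(face_w / img_w, face_h / img_h)
    return int(img_w * scale), int(img_h * scale)

# A 200x200 overlay for a 100x150 face is scaled by 0.75 -> 150x150.
```

Calling this once per detected face each frame would let the substituted image grow and shrink as the person moves closer to or farther from the camera.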
Golan Levin’s piece on computer vision was an extremely useful read. The piece is divided into various sections, offering instructions, tips, examples, and ideas for beginners dabbling in computer vision. The examples were a startling look into what computer vision was like before today’s technology existed. It was mind-blowing, for example, to see the interactive artwork Videoplace and realize that it existed at a time when computer mice were not yet a staple. The thought is incredible — looking at Videoplace is like looking at history come alive.
The paper also details specific aspects to consider while creating a computer vision project that are helpful guidelines for beginners: detecting motion, detecting presence, detection through brightness thresholding, simple object tracking, and basic interactions. The paper also launches a discussion of computer vision in the physical world — an example of which is the Suicide Box — and discusses how objects and events in the physical world can affect how we parse out the algorithm in our computer vision projects.
I also must mention the large collection of resources that this article presented us with, which I think I will continue to peruse and use in my future ventures with computer vision. Overall, an interesting and super informative read!
The Crystallic visualization transforms live video frames into a grid of interconnected areas of distinct colors.
Input frames are sampled at every nth pixel, in both the x and y dimensions (where n is a pre-set constant number, for example 7). The selected pixel’s color is compared to a list of 26 colors; the closest color among the options is identified. Then, the algorithm considers the neighbors of the sampled pixel (where neighbors are n pixels away from the sampled pixel in each dimension). If the neighbor has the same identified color, a line is drawn between the two pixels. This produces white-space boundaries between the distinct color bands identified in the frame.
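The nearest-color step can be sketched in a few lines of Python (an illustration of the idea, not the project’s actual Processing code): each sampled pixel is assigned the palette color with the smallest squared distance in RGB space.

```python
def closest_color(pixel, palette):
    """Return the palette entry with the smallest squared RGB distance."""
    return min(palette,
               key=lambda c: sum((p - q) ** 2 for p, q in zip(pixel, c)))
```

Squared distance is enough here, since only the ordering matters and skipping the square root saves a little work per pixel.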
It is possible to change the look of the visualization by selecting a different set of clockwiseExtraX and clockwiseExtraY. The different values in these two arrays represent the different neighbors to consider. By removing some values, the visualization considers fewer neighbors.
The visualization can thus be modified to have a square pattern, a slanted-squares pattern, or a drawing-like, diagonal-line pattern.
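To make the offset idea concrete, here is a Python sketch (the names mirror the post’s clockwiseExtraX/clockwiseExtraY, but the exact values used in the project may differ): each (dx, dy) pair names one neighbor, n sampling steps away, and removing pairs from the arrays yields the square or diagonal variants.

```python
# Hypothetical neighbor-offset arrays in the spirit of
# clockwiseExtraX / clockwiseExtraY; each (dx, dy) pair is one neighbor.
clockwise_extra_x = [1, 1, 0, -1]
clockwise_extra_y = [0, 1, 1, 1]

def matching_neighbors(sampled_color, x, y, n, color_at):
    """Yield neighbor coordinates (n sampling steps away) whose classified
    color matches the sampled pixel's -- the pairs a line is drawn between."""
    for dx, dy in zip(clockwise_extra_x, clockwise_extra_y):
        nx, ny = x + dx * n, y + dy * n
        if color_at(nx, ny) == sampled_color:
            yield nx, ny
```

Dropping, say, the diagonal pairs from the two arrays leaves only horizontal and vertical connections, producing the square-pattern look.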
Furthermore, changing the sampling distance changes the granularity of the visualization. This produces a more modern-art look:
However, reducing the sampling distance slows down the visualization; for near-real-time responsiveness, the values should not be reduced below 7.
An additional problem concerned the choice of colors in the color palette. Originally, the visualization used only 8 colors – all the combinations of 0 vs. 255 across the three RGB channels. This led to visualizations that featured too many flat surfaces; the banding effect was too extreme. To increase the variety of colors, the set of HTML/CSS named colors was considered instead. However, since this palette contrasted the “extreme” colors (using only 0 and 255 in RGB) with two non-extreme colors (orange and rebeccaPurple), the two non-extreme colors proved closest to too many sampled colors. The result was an over-abundance of purple in the output.
A solution was to return to the constructed palette, increasing the number of combinations to 27 by adding a third RGB level per channel. Thus, again, each palette color covers an equal slice of the sampled color space. This was still not optimal, however:
There was an overabundance of gray in the output visualization in bad light conditions (which means, basically, all the time), causing the person’s face to blend with the background. Removing the gray color from the palette proved to be an appropriate solution to the problem; thus, the final number of colors in the palette was reduced to 26.
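The final palette can be generated programmatically. A Python sketch, assuming 0/127/255 as the three levels (the project’s exact mid-level value may differ): three levels per channel give 3³ = 27 combinations, and dropping the middle gray leaves 26.

```python
from itertools import product

LEVELS = (0, 127, 255)   # three levels per channel -> 3**3 = 27 colors
GRAY = (127, 127, 127)   # removed: it swallowed faces in bad lighting

palette = [c for c in product(LEVELS, repeat=3) if c != GRAY]
```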
I liked Golan Levin’s overview of the techniques for computer vision. His exposition allowed me to look at the complex problem with new eyes, and made me realize that simple algorithms may be used for a complex effect – frame differencing, background subtraction, color tracking, and thresholding; all of which we have mentioned in class. At the same time, I liked Levin’s mention of the state-of-the-art techniques, and everything in between. That provided perspective on the field and showed me that despite its accessibility, computer vision can also answer some complicated questions. (Consider the question of gaze direction detection – not only does it require tracking of one’s pupils; the orientation of the face in 3D space is also required, as is some notion of depth in the field of view.)
I learned the most from Levin’s emphasis on the importance of physical conditions when using computer vision. His insistence that the assumptions of the different algorithms be taken into account when designing an interactive art piece made me realize how prevalent these problems are. At the same time, it illustrated how impossible-to-solve software questions (e.g. how can I know whether this dark spot in the frame is a person’s hair or a black area on the background wall that just happens to be next to the person’s head?) can be solved by preparing the scene (e.g. placing a green screen behind the person, or illuminating the person with sharp light in front of a black wall).
I have one complaint about the article – despite all of its talk about bringing a fresh, artistic set of perspectives to computer vision, four out of the six examples revolve around surveillance. Although it is an important topic – and perhaps very natural, given that computer vision systems must necessarily use a video-recording device – I would have appreciated being exposed to more variety, to get my creativity going in more directions than just surveillance.
For this week’s assignment I wanted to work with color tracking in Processing. I wanted to alternate the objects that a person is supposedly holding in his or her hands, and initially I thought of substituting an object of a certain color with a corresponding picture of another object. However, I didn’t even need to substitute the colored object; I could just display the image at a certain distance from the object instead. I wanted to create the illusion of being able to hold and move around different animals, so in my project I have 5 animals, each appearing on the screen when a certain color is present (the colors I used are pink, blue, green, red, and yellow). These are the steps of creating my project:
I found pictures of 5 different animals and resized the pictures in Photoshop to approximately 200×200 pixels to make the animals smaller.
I cut thin slips of paper in 5 different colors – one for each animal. I then printed out the RGB values of these colors as seen by the web cam so I could hard-code them for each of the animals. Once the sketch is run, it displays the image from the computer’s web cam. When one or several of the five colors are present in the range of the web cam, the respective animal shows up on the screen, following the object of the color it is assigned to. If the color is not present, the animal doesn’t show up either. If a person is holding a colorful slip of paper, he or she can then move it around the screen to make the animal follow it and therefore control its motion by hand.
One of the challenges was determining the right threshold value, i.e. the maximum allowed difference between the pixels from the camera and the coded color. In my case this difference has to be very small in order for the animal to show up; otherwise the sketch can get confused and start showing the animal where it is not supposed to be. However, that also means that if the lighting changes significantly, the RGB values of the slips of paper as they appear on the web cam might also change, and the animal might not appear.
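The threshold check boils down to a per-pixel color distance. A minimal Python sketch of the logic (the actual sketch is written in Processing):

```python
def matches_target(pixel, target, threshold):
    """True if a camera pixel is within `threshold` of the tracked color,
    measured as Euclidean distance in RGB space."""
    dist = sum((p - t) ** 2 for p, t in zip(pixel, target)) ** 0.5
    return dist < threshold
```

With a small threshold the match is precise but brittle under lighting changes, which is exactly the trade-off described above.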
Here are pictures of the animals that show up on the screen depending on the color:
Here is a video of just one animal moving around the screen:
P.S. Because at first the code for creating what I just described didn’t work properly, I started working on a slightly different idea. Even though Aaron helped me fix the code above (thanks for that!!), I decided to also include the other code. The idea behind it is that there are also 5 animals uploaded to the sketch; however, instead of following precoded colors, the color of interest can be adjusted by a mouse press. Once you click on, for example, a pink object, the animal will then follow the pink object. Also, there can only be one animal present at a time, but they can be changed by pressing the key “c”. A random function picks which of the 5 animal images is displayed.
I decided to work on live video for this project, which was mostly inspired by the Computer Vision article; I also thought that I could do more with live video than with a still image.
My initial idea was to track a certain color, say the color of the lips, and then, once those pixels were detected, change their color so that the change shows up in the live image. So instead of having pinkish lips, I would be able to change the color to orange, green, or whatever color I chose in the live video.
Doing this, I faced one problem with the pixel color detection. Because the lips are pink and somewhat similar in color to the skin and the rest of the body, it is very tricky to select just the pixels of the lip color.
For example, if I pick just one particular color of the lips and leave the threshold for finding similar colors in the image very small, let’s say 5, then it selects just a tiny bit of the lips:
And if I increase the threshold to 25, it would select a lot more than just the lips:
So this made me give up on the idea, because I realized I would not be able to reach the level of accuracy I was looking for. Almost instantly another idea came to mind, which also involved tracking color, but now I was using the tracked color for a different purpose.
So the idea is that the program tracks a light green color, which in my case is a pen cap, and draws a point on every pixel that it finds close to that green within a threshold of 20.
Then, on button press, the program saves the coordinates of those circles, which allows the person who is in the video to draw shapes or whatever (s)he wants while the live video is recording. You can change the colors as well!
This is one of the drawings I’ve made, which was fun, but I forgot I had to record a video, so I tried replicating it in the video once again, just having a little fun with it.
I would say that the biggest challenge I faced was figuring out the color arrays in such a way that, when the color is changed, it changes only for what is about to be drawn, rather than for everything that has already been drawn.
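One way to solve this (a Python sketch of the data structure, not the project’s actual Processing code) is to store the active color with each point at the moment it is drawn, so later color changes only affect new points:

```python
points = []  # each entry is (x, y, color), frozen at draw time

def save_point(x, y, color):
    """Record a tracked pixel together with the currently selected color."""
    points.append((x, y, color))

save_point(10, 20, (255, 0, 255))   # drawn while pink was selected
save_point(30, 40, (0, 255, 0))     # color switched to green afterwards
```

Redrawing every saved point each frame with its own stored color keeps the old strokes intact.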
This reading was very inspirational, as it gave me a lot of ideas to choose from when it came to deciding what I wanted to make for this project. After reading the Computer Vision in Interactive Art chapter, I knew I wanted to work with live video and tracking, be it tracking color or brightness or movement or whatever I could possibly imagine tracking.
The elementary computer vision techniques mentioned in the article – detecting motion, detecting presence, detection through brightness thresholding, simple object tracking, and basic interactions – gave me a better idea of what I could use in my project. All of these techniques were well explained, which gave me enough understanding of what I should be aiming for in my work.
The one project I was impressed by the most was the Suicide Box by the Bureau of Inverse Technology, installed in 1996. I did not think that a computer vision project, with the help of machine-vision-based surveillance, could have such a big social impact and cause ethical controversy. At the same time, this project was proof (if the data on the number of recorded suicides was real) that machines can record data with an accuracy that humans cannot.
When working on my project and color detection, it was suggested that I watch a YouTube video of Daniel Shiffman explaining the algorithms of color tracking. He also had this article on Computer Vision open through most of the video, and even referenced it a couple of times, so this definitely helped me understand what Daniel was talking about!
For the Processing meets Arduino assignment I decided to expand on my jumping ball game project and add a joystick to it. Thus, instead of being controlled by the arrow keys, the ball is now controlled by the joystick. This was fairly simple, since the ball’s movement is only controlled on the x axis; however, it took me a little while to figure out the adjustments I had to make in terms of screen width/canvas width so that the ball moves just the way I want it to.
Another thing that was an easy fix in the end, but took a little while to figure out, was how to restart the game on button press rather than restarting the whole Processing sketch. My first idea was to restart the port connection on button press, but then an error would pop up and freeze the whole computer, because re-establishing the port connection would not work. The second thing I tried was re-running setup() on button press once the game is over; however, that would crash Processing and freeze the computer as well. Then I got lucky, because Pierre walked into the room and suggested something very simple that I had not thought of for some reason: a gameRunning Boolean that gates the initialize and draw functions. Rather than restarting the whole setup, the button press just flips gameRunning, and depending on it, initialize and draw either run or not. Together with the gameOver Boolean, this means that when the game is over and not running, a button press re-runs initialize and draw to start the game again.
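The restart pattern can be sketched as a small state machine. This is a Python illustration of the logic (the real project is a Processing sketch; `score` here is a hypothetical stand-in for whatever per-round state initialize resets):

```python
class Game:
    """Sketch of restarting via a game_running flag instead of
    re-running setup() or the port connection."""

    def __init__(self):
        self.game_over = False
        self.game_running = True
        self.initialize()

    def initialize(self):
        # Reset only the per-round state here -- never the serial port.
        self.score = 0

    def on_button_press(self):
        # Restart only when the previous round has ended.
        if self.game_over and not self.game_running:
            self.initialize()
            self.game_running = True
            self.game_over = False

    def end_game(self):
        self.game_over = True
        self.game_running = False
```

Because the serial connection is never touched, the button press can restart the round without any of the freezes that re-opening the port caused.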
As improvements since the last time I presented the game, I’ve added a couple of things. First of all, the platforms now fade in rather than suddenly appearing out of nowhere. I’ve also added a “Welcome” screen (I forgot to change ‘lol’ from when I was testing whether it worked, and now I think it is just a part of the project) and a “Game Over” screen.
Before coming to this class I was not exposed to any kind of programming or coding whatsoever, so it definitely was challenging for me at first to understand the concepts of coding and how things work in general. However, as I was getting more and more into it, I started liking what I was doing, because I started seeing connections between things we do in class and things that surround me in my everyday life. I started looking at things around me with a different perspective, things as simple as light switches and as complicated as some computer games would make me think “khm, I sort of know how to make it”, which was an amazing feeling. I am not sure if it had necessarily “made me a better person”, but I think the exposure to this new world of programming and building things in this class has changed the way I look at everyday objects and made me start appreciating them more.
Computer vision is a dynamic field of computer science that is actively improving our lives and the technologies we use. Accessibility of computer vision is becoming more and more important to people working on projects that utilize it. While working on my project this week, I found so many libraries that let people do things like facial recognition with just a few lines of code.
This paper showed the various ways creative coding and computer vision techniques intersect to create endless projects. Computer vision really seems to help create an interactive and immersive experience. This reminds me of the readings and conversations we’ve had in class about making interactivity about more than just touch. Now we can use anything from blinking, smiling, or other facial expressions, to movements and the colors around us in day-to-day life, to create an experience that is interactive in non-conventional ways.
One popular application of computer vision is Snapchat filters, which have become wildly popular. For many teenagers, computer vision has become a part of daily life without them even realizing it.