This project converted a photographic portrait of Marilyn Monroe into an image with Andy Warhol's silk-screen printing effect. Two approaches were developed: one portrait-specific, the other more general. The image processing methods and filters used include conversion among RGB color, grayscale, and binary images; thresholding; and median filtering.
The “low resolution memory series” project is about how people remember and forget. Through an interactive, dynamic pixelation process, I would like to continue exploring a low resolution memory world.
There is an artist named Jim Campbell, famous for his sculptural LED light installations. I like his idea of low resolution. We live in a high resolution world and strive to make the images of things ever clearer; the real impression, however, stays shattered in our minds. In memory, we grasp the silhouette and the feeling, and let the details elude us and fade.
My work conveys this idea by pixelating images to different levels dynamically and letting users interact with the interface to control the pixelation level and area. 1. Forgetting process: make an image lower and lower in resolution over time, so that detail is lost step by step, and show this process as an animation. 2. Remembering process: let the user control an area of relatively higher resolution through interaction, a metaphor that what you focus on stays clearer and is less likely to fade away.
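A minimal numpy sketch of the forgetting process (assuming an (H, W, C) image array; the block sizes are illustrative): progressively coarser block averaging loses detail step by step.

```python
import numpy as np

def pixelate(img, block):
    """Pixelate an (H, W, C) image by averaging over block x block cells."""
    h, w = img.shape[:2]
    small = img[:h - h % block, :w - w % block]
    # average each block, then repeat the mean back out to full size
    cells = small.reshape(h // block, block, w // block, block, -1).mean(axis=(1, 3))
    return np.repeat(np.repeat(cells, block, axis=0), block, axis=1)

# forgetting: increase the block size frame by frame
img = np.random.rand(64, 64, 3)
frames = [pixelate(img, b) for b in (2, 4, 8, 16)]
```

Played back in sequence, the frames animate the image losing detail; the remembering process would instead keep the block size small inside the user-controlled area.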
In my final project, I used Processing to create a sound generator that performs the Inverse Fourier Transform on a spectrum drawn by the user.
Since I didn’t find many sound analysis libraries for Processing, I decided to use Minim. I used the x-axis to represent the frequency index and the y-axis to represent the amplitude. While I draw the curve, its y-values are saved in an array of 1024 elements; 1024 is also the width of the window generated by the sketch, since the Minim library does not allow floating-point indices. After I finish drawing the curve, pressing the “x” key performs the IFFT on the array I just generated and plays the sound. I also change the amplitude while the sound plays to create a fade-out effect.
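The core of the sketch can be approximated in numpy instead of Minim (the 1024-element array and the fade-out follow the description above; the drawn curve here is a synthetic stand-in):

```python
import numpy as np

SR = 44100
N = 1024  # number of drawn spectrum bins, as in the sketch

# stand-in for the user's drawn curve: y-values act as bin magnitudes
drawn = np.zeros(N)
drawn[10] = 1.0  # a single strong partial at bin 10

# treat the curve as a magnitude spectrum (zero phase) and invert it
frame = np.fft.irfft(drawn, n=2 * N)  # one time-domain frame of 2048 samples

# fade-out: scale the amplitude down over repeated frames
sound = np.concatenate([frame * g for g in np.linspace(1.0, 0.0, 50)])
```

Bin 10 here corresponds to 10 * SR / 2048 ≈ 215 Hz; the drawn curve shifts energy among such bins.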
In the computer vision part, I chose a color tracking strategy to track a color on the screen. Basically, I first set the color I want to track, then use two for loops to go through all the pixels on the screen, use the dist() function on the RGB values to measure how similar each pixel's color is to the target color, and save the position (x and y) of the most similar pixel. To eliminate interference from similar colors in the background, I saved the background as an image and compared the color of each background pixel with the corresponding pixel of the run-time video. If the difference between the two colors is smaller than a threshold, that pixel is regarded as part of the background, and I display black there.
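A numpy sketch of the same strategy (vectorized rather than two for loops; the background threshold value is an assumption):

```python
import numpy as np

def track_color(frame, target, background, bg_thresh=30.0):
    """Find the pixel closest in RGB distance to `target`, ignoring pixels
    that match the stored background frame.
    frame, background: (H, W, 3) arrays; target: (r, g, b) tuple."""
    bg_dist = np.linalg.norm(frame.astype(float) - background, axis=2)
    color_dist = np.linalg.norm(frame.astype(float) - np.array(target, float), axis=2)
    color_dist[bg_dist < bg_thresh] = np.inf  # treat as background, never tracked
    y, x = np.unravel_index(np.argmin(color_dist), color_dist.shape)
    return x, y
```

The np.inf assignment plays the role of "display black": masked pixels can never win the similarity search.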
Within the Ecology, Evolution, and Marine Biology department at UC Santa Barbara, Douglas McCauley’s lab researches wildebeest in the Serengeti. One aspect of this research is understanding the movement of the wildebeest aggregate. To do so, they have fixed GPS-equipped collars on a number of the wildebeest in the wild and get a fix on their location twice a day. This collection of GPS data began in 2013 and is ongoing. Soon, they will be implementing sound recording devices on the collars that will collect small audio recordings each time the GPS gets a fix on the collar.
The addition of sound data into the research opens up a wide variety of analytical possibilities. In this project, I conducted a pilot study to determine the feasibility of using human voice recognition methods to identify individual animals within a group.
In this study I recorded audio from my neighbor's three dogs (Milo, Eddie, & Britannia) in conjunction with video recording to establish a ground truth (i.e. confirm which dog is making which bark). Then, based on the methodology presented by Muda, Mumtaj Begam, and I. Elamvazuthi (see below), I created an algorithm that utilizes Mel Frequency Cepstral Coefficients (MFCC) and Dynamic Time Warping (DTW) to compare the barks of each dog and identify the source dog for each sound.
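The DTW comparison step can be sketched in plain numpy (MFCC extraction itself is not shown; any frame-by-frame feature matrix, one row per frame, works as input):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two feature sequences,
    e.g. MFCC matrices with one coefficient vector per row."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]
```

A bark is then attributed to whichever reference dog yields the smallest DTW distance.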
The resulting study was very successful in identifying the source of each bark. The challenge when implementing this study with the wildebeest (in the field) is the lack of control I will have over the recording process. Unlike my study, the recordings that will come from the collar recording devices may be very noisy, contain overlapping sounds from animals, and will be fairly short in duration.
Given this limitation, my goal for the full study will be to distinguish less specific features rather than a particular individual animal (i.e. distinguish between adults, calves, males, females, and non-wildebeest such as zebra).
This project explores the lossy image compression algorithm used in JPEG, the Discrete Cosine Transform (DCT). First, it explains how the DCT works in image compression. Then, it replaces the DCT with the FFT we learned in class, evaluates the two methods, compares their performance, and analyzes why JPEG uses the DCT rather than the FFT. Last but not least, it explores the coefficients left out by the DCT.
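A sketch of the DCT side of this comparison using scipy (the 8x8 block and the number of retained coefficients are illustrative):

```python
import numpy as np
from scipy.fft import dctn, idctn

def compress_block(block, keep=10):
    """Keep only the `keep` largest-magnitude 2-D DCT coefficients of a block,
    zeroing the rest, then reconstruct with the inverse DCT."""
    c = dctn(block, norm='ortho')
    thresh = np.sort(np.abs(c).ravel())[-keep]
    c[np.abs(c) < thresh] = 0.0  # the "left-out" coefficients
    return idctn(c, norm='ortho')

# a smooth gradient block compresses well: a few coefficients suffice
block = np.outer(np.linspace(0, 255, 8), np.ones(8))
approx = compress_block(block, keep=4)
```

Repeating the same experiment with np.fft.fft2 in place of dctn shows why JPEG prefers the DCT: the FFT's implicit periodic extension creates edge discontinuities that spread energy across many coefficients.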
The goal of this project is to explore using linear predictive coding to implement a vocoder-like synthesis algorithm. Linear predictive coding (LPC) operates under the idea that the human voice can be modeled as an excitation source (i.e. the vocal cords) run through a filter (i.e. the throat and mouth cavity). LPC attempts to predict the value of a sample based on a linear combination of the p previous samples; in this project, I am using p=12. The LPC algorithm approximates these p coefficients by finding the values that minimize the mean square error between the original and predicted samples. These coefficients define an FIR prediction filter whose inverse, an all-pole filter, models the formants of the signal. Running a constructed excitation source through this filter then recreates a rough version of the original speech signal.
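A compact numpy/scipy sketch of autocorrelation-method LPC and the all-pole resynthesis described above (the test signal and the excitation period are stand-ins for real speech frames):

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(x, p=12):
    """LPC via the autocorrelation method: solve the Toeplitz normal
    equations so that x[n] ~ sum_k a[k] * x[n-k]."""
    r = np.correlate(x, x, 'full')[len(x) - 1:len(x) + p]  # lags 0..p
    return solve_toeplitz(r[:p], r[1:p + 1])

# stand-in "speech" frame: a sinusoid plus a little noise for conditioning
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 0.05 * np.arange(400)) + 0.001 * rng.normal(size=400)
a = lpc(x)

# resynthesis sketch: impulse-train excitation through the all-pole filter 1/A(z)
excitation = np.zeros(400)
excitation[::100] = 1.0
y = lfilter([1.0], np.concatenate(([1.0], -a)), excitation)
```

The filter denominator [1, -a1, ..., -ap] is the inverse of the FIR prediction-error filter, so the resonances (formants) of the analyzed frame are imposed on the excitation.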
In my project I take two audio samples and apply one filter in each of my three projects: two audio samples for consistency, and three filters to explore how different image filters can affect audio. The process for each project is to import a WAV file, convert it into an image, apply an image filter and kernels to the image, and then convert the filtered image back into an audio file to obtain the result. Once I have obtained my audio samples, I use them in my composition.
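The round trip can be sketched in numpy/scipy (the image width and the box-blur kernel are assumptions; reading and writing the actual WAV file is omitted):

```python
import numpy as np
from scipy.ndimage import convolve

def filter_audio_as_image(samples, width=512, kernel=None):
    """Reshape a 1-D signal into a 2-D 'image', convolve it with an image
    kernel, and flatten it back into audio samples."""
    if kernel is None:
        kernel = np.ones((3, 3)) / 9.0  # simple box-blur kernel
    n = (len(samples) // width) * width  # drop the ragged tail
    img = samples[:n].reshape(-1, width)
    return convolve(img, kernel, mode='reflect').ravel()

out = filter_audio_as_image(np.random.randn(44100))
```

Because the 2-D kernel mixes samples that are `width` apart, image-space blurring produces audible comb-like artifacts rather than a plain low-pass effect.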
In this composition I utilized the audio files that I made from image processing plus the two sound sources. One of my goals with this piece was to experiment with depth. In the very beginning I used the Fox theme song with no effects. Throughout the piece I use SPEAR to manipulate the sounds, along with several effects such as reverb, flanging, echoes, delays, and limiters. My favorite section is at 54 seconds, with all of the frequencies dancing together and the heavy delay. After the tranquil delay fade-out, it is suddenly interrupted by another section in which I used a sequencer to chop up certain parts, and I used SPEAR to try to make an instrument with the 2a sample (by 2a I mean second project, first filter effect). This piece was definitely out of my comfort zone, but I felt like I learned some new techniques, especially with the VST LA Convolver. There were several times where I convolved two signals together (especially at the 54 second mark) to form a more complex sound.
Image retargeting is the problem of displaying images without distortion of content when varied sizes are needed. A variety of displays are currently used to view images, and hence their aspect ratios and sizes need to change adaptively. In some cases resizing should not only obey geometric constraints but consider image content as well. The content-aware image resizing algorithm solves this problem. The idea behind the algorithm is to locate optimal seams in the image. A seam is an 8-connected path of pixels traveling from top to bottom (vertical seam) or left to right (horizontal seam), with only one pixel per row or column. Content-aware seam carving is sensitive to image content and hence gracefully carves out or inserts pixels to provide effective resizing.
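The seam-finding step can be sketched as a dynamic program over an energy map (numpy; computing the energy itself, e.g. as gradient magnitude, is assumed to happen elsewhere):

```python
import numpy as np

def min_vertical_seam(energy):
    """Find the minimal-energy 8-connected vertical seam by dynamic programming.
    Returns one column index per row, top to bottom."""
    h, w = energy.shape
    M = energy.astype(float).copy()
    for i in range(1, h):
        left = np.roll(M[i - 1], 1);  left[0] = np.inf
        right = np.roll(M[i - 1], -1); right[-1] = np.inf
        M[i] += np.minimum(np.minimum(left, M[i - 1]), right)
    # backtrack from the cheapest entry in the bottom row
    seam = [int(np.argmin(M[-1]))]
    for i in range(h - 2, -1, -1):
        j = seam[-1]
        lo, hi = max(j - 1, 0), min(j + 1, w - 1)
        seam.append(lo + int(np.argmin(M[i, lo:hi + 1])))
    return seam[::-1]
```

Removing (or duplicating) the returned pixel per row shrinks (or grows) the image by one column while avoiding high-energy content.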
This includes a Processing sketch for the visual part and a Max patch for the sound mapping. In Processing, the left screen shows the original image to be read; the right screen displays the scanning spot from the left. There are three modes: single pixel, vertical, and horizontal.
- Single Pixel Mode: you can choose a single pixel of the image and get its RGB values.
- Vertical and Horizontal Modes: detect 480 pixels at once, so you have 480 RGB values per column or row.
- Interaction: the knobs above the screens change the percentage of the RGB values and the reading speed.
- Keys: Spacebar moves or stops the cursors; R changes direction; "." reads a single pixel; Up or Down selects Vertical Mode; Left or Right selects Horizontal Mode.
- Max Patch: For the sound making, Processing sends RGB values to Max/MSP, and I convert them into HSL (note, volume, pitch scale), though I removed the saturation and lightness connections for the demonstration. I mapped the hue value (0 to 360) from F to G#, then converted the MIDI notes into sound frequencies. For the vertical and horizontal modes, I divided the image width and height by 6, calculated mean values, and combined the six mean values to synthesize sound.
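The hue-to-frequency mapping can be sketched as follows (the description is done in Max, and the octave is not specified, so MIDI notes F4=65 and G#4=68 are assumptions):

```python
def hue_to_freq(hue, lo_note=65, hi_note=68):
    """Map hue (0-360) linearly onto MIDI notes F (65) to G# (68),
    then convert the MIDI note to a frequency in Hz."""
    note = lo_note + (hue / 360.0) * (hi_note - lo_note)
    return 440.0 * 2.0 ** ((note - 69) / 12.0)  # standard MIDI-to-Hz formula
```

Hue 0 then lands on F4 (about 349.2 Hz) and hue 360 on G#4 (about 415.3 Hz), with intermediate hues gliding continuously between them.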
IMAGE FILTER DESIGN FOR GLASS STAIN EFFECT The project objective is to develop an image filter that produces a stained-glass effect (somewhat similar to a bokeh effect applied to images). The steps to achieve this objective:
- Segment the image into sections of size blocksize and call each section a region.
- Generate a random (x, y) coordinate in every region; these are the Voronoi points.
- Using one of the distance metrics (Euclidean, Chebyshev, or Manhattan), assign every pixel to a region. These regions are called Voronoi regions.
- Assign all the pixels in one region the same color. The color can be derived from the mean, median, or mode of the pixel color values in the region.
With (p_x, p_y) the coordinates of a pixel and (v_x, v_y) the coordinates of a region's Voronoi point:
- Euclidean distance: sqrt((p_x-v_x)^2 + (p_y-v_y)^2)
- Chebyshev distance: max(|p_x-v_x|, |p_y-v_y|)
- Manhattan distance: |p_x-v_x| + |p_y-v_y| (resembles the road grid of Manhattan)
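The steps above can be sketched in numpy (the block size, random seed, and mean-color rule are illustrative choices):

```python
import numpy as np

def glass_stain(img, block=16, metric='euclidean', seed=0):
    """Stained-glass filter: one random Voronoi point per block; every pixel
    takes the mean color of its nearest point's region."""
    rng = np.random.default_rng(seed)
    h, w = img.shape[:2]
    ys = np.array([rng.integers(i, min(i + block, h))
                   for i in range(0, h, block) for j in range(0, w, block)])
    xs = np.array([rng.integers(j, min(j + block, w))
                   for i in range(0, h, block) for j in range(0, w, block)])
    py, px = np.mgrid[0:h, 0:w]
    if metric == 'euclidean':
        d = (py[..., None] - ys) ** 2 + (px[..., None] - xs) ** 2
    elif metric == 'chebyshev':
        d = np.maximum(abs(py[..., None] - ys), abs(px[..., None] - xs))
    else:  # manhattan
        d = abs(py[..., None] - ys) + abs(px[..., None] - xs)
    label = d.argmin(axis=2)  # nearest Voronoi point per pixel
    out = np.empty_like(img, dtype=float)
    for k in range(len(ys)):
        mask = label == k
        if mask.any():
            out[mask] = img[mask].mean(axis=0)  # flat mean color per region
    return out
```

Swapping the metric changes the cell shapes: Chebyshev gives squarish facets, Manhattan diamond-like ones.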
Photographic mosaic, also known as photomosaic (a portmanteau of photo and mosaic), is a picture that is divided into small sections. When viewed as a whole, it appears to be one image, when in fact the image is made up of hundreds or even thousands of smaller images. [Cartwright, Angela (2007)].
Though many photo mosaic programs exist, such as MOSAnICK, Andrea Mosaic, Mosaizer Pro, Mosaically, EasyMoza, etc., the goal of this project is to develop the simplest effective photographic mosaic algorithm in Python, one that can be embedded into web applications in the future.
- Component image resizing and preprocessing
- Feature extraction
- Feature matching: the absolute differences between the three (H, S, V) values of the blocks of the mosaic image and those of its component images. The match is achieved by computing an overall score equal to the weighted sum of the different feature types.
- Tuning the weight coefficients to maximize the effectiveness of the matching.
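The feature-match scoring can be sketched as follows (the weight values and the use of per-block mean HSV as the feature are illustrative assumptions):

```python
import numpy as np

def best_tile(block_hsv, tile_hsvs, weights=(1.0, 0.5, 1.0)):
    """Pick the component image whose mean (H, S, V) best matches a mosaic
    block: score = weighted sum of absolute channel differences."""
    w = np.array(weights)
    scores = np.abs(np.array(tile_hsvs) - np.array(block_hsv)) @ w
    return int(np.argmin(scores))  # index of the best-matching tile

# e.g. block mean HSV (0.1, 0.8, 0.6) against three candidate tiles
idx = best_tile((0.1, 0.8, 0.6),
                [(0.9, 0.2, 0.1), (0.12, 0.75, 0.58), (0.5, 0.5, 0.5)])
```

Tuning `weights` trades off hue fidelity against brightness fidelity, which is the coefficient-tuning step listed above.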
This final project focuses on establishing a connection between visual perception and audition. Two methods are involved. One generates piano pieces from images: the choice of chords, the arpeggiation of chords, and the tempo of the music are based on the brightness of the images, and the main melodies are based on individual pixels. The other generates piano pieces from a video stream captured in real time, detecting movement in the video and playing piano keys accordingly.
The Gaussian blur filter is a very popular filter, applied in many applications, with a strong blur effect. However, it blurs everything. The bilateral filter is based on the Gaussian blur filter, but it considers not only the Euclidean distance between two pixels but also their photometric similarity, so it smooths images while preserving their edges.
My final project has three major components. First, I implemented a bilateral filter myself in Python, covering the filter's most basic functions. Then I ran several experiments inside the filter, such as a "glow effect" and "edge selection", to find interesting results. For the last component, I used the bilateral filter function in OpenCV to run the filter in real time for computer vision.
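A brute-force grayscale version of the bilateral filter can be sketched in numpy (the parameter values are illustrative; OpenCV's cv2.bilateralFilter is the faster option used for the real-time component):

```python
import numpy as np

def bilateral(img, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Brute-force bilateral filter on a grayscale image in [0, 1]: each
    weight combines spatial distance and photometric similarity."""
    h, w = img.shape
    pad = np.pad(img, radius, mode='reflect')
    out = np.zeros_like(img)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys ** 2 + xs ** 2) / (2 * sigma_s ** 2))  # fixed kernel
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # photometric term: similar intensities get high weight
            rangew = np.exp(-((patch - img[i, j]) ** 2) / (2 * sigma_r ** 2))
            wgt = spatial * rangew
            out[i, j] = (wgt * patch).sum() / wgt.sum()
    return out
```

With a small sigma_r, pixels across a strong edge get near-zero weight, which is exactly the edge-preserving behavior described above; raising sigma_r degrades it toward a plain Gaussian blur.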
The goal of this project is to free small or mobile digital cameras from the limits of their size, letting them take attractive pictures with shallow depth of field (DOF) just as professional DSLRs do. To do that, low-pass filtering is used to blur different areas in the image, thereby simulating the out-of-focus effect. The extent of blurring at each pixel is proportional to its distance from the focus location. Therefore, taking a flat image with deep DOF and its depth map as input, a visually appealing image with shallow DOF can be generated through the whole process.
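A sketch of the depth-dependent blur using a stack of Gaussian-filtered images (the level count and maximum sigma are assumptions, and the image is grayscale for brevity):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fake_dof(img, depth, focus_depth, max_sigma=8.0, levels=8):
    """Simulate shallow DOF: blur strength grows with |depth - focus_depth|.
    Approximated by choosing, per pixel, from a stack of pre-blurred images."""
    sigmas = np.linspace(0, max_sigma, levels)
    stack = [gaussian_filter(img, s) if s > 0 else img for s in sigmas]
    # per-pixel blur level proportional to distance from the focal plane
    amount = np.abs(depth - focus_depth)
    idx = np.clip((amount / (amount.max() + 1e-9) * (levels - 1)).round().astype(int),
                  0, levels - 1)
    out = np.zeros_like(img, dtype=float)
    for k in range(levels):
        out[idx == k] = stack[k][idx == k]
    return out
```

Blending between adjacent stack levels instead of hard selection would remove the visible quantization bands between blur zones.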
Download the notebook and materials:
- Change the name of the notebook to include your name and add your name as a comment on the top of the notebook body.
- Complete the missing parts of the notebook.
- At the end of class submit the jupyter notebook file only.
due: Mon February 22nd
Submit in writing your final project proposal. Provide details on the goals, the deliverables and the technologies to achieve them.
Steven W. Smith, The Scientist and Engineer's Guide to Digital Signal Processing. Chapter 6: Convolution. Available from www.dspguide.com.
- Mrinalini Anand
- Jacob Burrows
- Qiaodong Cui
- Atakan Gunal
- Hilda He
- Mark Hirsch
- Woohun Joo
- Jingxiang Liu
- Lulu Liu
- Weihao Qiu
- Ehsan Sayyad
- Yitian Shao
- Rachel Sowa
- Ambika Yadav
- Jing Yan
- Junxiang Yao
- Zhenyu Yang
due Tuesday February 9th
Use cross-correlation (or auto-correlation) to find features or similarities/differences in images. Treat this as an exploratory exercise where you bring in different things to compare, and then analyze/discuss the results. You should also explore analyzing the cross-correlation output matrix to extract the highest/lowest values, in a way that makes sense with your particular material.
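One way to approach this exercise, sketched with scipy (the image and template here are synthetic stand-ins for your own material):

```python
import numpy as np
from scipy.signal import correlate2d

def find_template(image, template):
    """Locate a template in an image at the cross-correlation peak.
    Zero-meaning the template avoids bias toward uniformly bright areas."""
    t = template - template.mean()
    corr = correlate2d(image, t, mode='valid')
    return np.unravel_index(np.argmax(corr), corr.shape)  # top-left of best match

# synthetic example: plant a small pattern in a blank image and find it
tpl = np.zeros((4, 4)); tpl[1:3, 1:3] = 1.0
img = np.zeros((20, 20)); img[5:9, 7:11] = tpl
pos = find_template(img, tpl)  # (5, 7)
```

The full `corr` matrix is also worth inspecting: its lowest values mark regions most unlike the template, as the assignment suggests.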
due Thursday February 11th
Download the three HW4 audio files below. For each clip, compute the DFT of the entire file and then identify the index of the most prominent peak in its magnitude spectrum. The audio files are encoded with a sampling rate of 44100 Hz -- what frequencies (in Hz) correspond to the bins with the most prominent peaks?
Hint: use the fft.rfft and argmax functions. Compute one real FFT with frame size equal to the number of samples for each file.
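A sketch of the hinted approach on a synthetic stand-in signal (loading the actual HW4 files is omitted):

```python
import numpy as np

sr = 44100
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)  # stand-in for one HW4 clip

spectrum = np.abs(np.fft.rfft(x))   # one real FFT over the whole file
peak_bin = int(np.argmax(spectrum)) # index of the most prominent peak
freq_hz = peak_bin * sr / len(x)    # bin index -> frequency in Hz
```

The key conversion is the last line: bin k of an N-point FFT of audio sampled at sr corresponds to k * sr / N Hz.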
due Thursday February 18th
Using the same audio files from HW4, compute each clip's autocorrelation function and use it to identify the most salient frequency component (in Hz) in the signal. How do these results compare to your findings in HW4?
Hint: use the acorr() function and use the lags associated with the most prominent peaks in the autocorrelation output to calculate the corresponding frequency (recall that these files are sampled at 44100 Hz).
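A numpy sketch of the idea (using np.correlate rather than matplotlib's acorr(); the test tone is a stand-in, and the search skips lags before the first zero crossing so that the large values near lag 0 don't win):

```python
import numpy as np

sr = 44100
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 2205 * t)  # stand-in clip, period of 20 samples

ac = np.correlate(x, x, mode='full')[len(x) - 1:]  # autocorrelation, lags >= 0
neg = np.where(ac < 0)[0]                 # skip past the lag-0 main lobe
lag = int(np.argmax(ac[neg[0]:]) + neg[0])  # first prominent peak = period
freq_hz = sr / lag
```

The lag of the first prominent peak is the fundamental period in samples, so sr / lag recovers the frequency, which you can then compare against the DFT-peak answer from HW4.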
Note -- If you're not familiar with command line interfaces, these instructions might not be complete enough. If you don't feel comfortable at the command line, please take a look at the following primers:
Furthermore, these instructions are biased toward OS X. If you are a Windows user, please email me ASAP and I will point you to resources tailored to that system. Linux users will follow similar steps but using the built-in package manager instead of homebrew.
- Open the terminal application
- Download and install Homebrew (optional if brew is already installed on your machine) by running the command ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)". Note -- if you're in a Linux environment, you can skip to step 8 and just use apt-get or the package manager of your choice.
- Run the command brew doctor. Address any errors as necessary (see me for help).
- Run the command brew install python.
- Run the command which python. We expect this to output /usr/local/bin/python, the brew-installed version.
- Run the command echo $PATH. The $PATH variable determines the order in which the command line searches for executables -- we need it to look in the folder brew uses before the default system path, so that we use the more up to date version of python we just installed. Make sure /usr/local/bin appears before /usr/bin. If this is not the case, add the line export PATH="/usr/local/bin:/usr/local/sbin:$PATH" to your ~/.bashrc (or create/modify a bash profile file to modify the path variable).
- Now we're ready to install ipython and the ipython notebook. Run the command pip install ipython, then pip install jupyter, and finally pip install pylab. Though we may need additional libraries later in the course, these should suffice for now.
- Test to make sure all this stuff worked by running the command ipython. This should open the interactive ipython command line interface.
- Let's take a quick break.
- Now it's time to install git and download the course's source code, which lives inside ipython notebook files. Run the command brew install git.
- Git is a widely used version control system and a super important tool for software development, but for now we're just going to use it as a convenient way to download some files. Andrés has a public repository containing the course materials hosted on GitHub. Navigate to a folder you wish to use to store these files and run the command git clone https://github.com/mantaraya36/201A-ipython.git. This will download all the files into a (special) folder called 201A-ipython. I say the folder is special because it is a repository and thus contains a hidden folder called .git. Don't delete that folder! If Andrés updates any course materials in this repo, you'll be able to easily download those updates by running the command git pull while you're inside the repository folder.
- Now that we have local copies of those files, let's test our python stack by opening the ipython notebook for this workshop. From the 201A-ipython folder, run the command ipython notebook Python\ Basics.ipynb. This will open a browser window to view the notebook, so we can play around with some code in the nice browser interface instead of the command line.
- Time for another break
- Time permitting, we'll run through Andrés' Python Basics notebook and address any basic questions you might have about python and the other tools we've covered so far.
due Tuesday February 2nd
Use image data as the spectrum on which you perform an Inverse Fourier Transform. i.e. Load an image and use the pixel data as FFT bins in an STFT, then using the IFT, produce audio from it. The main complication is segmenting the image pixels into the right size for the IFT. You can interpret the pixels as the magnitude spectrum, or as the real and/or complex part of the complex spectrum.
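One possible interpretation, sketched in numpy (treating each image column as a magnitude spectrum with zero phase; image loading is omitted and the dimensions are illustrative):

```python
import numpy as np

h, w = 257, 40                 # 257 bins per column -> 512-sample frames
img = np.random.rand(h, w)     # stand-in for grayscale pixel data in [0, 1]

# invert each column as one STFT frame, then concatenate (no overlap-add)
frames = [np.fft.irfft(img[:, k]) for k in range(w)]
audio = np.concatenate(frames)
audio /= np.max(np.abs(audio)) + 1e-9  # normalize before writing a sound file
```

The segmentation issue mentioned above shows up in the choice of h: an rfft frame of n samples needs n/2 + 1 bins, so the image rows must be cropped, padded, or resampled to fit. Using pixel pairs as real and imaginary parts instead would replace the irfft call with np.fft.ifft on complex values.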
For Tuesday January 12th
Steven W. Smith, The Scientist and Engineer's Guide to Digital Signal Processing. Chapter 3. http://www.dspguide.com/CH3.PDF
due: Tuesday Jan 26th
Produce a sound file from image data or vice versa. Try to condition and select the data to make the results as interesting as possible. Hand in your results as an ipython notebook in nbviewer.