Graduate Thesis: Weekly Reflections
Feb 5 — Preliminary Prototype
As an extension of my research into the relationship between music and visual art from last semester, this work will continue to explore how music can impact the visual from an emotional and/or reactive standpoint.
To create a prototype of this concept, I had to consider the ways that the relationship between the musical and the visual can be represented. The most intuitive method came to me in the digital form, and, more specifically, the form of a YouTube video. What better platform to draw inspiration from for manipulations of digital footage and video than YouTube? To visualize this, using an iPad, Apple Pencil, and Procreate combination, I sketched out a wireframe of what this representation could look like:
In this sketch, options for musical and video selections are displayed across the page for discovery and exploration by the user. Similar to YouTube, a frame responsive to selections by the user is placed on the right side of the page. Here, different combinations of music and video will be displayed. Great! However, I wanted to add a component of emotional responses to the combinations, as emotional impact is the core of the research goal.
To do so, I sketched out a section to display various emotional reactions to be experienced and selected by the user.
Cool. Now, as I thought through the concept of the prototype, I realized that to deliver the impact I want from the prototype, I would want the user to gain knowledge of and exposure to what the various combinations of music and visual art can evoke — so they wouldn’t be doing the reacting, the prototype would. So instead, I removed the selectable emojis and added a singular emoji that would respond to the various combinations of selections with a variety of emotions:
Great! Happy with that! Now the fun part! What selections of the visual and the musical would be involved in this research? This is where things get tricky. In the previous semester, I’d studied what this looks like in the context of television, but for this prototype, I’m more inclined to move towards the neutral — visuals with no previously intended or attached emotion.
Content Details
10 Audio Clips
5 Video Clips
8 Emotions (as detailed in sketches above)
Using Photoshop, I put together a board of the various visuals I’d like represented in the prototype.
The next steps of building out this prototype will involve finalizing the options for visuals as well as audio and mapping out appropriate responses!
Feb 12: Reflecting on scope!
Write about how your work fits in with other work in a similar vein. How is your contribution unique? How does it build off of others’ work?
Exploration of the relationship between music and art isn’t a particularly unique endeavor. The combination of both art forms is especially present in film, television, theater, dance, concerts, and countless other immersive experiences, fused together in efforts to evoke emotional resonance from an audience. Such an endeavor, no doubt, ultimately benefits from the symbiotic nature of the relationship between music and visual art. But how exactly does it do so?
There are a number of researchers who have posed the same question! My research has involved parsing through numerous scholarly abstracts, papers, academic journals, publications, etc. that have detailed scientific and empirical data surrounding the topic of music’s influence on visual art and vice versa. The purpose of my research is not to mimic these research findings, but to put these findings to practical use.
The intended effect of my research is to provide practical tooling for deciphering the true impact of this relationship. To do so, the prototype will provide capabilities for the user to record emotional responses to multiple responsive combinations of visual art and music, as well as gain insight into the responses recorded by others (my initial thoughts were to focus only on recording responses, but I received some helpful feedback from my group this week that inspired me to add this functionality).
Feb 19: Demo Day Prep
Final Prototype Checklist:
- an interface
- musical sources
- visual sources
- data collection algorithm
- emoji designs
Demo Day Checklist:
- an interface
- data collection algorithm
- emoji designs
For demo day, I would like to have a working interface — this will be a bit of a challenge because ideally, this interface would have its own domain and would work as a responsive, low-latency single page application. However, building out an entire site including the UI and the backend is borderline impossible to do in the timespan we have. So I will have to find an interface that is responsive to user flow and can handle the latency of audio/visual files. Figma is currently a top choice; however, I will have to do further research to make sure this is the most appropriate tool for me.
Continuing in that vein, the interface will require explicit commands for how to respond emotionally to the user choices and return previous user data. This means creating an algorithm that will allow user responses to be recorded, as well as calculating the aggregate of their responses to feed back to the following user. To do this, I’ll have to also figure out a way to store the user data that can speak directly to the interface selected (currently unsure if Figma has this functionality…).
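To make the idea concrete for myself, here is a minimal sketch of the record-and-aggregate logic I have in mind (the data shapes and function names are placeholders, not a final implementation):

```javascript
// Minimal sketch: store each user's emotional response to an audio/video
// pairing, then tally the responses for that pairing so they can be fed
// back to the next user. Names and data shapes are placeholders.
const responses = []; // e.g. { audioId: 'a3', videoId: 'v1', emotion: 'calm' }

function recordResponse(audioId, videoId, emotion) {
  responses.push({ audioId, videoId, emotion });
}

// Count how often each emotion was chosen for a given combination.
function aggregateFor(audioId, videoId) {
  const counts = {};
  for (const r of responses) {
    if (r.audioId === audioId && r.videoId === videoId) {
      counts[r.emotion] = (counts[r.emotion] || 0) + 1;
    }
  }
  return counts; // e.g. { calm: 4, tense: 1 }
}

recordResponse('a3', 'v1', 'calm');
console.log(aggregateFor('a3', 'v1')); // { calm: 1 }
```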
Finally, I’d like to have all specified emotions and their visual representations (currently planning on designing these in Photoshop/Procreate). For demo day, I hope to present a demo that provides the core function of the application, allowing space in the following weeks for deeper introspection surrounding the actual content that I’d like to convey.
Feb 26: More Demo Day Prep
I found this portion of the process to be pretty daunting. A wireframe always seems ambitious in retrospect, especially when it comes to coding and programming, which often involve solving for more edge cases than initially expected.
As I mentioned in my previous article, the biggest question I wanted to answer when building up my interface was which platform would work best for the intended function. My initial plan was to investigate the functionality of Figma, but I found Figma to be extremely limited in the following ways:
- Unable to store user input
- Limited display plugins (this was particularly an issue as my wireframe involves a carousel that could not have been easily created)
- Unable to track current state of interface at particular points in time.
Considering these limitations, and also considering my familiarity with programming, I decided I would rather spend my time using an open interface to build out my own than attempting to squeeze Figma’s abilities into my project.
Steps I followed to build out my own interface:
- Download React and NPM packages
- Install video and audio carousel packages
- Document links to selected audio and video files (static)
- Arrange components on a local server using JavaScript
For the demo, I was able to put together this interface with the static files, support dynamic user selection across them, and display the basic idea of the experience of cross-modality.
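For reference, the skeleton of the demo interface looked roughly like the sketch below (the file names and the reaction mapping are placeholders, not the exact demo code):

```javascript
// Rough skeleton of the demo: static audio and video sources, dynamic user
// selection, and a single emoji that reacts to the chosen combination.
// File names and the reaction mapping are placeholders.
import React, { useState } from 'react';

const AUDIO_CLIPS = ['audio1.mp3', 'audio2.mp3'];
const VIDEO_CLIPS = ['video1.mp4', 'video2.mp4'];

// Placeholder mapping from an audio/video pairing to an emoji reaction.
const REACTIONS = {
  'audio1.mp3|video1.mp4': '😊',
  'audio2.mp3|video2.mp4': '😢',
};

export default function CrossModalDemo() {
  const [audio, setAudio] = useState(AUDIO_CLIPS[0]);
  const [video, setVideo] = useState(VIDEO_CLIPS[0]);
  const reaction = REACTIONS[`${audio}|${video}`] || '🤔';

  return (
    <div>
      {/* Simple selectors standing in for the carousels */}
      <select value={audio} onChange={(e) => setAudio(e.target.value)}>
        {AUDIO_CLIPS.map((a) => <option key={a}>{a}</option>)}
      </select>
      <select value={video} onChange={(e) => setVideo(e.target.value)}>
        {VIDEO_CLIPS.map((v) => <option key={v}>{v}</option>)}
      </select>
      {/* Responsive frame, like the right-hand panel in the wireframe */}
      <video src={video} controls />
      <audio src={audio} controls />
      <p>Reaction: {reaction}</p>
    </div>
  );
}
```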
March 4: Actual Demo Day!
For demo day, I was able to present a working interface, displaying my original concept as detailed by my wireframe.
As discussed in my last article, a major challenge that I had to overcome was attempting to build out this interface from scratch. Although I’d initially imagined a use case for Figma to host the interface, I found the features available in Figma limiting and not necessarily the best option for what I intended to build.
To sidestep this, I downloaded already available JavaScript packages and used them to build out the carousels intended to display the media sources, i.e. the audio and visual files. I was also able to utilize React to design the placements of the carousels as well as the controls to catalog emotions and reactions to the cross-modal experience.
Overall, I would say the biggest limitation in building this interface was ultimately time, as building an interface from scratch demands plenty of it. However, I am happy with the final results, as I feel I was able to communicate the basic function of the interface, which was my initial intention. For demo day, I expect to receive questions about the intended function and audience for the project, and I am looking forward to an open discussion about the best use case for the project. I have ultimately gone back and forth between how I want the experience of the project to be utilized, as a learning tool or as an engagement tool, and I’m hoping to solidify this based on feedback from my audience.
March 18: Demo Day Feedback
The feedback I received revolved around intention and audience. For the project, there are multiple ways the tool can be used: either to simply collect data surrounding emotional resonance, or to utilize that data to inform the audience of the effects of a specific cross-modal experience. In the latter scenario, the audience would fall more within the realm of creators in media spaces such as film and television, i.e. producers, editors, composers, directors. To capitalize on this, there were suggestions to include an element of data processing using concepts such as machine learning to predict the emotional outcome of specific inputs. There were also suggestions to look into further ways of integrating with other tools already equipped with the ability to display video and audio sources (YouTube, editing tools like Premiere Pro, etc.).
March 25: Reflections…
The feedback I received on demo day aligned with my thought process when considering the state of my project. My main considerations involved the audience/purpose of my project, which was addressed during the demo. From the feedback, I’ve decided to cement the initial direction the project was going in: to serve as a guideline for cross-modal viewing experiences.
To do this, instead of simply allowing users to record their emotion, I would like to add a level of learning and adaptation to the tool that will allow emotions to be predicted based on user selection. This way, users can gain insight into the impact of certain visual and audio pairings that can then be transferred to other projects. Initial thoughts here on how this can be achieved… machine learning APIs, or algorithms based on input frequency that can be used to process responses in a backend. This is going to be difficult as I’ve never worked with machine learning before! But I’m excited to learn. Next week, I will look into the sort of algorithms that will help me achieve this.
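Before touching any machine learning APIs, the frequency-based version could be as simple as picking the most commonly recorded emotion for a given combination. A rough sketch, assuming a tally object like the aggregates described earlier:

```javascript
// Rough sketch of the frequency-based idea: predict the emotion for a
// combination by taking the most frequently recorded response so far.
// The shape of the tally object is assumed, e.g. { calm: 4, tense: 1 }.
function predictEmotion(tally) {
  let best = null;
  let bestCount = -1;
  for (const [emotion, count] of Object.entries(tally)) {
    if (count > bestCount) {
      best = emotion;
      bestCount = count;
    }
  }
  return best; // null if nothing has been recorded yet
}

console.log(predictEmotion({ calm: 4, tense: 1 })); // 'calm'
```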
April 1 — Structuring My Paper
Over the next week I will be focusing on outlining my paper and wrapping up my final prototype for the project. I have started thinking about what the paper will look like from a literary analysis standpoint. Last semester, I got to dig through some literary sources that gave me some insight into how to structure the paper, but at that point I was primarily focused on television, whereas my project has since pivoted to focusing on film as a lens. I am still struggling with how I would like to pitch my target audience and tie it to the literary analysis — this will likely be my main challenge with the paper.
April 8 — Prototype II
Fun news! I’ve decided to pivot slightly with the emotion source control of my project. As I began to dig through literary sources, I found so many studies based around collecting user emotional responses to visual scenes paired with music that collecting this same data myself felt redundant and overdone. This also lines up with the feedback I’ve received and documented previously about a disconnect between target audience and use case. Instead, I’ve started playing with the idea of using AI to generate audio sources based on user selection, thus taking the onus off of users to do the work of the app and handing it to AI models.
The interface will stay similar to the original model. However, I’ve begun to look into AI models to accomplish this goal. Some models I’m currently considering include One AI, Assembly AI, and Suno AI. I have also set up a Digital Ocean account and deployed my first instance of Sonic Palette. It is now available at — https://sonic-palette-nw99k.ondigitalocean.app/
This is just a hosting platform for the interface so others can access the project without relying on my local computer. I’ve currently signed up for a free trial and will have to look into what the cost of maintaining this site is once my project is over.
April 15 — Getting Closer
This week, I furthered my research on which AI models to use. I have landed on Suno AI and Assembly AI as my models and as the support for my algorithm: the interface will send visual sources to Assembly AI for transcription (that is, creating a text summary of the video) and then send that transcription to Suno AI to generate a song fitting the summary. I have attached here the wireframe of this architecture:
I’ve started work on setting these up. The Suno API is a little tricky — they do not have an official API, so a kind soul has set up an open-source unofficial API here: https://github.com/SunoAI-API/Suno-API
It involves setting up a local FastAPI instance that the interface will interact with, whereas Assembly simply requires installing the assemblyai npm module, which does all the work. Suno is still pretty new, so it makes sense that they don’t yet have an official API. To get the unofficial API working, I had to make sure Python and NodeJS were updated on my computer, as the API is Python based.
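Roughly, the flow between the two services looks like the sketch below. The assemblyai call follows the Node SDK as I understand it, and the Suno route and request body are placeholders; the exact endpoint depends on the unofficial API's docs:

```javascript
// Sketch of the pipeline: video source -> AssemblyAI transcript ->
// prompt for the local unofficial Suno API. The '/generate' route and
// request body for Suno are placeholders, not confirmed endpoints.
import { AssemblyAI } from 'assemblyai';

const assembly = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });

async function generateTrackForVideo(videoUrl) {
  // 1. Transcribe the video's audio into text.
  const transcript = await assembly.transcripts.transcribe({ audio: videoUrl });

  // 2. Pass the transcript text to the local Suno FastAPI instance.
  const res = await fetch('http://localhost:8000/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt: transcript.text }),
  });
  return res.json(); // expected to include links to the generated song
}

generateTrackForVideo('https://example.com/clip.mp4').then(console.log);
```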
April 22 — Wrapping Up…
This week involved updating the interface visually. The AI models have been set up and are working well, but the UI was still a bit janky, so I spent the week cleaning it all up to prepare for our final presentations next week. Here is the final interface:
April 29 — It’s Time!
As my project is technically sound, I spent the week working on my paper. I pulled an almost entirely different set of sources than the ones I explored last semester. This is because I decided to approach the paper from an evolutionary standpoint, tracing the dynamic between sound and image from the days of early cinema to today; this is relevant because Sonic Palette is a lens into the potential state of that dynamic in the modern age.
My final presentation to the class went well! I received feedback that my project has (thankfully) come a long way since demo day. This was good feedback for me because I feel justified in pivoting the design and functionality of Sonic Palette from a data collection tool to a tool with purpose and use. I also believe it supports the literary analysis in my paper well because it provides a space to observe the dynamic between sound and image in real time.
May 6 — Last Day (Showcase)!
The end of the semester and grad school! Wow, it’s been real. We, as a program, presented all of our final projects at the IDM Showcase, where we got to see other projects and interact with others in the program. It was a really exciting day, so fun and inspiring to see what others have been up to, and I got a lot of positive feedback for my project. I’m happy with where Sonic Palette is and have dreams for where I could hopefully take it!