Why soundtracks for rhythm games require a different approach.

In the fall of 2019 I was part of a team that researched rhythm games and their design. Throughout the project we studied a number of existing rhythm games and compiled that research into design documents as well as some of our own prototypes. While there were discoveries to be had for every discipline, one of my personal audio discoveries came at the beginning, as we were creating our first prototype, Gang Beats. It turns out that composing soundtracks for rhythm games is not the same as composing for a more standard kind of game. Let me explain:

The “battle music” of most games lets you know there are enemies around, but doesn’t necessarily tell you where they are or how to defeat them

Musical soundtracks in most games act as an extension of the story, the environment, or the character at that point in time. The music in a cutscene as Link explores a new area helps reinforce a sense of awe. Battle music in Skyrim lets the player know that there are enemies nearby. An edgy soundtrack helps Need for Speed racers feel more “in the game”. However, because of this passive or reactive function, “standard” game soundtracks are rarely at the forefront of promoting a particular action from the player.

This “promotion of action” is key for rhythm games.

Throughout our research, we found that rhythm games are games in which an understanding of rhythm, and being able to act accordingly, is necessary to succeed. You can’t react accordingly to a melody or rhythm if those elements are not the leading gameplay element.

Street Fighter 5, a quintessential fighting game, and one of the more “rock focused” soundtracks

Our first game was a multiplayer fighting experience where you could only attack on a particular beat. When I was researching typical soundtracks of fighting games, I found a lot of them were very fast paced, with some driving percussion and melodies to keep the action going. I figured composing a rock melody with claps at the attack cue points would fit perfectly. Seemed simple enough, right?
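The "attack only on a particular beat" rule can be sketched as a timing-window check. This is only an illustration, not the code Gang Beats actually used: the function name, the tempo, the choice of attack beat, and the window size are all hypothetical, and it assumes the music starts on beat 0 at time 0.

```python
def is_on_attack_beat(time_s, bpm, attack_beat=3, beats_per_bar=4, window_s=0.12):
    """True if an input at time_s lands within window_s of the attack beat of any bar.

    attack_beat is zero-indexed (3 = the fourth beat of a 4-beat bar).
    All defaults are illustrative assumptions.
    """
    seconds_per_beat = 60.0 / bpm
    bar_length_s = beats_per_bar * seconds_per_beat
    cue_offset_s = attack_beat * seconds_per_beat  # where the cue sits inside a bar

    # Signed distance to the nearest cue, wrapping around the bar boundary.
    delta = (time_s - cue_offset_s) % bar_length_s
    if delta > bar_length_s / 2:
        delta -= bar_length_s
    return abs(delta) <= window_s


# At 120 BPM the cue falls at 1.5 s, 3.5 s, 5.5 s, ...
print(is_on_attack_beat(1.55, bpm=120))  # slightly late, still inside the window
print(is_on_attack_beat(1.00, bpm=120))  # a full beat early, rejected
```

The check itself really is simple. The hard part, as we were about to find out, is getting the player to hear where that window is.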

Our first playtest went horribly. Our playtesters had no idea where the attack points were, despite our instructions and demos. Simply adding claps on a particular beat of music was not enough information to promote an action from our players, especially those who have very little understanding of music and rhythm. 

So how could we fix things?

Note the distinct repetitive nature of the soundtrack, and how sound effects are a rhythmically related component.

In a rhythm game, action and cue points need to be extraordinarily emphasized, so the accompanying soundtracks and sound effects need to be created to work together accordingly. This is evident if you listen to the minigame soundtracks of Rhythm Heaven. The music for those games is much sparser than standard “accompanying” soundtracks, and is very intertwined with the sound design of the game. Any extraneous notes or “interesting” flourishes could be misinterpreted and could break player concentration. Also, rhythm game soundtracks are highly repetitive. Humans like repetition, and having a repetitive rhythm and melody helps increase a player’s confidence level as they begin to anticipate when their action needs to take place. These key elements are not usually considered when designing more standard soundtracks.

An Example in practice:

So, let’s take a listen to the evolution of one of the loops we used. In the example above, you’ll hear a loop that asks the player to act in a “boom boom PAP (BBP)” rhythm. (You’ll hear Old, New, Old, New versions.) The old version has the rhythm guitar mimicking the BBP Cue throughout. I had thought that having the music follow the same rhythm would help the players understand the rhythm; however, that just made the Cue get lost. In the new version, you can hear the guitar playing individual chords, allowing them to “lead up” to the cue. Additionally, you may notice that the drum line is also less complicated in the new version. Initially, I had thought that using a crash cymbal regularly would help keep tempo, but players were mistaking it for the Cue (that being said, those crash cymbals were brought back in to keep time during the more complex rhythm sections, but only because we’d built up enough trust with the player in the simpler sections). Also, you should notice the much brighter BBP Cue. A change in instrumentation and some additional effects brightened the claps in the Cue to allow them to “pop” more against the “darker”, leaner background music, giving players clarity of action. These are just some of the tweaks that we needed to make during our overhaul of the soundtrack for this particular prototype.

A short excerpt of Gang Beats, featuring the final revision of the soundtrack encompassing the different “rhythm game specific” elements we mentioned.

Applying the findings above, along with a more predictable repeating structure, gave our playtesters a much greater sense of comfort and mastery in the later sonic iterations of Gang Beats. In turn, those playtesters were more motivated to try the experience again to beat their opponents or the high scores.

Why Translating and Gamifying Jump Rope into The Virtual Realm Didn’t Work Well

Last semester, I was part of a pitch project team at the ETC that spent 16 weeks researching, exploring, and creating rhythm games. Each of the 8 prototypes that we created focussed on answering a specific game design question. Our 7th prototype, “Dutch Double” asked the question: 

“Can we make a rhythm game that simulates the real-life, rhythmic activity of jump rope/double dutch?”

First of all, some background. 

What is a Rhythm Game?

A common understanding according to Wikipedia defines rhythm games as a subgenre of action games. Nice, but we want a bit more clarity. After much more research, as a team we decided that a rhythm game is a game in which an understanding of rhythm, and being able to act accordingly, is necessary to succeed. Great! Now how about Jump Rope?

Jump Rope is one of those games that seemingly everybody knows how to play once they get going. The only major criterion of success is being able to swing a rope around so that you are able to jump over it. Almost all variations of rope movement and jumping technique revolve around that basic concept. The Rope Swinging is a constant motion, meaning it is predictable. Jumping over the rope has to happen at a particular moment to achieve success. Therefore, at its core, one can consider Jump Rope to be a kind of rhythm game.

So what happened when we tried to virtualize it?

Phase 1, Research

We set out to make a direct copy to port the experience onto a Dance Dance Revolution pad. A big item for discussion was, of course, timing. This turned out to be much more difficult than we had anticipated. When you physically participate in Jump Rope, your body does so many calculations in the background that you just focus on the feeling of the rope and jump. For the majority of playtesters, the action seemed so natural that almost unanimously there was “no thought” about when participants needed to jump. They just “felt it and did it”.

A nice video showing proper and improper jump rope techniques

Digging in more scientifically, the auditory feedback that a physical jump ropist gets is quite interesting. There are two sound components: the swish of the rope around the participant, and the smack of the rope as it hits the ground. The Swish actually depends on how the participant handles the rope. Different people had different styles, so the swish could be constant, or only occur at a particular point. We figured that this wasn’t inherent to the experience, so we arbitrarily chose a timing position for that sound. The Smack, on the other hand, is critical to the experience. The core of jumping rope is timing your jump around the smack. More specifically, you have to be in the air at the time of the smack to ensure you are able to clear the rope.

Different people have different jumping habits, so the time interval varies between the Smack and the Jump. Some people make big jumps, requiring there to be a long interval, and some people make tiny jumps, making the interval last milliseconds.

Perfect: we had all of our info, so we made a build. However, testing went completely awry.

Phase 2, The Direct Copy

We talk about Dutch Double at 3:17

Using a DDR pad as the control interface to register jumping, we made a “direct copy” of jump rope in the virtual realm. For timing the smack, we tuned the average delay so that it was a comfortable “PAPAM”, where PA is the Smack, and PAM is the landing of the feet. When we tested it ourselves, it went okay, because we knew what to expect. However, Naive Guests were a completely different story: there were almost no successful tests.

Almost without fail, our playtesters just couldn’t grasp the concept, which was very odd. We had put a lot of research into making a direct replica of the experience, with recreating timings and purpose being a huge focus, yet playtesters just weren’t having any of it. The longest jump chain we got was maybe 2. So what happened?

Phase 3, Analysis

After looking over all of our playtest data, a couple of things became apparent:

1) Jump Rope is a “Full Body Experience”, and taking away the actual rope removes that feeling entirely. With the DDR pad, it turned into a weird timed jumping simulator

2) The act of Jumping varies from person to person. Heights, weights, and styles affect the timing of everybody’s jump mechanic

3) This direct copy completely violates traditional game design handling of Cues (or action points)

This last point was key. You see, as we mentioned before, a rhythm game can only succeed when a player is able to act in accordance with the game’s rhythm. When a player sees or hears a Cue, they expect to perform an action in response. In real jump rope, it turns out that the auditory cue of the rope smack is NOT an indication that it’s time to perform a jumping action, as you should already be in the air. That cue is instead an indicator of something that is yet to happen (the rope is about to pass under you).

Having the player only experience a cue that indicates a delay of an action/event causes confusion and uncertainty for that player, and is a design element that goes completely against the traditional game design of a rhythm game, which is why our example of virtualization with a DDR pad did not work.

Our further design iterations made it so that a player has to Land when they hear the smack cue, and that instantly made playtesters able to attain longer combo chains, but ultimately it turned the experience from a Jump Rope simulator to a creative jumping game. 
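The "Land on the smack" rule from those later iterations can be sketched roughly as follows. This is a minimal illustration rather than our actual build: the function names, the tolerance value, and the way a combo is counted are all assumptions made for the example.

```python
def judge_landing(land_time_s, smack_times_s, tolerance_s=0.15):
    """Return the index of the smack cue a landing matches, or None for a miss.

    tolerance_s is a hypothetical hit window, not a tuned value.
    """
    if not smack_times_s:
        return None
    nearest = min(range(len(smack_times_s)),
                  key=lambda i: abs(smack_times_s[i] - land_time_s))
    if abs(smack_times_s[nearest] - land_time_s) <= tolerance_s:
        return nearest
    return None


def combo_length(land_times_s, smack_times_s, tolerance_s=0.15):
    """Count consecutive smack cues, starting from the first, that got a matching landing."""
    matched = {judge_landing(t, smack_times_s, tolerance_s) for t in land_times_s}
    combo = 0
    for i in range(len(smack_times_s)):
        if i not in matched:
            break
        combo += 1
    return combo


smacks = [1.0, 2.0, 3.0, 4.0]        # times at which the smack cue plays
landings = [1.05, 1.95, 3.40, 4.00]  # times at which the pad registers a landing
print(combo_length(landings, smacks))  # the third landing is too late, so the chain stops at 2
```

The design choice worth noticing is that the cue and the judged action now happen at the same moment, which is what players of rhythm games expect.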

Is the Legend of Zelda Link’s Awakening Nintendo Switch remake a faithful musical representation of the Legend of Zelda franchise?

The original Legend of Zelda Link’s Awakening on the Game Boy was my first foray into the adventures of Link. I was taken to wondrous places, fought dangerous beasts, and strove to uncover the mystery of Link and the island. Throughout the many adventures, 8-year-old me was accompanied by the game’s 8-bit score. While the hardware of the time limited the variety of sounds the score could produce, my young imagination gobbled up the different melodies and transformed them into more epic soundtrack versions within my mind.

My favorite track from the game was the Tal Tal mountain range. It was a dangerous area, and required dexterity and proper weapon technique to deal with the enemies and puzzles within. The soundtrack, with its fast repeating bass line, flute-like melody, and driving percussion track, really promoted the Epic Adventurer Spirit that Link embodies. I remember frequently traveling to the mountains just to listen to the track. I’d clear an area of enemies and simply lay the Game Boy on a table, running the batteries down to death just to bask in the glory of that particular soundtrack. So when I heard that Nintendo was making a remake, I got super excited.

Will it live up to my memories?

Will I be able to relive some of the greatest gaming moments of my childhood?

Yes. But, also very much no.

While updated graphics and mechanics really helped with adding eye candy and thematic juice to the game, I have some serious reservations about the soundtrack direction. Let me explain…

What They Did

A snippet of the “overworld” theme from the Remake at 4:40

The Link’s Awakening Remake treats the soundtrack in a very unique way. Almost all of the tracks are orchestrated (that is, written) for a very small collection of instruments, in the style of a small chamber orchestra. This frequently means that each melody line in a track is played by a single instrument (or a pair). A great example of this is the classic “overworld” theme from Legend of Zelda. The main melody is played by a single violin, which is accompanied by another violin and a viola. A cello takes duty for the bass. It’s a simple, classic quartet. This follows in the footsteps of the limited instrument bandwidth of early hardware such as the NES and Game Boy, which could only play 4-5 sounds at a time. However, does it provide the desired dose of epic adventure? I’d argue not.

The original Game Boy Version of the Overworld Theme at 5:32

There is no doubt that the classic Link Adventure theme is essentially preserved in this updated version. The rhythm and melody are very obvious and recognizable. It is the context of the “classic quartet” that is an issue. In the standard classical music world, a quartet is frequently and historically thought of as an intimate instrumentation that finds itself in the intimate settings of one’s home. It has been this way for hundreds of years. If you listen with a trained ear, you can hear that the reverberation on those instruments is very short, reminiscent of a small room. In fact, listening to the majority of the soundtrack in headphones gives off that “small room” vibe. Therefore, I believe that the intimate setting that these recordings provide, coupled with the generally soloist nature of the instrumentation, and the sometimes out of place “retro” electronic bits hurts the sense of grandeur and adventurism that playing a Legend of Zelda game calls for.

What does this mean for my favorite track?

The Tal Tal mountain range soundtrack for the Remake apparently has two versions: a flute version, and an orchestrated chamber version that mimics the minimal instrumentation of the rest of the game.

Flute Version at 1:08:12

The flute version consists of 3 or 4 flutes that follow the general structure of the song, keeping the necessary driving rhythm of the adventure’s spirit intact. However, it’s really hard to imagine doing anything epic while being accompanied by a generally weak instrument like the flute.

Orchestrated Version

The orchestrated version is quite different. We hear the addition of a few more instruments than expected, including a bit of strummed acoustic guitar, some percussion, as well as the addition of a small woodwind section to punctuate some notable lines. The original Game Boy track also finds its way for a brief moment to add some nostalgia. I will admit, that tiny hit of nostalgia really did work a bit of magic. The greater usage of instruments helps inch towards that cohesive sense of grandeur to the overall experience, and I can see how it can help the gameplay seem more epic at times. However I would argue that this “inching” is not frequently found throughout the entirety of the new soundtrack.

How does this relate to the overall Legend of Zelda Franchise?

I feel that that last point shows a departure from the anticipated musical direction when compared with the previous games in the series.

For the longest time, the soundtracks to Zelda games were plagued with hardware limitations. The NES and Game Boy era had a limited sonic palette of just a few basic sound waves. The SNES allowed for “A Link to the Past” to have a soundtrack created with more “real sounding” orchestra instruments, and the greater voice count created larger/denser orchestrations. N64 ushered in an even more lifelike sound collection, and the classics of Ocarina of Time and Majora’s Mask attempted to create as much of an “epic orchestra sound” as the hardware would allow. I’d argue that Gamecube’s Twilight Princess was the epitome of an “epic sounding” soundtrack for a Zelda game. The use of mass amounts of instruments in a large high-quality sounding hall space made all of the action on screen seem much larger than life, and the Wii’s ability to play back greater quantities of CD quality music helped Skyward Sword in a similar vein. Breath of the Wild on Switch utilized fewer instruments but still frequently referenced the concept of vast space. The world instruments and solo piano melodies still had quite a bit of reverb and ambiance to them, promoting a sense of magnificence for the world.

The soundtracks to all of these titles work together with their art, story and level design to consistently advocate a sense of grand exploration and adventure through a majestic world.

And so we are left with the odd one out, the Link’s Awakening Switch remake: 

Small Instrumentation and small/intimate sounding space

The original game had a hardware limitation that dictated its soundtrack’s simplicity. All nostalgia aside, the simplicity of 8-bit music allows for the possibility of imagining something more grand within the mind of the player. Hearing a chamber orchestra in a small sounding room offers a specific, concrete connotation of “smallness” that is very difficult to escape. Because of this, I feel that the remake’s decision to mimic the hardware limited minimalism doesn’t align with the franchise’s consistent drive to promote the sense of an “epic adventure”, and ultimately hurts the overall gaming experience, especially for those who have previously played the original and allowed their imagination to run wild on their grand adventures.

What are some effective uses of sound in horror games other than jump scares?

Since we’re in the new decade, I was looking over different gaming “Top 10” lists and came upon the top 10 horror games of the 2010s. It got me thinking about what makes a successful fright scenario, and how audio can play such a huge role in the overall impact. But after taking a look at several “long-plays”, I noticed that one of the most frequent uses of audio in horror games was in conjunction with the Jump Scare tactic. At its core, a Jump Scare is simply a sudden, unexpected loud noise with accompanying visuals. Sure, it’s quite effective at getting one’s heart racing, but surely there is more to the genre than jump scares? I’ve selected 6 games most frequently found in the “best horror game of the decade” category to do some analysis on how their audio design strengthens their fright factor beyond simple jump scares.

Reverse Jump Scare

The Slender Man games have a pretty unique mechanic that I’m conveniently calling the Reverse Jump Scare. The premise of the games is to search for objects while avoiding contact with the Slender Man. When the Slender Man is near, the player hears static. The greater the danger, the louder the static and accompanying sound gets, until ultimately it builds to a climax when you are captured (the exact opposite of an abrupt LOUD to quiet jump scare).

Because the player has the capability to escape from those situations, that sound of static solidifies itself as the sound of imminent danger and death, thus forcing the player to change course and stop what they are currently doing. The buildup of that sound cue can be especially terrifying if Slender Man himself is not visible in the general vicinity. 
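The proximity-driven buildup described above can be approximated with a simple distance-to-gain curve. I don’t know how the Slender Man games actually implement it; the function name, the 30 meter range, and the squared curve below are purely illustrative assumptions.

```python
def static_gain(distance_m, max_distance_m=30.0, curve=2.0):
    """Map distance to a 0..1 gain: silent at max_distance_m or beyond, full volume at contact.

    max_distance_m and curve are hypothetical tuning values.
    """
    if max_distance_m <= 0:
        raise ValueError("max_distance_m must be positive")
    clamped = min(max(distance_m, 0.0), max_distance_m)
    closeness = 1.0 - clamped / max_distance_m
    # curve > 1 keeps the static quiet at range and makes it swell sharply up close.
    return closeness ** curve


for d in (30.0, 15.0, 5.0, 0.0):
    print(d, static_gain(d))
```

Because the gain rises continuously instead of spiking, the player always has a window in which to react, which is exactly what separates this from a regular jump scare.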


Silence

Faith is a neat little experience of minimalism in every theme. The art mimics early 8-bit titles, and the audio follows suit. You’re greeted by a rendition of Beethoven’s Moonlight Sonata (classical music being used in horror is a different article entirely) as you explore the dark forest in search of the main house, accented by some really weird occurrences. Entering the house marks the start of the second half of the game. What’s different here is that while you wander around the house searching for clues, the soundscape is completely silent. You only get one single sound effect as feedback for shining a flashlight on an interactable object, and major “cut scenes” have some noisy sound effects. No music, no background sound effects, nothing. This provides an odd sensation. As one of your senses (hearing) is essentially rendered useless, your other senses (sight and touch) are heightened. The odd pixelation of the game’s design at times makes certain objects seem less definite, and your eyes begin to play tricks on you.

Eventually you gather enough clues to cause the demon to start chasing you. All you get is a singular note swell and fade out every so often, punctuated again by continued silence. Did something in the corner just move? What is that shape supposed to be? Even the lack of your own footsteps making a sound is unnerving. 

An overarching theme of silence provides the opportunity to give extreme contrast and impact when any sound does end up playing. It was surprisingly effective at giving me goosebumps for seemingly no reason.




Hyper-detailed sonification of the world

Alien: Isolation goes in a completely different direction. The world is superbly detailed in terms of the sound it makes. Every little computer, device, machine, and trinket makes a sound. It’s a very mechanical environment, with human footsteps and other interactions being the only “natural” sound. The player is thrust into this world from the start (for quite some time as well), and as they get familiar with the game’s controls and mechanics, they grow accustomed to how the world sounds and behaves. One may even say that it becomes somewhat familiar. You go about the station trying to find your way out. Until suddenly, more than an hour into experiencing the world, something doesn’t sound right. You hear breathing that isn’t human. You were told about the alien, but this is the first time you’ve come face to face with it. And as suddenly as it appears, it leaves. While this first encounter is a semi-scripted event, it sets the precedent for what the player now needs to be listening for to survive.

The world is busy. The space station is moving. The people you encounter are moving and fighting. The machinery keeps beeping and booping. But now you have this new breathing and stomping sound that you have to keep an ear out for. Because the sounds that pose a direct danger to your character start out buried in a dense soundscape, the player’s ear has a higher chance of giving a “false positive” for that danger. The player now has an additional difficulty to worry about, furthering the feeling of stress, and ultimately providing a great opportunity for an effective horror situation.





Ambiguous non-diegetic sound

Dead Space 2 takes its sonic world in a more fantastical direction. While the setting of a space ship/station is similar to Alien: Isolation, it is a lot less real. While the player is still exposed to doors and computers that make sounds, the overall ambiance has a much different feeling. We hear wind sometimes (wind in space?). We hear a singing voice deeply manipulated by a space station sound (is it real?). We hear whispers around us (are you going crazy?). Exposing the player to these kinds of sounds, especially when their origin is ambiguous, gives the opportunity for such questions to arise and become common.

The player is then unable to completely trust the surroundings that they are exploring, fostering an environment of uncertainty and horror.






The downright creepy

Thousands of years of human evolution have ingrained certain instincts into our brains. There are certain sounds used to signify danger. The snap of a twig could be a predator. A baby crying is distress. A scream is danger. A rustle of leaves could either be wind or something unknown. A squish or splat could be a feeding animal. All of these could signify death.

P.T. Silent Hill capitalizes on these instinctual responses to certain sounds to create a really stupid scary experience. I made the mistake of watching someone play this game at night, and then had to stop. There simply is a collection of sounds that are inexplicably creepy, and they start to cause irrational fears in a given situation. Games like this one blur the line between effective design and jump scares. I really can’t say much more about this one because it’s making me sick just thinking about the game.



The brain is a marvelous object that can be frightened quite easily, especially with the very common trope of using Jump Scares. While I will not deny that there are groups of people who enjoy that kind of horror, for the rest of the population there is a craving for deeper horror mechanics. These examples certainly aren’t the only way to effectively use audio for fright purposes, as there could be an entire article just focussing on the relationship between music and horror. Nonetheless, it was really interesting to see how certain games embrace a unique technique of audio horror to craft the overall experience.

What can we learn about immersion from NES music?

The Nintendo Entertainment System, or NES, was released in 1985 in North America, and had about a 10 year life span. During those 10 years, there was an extraordinary evolution within the soundtracks that accompanied its roughly 700 games. With only 5 limited channels of audio dedicated to all music/sound related events, the composers and sound designers that worked on NES titles needed to make some ingenious and deliberate decisions. A number of those decisions affect the player’s relationship with immersion within the game.

EMBRACING INSTRUMENTATION FOR GREATER IMMERSION.

Almost all music can be boiled down to melody and harmony. The melody serves as the central theme for the music, and harmony serves to support that theme. How the melody and harmony sounds is a different story.

The Overworld Theme from the original Super Mario Bros. (1985) is a classic tune that is very often referenced when mentioning “classic NES” sounds. You have two pulse waves behaving as melody and harmony, a triangle for bass, and a simple noise as a rudimentary percussion instrument. Because it was one of the first titles released for the fresh hardware of the NES, I will argue that the soundtrack as a whole doesn’t take any particular risks. While there are some thematic developments that help connect the player to the environment, the entirety of the 9:33 minute soundtrack has a consistent sonic quality that arguably detracts from the player’s immersion with the world that the game creates.
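To make the “two pulse waves plus a triangle” palette a bit more concrete, here is a minimal sketch of idealized versions of those waveforms. It is not an emulation of the NES sound hardware: the function names, the sample rate, and the duty values are assumptions chosen for illustration, and the noise channel is left out.

```python
def pulse(phase, duty=0.5):
    """Idealized pulse wave: +1 for the first `duty` fraction of each cycle, -1 otherwise."""
    return 1.0 if (phase % 1.0) < duty else -1.0


def triangle(phase):
    """Idealized triangle wave in [-1, 1], starting at -1 and peaking mid-cycle."""
    p = phase % 1.0
    return 4.0 * p - 1.0 if p < 0.5 else 3.0 - 4.0 * p


def render(wave, freq_hz, seconds, sample_rate=8000, **kwargs):
    """Sample `wave` at freq_hz for the given duration (sample_rate is an arbitrary choice)."""
    n = int(seconds * sample_rate)
    return [wave(i * freq_hz / sample_rate, **kwargs) for i in range(n)]


# A melody voice and a bass voice a few octaves apart, mixed at equal volume.
melody = render(pulse, 440.0, 0.5, duty=0.25)
bass = render(triangle, 55.0, 0.5)
mix = [0.5 * (m + b) for m, b in zip(melody, bass)]
print(len(mix), min(mix), max(mix))
```

Changing the duty value is one of the few ways to alter the timbre of a pulse voice, which hints at why instrumentation changes on this kind of hardware had to be so deliberate.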

Compare this to the Overworld Theme from the sequel, Super Mario Bros. 2 (1988). The track contains 4.5 distinct instrumentation changes within its runtime, which is more than double that of the original. The composer, Koji Kondo, had had three years to explore and push the limits of the NES audio system. The resulting expansion of sonic palette makes the game locations seem much larger than before. This particular title also begins the exploration of the fifth audio channel as a sample playback system within the Mario Bros. trilogy. Showing up in the Underworld track, channel 5 is used to play back a drum sample. This “real” sounding drum furthers the player’s immersion within the created world.

Last in the trilogy comes Super Mario Bros. 3 (1988). Throughout the 47:46 minutes, you hear less of a reliance on changing instrumentations mid song, and instead a much heavier use of real-sounding drum samples to “worldize” the environments and scenarios. Because of the ability to depend on unique samples rather than instrumentation, the game is then able to focus more on developing musical themes that are unique to environments.

The evolution of the sonic signature of Super Mario Bros. throughout the trilogy shows how important the balance is between the melodic content of soundtracks and how they physically sound. When experiencing the games first hand, it is almost shocking how much more immersive Super Mario Bros. 3 feels when compared to the 1st one, and instrumentation has a huge impact on that feeling.

TREATING REPETITION CORRECTLY

Hardware limitations within the NES included tight size restrictions for all data. This especially affected music. One way to go about reducing the file size of music was to use the concept of looping or repetition. Repetition could have a huge influence on the player’s perception and enjoyment of games.

For example, take Star Force (1984). Interspersed between some infrequent but extravagant transitional material, there is a single 4 bar loop that is considered the main soundtrack. These 8 seconds of music play nonstop for practically the entirety of the 56 minutes of full gameplay. While there are brief periods of “boss-type” developments, they are too short and too few and far between. The immersion factor is broken after the first 5 minutes of the game.
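To put that loop in perspective, a quick back-of-the-envelope calculation helps. This sketch treats the loop as uninterrupted, which overstates things slightly since the transitional and boss material is ignored; the function name is just for illustration.

```python
def loop_repetitions(gameplay_minutes, loop_seconds):
    """Upper bound on how many times a loop plays if nothing else ever interrupts it."""
    return gameplay_minutes * 60 / loop_seconds


print(loop_repetitions(56, 8))  # 420.0
```

Hearing the same 4 bars on the order of 400 times in a single playthrough goes a long way toward explaining why the immersion breaks so early.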

The complete other end of the spectrum is Kirby’s Adventure (1993). Featuring a whopping 29 different tracks spanning just 29:56 minutes, the average song in Kirby’s Adventure is around 62 seconds. Within those 62 seconds, each of the tracks follows the melodic lines and repetition standards put forth by classical music theory. We are shown a main theme A, which is followed by a secondary theme B, which is followed by a recap of a slightly altered version of A. This ABA form is easily iterated upon to develop extended themes. Stage Music 5, for example, follows an (intro)ABA’CA’’ form. It packs a lot of different ideas into a very tight 1:15 minute space, not to mention the changing instrumentations as well. While normally, such complexity could potentially break immersion because too much effort is required by the player to understand what is happening, we are reminded that Kirby himself is constantly changing forms and gaining new powers. A greater number of compelling but shorter tracks gives more items and locations in the game an identity. These two concepts together reinforce the sometimes quite chaotic nature of the game.
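The average track length quoted above is easy to check. This small sketch reads the total runtime as 29 minutes and 56 seconds; the helper names are hypothetical.

```python
def parse_runtime(text):
    """Convert an 'MM:SS' runtime string into total seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def average_track_seconds(total_runtime, track_count):
    """Average track length in seconds for a soundtrack with the given total runtime."""
    return parse_runtime(total_runtime) / track_count


print(round(average_track_seconds("29:56", 29)))  # 62
```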

When coupled with the changing instrumentation techniques mentioned before, the Kirby’s Adventure soundtrack is an amazing example of the extent to which music has the power to shape the entire player experience. The comparison of Kirby’s Adventure (one of the last games to be released) and Super Mario Bros. (one of the first games released) shows a substantial evolution in soundtracks over the 10 year period of the NES, one that arguably shaped the development of game composition and audio design for future generations.