Thanks for the feedback on this. It’s helpful, but also kind of disheartening since it brings up some issues that I was afraid might turn out problematic as soon as we started trying to put things together into an actual finished production.
Unfortunately, on this particular short, the aspect ratio was practically square, so I had to overlap way more of the film with the seat silhouettes than I would have liked in order to keep Crow visible on the right-hand side of the screen. (I would have cheated and zoomed in closer with them one more seat closer to the center, but that would have messed up my loops, and I wanted to retain the same format I intend to use on TIPW.) If we do shorts in the future, we may need to create an entirely separate set of theater silhouettes closer to the ones used during the original Comedy Central era where there are only about 6 seats visible on screen, but the guys themselves occupy a lot more vertical space.
The good news is, with a widescreen print like Petrified World, it shouldn’t be an issue since I’ll be aligning it so that the base of the movie lines up exactly with the lowest dip between the seat cushions, which means as much of the movie will be visible as possible without shrinking the silhouettes and adding more seats.
Okay, this is very important feedback, because, as I suspected, it’s going to impact what we’re doing with TIPW. Anyone else want to weigh in on timing? It’d be nice to have a larger sample size.
Most of my riffs and captions are on screen a little more than 3 seconds (say 3.25, just for an easy fraction), which is usually the 3 seconds it took me to speak the line plus an extra 0.25 of “buffer space” I tacked on for reading. If we need to increase the on-screen duration by 33% to 50% so people have more time to read it, that means any line that takes about 3 seconds to say would ideally need somewhere between 4.32 seconds (3.25 × 1.33) and 4.5 seconds (3 × 1.5) of on-screen time for people to parse it.
So, doing the math, it basically boils down to:
Any riff caption needs to be on screen roughly 1.5 times the length of time it takes to actually speak it.
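To make that rule of thumb concrete, here is a minimal sketch in Python. The function names and defaults are purely illustrative (nothing here comes from the editing software); the 0.25 second buffer and the 1.5x factor are the figures discussed above.

```python
def current_duration(spoken_seconds, buffer_seconds=0.25):
    """On-screen time using the current approach: spoken time plus a fixed buffer."""
    return spoken_seconds + buffer_seconds


def readable_duration(spoken_seconds, factor=1.5):
    """On-screen time using the proposed rule: spoken time times a reading factor."""
    if spoken_seconds <= 0:
        raise ValueError("spoken_seconds must be positive")
    return spoken_seconds * factor


if __name__ == "__main__":
    # Compare the two approaches for a few hypothetical line lengths.
    for spoken in (2.0, 3.0, 4.0):
        now = current_duration(spoken)
        wanted = readable_duration(spoken)
        print(f"{spoken:.2f}s spoken: {now:.2f}s now, {wanted:.2f}s wanted, "
              f"{wanted - now:.2f}s short")
```

The gap between the two grows with the length of the line, which is why the longer riffs are the ones that get squeezed.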
This sucks, for reasons I’ll go into shortly, but we’re better off knowing it now, before we go into production and I have to lay down 800 of these, than finding out afterward.
Sorry, do you remember which one was the Jenkins riff? I’m looking back and completely blanking on which one you’re referring to.
You did single out the “pull ahead on 4” riff as being problematic, which is interesting, because that one was the hardest to time. It’s the one and only instance where I attempted to split what would have been a 4-line riff across multiple back-to-back captions. Had I just put them all up on screen at once, the text would have preempted the trigger for the joke by approximately 5 seconds.
Which I think demonstrates one of the big problems we’re going to have with any of the longer mini-skit or rambling riffs, especially anything like the “conductor skit” during the opening titles which potentially names names and references actions that aren’t even on screen yet.
Here’s how this particular scene (which occurs at the 7 minute mark and only lasts about 6 seconds for anyone who wants to take a closer look) breaks down time-wise:
The first two sentences of the riff are on screen for 2.12 seconds. And that line takes about 3.04 seconds to say out loud, so I definitely short-changed it and the two subsequent lines, but I was trying to match the pacing of the sequence and not preempt the trigger event of her pulling away, which occurs about 2.12 seconds later, and the third trigger of the next driver pulling forward.
As an aside, you’ll notice I put small gaps between each caption. These are deliberate and I found out we absolutely need to have them because slamming right into the next caption doesn’t work. Even if it’s another color.
The instant transition is jarring and causes your brain to have a brief “whoa, where am I?” moment, like somebody bumped you mid-sentence while reading a book. It actually takes you longer to reset and read the next line, because you’re looking around trying to reorient yourself, than it does to take the extra 0.15 seconds to insert an “off” gap, which cues our eyes/brain that the first caption has ended and we’ve moved on to something else.
Just going by what “feels right” I found that about 0.15 seconds is about the shortest gap I can get away with, which maybe not coincidentally is about the length of the average human blink.
But you can see right away that we instantly get ourselves into a log-jam the moment we start trying to apply buffer time to a two or three part riff.
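A quick back-of-the-envelope check shows how fast the log-jam builds. This is only a sketch: the 3.04 seconds for the first line, the 0.15 second gap, the 1.5x factor, and the roughly 6 second scene come from the numbers above, while the 2.0 second spoken times for the second and third captions are made-up placeholders, since those weren't measured here.

```python
def total_screen_time(spoken_times, factor=1.5, gap=0.15):
    """Total time a run of back-to-back captions needs, including the 'off' gaps."""
    if not spoken_times:
        return 0.0
    caption_time = sum(t * factor for t in spoken_times)
    gap_time = gap * (len(spoken_times) - 1)  # one gap between each pair of captions
    return caption_time + gap_time


def fits_scene(spoken_times, scene_seconds, factor=1.5, gap=0.15):
    """True if the captions, buffers, and gaps all fit inside the scene."""
    return total_screen_time(spoken_times, factor, gap) <= scene_seconds


if __name__ == "__main__":
    # First value is the measured 3.04s line; the other two are hypothetical.
    riff = [3.04, 2.0, 2.0]
    scene = 6.0
    needed = total_screen_time(riff)
    print(f"needed: {needed:.2f}s, available: {scene:.2f}s, fits: {fits_scene(riff, scene)}")
```

Even with generous guesses for the shorter lines, a three-part riff with full buffering wants well over the 6 seconds the scene actually lasts.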
So the array of bad choices is:
- Preempt the joke and partially cover up Tom by putting the entire gag up on screen at once before the trigger occurs, which totally messes with the comedic timing, likely ruining the joke.
- Short change the screen time for the caption(s), and assume that at least part of the audience is going to miss that particular joke because they didn’t finish reading it.
- Edit the joke to be shorter (Which is what I believe we’re going to have to do a lot of to make our TIPW script work)
- Kill the long joke entirely and write a new gag that fits on a single caption screen. Which in the case of TIPW, means we either start suggesting alternative/rephrased riffs ourselves, or resign ourselves to throwing the script back out to the group for a 2nd draft, saying, these are the beats where we need new gags and you need to keep your riff to under 3 seconds or approximately 32-36 words.
Thinking back on the writing of this, I did end up sacrificing a lot of longer riffs and substituting shorter ones because they just didn’t fit the 3 second rule and breaking them up over two captions would have killed the momentum or created other timing issues.
Which is a shame, because I had a bunch of stuff I really liked that ended up getting the chop including a whole Fury Road bit, a Trader Joe’s parking lot gag, references to Sammy Hagar, Isadora Duncan, Oakland sideshows, O.J. Simpson, Whitney Houston, Thelma and Louise, Albert Camus, Ernie Kovacs, plus almost every instance where I tried to work in some sort of Jimmy Stewart film reference.
Because of the aspect ratio of the short, it occupies a lot less horizontal space. I deliberately set the text alignment to use approximately 75% of the total screen space. I don’t really know if I could get away with reducing that further, because we’re already running into space issues.
The alternative would be to potentially try reducing the overall font size, but I’m hesitant to do that. It may work for TV viewing, but viewing on an iPhone or tablet, the text is already pretty small. When I do the 1A test for TIPW, I can possibly create two short alternate versions of just the first couple of minutes reducing the font size by maybe 10% and see if that becomes too much of an eyestrain for people to handle. (I know my eyes aren’t what they used to be, but I can read the current font size fairly well, even on an iPhone)
I think there will be when you factor in additional buffering time for each caption. But it depends on the timing and nature of the joke, how long the scene lasts, and where the gag trigger(s) are.
The meat grinder gag might work if I split either the first or third line off onto its own caption, because they linger on that shot longer than they do most others. But I’d probably still have to short-change the 1.5x buffering time on the first caption, because as soon as you’ve moved on to the next scene, it becomes confusing that Crow just started talking about ground chuck and the guys are making sick noises when the film has moved on to a shot of a smiling girl pulling two cars on pieces of string.
This short is tightly edited, which necessitated keeping nearly every riff to under 3 seconds and a single on-screen caption. We might have a little more leeway with TIPW because it’s slower… but only a little, and only in certain places.
Due to the discrepancy in timing, we may find that our caption script and a potential later voice-acting “cue” script need to be entirely different, and the caption version has to be majorly trimmed and pared down for timing.
Or we just flat out accept that it’ll be like watching closed captioned CNN in the airport, where the words you’re seeing on screen are just there to signify what was said, not what is being said. You shouldn’t expect anywhere near the same experience as if you were watching it with sound, because the captions are still talking about the tragic orphanage fire while the images you’re seeing on screen have already moved on to the weather.
Since I wrote the jokes in this example and have read those captions dozens of times over, I’m probably working from more of a “cue” perspective, so the 0.25 second buffer time seems perfectly fine for me, as it hopefully would for any voice actors who’ve read through it a couple of times before and are mainly using the captions as a cue to hit their marks.
I’ll try to wade in to the message discussions on individual riffs tomorrow, but wanted to get this out here before I went to bed, since I think it impacts the discussion.