TL;DR: separate files. Me unnecessarily thinking through implementation ideas below…
(collapsed because ain't nobody got patience for me to write this much)
I think it would need to be three separate files, also maintained separately, meaning duplication across all of them: one complete file and two extractions. (And that's considering just the movie vs. the riffs; the additional pop-up video factoids would also be separate, at which point we get into combinatorial numbers of files to maintain.) This is so we can keep it compatible with non-Gizmoplex players (your Rokus, your Android apps, your “I just download them and have my own local situation”).
That said, for maintenance it would be best to have one complete file with all the extra metadata included, which can be split into the three needed files (either at request time or - more likely - pre-compiled before uploading). That way, when changes need to be made, they are only made in one place. But as far as the end user is concerned, there are separate files.
A non-technical issue to face is when riffs talk over the movie. First, we’d have to determine what is actually being said in the movie in order to generate an accurate movie caption, but those lines would also have to be labeled as “from the movie, but lost in the riffing, so don’t pull this into the combined captions.” And we’re still faced with the 3-line, 32-characters-per-line limit, which I’m pretty sure is going to make some really weird situations pop up when trying to extract or combine.
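That 3-line / 32-character limit is at least easy to sanity-check mechanically. A tiny (hypothetical) validator like this could flag cues where a combined movie-plus-riff caption overflows the grid:

```python
import textwrap

# Broadcast-style caption grid: 3 rows of 32 characters.
MAX_COLS, MAX_ROWS = 32, 3

def fits(caption):
    """True if the caption text can be word-wrapped into the 3x32 grid."""
    return len(textwrap.wrap(caption, width=MAX_COLS)) <= MAX_ROWS
```

The build step that combines tracks could run every merged cue through a check like this and spit out a list of cues that need a human to trim or re-time them.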
This does, however, open up the possibility of doing color coding (as someone else had brought up in another thread) of movie vs riffs, or even to color code each riffer. That would be a fourth file, though, as that additional data would not be backwards compatible with the standard format used around the world.
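For what it’s worth, WebVTT (the caption format browsers use) already has a standard mechanism for this: `<v>` voice tags plus `::cue` styling. A sketch, with made-up speaker names:

```
WEBVTT

STYLE
::cue(v[voice="Movie"]) { color: white; }
::cue(v[voice="Jonah"]) { color: yellow; }

00:01:02.000 --> 00:01:04.500
<v Movie>We must reach the reactor!
<v Jonah>Or, you know, not.
```

Plain SRT players don’t understand any of that, though, which is exactly why it’d have to be that separate fourth file.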
It’s going to require dev work to allow switching between the various files (which should not be too much of a hassle - having multiple subtitle tracks is fairly standard - though track selection may end up Gizmoplex-only, depending on what third-party players can and can’t do). The backend service for splitting/combining would need to be built as well. Not insurmountable, though. The real bulk of the work remains in the hands of the people who need to create the full transcripts and tag every single line. As you can imagine, that’s a lot more work than standard captions, and those are an undertaking by themselves.
But also, I think that if the stated purpose of this request is coming from volume concerns - people just want to know what is said - then normal captioning already addresses that issue. Can’t hear the movie? Turn on the captions. Can’t hear the riffs? Turn on the captions. We mustn’t forget that the purpose of the captions is to serve people who can’t hear everything, be that due to their biology (they are deaf or hard of hearing), environment (there’s loud construction outside), or personal choice (they decided to eat yummy crunchy food while watching). So we’re already addressing “I can hear some things but not all the things.”
Still, I think it’s an interesting idea. I’m more into the separate movie and riffs files as opposed to the factoids files (I personally feel those are best delivered via a wiki, because you often want to use a lot of words to explain some joke that may only take two seconds to tell), but mostly because I like the color-coding possibility, not because I want them separately displayed.