The Worst It Will Ever Be

I’ve been following deep learning and AI-generated media for nearly a decade. Five years ago, if you had asked me whether we would have an AI capable of generating hyper-realistic videos of almost anything, I would have called you crazy and told you we were still decades away from being able to pull that off…

Ten years ago deep learning was all the rage. Five years ago AI videos were still the stuff of nightmares: mismatched colors and vague shapes bleeding through whatever you could call the result. They looked like videos someone would make to simulate an LSD trip gone bad, with the visual receptors of your brain being fried. This was cutting-edge technology at the time. It was also the worst it would ever be. Shortly after that we applied face-recognition technology to create deepfakes by processing a video frame by frame to swap one face with another. You needed a good enough GPU to pull it off and it took a long time to get a passable result. Deepfakes were used for so much porn that Reddit eventually banned the deepfake subreddit. That did nothing to halt the inevitable progress that was to come. As we learned better ways to build neural networks and better ways to train them (GANs), progress went from a breakthrough here or there to a spike of breakthroughs all at once, all the time.

At the start of 2023 there was a meme AI video of Will Smith eating spaghetti - sorry, I can’t find a version of only the 2023 AI video without the 2024 real Will Smith parody of it. But it was terrible and the stuff of nightmares. This was created in 2023 using bleeding-edge technology at the time! And that was the worst it will ever be. Now we can generate videos so well that it is hard to tell them from reality except for some subtle tells. In time the tells, too, will go away, leaving no way to tell what is reality and what is generated video.

It was difficult to generate exactly what you were imagining just by describing a scene. Enter Img2Img, which lets you guide the diffusion model toward where you want things placed by roughly drawing them for the AI. It was difficult to pose a character exactly as you wanted. Enter ControlNet and OpenPose. It was difficult to generate a consistent-looking character - especially in videos - so aspects of the character would change from frame to frame. Enter different diffusion denoising techniques that allow better stability. Similar techniques are being applied to backgrounds to make them more consistent between frames as well. An object in motion was difficult because the motion didn’t seem genuine. I won’t pretend to be smart enough to understand exactly how trajectory was solved, but they’ve solved that problem too with tools like DragNUWA. I have to remind myself that all of this is bleeding-edge technology and the worst it will ever be.

I see things being created that absolutely blow me away, and the rate at which I see new mind-blowing things is increasing. It used to be that, years apart, I’d stumble across some amazing thing on GitHub’s trending tab. Then it became every few months. We’re now at the point where I’m discovering mind-blowing, bleeding-edge breakthrough technology every few weeks! The rate of growth is insane and it doesn’t appear that it will be stopping anytime soon. I still can’t believe that ChatGPT isn’t even 2 years old yet - its second birthday is in November. There are already short stories and videos being created by humans using entire AI toolchains and workflows. The quality is still bad but is miles beyond what an individual person could ever hope to accomplish in the timeframe that these things are being made. An animation that would have taken a single person several years to animate can now be done in a few weeks with an “acceptable-enough” result. I cannot wait to see where this technology will be in a few more years, and what people will create now that they have the power to create anything they can imagine in almost no time at all!

Until the bubble pops and the money disappears, I do not see the current trend slowing down. Even when the bubble pops it doesn’t mean improvements won’t keep getting made (the dot-com bubble didn’t freeze all internet progress back in the early 2000s). The curve is no longer linear growth but exponential growth. Either the creation of an artificial general intelligence is truly impossible or it will happen within my lifetime. I’m not so hopeful as to think it will be within a few years or even within the next decade - despite what the marketers for AI companies want people to think - but fuck me if it isn’t seeming more and more likely that it will happen within my lifetime and probably before I have any grandchildren.

Every passing day the technologies making all of this possible get a little bit better, and every single day continues to be the worst it will ever be.