A group of researchers working for AI chipmaker NVIDIA, the third-biggest company in the world by market capitalization, in collaboration with graduate students from Stanford, UC San Diego, UC Berkeley, and UT Austin, trained an AI system on 81 classic Tom & Jerry theatrical shorts to see if they could create long-form animated sequences of up to a minute in length.
Ostensibly, the point of the exercise was to generate longer and more consistent AI video while overcoming the cost of “self-attention,” a mechanism whose compute requirements grow quadratically with the length of the sequence and which currently makes generating long-form AI video computationally prohibitive. All of the current public APIs for AI video generation are still limited by length: as of last month, OpenAI’s Sora has a maximum length of 20 seconds, Meta’s Movie Gen is 16 seconds, Luma’s Ray 2 is 10 seconds, and Google’s Veo 2 is eight seconds.
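To see why length is the sticking point, here is a rough back-of-the-envelope sketch in Python. It is not the researchers' code. It assumes plain full self-attention, where every token is compared with every other token, and a made-up figure of 5,000 tokens per second of video (`TOKENS_PER_SECOND` and `attention_pairs` are hypothetical names chosen for illustration). Only the clip lengths come from the figures quoted above.

```python
# Illustrative only: how the work done by full self-attention grows with clip length.
# TOKENS_PER_SECOND is a hypothetical value, not a number from the paper.
TOKENS_PER_SECOND = 5_000


def attention_pairs(seconds: float, tokens_per_second: int = TOKENS_PER_SECOND) -> int:
    """Count the query-key pairs that full self-attention must score for one clip."""
    tokens = int(seconds * tokens_per_second)
    return tokens * tokens  # every token attends to every token


if __name__ == "__main__":
    baseline = attention_pairs(8)  # the shortest limit quoted above
    for seconds in (8, 10, 16, 20, 60):
        ratio = attention_pairs(seconds) / baseline
        print(f"{seconds:>2}s clip: {ratio:.2f}x the attention work of an 8s clip")
```

Under these assumptions, a one-minute clip needs about 56 times the attention work of an eight-second one, even though it is only 7.5 times longer. The ratio does not depend on the token rate chosen, only on the quadratic relationship.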
The research team (which includes NVIDIA’s Jiarui Xu, Shihao Han, Ka Chun Cheung, Jan Kautz, Yejin Choi, Yu Sun, and Xiaolong Wang) explained how they did it online as well as in a paper that was published yesterday. They wrote, “The videos tell complex stories with coherent scenes composed of dynamic motion. Every video is produced directly by the model in a single shot, without editing, stitching, or post-processing. Every story is newly created.” (In its paper, NVIDIA does not acknowledge receiving permission from Warner Bros. Discovery to use its characters and copyrighted films for such an experiment.)
Below is one of the videos created by NVIDIA et al. that shows Tom working as an office employee in the World Trade Center:
At first glance, it looks impressive, but watch for a few seconds and it quickly falls apart with illogical performance, weird objects, inconsistent hook-ups, and only the faintest suggestion of coherent storytelling. While this is an experiment and should be viewed in the context of the goals it set out to achieve, it only serves to show that AI is still many years away from generating an entire animated film on its own. Also notable is the amount of text prompting it took to even achieve this. Here is the full prompt that was required to generate the above video:
The World Trade Center towers stand tall against a clear, bright blue morning sky. Streets bustle with pedestrians in suits, and yellow taxis move slowly through heavy traffic. Sunlight reflects sharply from glass windows of nearby buildings. Tom, the blue-gray cat, walks briskly along the wide gray sidewalk with a single black briefcase in his hand.
Inside the World Trade Center lobby, expansive marble floors reflect warm golden recessed lighting. Gray marble pillars and brass fixtures highlight the elegant entryway, along with a brass elevator door. A uniformed doorman wearing a dark navy-blue suit stands behind a polished wooden counter. Tom, the blue-gray cat, has a single black briefcase in his hand slowly presses the elevator button on the left side of the elevator and waits. Tom’s left hand is empty.
Under the large oak desk, a single intact, black cable runs neatly along beige carpeting toward a silver desktop computer tower. A clean white wall socket securely holds a single plug. Jerry, the brown mouse, carefully chews the black cable. Tiny and hardly noticeable sparks fly as he bites the cable.
A bright corner office features floor-to-ceiling windows overlooking the vast New York skyline beneath sunny blue skies. Cream-colored plain walls complement a large glossy oak executive desk. On the center of the desk, in front of Tom, is a silver computer monitor and silver keyboard, a silver penholder, and a black leather office chair. Tom, the blue-gray cat, now with an upset face, leans to the floor to check under the desk.
Under the large oak desk, a single black cable that has exposed copper wire in one spot runs neatly along beige carpeting toward a silver desktop computer tower. A clean white wall socket securely holds a single plug. Jerry, the brown mouse, stands near the chewed threw black cable while Tom, the blue-gray cat, has his head in the scene angry and surprised. Jerry hurries away from Tom out of the scene, while Tom removes his head from the scene.
A bright corner office featuring cream-colored plain walls and a brown, oak door is to the right of the scene. There is a small mousehold slightly to the left of the door at the baseboard. Tom, the blue-gray cat, runs toward the mousehole and slams his body into the wall. Tom sits there stunned with small cartoonish stars circling around his head which indicate dizziness.
A bright corner office featuring cream-colored plain walls and a brown, oak door is to the right of the scene. There is a small mousehold slightly to the left of the door at the baseboard. Tom, the blue-gray cat, sits there stunned. Tom, the blue-gray cat, checks the a small watch on his right wrist. Tom, the blue-gray cat, reads the time on his watch and notices that he is late for a meeting. In a rush, Tom, the blue-gray cat, stands back up, no longer stunned, opens the oak door on the right of the scene and runs through the doorway.
In the large conference room, plain cream-colored walls surround a long wooden table positioned at the center. A small black wall clock is positioned in the top right of the screen. There are several executives who are sitting in black office chairs are neatly arranged around the table. The executives are quietly focused on Spike, the gray bulldog, who stands at a whiteboard at the front of the room, while Tom, the blue-gray cat, stands left of Spike. Spike, the gray bulldog, angrily points at the small black wall clock at the top right of the screen while maintaining aggressive eye contact with Tom, the blue-gray cat. Spike, the gray bulldog, loudly reprimands Tom. Tom, now with a scared expression, covers his ears due to the yelling and makes an apologetic gesture.
In the large conference room, plain cream-colored walls surround a long wooden table positioned at the center. A small black wall clock is positioned in the top right of the screen. There are several executives who are sitting in black office chairs are neatly arranged around the table. The executives are quietly focused on Spike, the gray bulldog, who stands at a whiteboard at the front of the room, while Tom, the blue-gray cat, stands left of Spike. Spike, the gray bulldog, angrily points and hold his pointing to the left while looking at Tom. Tom sulks and then sadly walks to the left out of scene.
In the cozy mousehole interior with rounded walls, wooden furniture, and warm, soft lighting, Jerry, the brown mouse, continues laughing cheerfully as the scene gradually fades to black.