It's funny how the result kinda looks closer to the "gray era" of videogame graphics, ca. 2008. In fact, it makes GTA V look like GTA IV. It replaces sunny California with cloudy Germany because of the training data. It's doing wonders for vegetation, though. This could be something.
Check out the later example based on another city's data. Much more lively, and you can see its attempts at reflections much better.
IMO they're "cheating" a bit by first using a training set that obviously has color problems (partly due to cloudy weather but I bet it also uses bad color profiles). The grayness looks "realistic" in the sense of "they wouldn't make anything that gray on purpose, it's gotta be real". The more colorful examples they show don't look quite as convincing, mostly like someone turned up the dials for saturation.
The most impressive part remains the grass, I've long thought that the "painted cardboard" look of videogame grass should be replaced by something more clever and an "AI shader" like this could be an interesting solution.
Makes a pretty neat demo. Some things that stood out to me:
I think they cherry-picked a good game for this demonstration. This works so well for GTA V because it is more or less already a real-world simulation.
Many of the other methods they compared against had issues with temporal stability, or broke down when the real-world photographic datasets diverged from the game's samples. The proposed enhancement system is still reliant on real-world datasets. How can this be applied to games or simulations that are not modeled on a realistic world? There are stylized game worlds, like Mario games, where it may not make sense to apply this at all, but what about "magical realism" styled games such as the Diablo series, or sci-fi stuff like Mass Effect?
The trees and cars look really good. Vegetation, specifically, is something that hardly any video game renders photo-realistically. Between the work of actual 3D modeling (often tool-assisted), texturing, and placement in a world, not to mention the re-use of object models that leaves repeated instances of the same trees scattered throughout a world, there is so much detail that is not usually modeled realistically. This solution seems to do a great job of filling in that last bit of detail and variation to make vegetation believable.
In the enhanced output, the original decals lose detail or have incorrect colors. Things like directions painted on the road, or traffic signs and billboards, don't seem to pass through the enhancement process without losing sharpness of lines, and their colors can become wildly inaccurate. This is problematic for the real-world semantics of street signs, especially since GTA V is set in America, whereas their photographic dataset came from Germany, which has entirely different standards for signage. I didn't see any stop signs, but that is a case where the red background with white text is semantically significant in the US. So the model has possibly learned a representative distribution of German traffic signs, but the colors are wrong for American traffic signs. This was especially apparent in one of the examples with a "no U-turn" sign on the left: the original sign has a black arrow, but the enhanced version strangely makes the arrow a deep blue. Given the distribution of colors of text and decals in German road signs, though, I'm not sure where this color divergence is being introduced from. I'm not sure how it could be overcome without collecting photos from a worldwide distribution with locale labels and then parameterizing the model to produce the correct signage based on the semantic segmentation of the sign objects and a given locale parameter value (essentially a locale transfer task).
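To make that "locale parameter" idea concrete, here's a minimal PyTorch sketch of my own (not anything from the paper); all the names and the locale vocabulary are made up. The point is just that the enhancement network could take a locale embedding alongside the sign segmentation, so the same model could render signage in US or German conventions:

```python
# Hypothetical sketch of a locale-conditioned enhancer. Everything here
# (LocaleConditionedEnhancer, LOCALES, the tiny conv trunk) is illustrative.
import torch
import torch.nn as nn

LOCALES = {"us": 0, "de": 1, "jp": 2}  # example locale vocabulary

class LocaleConditionedEnhancer(nn.Module):
    def __init__(self, num_locales=len(LOCALES), embed_dim=16):
        super().__init__()
        # Learned embedding for each locale, broadcast over the image plane.
        self.locale_embed = nn.Embedding(num_locales, embed_dim)
        # Toy convolutional trunk; a real system would be far deeper.
        self.net = nn.Sequential(
            nn.Conv2d(3 + 1 + embed_dim, 64, 3, padding=1),  # RGB + sign mask + locale
            nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, frame, sign_mask, locale_id):
        # frame: (B, 3, H, W) rendered game frame
        # sign_mask: (B, 1, H, W) semantic-segmentation mask for sign pixels
        # locale_id: (B,) integer locale labels
        b, _, h, w = frame.shape
        loc = self.locale_embed(locale_id)               # (B, embed_dim)
        loc = loc[:, :, None, None].expand(b, -1, h, w)  # tile over the image
        x = torch.cat([frame, sign_mask, loc], dim=1)
        return self.net(x)

# Usage: enhance a frame as if the scene were in the US rather than Germany.
model = LocaleConditionedEnhancer()
frame = torch.rand(1, 3, 128, 256)
mask = torch.zeros(1, 1, 128, 256)
out = model(frame, mask, torch.tensor([LOCALES["us"]]))
```

Training it would still require the worldwide, locale-labeled photo collection mentioned above; the conditioning only helps if the data covers each locale's signage.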
Using a convolutional network to convert GTA V footage to photorealistic scenes. I wonder how this will be used in the future for gaming, especially since it could create a whole new experience for older games.
Here's a video by twominutepapers detailing the paper.
This could be a complete upset to the photorealistic graphics teams of the world. Why have a pretty base game if you’re going to have a full AI pipeline on top that redraws the scene? What used to be the on-screen result buffer could become just an intermediary (like a z-buffer). Textures could all be color-coded checkerboards.
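As a toy sketch of that pipeline (my speculation, not the paper's actual architecture; every name here is a placeholder): the engine's buffers, including a deliberately crude albedo, feed a neural pass whose output is what actually goes on screen.

```python
# Speculative sketch: rasterized buffers become intermediate inputs
# to a neural pass that produces the final image.
import torch
import torch.nn as nn

def render_gbuffers(scene):
    # Stand-in for the engine's rasterizer: returns per-pixel buffers.
    h, w = 128, 256
    return {
        "albedo": torch.rand(1, 3, h, w),        # could be crude checkerboard ID colors
        "depth": torch.rand(1, 1, h, w),
        "normals": torch.rand(1, 3, h, w),
        "semantic_ids": torch.rand(1, 1, h, w),  # per-pixel object/material class
    }

# Placeholder "enhancer"; a real one would be a large image-to-image network.
enhancer = nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1))

def present_frame(scene):
    gbuf = render_gbuffers(scene)
    # The raster output is no longer what's shown; it's just one more buffer.
    inputs = torch.cat([gbuf["albedo"], gbuf["depth"],
                        gbuf["normals"], gbuf["semantic_ids"]], dim=1)
    with torch.no_grad():
        final_image = enhancer(inputs)   # the neural pass redraws the scene
    return final_image                   # (1, 3, H, W): the image that goes on screen

frame = present_frame(scene=None)
```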
One day we’ll look back on games of today and remark how people used to actually hand-paint textures. Removing that bottleneck means humans can spend more time modeling and programming. In the end, games can be bigger with less work.
Gathering training set stock imagery of different scenes will be a valuable job for a small number of people worldwide.
Maybe there will be more emphasis on the content of games rather than surface appeal? The gameplay would be the same with either version of the graphics.
My guess is it will be more like games in the 90s. “Whoa, 3D!”