A peek into the future of interactive computer graphics

Ars Technica’s recently published interview with game developer extraordinaire Tim Sweeney has given me the perfect excuse to finally sit down and write a few thoughts on the future of GPUs and real-time graphics in general.

In his interview Mr. Sweeney makes some interesting points about the next generation of graphics hardware & software architectures:

  1. 3D APIs as we know them are a thing of the past and will soon die, replaced by more flexible software rendering pipelines implemented with CUDA/Compute Shaders/OpenCL or other languages.
  2. (Some) fixed-function, non-programmable hardware units will still make sense for the foreseeable future.
  3. DirectX 9 was the last important revolution in 3D APIs; whatever followed, or will come next, won’t have such a dramatic impact on the lives of computer graphics engineers and researchers.
  4. A good auto-vectorizing C++ compiler on all next gen platforms is perhaps all we need; developers will take care of the rest.
  5. Next gen consoles might be based on a single massively parallel IC with general purpose computing capabilities plus some fixed function hardware units to speed up certain graphics related tasks such as texture mapping or rasterization.


Regarding the first point, I believe Tim Sweeney’s view is quite optimistic: many developers will be neither able nor willing to implement their own rendering pipeline. 3D APIs, as we know them, will perhaps slowly lose their relevance, but I don’t think their premature death is going to happen anytime soon.

There is some chance that all present and future big players in the market (namely NVIDIA, AMD, Intel and Microsoft) will agree on a common way to ‘hijack’ the current 3D pipeline, allowing developers to add new stages and to bypass old ones. This might be a good option for anyone who wants to be creative without entirely losing the benefits of something that is proven, works well and can be efficiently re-used. It might even open a whole new world of possibilities for middleware developers.

Fixed Function What?

For all those skilled in the art, point number two is a no-brainer. For example, the dedicated logic in TMUs (texture mapping units) performs tasks such as texture addressing, fetching, decompression and filtering. If you have ever written a software renderer then you have experienced first hand how hard most of these operations are to implement efficiently in software.
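To give a rough feel for the work involved, here is a minimal sketch of a single bilinear texture lookup done in software. It assumes a single-channel texture stored as a list of rows, clamp-to-edge addressing and texel centers at half-integer positions; the function name `bilinear_sample` is hypothetical, and real TMUs also handle decompression, mipmapping and many formats that are left out here.

```python
import math

def bilinear_sample(texture, u, v):
    """Sample a single-channel texture at normalized (u, v) in [0, 1].

    `texture` is a list of rows of floats. Texel centers are assumed to sit
    at (i + 0.5) / size, and out-of-range texels are clamped to the edge.
    """
    height = len(texture)
    width = len(texture[0])

    # Addressing: normalized coordinates -> texel space.
    x = u * width - 0.5
    y = v * height - 0.5
    x0 = math.floor(x)
    y0 = math.floor(y)
    fx = x - x0  # horizontal blend weight
    fy = y - y0  # vertical blend weight

    def fetch(ix, iy):
        # Clamp-to-edge addressing followed by the actual memory fetch.
        ix = min(max(ix, 0), width - 1)
        iy = min(max(iy, 0), height - 1)
        return texture[iy][ix]

    # Filtering: four fetches and three linear interpolations per sample.
    top = fetch(x0, y0) * (1 - fx) + fetch(x0 + 1, y0) * fx
    bottom = fetch(x0, y0 + 1) * (1 - fx) + fetch(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy
```

Even this stripped-down version needs address arithmetic, clamping, four memory reads and three interpolations for every sample, which hints at why dedicated logic is attractive for this job.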

Custom rasterization hardware is unlikely to disappear soon either. Even Intel, which will not employ rasterization logic on Larrabee, agrees that “rasterization is unquestionably more efficient in dedicated logic than in software when running at peak rates”. That’s why fixed function hardware will likely stay with us for many years to come.

It’s interesting to notice how with Larrabee Intel got rid of a long list of dedicated hardware blocks that have been part of GPUs for a long time. Here’s a list of the most important ones:

  • input assembly (already implemented in software on some GPUs).
  • pre- and post-transform vertex caches.
  • primitive assembly, culling & setup.
  • hierarchical z-buffer.
  • rasterization.
  • attribute interpolation (partially implemented in software on NVIDIA GPUs).
  • all output merge stages: alpha/stencil/depth tests, blending, alpha to coverage, etc.
  • color, z and stencil compression.
  • a plethora of obscure and relatively small fifos and caches.

Intel has clearly made a bold move here. They are taking huge risks and only time (and competition from other companies) will tell whether they are right or not.

Their software renderer seems to be incredibly well architected, and it’s a pity we had to wait so many years to see a big player adopt a tile based deferred renderer (TBDR). One of the few advantages of TBDRs over immediate mode renderers is that they can be more efficient at using programmable hardware and memory bandwidth, making some dedicated logic unnecessary. Say goodbye to color and z compression, and don’t forget to commemorate output merge stages (aka ROPs) for all the good work they have done over the last 15-20 years!
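As a rough illustration of the tile based idea, the sketch below bins screen-space triangles into fixed-size tiles using their bounding boxes. This is a generic, conservative binning scheme written under my own assumptions, not a description of Larrabee’s actual renderer; the name `bin_triangles` and its parameters are hypothetical.

```python
import math

def bin_triangles(triangles, screen_w, screen_h, tile_size):
    """Map each screen tile to the indices of the triangles that may touch it.

    `triangles` is a list of three (x, y) screen-space vertices each.
    Binning uses the bounding box, so it is conservative: a tile may list a
    triangle that does not actually cover any of its pixels.
    """
    tiles_x = (screen_w + tile_size - 1) // tile_size
    tiles_y = (screen_h + tile_size - 1) // tile_size
    bins = {}
    for index, tri in enumerate(triangles):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        # Skip triangles whose bounding box is entirely off screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= screen_w or min(ys) >= screen_h:
            continue
        # Range of tiles overlapped by the bounding box, clamped to the screen.
        tx0 = max(math.floor(min(xs)) // tile_size, 0)
        ty0 = max(math.floor(min(ys)) // tile_size, 0)
        tx1 = min(math.floor(max(xs)) // tile_size, tiles_x - 1)
        ty1 = min(math.floor(max(ys)) // tile_size, tiles_y - 1)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins.setdefault((tx, ty), []).append(index)
    return bins
```

Once the bins are built, a renderer can process one tile at a time and keep that tile’s color and depth values in fast local memory, writing them out only once at the end. That is where the bandwidth savings mentioned above would come from.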

Unfortunately we all know that nothing comes for free, and increased flexibility will have a cost (bills of this kind are usually paid in perf/mm² and perf/watt). On the other hand, giving up a big chunk of often idle dedicated logic is a great way to fit more programmable hardware on board, which is inherently less likely to be inactive at any given time. A simple picture of NVIDIA GT200 can give a rough idea of how much area is spent on fixed function units: at least a fourth of the chip area is devoted to non-programmable hardware.


DirectX 9 was a huge step in the right direction, and DirectX 10 is helping to consolidate that step by adopting new render state, driver and unified shading models. In contrast, for a variety of reasons that range from “what am I supposed to do with this?” to “it’s not a very clean design”, I am not exactly enamored with DX10’s geometry shaders or DX11’s three brand new tessellation stages. I think these recent developments show that, as we enter partially uncharted territory, we don’t yet know which direction to take.

That’s why, as we move towards more flexible and open rendering pipelines, computer graphics researchers and game developers will unleash their imagination and come up with interesting new ideas. We will certainly see old but high profile graphics research brought back to life (A-buffer anyone?) and used in real-time applications such as video games. Perhaps in ten years or more, after long and fruitful experimentation, we will settle on a new and specific rendering pipeline model and it will be “The Wheel of Reincarnation” all over again!

Is CUDA good enough?

We will soon have at least three different CUDA-like languages to play with: CUDA, OpenCL and DX11’s compute shaders, and each of them seems to be well suited to exploiting data level parallelism (DLP). Sweeney thinks we can fully implement a modern rendering pipeline with languages like C++ or CUDA, though I have a couple of concerns about CUDA-like languages:

  • CUDA’s memory model is complex and tied to NVIDIA hardware. Will it scale well on future hardware?
  • Many algorithms map poorly to DLP.

On the other hand, I expect CUDA and its younger siblings to evolve quite rapidly and embrace other forms of parallelism (it seems OpenCL will support some sort of thread level parallelism), and here lies my hope of seeing some major innovation in this area. Speaking of which, Intel is also working on the Ct programming language, which promises to breathe new life into the nested data parallel programming paradigm. Notice how all these new languages are based on dynamic JIT-style compilers: a necessary step in order to abstract code from specific hardware quirks, to maintain compatibility across the board and to ensure scalability across future hardware generations.

Tim Sweeney also advocates the use of auto-vectorizing compilers, which, it seems to me, tend to be effective only at exploiting DLP and not much else. That’s perfect for pixel shaders et similia, not so good for all sorts of tasks that don’t need to work on a zillion entities or that need some sort of control over how threads are created, scheduled and destroyed (unless you are brave enough to manually manage dozens or even hundreds of threads).
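A tiny sketch may help show the difference between work that maps well to DLP and work that does not. Python is used here purely as notation (it does not auto-vectorize anything), and both function names are hypothetical.

```python
def brighten(pixels, gain):
    """Scale every pixel and clamp to 1.0.

    Each output depends only on its own input, so all iterations are
    independent: this is the shape of loop a vectorizing compiler can spread
    across SIMD lanes.
    """
    return [min(p * gain, 1.0) for p in pixels]


def walk_chain(next_index, start, steps):
    """Follow a chain of indices for a fixed number of steps.

    Each step needs the result of the previous one (pointer chasing), so
    there is no independent work to hand out to SIMD lanes.
    """
    node = start
    for _ in range(steps):
        node = next_index[node]
    return node
```

The first function is what a pixel shader looks like in miniature; the second is typical of traversal-heavy code, where extra vector lanes would mostly sit idle.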

Can One Chip Rule Them All?

Following Mr. Sweeney’s suggestion: how likely is it that in a few years we will have a first game console entirely based on a single chip, or at least on a single massively parallel architecture? I don’t want to dig too much into this extremely interesting topic as I would like to discuss it at length in a future post, but let me say that what is in the realm of possibility is not always feasible (yet).

A Glimpse Of The Future

In this long post I have been talking extensively about a future where a rendering pipeline is more general, flexible and less tied to a specific hardware implementation, so it is perhaps time to show what this all means in terms of real change. I don’t want to take into consideration particularly exotic and unproven stuff, as I believe there is a lot of cool work to be done without having to throw the metaphorical baby out with the bathwater!

For instance, it has occurred to me many times that there is nothing inherently special in a stencil buffer that distinguishes it from a color buffer or a z-buffer, unless we take into consideration the status it assumes in the rendering pipeline thanks to the stencil test. While fifteen years ago it made perfect sense to have such a hardwired capability, now it feels more like an old gimmick that was not improved over the years while the rest of the pipeline was getting more modern and flexible.

While we are at it, what about alpha test, alpha blending and alpha-to-coverage? Why is the stencil buffer just using 8 bits per pixel? Why is the set of operations it supports so limited? Why can’t I have my own special alpha blending operations? And most of all, do these old features still make sense?

Of course they do, I use them all the time! But, like many other engineers and researchers, I find myself fighting them on a daily basis while trying to bypass their awkward limitations. There is a clear lack of generality and orthogonality with respect to the rest of the pipeline, and that’s why I am convinced that the whole set of output merge stages needs to be re-architected. We know that as the hardware evolves it gets rid of fixed function units, but these changes won’t automagically fix the software layers that sit on top of it.

It would be nice if we could remove these features:

  • stencil buffer & stencil test
  • alpha blending, alpha test and alpha to coverage.
and replace them with generic shaders that:
  • can be invoked before and/or after fragment shading
  • can read from and write to all render targets
  • can kill a fragment and/or generate a coverage mask for it (to avoid aliasing..)

For example, a stencil buffer would be just another render target (don’t forget we have had support for multiple render targets for years now), and these shaders could be automatically linked by the driver to the main fragment shader or kept separate and executed in multiple stages. I have to admit that while I’m writing these few lines I have something like Larrabee and its software renderer in mind, but I wouldn’t be surprised if two years from now the rest of the graphics hardware landscape ended up being much more similar to Larrabee than to current GPUs.
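To make the proposal a little more concrete, here is a minimal sketch of what such a generic merge shader could look like when emulated in software. Everything in it is an assumption made for illustration: the `run_merge_stage` and `masked_blend` names, the dictionary of render targets and the fragment layout do not correspond to any existing API.

```python
def run_merge_stage(targets, x, y, fragment, merge_shader):
    """Invoke a user-supplied merge shader for one fragment.

    `targets` maps a render target name to a 2D list. The shader may read
    from and write to any target, and returns False to kill the fragment.
    """
    return merge_shader(targets, x, y, fragment)


def masked_blend(targets, x, y, fragment):
    """Example shader replacing a stencil test plus alpha blending."""
    mask = targets["mask"]
    color = targets["color"]

    # "Stencil test": the mask is just another render target, with no fixed
    # 8-bit format and no fixed set of comparison functions.
    if mask[y][x] != fragment["ref"]:
        return False  # kill the fragment

    # "Alpha blending": any blend equation can be written here.
    r, g, b, a = fragment["color"]
    dr, dg, db = color[y][x]
    color[y][x] = (r * a + dr * (1 - a),
                   g * a + dg * (1 - a),
                   b * a + db * (1 - a))

    # "Stencil operation": an arbitrary update of the mask target.
    mask[y][x] += 1
    return True
```

In this sketch the stencil test, the blend equation and the stencil update are all ordinary code operating on ordinary render targets, which is the kind of orthogonality I am asking for. A real design would also have to define how overlapping fragments are ordered and synchronized, which this toy version ignores.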

Final Words

Even barring unexpected breakthroughs in display devices, I believe no one knows for sure how we will do graphics 10-15 years from now. That’s why it is hard to disagree with Mr. Sweeney when he notes that the next few years are going to be very exciting for engineers and researchers!


14 Responses to “A peek into the future of interactive computer graphics”

  1. NocturnDragon Says:

    Very nice post Marco! It surely is an exciting time in the 3D graphics field.

  2. Kyle Hayward Says:

    Excellent post!

    Along the lines of point 5, I agree that the hardware will move from GPGPU to just general purpose. And Larrabee is a prime example of this. Sure CUDA is nice, but it still has to massage data into a friendly format for the gpu. But with Larrabee, the software stack can easily switch from rendering to physics.

  3. Vincent Scheib Says:

    The challenge of the “blank page” will be daunting for many developers. As you point out, traditional pipeline concepts will be essential to get people started, and the catch is how do developers easily customize them later. How will they easily do that cross platform?

    An interesting time indeed for developers, and a challenging time for API authors! (And… engine developers…)

  4. Marco Salvi Says:


    I guess cross-platform development will be done as it’s done now, that is with an extra abstraction layer wrapped around a specific platform API. Even though this time APIs might be quite different from one another as they might expose wildly different hardware capabilities (so it won’t be easy!)

    I’m also interested in hearing your opinion, as a middleware developer, about the future of graphics middleware. With more open, flexible and programmable platforms, will you have more opportunity to differentiate your product from the competition, or will the current situation go unchanged?


  5. Davide Pasca Says:

    About 3D APIs. I agree with Tim Sweeney, though I wonder if it’s more wishful thinking.
    I remember the pain of not having 3D hardware acceleration.. but now the frustration of having to guess what the hardware does behind the API calls.
    Some of us are very much willing to change how real-time rendering works.. and APIs like Direct 3D and OpenGL are only a burden.

    Sometimes I wish RenderMan (just the interface, not REYES) would take over. A lot of the code would be about scene construction.. but at least it would allow anyone to import scenes made for off-line rendering ..and work out the real-time implementation from there.

  6. Ignacio Says:

    Marco Salvi wrote:
    > There is some chance that all present and future big players in the market (namely NVIDIA, AMD, Intel and Microsoft) will agree on a common way to ‘hijack’ the current 3D pipeline, allowing developers to add new stages and to bypass old ones.

    What do you think about GRAMPS? Do you think that could provide such an abstraction?


  7. Sam Martin Says:

    Hi Marco.

    Nice article! It’s definitely an interesting topic. I’ve been a bit concerned for a while that simply adding ‘more flexibility’ to GPUs is not really the most effective way to improve things anymore.

    I tend to think there are now 3 rather than 2 ‘jobs’ for processing units in modern games/graphics: CPUs, GPUs, and something similar to an SPU. In a modern PC the responsibilities of the third type are being subsumed to some degree by both the CPU and GPU, weakening the focus of both in the process. I’d (tentatively and with caveats) suggest it would be a good thing to see these responsibilities clarified rather than blurred, and for the G in GPU to be re-emphasised. For example, I’d really love to see the blend unit improved (although to be fair it is better on recent gpus) and crazy stuff like nans and infs thrown out once and for all 🙂


  8. Sam Martin Says:

    Just saw your follow-up about middleware. Just in case: this is my own opinion and may not be the opinion of Geomerics :).

    Unsurprisingly it’s just difficult to ship a product that runs on GPUs themselves currently, but I think things are moving in a better direction in this regard.

    Our focus with Enlighten is really on the SPU-like area where we can generate data for use in a clients own shader code rather than trying to share the gpu itself. It really is a nice middle ground between the scripted-dedicated-hardware (GPU) and regular libs. I guess this highlights the ‘bias’ behind my comment above, but IMO:
    – There’s a demand for such a separation.
    – It’s been demonstrated to be helpful in practice.

    .. And I was only going to post a two sentence comment earlier :). Oops.


  9. Marco Salvi Says:

    Kyle: I hope my comments about CUDA didn’t sound a bit too harsh. It’s without a doubt the best GPGPU language we have to date (and it’s not like no one tried before..), though I’d like to know more about OpenCL and DX11 compute shaders as they will target a broader range of GPUs.

    Ignacio: Thanks for the link you posted, I wasn’t aware of this particular work. I found it extremely interesting and it’s way more broad/general than what I had in mind! If a hypothetical next gen console came with a predefined 3D pipeline expressed through a similar abstraction, I can certainly see an average graphics engineer having fun modifying it to tailor it to a particular game/application.
    What’s your take on it? Do you think it makes sense to go forward with something like GRAMPS?


  10. Marco Salvi Says:

    Davide: So when are we going to see your renderman-language based real-time renderer? 😉

    Sam: Why do you think we need such a precise distinction between different computing capabilities&models? Is it to make sure we are disciplined and we stick to precise requirements and limitations in order to maximize performance?

    Are you guys also planning to port your optimized SPU code over to CUDA et similia? If all next gen consoles came with support for a standard DLP language (let’s say OpenCL) it could perhaps make your life much easier.


  11. Sam Martin Says:

    Just to clarify – it’s certainly possible to take my categorisation too far 🙂 Naturally, there are some clear technological advantages to unifying things (memory access springs to mind), as well as business-oriented ones which I think are significant and may be a bit overlooked occasionally (e.g. the ability to address a broader market).

    But, as you note, we are still in a realm where specialised hardware can provide a significant advantage. I just worry that in our attempt to unify and generify the current state of play we could mix everything together too much, and I think there’s some evidence for a rough 3-tier system being a helpful framework.

    We’ve certainly discussed CUDA, but it’s not on the list for this year. Both OpenCL and Gramps look very interesting and something like them would probably make life easier. There’s a lot of potential devils in the detail though. I’ll give the gramps article a further read.


  12. TimothyFarrar Says:


    In terms of algorithms which map poorly to DLP, conceivably on Larrabee you would be running these algorithms non-vectorized using scalar x86 code (and losing the 16x ALU capacity of vectorization)? Or is it just the granularity of thread creation which is the primary problem (i.e. being limited to sequential large batch invocation)?

  13. purpledog Says:

    The future, always the future… But I’m constantly fighting with the present! 🙂

    Switching between rendering targets (basically arrays) should be a basic operation, but it is so costly that tricks are needed to keep it fast. For instance, I often end up allocating a big array and treating it like lots of small arrays using lots of potentially buggy indirections.

    Sounds like a stupid limitation but it seems to be quite a fundamental one as well (emptying the pipe and so on). I’d like to see an API coming with a neat solution here so I can create all the rendering target I want (and stop worrying about it).

    Also… memory transaction are still painful, especially in the wrong direction (video ram -> ram). At least we now have special command to achieve that asynchronously but wait, why do I have to move this chunk of memory in the first place?
    And ok, if I have, why do I have to (asynchronously) wait for an entire frame (or more!) to hide the latency… Does the pipeline *really* needs to be so long??

    And other stuff as well, especially with opengl, like not being able to render into a luminance 8 bits format, not being able to send shader uniform per-chunk, this stupid limitation with texture size, and power consumption, and all the rest of it…

    Ok ok… Those problems are resolved on some Platform/API, and there are always workarounds. But my point is that there’s still a LOT to do to improve what’s already there. Without even changing the API…

    So when I read “3D APIs as we know them are a thing of the past and will soon die”, well, I cannot help thinking that if this is true, that will be an unfinished business, because we can still learn a lot from them. Actually, this quote reminds me of a young programmer starting to write it all over again from scratch because he was not happy with the code style (me 5 years ago basically :-). Slow iterative steps toward maturity, come on, there’s still a lot more to come (imo).

    Very interesting post by the way!!
    And sorry for being so long.

  14. Drazick Says:

    Could you write an updated outlook about the situation today?

    Much has changed.
