DirectX Forum FAQ

(Last Updated 14th August 2006)


Table of Contents


Forum Guidelines

Forum Best Practices

What Is DirectX?

-          Components

-          Deprecated Components

Getting Started

-          Software

-          Hardware

-          Books

-          Web Resources

Frequently Asked Questions

-          General

o        Gen #1: DirectX costs and licence

o        Gen #2: Using Managed DirectX 2.0

o        Gen #3: How to solve a problem yourself

-          Redistribution

o        Redist #1: Correctly redistributing DirectX components

o        Redist #2: The D3DX DLLs

o        Redist #3: Integrating DirectSetup with Visual Studio

o        Redist #4: The DXUT framework and other SDK sample code

-          Direct3D

o        D3D #1: What should I learn first?

o        D3D #2: Fixed Function or Programmable Pipeline?

o        D3D #3: I only want to do 2D graphics...

o        D3D #4: Alpha blending doesn’t work correctly or how do I do blending?

o        D3D #5: Hardware capabilities and enumeration.

o        D3D #6: Depth buffering problems.

o        D3D #7: I don’t see anything on the screen, why?

o        D3D #8: I tried my application on a different computer and it didn’t work or looks wrong/different.

o        D3D #9: I’m leaking memory, help!

o        D3D #10: Is x faster than y?

o        D3D #11: Creating and exporting .x files.

o        D3D #12: My textures are blurry or distorted.

o        D3D #13: Resource allocation best practices.

o        D3D #14: Speeding up locking of resources.

o        D3D #15: Debugging vertex and pixel shaders.

o        D3D #16: Getting information from/about shaders and effects.

o        D3D #17: Picking geometry.

o        D3D #18: Draw call overhead.

o        D3D #19: DrawIndexedPrimitive parameters.

o        D3D #20: Direct3D and multi-threading.

o        D3D #21: How to interpret the data returned by a Lock() operation

o        D3D #22: Using the debug runtimes

o        D3D #23: Texture creating/loading enumeration and checking

o        D3D #24: A perfect Direct3D9 application structure (includes handling lost devices)




Overview [Table of Contents]


This document covers the Frequently Asked Questions for the DirectX forum. To save your time (and that of the regulars), you should read this on your first visit.


Some basic information:

  1. All sections have been linked together using regular hyperlinks. You can click on the titles in the ‘Table of Contents’ (above) to jump straight to the section. It is also possible to post the links in forum posts if you wish to point someone towards a particular part of this document:
    <a href="link goes here">look at this part of the FAQ</a>

  2. There are many moderators and staff members that help keep GameDev.Net running smoothly, but you should direct any forum-specific requests/problems/fan-mail to jollyjeffers. The moderator has the ability to add/change the contents of this document as well as “stick” important threads to the top of the forum for greater exposure.

  3. Please report any broken links in this document. This document has been written to try and use external information rather than be yet another duplicate of information online. Obviously, this doesn’t work if links to external information are broken!

  4. If you wish to contribute to this document (or offer improvements/corrections to existing material) then please contact jollyjeffers. Provided it matches (or exceeds) the quality and depth shown throughout this document then 3rd-party content is more than welcome!




Forum Guidelines [Table of Contents]


Use of the forums is governed by the main FAQ. There are a number of additional, simple, rules that the forum moderator requests you adhere to:


  1. Use meaningful subject lines. The first part of your thread that anyone sees is the subject. If the subject line is meaningless people won’t even bother to read your thread. Ideally, a visitor to the forums should be able to have at least a reasonable idea as to what a thread is about before they choose to read it fully.

    Good examples:
    Extremely low frame rates with shadow volumes

    Bad examples:
    Problem! I need help!
    This makes no sense!

  2. Start the subject with [MDX], [C#], [VB.Net] (or similar) if you’re using the Managed DirectX library. Some people aren’t familiar with this method of DirectX programming so save their time and show up-front that you’re using Managed DirectX. Equally, some members may specialise in Managed DirectX development and having them identify your thread will only be a good thing!

  3. Pick the correct forum for your question. GameDev.Net has lots of sub-forums covering a wide range of different areas. Ask the correct people and you will get a better response in a shorter time period. Do not be offended if the moderators move your thread elsewhere – we do this to help you as well as other members of the community.

  4. Posting source code. In order to help us help you it may be necessary for you to post source code.

    1. Do not post ALL of your source code. Try to identify the important functions or fragments – the ones that are related to the problem you want help with.

    2. Try to avoid uploading your entire project/code and asking people to download it. The internet is full of nasty material such that very few people will trust unknown source code that they download from a web forum. If people won’t download it, they can’t help you.

    3. Use [source] tags for long pieces of code. If you are posting more than 10 lines of code, wrap it up in [source] … [/source] tags – the forum software is designed to provide syntax highlighting and retain formatting. If you don’t use these tags the formatting will be lost – making your code hard to read. If people find your code hard to read they won’t be able to help you.

    4. Use [code] tags for short pieces. For small fragments of code – less than 10 lines in total you can wrap it up in [code] … [/code] tags. The forum will not apply syntax highlighting but will preserve formatting.

  5. Include all relevant information – even if it seems obvious. Include exact error messages and numbers and where they came from. If it’s a compiler error then include the line of code that it is pointing to (if available), if it’s a runtime error then the line of code as well as any parameters/variables being used.

  6. Don’t “bump” threads until at least 24 hours have passed since the last reply. The forums are a resource for all members and should be shared fairly. Constantly “bumping” a thread so it appears at the top of the listing is considered selfish and rude. If no one has answered your question in 24 hours then you are welcome to bump it if it is important. For clarity, “bumping” a thread is where you add a reply that has no additional information and serves only to push it to the top of the forum listings (which are sorted by most recent replies). Remember that moderators have x-ray vision and are very good at spotting people who break this rule.

Breaking these rules will result in an increase of your warning level. If your warning level is increased you are still free to contribute and use the forums, but treat it as a sign that you need to reconsider your usage of the forum – suspensions and bans come after warnings, and you don’t want that!




Forum best practices [Table of Contents]


This section of the FAQ gives some hints/tips on how to make the most of the forums. We have a large community full of extremely talented and helpful individuals; if you follow these points you may find that you get bigger/better responses.


  1. Include all platform/configuration details. There are lots of combinations of lots of software available – telling us what you’re using will save everyone time. If you don’t provide enough information, the best you can hope for is that someone asks for more details – you won’t get an answer.

    1. What compiler and/or IDE are you using? Visual C++ 2003, Visual C++ 2005, Dev C++, GCC, Visual Basic 6, Visual Basic .Net 2002, Visual C# 2005…

    2. What operating system are you using? Windows XP Professional, Windows XP Home, Windows Vista…

    3. What version of DirectX are you using? 7, 7a, 8, 8.1, 9, 9a, 9b, 9c, 10…

    4. Which SDK are you using? DirectX 7.0a, DirectX 8.1, DirectX 9 Summer 2004, DirectX 9 June 2005, DirectX 9 February 2006…

    5. What are you trying to achieve? Don’t just describe the problem – describe what your intended outcome is. It might be that part of your problem is that you’re attempting to do something the wrong way!

  2. Try to solve your problem before posting. Other members will understand if this is difficult and not always possible, but showing people that you’ve at least tried will gain you favour. It might take several hours to get even a simple reply – alternatively you could spend 30 minutes and successfully solve it yourself! Refer to Gen #3: How to solve a problem yourself for more information.

  3. Do some research before posting. Searching the forums is a good start, but widening it to the whole internet is also a good idea. Refer to the web-resources section of this FAQ for some other good sources. Starting a new thread in the forum does not have to be your last resort, but equally it shouldn’t be your first.

  4. Read the “sticky” topics at the top of the forum. The forum moderator has stuck these to the top because they are worth reading – they contain information about new tutorials, samples, SDKs or might contain an interesting discussion that you stand to learn something from.

  5. Show your appreciation to people who help you. The forums assign a “rating” to each member – you can read more about it here. If someone is particularly helpful and answers your question (or you feel they have provided a valuable contribution to someone else’s question) then click the “Rate This User” link that appears below all posts. The most respected and appreciated members appear in the top-50, several of which are regular visitors to the DirectX forum. This obviously works both ways – if you help another member they may well choose to rate you up for your effort.




What Is DirectX? [Table of Contents]


The Wikipedia entry on DirectX gives a solid overview of the API and its background; but put simply it is a feature-rich and high-performance suite of multimedia tools, technologies and APIs for Microsoft’s Windows platform. Typically it offers fast access (hence the “Direct” part of its name!) to the advanced features of a PC’s hardware.


DirectX exists in two main forms – the runtime and the SDK (Software Development Kit). Any software that is created using the DirectX technologies will be dependent on the DirectX runtime (see the FAQ on distribution). Only the actual software developers require the SDK to be installed.


Developers often use DirectX for gaming applications, but this is by no means the only use for the technology – more general multimedia, business and scientific applications are equally feasible.


As of the time of writing, DirectX 9.0c is the newest officially released API. It is a standard system component in ALL Windows XP Service Pack 2 machines. The DirectX suite is under constant development and is being updated for the forthcoming Windows Vista operating system; in particular Direct3D 10 is due to make its appearance at the same time. Direct3D 10 will be a standard part of the Windows Vista OS, but will NOT be supported or available on any other OS (e.g. Windows XP).


DirectX itself is a collection of several related components, each serving a specific multimedia task:


Current Components

The currently available components in DirectX 9.0c are:


Direct3D – Computes and displays both 2D and 3D graphics and allows access to advanced Graphics Processing Unit (GPU) features.


DirectInput – Used to receive input from a variety of devices: Mice, keyboards, joysticks, steering wheels to name a few. Also contains ‘XInput’ for utilizing the Windows Common Controller (Xbox 360 controller).


DirectSound – For playback of sound and audio effects (although, not typically used for playing music). It can also handle 3D positioning and effects (via DirectSound3D) as well as applying effects to sounds (such as echo or reverb). DirectX Audio also contains XACT – the Cross-platform Audio Creation Tool. XACT contains elements for audio designers as well as programmers.


Deprecated Components

As previously mentioned, DirectX is now on its 9th major version and close to the 10th. For various reasons some components have been deprecated, and this manifests itself in two forms. Firstly, a deprecated component might stay in the main DirectX SDK and simply not get updated along with the other components; secondly, a deprecated component might be removed from the DirectX SDK completely – in this case it will often appear in the “Platform SDK”. A deprecated component is no longer updated or supported by Microsoft – you are still able to use it, but it is strongly advised that you DON’T start developing new software with it.


DirectDraw – A 2D specific graphics API. Direct3D should be preferred over DirectDraw – it offers all of the same features and many more!


DirectPlay – A networking and multiplayer gaming API. Microsoft strongly recommends the use of Windows Sockets and the Firewall API’s. The Multiplayer and Network Programming forum (be sure to read the FAQ page) is a better place to ask any further questions.


DirectMusic – For general music playback, although not via the popular MP3/WMV compressed audio formats. DirectMusic handled much of the finer details of playing multiple music segments (allowing for dynamic “mood” changes according to game-play) and solved various timing/streaming problems.


DirectShow – For audio/video playback, provides similar functionality to media players and also allowed streaming to output files. DirectShow was moved to the Platform SDK as of April 2005, refer to your installed samples/documentation (the PSDK is a prerequisite for DX, so you should have it!) for more details.



Getting started [Table of Contents]


If you are completely new to DirectX software development, the following is your shopping list – provided you have a valid Windows licence everything is available for free. There is absolutely no excuse for software piracy when developing DirectX applications. If you are unfamiliar with some of the general terms in this document (SDKs, compilers, IDEs etc.) then you may want to take a step back and drop by the For Beginners forum and Getting Started section.




Frequently Asked Questions [Table of Contents]


This is obviously the real part of this document… The following are a selection of commonly asked questions. Some of the questions are intentionally vague – giving a full and complete answer would require a large amount of explanation. In some cases there is no clear correct or incorrect answer, instead the answer is written with the intention of allowing you to assess both options and choose which one suits you.


General [Table of Contents]


Gen #1: DirectX costs and licence [Table of Contents]


There are no costs specifically involved with developing DirectX based software. Indirectly you may have to spend money purchasing other software (such as Microsoft Windows or Visual Studio) but provided you have a legitimate copy of Windows you can download the DirectX SDK completely free of charge.


There are licence agreements that you must agree to upon installing the SDK; these govern what you can (and can’t) do with the information contained within it. For the most part there is unlikely to be anything in these agreements that restricts normal use, but it is always worth familiarising yourself with the terms and conditions. Once the SDK is installed you can find the relevant agreements in the %DXSDK_DIR%\Documentation\License Agreements\ folder.


Whilst not specifically related to DirectX (even less so now that DirectShow belongs to the Platform SDK) there are restrictions on the use of MP3 encoded material. Microsoft holds a licence to include MP3 technologies in their products, which you can use via the DirectShow API, but this does not imply any rights for anyone else to distribute encoded content. In practical terms this means that you can enable your software to play back MP3 content, but distributing MP3 files with your application might require you to acquire a separate licence. Refer to the relevant licensing terms for more details.




Gen #2: Using Managed DirectX 2.0 [Table of Contents]


The current, officially supported, form of DirectX for use with the Managed (aka .Net) languages is version 1.1. For several DirectX SDK releases in 2005 and 2006 a beta of version 2.0 was shipped (to tie in with the release of Visual Studio 2005 and .Net 2), but to many people’s surprise it was announced that it would NEVER be released in a final/non-beta form. Instead, at the Game Developers Conference in 2006 a successor, the XNA Framework (part of Microsoft’s XNA strategy), was announced. The beta version of MDX will continue to ship in the SDK until the XNA Framework is ready to replace it.


The bottom line: Do not develop any production code with the Managed DirectX 2.0 libraries. This is not to say that you shouldn’t use it at all, just that you should be careful not to rely on it as a business- or project-critical technology. Refer to ZMan’s write-up for more information.




Gen #3: How to solve a problem yourself [Table of Contents]


This FAQ entry is relevant to the majority of software development questions, but there are a few specific details regarding the development of DirectX applications.


Learning how to solve your own problems is an essential skill. You will often find that the more experienced members replying to threads know the answers from experience due to following a similar set of steps to those outlined below.


For general software debugging, you may want to refer to an Introduction to Debugging by Richard Fine if you’re unfamiliar with the basic techniques.


Use the debug runtimes


Use of the debug runtimes should be standard practice when doing any DirectX based software development, but they are absolutely essential when debugging a problem. For most SDKs the DirectX control panel can be found in the regular Windows control panel; as of the August 2006 SDK it has moved to the appropriate installed start-menu folder. Running this application allows you to enable the “debug” or “retail” runtimes for each major component of the DirectX suite along with an output level to control the granularity of messages.


All code should check return codes (the FAILED() and SUCCEEDED() macros are particularly useful here, as are the V() and V_RETURN() macros defined by most DX samples). However, the return codes will only reveal a vague reason as to why a call might be failing. The debug runtimes will often include a descriptive text message in the “Output” window of your IDE explaining what the exact error was.
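The checking pattern described above can be sketched as follows. This is a minimal, self-contained illustration: the HRESULT type and the FAILED()/SUCCEEDED() macros are re-declared here (mirroring their Windows SDK definitions) so the sketch compiles without d3d9.h; in a real project you simply include the SDK headers. CreateSomeResource() is a hypothetical stand-in for any Direct3D call.

```cpp
#include <cstdio>

// Stand-ins mirroring the Windows SDK definitions, so this sketch builds
// without d3d9.h; real code gets these from the SDK headers instead.
typedef int HRESULT;                              // HRESULT is a signed 32-bit value
#define S_OK               ((HRESULT)0)
#define D3DERR_INVALIDCALL ((HRESULT)0x8876086C)
#define FAILED(hr)    (((HRESULT)(hr)) < 0)       // error codes have the high bit set
#define SUCCEEDED(hr) (((HRESULT)(hr)) >= 0)

// Hypothetical stand-in for any Direct3D call that returns an HRESULT.
HRESULT CreateSomeResource(bool simulateFailure)
{
    return simulateFailure ? D3DERR_INVALIDCALL : S_OK;
}

// The pattern: check every call and bail out early with the failing code.
HRESULT InitialiseResources()
{
    HRESULT hr = CreateSomeResource(false);
    if (FAILED(hr))
    {
        // With the debug runtimes enabled, a descriptive message for this
        // failure also appears in the IDE's Output window.
        std::fprintf(stderr, "CreateSomeResource failed: 0x%08X\n",
                     static_cast<unsigned>(hr));
        return hr;
    }
    return S_OK;
}
```

The V() and V_RETURN() macros from the SDK samples wrap essentially this check-and-return pattern into a single line.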


Look at the documentation


Intellisense and books/tutorials will only get you so far; if you’re having a problem with a particular function or feature then look at the documentation. The C++ as well as Managed documentation is available online as well as being part of the SDK installation. Examining the documentation might reveal a special case or characteristic that your code is not correctly handling.


Search the web


If you can’t find the information directly from the documentation then using MSDN search is the next step. The MSDN search engine will index the official documentation, the MSDN forums, CodeZone and the blogs of various Microsoft employees.


Searching the GameDev.Net forums is also an excellent way to find useful information regarding your problem. The forums have hundreds of thousands of discussions covering a huge range of problems and topics. It’s an easy way to get quick answers.


If neither of these yields useful results then widening the search via Google (or your favourite equivalent) is a good idea.



By this point you’ve either solved your problem or you’ll be ready to start a new thread in the DirectX forum. Just remember to include a full description of your problem, what you’ve tried to do in order to solve it and any and every relevant piece of information.




Redistribution [Table of Contents]


Redist #1: Correctly redistributing DirectX components [Table of Contents]


As with all software you should legally redistribute any and all dependencies with your package. Not only is it good practice, but most end-users don’t appreciate having to download and install extra packages just to get your software running. For a Windows/PC game the first thing most users see is the setup program – first impressions are important, so making it as hassle-free as possible can only be a good thing.


Despite DirectX being a system-level component it still requires distribution and installation. This is simply due to the large number of combinations available (from older Win9x systems through to XP, Vista all with a variety of service packs) such that you can’t be 100% sure which version of the DirectX runtime is actually installed.


Installation of DirectX should be performed using the DirectSetup component – your software should not attempt to manually install/register the DirectX libraries. Thankfully, the DirectSetup API is extremely simple and will handle all of the dirty work for you. Refer to the ‘Installing DirectX with DirectSetup’ documentation, the samples and the reference for more details.
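The shape of a DirectSetup call can be sketched as below. Note the loud caveats: the real DirectXSetup() function and DSETUP_DIRECTX flag come from the SDK's dsetup.h and dsetup.lib, so both are mocked here (with an illustrative flag value and a trivial body) purely so the pattern stands alone; consult the 'Installing DirectX with DirectSetup' documentation for the authoritative signatures and return codes.

```cpp
#include <cstddef>

// Mocked stand-ins so this sketch builds without dsetup.h; the flag value
// below is illustrative only, NOT the real SDK value.
typedef void* HWND;
typedef unsigned long DWORD;
const DWORD DSETUP_DIRECTX = 0x1;

// Mock of DirectXSetup(): the real function inspects the system, installs
// or updates any missing components from the given redist folder, and
// returns a non-negative value on success (negative DSETUPERR_* on failure).
int DirectXSetup(HWND hWnd, const char* redistPath, DWORD flags)
{
    (void)hWnd; (void)flags;
    return (redistPath != NULL) ? 0 : -1;
}

// Typical installer usage: point DirectSetup at the folder containing the
// redistributable files and treat any negative return as a fatal error.
bool InstallDirectXRuntime(HWND owner, const char* redistPath)
{
    return DirectXSetup(owner, redistPath, DSETUP_DIRECTX) >= 0;
}
```

In a real installer the redist path would point at the files copied from the SDK's \Redist\ folder, and a failure would abort the product's setup with a clear message.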


The actual files required for redistribution are located in the \Redist\ folder and the accompanying legal information is found in \Documentation\License Agreements\.


There are a number of useful articles included in the DirectX SDK that are worth reading:


  1. Installation and Maintenance of Games
  2. Install-On-Demand for Games
  3. Simplifying Game Installation


It is advised that you read these articles before designing your redistributable package/software.


A particularly important point is that you shouldn’t allow the end-user to skip the DirectX installation. It has been quite common over the years to find an “Install DirectX 9.0” checkbox on a setup page – why?! If the application requires the runtime then giving the user the choice is fundamentally the same as allowing them to cancel/quit the installation of your application. DirectSetup is designed to quickly return if the system is up-to-date and no installation/changes are required – you’re not going to be saving the user any time by allowing them to skip it!


File sizes can be prohibitive for some game distributions; the Redist #3 entry creates a simple installer that weighs in at nearly 30MB just for the DirectX components. There is no easy solution to this problem, but referring to the ‘Reducing the DirectX Redistribution Size’ documentation may be useful.


Microsoft does provide a web installer (updated for each SDK release) that contains the latest-and-greatest versions. The web-installer will analyze the host system and then download and install any missing components. If you do not wish to directly include the DirectX redistributable with your package then you may wish to consider pointing users at the web-installer – checks using DirectXSetupGetVersion() and D3DXCheckVersion() functions should be sufficient for detecting whether this is necessary.




Redist #2: The D3DX DLLs [Table of Contents]


D3DX is a helper library that sits alongside the core Direct3D runtime and provides powerful as well as convenient functionality that is often taken for granted in most D3D-based applications. Being a helper library specifically for software developers, it does not (and never did) qualify as part of the core Direct3D runtime. The relevance in the context of redistribution is that the basic DirectX runtime download a user can get will NOT include D3DX.


Up until the December 2004 SDK update the D3DX library was included as a statically linked library – in simple terms its code is copied into the final executable by the linker. This was quite convenient as it required little (or no) consideration for redistribution, but it posed maintenance problems for Microsoft and was converted to a dynamic library as of the February 2005 SDK. If a problem (e.g. a bug or security risk) is found in D3DX then Microsoft can publish the fixed component via their Windows Update system – a process that would not be possible if D3DX remained a statically linked library.


It is worth noting that some developers attempt to compile against earlier SDKs so as to avoid the file-size overhead of including the D3DX DLL. Whilst this might sound like a great idea (for those creating applications where download size is critical) it might not make much difference in practice. A statically linked D3DX still has its code added to the final executable – only compiler optimization (e.g. dead/redundant code elimination) can reduce this. If your application makes substantial use of D3DX then the compiler may not be able to strip much, and the overall result is not that much smaller than just including the D3DX DLL.


The D3DX DLL appears with a filename in the form d3dx9_#.dll, and (at the time of writing) the following versions are available:


d3dx9_24.dll – February 2005 SDK

d3dx9_25.dll – April 2005 SDK

d3dx9_26.dll – June 2005 SDK

d3dx9_27.dll – August 2005 SDK

d3dx9_28.dll – December 2005 SDK

d3dx9_29.dll – February 2006 SDK

d3dx9_30.dll – April 2006 SDK


(Note: there were no updates in October 2005, June 2006 or August 2006)


The # used (e.g. 28) corresponds to the D3DX_SDK_VERSION value found in d3dx9core.h and can be used by D3DXCheckVersion() to ensure that the installed components match those used to build the executable.
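The version handshake can be sketched as follows. Everything here is a mocked stand-in: in real code D3D_SDK_VERSION comes from d3d9.h, D3DX_SDK_VERSION from d3dx9core.h, and D3DXCheckVersion() from the d3dx9 library; the numeric values and the function body below are illustrative only, so the sketch compiles stand-alone.

```cpp
// Mocked sketch of the version check; values are illustrative, NOT the
// authoritative SDK definitions.
typedef unsigned int UINT;

const UINT D3D_SDK_VERSION  = 31; // illustrative: defined by d3d9.h
const UINT D3DX_SDK_VERSION = 28; // illustrative: e.g. December 2005 (d3dx9_28.dll)

// Mock: the real function asks the *installed* runtime whether it matches
// the header versions the executable was compiled against.
bool D3DXCheckVersion(UINT d3dVersion, UINT d3dxVersion)
{
    const UINT installedD3D  = 31; // pretend the target machine is up to date
    const UINT installedD3DX = 28;
    return d3dVersion == installedD3D && d3dxVersion == installedD3DX;
}

// Call this at start-up, before creating the device; if it fails, direct
// the user to the DirectX installer or web-installer.
bool RuntimeMatchesBuild()
{
    return D3DXCheckVersion(D3D_SDK_VERSION, D3DX_SDK_VERSION);
}
```

A mismatch here typically means the user installed the core runtime but not the D3DX DLL your build expects, which is exactly the situation DirectSetup is meant to fix.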


To allow a D3DX DLL to be correctly updated by Microsoft (as well as correctly shared by other applications) it must be installed into the system. Whilst it might be tempting to include the DLL directly alongside the application and skip any installation/setup, this is not permitted by the licence agreement. You must use the DirectX Setup API (or installer) to distribute the D3DX library. If D3DX were not correctly installed (or if your software used an incorrectly installed version) then a security risk or notable bug that causes problems with your software is likely to cause end-users to blame you rather than Microsoft. This is especially apparent if other applications use a correctly installed (and updated) version that does not show any of the symptoms seen in your software.


Refer to the ‘Installing DirectX with DirectSetup’ article in the SDK paying particular attention to the ‘Reducing the DirectX Redistribution Size’ section if you want to create a minimal installation package.




Redist #3: Integrating DirectSetup with Visual Studio [Table of Contents]


There are a large number of installation programs and technologies available but depending on your version of Visual Studio you may have ‘Setup and Deployment’ project templates available. This offers a very simple and capable way of creating a redistributable package for your application.


The following steps are taken from Visual Studio 2005 Professional Edition but should be valid (or very similar) for other versions of Visual Studio.


  1. Go to the ‘File’ menu, select ‘New’ followed by ‘Project’.
  2. On the left-hand side, expand the ‘Other Project Types’ branch
  3. Select ‘Setup and Deployment’ and then ‘Setup Wizard’ from the options on the right-hand side
  4. Select ‘Create a setup for a Windows application’ entry when prompted
  5. Go to the ‘View’ menu, select ‘Editor’ followed by ‘File System’
  6. Under the ‘Application Folder’ create a new folder called ‘DirectX’
  7. On the right-hand side, right click and select ‘Add’ and ‘File…’
  8. Browse to your DirectX SDK installation’s Redist folder
  9. Select the necessary files as indicated by the ‘Installing DirectX with DirectSetup’ entry – for example, the files needed for a 32-bit Windows XP installation using the December 2005 D3DX library (including dxsetup.exe)
  10. Go to the ‘View’ menu, select ‘Editor’ followed by ‘Custom Actions’
  11. Right-click on the ‘Install’ text and select ‘Add Custom Action…’
  12. Use the dialog to browse to the ‘dxsetup.exe’ file added as part of step 9
  13. Select the new ‘dxsetup.exe’ entry and bring up the ‘Properties’ dialog
  14. Enter ‘/silent’ in the ‘Arguments’ field – this will stop the installer from popping up its own dialogs and/or messages
  15. Go to the ‘Build’ menu and select ‘Build Solution’; Visual Studio should complete the build process with no errors.


The above process generates two files (an .exe and an .msi) ready for redistribution, totalling 29.4MB.


This is a very simple way of integrating a DirectX installation into a traditional setup program – depending on requirements it is possible to come up with much more robust and/or complex solutions.




Redist #4: The DXUT framework and other SDK sample code [Table of Contents]


The DirectX SDK contains a wealth of example source code and applications as well as a competent framework (see ‘DXUT Overview’). By using the ‘Install Project’ feature of the Sample Browser it is very easy to base your application on source code provided by Microsoft. Unlike other, more restrictive, source code licences, Microsoft grants you the ability to use, modify and distribute software based on their sample code:


Clause 2a,i:

* Sample Code.  You may modify, copy, and distribute the source and object code form of code marked as "samples" as well as those marked as follows:




The DXUT framework is included under the \Samples\ branch of the SDK’s file layout and thus qualifies under the above statement. Refer to ‘DirectX SDK EULA.txt’ in your \Documentation\License Agreements\ folder – the above is taken from the December 2005 SDK and could quite legitimately change for different SDK versions. Always double-check against the agreements included in the SDK you are using.


It is worth noting that the above clause specifically indicates two folders (Maya and Max) from the utilities – but several other utilities are also included in source code form and the licence agreement does not explicitly allow redistribution of these.




Direct3D [Table of Contents]


D3D #1: What should I learn first? [Table of Contents]

Everyone learns in a different way from different starting points – this FAQ entry will hopefully give you some broad guidance and general suggestions to get you started. Ultimately it will be down to you to select the route (and resources) that best suit you.


As with many technologies, there is both a theoretical and a practical aspect to Direct3D. Mathematics is an important foundation for many of the processes and operations used in all forms of computer graphics; a basic understanding of vectors, matrices, planes and coordinate systems is important. The D3DX API will help insulate you from many of the low-level mathematical details, but it still helps to have an idea of what is happening behind the scenes and, most importantly, why certain mathematical properties/methods are being used.


If you feel that your mathematical background isn’t great, or if it’s been a few too many years since your last school lessons, picking up a book on the subject will be a worthwhile investment. Mathematics for 3D Game Programming and Computer Graphics, 2nd Edition by Eric Lengyel is a good choice – it covers the key points whilst still being easy to approach.


It is worth realising from the outset that Direct3D is a complex API to master; it will take time and effort to get to a level where you can produce high quality and robust graphics applications. It is tempting to look at both older and recent commercial games and think it’ll be easy – sadly this is rarely the case! With this in mind you are well advised to pace yourself – don’t jump in at the deep end. Appreciate that it will take time and that you should build up your knowledge/ability/experience in steps.


As discussed in the general Getting Started section your first steps should be to download/configure the DirectX SDK and your IDE/compiler. This FAQ entry assumes you have this all up-and-running and are ready to begin with Direct3D. Take some time to familiarise yourself with the contents of the DirectX SDK:


  1. The help files.
    These should be available in the SDK’s start-menu group; if not, head for the
    %DXSDK_DIR%\Documentation\ folder. Documentation is split between “Managed DirectX” (directx9_m.chm) and regular/native DirectX (directx9_c.chm); the latter is usually the better of the two – even if you’re working with the .Net API it pays to check the content in the C/C++ documentation. Ultimately the API is the same – just the method of using it changes, so documentation is documentation! The help files are roughly split into two parts – the “Programming Guide” and the “Reference”. When starting out you will probably want to peruse the “Programming Guide”, but as you get more experienced jumping straight to the “Reference” section becomes common.

  2. The sample browser.
    This should also be available as a shortcut in the SDK’s start-menu group; if not,
    %DXSDK_DIR%\Samples\SampleBrowser\SampleBrowser.exe should get it running. This tool is your primary method for accessing the sample code and articles shipped as part of the SDK. The browser is straightforward and intuitive to use, and it pays to spend some time running the different samples just to see what is possible as well as what is available. Pay particular attention to the “Install Project” link that most samples include; this allows you to create a copy of the necessary files for you to work on without damaging the original install copy. The “Install Project” method is the DirectX SDK’s equivalent of older “App Wizard” templates built into Visual Studio.

  3. The tutorials.
    The DirectX SDK contains a number of tutorials for both .Net (
    %DXSDK_DIR%\Samples\Managed\Direct3D\Tutorials) and C/C++ (%DXSDK_DIR%\Samples\C++\Direct3D\Tutorials). These are accompanied by tutorial documentation in the help files (look for the “Tutorials and Samples” section in the contents).

  4. The samples.
    The majority of the source code shipped with the SDK is in the form of samples. These tend to be substantially more advanced than the aforementioned tutorials, but should give you good examples for understanding the more interesting aspects of the API. Bear in mind that the samples are IHV-neutral; they take advantage of hardware features where possible but they aren’t specifically optimized to use IHV-specific functionality. It’s a subtle difference, but you may find that ATI’s Radeon SDK and Nvidia’s SDK contain a few useful tips-and-tricks when it comes to advanced API usage. The samples can all be launched via the sample-browser, or are located in the
    %DXSDK_DIR%\Samples\ folder.


Given that the SDK is a completely free and self-contained resource you should spend time learning what you can from it before branching out.


Once you’ve explored the SDK, consider searching online – there are several highly regarded websites offering free tutorials, samples and general information about Direct3D programming. The quality of this material varies greatly, but given that it is still free it can’t hurt to invest some time reading. Refer to the ‘Tutorials and Articles’ list for some recommended websites.


Books are an obvious choice when it comes to learning a new API – there is often no shortage of technical books in your local (or online) bookstore. GameDev.Net maintains a list of DirectX-specific books. It is worth bearing in mind that the DirectX SDK moves relatively fast – with new SDKs released every two months, problems can arise when copy-and-pasting from a book (or when trying to compile the code on the book’s CD using a downloaded SDK).


Introduction to 3D Game Programming with DirectX 9 (and the newer Introduction to 3D Game Programming with DirectX 9.0c: A Shader Approach) by Frank Luna come highly recommended. The accompanying website may be of interest.


So far this FAQ entry has considered available resources; that aside it is useful to note that there are different aspects of the API to learn.


Firstly, Direct3D is not just a 3D API (despite the name) – you can also use it for 2D rendering (see D3D #3: I only want to do 2D graphics...). Many people find it easier to pick up the concepts and characteristics of the API by starting with simple 2D graphics – it is possible to utilize many of the important API features without venturing into the world of 3D.


Secondly, Direct3D is available in two flavours – “Fixed Function” and “Programmable”. The former is more traditional – you tell the API what you want to do by configuring/calling various state-changing methods; the latter allows you a huge amount more freedom by letting you write short programs (“shaders”) that are executed by the GPU. Simply put, the fixed-function pipeline is legacy-only – something for older hardware (especially pre-D3D8) – and is not a good choice going forwards. If you are starting to learn now you want to avoid it – difficult as that might initially be (versions prior to D3D10 are a hybrid of fixed-function and programmable), it will be easier in the long run. Not only does Direct3D 10 drop the legacy pipeline, but shaders are so powerful that they dominate modern texts, research papers, articles, samples (etc…) – finding information about the older technology gets more and more difficult as time progresses.


To learn the API in a forward-thinking way, focus on the programmable pipeline with shaders written in HLSL (High Level Shader Language), optionally via the effects framework (the effects framework is convenient, but not essential). See D3D #2: Fixed Function or Programmable Pipeline? for further discussion.


Finally, in research for this FAQ entry a thread, “Your moderator needs your help: How did *you* start learning Direct3D?”, was started to gather feedback. You may find it useful to see various forum members’ answers.




D3D #2: Fixed Function or Programmable Pipeline? [Table of Contents]

There are two fundamental ways of producing graphics using Direct3D – via the traditional “Fixed Function” route or the more flexible “Programmable Pipeline”. The fixed-function route is now considered a legacy method and is effectively a left-over from versions prior to Direct3D 8. As of version 10 the fixed function route will no longer be available.


The most obvious functionality provided by the fixed-function pipeline is that of transformation and lighting. For example, the SetTransform(), SetLight(), LightEnable() and SetTextureStageState() functions are only relevant to the fixed function pipeline. Using FVF (“Flexible Vertex Format”) codes for describing vertices is also associated with the fixed-function pipeline.


The programmable pipeline makes use of vertex, geometry (Direct3D 10 only) and pixel shaders. These small programs run custom functions and operations on the input data and allow for substantially more flexibility; they can also express more complex graphical effects than the fixed function pipeline. Functions such as SetPixelShader(), SetVertexShader() and SetVertexDeclaration() are for the programmable pipeline.


Whilst there may be a few isolated cases, there is little if any difference in performance between the fixed and programmable pipelines. Obviously, as with all software development, it is possible to write code that underperforms for no good reason. Whilst the implementation details of the hardware and drivers are mostly kept secret, it is generally accepted that the driver will translate API commands into GPU-specific “micro-code” – thus the actual hardware doing the work won’t be able to tell if it is operating on fixed-function or shader-based code.


In light of Direct3D 10 dropping the fixed function, it makes sense to focus any current and future development on a shader-based architecture. Depending on the complexity and size of the code being used this may well be difficult to do, but there is at least one useful trick to try. The Direct3D 9 Effects Framework allows you to integrate fixed-function techniques as well as more forward-looking shader-based techniques, yet the host application should have relatively little (if any) knowledge of how the technique is actually implemented. With this abstraction it is possible to implement fixed-function graphics techniques and then replace them with equivalent shaders at a later date.




D3D #3: I only want to do 2D graphics... [Table of Contents]

With the API’s name being “Direct3D” it is a common misconception that it can only be used for 3D graphics. This is completely untrue! Whilst Direct3D may be biased towards 3D graphics, it is perfectly capable of displaying 2D content.


The biggest problem is that many people insist on using DirectDraw – technology that is no longer supported by Microsoft and has not been updated since the last century (DirectDraw7 was released in 1999). For various reasons it is becoming increasingly difficult to find DirectDraw-related information, and asking questions about it online yields fewer replies simply because fewer people are familiar with it (or they have forgotten whatever knowledge they once had!).


Using Direct3D offers many advantages over the more traditional DirectDraw route – several important features such as sprite scaling, rotation and translation as well as alpha blending are effectively “free” with Direct3D. When using DirectDraw these effects were much more complex and involved affairs.


There are essentially three ways of displaying 2D geometry via Direct3D:


  1. By sending “pre-transformed” data directly to the GPU.
    This allows your application to define coordinates as pixels in screen-space and completely skip any vertex processing on the GPU. This has the advantage of less processing, but also means that your vertex data is much more static – changing it will require resource modification (which you should avoid where possible). To use this method you either declare your vertices as having the
    D3DFVF_XYZRHW format, or the POSITIONT semantic (depending on whether you’re using FVFs or Vertex Declarations).

  2. Regular 3D geometry via an orthographic projection.
    This method is more difficult to implement as it becomes harder to map directly to screen coordinates (e.g. if you require that all sprites are NxM pixels in size). The advantage is that you re-enable the regular transform and lighting sections of the pipeline – in either fixed function or programmable modes. This can allow you to perform many useful effects (such as matrix transforms like scaling, rotation, translation, shearing) completely in hardware. Also, there ceases to be any functional difference between this and “normal” 3D work – so pretty much any and every effect you see discussed in 3D contexts will work in 2D. To use this method you need to modify your view and transformation matrices – use the
    D3DXMatrixOrtho**() functions instead of the more conventional D3DXMatrixPerspective**() forms.

  3. A hybrid of the first two options.
    This is only applicable if you’re using vertex shaders, but can be a very powerful trick. By combining the ability to pass pre-transformed coordinates as well as being able to execute a vertex program you can get the best of both previous suggestions. Because you have much more control over how your vertex data is packed you can send only the data that is needed, as well as in the most suitable format. A simple example would be to send the
    POSITION data as a simple X/Y coordinate packed as a float2; if the hardware supports it, a half2 could be used for a lower storage requirement. The vertex shader is required to output homogeneous clip-space coordinates, so this method suffers from the same problem as option 2 when it comes to directly lining up sprites with screen pixels.
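As a worked illustration of option 2, the pixel-to-clip-space mapping produced by an off-centre orthographic projection (as built by D3DXMatrixOrthoOffCenterLH with left = 0, right = width, top = 0 and bottom = height) can be reproduced in plain C++ – pixelToClip is an illustrative helper, not a D3DX function:

```cpp
#include <cassert>

// Maps a screen pixel (x right, y down) in a width x height viewport to
// clip space, where (-1,-1) is the bottom-left and (+1,+1) the top-right.
void pixelToClip(float x, float y, float width, float height,
                 float& outX, float& outY)
{
    outX = (2.0f * x / width) - 1.0f;   // left edge -> -1, right edge -> +1
    outY = 1.0f - (2.0f * y / height);  // top edge -> +1, bottom edge -> -1
}
```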


The above 3 methods describe the relatively “low level” approaches to rendering 2D graphics, but it is worth noting that D3DX provides the ID3DXSprite class (‘Sprite’ in MDX – see the Sprite2D sample in the SDK) for those who want something quick-and-simple to use. Using the sprite classes is recommended; however, whilst highly optimized and tested, they are essentially a “wrapper” over the aforementioned technologies. As such, it is possible that investing effort into implementing your own custom version might have its advantages.


You should also bear a few things in mind when it comes to designing and writing 2D applications using the Direct3D API. The bottom line is that people make the mistake of using Direct3D in the same way they would use DirectDraw or other 2D-specific APIs. This often leads to exceptionally poor performance, which makes for plenty of great forum threads where people incorrectly assume that DirectDraw is a better/faster API. Write your “2D in 3D” applications in a 3D-like way to make sure that you get the most from your hardware and API. The following guidelines cover many common problems:


  1. Use geometry rather than trying to directly copy pixels.
    Representing your tiles/sprites/backgrounds as textured triangles is far more efficient than trying to directly copy parts of textures/surfaces to the back-buffer (or another off-screen surface).

  2. Use textures rather than surfaces.
    In conjunction with the previous point, using IDirect3DTexture9 objects instead of IDirect3DSurface9 objects is much better for performance. Many developers familiar with DirectDraw latch onto the ‘surface’ interface in Direct3D because it shares the name of a basic/primary type used in DirectDraw.

  3. Don’t change resources unless it’s absolutely necessary.
    Make sure you read D3D #14: Speeding up locking of resources. Various traditional 2D effects were achieved by modifying the raw pixel data – this is very expensive under Direct3D. Instead, you should prefer texture blending and/or pixel shaders to achieve similar operations. If you’re using geometry (as suggested in #1, above) then constantly locking and modifying the vertex properties is equally bad.

  4. Batching and state-change optimization are important.
    These are general Direct3D best practices, but especially important here as many 2D programmers write horribly inefficient code that breaks these rules. A common example is drawing a “tile map” one tile at a time. Specifically, changing the texture and vertex buffer and issuing a Draw**() call for every 2-triangle tile. Batching all tiles together into a single, larger, vertex/index buffer is substantially better for performance.

  5. Texture palettes/atlases are a good idea.
    This is the process of grouping together many textures onto a single larger texture. For example, putting an 8x8 grid of 30x30 pixel sprites onto a single 256x256 texture is far more efficient than having 64 separate 30x30 textures.

  6. Correct use of texture coordinates.
    A big difference between DirectDraw-style 2D and Direct3D is the use of floating-point texture coordinates in the range 0.0 to 1.0, which makes it difficult to directly address individual pixels (or areas of pixels). Two key problems arise from this: firstly, when extracting source data from a texture you need to be careful about using linear filtering, where multiple texels will be sampled; secondly, when rendering directly to the screen it is possible that texels will not map exactly to pixels. This usually manifests itself as blurry rendering and/or extra pixels around the border that you weren’t expecting. Make sure you read and understand the ‘Directly Mapping Texels to Pixels’ article in the DirectX SDK.
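The rules from that article boil down to two pieces of arithmetic, sketched here in plain C++ (the helper names are illustrative, not part of the API):

```cpp
#include <cassert>

// Centre of texel i in an n-texel-wide texture, in 0.0-1.0 coordinates;
// sampling exactly here avoids unwanted linear-filter blending.
float texelCentre(int i, int n)
{
    return (i + 0.5f) / n;
}

// Direct3D 9 half-pixel adjustment: shift pre-transformed vertex positions
// by -0.5 in x and y so texels line up exactly with screen pixels.
float alignVertex(float pixelEdge)
{
    return pixelEdge - 0.5f;
}
```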


As a closing note… just because most APIs and libraries are written for 3D usage does not mean that 2D graphics or games are dead. Far from it – a creative person adhering to the best practices listed above can create amazing results. In fact, the features and performance of modern GPUs could be seen as a great opportunity for “2D in 3D”…




D3D #4: Alpha blending doesn’t work correctly or how do I do blending? [Table of Contents]

Alpha blending is a popular effect that allows Direct3D to represent semi-transparent parts of geometry or textures. The term “Alpha” refers to a scalar value often stored alongside the more traditional colour values – red, green and blue. It is typically treated as a floating-point value between 0 and 1 (even though it might be stored as a 0–255 integer in some formats) and is used to manipulate source and destination colours as determined by the D3DRS_SRCBLEND and D3DRS_DESTBLEND render-states. The following generalised equation is used:


Final_Colour = Source_Colour * D3DRS_SRCBLEND + Destination_Colour * D3DRS_DESTBLEND


Refer to the D3DBLEND enumeration for details on the different operators.
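To make the equation concrete, it can be evaluated per colour channel on the CPU. The following sketch (plain C++, with illustrative helper names – not part of the Direct3D API) shows the common D3DBLEND_SRCALPHA / D3DBLEND_INVSRCALPHA configuration reducing to a linear interpolation by the source alpha:

```cpp
#include <cassert>

// One channel of the generalised blending equation:
// final = source * srcFactor + destination * dstFactor
float blend(float src, float dst, float srcFactor, float dstFactor)
{
    return src * srcFactor + dst * dstFactor;
}

// D3DBLEND_SRCALPHA / D3DBLEND_INVSRCALPHA: interpolate by source alpha
float alphaBlend(float src, float dst, float srcAlpha)
{
    return blend(src, dst, srcAlpha, 1.0f - srcAlpha);
}
```

With srcAlpha at 0 the destination survives untouched; at 1 the source completely replaces it.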


To render geometry with alpha-blending you must enable it via the D3DRS_ALPHABLENDENABLE render state; it is important to note that you should disable it once you’ve rendered the appropriate geometry. A simple mistake that some people make is to leave it enabled and then wonder why they get a complete mess for their image!


A common usage takes “0” to mean completely transparent and “1” to mean completely opaque – values between 0 and 1 produce a linear interpolation between transparent and opaque. This requires the following render state configuration:
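For reference, a sketch of the canonical render states for this style of interpolated transparency (assuming pDevice is your IDirect3DDevice9 pointer):

```cpp
// Interpolate between source and destination by the source alpha value
pDevice->SetRenderState( D3DRS_ALPHABLENDENABLE, TRUE );
pDevice->SetRenderState( D3DRS_SRCBLEND, D3DBLEND_SRCALPHA );
pDevice->SetRenderState( D3DRS_DESTBLEND, D3DBLEND_INVSRCALPHA );
```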






Another common use of alpha blending is for “additive compositing” – simply adding the source and destination pixel colours together. Multi-pass lighting and some post-processing effects will use this. The following render-states should be used:
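For reference, a sketch of the states for additive compositing (again assuming pDevice is your IDirect3DDevice9 pointer):

```cpp
// Add the source colour directly on top of the destination colour
pDevice->SetRenderState( D3DRS_ALPHABLENDENABLE, TRUE );
pDevice->SetRenderState( D3DRS_SRCBLEND, D3DBLEND_ONE );
pDevice->SetRenderState( D3DRS_DESTBLEND, D3DBLEND_ONE );
```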






It is important to note that the “Output Merger” (to use the Direct3D 10 term) is NOT programmable. If you are using the programmable pipeline you still need to use regular API calls to configure the alpha blending function. However, the shader code becomes responsible for providing the input values for the actual computation. In both cases the alpha-blending operation is computed on a per-pixel basis, thus the output of the texture blending (for fixed function) or pixel shader (for programmable) is of primary importance.


A key characteristic of alpha-blending that is commonly forgotten is that it is draw-order dependent. That is, you will get different results depending on what order you despatch your Draw**() calls whilst creating the final image a user sees. As shown in the general form of the blending equation, the final output is a combination of the source and the destination colour. The destination colour being whatever colour of pixel is currently stored in the back-buffer or render-target. Several combinations – the obvious being if D3DRS_DESTBLEND is set to D3DBLEND_ZERO – will eliminate its contribution, but for most practical uses it will contribute in some way.


Assuming that the destination colour is an active component in the current operation, whatever Draw**() calls precede the current one will have an impact on its result. A subtle side-effect that can also be overlooked is that most blended rendering will still write to the depth buffer even if the resultant colour is transparent. If the depth buffer is modified by transparent pixels it is easy to introduce artefacts where later rendering will be “clipped” for no immediately obvious reason. Simply put, you won’t be able to render later geometry behind semi- or totally-transparent pixels.


Solving the draw-order dependence is possible, but it will often require a restructuring of how your graphics rendering works. A two-pass approach is a simple and effective method. In the first pass all opaque geometry (that which sets D3DRS_ALPHABLENDENABLE to FALSE) should be rendered. This guarantees that any background data is present-and-correct. The second pass enables blending and renders semi-transparent geometry from back-to-front (see the Painter’s Algorithm for reasons). Determining the order of objects can be expensive to get correct, but even rough sorting (e.g. by the mid-point of a given piece of semi-transparent geometry’s distance to the camera) can be sufficient.
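The rough sorting mentioned above can be sketched in plain C++, using a hypothetical Renderable type keyed by mid-point distance to the camera:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative stand-in for whatever your scene objects look like
struct Renderable
{
    int id;
    float distanceToCamera;
};

// Order objects furthest-first for the second (blended) pass
void sortBackToFront(std::vector<Renderable>& objects)
{
    std::sort(objects.begin(), objects.end(),
              [](const Renderable& a, const Renderable& b)
              { return a.distanceToCamera > b.distanceToCamera; });
}
```

Reversing the comparison gives the front-to-back ordering that benefits the opaque pass.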


Two tricks can be used to help reduce artefacts in the second pass. Firstly, allow depth-testing (D3DRS_ZENABLE = D3DZB_TRUE) but turn off depth-writing (D3DRS_ZWRITEENABLE = FALSE). This should stop any nearly (or totally) transparent pixels from clipping other more noticeable (but still partially transparent) pixels. Secondly, render each semi-transparent object twice – once for the back faces and once for the front faces (remember to swap the D3DRS_CULLMODE render state!).


The first option is particularly useful for particle systems where there is a high possibility of overlapping semi-transparent geometry; the second option is less generally applicable, but can prove very useful when whole objects are semi-transparent (e.g. a glass ball).


Unrelated to this particular technique, but a useful bonus of implementing the painter’s algorithm (or similar loose depth sorting) is that it can be used when rendering the opaque geometry. Specifically if you render from front-to-back you can often make better use of the depth buffer for reducing over-draw.


Regardless of whether you’re using shaders or the fixed-function pipeline, you can store alpha values per-pixel in a texture. This is often the preferred route as it allows for fine-grained control over which alpha value is used for which pixel in the final image. A number of simple but effective results can be obtained via this route. Per-pixel alpha can either be added dynamically by the application (via Lock() calls and directly editing pixels – refer to D3D #21: How to interpret the data returned by a Lock() operation for more details), specified as a constant “colour key” in the D3DXCreateTextureFromFileEx() call, or stored as part of the actual file. The latter option depends entirely on the chosen file format (e.g. DDS can support alpha but JPG can’t) and the content-creation tools used (not all will allow editing of the alpha channel). The ‘DirectX Texture Tool’ (dxtex.exe) can merge in an alpha channel and generate a suitable DDS file.


In all cases it is important to select a texture format that keeps the per-pixel alpha value; a member of the D3DFORMAT enumeration with an “A” component is required. For example, D3DFMT_A8R8G8B8 will retain per-pixel alpha, but D3DFMT_X8R8G8B8 won’t.


When using the fixed-function pipeline there are several possible “entry points” for specifying the alpha value used in the final blending operation, a solid understanding of the fixed function texturing effects is useful – refer to the Texture Blending entry in the SDK for more details. In particular, setting the arguments (ALPHAARG1 and ALPHAARG2) for each stage’s ALPHAOP is important.




Language and API Variations:


Managed Direct3D 1.1:


C/C++ Direct3D 10:




D3D #5: Hardware capabilities and enumeration [Table of Contents]

Before reading any further, it is worth noting that Direct3D 10 has a strict feature requirement with no support for previous hardware – this makes development much easier and also makes much of this FAQ entry irrelevant for D3D10!


Despite Direct3D providing an abstraction from the underlying hardware, it is still possible for your software to be exposed to a huge variety of features. This is either an opportunity or a problem depending on your point of view - an opportunity because it allows you to tailor your software to make the absolute most of the underlying hardware; it can also be a problem because you might have to change your software to work across different configurations.


Fortunately Direct3D allows you to query the capabilities (often referred to as “caps”) and make the appropriate decisions. A computer may have several Direct3D compatible devices; the following code shows the general structure that can be used to enumerate the available options:


D3DFORMAT fmtValidFullScreenFormats[] =
{
       // Common full-screen display formats to test; extend as required
       D3DFMT_X8R8G8B8,
       D3DFMT_R5G6B5
};

UINT uiAdapters = pD3D9->GetAdapterCount( );

for( UINT i = 0; i < uiAdapters; i++ )
{
       for( UINT j = 0; j < (sizeof( fmtValidFullScreenFormats ) / sizeof( D3DFORMAT )); j++ )
       {
              UINT uiModes = pD3D9->GetAdapterModeCount( i, fmtValidFullScreenFormats[j] );

              for( UINT k = 0; k < uiModes; k++ )
              {
                     D3DDISPLAYMODE mode;
                     pD3D9->EnumAdapterModes( i, fmtValidFullScreenFormats[j], k, &mode );

                     if( SUCCEEDED( pD3D9->CheckDeviceType( i, D3DDEVTYPE_HAL, fmtValidFullScreenFormats[j], mode.Format, FALSE ) ) )
                     {
                            // A hardware device is available on adapter 'i'
                            // with display/back-buffer format fmtValidFullScreenFormats[j]
                            // in display mode 'k'
                            // with a width of 'mode.Width' and height of 'mode.Height'

                            D3DCAPS9 caps;
                            pD3D9->GetDeviceCaps( i, D3DDEVTYPE_HAL, &caps );

                            // Further inspect the device's capabilities here
                     }
              }
       }
}

Using the above code structure (or similar) should allow you to perform a complete check of the graphics sub-system when your application starts. This allows you to warn the user if problems exist (and exit gracefully if they can’t be avoided) or to tailor execution to the detected features.


Performing the above enumeration will potentially yield a large number of combinations; it is possible to sort/rank these automatically and pick the best, but equally you may want to allow the end-user to select. If this information is to be displayed to the end-user, the information returned by IDirect3D9::GetAdapterIdentifier() might be useful. In particular, the returned D3DADAPTER_IDENTIFIER9 structure contains VendorId and DeviceId values that can be used to identify different chipsets from different companies.


The key is in the final part of the above fragment – the call to IDirect3D9::GetDeviceCaps(). This reveals the low-level information about the device’s capabilities – refer to the lengthy D3DCAPS9 documentation page for the precise details.


Determining what to check is not always easy – but assuming anything is risky. The “Remarks” sections in the API documentation as well as the more general “Programming Guide” tend to include references to which capabilities to check.


Well-behaved drivers tend to return failure/error codes if you attempt to use an invalid feature, but some will silently fail and produce incorrect results. If you’re unlucky enough to hit the latter case it can take a very long time to determine the cause of a problem that you could easily have prevented by making the appropriate checks in the first place!
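A typical check gates rendering paths on the reported shader version; caps.VertexShaderVersion is compared against the D3DVS_VERSION macro. The sketch below reproduces the macro’s encoding (as defined in d3d9caps.h) in plain C++ so the comparison logic can be seen in isolation – supportsVS20 is an illustrative helper:

```cpp
#include <cassert>

// Equivalent of the D3DVS_VERSION(major, minor) macro from d3d9caps.h
unsigned long vsVersion(unsigned long major, unsigned long minor)
{
    return 0xFFFE0000UL | (major << 8) | minor;
}

// Example capability gate: require at least vs_2_0 support
bool supportsVS20(unsigned long capsVertexShaderVersion)
{
    return capsVertexShaderVersion >= vsVersion(2, 0);
}
```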


If you are using the fixed function pipeline, pay particular attention to:


If you are using the programmable pipeline, pay particular attention to:


General capabilities to check:


When writing code that checks the device capabilities it can be useful to use the “DirectX Caps Viewer” tool included in the SDK. It can be launched via the SDK’s start-menu entry or should be located in \Utilities\Bin\x86\ and called DXCapsViewer.exe. This simple program shows a complete listing for the hardware in the computer it is run from and will definitely show the correct enumerations/capabilities. A quick check can verify that the results generated by your code match those generated by the official tool – it is simple, if a little embarrassing, to break enumeration code and attempt to use a feature that the hardware doesn’t actually support!


An important value missing from these checks is how much video RAM is available for you to use. This is technically available via IDirect3DDevice9::GetAvailableTextureMem(), but it will not yield the “128MB”, “256MB” or “512MB” values that you’re used to seeing alongside a GPU’s name. The value used to market and sell hardware is not always entirely available, and it won’t factor in any additional memory (such as AGP or system RAM) allocated to the graphics system.


From the October 2005 SDK onwards a “Graphics Card Capabilities” list has been included in the SDK. This is an extremely useful resource for guiding any decisions over which features to expect on what types of hardware. It is not a guaranteed list and should not be relied upon – it is more useful for getting a rough idea of which chipsets support which features. The list is available as an Adobe PDF or Excel spreadsheet and can be opened via the “Sample Browser” tool.


Refer to D3D #23: Texture creating/loading enumeration and checking for more details regarding texture-specific enumerations.


Language and API Variations:


Managed Direct3D 1.1:


C/C++ Direct3D 10:




D3D #6: Depth buffering problems. [Table of Contents]

Depth buffering is a very useful, if simple, feature of 3D rendering. However, if not treated with some care it is possible to cause some very strange artefacts and make it appear as though depth buffering were broken. The primary symptom of this is that geometry/objects start to appear in front of objects they should actually be behind; this can happen consistently or it can flicker and change from frame-to-frame.


When using semi-transparent/blended rendering you may encounter problems better described in D3D #4: Alpha blending doesn’t work correctly or how do I do blending?


There are two ways to configure a depth buffer – either via the D3DPRESENT_PARAMETERS used when calling CreateDevice(), or post-creation via a CreateDepthStencilSurface() call. For most cases the former is the easiest method, but the latter might be required when dealing with effects using multiple render-targets. Note that you may need to use the D3DRS_ZENABLE and D3DRS_ZWRITEENABLE render states (via calls to SetRenderState()).


One problem that can occur is selecting a depth buffer format that is unsupported by the target hardware. The D3DFORMAT enumeration lists 9 different depth buffer formats and there is no guarantee as to how many of those will be supported. A common situation is that people change from 16-bit to 24-bit to 32-bit to try to solve their depth buffer problems without realising this may not work on all hardware! Best practice requires you to use CheckDeviceFormat() and CheckDepthStencilMatch() to verify that the target hardware supports the desired format.


The primary cause of depth-related errors is due to incorrectly computed/stored values. Simply put, if the values stored in your depth buffer are of poor quality then the algorithm will not function accurately. Understanding how the depth values are computed and stored requires some background/mathematical knowledge – the classic “Learning to Love Your Z-Buffer” article is a good starting point. In practical terms it is almost always due to the configuration of the near and far clipping planes.


All of the D3DX matrix functions for constructing a projection matrix (such as D3DXMatrixPerspectiveFovLH()) take two floating-point values for “zn” (Z-Near) and “zf” (Z-Far). Integer depth values are assigned non-linearly to geometry between the near and far planes. The larger the distance this range covers, the more thinly the available depth precision is spread. When precision is spread too thin you increase the chance that nearby surfaces will be assigned the same depth value – and when the final per-pixel depth test occurs it becomes impossible to tell which pixel is actually the correct one to display.


It can be tempting to set the near plane to 0.0 (or some suitably small value) and the far plane to 10000000.0f (or some suitably large value) to ensure that no geometry gets clipped, but this is precisely what causes depth buffering problems! The golden rule is to set the near plane as far from the camera as possible and the far plane as close to the camera as possible. For example, if your graphics are primarily close-range indoor scenes then setting Z-Near to 1.0 and Z-Far to 50.0 might be a good choice. Some degree of experimentation may well be required.
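The effect of the near plane can be demonstrated numerically. The sketch below implements the depth mapping produced by the standard left-handed perspective projection (as built by D3DXMatrixPerspectiveFovLH) and shows that shrinking Z-Near collapses the gap between the stored depths of two nearby surfaces:

```cpp
#include <cassert>

// Normalised depth after the perspective divide for an eye-space depth z:
// z' = zf/(zf-zn) * (1 - zn/z)
float normalisedDepth(float z, float zn, float zf)
{
    return (zf / (zf - zn)) * (1.0f - zn / z);
}
```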


The other major cause of depth-buffering errors is co-planar geometry. This particular case results in “Z-Fighting”, where you will see all, or parts, of either surface appearing in the final image – often flickering/alternating as the camera moves. This is simply due to two pieces of geometry being assigned the same depth value – although it is worth noting that if depth precision is low (e.g. the near/far plane parameters aren’t configured correctly) the geometry doesn’t even have to be mathematically co-planar. A typical example of Z-Fighting might be where the wall behind a painting starts to appear through the painting because both (flat) pieces of geometry are at very similar (or identical) depths.


If you have a problem with Z-Fighting it is possible to solve it by biasing the generated depths – refer to the Depth Bias page in the documentation for more details. Simply put, you can add an offset to one of the co-planar surfaces (remember to reset/disable the offset for the other surface!) to ensure that it appears in front.


Language and API Variations:


Managed Direct3D 1.1:


C/C++ Direct3D 10:




D3D #7: I don’t see anything on the screen, why? [Table of Contents]

It is near impossible to answer this question in the general case – it is simply too dependent on what your application is supposed to be doing. However, consider the following guidelines for trying to figure it out…



Language and API Variations:


Managed Direct3D 1.1:


C/C++ Direct3D 10:




D3D #8: I tried my application on a different computer and it didn’t work or looks different. [Table of Contents]

Once your software project is complete it is conventional to distribute it – either freely online as a demo, or possibly as part of a commercial paid-for package. In either case it is of obvious importance that it works correctly on the end-user’s computer; if it doesn’t they’re most likely to come asking – or blaming – you first!


However, in the PC market there are millions (if not billions) of possible hardware and software configurations. If you’re lucky you might own several computers with different configurations, or better yet you have access to a suite of test machines. Regardless of this, you’re very unlikely to get close to covering all of the combinations available in the “real-world”.


Unsurprisingly, Direct3D 9 is defined by a set of specifications (for both hardware and software) – however this specification is not entirely water-tight. Specifically, it leaves some points as being “undefined”, or in some cases “suggests” and “recommends” rather than “requires”. This becomes a problem when multiple IHV’s (Independent Hardware Vendors – e.g. Intel, Nvidia and ATI) have to create their hardware and drivers to support the API. Each company is perfectly entitled to interpret the “undefined” parts in different ways.


The best strategy is to not rely on any undefined behaviours, and also to be very wary of cases where you might be using optional capabilities. Refer to the previous D3D #5 “Hardware capabilities and enumeration”. Despite this, it is possible to accidentally rely on a default/undefined feature or result.


In these cases it pays to use the “Reference Rasterizer” (sometimes referred to as “RefRast”) – a special software device included only in the DirectX SDK. As the name suggests, it is for reference purposes – it implements all the features of Direct3D 9 in software in a way that conforms to the specification. Whilst the reference rasterizer must still choose what to do in the “undefined” cases, in all other situations it should be correct according to the specifications.


To use the reference rasterizer you will need the DirectX SDK installed – this is important to note as it will mean that it is unlikely to work on your end-user’s machine. The switch is simple – in your IDirect3D9::CreateDevice( ) call you would typically use D3DDEVTYPE_HAL as the DeviceType parameter – simply swap this for D3DDEVTYPE_REF.


You will quickly realise that the reference rasterizer is very, very slow – getting above 3-4fps on a powerful PC is rare. This is quite intentional due to it being written for accuracy/correctness rather than performance. It is worth building a debugging aid into your software to allow you to jump to different parts, or to play back previously recorded input. This way you can either quickly focus in on the section that you need the reference rasterizer for, or you can leave it to automatically collect data for you!


In most cases the reference rasterizer will pick a side – closely (or exactly) matching the results from one of your real-world test samples. This is particularly useful as it helps you isolate where the problem might lie. The next steps are not easy to define – it is largely a case of experience and trial-and-error. Trying to determine the differences between the two pieces of hardware is often a good start – many differences come down to your software incorrectly assuming that hardware is capable of something it is not. As previously mentioned, see D3D #5 “Hardware capabilities and enumeration” for more details.


If you get to a situation where one particular hardware manufacturer differs from the others it can be very useful to contact the manufacturer directly. Most companies offer a “developer program” and have specific “developer relations” departments. Two of the most common graphics IHV’s have developer sites found at: and


As a forward looking note, the Direct3D 10 specification is much more strictly defined than in previous versions. This should hopefully reduce if not eliminate the aforementioned problems!


Language and API Variations:


Managed Direct3D 1.1:


C/C++ Direct3D 10:




D3D #9: I’m leaking memory, help! [Table of Contents]


This is a common problem with many C/C++ programs where the programmer has substantial responsibility for memory management. For the more general case you might want to check the Wikipedia entry on memory leaks. Use of your debugger is also particularly useful when tracking down this type of error – if you’re not familiar with this you might want to read Introduction to Debugging by Richard “superpig” Fine.


Memory leaks occur in Direct3D when you allocate a resource via one of the many Create**() calls (e.g. IDirect3DDevice9::CreateTexture()) but don’t Release() it before the application terminates. A common way of getting leaks is not handling the lost-device scenario correctly. DirectX uses the Component Object Model (COM) for its objects, so every object is backed by a reference counter that determines when a (potentially shared) object really gets destroyed. If the reference count hasn’t returned to zero by the time the application finishes it is considered a leak. For more information on COM see the technology’s homepage as well as the MSDN documentation.


The Direct3D debug runtimes (see D3D #22: Using the debug runtimes for more details) will generate a large amount of debugging output when the application terminates with leaked resources. You will know about it if it happens! A way to make sure you don’t miss them is to enable the “break on memory leaks” option found in the DirectX control panel. As an example, take the following code:


IDirect3DTexture9* pTex = NULL;        // file-scope for simplicity

HRESULT CALLBACK OnCreateDevice( IDirect3DDevice9* pd3dDevice, const D3DSURFACE_DESC* pBackBufferSurfaceDesc, void* pUserContext )
{
       pd3dDevice->CreateTexture( 128, 128, 0, 0, D3DFMT_X8R8G8B8, D3DPOOL_MANAGED, &pTex, NULL );

       return S_OK;
}



This is a simple modification to the “EmptyProject” template provided by the DirectX SDK. Compiling and running the above code will generate the following debug output when the application is terminated:


Direct3D9: (INFO) :MemFini!

Direct3D9: (WARN) :Memory still allocated!  Alloc count = 131

Direct3D9: (WARN) :Current Process (pid) = 00000a04

TestBed.exe has triggered a breakpoint

Direct3D9: (WARN) :Memory Address: 00334b18 lAllocID=1 dwSize=000047f8, (pid=00000a04)

Direct3D9: (WARN) :  Stack Back Trace

Direct3D9: (ERROR) :    [0] : Address 010ED4CB

Direct3D9: (ERROR) :    [1] : Address 010ED59B

Direct3D9: (ERROR) :    [2] : Address 010ED440

Direct3D9: (ERROR) :    [3] : Address 010E1D44

Direct3D9: (ERROR) :    [4] : Address 4FDFAF2E

Direct3D9: (ERROR) :    [5] : Address 00482F38

Direct3D9: (ERROR) :    [6] : Address 004659F8

Direct3D9: (ERROR) :    [7] : Address 00485A49

Direct3D9: (ERROR) :    [8] : Address 004C3280

Direct3D9: (ERROR) :    [9] : Address 004C2FBD

Direct3D9: (ERROR) :    [10] : Address 7C816D4F

Direct3D9: (ERROR) :    [11] : Address 00000000

Direct3D9: (ERROR) :    [12] : Address 00000000

Direct3D9: (ERROR) :    [13] : Address 00000000

Direct3D9: (ERROR) :    [14] : Address 00000000

Direct3D9: (ERROR) :    [15] : Address 00000000

… snipped for another 2300 lines …

Direct3D9: (WARN) :Total Memory Unfreed From Current Process = 644250 bytes

The program '[2564] TestBed.exe: Native' has exited with code 0 (0x0).


The important information to note is the lAllocID value that appears in the 5th line of the above listing. This can be plugged into the DirectX control panel to aid debugging – although it is worth noting that many of the ID’s will point to basic objects like IDirect3D9 and IDirect3DDevice9, simply due to the relationships between objects behind the scenes. It is also this relationship that causes the above fragment to generate nearly 2400 lines of debug output!


If the above fragment is corrected to be:


IDirect3DTexture9* pTex = NULL;        // file-scope for simplicity

HRESULT CALLBACK OnCreateDevice( IDirect3DDevice9* pd3dDevice, const D3DSURFACE_DESC* pBackBufferSurfaceDesc, void* pUserContext )
{
       pd3dDevice->CreateTexture( 128, 128, 0, 0, D3DFMT_X8R8G8B8, D3DPOOL_MANAGED, &pTex, NULL );

       SAFE_RELEASE( pTex );

       return S_OK;
}



Then the debug output becomes what you should always be aiming for:


Direct3D9: (INFO) :MemFini!


A slight variation on this is when using D3DX objects (e.g. ID3DXMesh) – the core runtime does not know about these, it only sees their constituent parts (e.g. a vertex and index buffer). If you’re leaking D3DX objects you should get additional debug information:


D3DX: MEMORY LEAKS DETECTED: 2 allocations unfreed (10716 bytes)

D3DX: Set HKLM\Software\Microsoft\Direct3D\D3DXBreakOnAllocId=0x1 to debug


Using the regedit system tool you can add (or modify) the appropriate key and it will break on the actual D3DX allocation.


Note that the above code fragment makes use of the SAFE_RELEASE() macro; if you’re using the SDK samples as a starting point they will be pre-defined for you in dxstdafx.h – for reference they look like:


#define SAFE_DELETE(p)       { if(p) { delete (p);     (p)=NULL; } }

#define SAFE_DELETE_ARRAY(p) { if(p) { delete[] (p);   (p)=NULL; } }

#define SAFE_RELEASE(p)      { if(p) { (p)->Release(); (p)=NULL; } }


Only the last of the three is required in this instance, but making use of the other two is no bad thing. By making sure that you set the pointer to NULL after it is released/deleted you can easily catch any nasty bugs where your code tries to dereference released/deleted objects.


Tracking down memory leaks is both tedious and time-consuming, so by far the best remedy is simply to not leak memory in the first place. This is easier than it sounds – just pay attention when allocating resources and design your software accordingly.


Firstly – when you first declare an object that you later intend to create, immediately add the appropriate SAFE_RELEASE() code; then go back and add the corresponding Create**() call. Everyone’s programming style varies, but little bits of discipline like this help avoid the embarrassing case of simply forgetting to add the right lines of code.


Secondly – watch for abnormal termination and error returns from functions. If you test all the return codes of functions and set it up to return from a function early (e.g. in the case of an unrecoverable error) you also need to remember to SAFE_RELEASE() any temporary objects. This one can be very hard to correct unless you explicitly test all routes through your software (which obviously gets more difficult the larger it gets!). An unexpected error later on in the project might suddenly throw up a load of unexpected memory leaks due to the error handler not releasing any temporary objects.


There are a couple of useful tricks that you can use to get better memory leak information. Depending on how much effort you want to invest in such tricks you can generate quite powerful logging/debugging aids that make memory-leak checking a breeze.


A common trick is to make use of ATL’s CComPtr class. This class transparently manages your AddRef() and Release() calls so you no longer have to worry about them!
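The idea can be sketched without ATL at all. The following stand-in types (MockUnknown, ScopedComPtr and RefsAfterScope are all illustrative names, not part of any real API – ATL’s real CComPtr also handles copying, assignment and QueryInterface) show the core behaviour: Release() happens automatically when the pointer leaves scope.

```cpp
// Stand-in for a COM object; real code would use IUnknown-derived interfaces.
struct MockUnknown
{
    int refCount;
    MockUnknown() : refCount( 1 ) { }
    void AddRef( )  { ++refCount; }
    void Release( ) { --refCount; }   // a real COM object deletes itself at zero
};

// Minimal smart pointer in the spirit of ATL's CComPtr.
template< typename T >
class ScopedComPtr
{
    T* m_p;

    // Non-copyable in this sketch
    ScopedComPtr( const ScopedComPtr& );
    ScopedComPtr& operator=( const ScopedComPtr& );

public:
    explicit ScopedComPtr( T* p ) : m_p( p ) { }    // takes over one reference
    ~ScopedComPtr( ) { if( m_p ) m_p->Release( ); } // released automatically
    T* operator->( ) const { return m_p; }
};

// Demonstrates that the reference is dropped when the pointer leaves scope.
int RefsAfterScope( )
{
    MockUnknown obj;                                // starts with one reference
    {
        ScopedComPtr< MockUnknown > ptr( &obj );
    }                                               // Release() runs here
    return obj.refCount;
}
```

Because the destructor always runs – including on early returns – this pattern also solves the “error path forgot to Release()” problem described above.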


Another more complex trick involves using the private data associated with COM objects. All the Direct3D resources derive from the IDirect3DResource9 class; this class exposes a SetPrivateData() function. This is so you can “tag” a resource with application specific data that Direct3D doesn’t need to know about. Clever use of this can allow you to maintain a global list of “dead” or “alive” resources as well as tagging them with extra information such as location and time of creation. The ‘D3D Memory Leaks (dec05 sdk)’ thread on the DirectXDev mailing list contained a similar suggestion from Wessam Bahnassi.


Language and API Variations:


Managed Direct3D 1.1:


C/C++ Direct3D 10:




D3D #10: Is x faster than y? [Table of Contents]

This question appears in many forms – be it an outright question starting a new topic, or when asking which potential solution is going to be the faster. Simply put it is a very difficult question to answer and in most cases the best you will get from fellow forum members is an educated guess.


Making such an educated guess often requires deep knowledge of all aspects of the situation (the D3D pipeline, the hardware, graphics theory as well as the actual application itself) and a healthy dose of experience. Referring to a classic text such as Real-Time Rendering (2nd Edition) is a good place to start; this book in particular covers much of the background logic/theory that can be used to make assertions about the performance of real-time graphics software. Also, keeping up-to-date with the latest information provided by IHV’s such as ATI and Nvidia is a good idea – they often make conference proceedings freely available. The IHV’s know their hardware’s characteristics and will often make hints, suggestions or outright requirements necessary for making the most of their hardware.


Just to make things more complicated, the huge number of hardware configurations available in the PC market makes it even more difficult to make generalisations about performance. What works well on one generation of hardware may not work so well on an earlier (or future) generation.


Despite this bleak outlook there is one way that you can derive performance characteristics – profiling. As with more general programming there are a number of tools available for capturing performance data from a Direct3D application: PIX for Windows (included in the DirectX SDK, with vendor-specific plug-ins available), NVPerfHUD and NVPerfKit (both for Nvidia hardware only) to name but a few. Of course it is always possible to “roll your own” profiler using calls to QueryPerformanceCounter() and QueryPerformanceFrequency(); however manually profiling Direct3D is not as simple as it is for regular applications. In particular, Direct3D may not immediately complete any requested work – it might be queued up and performed shortly after. Thus taking direct before/after times may only reveal how long it took to add the work to the queue rather than the time it takes to actually perform it. Refer to the SDK documentation on Accurately Profiling Direct3D API Calls for more information.
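If you do roll your own, the shape of such a timer can be sketched portably with std::chrono (QueryPerformanceCounter() would be the Windows-specific equivalent; TimeMicroseconds is an illustrative name). Note the pitfall described above: wrapped around a Direct3D call this measures only the CPU-side cost of queuing the command, not the GPU work itself.

```cpp
#include <chrono>

// Times a callable and returns the elapsed wall-clock time in microseconds.
template< typename Func >
long long TimeMicroseconds( Func f )
{
    using namespace std::chrono;
    steady_clock::time_point start = steady_clock::now( );
    f( );
    steady_clock::time_point stop = steady_clock::now( );
    return duration_cast< microseconds >( stop - start ).count( );
}
```

To time the GPU work itself you would have to flush the command queue (e.g. via an event query) before taking the second reading – see the SDK article mentioned above.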


Making good use of PIX for Windows will often pay off when trying to determine performance characteristics. Use of PIX from the command line could allow you to create automated batch tests to routinely monitor and analyse the performance of your software. Using the PIX functions can make the data much easier to interpret (see this forum thread for an example).




D3D #11: Creating and exporting .x files. [Table of Contents]

The .X file format is convenient for storing and loading geometric definitions for use with Direct3D. The format is a very powerful and flexible one built upon the notion of templates and can therefore be used for a large number of graphical applications.


Loading simple .x files into Direct3D is straightforward – refer to the tutorials (%DXSDK_DIR%\Samples\C++\Direct3D\Tutorials\Tut06_Meshes\ in particular). Procedurally generating mesh data and then saving it from your application to a .x file is equally straightforward – calling D3DXCreateMesh() followed by D3DXSaveMeshToX() should be sufficient. Refer to the D3DX Mesh Functions and X File Reference pages for specific details.


Actually creating models and animations to import into Direct3D is a completely different issue, one that is neither related to DirectX programming nor suitable for the DirectX forum! Instead it is more relevant to the Visual Arts forum and will probably require the use of a 3rd party 3D modelling package. This thread was started as part of a GameDev.Net contest and contains a useful list of places to download free models.


Language and API Variations:


Managed Direct3D 1.1:


C/C++ Direct3D 10:




D3D #12: My textures are blurry or distorted. [Table of Contents]


A common problem can occur when attempting to render 2D graphics via Direct3D – trivial implementations can end up introducing blurry and distorted results. Whilst it might not be immediately intuitive, it is perfectly possible to fix all these problems!


Firstly, be sure to read and understand the classic ‘Directly Mapping Texels to Pixels’ article in the DirectX SDK documentation. The key observation is that a pixel on your monitor is an area (albeit a very small one!) whereas Direct3D defines a colour as a point in the middle of a given pixel. Whilst getting the coordinates wrong with point filtering is less of a problem (though still prone to artefacts), when linear or anisotropic filtering is enabled you can easily end up sampling incorrect texels.


Sampling incorrect texels is the easiest way of introducing distortion and artefacts into your final image.
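Assuming the Direct3D 9 convention described in that article, correcting a pre-transformed (XYZRHW) quad is a one-line adjustment – subtract half a pixel from each screen-space coordinate so texel centres line up with pixel centres (the function name is illustrative):

```cpp
// Direct3D 9 half-pixel offset: shift a screen-space quad coordinate so
// that texels map 1:1 onto pixels (see 'Directly Mapping Texels to Pixels').
float AdjustForTexelOffset( float screenCoord )
{
    return screenCoord - 0.5f;
}
```

A quad intended to cover a 128×128 pixel area would therefore run from (-0.5, -0.5) to (127.5, 127.5) rather than (0, 0) to (128, 128).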


If you find that it is difficult to get around the texel-addressing issues then you may wish to consider adding “gutter pixels” into your source image. This process effectively clones border pixels such that any misaligned fetches don’t actually show up as errors. This can be difficult to implement (and also requires wasting extra pixels in a source texture), but as a last-resort it is an option.


There are two common ways of introducing blurry results into your images.


Linear filtering where the final result is more than twice as big, or smaller than half the size, of the source tends to produce incorrect results. This, unfortunately, is a limitation of the mathematical foundation of such filtering rather than a problem with the Direct3D API – too much scaling will generate bad results. For magnification the best solution is to increase the resolution of the source artwork; for heavy minification, mip-mapping exists precisely to avoid this problem.


Secondly, the D3DX texture creation functions can resize the source image without you knowing. This only happens where the source image has a non-power-of-two dimension – D3DX will scale it up to the nearest power-of-two size. Refer to D3D #23: Texture creating/loading enumeration and checking for more details on this.
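The rounding involved can be sketched as follows (a simplified model – D3DX’s exact behaviour depends on the device caps and the size/filter arguments you pass to the creation function):

```cpp
// Round a texture dimension up to the next power of two -- e.g. a 640x480
// source image would end up stored as a 1024x512 texture and then be
// scaled back down at render time, hence the blurriness.
unsigned int NextPowerOfTwo( unsigned int n )
{
    unsigned int p = 1;
    while( p < n )
        p <<= 1;
    return p;
}
```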


Finally, it is worth noting that the Direct3D 10 API removes the traditional centre-pixel offset; thus the problems outlined in this entry should cease to be a problem in the future!




D3D #13: Resource allocation best practices. [Table of Contents]

Direct3D applications tend to use a large number of resources of varying formats such that it becomes important to manage them effectively given the relatively limited amount of storage space available. Texture resources in particular can easily consume vast amounts of memory.


As with the majority of software, allocating and de-allocating resources is a relatively expensive operation. Where real-time performance is important it makes sense NOT to do any resource manipulation inside the core render-loop. Ideally all resources should be created at the start of the application, used in the core rendering loop and then released as the application is terminated. This cycle can be extended to include creation/release of resources at defined points in time – e.g. the loading screen(s) for a game.


Unless resources are limited it can be better to create a pool of resources that get created at the start of the application and then (re-)used as-and-when is necessary. This avoids the need to create/release resources at the point of use.


Micro-managing resources can be difficult; IDirect3DDevice9::GetAvailableTextureMem() will return an estimate of the available texture memory in bytes (rounded to the nearest megabyte), but this rarely corresponds directly to the VRAM size printed on your graphics card’s box (128, 256 or 512MB for example). Use this value for guidance only!


If your texture resources are stored with full mip-chains you may wish to consider using the D3DX_SKIP_DDS_MIP_LEVELS() macro – several of the D3DX creation functions (e.g. D3DXCreateTextureFromFileEx()) can use this to avoid loading high-detail levels. A simple use of this might be to link this with a level of detail setting – e.g. “high detail” skips no levels, “medium detail” skips two levels and “low detail” skips four levels.
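To see what skipping levels means in practice, the arithmetic is simple (MipDimensionAfterSkip is a hypothetical helper, not a D3DX function):

```cpp
// Dimension of the top level actually loaded after skipping 'skip' mip
// levels -- e.g. skipping two levels of a 1024-wide chain loads 256.
unsigned int MipDimensionAfterSkip( unsigned int topLevel, unsigned int skip )
{
    unsigned int dim = topLevel;
    while( skip-- > 0 && dim > 1 )
        dim /= 2;
    return dim;
}
```

Since each level is a quarter the size of the one above it, skipping two levels cuts that texture’s memory footprint roughly 16-fold.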


Direct3D resources are assigned to one of four pools defined by the D3DPOOL enumeration although in practice it will be D3DPOOL_MANAGED and D3DPOOL_DEFAULT that are most heavily used. As a general rule of thumb, create all resources in D3DPOOL_MANAGED unless they specifically need to be located in D3DPOOL_DEFAULT (render-targets have this requirement); resources placed in the managed pool will be scheduled by the Direct3D runtime – swapped in (or out) of VRAM as appropriate. The runtime’s scheduler rarely proves to be a performance problem, but if profiling does show it is then you can look at pushing key resources into D3DPOOL_DEFAULT (which stay resident in VRAM).


A useful trick is to make use of the D3DQUERYTYPE_RESOURCEMANAGER query (see Queries in the SDK documentation). This query is only available with the debug runtimes thus has limited use when doing real-world testing but can provide valuable guidance/insight during development. The query can give you general information (see D3DRESOURCESTATS) about what data the scheduler is moving around on a frame-to-frame basis. This combined with other profiling information can help to determine if resource usage/allocation is the cause of any performance problems.


Use of SetLOD(), SetPriority() and PreLoad() is rarely required, but these calls allow your application to influence the way the managed-pool resources are scheduled. You can use them to give importance to the textures that matter most to the game (e.g. signs, maps and other high-detail resources) and scale back the importance of “background” textures. Whilst the runtime’s scheduler is sophisticated, it lacks the context that you, as the application developer, have available.


Another advantage of using managed resources is that they will survive the device-lost scenario. Not only does this make application code simpler (less chance of memory leaks, fewer calls to recreate resources), it is often faster (re-loading all resources from disk is generally a slow process).


When creating resources it is best to create all D3DPOOL_DEFAULT resources before creating any D3DPOOL_MANAGED ones; this makes sure that those requiring VRAM residency are allocated space before those that don’t. If VRAM is full then managed resources won’t fail to be created (they just remain in system/AGP RAM until needed) but default pool resources will.


Language and API Variations:


Managed Direct3D 1.1:


C/C++ Direct3D 10:




D3D #14: Speeding up locking of resources. [Table of Contents]

This process comes up fairly often in Direct3D programming. Any code that makes use of the LockRect() or Lock() calls (typically associated with textures, surfaces and vertex/index buffers) is locking a resource. You will use this technique if you want to read or write the raw data stored in the buffer – for any number of possible algorithms.


The key problem is that it is often very slow. The fact that, apparently, OpenGL can do it quicker is a moot point – in Direct3D it is slow! One of the key reasons is the “behind the scenes” work that must be done by the API, driver and hardware. Where possible GPUs will run in parallel with the CPU, thus resource modification can run into some typical concurrent-programming problems.


If the resource you are trying to modify is used as a dependency (input or output) of an operation in the command queue then you can incur pipeline stalls and flushes. A stall will occur when the GPU cannot make any progress until you finish manipulating the resource by calling Unlock(). A flush will require that some (or all) pending operations will have to be completed before you can access the resource.


Locking is a blocking operation – if you call Lock() on a resource that is not immediately available it will stall the CPU until it is. This effectively synchronizes the two processing units and reduces overall performance.


The data must be transferred to locally addressable memory – your CPU cannot directly access the memory stored on your video card, instead the driver must stream the requested data back to CPU-addressable RAM. This step can be slow if you are requesting a large amount of data, and must be completed before the API will unlock your application. As a more subtle consequence, the blocking of your application and the usage of the AGP/PCI-E bus effectively stops your application doing any further work, which can severely reduce overall performance.


As described above, locking is slow – mostly due to the latency rather than bandwidth. Avoiding locks is good practice, but for “load-time” or initialization work they are usually fine, acquiring locks in the main application/game loop (mixed in with other GPU/graphics functions) is where you’ll get punished the most.


If you really have to manipulate resources in your main loop there are a few tricks you can use to hide the latency. There is no single way of “solving” this problem, it is a case of using clever programming to try and reduce the impact that it has.


Firstly, make sure you get the creation flags correct (see the D3DUSAGE enumeration) – these are often optional and must be specified when the resource is created. When you acquire the lock make sure you get the locking flags correct (see the D3DLOCK enumeration) – it’s good practice to help the driver/GPU where possible; by giving it additional information via these parameters it might be able to perform a better/faster operation. A particularly good example of correct parameter usage is “dynamic resources” – the DirectX SDK documentation includes two sub-sections on this: [Using Dynamic Textures] and [Using Dynamic Vertex and Index Buffers]. If you get these combinations wrong then the debug runtimes will often scream and shout at you – make sure you check!


As previously mentioned, the duration of the lock (how much time is spent between Lock() and Unlock() for example) can affect how badly you stall your application and/or GPU. Performing all of your manipulation whilst the lock is held might seem a more obvious way of programming, but it is not good for performance. Only consider doing this if it’s a quick operation or you need to read and write data.


If you are only reading the data you can use a quick memcpy_s() operation to copy the locked data to a normal system memory array, unlock the resource, and then do your processing/reading. A bonus is that you could also farm out the work to a “worker thread” and gain some time via concurrent programming. Similarly, if you need to only write data then you can also copy a big chunk of system-RAM data into the resource using a memcpy_s() call. If you need to read data, process it, then write it back again you could explore the possibilities of two locks (one for the read, one for the write) being faster than a lengthy single lock.


// Compute the number of elements in this vertex buffer...
D3DVERTEXBUFFER_DESC pDesc;
m_pVertexBuffer->GetDesc( &pDesc );

size_t ElementCount = pDesc.Size / sizeof( TerrainVertex );

// Declare the variables
void *pRawData = NULL;
TerrainVertex *pVertex = new TerrainVertex[ ElementCount ];

// Attempt to gain the lock
if( SUCCEEDED( m_pVertexBuffer->Lock( 0, 0, &pRawData, D3DLOCK_READONLY ) ) )
{
       // Copy the data
       errno_t err = memcpy_s( reinterpret_cast< void* >( pVertex ), pDesc.Size, pRawData, pDesc.Size );

       // Unlock the resource as soon as possible
       if( FAILED( m_pVertexBuffer->Unlock( ) ) )
       {
              // Handle the error appropriately...
       }

       // Make sure the copy succeeded
       if( 0 == err )
       {
              // Work with the data...
       }
}

// Clean-up the allocated memory in all cases
SAFE_DELETE_ARRAY( pVertex );



Consider a bounded-buffer (aka “ring buffer”) approach. Create multiple copies of each resource (for example 3 render targets or vertex buffers) and alternate between them. The intended goal is that you will be locking/manipulating one resource whilst the pipeline can render to or from the other – the CPU and GPU are no longer reliant on the same resource. The down-side is that the results you’ll get back can be “stale” and it doesn’t work if the individual steps aren’t separable.


// Declarations
DWORD dwBoundedBufferSize = 4;
DWORD dwCurrentBuffer = 0;
LPDIRECT3DSURFACE9 *pBoundedBuffer = new LPDIRECT3DSURFACE9[ dwBoundedBufferSize ];

// Create the resources
for( DWORD i = 0; i < dwBoundedBufferSize; i++ )
{
       if( FAILED( pd3dDevice->CreateRenderTarget( ..., &pBoundedBuffer[i], ... ) ) )
       {
              // Handle error condition here..
       }
}

// On this frame we should render to 'dwIndexToRender'
DWORD dwIndexToRender = dwCurrentBuffer;

// We should lock 'dwCurrentBuffer + 1' - which will be the
// oldest of the available buffers, thus hopefully not in the command queue.
DWORD dwIndexToLock = (dwCurrentBuffer + 1) % dwBoundedBufferSize;

// At the end of each frame we make sure to move the index forwards:
dwCurrentBuffer = (dwCurrentBuffer + 1) % dwBoundedBufferSize;

// Release the resources
for( DWORD i = 0; i < dwBoundedBufferSize; i++ )
       SAFE_RELEASE( pBoundedBuffer[i] );

SAFE_DELETE_ARRAY( pBoundedBuffer );


If you need to read/write a large amount of data consider a staggered upload/download. Over the course of 10 frames, upload 10% of the data each frame – appending to the previous sections. The idea is to maintain short locks and to allow other graphics operations to be performed between locks. However, this method is not always an improvement – but it is at least something worth considering.
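The book-keeping for such a staggered transfer is straightforward (hypothetical helpers, shown for the offset arithmetic only):

```cpp
#include <cstddef>

// Byte offset to copy from on frame 'i' of a transfer staggered over
// 'frames' frames.
std::size_t StaggeredOffset( std::size_t totalBytes, unsigned int frames, unsigned int i )
{
    return ( totalBytes / frames ) * i;
}

// Number of bytes to copy on frame 'i'; the final frame absorbs any
// rounding remainder so the whole resource is covered.
std::size_t StaggeredLength( std::size_t totalBytes, unsigned int frames, unsigned int i )
{
    std::size_t chunk = totalBytes / frames;
    return ( i == frames - 1 ) ? ( totalBytes - chunk * i ) : chunk;
}
```

Each frame you would then lock only the sub-range [offset, offset + length) rather than the whole resource, keeping individual locks short.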


As originally stated, a lock can affect the concurrency of the CPU/GPU, thus you want as few locks as possible. If many resources need to be updated, consider spreading it out over a number of subsequent frames. This way you will get a less noticeable performance drop. A possible implementation is to maintain a simple queue of resources/operations that need to be performed and then allow only 1 (or 2, or 3…) per frame regardless of how many are waiting.
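A possible shape for that queue is sketched below (all names are illustrative; the plain int task is a stand-in for whatever lock/copy operation your application actually performs):

```cpp
#include <queue>

// Per-frame update budget: resources needing an update are queued, and each
// frame at most 'budget' of them are actually processed, spreading the cost
// of the locks over several frames.
class BudgetedUpdateQueue
{
    std::queue< int > m_pending;      // int stands in for a real update task
    unsigned int m_budget;

public:
    explicit BudgetedUpdateQueue( unsigned int budget ) : m_budget( budget ) { }

    void Submit( int task ) { m_pending.push( task ); }

    // Process at most m_budget tasks; returns how many were actually done.
    unsigned int ProcessFrame( )
    {
        unsigned int done = 0;
        while( done < m_budget && !m_pending.empty( ) )
        {
            // ...perform the real Lock()/copy/Unlock() for m_pending.front()...
            m_pending.pop( );
            ++done;
        }
        return done;
    }
};

// With 5 tasks and a budget of 2 per frame, the work drains in 3 frames.
unsigned int FramesToDrain( unsigned int tasks, unsigned int budget )
{
    BudgetedUpdateQueue q( budget );
    for( unsigned int i = 0; i < tasks; ++i )
        q.Submit( static_cast< int >( i ) );

    unsigned int frames = 0;
    while( q.ProcessFrame( ) > 0 )
        ++frames;
    return frames;
}
```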


Language and API Variations:


Managed Direct3D 1.1:


C/C++ Direct3D 10:




D3D #15: Debugging vertex and pixel shaders. [Table of Contents]

As with all code it is extremely useful to be able to debug it. Unlike normal code you might write, shaders are not usually executed on the CPU – rather they are sent to the GPU for processing. This makes debugging hard, as it is much more difficult to examine the current state of execution; it’s made even harder by the fact that a relatively short function might get executed tens or hundreds of thousands of times per frame – thus trying to debug a specific invocation is almost impossible.


However, the DirectX SDK does include a shader debugger – but one that only works with versions of Visual Studio prior to 2005. The menu entries (“Start with D3D Debugging”) exist in the Visual Studio 2005 IDE but they don’t actually do anything. Shader debugging requires that you use software vertex processing for debuggable vertex shaders and the reference rasterizer for debuggable pixel shaders. Note that many of the SDK samples have #define DEBUG_VS or #define DEBUG_PS commented out at the top of the main file – the code paths these enable show how to set this up.


Some developers like the IDE-based shader debugger, but for the most part it was awkward to use and often required changing the application’s code to get useful results. Consequently, expect the shader debugger to disappear in favour of tools such as PIX for Windows. PIX changes on a regular basis and includes more and more debugging capabilities. The only major downside is that you have to record a session before you can debug it, which is much less intuitive than step-through debugging in the IDE.


The DirectXDev mailing list hosted a somewhat heated discussion on this topic that can be read here: Debugging with "Start With Direct3D Debugging" (29th June 2006, changing to be Re: VS 2005 (was Debugging with "Start With Direct3D Debugging") on 1st July 2006). There are several comments from the DirectX developers as to how they see these tools moving forwards and why they chose the route they have.




D3D #16: Getting information from/about shaders and effects. [Table of Contents]

Shaders and effects are becoming an increasingly standard part of most Direct3D-based applications – this is definitely a good thing (at least when compared to the legacy fixed-function route!). However, as with any data-driven architecture it can be difficult to gain enough information about the data being used. Specifically, you as the developer need to write code that queries the incoming data in an abstract way and then build internal data-structures (or whatever is appropriate) to match. The exact mechanics of this are difficult to discuss without a substantial amount of context as it depends hugely on what your application is attempting to achieve.


D3DX is where the HLSL compiler and Effects framework currently reside (they move to the core runtime in Direct3D 10) and it offers several ways of getting useful information – generally referred to as “reflection” information.


When using individual shaders (that is, directly managing the IDirect3DVertexShader9 or IDirect3DPixelShader9 interfaces) compiled from HLSL (usually using D3DXCompileShaderFromFile()) you can use the ID3DXConstantTable to extract input related information.


When using the Effects framework you will need to use the ID3DXBaseEffect and ID3DXEffect interfaces to extract information. Effective use of D3DXHANDLEs (see the Handles page in the documentation) is essential here.


A very powerful concept is to make use of annotations (see the Add Parameter Information with Annotations sub-section within the Using an Effect documentation); these allow the effect author to annotate data with information for the application, allowing for an even more flexible architecture. The DirectX Standard Annotations and Semantics documentation and the DxViewer source (found in the %DXSDK_DIR%\Utilities\Source\DxViewer\ folder) demonstrate advanced use of this feature.


Language and API Variations:


Managed Direct3D 1.1:


C/C++ Direct3D 10:




D3D #17: Picking geometry. [Table of Contents]

This particular problem isn’t strictly a Direct3D issue – it’s more in the mathematics/physics domain. However, it still gets asked on a regular basis, such that it warrants an entry in this forum’s FAQ. As a general note, you should consider the coupling between physics and graphics – it might seem initially logical that, because Direct3D stores the geometry, it should also be the data-source for physics, but for various reasons (mostly performance and architectural) it can be substantially better to store a separate copy of the necessary data alongside your physics code.


If your mathematics background is weak or you don’t understand this brief overview you might want to take a trip to the Maths and Physics forum (also check its Forum FAQ for useful resources).


The term “picking” typically refers to using the mouse cursor’s screen position to select a 3D model; for example, clicking on a particular vehicle in a Real-Time Strategy game. It becomes complex simply because cursor positions are two-dimensional whereas geometry is defined in three dimensions.


Depending on which version of the DirectX SDK you are using you may wish to refer to the “Pick” sample (located in \Samples\C++\Direct3D\Pick\) as it implements the methods discussed here.


Gathering the necessary information


The 2D mouse coordinate can be found via Windows messages or more directly via GetCursorPos() – remember to translate it via ScreenToClient() or you’ll find your picking is offset slightly.


Next you will need the three main transformation matrices – World, View and Projection. Bear in mind that you’ll need them in separated form (some Shader/Effect based code stores them in combined form) and that you won’t be able to use GetTransform() if you created your device using D3DCREATE_PUREDEVICE. Pay particular attention to the coordinate systems and transforms being used – the matrices used must be the same as those used to render the data.


You will also require the current view-port configuration – IDirect3DDevice9::GetViewport() is ideal, but not available on a pure device. If you can’t use this function directly then filling in a D3DVIEWPORT9 structure yourself is your only choice.


Finally the most obvious one – triangle and mesh information. There is no definite way of retrieving this as it completely depends on how and where you store your geometry. If using an ID3DXMesh then you will need to use the index buffer (via ID3DXMesh::LockIndexBuffer()) to look up the three vertices (via ID3DXMesh::LockVertexBuffer()) for each triangle in the mesh.


Choice of method


Most methods require you to define a “ray” in 3D space – this is just another name for a line defined in vector form. Specifically you require an origin and a direction. Computing this is very simple. Take the D3DVIEWPORT9 structure you retrieved and you can define two screen-space points: <MouseX, MouseY, MinZ> and <MouseX, MouseY, MaxZ>. Convert these two points into model space (or world space – whichever is appropriate!) and you can then define your ray as being from the first point (with MinZ) in the normalized direction towards the second point (with MaxZ).


The first choice comes in how you do the aforementioned transform. The easiest way is to use D3DXVec3Unproject() – pass in the previously discussed information and it’ll do all the hard work for you. Alternatively you can perform the transformation manually – just make sure that you take the view-port matrix into account (see Viewports and Clipping for further details). Take the inverse of the final transformation matrix (World * View * Projection * Viewport) and then transform the two screen-space points to find their model-space equivalents – again, this can be done manually or you could use D3DXVec3TransformCoord().


You now have a ray with which you can check intersections – the fun isn’t over just yet!


Your geometry is almost certainly going to be defined in terms of triangles, thus the next part is a simple case of testing your ray for intersection against each and every triangle in your mesh. Using your favourite search engine with terms such as “ray triangle intersection” should reveal the mathematics involved here (as a hint – it usually involves computing and testing the barycentric coordinates from a line-plane intersection). To keep things simple you might choose to use the D3DXIntersectTri() function to return both the true/false and barycentric results.
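The mathematics can be sketched in isolation. The following is a minimal stand-alone ray/triangle test (the Möller–Trumbore approach) that returns the same barycentric (u, v) and distance t values that D3DXIntersectTri() reports – the small Vec3 type and all function names here are illustrative only, not SDK code:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3  Sub  ( Vec3 a, Vec3 b ) { Vec3 r = { a.x-b.x, a.y-b.y, a.z-b.z }; return r; }
static Vec3  Cross( Vec3 a, Vec3 b ) { Vec3 r = { a.y*b.z-a.z*b.y, a.z*b.x-a.x*b.z, a.x*b.y-a.y*b.x }; return r; }
static float Dot  ( Vec3 a, Vec3 b ) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Returns true if the ray (orig, dir) hits triangle (v0, v1, v2), filling in
// the barycentric coordinates (u, v) and the distance t along the ray.
bool RayTriangle( Vec3 orig, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2,
                  float *pU, float *pV, float *pT )
{
    Vec3 e1 = Sub( v1, v0 ), e2 = Sub( v2, v0 );
    Vec3 p  = Cross( dir, e2 );
    float det = Dot( e1, p );
    if( fabsf( det ) < 1e-6f ) return false;     // ray is parallel to the triangle
    float inv = 1.0f / det;
    Vec3 s = Sub( orig, v0 );
    float u = Dot( s, p ) * inv;
    if( u < 0.0f || u > 1.0f ) return false;     // outside the triangle
    Vec3 q = Cross( s, e1 );
    float v = Dot( dir, q ) * inv;
    if( v < 0.0f || u + v > 1.0f ) return false; // outside the triangle
    *pU = u; *pV = v;
    *pT = Dot( e2, q ) * inv;                    // distance along the ray
    return *pT >= 0.0f;                          // reject hits behind the origin
}
```

The (u, v) pair can then be used to interpolate per-vertex attributes (e.g. texture coordinates) at the hit point, exactly as with the values D3DXIntersectTri() returns.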


If your geometry is contained in an ID3DXMesh then you have the option of using the D3DXIntersect() or D3DXIntersectSubset() functions.


Performance considerations


As you can surely appreciate from the previous description – even the simple ray/triangle tests can get very expensive when a large number of triangles need to be tested. The best way to improve this is to reduce the number of triangles tested and luckily this is often very easy to do.


The name of the game is “hierarchical testing” – use simple shapes that completely surround the geometry; cubes, spheres, capsules and ellipsoids are all common choices. A single test against this bounding primitive tells you whether the ray even gets close to intersecting the enclosed object. If the ray doesn’t intersect the bounding primitive you can skip the testing of any contained triangles – an instant performance gain.
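The idea behind such a probe can be sketched as a boolean ray/sphere test in the spirit of D3DXSphereBoundProbe() – a sketch only, assuming a normalized ray direction, with illustrative names:

```cpp
// Returns true if the ray could possibly hit anything inside the sphere,
// so the per-triangle tests can be skipped whenever it returns false.
// The ray direction (dx, dy, dz) is assumed to be normalized.
bool SphereBoundProbe( float cx, float cy, float cz, float radius,
                       float ox, float oy, float oz,
                       float dx, float dy, float dz )
{
    // Vector from the ray origin to the sphere centre
    float lx = cx - ox, ly = cy - oy, lz = cz - oz;

    float b = lx*dx + ly*dy + lz*dz;                 // projection onto the ray
    float c = lx*lx + ly*ly + lz*lz - radius*radius; // squared distance minus r^2

    if( c <= 0.0f ) return true;   // ray origin is inside the sphere
    if( b <= 0.0f ) return false;  // sphere centre lies behind the ray
    return b*b >= c;               // discriminant of the quadratic is non-negative
}
```

Only when this returns true would you fall through to the expensive per-triangle tests on the enclosed geometry.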


D3DX provides two useful functions for these sorts of tests – D3DXSphereBoundProbe() (also refer to D3DXComputeBoundingSphere()) and D3DXBoxBoundProbe() (see D3DXComputeBoundingBox()).


Depending on how much picking you expect your application to perform (or if a profiler indicates it is a big problem) then multi-level hierarchical systems are a good choice. Group objects together by locality (e.g. all objects in a particular room of a building) and attempt to reject multiple objects with a single test. Referring back to the opening statement – if you were to keep a physics-specific copy of geometry data then you could pre-build a hierarchy of picking data (and possibly combine it with other physics algorithms) for even faster/simpler hierarchical testing. There is no built-in hierarchy with the default Direct3D geometry storage methods.


The next optimization to try is “rolling your own” intersection testing code. D3DX provides many of the mathematical functions for you, but if you were to write them yourself (which isn’t that hard) you’ll find that a lot of intermediary results could be cached and re-used. This is an example of a “context dependent” optimization – D3DX does not know what your application is doing, but you as the application programmer know the higher algorithmic goal and can introduce optimizations accordingly.


Language and API Variations:


Managed Direct3D 1.1:


C/C++ Direct3D 10:




D3D #18: Draw call overhead. [Table of Contents]

This particular characteristic is specific to Direct3D 9 and does not apply so much to Direct3D 10 (nor does it seem to affect OpenGL so much, which confuses developers migrating to D3D). Simply put, there is an overhead to all of the drawing calls (DrawPrimitive(), DrawPrimitiveUP(), DrawIndexedPrimitive(), DrawIndexedPrimitiveUP()) and any indirect drawing calls (such as ID3DXMesh::DrawSubset()). This overhead applies regardless of how much geometry you pack into a single call and is related to the internal architecture of Direct3D rather than anything specific to the application you might be writing.


The duration of this overhead is essentially wasted/lost time – your application is not doing any useful work for this period (you can, however, hope that the parallelism of the GPU allows for some work to be performed), thus the more times (or the more often) you incur this overhead the more wasted time you clock up. This can very quickly become a substantial performance problem.


There is no fixed number of drawing calls you can fit into a single frame, rather a number of guidelines. Unfortunately these guidelines change over time (for example, refer to the “Batch, Batch, Batch!” slides from Nvidia – note that they refer to CPU speeds substantially below those considered standard today). Ideally you want to aim for no more than 500 draw calls per frame; in most situations between 300 and 500 draw calls per frame will allow you to maintain a reasonable level of performance.


As is often found, it can be very easy to exceed these guidelines. As an example, a real-time strategy game might have several hundred units on screen (each being a single draw call), a terrain background, other bits of scenery such as trees and buildings (each of which is a single draw call) and a complex GUI/HUD (which could easily require a large number of draw calls). If implemented in the way that is simplest for the programmer, it is not difficult to see how this could require over 500 draw calls per frame.


Various strategies exist to overcome this problem but they all essentially boil down to doing as much work in each draw call as possible. If the overhead is constant regardless of amount of work then it makes sense to try to reduce the number of times your application incurs this overhead. Simple methods involve aggressively culling any geometry that is not visible (why incur overhead for an object that won’t appear in the final image?) and grouping together multiple objects in the same call (thus only incurring the overhead once for N number of objects).


More complex, yet more efficient, methods involve “instancing” which is demonstrated via the Instancing sample in the DirectX SDK. Note that real geometry instancing technically requires a vs_3_0 capable GPU, but ATI hacked a way around this to enable it for their vs_2_x hardware – refer to this knowledgebase entry for more details.


As an aside to reducing the overhead it is also generally a good optimization to increase the batch sizes for your rendering code. The GPU and CPU are capable of operating concurrently and independently thus having them synchronized greatly reduces the potential performance of the system. Each entry into the D3D API often (but not always) requires communication with the GPU driver and possibly also the GPU itself – handing over more work in fewer calls allows for both processing units to work for longer periods of time without intercommunication.




D3D #19: DrawIndexedPrimitive parameters. [Table of Contents]

A common problem when using IDirect3DDevice9::DrawIndexedPrimitive() is that ATI and Nvidia validate parameters in different ways. Specifically, ATI drivers tend to conform to the specification (and should match the debug runtimes and reference rasterizer) whereas the Nvidia drivers are more “forgiving” of incorrect values. This problem tends to rear its head when developing on Nvidia hardware (where the error won’t occur) and then testing it on ATI hardware and finding draw calls start failing. Note that when the runtimes are set to “retail” the parameters get passed directly to the drivers, thus there are no standard validation checks.


If during development you have no (or limited) access to different hardware then you should check code via the debug runtimes (see D3D #22: Using the debug runtimes) and the reference rasterizer (see D3D #8: I tried my application on a different computer and it didn’t work or looks wrong/different).


To gain a full understanding of the parameters you should read Rendering from Vertex and Index Buffers in the DirectX SDK documentation (or read Jonathan Steed’s original blog entry). However, the following fragment of pseudo code (from the previous Forum FAQ) might also be useful:



idxptr = &indexbuffer[startIndex];

loop for each primitive


    index = *idxptr++;

    // Nvidia hardware does not contain these "assert"s.

    assert (index >= MinIndex);

    assert (index < (MinIndex + NumVertices));

    vertex = vertexBuffer[index + BaseVertexIndex];





D3D #20: Direct3D and multi-threading. [Table of Contents]

With the increasing presence of multi-core and multi-processor systems it is becoming more and more important to consider multi-programming options in your software. Even with traditional single-core CPUs it can be a substantial improvement to delegate resource loading/manipulation to “worker threads” (mostly due to the I/O related stalls when loading from a disk).


The simple rule is to keep all Direct3D related work in a single thread. Some resource-related operations can be improved via multi-threading, but core rendering and pipeline configuration gains nothing from multi-threading.


More specifically, if you do need to use the API across multiple threads you must add the D3DCREATE_MULTITHREADED flag to your IDirect3D9::CreateDevice() call. Adding this flag forces the runtime to take critical sections on most API calls, adding around 100 cycles per call (source) – refer to this list for normal function call times. That’s before considering any delays resulting from actual contention. By keeping all device/API access in a single thread you do not need to add this flag, even if other parts of your application are using multiple threads. Good software design should allow you to completely avoid using this flag.


Refer to the “Coding for Multiple Cores” presentation from the GDC 2006 conference (more Microsoft presentations) for more information and general best practices.


As previously mentioned, various resource related algorithms suit a multi-threaded approach. Simple examples are streaming resources from disk, decompressing or extracting resources from virtual file systems. A simple way to mitigate any I/O stalling – either from regular storage or from virtual storage – is to do the I/O loading in the worker thread and then use one of D3DX’s “In Memory” functions on the main thread. For example, prefer D3DXCreateTextureFromFileInMemoryEx() over D3DXCreateTextureFromFileEx().


For textures, reading the Multithreaded texture loading again... discussion (from the DirectXDev mailing list) is a good idea. A texture created on a NULL reference device (D3DDEVTYPE_NULLREF) in the D3DPOOL_SCRATCH pool by a separate thread can then be loaded into the real texture using D3DXLoadSurfaceFromMemory() on the device/main thread. However, comments on the same mailing list suggest that better performance can be achieved by “rolling your own” loading/copying mechanism. As with many things it’s a trade-off between simple or complex code and good or better performance.


It is worth noting that you cannot usefully have the main device thread lock a resource and hand the locked pointer to a worker thread for the duration of the work – the threads (and GPU) would remain synchronized until the work is complete, gaining no multi-threading advantage. Instead, perform a lock/copy/unlock operation and pass the copied data to the worker thread; the locked pointer itself is invalidated by the unlock operation.


Compiling and creating shaders and effects is also a tempting candidate for a multi-threaded approach. Note that D3DXCreateEffect() will create device objects, thus it should either be protected using the D3DCREATE_MULTITHREADED flag or only be run on the main device thread. To migrate this to a worker thread consider using ID3DXEffectCompiler (the API form of the command-line fxc.exe) and passing the results to the main thread where D3DXCreateEffect() can be used. Use of D3DXCompileShader() should be safe across multiple threads as it doesn’t require device interaction, and the returned byte code can be passed to the main thread to create the actual vertex/pixel shader. Refer to the Building Effects in a Worker Thread discussion on the DirectXDev mailing list.


However, as a general note, run-time creation of a large number of shaders is always going to be slow such that the best performance win might be to compile shaders as part of the build process.


It is worth noting that D3DX10 has a powerful multi-threaded model that should allow easy utilization of a multi-core host system. Refer to the Direct3D 10 SDK for more details.


Language and API Variations:


Managed Direct3D 1.1:




D3D #21: How to interpret the data returned by a Lock() operation [Table of Contents]


Please refer to D3D #14: Speeding up locking of resources for more information on locking. The information in this FAQ entry assumes that you’ve already acquired a lock on a resource.


Most resources can be locked such that Direct3D will return a pointer to the backing data; exactly what this pointer refers to is very context-dependent. In the majority of cases this context is known because your code will have created the resource with a known format/description, but in some cases (e.g. when loading unknown external data) it may not be so straightforward. In these cases you can either try to dynamically interpret the data, or you can provide a number of alternative paths – generating an error if the data doesn’t match one of the expected formats.


There are two fundamental operations that you will require – reading and writing. The pointer you receive will be of unknown (VOID* or VOID**) type, so you will need to cast it to a usable type (for readability, a reinterpret_cast is better than C-style casting); even then it may be necessary to decode further. Once you’ve extracted the required data you can directly manipulate the pointer and have it reflected in the resource. Changes will not be committed until a corresponding unlock operation is made.


In all cases, familiarity with the D3DLOCK and D3DUSAGE enumerations is useful. For example, dereferencing a pointer to data from a resource created with D3DUSAGE_WRITEONLY is A Bad Thing™.


Decoding Vertex Buffer Data


Vertex data is typically a multi-element structure, so having a ready-defined struct can be particularly useful. The following code-snippet shows how you might do this:


struct Vertex
{
       // appropriate declaration goes here
};



Vertex *pVertexData = NULL; // declare a pointer that we can access data via


pVertexBuffer->Lock( 0, 0, reinterpret_cast< VOID** >( &pVertexData ), 0 );


// Provided Lock() succeeded, we can now access the data:

// pVertexData[0] or (*(pVertexData++)) etc...


In some cases you may wish to decode vertex data that is in an unknown format. A typical example might be when you’ve loaded geometry from an .X file that has no strict format requirements. Using IDirect3DVertexBuffer9::GetDesc() can be a useful starting point – the D3DVERTEXBUFFER_DESC::FVF can be passed into D3DXGetFVFVertexSize() to determine the stride/size of an individual vertex. The D3DVERTEXBUFFER_DESC::Size field can be used to determine how many vertices exist in the buffer.


This allows you to allocate arbitrary blocks of data that represent a single vertex, and to step through the data per-vertex, but it tells you nothing about the make-up of an individual vertex. The D3DVERTEXBUFFER_DESC::FVF will be useful in determining what data is stored per-vertex; retrieving a declaration via D3DXDeclaratorFromFVF() will be easier to work with. However, it is possible to create an “FVF-less” vertex buffer – one that can’t be described by a legacy FVF identifier. If you hit this problem you’ll need to find a way of retrieving the declaration (you can use D3DXGetDeclVertexSize() to get the size of an individual vertex) before continuing. If the vertex buffer belongs to an ID3DXMesh then the ID3DXMesh::GetDeclaration() function is a good starting point.


With the pointer cast to an array of BYTEs (or equivalent) and a valid declaration (D3DVERTEXELEMENT9[]) it is possible to step through the array and use either casting or memcpy_s() operations to extract the individual fields as required.


If an ID3DXMesh is being used as a container for the unknown vertex data then cloning can be a useful trick. ID3DXMesh::CloneMesh() and ID3DXMesh::CloneMeshFVF() can create a temporary clone in a particular format, in particular it can be used to strip out any unwanted data. For example, if only the position element of the vertex is required then calling CloneMeshFVF() with D3DFVF_XYZ will generate a vertex buffer that can be directly cast to a D3DXVECTOR3 (or equivalent).


Decoding Index Buffer Data


Working with data from an index buffer is about as simple as it gets – the indices will be in either 16-bit or 32-bit unsigned integer form. Using IDirect3DIndexBuffer9::GetDesc() gives you access to D3DINDEXBUFFER_DESC::Format (either D3DFMT_INDEX16 or D3DFMT_INDEX32) and D3DINDEXBUFFER_DESC::Size (which can be used to determine how many indices are in the buffer).


If using Visual C++ the unsigned __int16 and unsigned __int32 data-types (sometimes aliased as UINT16 and UINT32) can be used for the pointer that an IDirect3DIndexBuffer9::Lock() call fills in:



D3DINDEXBUFFER_DESC desc;
pIndexBuffer->GetDesc( &desc );

if( D3DFMT_INDEX16 == desc.Format )
{
       // Access via 16bit integers
       unsigned __int16 *pIdx = NULL;

       pIndexBuffer->Lock( 0, 0, reinterpret_cast< VOID** >( &pIdx ), 0 );

       // We can now manipulate the index data:
       // pIdx[...] = ... or (*(pIdx++)) = ...
}
else
{
       // Access via 32bit integers
       unsigned __int32 *pIdx = NULL;

       pIndexBuffer->Lock( 0, 0, reinterpret_cast< VOID** >( &pIdx ), 0 );

       // We can now manipulate the index data:
       // pIdx[...] = ... or (*(pIdx++)) = ...
}



Even though the above only has 2 branches, the code-duplication can be a potential problem for maintenance and testing. Using C++’s generic programming features can help out here:


template< typename index_format >
void ProcessIndexBufferData( IDirect3DIndexBuffer9 *pIB )
{
       index_format *pIdx = NULL;
       pIB->Lock( 0, 0, reinterpret_cast< VOID** >( &pIdx ), 0 );

       // access as usual

       pIB->Unlock();
}



The branching is then simplified to:



pIndexBuffer->GetDesc( &desc );

if( D3DFMT_INDEX16 == desc.Format )
       ProcessIndexBufferData< unsigned __int16 >( pIndexBuffer );
else
       ProcessIndexBufferData< unsigned __int32 >( pIndexBuffer );



This trick won’t work, though, if the processing makes use of any particular 16/32-bit characteristics...


Decoding Texture Data


Texture data is one of the more difficult forms of data to work with; based on the number of threads in forums across the internet it seems to trip up even the best programmers! The following code focuses on regular 2D textures, but the basic principles also apply to cube and volume textures.


Pixel data is usually a hybrid of multiple elements packed into a single value, thus stepping through the pointer returned by IDirect3DTexture9::LockRect() is per-pixel, but individual operations might work on only a single part of a pixel. To make matters worse, there are potentially a huge number of texture formats – refer to the D3DFORMAT documentation page for the full list.


There are three important steps:


1. Choose the data-type to cast the pointer to. For a given texture all elements will be of the same format, this format will be of a fixed size (e.g. a 32bit format like D3DFMT_X8R8G8B8). You can therefore cast the VOID* pointer to a known type of the same size. A useful trick can be to cast the data to a struct with the correct layout. For example:


struct Decoded32bitARGB
{
       // Note: on a little-endian (x86) system the bytes of a 32bit
       // D3DFMT_A8R8G8B8 value are stored in reverse order, so the
       // first byte in memory is actually Blue:
       unsigned __int8 B;
       unsigned __int8 G;
       unsigned __int8 R;
       unsigned __int8 A;
};




D3DLOCKED_RECT rect;
pTexture->LockRect( 0, &rect, NULL, 0 );


Decoded32bitARGB *pPixelData = reinterpret_cast< Decoded32bitARGB* >( rect.pBits );


// No need to decode individual channels, just access

// them directly:

pPixelData[..].A = ..;

pPixelData[..].R = ..;

pPixelData[..].G = ..;

pPixelData[..].B = ..;


// An alternative is to cast a 32bit format to a single 32bit integer:

unsigned __int32 *pPixelData32 = reinterpret_cast< unsigned __int32* >( rect.pBits );

// However, with this method we must manually decode the binary representation of
// 'pPixelData32' using bitwise operators...


The struct-based method may not always be possible as the language might not offer the necessary primitives for representing individual channels (e.g. there is no 5 or 6 bit integer type that would be required for D3DFMT_R5G6B5).


2. Step through the array. This might seem obvious, but has a subtle characteristic that can make things explode in your face if you’re not careful. IDirect3DTexture9::GetLevelDesc() can allow you to retrieve the pixel dimensions (D3DSURFACE_DESC::Width and D3DSURFACE_DESC::Height), but it’s the D3DLOCKED_RECT::Pitch field you must pay attention to. The Direct3D specification allows the hardware to append extra data to the end of each row in a texture – this could either be unused padding (to aid alignment and improve performance) or it may be additional private data required by the device. Thus using a simple array[x][y] or array[x+y*width_in_pixels] lookup might end up addressing incorrect data.


Compensating for the pitch is as easy as remembering to traverse between rows in multiples of D3DLOCKED_RECT::Pitch instead of the number of pixels. Exactly how you do this depends on coding style. Bear in mind that the pitch is measured in bytes, such that you may need to divide it by the size of your per-pixel data.
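As an illustration of the arithmetic (the helper name here is made up – it is not a D3DX function), the byte offset of pixel (x, y) should be computed from the pitch, not the width:

```cpp
#include <cstddef>

// Compute the byte offset of pixel (x, y) within locked surface data using
// the pitch reported in D3DLOCKED_RECT::Pitch. 'bytesPerPixel' would be 4
// for D3DFMT_X8R8G8B8, 2 for D3DFMT_R5G6B5, and so on.
size_t PixelByteOffset( size_t x, size_t y, size_t pitchInBytes, size_t bytesPerPixel )
{
    // Rows are 'pitchInBytes' apart - NOT 'width * bytesPerPixel' apart.
    return y * pitchInBytes + x * bytesPerPixel;
}
```

Against a locked rect this might be used as: BYTE *pBits = static_cast&lt;BYTE*&gt;( rect.pBits ); followed by indexing pBits + PixelByteOffset( x, y, rect.Pitch, 4 ).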


3. Manipulate individual pixels. Once you’ve written code for casting and traversing the data you need to be able to read and write the packed data. Some formats (such as D3DFMT_R32F) contain only a single element, in which case this sub-section can be skipped. Also, if the aforementioned struct-based approach is applicable you may not have to do any low-level manipulation.


Take the 16bit D3DFMT_R5G6B5 for example – if you were to look at the raw 16 binary digits you’d see: RRRRRGGGGGGBBBBB. To manipulate an individual channel you need to use a combination of bitwise operators to extract the data:


RRRRRGGGGGGBBBBB & 0000011111100000 = 00000GGGGGG00000

00000GGGGGG00000 >> 5 = 0000000000GGGGGG


Once you’ve moved the desired channel to the LSBs and zeroed out any other data you can directly read the value stored in that channel. You can then manipulate it as you see fit and perform the reverse operation to merge it back together:


0000000000GGGGGG << 5 = 00000GGGGGG00000

RRRRRGGGGGGBBBBB & 1111100000011111 = RRRRR000000BBBBB

RRRRR000000BBBBB | 00000GGGGGG00000 = RRRRRGGGGGGBBBBB



There are many variations on this method – depending on which (if not all) channels you want to manipulate and what format the raw data is in. If you’re not confident with bitwise operations then drawing out the above calculations on paper can make writing the code much easier.
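A minimal sketch of the masks and shifts described above, applied to the green channel of a D3DFMT_R5G6B5 pixel (the function names are illustrative):

```cpp
// Extract and replace the 6-bit green channel of a 16bit R5G6B5 pixel.
typedef unsigned short Pixel565;

unsigned int GetGreen565( Pixel565 pixel )
{
    return ( pixel & 0x07E0 ) >> 5;   // mask 0000011111100000, shift to the LSBs
}

Pixel565 SetGreen565( Pixel565 pixel, unsigned int green )
{
    pixel &= 0xF81F;                  // mask 1111100000011111: zero the old green
    return ( Pixel565 )( pixel | ( ( green & 0x3F ) << 5 ) );  // merge new value
}
```

The red and blue channels follow the same pattern with masks 0xF800 (shift 11) and 0x001F (shift 0) respectively.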


Pay particular attention to the width of each channel (in the above example, either 5 or 6 bits) as you can easily overflow these quantities when performing calculations. Depending on what computations you are doing, some sort of “up scale” and “down scale” operation might be desirable – e.g. compute all values as 32bit floating point values and then scale back down to 8, 6, 5 or 4 bit integer quantities.


As a final warning - be wary of Mip-Mapping! Locking and manipulating a particular surface is not automatically reflected across other levels. This can be particularly difficult to debug – if you lock the top-level and then scale the geometry and find that the original/older texture appears you have probably got out-of-sync mip-map levels. Some hardware allows for automatic (re-)generation (see Automatic Generation of Mipmaps in the SDK documentation), in all other cases you are responsible for propagating any changes through all levels.


Useful D3DX Helper Functions


The D3DXFillTexture(), D3DXFillCubeTexture(), D3DXFillVolumeTexture() and their “Texture Shader” (the same names with a ‘TX’ at the end) equivalents can be very useful. They aren’t always appropriate, nor are they going to be as fast as other solutions, but they hand much of the complexity over to D3DX. A simple callback function allows you to write data to a resource and have D3DX map it to the correct format and compensate for the surface’s pitch.


When dealing with floating-point data and HDR rendering you may need to decode half-precision floating-point values. The D3DXFLOAT16, D3DXVECTOR2_16F, D3DXVECTOR3_16F and D3DXVECTOR4_16F types are particularly useful. If a full conversion is needed then the D3DXFloat16To32Array() and D3DXFloat32To16Array() functions can be used.


Language and API Variations:


Managed Direct3D 1.1:


C/C++ Direct3D 10:




D3D #22: Using the debug runtimes [Table of Contents]


The debug runtimes are an essential part of any DirectX developer’s toolbox. Simply put they provide you with extra information to help you out when things go wrong – how can you fail to find that useful? The debug output is referred to in several other FAQ entries – in particular it is the debug runtimes that provide you with memory leak information (see D3D #9: I’m leaking memory, help!).


Refer to this NeXe entry for setting up debug output in your program. In addition to that article, it is worth noting that if your runtime and SDK versions do not match, parts of the DirectX control panel will be disabled. This is a common source of problems – a 9.0b SDK alongside the 9.0c runtimes being a good example. Since you can’t uninstall the core runtime, your only choice is to upgrade to a more recent SDK.


Forum members circlesoft and Pipo DeClown created simple utilities to toggle between the retail and debug runtimes – very useful if you like playing games on your development machine! Check out these two threads for D3DTaskBar and D3DDebugConf.


Whenever Direct3D functions start failing (e.g. the FAILED() macro returns true) the first thing you should do is check the debug output in your IDE. In Visual Studio the “Output” view should be visible by default (possibly as a minimized tab along the bottom edge) and is also where you get compile-time output – a drop-down list at the top of the view allows you to select which source of messages is visible. If you can’t see the Output view, click on the “View” menu and select “Output” (or press Alt+2).


If you don’t want to (or can’t) use your IDE’s built-in viewer then you may wish to look into SysInternals’ “DebugView” software.


The following fragment shows typical debug output when starting a DXUT-based application (many of the messages are due to DXUT’s initial enumeration stages):


Direct3D9: (INFO) :Direct3D9 Debug Runtime selected.

Direct3D9: (INFO) :======================= Hal HWVP device selected


Direct3D9: (INFO) :HalDevice Driver Style 9


D3D9 Helper: Warning: Default value for D3DRS_POINTSIZE_MAX is 2.19902e+012f, not 1.58456e+029f.  This is ok.

Direct3D9: (WARN) :No SW device has been registered. GetAdapterCaps fails.

D3D9 Helper: IDirect3D9::GetDeviceCaps failed: D3DERR_NOTAVAILABLE

Direct3D9: (INFO) :======================= Reference HWVP device selected


Direct3D9: (INFO) :HalDevice Driver Style 9


Direct3D9: (INFO) :======================= Hal HWVP device selected


Direct3D9: (INFO) :HalDevice Driver Style 9


D3D9 Helper: Warning: Default value for D3DRS_POINTSIZE_MAX is 2.19902e+012f, not 1.58456e+029f.  This is ok.


Note that there are two basic types of message, indicated by the prefix “Direct3D9: ” or “D3D9 Helper: ”. The former appear when you select the debug runtimes in the control panel, but the latter only appear if you compile with the D3D_DEBUG_INFO symbol defined. Refer to the NeXe article on Debugging for more information. The basic debugging messages also carry (INFO), (WARN) and (ERROR) tags indicating their severity. How much attention you pay to (INFO) and (WARN) messages depends on what you’re using the output for – in most cases they can be ignored.


It’s when a D3D function fails that the debug runtimes become particularly useful. Take the following minimal rendering function from the SDK’s EmptyProject template:


void CALLBACK OnFrameRender( IDirect3DDevice9* pd3dDevice, double fTime, float fElapsedTime, void* pUserContext )
{
    HRESULT hr;

    // Clear the render target and the zbuffer
    V( pd3dDevice->Clear(0, NULL, D3DCLEAR_TARGET | D3DCLEAR_ZBUFFER, D3DCOLOR_ARGB(0, 45, 50, 170), 1.0f, 0) );

    // Render the scene
    if( SUCCEEDED( pd3dDevice->BeginScene() ) )
    {
        //V( pd3dDevice->EndScene() );
    }
}

Note that the call to EndScene() is commented out. Under normal execution you’ll get numerous failed calls afterwards, as the Present() at the end of the frame and the next frame’s BeginScene() cannot be called whilst the API is still “inside” a scene. In particular, all the application would see is D3DERR_INVALIDCALL return values triggering any FAILED() macros in use. With the debug runtime the following output is generated:


Direct3D9: (ERROR) :Present not allowed between BeginScene and EndScene. Present fails.

D3D9 Helper: IDirect3DDevice9::Present failed: D3DERR_INVALIDCALL

Direct3D9: (ERROR) :BeginScene, already in scene. BeginScene failed.

D3D9 Helper: IDirect3DDevice9::BeginScene failed: D3DERR_INVALIDCALL


The debug runtimes generate two (ERROR) statements – one for the Present() call at the end of the frame, and a second for the BeginScene() called when the application attempts to start the following frame. The D3D9 Helper simply adds the function name and a text version of the error code.


Depending on the position of the “Debug Output Level” slider you can end up with a huge amount of debug output. Moving it all the way to “More” will often generate a line of output for every single redundant render/sampler state change – of which you could have many per frame, and at high frame rates you’ll generate thousands (if not tens of thousands) of lines of output. In the previous example, the four lines of debug output are repeated once per frame – the simple test-bed application generated over 1800 lines of output in a matter of seconds! Ideally you would fix the source of these messages, but this isn’t always possible – moving the slider towards “Less” may be the only alternative.


If your application does generate a large amount of debug output it can be difficult to match it up with the corresponding API call(s). Depending on your software architecture (e.g. if you have any clever stack-tracing/logging code) you can inject “check-points” into the debug output to mark specific points in the execution of the application. In C/C++ this can be achieved by using OutputDebugString() – just be sure to add a new line character (‘\n’) to any messages. For example, adding OutputDebugStringA( __FUNCTION__ “\n” ) to the start of each function will allow you to track which function generates which debug messages.
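A trivial helper makes the “check-point” idea concrete; the function name is ours, and in a real application the returned string is what you would hand to the Win32 OutputDebugStringA() call:

```cpp
#include <string>

// Build a "check-point" line for the debug output stream. The trailing
// newline matters - without it successive messages run together in the
// IDE's Output window.
inline std::string Checkpoint(const char* function)
{
    return std::string(function) + "\n";
}

// Typical usage at the top of each function (OutputDebugStringA is the
// real Win32 call; it is omitted here to keep this sketch portable):
//   OutputDebugStringA( Checkpoint(__FUNCTION__).c_str() );
```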


Direct3D 10 introduces a useful addition to the debug output that allows the actual application to receive the debug messages. In previous versions of Direct3D the debug output is a complete unknown to the executing application. Also, unlike previous versions, debug mode must be specified by the application via the use of the D3D10_CREATE_DEVICE_DEBUG flag in the D3D10CreateDevice() call.


Once a device is successfully created with the debug flag it can be queried (via a regular QueryInterface() call) for an instance of ID3D10InfoQueue which can then be periodically polled for any waiting messages. It is important to make use of ID3D10InfoQueue::AddApplicationMessage() to maintain context when you finally query the full list. Good usage of the information queue’s features will allow for a very powerful debugging aid – unlike previous versions of Direct3D your application will be able to append application-specific context information to the output. A simple example would be to append a stack-trace and other state information whenever an API call fails – information that might allow you to narrow down the source of the error much more quickly.




D3D #23: Texture creating/loading enumeration and checking [Table of Contents]


Textures are a basic resource across all versions of Direct3D and can be used as both inputs and outputs for the pipeline. Despite being a fundamental part of the API they are subject to a number of constraints that an application developer needs to be aware of. For a more general discussion of enumeration and capabilities please refer to D3D #5: Hardware capabilities and enumeration.


This particular FAQ entry focuses on 2D textures, stored as IDirect3DTexture9 objects, but the information applies equally to the volume (IDirect3DVolumeTexture9) and cube (IDirect3DCubeTexture9) forms.


There are two common methods for creating textures:

  1. Creating an empty texture and either rendering data to it or filling it procedurally using “Lock” operations.
  2. Using the D3DX functions to create a texture based on existing data (e.g. an image file stored on disk).


Creating regular textures is done via the IDirect3DDevice9::CreateTexture() function, whereas the Texturing Functions reference page contains a list of the D3DX functions.


The dimensions, measured in pixels, are of particular importance. The D3DCAPS9 structure (retrieved via IDirect3DDevice9::GetDeviceCaps()) reveals three maximum values: MaxTextureWidth, MaxTextureHeight and MaxVolumeExtent. You can create texture resources with dimensions between 1 and the appropriate maximum value. For most D3D9 hardware this will be either 2048 or 4096; it is rare to find hardware that supports dimensions greater than 4096.


For optimal performance, power-of-2 dimensions (64, 128, 256, 512…) should be used. Pay particular attention to the D3DCAPS9::TextureCaps flags (see the D3DCAPS9 documentation page for precise details) – they indicate whether non-power-of-2 dimensions are permitted and, if so, whether there are any restrictions. You should still check these flags, but according to the “Graphics Card Capabilities” spreadsheet no D3D8/D3D9-level hardware (GeForce 5-7 series, ATI Radeon 8x00, 9x00 and X series) restricts dimensions.
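On hardware that does require power-of-2 dimensions, a common approach is to round a requested size up and clamp it to the device limit before creating the resource. A minimal sketch (helper names are ours; deviceMax would come from D3DCAPS9::MaxTextureWidth or MaxTextureHeight):

```cpp
// Round a dimension up to the next power of two, similar to what D3DX
// does when loading an image on a device without non-power-of-2 support.
inline int NextPowerOfTwo(int n)
{
    int p = 1;
    while (p < n) p *= 2;
    return p;
}

// Clamp the rounded dimension to the device's reported maximum
// (commonly 2048 or 4096 on D3D9-era hardware).
inline int ClampTextureDimension(int requested, int deviceMax)
{
    int dim = NextPowerOfTwo(requested);
    return (dim > deviceMax) ? deviceMax : dim;
}
```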


The D3DFORMAT enumeration contains a huge number of different resource formats, but most hardware only allows you to use a subset of these. Specifically, not all texture formats can be used for all types of usage. The IDirect3D9::CheckDeviceFormat() function is used to determine which formats are available for which resource types and usages. Depending on the intended usage it is often possible to build a chain of “fallback” choices with this function, such that a texture can always be created even if it is not necessarily the best/desired choice:


if( FAILED( CheckDeviceFormat( D3DFMT_A8R8G8B8 ) ) )
{
    if( FAILED( CheckDeviceFormat( D3DFMT_A8B8G8R8 ) ) )
    {
        if( FAILED( CheckDeviceFormat( D3DFMT_A2R10G10B10 ) ) )
        {
            // We don't support *any* of the three textures we
            // just tested. Either continue trying different ones
            // or return an error at this point...
        }
        else
        {
            // Third Choice: We can use A2R10G10B10
        }
    }
    else
    {
        // Second Choice: We can use A8B8G8R8
    }
}
else
{
    // First Choice: We can use A8R8G8B8
}

The above method is particularly important when targeting multiple types of hardware – not all configurations allow the same texture formats, so it pays to provide a fallback.
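The fallback chain above can be generalized into a small helper. The function name is ours; in real code the predicate would wrap IDirect3D9::CheckDeviceFormat() with the appropriate device type, adapter format and usage:

```cpp
#include <functional>
#include <vector>

// Walk a priority-ordered list of candidate formats and return the first
// one the 'isSupported' predicate accepts, or 'fallback' if none pass.
template <typename Format>
Format FirstSupportedFormat(const std::vector<Format>& candidates,
                            const std::function<bool(Format)>& isSupported,
                            Format fallback)
{
    for (Format f : candidates)
        if (isSupported(f))
            return f;
    return fallback;
}
```

With this shape the candidate list (e.g. A8R8G8B8, A8B8G8R8, A2R10G10B10) lives in one place instead of being buried in nested if/else blocks.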


Another important part to check is the intended usage of the texture resource. This is done by passing one (or more) of the D3DUSAGE or D3DUSAGE_QUERY enumerations into the Usage parameter of CheckDeviceFormat(). A common usage that you must check is when creating a render target (a texture that can be used as both an input and output for the pipeline, useful in many types of effect) – D3DUSAGE_RENDERTARGET should be used here.


When doing any High Dynamic Range Rendering (HDRR) you will find that there are 16-bit (half) precision and 32-bit (single) precision formats available. Two basic features available to almost every other format are often lacking with these texture formats – filtering and blending. Calling CheckDeviceFormat() with D3DUSAGE_QUERY_FILTER and/or D3DUSAGE_QUERY_POSTPIXELSHADER_BLENDING will determine whether filtering and frame-buffer blending are supported. For D3D9-generation hardware it is common for half-precision formats to support filtering/blending while single-precision formats do not – it may therefore be better to choose a half-precision format instead of a single-precision one.
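That decision can be encoded in a few lines. The enum and function names here are ours; the two booleans would come from CheckDeviceFormat() calls with the D3DUSAGE_QUERY_FILTER and D3DUSAGE_QUERY_POSTPIXELSHADER_BLENDING flags for the single- and half-precision formats respectively:

```cpp
// Which HDR render-target precision to use, given what the device reports.
enum class HdrChoice { SinglePrecision, HalfPrecision, NoHdr };

// Prefer single precision where it is fully usable, fall back to half
// precision, and finally to no HDR at all (e.g. a regular 8-bit format).
inline HdrChoice ChooseHdrFormat(bool singleHasFilterAndBlend,
                                 bool halfHasFilterAndBlend)
{
    if (singleHasFilterAndBlend) return HdrChoice::SinglePrecision;
    if (halfHasFilterAndBlend)   return HdrChoice::HalfPrecision;
    return HdrChoice::NoHdr;
}
```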

The core Direct3D runtime has no functionality for creating textures from images stored on disk or in memory, but D3DX has comprehensive support via its Texturing Functions. Familiarity with the D3DX functions is useful and can save you a lot of time! It is worth noting that the “FromFileInMemory” functions can be very useful when paired with a virtual file system; if, for example, you store your files in a compressed/encrypted archive you can load them into memory as appropriate and still have D3DX load/create a texture.


A common confusion with the D3DX functions is that they can change the parameters passed in, and if defaults are requested they might not give the expected results. In particular, if a source image has non-power-of-2 dimensions the D3DX functions will round the image up (with filtering) to the nearest power-of-2 dimensions. In some cases this can introduce blurry results or unexpected values when manipulating the resource. Specifying D3DX_DEFAULT_NONPOW2 for the height and/or width will avoid this behaviour, but be prepared for the call to fail if the device does not allow non-power-of-2 dimensions (although, as previously remarked, this is unlikely).


D3DX only supports the image formats listed in the D3DXIMAGE_FILEFORMAT enumeration. This covers the majority of uses, but it is worth noting that not all formats are supported – GIF being one that some people still insist on using. In some cases D3DX will have to convert the incoming data into a pixel format supported by Direct3D – specifying D3DFMT_UNKNOWN will allow D3DX to choose the most appropriate (and fail if none can be determined). Specifying a particular format does not guarantee the returned texture is of that format – if a conversion or support is unavailable an alternative will be selected. Using the D3DFMT_FROM_FILE flag can be useful, but if conversion or support is unavailable the call will fail.


As mentioned in the previous paragraphs, D3DX may well choose different parameters where appropriate – this causes people problems if they assume that data will be loaded in a known format (although remember that assumptions about device capabilities are not sensible!). It may be useful to use the D3DXGetImageInfoFromFile() and/or D3DXCheckTextureRequirements() functions before attempting to load a texture. Several of the D3DX functions will return a D3DXIMAGE_INFO structure that contains the source parameters that the texture was created from, but this might require a failed attempt at creating the texture first.


Using the IDirect3DTexture9::GetLevelDesc() function and inspecting level 0 will reveal what dimensions were actually created.


Language and API Variations:


Managed Direct3D 1.1:


C/C++ Direct3D 10:




D3D #24: A perfect Direct3D9 application structure (includes handling lost devices) [Table of Contents]


Creating a correct Direct3D9 application structure is not quite as straightforward as it might be for more general software development. There are a number of considerations, mostly revolving around resource management, which really should be included from the initial design. This particular FAQ entry is specific to Direct3D9, as later versions won’t have the concept of a “lost device” state due to changes in the underlying driver model (WDDM under Windows Vista instead of XPDM under Windows XP; see the Graphics APIs in Windows Vista article in the SDK for more details).


When using the DXUT framework (as the majority of the SDK samples do) you are largely insulated from the low level application structure. DXUT’s call-backs provide you with the necessary entry points and the framework does the rest. However, if you want to write your own architecture then you will need to know how a Direct3D application should manage its resources and respond to events.


Make sure you are familiar with the Lost Devices documentation, the D3D #5: Hardware capabilities and enumeration and D3D #13: Resource allocation best practices FAQ entries. It is also especially useful to note that the DXUT framework source code is readily available as part of your project (presuming you used the “Install Project” feature from the Sample Browser). The framework shows the preferred way of responding to all situations – if you’re unsure how your code should respond to an event then looking at how DXUT handles it is a good idea. The code can be a bit intimidating, but liberal usage of Visual Studio’s “Go To Definition” and “Go To Declaration” features (right-click over a DXUT symbol) make it somewhat easier. For example, DXUTStaticWndProc() defined in dxut.cpp demonstrates how to respond to the important Windows Messages.


The following pseudo flow-diagram is provided as an example of a typical Direct3D9 application. It doesn’t cover every use-case but should be sufficient for most uses.


  1. Application starts.
  2. Initialize Win32 – register a window class and show your window.
  3. Create an IDirect3D9 object.
     a. Perform all necessary enumeration of available devices, features and resolutions. See D3D #5 for details.
     b. Select an appropriate device.
  4. Create an IDirect3DDevice9 object using CreateDevice().
  5. Create initial resources.
     a. Create any D3DPOOL_DEFAULT resources. See D3D #13 for details.
     b. Create any D3DPOOL_MANAGED resources. See D3D #13 for details.
  6. Begin main application loop.
     a. If there are any waiting Windows Messages (use PeekMessage() for example):
        i.  Translate (TranslateMessage()) and dispatch (DispatchMessage()) messages as normal.
        ii. Respond to events:
            - WM_PAINT:
              1. Render the next iteration of the graphics.
              2. If the return code from Present() is D3DERR_DEVICELOST, flag up a “Device Lost” situation.
            - WM_SIZE:
              1. If the window was minimized (SIZE_MINIMIZED), pause rendering.
              2. Otherwise:
                 a. If a SIZE_MINIMIZED previously occurred, un-pause rendering.
                 b. If between a WM_ENTERSIZEMOVE and WM_EXITSIZEMOVE, ignore the message.
                 c. Otherwise, check for the host window changing size; if it has, reset the device.
            - WM_ENTERSIZEMOVE:
              1. Pause rendering until the user finishes resizing the target window.
            - WM_EXITSIZEMOVE:
              1. Un-pause rendering.
              2. Check for the host window changing size; if it has, reset the device.
     b. Otherwise (no messages waiting):
        i.  Render the next iteration of the graphics.
        ii. If the return code from Present() is D3DERR_DEVICELOST, flag up a “Device Lost” situation.
     c. If in a “Device Lost” situation, use TestCooperativeLevel() to check the current device state:
        i.  If the return code is D3DERR_DEVICELOST, pause the application (e.g. use Sleep()) and then call TestCooperativeLevel() again. Loop accordingly.
        ii. If the return code is D3DERR_DEVICENOTRESET:
            1. Release all D3DPOOL_DEFAULT resources.
            2. Release any resources created with CreateRenderTarget(), CreateDepthStencilSurface(), CreateStateBlock() and CreateAdditionalSwapChain().
            3. Call Reset() to transition back to an operational state.
            4. Recreate the resources released in steps 1 and 2.
  7. Finish main application loop.
  8. Tidy up remaining resources.
     a. Release any D3DPOOL_MANAGED resources.
     b. Release any D3DPOOL_DEFAULT resources.
  9. Tidy up any system-level interfaces.
     a. Release your IDirect3DDevice9 and IDirect3D9 objects.
     b. Tidy up any Win32 resources.
 10. Application ends.


The above pseudo flow-diagram specifically doesn’t include any aspects of the accompanying application – e.g. loading and initialization of application-specific data or transitioning between levels in a game.
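The “Device Lost” branch of the loop above can be sketched as a small state machine. Everything here is a stand-in: the enum mirrors the real TestCooperativeLevel() HRESULTs, the mock device replaces IDirect3DDevice9, and the action strings represent the real release/Reset()/recreate calls. This is a shape sketch, not DXUT’s implementation:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the TestCooperativeLevel() return codes.
enum class CoopLevel { Ok, DeviceLost, DeviceNotReset };

// Mock device: plays back a scripted sequence of cooperative-level results.
struct MockDevice
{
    std::vector<CoopLevel> script;
    std::size_t cursor = 0;
    CoopLevel TestCooperativeLevel() { return script[cursor++]; }
};

// One pass of the "Device Lost" handling from the flow above. Returns the
// actions taken, in order, so the recovery sequence can be inspected.
std::vector<std::string> HandleLostDevice(MockDevice& device)
{
    std::vector<std::string> actions;
    for (;;)
    {
        CoopLevel level = device.TestCooperativeLevel();
        if (level == CoopLevel::DeviceLost)
        {
            actions.push_back("sleep");   // pause (e.g. Sleep()), then poll again
            continue;
        }
        if (level == CoopLevel::DeviceNotReset)
        {
            actions.push_back("release default pool"); // D3DPOOL_DEFAULT etc.
            actions.push_back("reset");                // IDirect3DDevice9::Reset()
            actions.push_back("recreate resources");
        }
        return actions;                   // device is operational again
    }
}
```

The key property the sketch preserves: nothing is released or reset while the device still reports D3DERR_DEVICELOST, and the release/Reset()/recreate sequence only runs once D3DERR_DEVICENOTRESET is reported.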


Language and API Variations:


Managed Direct3D 1.1:


C/C++ Direct3D 10: