Windows to Reality:
Getting the Most out of
Direct3D 10 Graphics in
Your Games
Shanon Drone
Software Development Engineer
XNA Developer Connection
Microsoft
Key areas
 Debug Layer
 Draw Calls
 Constant Updates
 State Management
 Shader Linkage
 Resource Updates
 Dynamic Geometry
 Porting Tips
Debug Layer
Use it!
  The D3D10 debug layer can help find performance
  issues
    Enable it from the app by passing
    D3D10_CREATE_DEVICE_DEBUG into
    D3D10CreateDevice
  Use the D3DX10 Debug Runtime
    Link against D3DX10d.lib
  Only do this for debug builds!
  Look for performance warnings in the debug
  output
Draw Calls
 Draw calls are still “not free”
 Draw overhead is reduced in D3D10
   But not enough that you can be lazy
 Efficiency in the number of draw calls will
 still give a performance win
Draw Calls
Excess baggage
  An increase in the number of draw calls
  generally increases the number of API
  calls associated with those draws
    ConstantBuffer updates
    Resource changes (VBs, IBs, Textures)
    InputLayout changes
  These all have effects on performance
  that vary with draw call count
Constant Updates
 Updating shader constants was often a
 bottleneck in D3D9
 It can still be a bottleneck in D3D10
 The main difference between the two is
 the new Constant Buffer object in D3D10
 This is the largest section of this talk
Constant Updates
Constant Buffer Recap
  Constant Buffers are buffer objects that
  hold shader constant data
  They are updated using Map with
  D3D10_MAP_WRITE_DISCARD or by calling
  UpdateSubresource
  There are 16 Constant Buffer slots
  available to each shader in the pipeline
    Try not to use all 16 to leave some headroom
Constant Updates
Porting Issues
  D3D9 constants were updated individually
  by calling SetXXXXXShaderConstantX
  In D3D10, you have to update the entire
  constant buffer all at once
  A naïve port from D3D9 to D3D10 can have
  crippling performance implications if
  Constant Buffers are not handled
  correctly!
  Rule of thumb: Do not update more data
  than you need to
Constant Updates
Naïve Port: AKA how to cripple perf
  Each shader uses one big constant buffer
  Submitting one value submits them all!
  If you have one 4096 byte Constant
  Buffer, and you only need to update your
  World matrix, you will still have to update
  4096 bytes of data and send it across the
  bus
  Don’t do this!
Constant Updates
Naïve Port: AKA how to cripple perf
  Scene: 100 skinned meshes (100 materials), 900
  static meshes (400 materials), 1 shadow pass +
  1 lighting pass

  cbuffer VSGlobalsCB             // 6560 bytes total
  {
    matrix ViewProj;
    matrix Bones[100];
    matrix World;
    float SpecPower;
    float4 BDRFCoefficients;
    float AppTime;
    uint2 RenderTargetSize;
  };

  Shadow Pass
    Update VSGlobalsCB (skinned)   6560 Bytes x 100 =    656,000 Bytes
    Update VSGlobalsCB (static)    6560 Bytes x 900 =  5,904,000 Bytes
  Light Pass
    Update VSGlobalsCB (skinned)   6560 Bytes x 100 =    656,000 Bytes
    Update VSGlobalsCB (static)    6560 Bytes x 900 =  5,904,000 Bytes
                                              Total = 13,120,000 Bytes
Constant Updates
Organize Constants
  The first step is to organize constants by
  frequency of update
  One shader will generally be used to draw
  several objects
  Some data in this shader doesn’t need to
  be set for every draw
    For example: Time, ViewProj matrices
  Split these out into their own buffers
cbuffer VSGlobalPerFrameCB      // 4 bytes
{
  float AppTime;
};
cbuffer VSPerSkinnedCB          // 6400 bytes
{
  matrix Bones[100];
};
cbuffer VSPerStaticCB           // 64 bytes
{
  matrix World;
};
cbuffer VSPerPassCB             // 72 bytes
{
  matrix ViewProj;
  uint2 RenderTargetSize;
};
cbuffer VSPerMaterialCB         // 20 bytes
{
  float SpecPower;
  float4 BDRFCoefficients;
};

Begin Frame
  Update VSGlobalPerFrameCB        4 Bytes x 1   =       4 Bytes
  Update VSPerSkinnedCBs        6400 Bytes x 100 = 640,000 Bytes
  Update VSPerStaticCBs           64 Bytes x 900 =  57,600 Bytes
Shadow Pass
  Update VSPerPassCB              72 Bytes x 1   =      72 Bytes
Light Pass
  Update VSPerPassCB              72 Bytes x 1   =      72 Bytes
  Update VSPerMaterialCBs         20 Bytes x 500 =  10,000 Bytes
                                           Total = 707,748 Bytes
Constant Updates



 13,120,000 Bytes / 707,748 Bytes = about 18.5x less constant data
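
The two totals can be re-derived from the buffer sizes and update counts on the previous slides. The following is a minimal C++ sketch of that arithmetic, not code from the talk; the constant and function names are made up for illustration, and the sizes are the unpadded byte counts the slides use.

```cpp
#include <cassert>
#include <cstdint>

// Sizes in bytes as quoted on the slides (no padding applied).
constexpr std::uint64_t kMatrix  = 64;             // one 4x4 float matrix
constexpr std::uint64_t kBones   = 100 * kMatrix;  // matrix Bones[100]
constexpr std::uint64_t kGlobals = kMatrix + kBones + kMatrix + 4 + 16 + 4 + 8; // 6560

// Scene from the slides.
constexpr std::uint64_t kSkinned = 100, kStatic = 900, kMaterials = 500, kPasses = 2;

// Naive port: the whole 6560-byte buffer is re-sent for every mesh in every pass.
constexpr std::uint64_t naiveBytesPerFrame() {
    return kPasses * (kSkinned + kStatic) * kGlobals;
}

// Split by update frequency: each buffer is sent only as often as its data changes.
constexpr std::uint64_t splitBytesPerFrame() {
    return 4                    // VSGlobalPerFrameCB, once per frame
         + kSkinned * kBones    // VSPerSkinnedCB, once per skinned mesh
         + kStatic * kMatrix    // VSPerStaticCB, once per static mesh
         + kPasses * 72         // VSPerPassCB, once per pass
         + kMaterials * 20;     // VSPerMaterialCB, once per material
}

static_assert(naiveBytesPerFrame() == 13120000, "matches the slide total");
static_assert(splitBytesPerFrame() == 707748, "matches the slide total");

// Ratio of naive to split constant traffic.
inline double reductionFactor() {
    assert(splitBytesPerFrame() > 0);
    return static_cast<double>(naiveBytesPerFrame()) /
           static_cast<double>(splitBytesPerFrame());
}
```

Almost all of the remaining traffic is the per-mesh bone data, which is why the skinning case study later in the talk focuses on uploading bones only once.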
Constant Updates
Managing Buffers
  Constant buffers need to be managed in
  the application
  Creating a few buffers that are used for
  all shader constants just won’t work
    We update more data than necessary due to
    large buffers
Constant Updates
Managing Buffers
  Solution 1 (Fastest)
    Create Constant Buffers that line up exactly
    with the number of elements of each
    frequency group
      Global CBs
      CBs per Mesh
      CBs per Material
      CBs per Pass
    This ensures that EVERY constant buffer is no
    larger than it absolutely needs to be
    This also ensures the most efficient update of
    CBs based upon frequency
Constant Updates
Managing Buffers
  Solution 2 (Second Best)
    If you cannot create CBs that line up exactly
    with elements, you can create a tiered constant
    buffer system
    Create arrays of 32-byte, 64-byte, 128-byte,
    256-byte, etc. constant buffers
    Keep a shadow copy of the constant data in
    system memory
    When it comes time to render, select the
    smallest CB from the array that will hold the
    necessary constant data
    May have to resubmit redundant data for
    separate passes
    Hybrid approach?
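
Selecting "the smallest CB that will hold the necessary constant data" could look roughly like the sketch below. It only models the size selection; the tier limits and function names are assumptions for illustration, and a real engine would index into its own arrays of pre-created constant buffers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tier sizes in bytes: 32, 64, 128, ... up to kMaxTierBytes.
constexpr std::size_t kMinTierBytes = 32;
constexpr std::size_t kMaxTierBytes = 4096;  // assumption: largest tier the app creates

// Returns the size of the smallest tier that can hold `bytes` of constant data.
inline std::size_t pickTierBytes(std::size_t bytes) {
    assert(bytes > 0 && bytes <= kMaxTierBytes);
    std::size_t tier = kMinTierBytes;
    while (tier < bytes) tier *= 2;
    return tier;
}

// Index into a per-tier array of constant buffers (0 -> 32 bytes, 1 -> 64 bytes, ...).
inline std::size_t pickTierIndex(std::size_t bytes) {
    assert(bytes > 0 && bytes <= kMaxTierBytes);
    std::size_t index = 0;
    for (std::size_t tier = kMinTierBytes; tier < bytes; tier *= 2) ++index;
    return index;
}
```

With power-of-two tiers the wasted upload is always less than the size of the data itself, which is the trade-off this solution accepts compared with exactly sized buffers.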
Constant Updates
Case Study: Skinning using Solution 1
  Skinning in D3D9 (or a bad D3D10 port)
    Multiple passes cause redundant bone data
    uploads to the GPU
  Skinning in D3D10
    Using Constant Buffers we only need to
    upload it once
Constant Updates
D3D9 Version / or Naïve D3D10 Version
   Pass1
     Set Mesh1 Bones
     Draw Mesh1
     Set Mesh2 Bones
     Draw Mesh2
   Pass2
     Set Mesh1 Bones
     Draw Mesh1
     Set Mesh2 Bones
     Draw Mesh2

   Constant Data (shared by both meshes)
     Bone0, Bone1, Bone2, Bone3, Bone4, … BoneN
     Every "Set Bones" call overwrites this data with Mesh1's or
     Mesh2's bones, so each mesh's bones are uploaded again in
     every pass
Constant Updates
Preferred D3D10 Version
 Frame Start
   Update Mesh1 CB
   Update Mesh2 CB
   Pass1
    Bind Mesh1 CB
     Draw Mesh1
    Bind Mesh2 CB
     Draw Mesh2
   Pass2
    Bind Mesh1 CB
     Draw Mesh1
    Bind Mesh2 CB
     Draw Mesh2

 Mesh1 CB:  Mesh1 Bone0, Mesh1 Bone1, Mesh1 Bone2, … Mesh1 BoneN
 Mesh2 CB:  Mesh2 Bone0, Mesh2 Bone1, Mesh2 Bone2, … Mesh2 BoneN
 CB Slot 0: bound to Mesh1 CB or Mesh2 CB before each draw
Constant Updates
Advanced D3D10 Version
 Why not store all of our characters’ bones in
 a 128-bit FP texture?
 We can upload bones for all visible
 characters at the start of a frame
 We can draw similar characters using
 instancing instead of individual draws
   Use SV_InstanceID to select the start of the
   character’s bone data in the texture
 Stream the skinned meshes to memory using
 Stream Output and render all subsequent
 passes from the post-skinned buffer
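
Using SV_InstanceID to find the start of a character's bone data comes down to an index calculation. The sketch below shows that calculation on the CPU side in C++; in practice the same arithmetic would live in the vertex shader. The layout is an assumption, not something the slides specify: each bone is a 4x4 float matrix stored as four consecutive 128-bit (four 32-bit float) texels, every character has the same bone count, and the texture is filled row by row.

```cpp
#include <cassert>
#include <cstdint>

struct Texel { std::uint32_t x, y; };

// Assumption: one 4x4 float matrix occupies four consecutive 128-bit texels.
constexpr std::uint32_t kTexelsPerBone = 4;

// Linear texel index of the first row of `bone` for the character drawn as
// instance `instanceId` (the value the shader would receive in SV_InstanceID).
inline std::uint32_t boneTexelIndex(std::uint32_t instanceId,
                                    std::uint32_t bonesPerCharacter,
                                    std::uint32_t bone) {
    assert(bone < bonesPerCharacter);
    return (instanceId * bonesPerCharacter + bone) * kTexelsPerBone;
}

// 2D texel coordinate for a linear index. Requiring the width to be a multiple
// of kTexelsPerBone keeps every matrix on a single texture row.
inline Texel toTexel(std::uint32_t index, std::uint32_t textureWidth) {
    assert(textureWidth > 0 && textureWidth % kTexelsPerBone == 0);
    return Texel{ index % textureWidth, index / textureWidth };
}
```

For example, with 100 bones per character, bone 3 of instance 2 starts at linear texel 812.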
State Management
 Individual state setting is no longer
 possible in D3D10
 State in D3D10 is stored in state objects
 These state objects are immutable
 To change even one aspect of a state
 object requires that you create an
 entirely new state object with that one
 change
State Management
Managing State Objects
  Solution 1 (Fastest)
    If you have a known set of materials and
    required states, you can create all state
    objects at load time
    State objects are small and there is a finite
    set of permutations
    With all state objects created up front, all
    that needs to be done during rendering is to
    bind the object
State Management
Managing State Objects
  Solution 2 (Second Best)
    If your content is not finalized, or if you
    CANNOT get your engine to lump state
    together
    Create a state object hash table
    Hash off of the setting that has the most
    unique states
    Grab pre-created states from the hash-table
    Why not give your tools pipeline the ability to
    do this for a level and save out the results?
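
A state object hash table can be sketched as a cache that creates an object only the first time a given description is requested. The C++ below is an illustration under stated assumptions: `BlendKey`, `BlendKeyHash`, `StateHandle`, and `BlendStateCache` are hypothetical names, the key is a trimmed-down stand-in for a full state description, and an integer handle stands in for the real state object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical, trimmed-down description of a blend state.
struct BlendKey {
    bool         blendEnable;
    std::uint8_t srcBlend;
    std::uint8_t destBlend;
    std::uint8_t writeMask;
    bool operator==(const BlendKey& o) const {
        return blendEnable == o.blendEnable && srcBlend == o.srcBlend &&
               destBlend == o.destBlend && writeMask == o.writeMask;
    }
};

// Packs the fields into one 32-bit value and hashes that.
struct BlendKeyHash {
    std::size_t operator()(const BlendKey& k) const {
        const std::uint32_t packed = (std::uint32_t(k.blendEnable) << 24) |
                                     (std::uint32_t(k.srcBlend) << 16) |
                                     (std::uint32_t(k.destBlend) << 8) |
                                     std::uint32_t(k.writeMask);
        return std::hash<std::uint32_t>()(packed);
    }
};

using StateHandle = int;  // stand-in for an immutable state object

class BlendStateCache {
public:
    // Returns the cached object for `key`, creating it only on the first request.
    StateHandle get(const BlendKey& key) {
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;
        const StateHandle created = nextHandle_++;  // real code would create the state object here
        cache_.emplace(key, created);
        return created;
    }
    std::size_t createdCount() const { return cache_.size(); }
private:
    std::unordered_map<BlendKey, StateHandle, BlendKeyHash> cache_;
    StateHandle nextHandle_ = 0;
};
```

Saving the set of keys seen while playing a level, as the last bullet suggests, would let the cache be filled at load time so that no creation happens during rendering.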
Shader Linkage
 D3D9 shader linkage was based off of
 semantics (POSITION, NORMAL,
 TEXCOORDN)
 D3D10 linkage is based off of offsets and
 sizes
 This means stricter linkage rules
 This also means that the driver doesn’t
 have to link shaders together at every
 draw call!
Shader Linkage
No Holes Allowed!
   Elements must be read in the order they
   are output from the previous stage
   Cannot have “holes” between linkages

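struct VS_OUTPUT
{
    float3 Norm : NORMAL;
    float2 Tex  : TEXCOORD0;
    float2 Tex2 : TEXCOORD1;
    float4 Pos  : SV_POSITION;
};

struct PS_INPUT            // OK: same order as VS_OUTPUT
{
    float3 Norm : NORMAL;
    float2 Tex  : TEXCOORD0;
    float2 Tex2 : TEXCOORD1;
};                         // Pos is not read: holes at the end are OK

Declaring Tex (TEXCOORD0) before Norm (NORMAL) in PS_INPUT would break
the linkage, because the order no longer matches VS_OUTPUT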
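
The rule on this slide can be modelled as a prefix check: the consuming stage must list the producing stage's elements in the same order, and may only stop early. The C++ sketch below is a simplified model of that rule for a tools pipeline or unit test, not the runtime's actual validation; `Element` and `linksWithoutHoles` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Element {
    std::string semantic;    // e.g. "NORMAL", "TEXCOORD0"
    unsigned    components;  // e.g. 3 for float3
};

// The consumer must read elements in the producer's order with no gaps,
// but may stop early (holes at the end are OK).
inline bool linksWithoutHoles(const std::vector<Element>& produced,
                              const std::vector<Element>& consumed) {
    if (consumed.size() > produced.size()) return false;
    for (std::size_t i = 0; i < consumed.size(); ++i) {
        if (consumed[i].semantic != produced[i].semantic ||
            consumed[i].components != produced[i].components) {
            return false;
        }
    }
    return true;
}
```

Running a check like this over every vertex shader / pixel shader pair at build time would catch linkage mismatches before they show up as debug-layer errors.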
Shader Linkage
Input Assembler to Vertex Shader
  Input Layouts define the signature of the
  vertex stream data
  Input Layouts are similar to Vertex
  Declarations in D3D9
    Strict linkage rules are a big difference
  Creating Input Layouts on the fly is not
  recommended
  CreateInputLayout requires a shader
  signature to validate against
Shader Linkage
Input Assembler to Vertex Shader
  Solution 1 (Fastest)
    Create an Input Layout for each unique
    Vertex Stream / Vertex Shader combination
    up front
    Input Layouts are small
    This assumes that the shader input signature
    is available when you call CreateInputLayout
    Try to normalize Input Layouts across levels, or
    have them be art directed
Shader Linkage
Input Assembler to Vertex Shader
  Solution 2 (Second Best)
    If you load meshes and create input layouts
    before loading shaders, you might have a
    problem
    You can use a similar hashing scheme as the
    one used for State Objects
    When the Input Layout is needed, search the
    hash for an Input Layout that matches the
    Vertex Stream and Vertex Shader signature
    Why not store this data to a file and pre-
    populate the Input Layouts after your content
    is tuned?
Shader Linkage
Aside: Instancing
  Instancing is a first class citizen on D3D10!
  Stream source frequency is now part of
  the Input Layout
  Multiple frequencies will mean multiple
  Input Layouts
Resource Updates
 Updating resources is different in D3D10
 Create / Lock / Fill / Unlock paradigm is
 no longer necessary (although you can
 still do it)
 Texture data can be passed into the
 texture at create time
Resource Updates
Resource Usage Types
 D3D10_USAGE_DEFAULT
 D3D10_USAGE_IMMUTABLE
 D3D10_USAGE_DYNAMIC
 D3D10_USAGE_STAGING
Resource Updates
D3D10_USAGE_DEFAULT
 Use for resources that need fast GPU read
 and write access
 Can only be updated using
 UpdateSubresource
 Render targets are good candidates
 Textures that are updated infrequently
 (less than once per frame) are good
 candidates
Resource Updates
D3D10_USAGE_IMMUTABLE
 Use for resources that need fast GPU read
 access only
 Once they are created, they cannot be
 updated... ever
 Initial data must be passed in during the
 creation call
 Resources that will never change (static
 textures, VBs / IBs) are good candidates
 Don’t bend over backwards trying to make
 everything D3D10_USAGE_IMMUTABLE
Resource Updates
D3D10_USAGE_DYNAMIC
 Use for resources that need fast CPU write
 access (at the expense of slower GPU read
 access)
 No CPU read access
 Can only be updated using Map with:
   D3D10_MAP_WRITE_DISCARD
   D3D10_MAP_WRITE_NO_OVERWRITE
 Dynamic Vertex Buffers are good candidates
 Dynamic (> once per frame) textures are
 good candidates
Resource Updates
D3D10_USAGE_STAGING
 This is the only way to read data back
 from the GPU
 Can only be updated using Map
 Cannot map with
 D3D10_MAP_WRITE_DISCARD or
 D3D10_MAP_WRITE_NO_OVERWRITE
 Might want to double buffer to keep from
 stalling GPU
 The GPU cannot directly use these
Resource Updates
Summary
 CPU updates the resource frequently
 (more than once per frame)
   Use D3D10_USAGE_DYNAMIC
 CPU updates the resource infrequently
 (once per frame or less)
   Use D3D10_USAGE_DEFAULT
 CPU doesn’t update the resource
   Use D3D10_USAGE_IMMUTABLE
 CPU needs to read the resource
   Use D3D10_USAGE_STAGING
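
The summary above is effectively a small decision function. The C++ sketch below encodes it with a local enum standing in for the four D3D10_USAGE values; `CpuAccess` and `chooseUsage` are hypothetical names. It does not cover constant buffers, which the talk treats as an exception.

```cpp
#include <cassert>

// Local stand-ins for the four D3D10_USAGE values named on the slides.
enum class Usage { Default, Immutable, Dynamic, Staging };

// Hypothetical description of how the CPU touches a resource.
struct CpuAccess {
    bool   needsRead;        // the CPU must read the contents back
    double updatesPerFrame;  // average CPU writes per frame (0 = never)
};

inline Usage chooseUsage(const CpuAccess& a) {
    assert(a.updatesPerFrame >= 0.0);
    if (a.needsRead)              return Usage::Staging;
    if (a.updatesPerFrame == 0.0) return Usage::Immutable;
    if (a.updatesPerFrame > 1.0)  return Usage::Dynamic;
    return Usage::Default;        // once per frame or less
}
```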
Resource Updates
Example: Vertex Buffer
  The vertex buffer is touched by the CPU
  less than once per frame
    Create it with D3D10_USAGE_DEFAULT
    Update it with UpdateSubresource
  The vertex buffer is used for dynamic
  geometry and the CPU needs to update it
  multiple times per frame
    Create it with D3D10_USAGE_DYNAMIC
    Update it with Map
Resource Updates
The Exception: Constant Buffers
  CBs are always expected to be updated
  frequently
  Select CB usage based upon which one
  causes the least amount of system
  memory to be transferred
    Not just to the GPU, but system-to-system
    memory copies as well
Resource Updates
UpdateSubresource
 UpdateSubresource requires a system
 memory buffer and incurs an extra copy
 Use if you have system copies of your
 constant data already in one place
Resource Updates
Map
 Map requires no extra system memory but
 may hit driver renaming limits if abused
 Use if compositing values on the fly or
 collecting values from other places
Resource Updates
A note on overusing discard
  Use D3D10_MAP_WRITE_DISCARD carefully
  with buffers!
  D3D10_MAP_WRITE_DISCARD tells the driver to
  give us a new memory buffer if the current
  one is busy
  There is a LIMITED set of temporary buffers
  If these run out, then your app will stall until
  another buffer can be freed
  This can happen if you do dynamic geometry
  using one VB and D3D10_MAP_WRITE_DISCARD
Dynamic Geometry
 DrawIndexedPrimitiveUP is gone!
 DrawPrimitiveUP is gone!
 Your well-behaved D3D9 app isn’t using
 these anyway, right?
Dynamic Geometry
Solution: Same as in D3D9
  Use one large buffer, and map it with
  D3D10_MAP_WRITE_NO_OVERWRITE
  Advance the write position with every draw
    Wrap to the beginning
  Make sure your buffer is large enough that
  you’re not overwriting data that the GPU is
  reading
  This is what happens under the covers for
  D3D9 when using DIPUP or DUP in Windows
  Vista
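
The "advance the write position, wrap to the beginning" bookkeeping can be sketched as a small allocator. The C++ below only tracks offsets; `RingAllocator` is a hypothetical name, no Direct3D calls are made, and nothing here checks whether the GPU has finished reading a region, so the buffer still has to be sized generously as the slide says.

```cpp
#include <cassert>
#include <cstddef>

// Tracks the write position in one large dynamic vertex buffer.
class RingAllocator {
public:
    explicit RingAllocator(std::size_t capacityBytes) : capacity_(capacityBytes) {}

    struct Range {
        std::size_t offset;   // byte offset to add to the pointer returned by Map
        bool        wrapped;  // true when the write position went back to 0
    };

    // Reserves `bytes` for the next draw, wrapping when the tail is too small.
    Range allocate(std::size_t bytes) {
        assert(bytes > 0 && bytes <= capacity_);
        bool wrapped = false;
        if (head_ + bytes > capacity_) {
            head_ = 0;
            wrapped = true;
        }
        const Range r{head_, wrapped};
        head_ += bytes;
        return r;
    }

private:
    std::size_t capacity_;
    std::size_t head_ = 0;
};
```

The returned offset is applied to the mapped pointer when writing vertices and used as the starting position for the draw. How an application reacts to `wrapped` is left open in this sketch.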
Porting Tips
 StretchRect is Gone
   Work around using render-to-texture
 A8R8G8B8 formats have been replaced with
 R8G8B8A8 formats
   Swizzle on texture load or swizzle in the
   shader
 Fixed Function AlphaTest is Gone
   Add logic to the shader and call discard
 Fixed Function Fog is Gone
   Add it to the shader
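
The swizzle-on-load option for A8R8G8B8 data mentioned above amounts to swapping the red and blue channels of each pixel. The C++ sketch below assumes a little-endian machine and 32-bit pixels held as integers; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Converts one pixel from A8R8G8B8 (0xAARRGGBB as a 32-bit value) to
// R8G8B8A8 (0xAABBGGRR as a 32-bit value on a little-endian machine)
// by swapping the red and blue channels.
inline std::uint32_t argbToRgba(std::uint32_t argb) {
    const std::uint32_t alphaGreen = argb & 0xFF00FF00u;
    const std::uint32_t red        = (argb >> 16) & 0xFFu;
    const std::uint32_t blue       = argb & 0xFFu;
    return alphaGreen | (blue << 16) | red;
}

// Swizzles a whole image in place at load time.
inline void swizzleInPlace(std::uint32_t* pixels, std::size_t count) {
    assert(pixels != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) pixels[i] = argbToRgba(pixels[i]);
}
```

Because the operation is its own inverse, the same function converts in either direction.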
Porting Tips
Continued
 User Clip Planes usage has changed
   They’ve moved to the shader
   Experiment with the SV_ClipDistance semantic vs
   discard in the PS to determine which is faster for
   your shader
 Query data sizes might have changed
   Occlusion queries are UINT64 vs DWORD
 No Triangle Fan Support
   Work around in content pipeline or on load
 SetCursorProperties, ShowCursor are gone
   Use Win32 APIs to handle cursors now
Porting Tips
Continued
 No offsets on Map calls
   This was basically API clutter in D3D9
   Calculate the offset from the returned pointer
 Clears are no longer bound to pipeline state
   If you want a clear call to respect scissor,
   stencil, or other state, draw a full-screen quad
   This is closer to the HW
   The Driver/HW has been doing this for you for years
 OMSetBlendState
   Never set the SampleMask to 0 in
   OMSetBlendState
Porting Tips
Continued
 Input Layout conversions tightened up
   D3DDECLTYPE_UBYTE4 in the vertex stream
   could be converted to a float4 in the VS in D3D9
   i.e., 255u in the stream would show up as 255.0 in
   the VS
   In D3D10 you either get a normalized [0..1] value
   or 255 (u)int
 Register keyword
   It doesn’t mean the same thing in D3D10
   Use register to determine which CB slot a CB
   binds to
   Use packoffset to place a variable inside a CB
Porting Tips
Continued
 Sampler and Texture bindings
   Samplers can be bound independently of textures
   This is very flexible!
   Sampler and Texture slots are not always the
   same
 Register Packing
   In D3D9 all variables took up at least one float4
   register (even if you only used a single float!)
   In D3D10 variables are packed together
   This saves a lot of space
   Make sure your engine doesn’t do everything
   based upon register offsets or your variables
   might alias
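
The aliasing risk is easiest to see by comparing offsets under the two schemes. The C++ sketch below is a simplified model, not the compiler's implementation: it handles only scalars and vectors of one to four 32-bit components, ignores arrays, matrices, and structs, and uses hypothetical function names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// D3D9-style: every variable starts its own 16-byte float4 register.
inline std::vector<std::size_t> offsetsOnePerRegister(const std::vector<unsigned>& components) {
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < components.size(); ++i) offsets.push_back(i * 16);
    return offsets;
}

// D3D10-style packing: variables share a 16-byte register when they fit
// and never straddle a register boundary.
inline std::vector<std::size_t> offsetsPacked(const std::vector<unsigned>& components) {
    std::vector<std::size_t> offsets;
    std::size_t cursor = 0;
    for (unsigned c : components) {
        assert(c >= 1 && c <= 4);
        const std::size_t bytes = c * 4;
        const std::size_t usedInRegister = cursor % 16;
        if (usedInRegister + bytes > 16) cursor += 16 - usedInRegister;  // start a new register
        offsets.push_back(cursor);
        cursor += bytes;
    }
    return offsets;
}
```

For a float, float2, float, float3 sequence, the one-per-register scheme gives byte offsets 0, 16, 32, 48, while the packed scheme gives 0, 4, 12, 16. An engine that writes constants at the old offsets would overwrite the wrong variables.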
Porting Tips
Continued
 D3DSAMP_SRGBTEXTURE
   This sampler state setting does not exist on
   D3D10
   Instead it’s included in the texture format
   This is more like the Xbox 360
 Consider re-optimizing resource usage and
 upload for better D3D10 performance
   But use D3D10_USAGE_DEFAULT resources
   and UpdateSubresource as a baseline
Summary
 Use the debug runtime!
 More draw calls usually means more constant
 updating and state changing calls
 Be frugal with constant updates
   Avoid resubmitting redundant data!
 Create as much state and input layout
 information up front as possible
 Select D3D10_USAGE for resources based
 upon the CPU access patterns needed
 Use D3D10_MAP_WRITE_NO_OVERWRITE and a big
 buffer as a replacement for DIPUP and DUP
Call to Action
 Actually exploit D3D10!
 This talk tells you how to get performance
 gains from a straight port
 You can get a whole lot more by using
 D3D10’s advanced features!
   StreamOut to minimize skinning costs
   First class instancing support
   Store some vertex data in textures
   Move some systems to the GPU (Particles?)
   Aggressive use of Constant Buffers
http://www.xna.com




                                © 2007 Microsoft Corporation. All rights reserved.
This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.

More Related Content

PPTX
Direct x 11 입문
Jin Woo Lee
 
PPTX
[KGC2014] DX9에서DX11로의이행경험공유
Hwan Min
 
PDF
Unite2019 HLOD를 활용한 대규모 씬 제작 방법
장규 서
 
PDF
Rendering Techniques in Rise of the Tomb Raider
Eidos-Montréal
 
PDF
Graphics Gems from CryENGINE 3 (Siggraph 2013)
Tiago Sousa
 
PPTX
Compute shader DX11
민웅 이
 
PPT
Secrets of CryENGINE 3 Graphics Technology
Tiago Sousa
 
PPTX
Progressive Lightmapper: An Introduction to Lightmapping in Unity
Unity Technologies
 
Direct x 11 입문
Jin Woo Lee
 
[KGC2014] DX9에서DX11로의이행경험공유
Hwan Min
 
Unite2019 HLOD를 활용한 대규모 씬 제작 방법
장규 서
 
Rendering Techniques in Rise of the Tomb Raider
Eidos-Montréal
 
Graphics Gems from CryENGINE 3 (Siggraph 2013)
Tiago Sousa
 
Compute shader DX11
민웅 이
 
Secrets of CryENGINE 3 Graphics Technology
Tiago Sousa
 
Progressive Lightmapper: An Introduction to Lightmapping in Unity
Unity Technologies
 

What's hot (20)

PPTX
A Scalable Real-Time Many-Shadowed-Light Rendering System
Bo Li
 
PDF
멀티스레드 렌더링 (Multithreaded rendering)
Bongseok Cho
 
PPTX
Optimizing the Graphics Pipeline with Compute, GDC 2016
Graham Wihlidal
 
PPTX
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
AMD Developer Central
 
PDF
Introduction to DirectX 12 Programming , Ver 1.5
YEONG-CHEON YOU
 
PDF
09_Dxt 압축 알고리즘 소개
noerror
 
PPSX
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
AMD Developer Central
 
PDF
Killzone Shadow Fall Demo Postmortem
Guerrilla
 
PPTX
Parallel Futures of a Game Engine
repii
 
PDF
나만의 엔진 개발하기
YEONG-CHEON YOU
 
PPTX
Implements Cascaded Shadow Maps with using Texture Array
YEONG-CHEON YOU
 
PDF
Ndc2010 전형규 마비노기2 캐릭터 렌더링 기술
henjeon
 
PDF
스크린 스페이스 데칼에 대해 자세히 알아보자(워햄머 40,000: 스페이스 마린)
포프 김
 
PPTX
Shiny PC Graphics in Battlefield 3
Electronic Arts / DICE
 
PDF
「原神」におけるコンソールプラットフォーム開発
Unity Technologies Japan K.K.
 
PPTX
[NDC 2018] 신입 개발자가 알아야 할 윈도우 메모리릭 디버깅
DongMin Choi
 
PDF
빌드관리 및 디버깅 (2010년 자료)
YEONG-CHEON YOU
 
PPT
Crysis Next-Gen Effects (GDC 2008)
Tiago Sousa
 
PDF
Multiprocessor Game Loops: Lessons from Uncharted 2: Among Thieves
Naughty Dog
 
PPTX
Siggraph 2011: Occlusion culling in Alan Wake
Umbra
 
A Scalable Real-Time Many-Shadowed-Light Rendering System
Bo Li
 
멀티스레드 렌더링 (Multithreaded rendering)
Bongseok Cho
 
Optimizing the Graphics Pipeline with Compute, GDC 2016
Graham Wihlidal
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
AMD Developer Central
 
Introduction to DirectX 12 Programming , Ver 1.5
YEONG-CHEON YOU
 
09_Dxt 압축 알고리즘 소개
noerror
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
AMD Developer Central
 
Killzone Shadow Fall Demo Postmortem
Guerrilla
 
Parallel Futures of a Game Engine
repii
 
나만의 엔진 개발하기
YEONG-CHEON YOU
 
Implements Cascaded Shadow Maps with using Texture Array
YEONG-CHEON YOU
 
Ndc2010 전형규 마비노기2 캐릭터 렌더링 기술
henjeon
 
스크린 스페이스 데칼에 대해 자세히 알아보자(워햄머 40,000: 스페이스 마린)
포프 김
 
Shiny PC Graphics in Battlefield 3
Electronic Arts / DICE
 
「原神」におけるコンソールプラットフォーム開発
Unity Technologies Japan K.K.
 
[NDC 2018] 신입 개발자가 알아야 할 윈도우 메모리릭 디버깅
DongMin Choi
 
빌드관리 및 디버깅 (2010년 자료)
YEONG-CHEON YOU
 
Crysis Next-Gen Effects (GDC 2008)
Tiago Sousa
 
Multiprocessor Game Loops: Lessons from Uncharted 2: Among Thieves
Naughty Dog
 
Siggraph 2011: Occlusion culling in Alan Wake
Umbra
 
Ad

Viewers also liked (19)

PPTX
[NHN_NEXT] DirectX Tutorial 강의 자료
MinGeun Park
 
PPTX
Porting direct x 11 desktop game to uwp app
YEONG-CHEON YOU
 
PPTX
[1023 박민수] 깊이_버퍼_그림자_1
MoonLightMS
 
PPTX
[Gpg1권 박민근] 5.10 게임을 위한 그럴듯한 유리 효과
MinGeun Park
 
PPTX
[0107 박민근] 쉽게 배우는 hdr과 톤맵핑
MinGeun Park
 
PPTX
[1126 박민근] 비전엔진을 이용한 mmorpg 개발
MinGeun Park
 
PPT
모바일 게임 최적화
tartist
 
PDF
NDC 2015 박주은,최재혁 물리기반렌더링 지난1년간의 경험
Jooeun Park
 
PPTX
물리 기반 셰이더의 이해
tartist
 
PDF
[C++ Korea 2nd Seminar] C++17 Key Features Summary
Chris Ohk
 
PPT
NDC2015 유니티 정적 라이팅 이게 최선인가요
Wuwon Yu
 
PPTX
[0312 조진현] good bye dx9
진현 조
 
PDF
[C++ Korea 3rd Seminar] 새 C++은 새 Visual Studio에, 좌충우돌 마이그레이션 이야기
Chris Ohk
 
PDF
[Kgc2012] deferred forward 이창희
changehee lee
 
PDF
Visual shock vol.2
changehee lee
 
PPTX
[160402_데브루키_박민근] UniRx 소개
MinGeun Park
 
PPTX
물리 기반 셰이더의 허와 실:물리기반 셰이더를 가르쳐 봤습니다 공개용
JP Jung
 
PPTX
유니티의 툰셰이딩을 사용한 3D 애니메이션 표현
MinGeun Park
 
PDF
Modern gpu optimize blog
ozlael ozlael
 
[NHN_NEXT] DirectX Tutorial 강의 자료
MinGeun Park
 
Porting direct x 11 desktop game to uwp app
YEONG-CHEON YOU
 
[1023 박민수] 깊이_버퍼_그림자_1
MoonLightMS
 
[Gpg1권 박민근] 5.10 게임을 위한 그럴듯한 유리 효과
MinGeun Park
 
[0107 박민근] 쉽게 배우는 hdr과 톤맵핑
MinGeun Park
 
[1126 박민근] 비전엔진을 이용한 mmorpg 개발
MinGeun Park
 
모바일 게임 최적화
tartist
 
NDC 2015 박주은,최재혁 물리기반렌더링 지난1년간의 경험
Jooeun Park
 
물리 기반 셰이더의 이해
tartist
 
[C++ Korea 2nd Seminar] C++17 Key Features Summary
Chris Ohk
 
NDC2015 유니티 정적 라이팅 이게 최선인가요
Wuwon Yu
 
[0312 조진현] good bye dx9
진현 조
 
[C++ Korea 3rd Seminar] 새 C++은 새 Visual Studio에, 좌충우돌 마이그레이션 이야기
Chris Ohk
 
[Kgc2012] deferred forward 이창희
changehee lee
 
Visual shock vol.2
changehee lee
 
[160402_데브루키_박민근] UniRx 소개
MinGeun Park
 
물리 기반 셰이더의 허와 실:물리기반 셰이더를 가르쳐 봤습니다 공개용
JP Jung
 
유니티의 툰셰이딩을 사용한 3D 애니메이션 표현
MinGeun Park
 
Modern gpu optimize blog
ozlael ozlael
 
Ad

Similar to Windows to reality getting the most out of direct3 d 10 graphics in your games (20)

PPSX
Dx11 performancereloaded
mistercteam
 
PPT
D3 D10 Unleashed New Features And Effects
Thomas Goddard
 
PPTX
Beyond porting
Cass Everitt
 
PPTX
Approaching zero driver overhead
Cass Everitt
 
PPTX
Making a game with Molehill: Zombie Tycoon
Jean-Philippe Doiron
 
PPT
CS 354 GPU Architecture
Mark Kilgard
 
PPTX
4,000 Adams at 90 Frames Per Second | Yi Fei Boon
Jessica Tams
 
PDF
GeForce 8800 OpenGL Extensions
icastano
 
PPT
Tessellation on any_budget-gdc2011
basisspace
 
PDF
Markus Tessmann, InnoGames
White Nights Conference
 
PPT
Far cry 3
sojuwugor
 
PPT
Your Game Needs Direct3D 11, So Get Started Now!
repii
 
PPTX
Shader model 5 0 and compute shader
zaywalker
 
PDF
NVIDIA effects GDC09
IGDA_London
 
PDF
The Technology of Uncharted: Drake’s Fortune
Naughty Dog
 
PDF
Buffersdirectx
VisCircle
 
PDF
Дмитрий Вовк - Learn iOS Game Optimization. Ultimate Guide
UA Mobile
 
PPTX
Efficient Buffer Management
basisspace
 
PDF
Modern Graphics Pipeline Overview
slantsixgames
 
PDF
IAP09 CUDA@MIT 6.963 - Lecture 04: CUDA Advanced #1 (Nicolas Pinto, MIT)
npinto
 
Dx11 performancereloaded
mistercteam
 
D3 D10 Unleashed New Features And Effects
Thomas Goddard
 
Beyond porting
Cass Everitt
 
Approaching zero driver overhead
Cass Everitt
 
Making a game with Molehill: Zombie Tycoon
Jean-Philippe Doiron
 
CS 354 GPU Architecture
Mark Kilgard
 
4,000 Adams at 90 Frames Per Second | Yi Fei Boon
Jessica Tams
 
GeForce 8800 OpenGL Extensions
icastano
 
Tessellation on any_budget-gdc2011
basisspace
 
Markus Tessmann, InnoGames
White Nights Conference
 
Far cry 3
sojuwugor
 
Your Game Needs Direct3D 11, So Get Started Now!
repii
 
Shader model 5 0 and compute shader
zaywalker
 
NVIDIA effects GDC09
IGDA_London
 
The Technology of Uncharted: Drake’s Fortune
Naughty Dog
 
Buffersdirectx
VisCircle
 
Дмитрий Вовк - Learn iOS Game Optimization. Ultimate Guide
UA Mobile
 
Efficient Buffer Management
basisspace
 
Modern Graphics Pipeline Overview
slantsixgames
 
IAP09 CUDA@MIT 6.963 - Lecture 04: CUDA Advanced #1 (Nicolas Pinto, MIT)
npinto
 

More from changehee lee (20)

PDF
Shader compilation
changehee lee
 
PDF
Gdc 14 bringing unreal engine 4 to open_gl
changehee lee
 
PDF
Smedberg niklas bringing_aaa_graphics
changehee lee
 
PDF
Fortugno nick design_and_monetization
changehee lee
 
PDF
카툰 렌더링
changehee lee
 
PDF
[Kgc2013] 모바일 엔진 개발기
changehee lee
 
PPTX
Paper games 2013
changehee lee
 
PPTX
모바일 엔진 개발기
changehee lee
 
PPTX
Wecanmakeengine
changehee lee
 
PDF
Mobile crossplatformchallenges siggraph
changehee lee
 
PDF
개발 과정 최적화 하기 내부툴로 더욱 강력한 개발하기 Stephen kennedy _(11시40분_103호)
changehee lee
 
PPTX
개발자여! 스터디를 하자!
changehee lee
 
PPT
Light prepass
changehee lee
 
PPTX
Gamificated game developing
changehee lee
 
PDF
Basic ofreflectance kor
changehee lee
 
PDF
C++11(최지웅)
changehee lee
 
PDF
Valve handbook low_res
changehee lee
 
PDF
Ndc12 이창희 render_pipeline
changehee lee
 
PPTX
아이폰에 포팅해보기
changehee lee
 
Shader compilation
changehee lee
 
Gdc 14 bringing unreal engine 4 to open_gl
changehee lee
 
Smedberg niklas bringing_aaa_graphics
changehee lee
 
Fortugno nick design_and_monetization
changehee lee
 
카툰 렌더링
changehee lee
 
[Kgc2013] 모바일 엔진 개발기
changehee lee
 
Paper games 2013
changehee lee
 
모바일 엔진 개발기
changehee lee
 
Wecanmakeengine
changehee lee
 
Mobile crossplatformchallenges siggraph
changehee lee
 
개발 과정 최적화 하기 내부툴로 더욱 강력한 개발하기 Stephen kennedy _(11시40분_103호)
changehee lee
 
개발자여! 스터디를 하자!
changehee lee
 
Light prepass
changehee lee
 
Gamificated game developing
changehee lee
 
Basic ofreflectance kor
changehee lee
 
C++11(최지웅)
changehee lee
 
Valve handbook low_res
changehee lee
 
Ndc12 이창희 render_pipeline
changehee lee
 
아이폰에 포팅해보기
changehee lee
 

Recently uploaded (20)

PDF
Structs to JSON: How Go Powers REST APIs
Emily Achieng
 
PDF
Software Development Methodologies in 2025
KodekX
 
PPTX
Dev Dives: Automate, test, and deploy in one place—with Unified Developer Exp...
AndreeaTom
 
PDF
Brief History of Internet - Early Days of Internet
sutharharshit158
 
PDF
Responsible AI and AI Ethics - By Sylvester Ebhonu
Sylvester Ebhonu
 
PDF
AI-Cloud-Business-Management-Platforms-The-Key-to-Efficiency-Growth.pdf
Artjoker Software Development Company
 
PDF
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
PDF
Google I/O Extended 2025 Baku - all ppts
HusseinMalikMammadli
 
PDF
Security features in Dell, HP, and Lenovo PC systems: A research-based compar...
Principled Technologies
 
PPTX
OA presentation.pptx OA presentation.pptx
pateldhruv002338
 
PPTX
Agile Chennai 18-19 July 2025 | Emerging patterns in Agentic AI by Bharani Su...
AgileNetwork
 
PPTX
cloud computing vai.pptx for the project
vaibhavdobariyal79
 
PDF
Research-Fundamentals-and-Topic-Development.pdf
ayesha butalia
 
PDF
How Open Source Changed My Career by abdelrahman ismail
a0m0rajab1
 
PDF
Tea4chat - another LLM Project by Kerem Atam
a0m0rajab1
 
PDF
How ETL Control Logic Keeps Your Pipelines Safe and Reliable.pdf
Stryv Solutions Pvt. Ltd.
 
PDF
SparkLabs Primer on Artificial Intelligence 2025
SparkLabs Group
 
PDF
MASTERDECK GRAPHSUMMIT SYDNEY (Public).pdf
Neo4j
 
PDF
Get More from Fiori Automation - What’s New, What Works, and What’s Next.pdf
Precisely
 
PDF
Automating ArcGIS Content Discovery with FME: A Real World Use Case
Safe Software
 
Structs to JSON: How Go Powers REST APIs
Emily Achieng
 
Software Development Methodologies in 2025
KodekX
 
Dev Dives: Automate, test, and deploy in one place—with Unified Developer Exp...
AndreeaTom
 
Brief History of Internet - Early Days of Internet
sutharharshit158
 
Responsible AI and AI Ethics - By Sylvester Ebhonu
Sylvester Ebhonu
 
AI-Cloud-Business-Management-Platforms-The-Key-to-Efficiency-Growth.pdf
Artjoker Software Development Company
 
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
Google I/O Extended 2025 Baku - all ppts
HusseinMalikMammadli
 
Security features in Dell, HP, and Lenovo PC systems: A research-based compar...
Principled Technologies
 
OA presentation.pptx OA presentation.pptx
pateldhruv002338
 
Agile Chennai 18-19 July 2025 | Emerging patterns in Agentic AI by Bharani Su...
AgileNetwork
 
cloud computing vai.pptx for the project
vaibhavdobariyal79
 
Research-Fundamentals-and-Topic-Development.pdf
ayesha butalia
 
How Open Source Changed My Career by abdelrahman ismail
a0m0rajab1
 
Tea4chat - another LLM Project by Kerem Atam
a0m0rajab1
 
How ETL Control Logic Keeps Your Pipelines Safe and Reliable.pdf
Stryv Solutions Pvt. Ltd.
 
SparkLabs Primer on Artificial Intelligence 2025
SparkLabs Group
 
MASTERDECK GRAPHSUMMIT SYDNEY (Public).pdf
Neo4j
 
Get More from Fiori Automation - What’s New, What Works, and What’s Next.pdf
Precisely
 
Automating ArcGIS Content Discovery with FME: A Real World Use Case
Safe Software
 

Windows to reality getting the most out of direct3 d 10 graphics in your games

  • 2. Windows to Reality: Getting the Most out of Direct3D 10 Graphics in Your Games Shanon Drone Software Development Engineer XNA Developer Connection Microsoft
  • 3. Key areas Debug Layer Draw Calls Constant Updates State Management Shader Linkage Resource Updates Dynamic Geometry Porting Tips
  • 4. Debug Layer Use it! The D3D10 layer can help find performance issues App controlled by passing D3D10_CREATE_DEVICE_DEBUG into D3D10CreateDevice. Use the D3DX10 Debug Runtime Link against D3DX10d.lib Only do this for debug builds! Look for performance warnings in the debug output
  • 5. Draw Calls Draw calls are still “not free” Draw overhead is reduced in D3D10 But not enough that you can be lazy Efficiency in the number of draw calls will still give a performance win
  • 6. Draw Calls Excess baggage An increase in the number of draw calls generally increases the number of API calls associated with those draws ConstantBuffer updates Resource changes (VBs, IBs, Textures) InputLayout changes These all have effects on performance that vary with draw call count
  • 7. Constant Updates Updating shader constants was often a bottleneck in D3D9 It can still be a bottleneck in D3D10 The main difference between the two is the new Constant Buffer object in D3D10 This is the largest section of this talk
  • 8. Constant Updates Constant Buffer Recap Constant Buffers are buffer objects that hold shader constant data They are updated using Map with D3D10_MAP_WRITE_DISCARD or by calling UpdateSubresource There are 16 Constant Buffer slots available to each shader in the pipeline Try not to use all 16 to leave some headroom
  • 9. Constant Updates Porting Issues D3D9 constants were updated individually by calling SetXXXXXShaderConstantX In D3D10, you have to update the entire constant buffer all at once A naïve port from D3D9 to D3D10 can have crippling performance implications if Constant Buffers are not handled correctly! Rule of thumb: Do not update more data than you need to
  • 10. Constant Updates Naïve Port: AKA how to cripple perf Each shader uses one big constant buffer Submitting one value submits them all! If you have one 4096 byte Constant Buffer, and you only need to update your World matrix, you will still have to update 4096 bytes of data and send it across the bus Don’t do this!
  • 11. Constant Updates Naïve Port: AKA how to cripple perf 100 skinned meshes (100 materials), 900 static meshes (400 materials), 1 shadow + 1 lighting pass
      cbuffer VSGlobalsCB { matrix ViewProj; matrix Bones[100]; matrix World; float SpecPower; float4 BDRFCoefficients; float AppTime; uint2 RenderTargetSize; }; (6560 Bytes)
      Shadow Pass
        Update VSGlobalCB: 6560 Bytes x 100 = 656000 Bytes
        Update VSGlobalCB: 6560 Bytes x 900 = 5904000 Bytes
      Light Pass
        Update VSGlobalCB: 6560 Bytes x 100 = 656000 Bytes
        Update VSGlobalCB: 6560 Bytes x 900 = 5904000 Bytes
      = 13,120,000 Bytes
  • 12. Constant Updates Organize Constants The first step is to organize constants by frequency of update One shader will generally be used to draw several objects Some data in this shader doesn’t need to be set for every draw For example: Time, ViewProj matrices Split these out into their own buffers
  • 13. Constant buffers and their sizes
      cbuffer VSGlobalPerFrameCB { float AppTime; }; (4 Bytes)
      cbuffer VSPerSkinnedCB { matrix Bones[100]; }; (6400 Bytes)
      cbuffer VSPerStaticCB { matrix World; }; (64 Bytes)
      cbuffer VSPerPassCB { matrix ViewProj; uint2 RenderTargetSize; }; (72 Bytes)
      cbuffer VSPerMaterialCB { float SpecPower; float4 BDRFCoefficients; }; (20 Bytes)
      Begin Frame
        Update VSGlobalPerFrameCB: 4 Bytes x 1 = 4 Bytes
        Update VSPerSkinnedCBs: 6400 Bytes x 100 = 640000 Bytes
        Update VSPerStaticCBs: 64 Bytes x 900 = 57600 Bytes
      Shadow Pass
        Update VSPerPassCB: 72 Bytes x 1 = 72 Bytes
      Light Pass
        Update VSPerPassCB: 72 Bytes x 1 = 72 Bytes
        Update VSPerMaterialCBs: 20 Bytes x 500 = 10000 Bytes
      = 707,748 Bytes
  • 14. Constant Updates 13,120,000 Bytes / 707,748 Bytes = 18x
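The two totals on the preceding slides can be reproduced with a few lines of arithmetic. The sketch below is only a calculator for the slide's example scene; the `Scene` struct and function names are made up for illustration, and the byte sizes are the raw sums quoted on the slides.

```cpp
#include <cassert>
#include <cstdint>

// Example scene from the slides: 100 skinned meshes, 900 static meshes,
// 500 materials in total, one shadow pass plus one lighting pass.
struct Scene {
    std::uint64_t skinnedMeshes = 100;
    std::uint64_t staticMeshes  = 900;
    std::uint64_t materials     = 500;
    std::uint64_t passes        = 2;
};

// Naive port: one 6560-byte buffer is rewritten for every mesh in every pass.
inline std::uint64_t naiveBytesPerFrame(const Scene& s) {
    assert(s.passes >= 1);
    const std::uint64_t globalCB = 6560;
    return globalCB * (s.skinnedMeshes + s.staticMeshes) * s.passes;
}

// Split by update frequency: each buffer is rewritten only as often as
// the data it holds actually changes.
inline std::uint64_t splitBytesPerFrame(const Scene& s) {
    assert(s.passes >= 1);
    const std::uint64_t perFrame    = 4;     // AppTime
    const std::uint64_t perSkinned  = 6400;  // Bones[100]
    const std::uint64_t perStatic   = 64;    // World
    const std::uint64_t perPass     = 72;    // ViewProj + RenderTargetSize
    const std::uint64_t perMaterial = 20;    // SpecPower + BDRFCoefficients
    return perFrame
         + perSkinned  * s.skinnedMeshes
         + perStatic   * s.staticMeshes
         + perPass     * s.passes
         + perMaterial * s.materials;
}
```

With the default scene this gives 13,120,000 bytes against 707,748 bytes, a ratio of roughly 18.5, which the slide rounds to 18x.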
  • 15. Constant Updates Managing Buffers Constant buffers need to be managed in the application Creating a few buffers that are used for all shader constants just won’t work We update more data than necessary due to large buffers
  • 16. Constant Updates Managing Buffers Solution 1 (Fastest) Create Constant Buffers that line up exactly with the number of elements of each frequency group Global CBs CBs per Mesh CBs per Material CBs per Pass This ensures that EVERY constant buffer is no larger than it absolutely needs to be This also ensures the most efficient update of CBs based upon frequency
  • 17. Constant Updates Managing Buffers Solution 2 (Second Best) If you cannot create CBs that line up exactly with elements, you can create a tiered constant buffer system Create arrays of 32-byte, 64-byte, 128-byte, 256-byte, etc. constant buffers Keep a shadow copy of the constant data in system memory When it comes time to render, select the smallest CB from the array that will hold the necessary constant data May have to resubmit redundant data for separate passes Hybrid approach?
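The "select the smallest CB that will hold the data" step can be sketched as a lookup over power-of-two size classes. This is a minimal illustration, not engine code: `makeTierSizes` and `selectTier` are hypothetical names, and the real system would hold an array of actual constant buffers per tier rather than just sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds the list of tier sizes (32, 64, 128, ... bytes), doubling each step.
inline std::vector<std::size_t> makeTierSizes(std::size_t smallest,
                                              std::size_t largest) {
    assert(smallest > 0 && smallest <= largest);
    std::vector<std::size_t> sizes;
    for (std::size_t s = smallest; s <= largest; s *= 2) {
        sizes.push_back(s);
    }
    return sizes;
}

// Returns the index of the smallest tier that can hold `bytes`,
// or -1 if the data is larger than every tier.
inline int selectTier(const std::vector<std::size_t>& tiers, std::size_t bytes) {
    for (std::size_t i = 0; i < tiers.size(); ++i) {
        if (bytes <= tiers[i]) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

For example, with tiers from 32 to 4096 bytes, 72 bytes of per-pass data would land in the 128-byte tier, so some padding is still uploaded; that is the cost of this approach compared with Solution 1.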
  • 18. Constant Updates Case Study: Skinning using Solution 1 Skinning in D3D9 (or a bad D3D10 port) Multiple passes causes redundant bone data uploads to the GPU Skinning in D3D10 Using Constant Buffers we only need to upload it once
  • 19. Constant Updates D3D9 Version / or Naïve D3D10 Version
      Pass1: Set Mesh1 Bones, Draw Mesh1, Set Mesh2 Bones, Draw Mesh2
      Pass2: Set Mesh1 Bones, Draw Mesh1, Set Mesh2 Bones, Draw Mesh2
      [Diagram: a single block of Constant Data (Bone0 … BoneN) that is filled with Mesh1 bones, then with Mesh2 bones]
  • 20. Constant Updates Preferred D3D10 Version
      Frame Start: Update Mesh1 CB, Update Mesh2 CB
      Pass1: Bind Mesh1 CB, Draw Mesh1, Bind Mesh2 CB, Draw Mesh2
      Pass2: Bind Mesh1 CB, Draw Mesh1, Bind Mesh2 CB, Draw Mesh2
      [Diagram: Mesh1 CB (Mesh1 Bone0 … BoneN) and Mesh2 CB (Mesh2 Bone0 … BoneN), each bound in turn to CB Slot 0]
  • 21. Constant Updates Advanced D3D10 Version Why not store all of our characters’ bones in a 128-bit FP texture? We can upload bones for all visible characters at the start of a frame We can draw similar characters using instancing instead of individual draws Use SV_InstanceID to select the start of the character’s bone data in the texture Stream the skinned meshes to memory using Stream Output and render all subsequent passes from the post-skinned buffer
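The SV_InstanceID lookup described above comes down to index arithmetic. The sketch below shows that arithmetic on the CPU side under assumptions that are not in the slides: each bone is a 4x4 matrix stored as four RGBA32F texels, characters are packed back to back, and texels are laid out row by row in a texture of a given width. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct Texel {
    std::uint32_t x;
    std::uint32_t y;
};

// Assumption: one 4x4 float matrix occupies four 128-bit (RGBA32F) texels.
const std::uint32_t kTexelsPerBone = 4;

// Linear index of the first texel of a character's bone palette,
// given its instance ID and a fixed bone count per character.
inline std::uint32_t firstBoneTexel(std::uint32_t instanceId,
                                    std::uint32_t bonesPerCharacter) {
    return instanceId * bonesPerCharacter * kTexelsPerBone;
}

// Converts a linear texel index into 2D texture coordinates.
inline Texel texelCoord(std::uint32_t linearIndex, std::uint32_t textureWidth) {
    assert(textureWidth > 0);
    return Texel{linearIndex % textureWidth, linearIndex / textureWidth};
}
```

In a shader the same computation would be done with the instance ID as input; here instance 3 with 100 bones per character starts at texel 1200, which is (176, 1) in a 1024-wide texture.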
  • 22. State Management Individual state setting is no longer possible in D3D10 State in D3D10 is stored in state objects These state objects are immutable To change even one aspect of a state object requires that you create an entirely new state object with that one change
  • 23. State Management Managing State Objects Solution 1 (Fastest) If you have a known set of materials and required states, you can create all state objects at load time State objects are small and there is a finite set of permutations With all state objects created up front, all that needs to be done during rendering is to bind the object
  • 24. State Management Managing State Objects Solution 2 (Second Best) If your content is not finalized, or if you CANNOT get your engine to lump state together Create a state object hash table Hash off of the setting that has the most unique states Grab pre-created states from the hash-table Why not give your tools pipeline the ability to do this for a level and save out the results?
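A state object hash table can be sketched as a cache keyed by the state description. The code below is a portable stand-in rather than real D3D10 code: `BlendDesc` is a deliberately tiny, hypothetical description, and the integer `Handle` stands in for the immutable state object that the device would create.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical, simplified state description used as the cache key.
struct BlendDesc {
    bool blendEnable;
    std::uint8_t srcBlend;
    std::uint8_t destBlend;
    std::uint8_t blendOp;

    bool operator==(const BlendDesc& o) const {
        return blendEnable == o.blendEnable && srcBlend == o.srcBlend &&
               destBlend == o.destBlend && blendOp == o.blendOp;
    }
};

struct BlendDescHash {
    std::size_t operator()(const BlendDesc& d) const {
        // Pack all fields into one integer and hash that.
        std::uint32_t packed = (d.blendEnable ? 1u : 0u) |
                               (static_cast<std::uint32_t>(d.srcBlend) << 1) |
                               (static_cast<std::uint32_t>(d.destBlend) << 9) |
                               (static_cast<std::uint32_t>(d.blendOp) << 17);
        return std::hash<std::uint32_t>{}(packed);
    }
};

class StateCache {
public:
    using Handle = int;  // stand-in for a pointer to an immutable state object

    // Returns the existing object for this description, creating it on a miss.
    Handle getOrCreate(const BlendDesc& desc) {
        auto it = cache_.find(desc);
        if (it != cache_.end()) {
            return it->second;
        }
        Handle created = nextHandle_++;  // real code would ask the device here
        cache_.emplace(desc, created);
        return created;
    }

    std::size_t size() const { return cache_.size(); }

private:
    std::unordered_map<BlendDesc, Handle, BlendDescHash> cache_;
    Handle nextHandle_ = 0;
};
```

A tools pipeline could run a level through such a cache and save the set of keys it ends up with, so the runtime can pre-create every object at load time.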
  • 25. Shader Linkage D3D9 shader linkage was based off of semantics (POSITION, NORMAL, TEXCOORDN) D3D10 linkage is based off of offsets and sizes This means stricter linkage rules This also means that the driver doesn’t have to link shaders together at every draw call!
  • 26. Shader Linkage No Holes Allowed! Elements must be read in the order they are output from the previous stage Cannot have “holes” between linkages
      struct VS_OUTPUT { float3 Norm : NORMAL; float2 Tex : TEXCOORD0; float2 Tex2 : TEXCOORD1; float4 Pos : SV_POSITION; };
      struct PS_INPUT { float3 Norm : NORMAL; float2 Tex : TEXCOORD0; float2 Tex2 : TEXCOORD1; };
      Holes at the end are OK
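Read literally, the rule on this slide says the consuming stage's inputs must form a prefix of the producing stage's outputs. The sketch below checks exactly that and nothing more; it is a simplified model for validating content offline, not a reproduction of the runtime's actual signature validation, and the `Element` type is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One element of a shader signature, e.g. {"float3", "NORMAL"}.
struct Element {
    std::string type;
    std::string semantic;
};

// True when `inputs` reads the elements of `outputs` in order with no gaps.
// Outputs left unread at the end are allowed.
inline bool linksWithoutHoles(const std::vector<Element>& outputs,
                              const std::vector<Element>& inputs) {
    if (inputs.size() > outputs.size()) {
        return false;
    }
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        if (inputs[i].type != outputs[i].type ||
            inputs[i].semantic != outputs[i].semantic) {
            return false;
        }
    }
    return true;
}
```

A pixel shader that skips TEXCOORD0 but reads TEXCOORD1 fails this check, while one that simply stops before SV_POSITION passes.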
  • 27. Shader Linkage Input Assembler to Vertex Shader Input Layouts define the signature of the vertex stream data Input Layouts are similar to Vertex Declarations in D3D9 Strict linkage rules are a big difference Creating Input Layouts on the fly is not recommended CreateInputLayout requires a shader signature to validate against
  • 28. Shader Linkage Input Assembler to Vertex Shader Solution 1 (Fastest) Create an Input Layout for each unique Vertex Stream / Vertex Shader combination up front Input Layouts are small This assumes that the shader input signature is available when you call CreateInputLayout Try to normalize Input Layouts across a level, or have them art directed
  • 29. Shader Linkage Input Assembler to Vertex Shader Solution 2 (Second Best) If you load meshes and create input layouts before loading shaders, you might have a problem You can use a similar hashing scheme as the one used for State Objects When the Input Layout is needed, search the hash for an Input Layout that matches the Vertex Stream and Vertex Shader signature Why not store this data to a file and pre-populate the Input Layouts after your content is tuned?
  • 30. Shader Linkage Aside: Instancing Instancing is a first class citizen on D3D10! Stream source frequency is now part of the Input Layout Multiple frequencies will mean multiple Input Layouts
  • 31. Resource Updates Updating resources is different in D3D10 Create / Lock / Fill / Unlock paradigm is no longer necessary (although you can still do it) Texture data can be passed into the texture at create time
  • 32. Resource Updates Resource Usage Types D3D10_USAGE_DEFAULT D3D10_USAGE_IMMUTABLE D3D10_USAGE_DYNAMIC D3D10_USAGE_STAGING
  • 33. Resource Updates D3D10_USAGE_DEFAULT Use for resources that need fast GPU read and write access Can only be updated using UpdateSubresource Render targets are good candidates Textures that are updated infrequently (less than once per frame) are good candidates
  • 34. Resource Updates D3D10_USAGE_IMMUTABLE Use for resources that need fast GPU read access only Once they are created, they cannot be updated... ever Initial data must be passed in during the creation call Resources that will never change (static textures, VBs / IBs) are good candidates Don’t bend over backwards trying to make everything D3D10_USAGE_IMMUTABLE
  • 35. Resource Updates D3D10_USAGE_DYNAMIC Use for resources that need fast CPU write access (at the expense of slower GPU read access) No CPU read access Can only be updated using Map with: D3D10_MAP_WRITE_DISCARD D3D10_MAP_WRITE_NO_OVERWRITE Dynamic Vertex Buffers are good candidates Dynamic (> once per frame) textures are good candidates
  • 36. Resource Updates D3D10_USAGE_STAGING This is the only way to read data back from the GPU Can only be updated using Map Cannot map with D3D10_MAP_WRITE_DISCARD or D3D10_MAP_WRITE_NO_OVERWRITE Might want to double buffer to keep from stalling GPU The GPU cannot directly use these
  • 37. Resource Updates Summary CPU updates the resource frequently (more than once per frame) Use D3D10_USAGE_DYNAMIC CPU updates the resource infrequently (once per frame or less) Use D3D10_USAGE_DEFAULT CPU doesn’t update the resource Use D3D10_USAGE_IMMUTABLE CPU needs to read the resource Use D3D10_USAGE_STAGING
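The summary above is effectively a small decision function. The sketch below encodes it in portable C++; the `Usage` enum mirrors the four D3D10_USAGE values by name only, and `AccessPattern` with its `gpuWrites` field is a hypothetical addition so that render targets (which the CPU never updates but the GPU writes) end up as default rather than immutable.

```cpp
#include <cassert>

enum class Usage { Default, Immutable, Dynamic, Staging };

// Hypothetical description of how a resource is accessed.
struct AccessPattern {
    bool cpuReads;              // CPU needs to read the data back
    bool gpuWrites;             // e.g. render targets, stream output targets
    double cpuUpdatesPerFrame;  // average CPU updates per frame
};

inline Usage chooseUsage(const AccessPattern& p) {
    assert(p.cpuUpdatesPerFrame >= 0.0);
    if (p.cpuReads) {
        return Usage::Staging;    // CPU read-back goes through a staging resource
    }
    if (p.cpuUpdatesPerFrame > 1.0) {
        return Usage::Dynamic;    // frequent CPU writes: Map
    }
    if (p.cpuUpdatesPerFrame == 0.0 && !p.gpuWrites) {
        return Usage::Immutable;  // never changes after creation
    }
    return Usage::Default;        // infrequent updates: UpdateSubresource
}
```

Constant buffers are the exception noted on a later slide, so a real engine would route them around a function like this.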
  • 38. Resource Updates Example: Vertex Buffer The vertex buffer is touched by the CPU less than once per frame Create it with D3D10_USAGE_DEFAULT Update it with UpdateSubresource The vertex buffer is used for dynamic geometry and the CPU needs to update it multiple times per frame Create it with D3D10_USAGE_DYNAMIC Update it with Map
  • 39. Resource Updates The Exception: Constant Buffers CBs are always expected to be updated frequently Select CB usage based upon which one causes the least amount of system memory to be transferred Not just to the GPU, but system-to-system memory copies as well
  • 40. Resource Updates UpdateSubresource UpdateSubresource requires a system memory buffer and incurs an extra copy Use if you have system copies of your constant data already in one place
  • 41. Resource Updates Map Map requires no extra system memory but may hit driver renaming limits if abused Use if compositing values on the fly or collecting values from other places
  • 42. Resource Updates A note on overusing discard Use D3D10_MAP_WRITE_DISCARD carefully with buffers! D3D10_MAP_WRITE_DISCARD tells the driver to give us a new memory buffer if the current one is busy There are a LIMITED set of temporary buffers If these run out, then your app will stall until another buffer can be freed This can happen if you do dynamic geometry using one VB and D3D10_MAP_WRITE_DISCARD
  • 43. Dynamic Geometry DrawIndexedPrimitiveUP is gone! DrawPrimitiveUP is gone! Your well-behaved D3D9 app isn’t using these anyway, right?
  • 44. Dynamic Geometry Solution: Same as in D3D9 Use one large buffer, and map it with D3D10_MAP_WRITE_NO_OVERWRITE Advance the write position with every draw Wrap to the beginning Make sure your buffer is large enough that you’re not overwriting data that the GPU is reading This is what happens under the covers for D3D9 when using DIPUP or DUP in Windows Vista
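The "advance the write position, wrap to the beginning" bookkeeping can be sketched independently of the API. The class below only tracks offsets; the name `DynamicBufferCursor` is made up, and actual code would map the buffer with D3D10_MAP_WRITE_NO_OVERWRITE and add the returned offset to the mapped pointer. The `wrapped` flag is an addition for illustration, so the caller can decide how to handle the wrap (for instance by checking that the GPU is done with the start of the buffer).

```cpp
#include <cassert>
#include <cstddef>

class DynamicBufferCursor {
public:
    struct Allocation {
        std::size_t offset;  // byte offset to write at and to draw from
        bool wrapped;        // true when this allocation restarted at offset 0
    };

    explicit DynamicBufferCursor(std::size_t capacityBytes)
        : capacity_(capacityBytes) {}

    // Reserves `bytes` for the next draw and advances the write position.
    Allocation allocate(std::size_t bytes) {
        assert(bytes <= capacity_);
        bool wrapped = false;
        if (writePos_ + bytes > capacity_) {
            writePos_ = 0;  // not enough room left: wrap to the beginning
            wrapped = true;
        }
        Allocation result{writePos_, wrapped};
        writePos_ += bytes;
        return result;
    }

private:
    std::size_t capacity_;
    std::size_t writePos_ = 0;
};
```

The buffer has to be sized so that by the time the cursor wraps, the GPU has finished reading the data at the start; this sketch does not check that.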
  • 45. Porting Tips StretchRect is Gone Work around using render-to-texture A8R8G8B8 formats have been replaced with R8G8B8A8 formats Swizzle on texture load or swizzle in the shader Fixed Function AlphaTest is Gone Add logic to the shader and call discard Fixed Function Fog is Gone Add it to the shader
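Swizzling on texture load amounts to swapping the red and blue bytes of every pixel. The sketch assumes tightly packed 32-bit pixels with no row padding: D3D9's A8R8G8B8 is laid out in memory as B, G, R, A on little-endian machines, while R8G8B8A8 is laid out as R, G, B, A. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Converts tightly packed B,G,R,A pixels to R,G,B,A in place.
inline void swizzleBgraToRgba(std::vector<std::uint8_t>& pixels) {
    assert(pixels.size() % 4 == 0);
    for (std::size_t i = 0; i + 3 < pixels.size(); i += 4) {
        std::swap(pixels[i], pixels[i + 2]);  // exchange blue and red
    }
}
```

Doing this once at load time (or in the content pipeline) avoids paying for the swizzle in every shader that samples the texture.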
  • 46. Porting Tips Continued User Clip Planes usage has changed They’ve moved to the shader Experiment with the SV_ClipDistance SEMANTIC vs discard in the PS to determine which is faster for your shader Query data sizes might have changed Occlusion queries are UINT64 vs DWORD No Triangle Fan Support Work around in content pipeline or on load SetCursorProperties, ShowCursor are gone Use Win32 APIs to handle cursors now
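One way to work around the missing triangle fan support on load is to expand each fan into a triangle list. A minimal sketch, assuming 32-bit indices and a hypothetical function name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expands fan indices (v0, v1, v2, ..., vn) into list triangles
// (v0, v1, v2), (v0, v2, v3), ..., keeping the original winding order.
inline std::vector<std::uint32_t> fanToList(const std::vector<std::uint32_t>& fan) {
    std::vector<std::uint32_t> list;
    if (fan.size() < 3) {
        return list;  // not enough vertices for a single triangle
    }
    list.reserve((fan.size() - 2) * 3);
    for (std::size_t i = 1; i + 1 < fan.size(); ++i) {
        list.push_back(fan[0]);
        list.push_back(fan[i]);
        list.push_back(fan[i + 1]);
    }
    assert(list.size() == (fan.size() - 2) * 3);
    return list;
}
```

A fan of five vertices becomes three triangles, so the index count grows from n to 3(n - 2); doing the conversion in the content pipeline keeps that cost out of load time.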
  • 47. Porting Tips Continued No offsets on Map calls This was basically API clutter in D3D9 Calculate the offset from the returned pointer Clears are no longer bound to pipeline state If you want a clear call to respect scissor, stencil, or other state, draw a full-screen quad This is closer to the HW The Driver/HW has been doing this for you for years OMSetBlendState Never set the SampleMask to 0 in OMSetBlendState
  • 48. Porting Tips Continued Input Layout conversions tightened up D3DDECLTYPE_UBYTE4 in the vertex stream could be converted to a float4 in the VS in D3D9 i.e. 255u in the stream would show up as 255.0 in the VS In D3D10 you either get a normalized [0..1] value or 255 (u)int Register keyword It doesn’t mean the same thing in D3D10 Use register to determine which CB slot a CB binds to Use packoffset to place a variable inside a CB
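The difference in conversions is easy to see with the arithmetic written out. The sketch below models what a normalized 8-bit element delivers and how a shader ported from D3D9 could rescale it to the 0..255 float range it used to receive; both function names are hypothetical, and the alternative is to declare the element as an integer type and read it as a uint in the shader.

```cpp
#include <cassert>
#include <cstdint>

// Value delivered for a normalized 8-bit element: raw / 255, in [0, 1].
inline float unormToFloat(std::uint8_t raw) {
    return static_cast<float>(raw) / 255.0f;
}

// Rescales a normalized value back to the D3D9-style 0..255 float range.
inline float toD3D9StyleFloat(float normalized) {
    assert(normalized >= 0.0f && normalized <= 1.0f);
    return normalized * 255.0f;
}
```

So a stream value of 255u arrives as 1.0 when normalized, and multiplying by 255.0 in the shader restores the value the D3D9 code expected.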
  • 49. Porting Tips Continued Sampler and Texture bindings Samplers can be bound independently of textures This is very flexible! Sampler and Texture slots are not always the same Register Packing In D3D9 all variables took up at least one float4 register (even if you only used a single float!) In D3D10 variables are packed together This saves a lot of space Make sure your engine doesn’t do everything based upon register offsets or your variables might alias
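To see how packing changes offsets, the sketch below computes byte offsets for a sequence of scalars and small vectors under the basic HLSL constant buffer rule: variables are packed into 16-byte registers and a variable may not straddle a register boundary. It is a simplified model that handles only items of 4 to 16 bytes (no arrays, matrices, or structs), and `packOffsets` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the byte offset of each variable, given sizes in declaration order.
inline std::vector<std::size_t> packOffsets(const std::vector<std::size_t>& sizes) {
    const std::size_t kRegisterBytes = 16;
    std::vector<std::size_t> offsets;
    std::size_t cursor = 0;
    for (std::size_t size : sizes) {
        assert(size > 0 && size <= kRegisterBytes && size % 4 == 0);
        const std::size_t usedInRegister = cursor % kRegisterBytes;
        if (usedInRegister + size > kRegisterBytes) {
            // Would straddle a register: start at the next 16-byte boundary.
            cursor += kRegisterBytes - usedInRegister;
        }
        offsets.push_back(cursor);
        cursor += size;
    }
    return offsets;
}
```

Under this model a float followed by a float4 lands at offsets 0 and 16, whereas a float followed by a float3 shares one register at offsets 0 and 4. An engine that assumes one register per variable, as in D3D9, would compute different offsets and could alias variables.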
  • 50. Porting Tips Continued D3DSAMP_SRGBTEXTURE This sampler state setting does not exist on D3D10 Instead it’s included in the texture format This is more like the Xbox 360 Consider re-optimizing resource usage and upload for better D3D10 performance But use D3D10_USAGE_DEFAULT resources and UpdateSubresource as a baseline
  • 51. Summary Use the debug runtime! More draw calls usually means more constant updating and state changing calls Be frugal with constant updates Avoid resubmitting redundant data! Create as much state and input layout information up front as possible Select D3D10_USAGE for resources based upon the CPU access patterns needed Use D3D10_MAP_WRITE_NO_OVERWRITE and a big buffer as a replacement for DIPUP and DUP
  • 52. Call to Action Actually exploit D3D10! This talk tells you how to get performance gains from a straight port You can get a whole lot more by using D3D10’s advanced features! StreamOut to minimize skinning costs First class instancing support Store some vertex data in textures Move some systems to the GPU (Particles?) Aggressive use of Constant Buffers
  • 53. http://www.xna.com © 2007 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.