Slide 1
Introduction to Data-Oriented Design
(Daniel Collin, DICE)
publications.dice.se/attachments/Introduction_to_Data-Oriented_Design.pdf
So what is this Data-Oriented Design?
It’s about on shifting focus to how data is read and written
Why should we care?
Performance
A read from memory takes ~600 cycles at 3.2 GHz
A read from memory takes 40 cycles at 300 MHz
Performance
Disks (Blu-ray/DVD/HDD)
Main Memory
L2 Cache
L1 Cache
CPU / Registers
Latency
🙁
600 cycles
40 cycles
1 – 2 cycles
Multithreading
Object
Read?
Write?
Object
update()
Object
Cannot multithread without knowing how data is touched
Adding locks always protects data not code
Read?
Write?
Read?
Write?
Offloading to co-unit
?
SPU/GPU/APU
?
If data is unknown hard/impossible to run on co-unit
Better design
Data focus can lead to isolated, self-contained, interchangeable pieces of data and code
This can make it easier to test data and code in isolation
Example – OOD
class Bot
{
…
Vec3 m_position;
…
float m_mod;
…
float m_aimDirection;
…
void updateAim(Vec3 target)
{
m_aimDirection = dot3(m_position, target) * m_mod;
}
}
Example – OOD
class Bot
{
…
Vec3 m_position;
…
float m_mod;
…
float m_aimDirection;
…
void updateAim(Vec3 target)
{
m_aimDirection = dot3(m_position, target) * m_mod;
}
}
icache-miss
Example – OOD
class Bot
{
…
Vec3 m_position;
…
float m_mod;
…
float m_aimDirection;
…
void updateAim(Vec3 target)
{
m_aimDirection = dot3(m_position, target) * m_mod;
}
}
icache-miss
data-miss
Example – OOD
class Bot
{
…
Vec3 m_position;
…
float m_mod;
…
float m_aimDirection;
…
void updateAim(Vec3 target)
{
m_aimDirection = dot3(m_position, target) * m_mod;
}
}
icache-miss
data-miss
Unused
cached
data
Example – OOD
class Bot
{
…
Vec3 m_position;
…
float m_mod;
…
float m_aimDirection;
…
void updateAim(Vec3 target)
{
m_aimDirection = dot3(m_position, target) * m_mod;
}
}
icache-miss
data-miss
Unused
cached
data
Very hard to optimize!
Example – OOD
void updateAim(Vec3 target)
{
m_aimDirection = dot3(m_position, target) * m_mod;
}
Lets say we call this code 4 times (4 diffrent Bots)
Example – OOD
void updateAim(Vec3 target)
{
m_aimDirection = dot3(m_position, target) * m_mod;
}
Lets say we call this code 4 times (4 diffrent Bots)
iCache – 600
Example – OOD
void updateAim(Vec3 target)
{
m_aimDirection = dot3(m_position, target) * m_mod;
}
Lets say we call this code 4 times (4 diffrent Bots)
iCache – 600
m_position – 600
Example – OOD
void updateAim(Vec3 target)
{
m_aimDirection = dot3(m_position, target) * m_mod;
}
Lets say we call this code 4 times (4 diffrent Bots)
iCache – 600
m_position – 600
m_mod – 600
Example – OOD
void updateAim(Vec3 target)
{
m_aimDirection = dot3(m_position, target) * m_mod;
}
Lets say we call this code 4 times (4 diffrent Bots)
iCache – 600
m_position – 600
m_mod – 600
~20 cycles
Example – OOD
void updateAim(Vec3 target)
{
m_aimDirection = dot3(m_position, target) * m_mod;
}
Lets say we call this code 4 times (4 diffrent Bots)
iCache – 600
m_position – 600
m_mod – 600
aimDir – 100
~20 cycles
Example – OOD
void updateAim(Vec3 target)
{
m_aimDirection = dot3(m_position, target) * m_mod;
}
Lets say we call this code 4 times (4 diffrent Bots)
iCache – 600
m_position – 600
m_mod – 600
aimDir – 100
~20 cycles
iCache – 600
m_position – 600
m_mod – 600
aimDir – 100
Example – OOD
void updateAim(Vec3 target)
{
m_aimDirection = dot3(m_position, target) * m_mod;
}
Lets say we call this code 4 times (4 diffrent Bots)
iCache – 600
m_position – 600
m_mod – 600
aimDir – 100
~20 cycles
iCache – 600
m_position – 600
m_mod – 600
aimDir – 100
iCache – 600
m_position – 600
m_mod – 600
aimDir – 100
Example – OOD
void updateAim(Vec3 target)
{
m_aimDirection = dot3(m_position, target) * m_mod;
}
Lets say we call this code 4 times (4 diffrent Bots)
iCache – 600
m_position – 600
m_mod – 600
aimDir – 100
~20 cycles
iCache – 600
m_position – 600
m_mod – 600
aimDir – 100
iCache – 600
m_position – 600
m_mod – 600
aimDir – 100
iCache – 600
m_position – 600
m_mod – 600
aimDir – 100
Example – OOD
void updateAim(Vec3 target)
{
m_aimDirection = dot3(m_position, target) * m_mod;
}
Lets say we call this code 4 times (4 diffrent Bots)
iCache – 600
m_position – 600
m_mod – 600
aimDir – 100
~20 cycles
iCache – 600
m_position – 600
m_mod – 600
aimDir – 100
iCache – 600
m_position – 600
m_mod – 600
aimDir – 100
iCache – 600
m_position – 600
m_mod – 600
aimDir – 100
7680
Example – DOD
Example – DOD
Design ”back to front” and focus on the output data
Example – DOD
Design ”back to front” and focus on the output data
Then add the minimal amount of data needed to do the transform to create the correct output
Example – DOD
void updateAims(float* aimDir,const AimingData* aim,
Vec3 target, uint count)
{
for (uint i = 0; i < count; ++i)
{
aimDir[i] = dot3(aim->positions[i],target) * aim->mod[i];
}
}
Example – DOD
void updateAims(float* aimDir,const AimingData* aim,
Vec3 target, uint count)
{
for (uint i = 0; i < count; ++i)
{
aimDir[i] = dot3(aim->positions[i],target) * aim->mod[i];
}
}
What has changed?
Example – DOD
void updateAims(float* aimDir,const AimingData* aim,
Vec3 target, uint count)
{
for (uint i = 0; i < count; ++i)
{
aimDir[i] = dot3(aim->positions[i],target) * aim->mod[i];
}
}
Only read needed inputs
What has changed?
Example – DOD
void updateAims(float* aimDir,const AimingData* aim,
Vec3 target, uint count)
{
for (uint i = 0; i < count; ++i)
{
aimDir[i] = dot3(aim->positions[i],target) * aim->mod[i];
}
}
Only read needed inputs
Write to linear array
What has changed?
Example – DOD
void updateAims(float* aimDir,const AimingData* aim,
Vec3 target, uint count)
{
for (uint i = 0; i < count; ++i)
{
aimDir[i] = dot3(aim->positions[i],target) * aim->mod[i];
}
}
Only read needed inputs
Write to linear array
Loop over all the data
What has changed?
Example – DOD
void updateAims(float* aimDir,const AimingData* aim,
Vec3 target, uint count)
{
for (uint i = 0; i < count; ++i)
{
aimDir[i] = dot3(aim->positions[i],target) * aim->mod[i];
}
}
Only read needed inputs
Write to linear array
Loop over all the data
Actual code unchanged
What has changed?
Example – DOD
void updateAims(float* aimDir,const AimingData* aim,
Vec3 target, uint count)
{
for (uint i = 0; i < count; ++i)
{
aimDir[i] = dot3(aim->positions[i],target) * aim->mod[i];
}
}
Only read needed inputs
Write to linear array
Loop over all the data
Actual code unchanged
What has changed?
Code separated
Example – DOD
void updateAims(float* aimDir, const AimingData* aim,
Vec3 target, uint count)
{
for (uint i = 0; i < count; ++i)
{
aimDir[i] = dot3(aim->positions[i], target) * aim->mod[i];
}
}
Example – DOD
void updateAims(float* aimDir, const AimingData* aim,
Vec3 target, uint count)
{
for (uint i = 0; i < count; ++i)
{
aimDir[i] = dot3(aim->positions[i], target) * aim->mod[i];
}
}
iCache – 600
Example – DOD
void updateAims(float* aimDir, const AimingData* aim,
Vec3 target, uint count)
{
for (uint i = 0; i < count; ++i)
{
aimDir[i] = dot3(aim->positions[i], target) * aim->mod[i];
}
}
iCache – 600
positions – 600
Example – DOD
void updateAims(float* aimDir, const AimingData* aim,
Vec3 target, uint count)
{
for (uint i = 0; i < count; ++i)
{
aimDir[i] = dot3(aim->positions[i], target) * aim->mod[i];
}
}
iCache – 600
positions – 600
mod – 600
Example – DOD
void updateAims(float* aimDir, const AimingData* aim,
Vec3 target, uint count)
{
for (uint i = 0; i < count; ++i)
{
aimDir[i] = dot3(aim->positions[i], target) * aim->mod[i];
}
}
iCache – 600
positions – 600
mod – 600
~20 cycles
Example – DOD
void updateAims(float* aimDir, const AimingData* aim,
Vec3 target, uint count)
{
for (uint i = 0; i < count; ++i)
{
aimDir[i] = dot3(aim->positions[i], target) * aim->mod[i];
}
}
iCache – 600
positions – 600
mod – 600
aimDir – 100
~20 cycles
Example – DOD
void updateAims(float* aimDir, const AimingData* aim,
Vec3 target, uint count)
{
for (uint i = 0; i < count; ++i)
{
aimDir[i] = dot3(aim->positions[i], target) * aim->mod[i];
}
}
iCache – 600
positions – 600
mod – 600
aimDir – 100
~20 cycles
Example – DOD
void updateAims(float* aimDir, const AimingData* aim,
Vec3 target, uint count)
{
for (uint i = 0; i < count; ++i)
{
aimDir[i] = dot3(aim->positions[i], target) * aim->mod[i];
}
}
iCache – 600
positions – 600
mod – 600
aimDir – 100
~20 cycles
Example – DOD
void updateAims(float* aimDir, const AimingData* aim,
Vec3 target, uint count)
{
for (uint i = 0; i < count; ++i)
{
aimDir[i] = dot3(aim->positions[i], target) * aim->mod[i];
}
}
iCache – 600
positions – 600
mod – 600
aimDir – 100
~20 cycles
Example – DOD
void updateAims(float* aimDir, const AimingData* aim,
Vec3 target, uint count)
{
for (uint i = 0; i < count; ++i)
{
aimDir[i] = dot3(aim->positions[i], target) * aim->mod[i];
}
}
iCache – 600
positions – 600
mod – 600
aimDir – 100
~20 cycles
1980
Data layout OOD vs DOD
pos0
mod0
aimDir0
pos0
Pos1
mod1
aimDir1
pos0
pos0
pos0
pos1
pos1
pos1
pos1
pos2
pos2
pos2
pos2
pos3
pos3
pos3
pos3
mod0
mod1
mod2
mod3
aimDir0
aimDir1
aimDir2
aimDir3
pos0
pos0
pos0
Data layout OOD vs DOD
pos0
mod0
aimDir0
pos0
Pos1
mod1
aimDir1
pos0
pos0
pos0
pos1
pos1
pos1
pos1
pos2
pos2
pos2
pos2
pos3
pos3
pos3
pos3
mod0
mod1
mod2
mod3
aimDir0
aimDir1
aimDir2
aimDir3
pos0
pos0
pos0
Each color block is one
128 byte cache line
Data layout OOD vs DOD
pos0
mod0
aimDir0
pos0
Pos1
mod1
aimDir1
pos0
pos0
pos0
pos1
pos1
pos1
pos1
pos2
pos2
pos2
pos2
pos3
pos3
pos3
pos3
mod0
mod1
mod2
mod3
aimDir0
aimDir1
aimDir2
aimDir3
pos0
pos0
pos0
Each color block is one
128 byte cache line
Data layout OOD vs DOD
pos0
mod0
aimDir0
pos0
Pos1
mod1
aimDir1
pos0
pos0
pos0
pos1
pos1
pos1
pos1
pos2
pos2
pos2
pos2
pos3
pos3
pos3
pos3
mod0
mod1
mod2
mod3
aimDir0
aimDir1
aimDir2
aimDir3
pos0
pos0
pos0
Each color block is one
128 byte cache line
Data layout OOD vs DOD
pos0
mod0
aimDir0
pos0
Pos1
mod1
aimDir1
pos0
pos0
pos0
pos1
pos1
pos1
pos1
pos2
pos2
pos2
pos2
pos3
pos3
pos3
pos3
mod0
mod1
mod2
mod3
aimDir0
aimDir1
aimDir2
aimDir3
pos0
pos0
pos0
Each color block is one
128 byte cache line
Data layout OOD vs DOD
pos0
mod0
aimDir0
pos0
Pos1
mod1
aimDir1
pos0
pos0
pos0
pos1
pos1
pos1
pos1
pos2
pos2
pos2
pos2
pos3
pos3
pos3
pos3
mod0
mod1
mod2
mod3
aimDir0
aimDir1
aimDir2
aimDir3
pos0
pos0
pos0
Each color block is one
128 byte cache line
Data layout OOD vs DOD
pos0
mod0
aimDir0
pos0
Pos1
mod1
aimDir1
pos0
pos0
pos0
pos1
pos1
pos1
pos1
pos2
pos2
pos2
pos2
pos3
pos3
pos3
pos3
mod0
mod1
mod2
mod3
aimDir0
aimDir1
aimDir2
aimDir3
pos0
pos0
pos0
Each color block is one
128 byte cache line
Data layout OOD vs DOD
pos0
mod0
aimDir0
pos0
Pos1
mod1
aimDir1
pos0
pos0
pos0
pos1
pos1
pos1
pos1
pos2
pos2
pos2
pos2
pos3
pos3
pos3
pos3
mod0
mod1
mod2
mod3
aimDir0
aimDir1
aimDir2
aimDir3
pos0
pos0
pos0
Each color block is one
128 byte cache line
Data layout OOD vs DOD
pos0
mod0
aimDir0
pos0
Pos1
mod1
aimDir1
pos0
pos0
pos0
pos1
pos1
pos1
pos1
pos2
pos2
pos2
pos2
pos3
pos3
pos3
pos3
mod0
mod1
mod2
mod3
aimDir0
aimDir1
aimDir2
aimDir3
pos0
pos0
pos0
Each color block is one
128 byte cache line
Data layout OOD vs DOD
pos0
mod0
aimDir0
pos0
Pos1
mod1
aimDir1
pos0
pos0
pos0
pos1
pos1
pos1
pos1
pos2
pos2
pos2
pos2
pos3
pos3
pos3
pos3
mod0
mod1
mod2
mod3
aimDir0
aimDir1
aimDir2
aimDir3
pos0
pos0
pos0
Each color block is one
128 byte cache line
Its all about memory
Its all about memory
Optimize for data first then code
Its all about memory
Optimize for data first then code
Most code is likely bound by memory access
Its all about memory
Optimize for data first then code
Most code is likely bound by memory access
Not everything needs to be an object
Remember
Remember
We are doing games, we know our data.
Remember
We are doing games, we know our data.
Pre-format. Source data and native data doesn’t need to be the same
Example: Area Triggers
Example: Area Triggers
position
position
position
position
next
position
position
position
position
next
position
position
position
position
next
Source data
(Linked List)
Example: Area Triggers
position
position
position
position
next
position
position
position
position
next
position
position
position
position
next
Source data
(Linked List)
Native Data
(Array)
position
position
position
position
position
position
position
position
position
position
position
position
position
position
count
Example: Culling System
Example: Culling System
Old System
Example: Culling System
Old System
New System
(Linear arrays and brute force)
Example: Culling System
Old System
New System
(Linear arrays and brute force)
3x faster, 1/5 code size, simpler
Data Oriented Design Delivers:
Better Performance
Often simpler code
More parallelizable code
Questions?
Links
Data-Oriented Design (Or Why You Might Be Shooting Yourself in The Foot With OOP) http://gamesfromwithin.com/data-oriented-design
Practical Examples in Data Oriented Design http://bitsquid.blogspot.com/2010/05/practical-examples-in-data-oriented.html
The Latency Elephant http://seven-degrees-of-freedom.blogspot.com/2009/10/latency-elephant.html
Pitfalls of Object Oriented Programming http://seven-degrees-of-freedom.blogspot.com/2009/12/pitfalls-of-object-oriented-programming.html
Insomniac R&D http://www.insomniacgames.com/research_dev
CellPerformance
Image credits
Cat image: http://icanhascheezburger.com/2007/06/24/uninterested-cat photo by: Arinn capped and submitted by: Andy
Playstation 3 and Playstation 2 Copyright to Sony Computer Entertainment
Xbox 360 Image Copyright to Microsoft
“WTF” Code quality image: Copyright by Thom Holwerda http://www.osnews.com/comics