Visual Studio 2010: Using the Parallel Computing Platform
Phil Pennington
philpenn@microsoft.com
Agenda
- What’s new with Windows?
- Parallel Computing Tools in Visual Studio
- Using .NET Parallel Extensions
First, An Example: Monte Carlo Approximation of Pi
- Area of the square of side 2r: S = 4*r*r
- Area of the inscribed circle: C = Pi*r*r
- Therefore Pi = 4*(C/S)
- For each Point (P), d(P) = SQRT((x * x) + (y * y))
- if (d < r) then P(x,y) is in C
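The estimate on this slide can be sketched in C# with `Parallel.For` and per-thread partial counts. This is a minimal sketch, not the code from the talk's demo: the class name `MonteCarloPi`, the chunking scheme, and the seed handling are assumptions made for illustration. It samples one quadrant of a circle with r = 1, which leaves the ratio C/S unchanged.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original deck.
public static class MonteCarloPi
{
    // Samples points in the unit square [0,1) x [0,1) and counts those that
    // fall inside the quarter circle of radius 1, so Pi ~= 4 * (hits / samples).
    public static double Estimate(int chunks, int samplesPerChunk, int seed)
    {
        long hits = 0;

        Parallel.For(0, chunks,
            () => 0L,                                  // thread-local running count
            (chunk, loopState, localHits) =>
            {
                // System.Random is not thread-safe, so each chunk uses its own instance.
                var rng = new Random(seed + chunk);
                for (int i = 0; i < samplesPerChunk; i++)
                {
                    double x = rng.NextDouble();
                    double y = rng.NextDouble();
                    // Comparing squared distances gives the same answer as d < r with r = 1.
                    if (x * x + y * y < 1.0) localHits++;
                }
                return localHits;
            },
            // Merge each thread's count into the shared total exactly once.
            localHits => Interlocked.Add(ref hits, localHits));

        return 4.0 * hits / ((double)chunks * samplesPerChunk);
    }
}
```

Calling `MonteCarloPi.Estimate(8, 250000, 12345)` uses one million samples; accumulating into a thread-local count and merging once per thread keeps the hot loop free of shared writes.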
Windows and Maximum Processors
- Before Win7/R2, the maximum number of Logical Processors (LPs) was dictated by the processor's integral word size
- LP state (e.g. idle, affinity) is represented in a word-sized bitmask
- 32-bit Windows: 32 LPs
- 64-bit Windows: 64 LPs
- [Figure: 32-bit Idle Processor Mask, bit positions 31, 16, and 0 labeled, each bit marked Busy or Idle]
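To see why a word-sized bitmask caps the processor count, consider a small sketch of an idle mask held in a 32-bit word. The class `IdleMask` and the convention that a set bit means "idle" are assumptions for illustration; they are not Windows APIs.

```csharp
using System;

// Hypothetical illustration of a word-sized processor mask; not a Windows API.
public static class IdleMask
{
    // One bit per logical processor, so a 32-bit word describes at most 32 LPs.
    public const int MaxProcessors = 32;

    // Assumed convention: bit set = LP is idle, bit clear = LP is busy.
    public static uint SetIdle(uint mask, int lp) { Check(lp); return mask | (1u << lp); }
    public static uint SetBusy(uint mask, int lp) { Check(lp); return mask & ~(1u << lp); }
    public static bool IsIdle(uint mask, int lp) { Check(lp); return (mask & (1u << lp)) != 0; }

    private static void Check(int lp)
    {
        if (lp < 0 || lp >= MaxProcessors)
            throw new ArgumentOutOfRangeException("lp", "A 32-bit mask can only describe LPs 0 to 31.");
    }
}
```

There is simply no bit available for LP 32 in such a mask, which is the limit the slide describes.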
Processor Groups
- New with Windows 7 and Windows Server 2008 R2
- [Figure: hierarchy of Group, NUMA node, Socket, Core, LP]
Processor Groups
- Example: 2 Groups, 4 nodes, 8 sockets, 32 cores, 128 LPs
- [Figure: 2 groups, each with 2 NUMA nodes; each node with 2 sockets; each socket with 4 cores; each core with 4 LPs]
Many-Core Topology APIs: Discovery
Many-Core Topology APIs: Resource Localization
Many-Core Topology APIs: Memory Management
User Mode Scheduling: Architectural Perspective
[Figure labels: UMS Scheduler’s Ready List; Your Scheduler Logic; Wait Reason: Yield, Yield, Blocked, Created; CPU 1, CPU 2; UMS Completion List; worker threads W1 to W4; scheduler threads S1 and S2; Application and Kernel; Blocked Worker Threads; Scheduler Threads]
Task Scheduling with a UMS Scheduler
- Maximize Quantum, Minimize Blocking Effects
- Tasks are run by worker threads, which the scheduler controls
- [Figure: timelines for worker threads WT0 to WT3, without UMS (signal-and-wait), showing a dead zone, and with UMS (UMS yield)]
Load-Balancing, Work Stealing Scheduler
- [Figure: Static Scheduling vs. Dynamic Scheduling across CPU0 to CPU3]
- Dynamic scheduling improves performance by distributing work efficiently at runtime.
Demos: The Platform
- Topology
- Schedulers
Agenda
- What’s new with Windows?
- Parallel Computing Tools in Visual Studio
- Using .NET Parallel Extensions
Visual Studio 2010, .NET Developer Tools, Programming Models, Runtimes
- Tools: Debugger, Profiler
- Programming Models (Structured Parallelism): Parallel LINQ (PLINQ), Task Parallel Library (TPL)
- .NET Parallel Extensions: Data Structures, Task Scheduler, Resource Manager
- .NET Runtime: Thread Pools
- Legend: Managed Library, Tools
Thread-Pool Scheduler in .NET 4.0
- Global Queue (FIFO): shared by the legacy ThreadPool API and TPL
- Local work queues (LIFO, one per thread) and work stealing scheduler (TPL only)
- [Figure: tasks T1 to T4 are enqueued to and dequeued from the global queue; threads 1 to N each run a dispatch loop with a local queue holding tasks T5 to T8; Steal arrows connect the local queues]
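The local-queue discipline on this slide can be illustrated with a toy model: the owning thread pushes and pops at one end (LIFO), while other threads steal from the opposite end (FIFO). The class `ToyWorkStealingQueue<T>` is hypothetical and uses a simple lock for clarity; it is a sketch of the idea, not the data structure the .NET thread pool actually uses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy model of a per-thread work queue; not the .NET implementation.
public sealed class ToyWorkStealingQueue<T>
{
    private readonly LinkedList<T> items = new LinkedList<T>();
    private readonly object gate = new object();

    // The owning thread adds new work at the tail.
    public void Push(T item)
    {
        lock (gate) { items.AddLast(item); }
    }

    // The owning thread takes the most recently pushed item (LIFO).
    public bool TryPop(out T item)
    {
        lock (gate)
        {
            if (items.Count == 0) { item = default(T); return false; }
            item = items.Last.Value;
            items.RemoveLast();
            return true;
        }
    }

    // Other threads take the oldest item from the head (FIFO),
    // the opposite end from the one the owner works on.
    public bool TrySteal(out T item)
    {
        lock (gate)
        {
            if (items.Count == 0) { item = default(T); return false; }
            item = items.First.Value;
            items.RemoveFirst();
            return true;
        }
    }
}
```

After pushing 1, 2, 3, the owner pops 3 while a thief steals 1, so the two sides rarely contend for the same item.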
Task Parallel Library (TPL): Tasks Concepts
- Common Functionality: waiting, cancellation, continuations, parent/child relationships
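The four pieces of common functionality named above can each be shown in a few lines. The following is a minimal sketch using .NET 4 APIs; the class and method names (`TaskConcepts`, `SquareThenAddOne`, and so on) are made up for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical examples; names are not from the original deck.
public static class TaskConcepts
{
    // Continuation and waiting: the second task runs after the first completes,
    // and reading Result blocks until the value is available.
    public static int SquareThenAddOne(int x)
    {
        Task<int> square = Task.Factory.StartNew(() => x * x);
        Task<int> plusOne = square.ContinueWith(t => t.Result + 1);
        return plusOne.Result;
    }

    // Parent/child: a parent task does not complete until its attached children have.
    public static int SumFromChildren()
    {
        int total = 0;
        Task parent = Task.Factory.StartNew(() =>
        {
            for (int i = 1; i <= 3; i++)
            {
                int value = i; // copy the loop variable for the closure
                Task.Factory.StartNew(
                    () => { Interlocked.Add(ref total, value); },
                    TaskCreationOptions.AttachedToParent);
            }
        });
        parent.Wait(); // also waits for the three attached children
        return total;
    }

    // Cancellation: a task created with an already-cancelled token never runs.
    public static bool IsCanceledWhenTokenAlreadyCancelled()
    {
        var cts = new CancellationTokenSource();
        cts.Cancel();
        Task task = Task.Factory.StartNew(() => { }, cts.Token);
        try
        {
            task.Wait();
        }
        catch (AggregateException ex)
        {
            // Wait reports the cancellation as a TaskCanceledException inside an AggregateException.
            return task.IsCanceled && ex.InnerException is TaskCanceledException;
        }
        return false;
    }
}
```

`Task.Factory.StartNew` is used rather than newer helpers so the sketch stays within the .NET 4 API surface the deck covers.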
Primitives and Structures
- Thread-safe, scalable collections: IProducerConsumerCollection<T>, ConcurrentQueue<T>, ConcurrentStack<T>, ConcurrentBag<T>, ConcurrentDictionary<TKey,TValue>
- Phases and work exchange: Barrier, BlockingCollection<T>, CountdownEvent
- Partitioning: {Orderable}Partitioner<T>, Partitioner.Create
- Exception handling: AggregateException
- Initialization: Lazy<T>, LazyInitializer.EnsureInitialized<T>, ThreadLocal<T>
- Locks: ManualResetEventSlim, SemaphoreSlim, SpinLock, SpinWait
- Cancellation: CancellationToken{Source}
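As one example of the "work exchange" primitives, `BlockingCollection<T>` can connect a producer task to a consumer task. The sketch below is illustrative only; the class `ProducerConsumer`, the bounded capacity of 4, and the sum-of-squares workload are assumptions.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical example; not taken from the original deck.
public static class ProducerConsumer
{
    public static int SumOfSquares(int count)
    {
        // A bounded collection makes the producer block when the consumer falls behind.
        using (var queue = new BlockingCollection<int>(4))
        {
            Task producer = Task.Factory.StartNew(() =>
            {
                for (int i = 1; i <= count; i++) queue.Add(i * i);
                queue.CompleteAdding(); // signals the consumer that no more items will arrive
            });

            Task<int> consumer = Task.Factory.StartNew(() =>
            {
                int sum = 0;
                // Blocks while the collection is empty and ends once adding is complete.
                foreach (int item in queue.GetConsumingEnumerable()) sum += item;
                return sum;
            });

            Task.WaitAll(producer, consumer);
            return consumer.Result;
        }
    }
}
```

`SumOfSquares(10)` adds 1 + 4 + ... + 100; neither side needs an explicit lock because the collection handles the hand-off.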
Parallel Debugging
- Two new debugger tool windows
- Support both native and managed
- “Parallel Tasks”
- “Parallel Stacks”
Parallel Tasks
- What threads are executing my tasks?
- Where are my tasks running (location, call stack)?
- Which tasks are blocked?
- How many tasks are waiting to run?
Parallel Stacks
- Zoom control
- Multiple call stacks in a single view
- Task-specific view (Task status)
- Easy navigation to any executing method
- Rich UI (zooming, panning, bird’s eye view, flagging, tooltips)
- Bird’s eye view
Parallel Profiling
CPU Utilization
- Other processes
- Number of cores
- Idle time
- Your process
Threads
- Measure time for interesting segments
- Hide uninteresting threads
- Zoom in and out
- Detailed thread analysis (one channel per thread)
- Active legend
- Usage hints
- Call stacks
Cores
- Each logical core in a swim lane
- One color per thread
- Migration visualization
- Cross-core migration details
Demo
- Libraries
- Languages
- Debuggers
- Profilers
Agenda
- What’s new with Windows?
- Parallel Computing Tools in Visual Studio
- Using .NET Parallel Extensions
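Thinking Parallel - “Task” vs. “Data” Parallelism

Task Parallelism

    // Requires: using System; using System.Threading.Tasks;
    Parallel.Invoke(
        () => { Console.WriteLine("Begin first task..."); },
        () => { Console.WriteLine("Begin second task..."); },
        () => { Console.WriteLine("Begin third task..."); }
    );

Data Parallelism

    // Requires: using System; using System.Collections.Generic; using System.Linq;
    IEnumerable<int> numbers = Enumerable.Range(2, 100 - 3);  // 2 through 98
    var myQuery =
        from n in numbers.AsParallel()
        // Trial division by 2..floor(sqrt(n)); the range is empty for n < 4,
        // so 2 and 3 are correctly reported as prime.
        where Enumerable.Range(2, (int)Math.Sqrt(n) - 1).All(i => n % i > 0)
        select n;
    int[] primes = myQuery.ToArray();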


Using Parallel Computing Platform - NHDNUG

  • 34. Thinking Parallel – How to Partition Work?
    Several partitioning schemes built-in:
    - Chunk: works with any IEnumerable<T>; single enumerator shared; chunks handed out on-demand
    - Range: works only with IList<T>; input divided into contiguous regions, one per partition
    - Stripe: works only with IList<T>; elements handed out round-robin to each partition
    - Hash: works with any IEnumerable<T>; elements assigned to partition based on hash code
    Custom partitioning available through Partitioner<T>
    Partitioner.Create available for tighter control over built-in partitioning schemes
  • 35. Thinking Parallel – How to Execute Tasks?
  • 36. Thinking Parallel – How to Collate Results?
  • 38. Resources
    - Native APIs/runtimes (Visual C++ 10): tasks, loops, collections, and Agents
      http://msdn.microsoft.com/en-us/library/dd504870(VS.100).aspx
    - Tools (in the VS2010 IDE): debugger and profiler
      http://msdn.microsoft.com/en-us/library/dd460685(VS.100).aspx
    - Managed APIs/runtimes (.NET 4): tasks, loops, collections, and PLINQ
      http://msdn.microsoft.com/en-us/library/dd460693(VS.100).aspx
    - General VS2010 Parallel Computing Developer Center
      http://msdn.microsoft.com/en-us/concurrency/default.aspx
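Returning to the "How to Partition Work?" slide, range partitioning with `Partitioner.Create` can be sketched as follows. The class `RangePartitioning` and the squaring workload are hypothetical; the point is that each parallel iteration receives a contiguous index range rather than a single element.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical example; not taken from the original deck.
public static class RangePartitioning
{
    public static long[] SquareAll(int[] input, int rangeSize)
    {
        var output = new long[input.Length];
        if (input.Length == 0) return output; // Partitioner.Create requires from < to

        // Yields Tuple<int, int> ranges: Item1 is inclusive, Item2 is exclusive.
        var ranges = Partitioner.Create(0, input.Length, rangeSize);

        Parallel.ForEach(ranges, range =>
        {
            // Each range is processed by a plain sequential loop, so the
            // delegate-invocation cost is paid once per range, not once per element.
            for (int i = range.Item1; i < range.Item2; i++)
                output[i] = (long)input[i] * input[i];
        });

        return output;
    }
}
```

A larger `rangeSize` lowers scheduling overhead but gives the scheduler fewer pieces to balance across cores, so the right value depends on how expensive each element is.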

Editor's Notes

  • #11: Let’s use this slide for an “Architectural Perspective” of UMS.

    <CLICK> S1 and S2 are the first threads created within a UMS solution. These are “Scheduler Threads” or “Primary Threads”. From a Scheduler perspective, these threads represent the cores, or physical CPUs. They are normal threads to begin with: you would typically first establish processor affinity using the new CreateRemoteThreadEx API and then use a new API, EnterUmsSchedulingMode, to specify that the new thread is a Scheduler thread. You pass in a callback function pointer (UmsSchedulerProc) to begin executing instructions on the Scheduler thread.

    A UMS worker thread is created by calling CreateRemoteThreadEx with the PROC_THREAD_ATTRIBUTE_UMS_THREAD attribute and specifying a UMS thread context and a completion list. The OS places these threads into the Completion List, and your Scheduler logic takes over, typically placing the new threads onto the Scheduler’s Ready List.

    <CLICK> The first thing that a Scheduler should do is move its associated Worker threads onto the Scheduler’s Ready List. Then it can begin executing your custom scheduler logic.

    <CLICK> Each of the Scheduler threads should then pop a Worker thread off the Ready List and run it on the associated core. When this occurs, the Scheduler thread context is essentially lost: the Worker thread now owns the core and is executing. The Scheduler thread will not regain the core until a processor Yield event occurs.

    <CLICK> The first thing that could happen is that this thread could yield. Yield is again a Scheduler callback mechanism and perhaps the single most important function of UMS. It is within the Yield that you will implement your own synchronization primitives and scheduling logic. Ideally, the yielding thread provides some contextual information to the Scheduler (maybe it wants to wait for some specific application domain event to occur). Your Scheduler would look at this Yield request and associated context and make a scheduling decision.

    <CLICK> Maybe the Scheduler places the Worker thread in a Wait list for that specific event or event type. Now your Scheduler has to decide what to run next.

    <CLICK> Maybe the next Worker thread from the Ready List, for instance, and we’re back running again. Note that no kernel scheduling context switch was necessary. Maybe that wait event handling took 200 cycles in user mode. It may have cost 10 times that with a kernel context switch.

    <CLICK> Let’s now assume that this worker performs a system call. At this point, we switch the worker thread to its kernel-mode context and the thread continues to run within the kernel. If it does not block (in other words, if it doesn’t use one of the kernel synchronization primitives), then it just continues to run. If the thread never blocks in the kernel, then it just returns to user mode and continues to run and do work.

    <CLICK> Let’s assume that the thread does block. Maybe a page fault occurred, for instance. Now our Scheduler thread regains control of the processor via a callback from the kernel. The kernel is telling your Scheduler that a worker thread is blocked, and the reason for that block. This is the point where we integrate kernel synchronization with user synchronization. But now, you get to decide what to run next.

    <CLICK> The Scheduler looks at the state of its affairs and perhaps decides to run the next Worker thread from the Ready List, for example. Let’s assume that later in time Worker 3 unblocks.

    <CLICK> The kernel will now place this unblocked Worker thread into the UMS Completion List.

    <CLICK> At the next Yield event, for instance, we get another Scheduler decision opportunity. Maybe this Yield contains information that affects the state of our Wait list.

    <CLICK> The first thing that the Scheduler should do, however, is manage the Completion List and move any unblocked threads to the Ready List.

    <CLICK> Next, our Scheduler must make a priority decision. Maybe our Waiting thread gets to run again and our Yielding thread gets placed on the Ready List. And we’re done…
  • #12: UMS is an enabler for:
    - Finer-grained parallelism
    - More deterministic behavior
    - Better cache locality
    UMS allows your Scheduler to boost performance in certain situations:
    - Apps that have a lot of blocking, for example
  • #13: Think Tasks, not Threads.
    Threads represent execution flow, not work:
    - Hard-coded; significant system overhead
    - Minimal intrinsic parallel constructs
    QueueUserWorkItem() is handy for fire-and-forget. But what about…
    - Waiting
    - Canceling
    - Continuing
    - Composing
    - Exceptions
    - Dataflow
    - Integration
    - Debugging
  • #16: Now, let’s first consider the tools architecture from a .NET developer’s perspective. Let’s start with the .NET Runtime and the .NET Parallel Extensions library. In a moment, we’ll look at how a developer uses the extensions within their application. The .NET Parallel Extensions provide the benefits of concurrent task scheduling without you having to build a custom scheduler that is appropriately reentrant, thread-safe, and non-blocking.

    <CLICK> The Parallel Extensions library contains a Task Scheduler and a Resource Manager component that integrate with the underlying .NET Runtime. The Resource Manager manages access to system resources such as the collection of available CPUs.

    <CLICK> The Scheduler leverages only thread pools for task scheduling.

    <CLICK> The Parallel Extensions also support multiple Programming Models.

    <CLICK> The Task Parallel Library (TPL) is an easy and convenient way to express fine-grained parallelism within your applications. The TPL provides patterns for Task Execution, Synchronization, and Data Sharing.

    <CLICK> PLINQ (Parallel LINQ) enables parallel query execution over in-memory data, such as collections or XML.

    <CLICK> The Parallel Extensions also include Data Structures that are “scheduler aware”, enabling you to optimally specify task scheduling requests and custom scheduler policies.

    <CLICK> Again, Visual Studio 2010 includes new tools for parallel application development and testing. These include:

    <CLICK> A new parallel debugger. And…

    <CLICK> A new parallel application profiler.

    Let’s take a brief look at a simple .NET parallel application along with the Visual Studio 2010 Debugger and Parallel Performance Analyzer.

    Pure .NET libraries. Feature areas:
    - Task Parallel Library
    - Parallel LINQ
    - Synchronization primitives and thread-safe data structures
    - Enhanced ThreadPool