Theron 5.01.00
Theron::Framework::Parameters Struct Reference

Parameters structure that can be passed to the Framework constructor. More...

#include <Framework.h>

Public Member Functions

 Parameters (const uint32_t threadCount=16, const uint32_t nodeMask=0x1, const uint32_t processorMask=0xFFFFFFFF, const YieldStrategy yieldStrategy=YIELD_STRATEGY_POLITE)
 Constructor.
 

Public Attributes

uint32_t mThreadCount
 The initial number of worker threads to create within the framework.
 
uint32_t mNodeMask
 Specifies the NUMA processor nodes upon which the framework may execute.
 
uint32_t mProcessorMask
 Specifies the subset of the processors in each NUMA processor node upon which the framework may execute.
 
YieldStrategy mYieldStrategy
 Strategy that defines how freely worker threads yield to other system threads.
 

Detailed Description

Parameters structure that can be passed to the Framework constructor.

When an instance of this structure is passed to the constructor of the Framework class, its members can be used to configure the constructed framework.
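For example, the following minimal sketch constructs a Framework with an explicit worker thread count, leaving the remaining members at their defaults. It assumes the umbrella header <Theron/Theron.h> and the Framework constructor overload that accepts a Parameters instance:

#include <Theron/Theron.h>

int main()
{
    // Configure a framework with eight worker threads; the remaining
    // members keep their defaults (node mask 0x1, all processors in the
    // node, polite yielding).
    Theron::Framework::Parameters params;
    params.mThreadCount = 8;

    // Construct the framework from the parameters.
    Theron::Framework framework(params);

    // ... create actors within 'framework' and send them messages ...
    return 0;
}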

The members are mainly concerned with configuring the pool of worker threads that is created within the framework, and which is used to execute the actors that are subsequently hosted within it. They allow the user to control the number of threads in the pool, their processor affinities, and their yielding behavior.

By choosing appropriate values it's possible to configure specialized frameworks tailored to particular uses. For example, users writing real-time systems might create a separate framework for hosting actors whose execution is time-critical. Such a framework might be configured with as many worker threads as hosted actors, ensuring that a free thread is always available to process actors with queued messages. One might also dedicate a number of processor cores entirely to those threads, using processor affinity masks to restrict the framework's worker threads to the selected cores (leaving the remaining cores to other frameworks). Finally, since the cores dedicated to time-critical processing are never used for anything else, one might set the yield strategy of the worker threads to YIELD_STRATEGY_AGGRESSIVE, effectively busy-waiting for the arrival of new messages.
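As an illustration, a helper along these lines could build the parameters for such a time-critical framework. The helper name, actor count, and mask values are hypothetical, and the sketch assumes the YIELD_STRATEGY_AGGRESSIVE enumerator lives in the Theron namespace:

#include <stdint.h>
#include <Theron/Theron.h>

// Hypothetical helper: one worker thread per hosted actor, execution
// restricted to the first two processors of NUMA node 0, and aggressive
// yielding so that the worker threads busy-wait for messages.
Theron::Framework::Parameters MakeTimeCriticalParameters(const uint32_t actorCount)
{
    return Theron::Framework::Parameters(
        actorCount,                          // one worker thread per actor
        0x1,                                 // NUMA node 0 only
        0x3,                                 // processors 0 and 1 of that node
        Theron::YIELD_STRATEGY_AGGRESSIVE);  // busy-wait for new messages
}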

As well as setting processor affinities, the members of the Parameters structure allow the node affinities of the worker threads to be controlled. This allows users to restrict the execution of a framework's worker threads to a particular node within a NUMA (Non-Uniform Memory Architecture) system.

A NUMA system is one in which the different logical cores within the physical CPU(s) have different views of memory, typically by virtue of being serviced by different physical memory controllers. In such systems, memory may be partitioned into areas that are each directly accessed by a single memory controller, with the cores fed by that controller enjoying faster access to that area of memory. Access to other parts of memory is indirectly served by other controllers, so is slower. Within the context of a NUMA system, a 'node' is a group of logical cores that share the same view of memory (typically by virtue of being serviced by the same memory controller).

Windows and Linux both provide APIs by which the node topology of a system can be discovered. These APIs allow the processor affinities of threads to be set on a per-node basis, limiting threads to execute only on the processors of a particular node (or set of nodes). They also provide methods for allocating memory within the area of memory to which a particular node has first-class access.
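For reference, the sketch below uses the version 2 libnuma API to discover the number of configured nodes and derive a mask with one bit per node. It is illustrative only and independent of Theron itself; the assumption that such a mask matches the interpretation of mNodeMask is suggested by the default value of 0x1 but not guaranteed here:

#include <cstdio>
#include <numa.h>    // libnuma; link with -lnuma

int main()
{
    // Check that the kernel and library support NUMA on this system.
    if (numa_available() < 0)
    {
        std::printf("NUMA is not available on this system\n");
        return 1;
    }

    // Query the number of configured nodes (libnuma version 2 API)
    // and build a mask with one bit per node.
    const int nodeCount = numa_num_configured_nodes();
    const unsigned int allNodesMask = (1u << nodeCount) - 1u;

    std::printf("%d NUMA node(s), node mask 0x%X\n", nodeCount, allNodesMask);
    return 0;
}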

The scalability of a multi-threaded application can be improved by restricting threads that access the same memory to execute only on a limited subset of the available cores. Doing so can improve cache coherency, since the same memory is repeatedly accessed via the caches of the same few cores.

As threads read and write memory, cached copies of the memory accumulate in the caches local to the cores on which the threads are executed. When a thread writes to memory, any cached copies held in the caches of other cores are invalidated and must be re-fetched from memory. If other threads also write to the same piece of memory (known as shared writes), then the repeated cache invalidations and refreshes can cause significant overheads. These overheads can be especially severe, and can limit scalability, if the threads in question are allowed to execute on different NUMA nodes.

In Theron, Frameworks serve as the mechanism by which worker threads are grouped. Accordingly, Theron allows the node and processor affinities of worker threads to be set on a per-framework basis. The expectation is that the actors within a single framework will mainly message each other, with messages being sent between frameworks far less frequently.
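For instance, two frameworks might be pinned to two different NUMA nodes, with each hosting a group of actors that mainly message one another. The thread counts and node masks below are illustrative, and the sketch again assumes the Parameters-taking Framework constructor:

#include <Theron/Theron.h>

int main()
{
    // Pin the worker threads of two frameworks to two different NUMA
    // nodes; only the thread count and node mask are specified, so the
    // processor mask and yield strategy keep their defaults.
    const Theron::Framework::Parameters nodeZeroParams(8, 0x1);
    const Theron::Framework::Parameters nodeOneParams(8, 0x2);

    Theron::Framework frameworkOnNodeZero(nodeZeroParams);
    Theron::Framework frameworkOnNodeOne(nodeOneParams);

    // Actors created within each framework execute on that framework's
    // worker threads; cross-framework messaging remains possible, just
    // expected to be less frequent.
    return 0;
}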

Note
Support for node and processor affinity masks is currently somewhat limited. Support is implemented with the Windows NUMA API in Windows builds, and with libnuma under Linux. In GCC builds, NUMA support requires libnuma-dev and must be explicitly enabled via THERON_NUMA (or numa=on in the makefile). The mNodeMask member is supported with both version 1 and version 2 of the libnuma API, but the mProcessorMask member is supported only with version 2. Under Windows, both members are supported.