Industry Standards to Solve Multicore Challenges
The Multicore Association® (MCA) is an industry association that includes leading-edge companies implementing products that embrace multicore technology. Our primary objective is to define and promote open specifications to enable multicore product development.

Frequently Asked Questions

Communications API
Are there any specific hardware and/or platform requirements to use/learn the MCAPI?
The MCAPI specification does not have any hardware or platform restrictions. Specific third-party MCAPI implementations can support one or more operating systems and may have hardware dependencies. Regardless, a tool vendor should offer services to assist in porting the software to another operating system or hardware architecture.
Can MCAPI be implemented on an ASIC or FPGA in gate form?
MCAPI can be implemented in a system if the programmer is able to identify units in the system to serve as MCAPI nodes and there exists an underlying mechanism for transferring data to/from the nodes system-wide. A node can be a processor, task, core, instance of an OS, etc. The MCAPI topology can be hard-coded at compile-time, or the programmer can implement a topology discovery service at the application layer used by the nodes in the system to learn the topology at boot up.
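As a sketch of the application-layer topology discovery service the answer mentions: at boot, each node announces its (domain, node, port) tuples to a well-known inbox, and the collected announcements form the topology map. All names and the in-memory "inbox" here are illustrative stand-ins for real MCAPI endpoints and transport, not part of the specification.

```python
# Hypothetical boot-time topology discovery built at the application layer.
# The list stands in for a well-known MCAPI endpoint's receive queue.
announce_inbox = []

def announce(domain, node, port):
    """Each node announces its endpoints at boot (simulated as an append)."""
    announce_inbox.append((domain, node, port))

def build_topology(inbox):
    """Collect announcements into a map of (domain, node) -> list of ports."""
    topo = {}
    for domain, node, port in inbox:
        topo.setdefault((domain, node), []).append(port)
    return topo

announce(0, 1, 10)
announce(0, 1, 11)
announce(0, 2, 10)
assert build_topology(announce_inbox) == {(0, 1): [10, 11], (0, 2): [10]}
```

In a real system the announcements would travel as connectionless MCAPI messages; the discovery logic itself stays entirely in application code, as the answer describes.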
Can MCAPI be used for communication between a smartphone
Yes, MCAPI can be used for this purpose. Contact the specific vendor for details (e.g., Freescale, Qualcomm, Samsung, TI).
What is the maximum number of cores supported by MCAPI?
The MCAPI specification does not limit the number of cores present in the system; it can be used on a single core or on many cores. Any implementation-specific limitations should be documented by the individual vendor.
Can MCAPI be used with multiple microcontrollers?
Yes, MCAPI is applicable to multiple cores on a chip and multiple processors on a board.
How many vendors currently provide MCAPI implementations?
Please refer to the MCA Implementation section.
Is there an open implementation of MCAPI being developed for Linux?
OpenMCAPI, created by Mentor Graphics, is an open source implementation of the MCAPI standard. Most of this code is provided under a BSD license, except for the Linux kernel drivers, which are dual GPL/BSD licensed.
Do you have a porting guide for MCAPI?
A porting guide for MCAPI on SRIO is under development. MCAPI vendors may provide porting guides for their implementations.
What about MCAPI performance?
While the MCAPI specification is designed to offer a low-latency solution, actual performance metrics will depend on the individual solution; these metrics should be provided by the vendor. There are currently no official benchmarks for MCAPI, although our performance evaluations against sockets show MCAPI to be considerably faster. Performance depends on the underlying transport driver.
Will MCAPI 3.0 have safety critical support?
Yes. This will be a combination of describing safety-critical aspects for various parts of MCAPI, additional documentation on how to use existing functionality, and the addition of new functionality.
How is MCAPI more appropriate than CORBA for SMP/AMP?
MCAPI should be used by an application to send and receive data across nodes in a system. A programmer could use MCAPI to implement a remote service registration database as offered by CORBA as follows - a well-known endpoint would host the registration server, and nodes would register services/issue service requests through the registration server using connectionless MCAPI messages. All service registration and retrieval requests would be handled at the application layer of the registration server. MCAPI would not be providing any sort of services except the sending/receiving of messages to/from the registration server and corresponding nodes.
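The registration-server pattern described above can be sketched as follows. This is a minimal simulation of the application-layer logic only: the dictionary, message tuples, and endpoint triples are hypothetical stand-ins, and real MCAPI transport calls (message send/receive to the well-known endpoint) are elided.

```python
# Hypothetical CORBA-like service registry kept at the application layer of
# a registration-server node. Messages would arrive as connectionless MCAPI
# messages at a well-known endpoint; here they are plain tuples.
registry = {}  # service name -> (domain, node, port) endpoint triple

def handle(msg):
    """Application-layer handling of registration and lookup requests."""
    kind = msg[0]
    if kind == "REGISTER":
        _, service, endpoint = msg
        registry[service] = endpoint
        return ("OK", service)
    if kind == "LOOKUP":
        _, service = msg
        return ("RESULT", registry.get(service))

# One node registers a service; another node looks it up.
handle(("REGISTER", "dsp.fft", (0, 2, 7)))   # illustrative domain 0, node 2, port 7
kind, ep = handle(("LOOKUP", "dsp.fft"))
print(ep)  # (0, 2, 7)
```

As the answer notes, MCAPI itself only carries the request and reply messages; all registry semantics live in application code like `handle`.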
Wouldn't MCAPI be better as a protocol specification? Then we would know, for example, that the DSP OS and CPU/APE OS talk the same language?
The MCAPI specification defines a universal API, much like the sockets API; it does not dictate how data is moved across the nodes.
What is an advantage of MCAPI over other mechanism of inter-task communication such as POSIX mechanisms, for example?
MCAPI spans beyond a single OS instantiation.
What is the difference between MCAPI and StreamIt or Cilk++ model for doing multicore programming?
MCAPI is a message based communication framework defined for closely distributed computing. StreamIt is a programming language and compilation infrastructure for streaming applications. Cilk++ is an extension for C++ for multicore programming.
What are main differences between MCAPI 1.0 and 2.0?
Addition of domains (for routing purposes), new functionality and improved consistency in the API and status codes.
Isn't MCAPI just a new IPC protocol like LINX, MPI or others?
The MCAPI working group extensively reviewed existing APIs with the hope of finding one that met our requirements for multicore. MCAPI was defined for closely distributed computing: it is simple yet provides sufficient functionality, allows for lightweight implementations, and lets more complex functionality be layered on top of MCAPI.
If we already implement an IPC, do we need to replace it to utilize MCAPI?
The degree depends on how you plan to use MCAPI. In fact, MCAPI could be a wrapper for an existing IPC, or it could operate underneath an existing API.
How does this compare to OpenMP or other parallel APIs?
MCAPI provides messaging applicable to AMP and SMP. OpenMP provides language extensions, targeting SMP environments.
How does MCAPI compare to MPI?
Both provide message passing. MCAPI targets closely distributed computing, whereas MPI targets more widely distributed computing. MCAPI is simpler, allowing for lightweight implementations, whereas MPI provides a richer set of functionality. MCAPI and MPI were designed to address different communications needs, and advantages could be realized by using both in a system with a widely distributed topology containing multicore subsystems.
Is MCAPI similar to ARINC653 Supplement?
MCAPI provides data communication services only. It does not provide any sort of partitioning services.
What is MCAPI 3.0 support for reduction of _i (non-blocking) functionality or addition of blocking equivalents?
This accommodates, for example, bare-metal implementations, where you may have a single thread and non-blocking functionality adds complexity. In such an environment, non-blocking behavior can be achieved with a blocking call and timeout=0, i.e. polling.
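The timeout=0 polling idiom can be sketched as below. The receive function is a hypothetical stand-in for a blocking MCAPI-style receive with a timeout, not an actual specification routine; the point is only that a zero timeout degenerates a blocking call into a single poll.

```python
# Sketch: emulating non-blocking receive with a blocking call and timeout=0.
import queue

inbox = queue.Queue()  # stands in for an endpoint's receive queue

def recv_with_timeout(q, timeout):
    """Blocking receive; timeout=0 degenerates to a single non-blocking poll."""
    try:
        return q.get(timeout=timeout) if timeout > 0 else q.get_nowait()
    except queue.Empty:
        return None  # would map to a timeout status code in a real API

assert recv_with_timeout(inbox, 0) is None   # nothing queued: poll returns at once
inbox.put(b"scalar")
assert recv_with_timeout(inbox, 0) == b"scalar"
```

This is why a single-threaded bare-metal environment can drop the separate non-blocking `_i` variants: the blocking call with a zero timeout already provides polling semantics.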
What is the new feature in MCAPI 3.0 for MCAPI endpoint type attribute, e.g. multicast, HW accelerator?
Currently MCAPI has "regular" endpoints, which are socket-like communication-termination points. Additional endpoint types will allow us to substantially expand the capabilities and coverage of MCAPI, e.g. in IoT.
Are any reduction operations (such as those found in MPI) supported through MCAPI?
MCAPI is a simple communications API used to send/receive data across nodes. A reduce-type operation can be built on top of the message send routines to provide a similar service.
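A reduction layered on plain message passing can be sketched as follows. The send/receive helpers are illustrative stand-ins (a list plays the role of the root endpoint's queue); only the layering idea, partial results sent as messages and combined at a root node, comes from the answer.

```python
# Sketch: a sum-reduction built on top of message send/receive.
root_inbox = []  # stands in for the root node's receive endpoint

def msg_send(inbox, payload):        # stand-in for an MCAPI message send
    inbox.append(payload)

def reduce_sum(inbox, n_workers):
    """Root side: receive one partial result per worker and combine them."""
    total = 0
    for _ in range(n_workers):
        total += inbox.pop(0)        # stand-in for an MCAPI message receive
    return total

data = list(range(8))
# Two "worker nodes" each sum half of the data and send their partial result.
msg_send(root_inbox, sum(data[:4]))
msg_send(root_inbox, sum(data[4:]))
print(reduce_sum(root_inbox, 2))  # 28
```

MPI provides such collectives natively; with MCAPI the combine step is ordinary application code at the root.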
Does MCAPI include group communication or only point to point communication?
MCAPI does not have native support for multicast communications or broadcast groups, but the user could build this functionality on top of the connectionless messaging system. In other words, the user could implement multicast or broadcast functionality at the application layer using connectionless messages.
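Application-layer multicast of this kind reduces to a loop of unicasts over the group members. The sketch below simulates each member's endpoint with a list; the group table and names are illustrative, not anything defined by MCAPI.

```python
# Sketch: multicast built at the application layer from connectionless sends.
group = {"nodeA": [], "nodeB": [], "nodeC": []}  # member name -> inbox

def msg_send(inbox, payload):          # stand-in for an MCAPI message send
    inbox.append(payload)

def multicast(members, payload):
    """Multicast is just one unicast per group member."""
    for inbox in members.values():
        msg_send(inbox, payload)

multicast(group, "config-update")
assert all(inbox == ["config-update"] for inbox in group.values())
```

Group membership (join/leave) would likewise be managed in application code, for instance via the same kind of registry a registration server provides.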
Does MCAPI only provide for a standard set of calls or is there other overhead in the API that reduces throughput between endpoints? Is throughput dependent on layers under MCAPI only?
MCAPI is itself only a set of API services. There are two other components in a complete MCAPI implementation: the MCAPI stack and the underlying transport driver. These two components are implementation specific. The MCAPI stack interfaces with the API routines to create endpoints, get endpoints, pass data, etc. The underlying transport driver interfaces with the MCAPI stack to send and receive data. The transport medium is not part of the MCAPI specification and could be anything from a shared memory driver to a TCP/IP socket used to send and receive data across Ethernet.
What about shared memory required for MCAPI?
An implementation can use a shared memory driver for sending and receiving data across nodes in a system, but shared memory is not required for using MCAPI. The MCAPI specification does not dictate the type of transport mechanism to be used to send and receive data.
Are there any arbitration issues?
Handling arbitration is up to the underlying implementation.
Are any error detection/recovery mechanisms available?
Error detection and recovery is not part of the MCAPI specification, but nothing prevents a programmer from building this into the implementation. MCAPI is designed for data communications only.
Using MCAPI, is it possible to recover from a node failure in SMP/AMP real-time cluster?
MCAPI provides routines to initialize and shut down the MCAPI module on a node. If the node fails gracefully, it can shut down MCAPI, which could close all open endpoints and perform other implementation specific functionality. The shut down functionality is implementation specific. The programmer could configure the node to save the current state to memory and restore that state upon re-initialization, or could close all endpoints and start fresh on re-initialization.
What about hooks for logging?
The MCAPI specification provides a list of error codes to be used for logging purposes, but no specific logging functionality is provided. Furthermore, an API routine is provided to convert a numerical error code to an ASCII string for easier readability. An individual implementation may provide additional logging services to the user.
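A toy version of the code-to-string conversion routine the answer refers to might look like the following. The numeric values and the exact name set here are illustrative, not the specification's actual values.

```python
# Sketch of a status-code-to-string helper for logging. Real MCAPI defines
# its own status codes and a conversion routine; this table is made up.
STATUS_STRINGS = {
    0: "MCAPI_SUCCESS",
    1: "MCAPI_ERR_ENDP_INVALID",
    2: "MCAPI_TIMEOUT",
}

def display_status(code):
    """Map a numeric status code to a readable string for log output."""
    return STATUS_STRINGS.get(code, "MCAPI_ERR_UNKNOWN")

print(display_status(2))  # MCAPI_TIMEOUT
```

An implementation's logging layer can then emit the string form rather than a bare number, which is the readability benefit the answer describes.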
What hooks are there for debugging purposes?
The MCAPI specification doesn't detail any hooks for debugging purposes.
In a system in which IP packets are sent from one core to another, can the message passing interface work?
MCAPI is suitable for this case and could use the TCP/IP stack as the underlying transport layer for transmitting/receiving IP packets across cores.
Are both sampling and queuing message behavior supported?
This is up to the underlying implementation; both sampling and queuing can be supported.
Applications written to use MCAPI but written on cores with different endianess would not necessarily be endian-compatible. How are endian and integer size differences addressed?
The MCAPI specification does not provide a mechanism for endian conversion. Each implementation should document how endian conversion is handled for that solution.
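Since the specification leaves endian conversion to the implementation or the application, one common approach is to fix a wire byte order and convert on both sides, exactly as network byte order works in sockets code. A minimal sketch:

```python
# Sketch: application-layer endian handling with a fixed wire byte order
# (big-endian here). Each side converts between host and wire order, so
# cores of different endianness interoperate.
import struct

def pack_u32(value):                   # sender side: host -> wire order
    return struct.pack(">I", value)

def unpack_u32(data):                  # receiver side: wire -> host order
    return struct.unpack(">I", data)[0]

wire = pack_u32(0x12345678)
assert wire == b"\x12\x34\x56\x78"     # byte order is fixed regardless of host
assert unpack_u32(wire) == 0x12345678
```

Integer-size differences can be handled the same way, by fixing the field widths in the wire format rather than relying on each core's native types.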
How can MCAPI be used in SMP environment?
MCAPI has little relevance in SMP, as the operating system should seamlessly handle communication across threads on the distributed cores; however, if the system is divided to use SMP on some cores and AMP on other cores, the SMP cores could communicate with the AMP cores using MCAPI. In an SMP environment Poly-Messenger/MCAPI nodes are the equivalent of a process and can be scheduled by the SMP OS like any other process. When an application is MCAPI enabled, the application or part of the application could also be run in AMP mode.
Can communication over multiple channels on multiple cores be handled, and how?
MCAPI was designed to allow simultaneous channel connections across nodes. Any constraints in this ability would be at the underlying transport layer. MCAPI can handle multiple channels, but the underlying transport layer may not. This should be clearly documented by the individual implementation.
Since hierarchy is used, how are lead domain and lead node managed?
The programmer must manage the notion of lead domain and/or node at design time.
Why does MCAPI support master/slave and not client server?
MCAPI follows a client/server model rather than master/slave. A channel endpoint is used either to send or to receive data. A message endpoint can be used both to send and to receive data.
For message sending, does the MCAPI specification deal with message conflicts? It includes a priority variable, but what about multiple messages being sent to a node with the same priority?
Priority can be specified on a per-message basis when using MCAPI connectionless messages or for a connection when using packet or scalar channels. The individual implementation determines how the transport layer handles the prioritization of incoming packets. Priority conflicts are handled by the underlying transport driver and should be documented by the vendor.
Are out-of-order memory writes handled by MCAPI?
This issue is specific to the individual transport layer driver.
How is MCAPI different when going over Ethernet compared to socket communication between two processes on two cores in AMP configuration?
It is beneficial to use the MCAPI services to limit the changes required to the application if the underlying transport layer changes in a future revision of the software, or if the same application-layer software is used on different devices with varying transport layers.
Can domains be hierarchical, other domains under domains?
MCAPI domains do not offer that level of granularity, as each domain exists at the same level in the topology. However, each domain does not have to be reachable from all other domains.
Can MCAPI be used to implement locking mechanisms across AMP shared memory areas?
MCAPI can be used to send / receive any data across nodes in a system. This messaging architecture could be used to coordinate locking across nodes, but MCAPI does not inherently provide such services.
Is it possible to send messages in non-blocking mode or blocking with timeout?
The MCAPI specification provides symmetric blocking and non-blocking routines for connectionless message and packet channel send / receive routines. The user can also configure a timeout for blocking calls.
Does MCAPI have support for security, having to deal with multiple OSes?
MCAPI does not handle any sort of operating system partitioning.
Resource Management
How will hardware accelerators interact with embedded processors using MRAPI? Since an API is a library of C/C++ functions, it is not clear how an API can be used with a hardware accelerator, which can be very application specific.
The API can be implemented on top of a hardware accelerator. For example, an SoC may have hardware acceleration for mutexes, in which case an MRAPI implementation could utilize that hardware accelerator without the programmer needing to know how to interact with it directly.
How does this API ask for HW accelerators if these accelerators are actually powered off because of inactivity?
In such a scenario the application would determine that there was no acceleration available and would have to find an alternative means to perform its work, perhaps by executing code on the CPU.
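The query-then-fall-back pattern described above can be sketched like this. The resource table, the `crypto_engine` name, and the helper functions are all hypothetical stand-ins for an MRAPI-style resource lookup, not actual MRAPI calls.

```python
# Sketch: check for a (hypothetical) accelerator resource and fall back to
# a CPU code path when it is absent or powered down.
resources = {"crypto_engine": {"powered": False}}  # simulated resource tree

def accel_available(name):
    """Stand-in for querying the resource tree for a powered accelerator."""
    node = resources.get(name)
    return bool(node and node["powered"])

def do_work(data):
    if accel_available("crypto_engine"):
        return "offloaded"             # would dispatch to the accelerator
    return f"cpu:{sum(data)}"          # software fallback on the CPU

print(do_work([1, 2, 3]))  # cpu:6  (accelerator is powered off)
```

The application logic stays the same either way; only the dispatch target changes based on what the resource query reports.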
Does MRAPI rely upon a 'local' resource manager? That is, does MRAPI store state and need a way to allocate state storage?
It is up to the MRAPI implementation as to how resources are managed. Our simple initial implementation stores state in shared memory protected with a semaphore.
I saw a statement that other solutions are too heavyweight because they target distributed systems. Does it mean that your goal is not to target the distributed system? What happens when we have a multi-chip multi-core? Isn't this the same distributed system?
MRAPI targets cores on a chip, and chips on a board. MRAPI is not intended to scale beyond that scope.
Is it possible to hide the differences between local and remote memory by just different properties of these memories? Remote memory will have higher latency, some access restrictions, etc.
The working group has considered the possibility of allowing the
In many HW systems, transitions between low power (or no power) and fully working conditions are extremely frequent. In such systems some state change callbacks will become a nightmare. How does MRAPI handle the situation?
In the situation where the application does not want to be disturbed by frequent callbacks, then it would be better for the application to periodically poll MRAPI at a time of its own choosing. This is certainly possible with MRAPI.
Are there any plans to include trigger APIs? For example, invoke callback when a particular resource hits some pre-defined conditions/threshold?
Currently there are no threshold-related callbacks other than counter wrap-arounds. MRAPI may consider this for a future version.
Primitives - did you consider including read-copy-update (RCUs)?
The MRAPI working group did consider read copy update locks. After discussion with some of the original creators of the RCU code for Linux, we determined that for now there is not sufficient evidence that a high performance, user-level implementation of RCU was feasible. We intend to monitor developments as we are aware that it is an active area of research.
These primitives included in MRAPI are necessary, but seem to be insufficient. Shouldn
MRAPI is intended to provide some of the primitives that could be used for creating a higher level resource manager. However, it is also intended to be useful for application level programmers to write multicore code, and for this reason it was kept minimal and orthogonal to other Multicore Association APIs. The working group believes that a full-featured resource manager would require all of the Multicore Association APIs, e.g., MCAPI, MRAPI, and MTAPI.
Are any companies currently incorporating or have plans to incorporate MRAPI in their products. If so, can you name the products?
At this time there have been no public announcements. There is at least one university research project that is looking at MRAPI for heterogeneous multicore computing. We expect more activities to emerge soon.
Multicore Task Management
What is the difference between Frescor (FRSH-Kernel) and MTAPI?
The Frescor project seems to address a more complex infrastructure and higher abstraction levels. MTAPI could possibly be used as part of that. Timing measurements and predictions, for example, are not part of MTAPI. Fault detection and resource protection are also out of scope of MTAPI.
How difficult is it for an RTOS vendor to provide support for OpenAMP?
In other words, is it a handful of porting-layer files that must contain RTOS-specific code? Are there porting examples for Nucleus, FreeRTOS, or uC/OS? In general it is easy; today there is a bare-metal example in the OpenAMP Git tree, which also has examples from Xilinx and possibly NXP with pre-integrated support for FreeRTOS. Xilinx and Mentor engineers agreed to create a separate, well-documented OS abstraction layer to make it easier for software vendors going forward. For Linux vendors, the effort is even smaller.
Is OpenAMP applicable for a system with separate chips with different cores?
There are two specific aspects that make the OpenAMP framework inefficient when used outside a single SoC.
Are OpenAMP and MTAPI complementary?
OpenAMP and MTAPI indeed complement each other, and they are addressing different problems. While OpenAMP addresses heterogeneity across OSs, MTAPI mainly targets (but is not limited to) systems with a single OS. If you don
What is the difference between the open source version and commercial version of OpenAMP?
The open source version will cover use cases that contributors put effort behind. Commercial implementations will cover other use cases, including situations where you need a safety certification or where you are using a commercial (or possibly an open source) hypervisor as the separation technology. Commercial solutions also bring other value such as tools and support.
What is the difference between the Open Source community discussions and the MCA Working Group?
The Open Source community is focusing on implementation issues for the open source project, including new contributions. It will run like typical open source projects where most of the discussions are over the mailing list. The MCA Working Group will focus on standardizing the APIs needed both for the Open Source project as well as for proprietary implementations. There are in essence two sets of APIs: one "north bound" which are the APIs used by an application developer, and one "south bound" that will be used when porting to new HW and new operating systems. The Working Group will also discuss future direction of OpenAMP, including things like OS/HW abstraction, power management, device negotiation, etc.
You can have Linux and real-time Linux running on the same board executing on different cores. Does this mean that the use of Linux and an RTOS will be less common?
In use cases where very powerful cores are used for wired/wireless devices, Linux can run on all of them. The comms market, in general, has almost stopped using RTOSs, partly because they do not really need real-time. OpenAMP is more targeted towards devices that use the combination of more powerful cores (e.g. ARM Cortex A class) in combination with less powerful cores (e.g. ARM R or M class) that do not have an MMU. Without an MMU on these cores, and since the typical use case is much more real-time, you simply cannot run Linux on those cores. Xilinx, Freescale/NXP, TI have a bunch of these heterogeneous devices, used in various markets such as industrial, automotive, A&D.
Why is the SHIM working group using an XML schema to describe the multicore and manycore architectures and devices?
We selected an XML schema because it enables a technology called XML data binding. Simply put, it allows you to generate a class library for handling the SHIM XML data as data objects, not as XML elements and attributes. For example, you can create a C++ or Java object called MasterComponent from a SHIM XML file and access the attributes of the MasterComponent element just as you would reference a member variable of the C++/Java object. There are many popular open source implementations of XML data binding tools. Without this technology, you can still access the SHIM XML via legacy SAX/DOM XML libraries, where you essentially read the XML as a file and iterate over each XML element and attribute. This is quite tedious programming; moreover, your code becomes dependent on the given XML structure and will not be portable if that structure changes. With XML data binding, when we update the SHIM spec, chances are existing tool code will still operate as is.
How does the OpenMPI HWloc compare to SHIM?
Hwloc is partly similar in that it deals with the static chip IP organization, but there are major differences. One is that hwloc depends on information provided by the OS through its interfaces at runtime, and it exposes that information through the standard API hwloc defines. SHIM is intended to be used primarily without running the system: its information is used to construct the OS configuration, which in turn is used to create the information hwloc obtains through the OS interfaces. Hwloc thus focuses not on a standard description of hardware from the software perspective, but on standardizing the run-time API for retrieving the hardware topology. Unlike SHIM, hwloc does not appear to handle hardware performance-metric information. Hwloc seems to focus on the hardware topology so that an application using the hwloc library can, for example, bind a thread or process to a particular core. This is indeed one possible use case of SHIM XML, but we are instead focusing on tool use cases such as performance estimation, system configuration, and hardware modeling. Hwloc seems to have some ability to describe virtual hardware using commands or text, but that capability seems limited.
What is the difference between SHIM and IP-XACT?
IP-XACT is basically a 'design' language, primarily focusing on a description of how hardware IP components are electronically tied together. SHIM, on the other hand, is a 'descriptive' language, focusing only on the hardware properties that matter to software development tools. Hence, SHIM does not describe the type of interconnect or bus in any direct way. It does describe the master and slave IP components in a hierarchical manner, but with no specifics regarding how they are connected (i.e. whether via a traditional bus, a crossbar, or a NoC). In SHIM, the IP components are listed mostly to describe memory-access properties such as latency, any master-to-master communication such as a FIFO register, and basic processor properties such as clock, instruction set (ABI), and cache size and type - all of which matter to software tools when estimating performance or configuring the system.
How does SHIM support modeling with vector, VLIW, or custom instructions?
In particular, how does SHIM support modeling the performance metrics of a processor with: a vector (i.e. SIMD) instruction set; a VLIW or superscalar architecture (and how does it model restrictions on which units can run in parallel); or custom instructions that don't map to LLVM IR but rather use intrinsics?
With SHIM 1.0, the way to support instructions that cannot be directly expressed in LLVM IR is to extend the CommonInstructionSet, an XML element defined in the SHIM XML schema that contains standard LLVM-IR instructions by default. Concurrent execution of instructions is not directly modeled but is indirectly expressed in the best/worst/typical cycle counts of each LLVM-IR instruction; this triplet exists for both latency and pitch. Using them, the typical probabilistic distribution of processor cycles is expressed in effect. In SHIM 2.0, we are planning to introduce options to express such architectures more directly, for example by introducing a functional-unit description to group the instructions.