xtensa Locking When Emulating Xtensa LX Multi-Core on a Xilinx FPGA By community.cadence.com Published On :: Mon, 30 Sep 2024 16:00:00 GMT Today's high-performance computing systems often require the designer to instantiate multiple CPU or DSP cores in their subsystem. However, the performance gained by using multiple CPUs comes with additional programming complexity, especially when accessing shared memory data structures and hardware peripherals. CPU cores need to access shared data in an atomic fashion in a multi-core environment. Locking is the most basic requirement for data sharing. A core takes the lock, accesses the shared data structure, and releases the lock. While one core has the lock, other cores are disallowed from accessing the same data structure. Typically, locking is implemented using an atomic read-modify-write bus transaction on a variable allocated in an uncached memory. This blog shares the AXI4 locking mechanism when implementing an Xtensa LX-based multi-core system on a Xilinx FPGA platform. It uses a dual-core design mapped to a KC705 platform as an example. Exclusive Access to Accomplish Locking The Xtensa AXI4 manager provides atomic access using the AXI4 atomic access mechanism. While Xtensa's AXI manager interface generates an exclusive transaction, the subordinate's interface is also expected to support exclusive access, i.e., AXI monitoring. Xilinx BRAM controller's AXI subordinate interface does not support exclusive access, i.e., AXI monitoring: AXI Feature Adoption in Xilinx FPGAs. Leveraging Xtensa AXI4 Subordinate Exclusive Access The Xtensa LX AXI subordinate interface supports exclusive access. One approach is to utilize this support and allocate locks in one of the core's local data memories. Ensure that the number of external exclusive managers is configured, typically to the number of cores (Figure 1). Figure 1 Note that the Xtensa NX AXI subordinate interface does not support exclusive access. For an Xtensa NX design, shared memory with AXI monitoring is required. In Figure 2, the AXI_crossbar#2 (block in green) routes core#0's manager AXI access (blue connection) to both core's local memories. Core#1's manager AXI (yellow connection) can also access both core's local memories. Locks can be allocated in either core's local data memory. In-Bound Access on Subordinate Interface On inbound access, the Xtensa AXI subordinate interface expects a local memory address, i.e., an external entity needs to present the same address as the core would use to access local memory in its 4GB address space. AXI address remap IP (block in pink) translates the AXI system address to each core's local address. For example, assuming locks are allocated in core#0's local memory, core#1 generates an AXI exclusive to access a lock allocated in core#0's local memory (yellow connection). AXI_crossbar#2 forwards transaction to M03_AXI port (green connection). AXI_address_remap#1 translates the AXI system address to the local memory address before presenting it to core#0's AXI subordinate interface (pink connection). It is possible to configure cores with disjoint local data memory addresses and avoid the need for an address remap IP block. But then it will be a heterogeneous multi-core design with a multi-image build. An address remap IP is required to keep things simple, i.e., a homogeneous multi-core with a single image build. A single image uses a single memory map. Therefore, both cores must have the same view of a lock, i.e., the lock's AXI bus address must be the same for both. Figure 2 AXI ID Width Note Xtensa AXI manager interface ID width=4 bits. Xtensa's AXI subordinate interface ID width=12 bits. So, you must configure AXI crossbar#2 and AXI address remap AXI ID width higher than 4. AXI IDs on a manager port are not globally defined; thus, an AXI crossbar with multiple manager ports will internally prefix the manager port index to the ID and provide this concatenated ID to the subordinate device. On return of the transaction to its manager port of origin, this ID prefix will be used to locate the manager port, and the prefix will be truncated. Therefore, the subordinate port ID is wider in bits than the manager port ID. Figure 3 shows the Xilinx crossbar IP AXI ID width configuration. Figure 3 Software Tools Support Cadence tools provide a way to place locks at a specific location. For more details, please refer to Cadence's Linker Support Packages (LSP) Reference Manual for Xtensa SDK. .xtos.lock(green) resides in core#0's local memory and holds user-defined and C library locks. The lock segment memory attribute is defined as shared inner (cyan) so that L32EX and S32EX instructions generate an exclusive transaction on an AXI bus. See Figure 4. The stack and per-core Xtos and C library contexts are allocated in local data memory (yellow). …………..LSP memory map………….BEGIN dram00x40000000: dataRam : dram0 : 0x8000 : writable ; dram0_0 : C : 0x40000400 - 0x40007fff : STACK : .dram0.rodata .clib.percpu.data .rtos.percpu.data .dram0.data .clib.percpu.bss .rtos.percpu.bss .dram0.bss;END dram0…………………BEGIN sysViewDataRam00xA0100000: system : sysViewDataRam0 : 0x8000 : writable, uncached, shared_inner; lockRam_0 : C : 0xA0100000 - 0xA01003ff : .xtos.lock;END sysViewDataRam0………….. Figure 4 Please visit the Cadence support site for more information on emulating Xtensa cores on FPGAs. Full Article AXI Tensilica Xtensa FPGA
xtensa xtensa download By community.cadence.com Published On :: Thu, 23 Sep 2021 16:12:45 GMT I want to download Xtensa C/C++ Compiler (XCC) . I dont know where to download. Please help me. Full Article
xtensa Xtensa compiler issue By community.cadence.com Published On :: Thu, 01 Dec 2022 09:31:48 GMT Hi I have a Xtensa compiler issue that the compilation for switch case would be optimized in some patterns and leads to unexpected result. I cross-checked the assembly code and found that such compiler optimization seems to be similar to the tree-switch-conversion feature in GCC compiler https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/Optimize-Options.html-ftree-switch-conversion Perform conversion of simple initializations in a switch to initializations from a scalar array. This flag is enabled by default at -O2 and highe Unfortunately I don't find any similar compiler option(like -fno-tree-switch-conversion) in Xtensa compiler(XCC) to enable/disable such feature and such feature seems like enabled in XCC by default even if I'm using -O0 for the least optimization. I'm wondering if there's any possible solution to permanently disable such feature in XCC? PS: The release version of XCC compiler I'm using is RD-2012.5 Thanks! Full Article