IEEE 14th International Conference on High Performance Computing and Communications (HPCC) / IEEE 9th International Conference on Embedded Software and Systems (ICESS), Liverpool, England, June 25-27, 2012
In this paper, we address register file design for Single Instruction Multiple Data (SIMD) processing in multimedia applications. In a 32-bit processor operating on 8-bit data units, one SIMD instruction can process four units at a time and thus reach a data parallelism of four. These data units are referred to as subwords in SIMD processing. However, SIMD performance is often limited by inefficient subword permutation in the register file. We therefore present a register file architecture called the Vector Register File (VRF) that reduces subword permutation latency. As a result, heavy data traffic between memory and the register file can be avoided. A proprietary DSP core (codename Starfish) with a simulation tool chain has been developed, and we present the simulation and debugging flow used on this core to evaluate performance. Several test benches, including matrix transposition, the deblocking filter, and the discrete cosine transform (DCT) from H.264/AVC, are used for the performance evaluation. A pipeline data hazard detection scheme with register bypassing is also explored for the VRF to further improve pipeline efficiency. The simulation results show that, on average, cycle count improves by 29.87% and code size by 29.223%.
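To make the subword idea concrete, the following is a minimal sketch in plain C, not the Starfish instruction set or the VRF hardware described in the paper. It assumes four 8-bit subwords packed into one 32-bit word, with subword 0 in the least significant byte. The function names `add4x8` and `transpose4x4` are hypothetical and chosen only for illustration. The first function emulates one four-way SIMD add; the second shows the kind of subword permutation (a 4x4 byte matrix transposition) that costs many shift and mask operations when the register file offers no direct support for it.

```c
#include <stdint.h>

/* Emulate a 4-way SIMD add on 8-bit subwords packed in a 32-bit word.
 * Each subword wraps around independently; no carry crosses into the
 * neighbouring subword. */
static uint32_t add4x8(uint32_t a, uint32_t b) {
    /* Add the low 7 bits of every subword; carries stay inside each byte. */
    uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    /* Fold in the top bit of every subword without generating a carry. */
    return low7 ^ ((a ^ b) & 0x80808080u);
}

/* Transpose a 4x4 matrix of 8-bit elements stored as four packed rows.
 * Element (i, j) lives in subword j of in[i] and moves to subword i of
 * out[j]. On a scalar datapath this takes 16 extract/insert steps. */
static void transpose4x4(const uint32_t in[4], uint32_t out[4]) {
    for (int j = 0; j < 4; j++) {
        uint32_t word = 0;
        for (int i = 0; i < 4; i++) {
            uint32_t element = (in[i] >> (8 * j)) & 0xFFu; /* extract (i, j) */
            word |= element << (8 * i);                    /* insert at (j, i) */
        }
        out[j] = word;
    }
}
```

For example, `add4x8(0x00FF00FFu, 0x00010001u)` yields `0x00000000` because each subword wraps on its own, whereas an ordinary 32-bit add would let the carries spill into the adjacent subwords. The nested loop in `transpose4x4` illustrates the permutation overhead that motivates hardware support for subword rearrangement.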