In embedded SoC design, the memory hierarchy plays an increasingly important role in system performance. There is a significant latency gap between internal and external memory accesses, so external memory accesses can degrade the performance of embedded systems. Application developers must explicitly manage data transfers between external and internal memory, which places a burden on programmers. In this paper, we propose a software cache API to ease this burden. The proposed API provides both pointwise (per-element) access and block access to the software cache. We also describe the design and implementation of the software cache API in detail. As a case study, we implement the API on PAC DSP, a high-performance DSP targeting multimedia applications, and evaluate the implementation with the UTDSP benchmark suite. The experimental results show that the proposed software cache effectively reduces external memory access times.