We propose an efficient hardware architecture for the deblocking filter in H.264/AVC. A novel memory organization yields significant savings in filtering time. Our design includes two 4×4 register sets and one 160×32 two-port SRAM, so that filtering and transposition are carried out simultaneously. Synthesis results show that our design is both small (19k gates) and fast (100 MHz at 0.25 μm). An AMBA-compliant interface is added to the design for SoC integration and FPGA prototyping. Experimental results show that our design works well with the reference software JM 7.3 and achieves a significant speedup, even with a large communication overhead between the CPU and the hardware accelerator.
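As a rough software illustration of the transposition step mentioned above, the sketch below transposes one 4×4 block of 8-bit samples, so that a filter written for one edge direction can be reused for the other. The function name `transpose4x4`, the `BLK` constant, and the sample type are assumptions made for illustration; this is not the register-level design described here, which performs the transposition in hardware concurrently with filtering.

```c
#include <stdint.h>

#define BLK 4  /* block size assumed here: 4x4 samples */

/* Hypothetical helper: writes the transpose of `in` into `out`,
 * i.e. out[c][r] = in[r][c]. Rows of the input become columns of the
 * output, so a routine that filters along rows can process columns
 * after one transposition. `in` and `out` must not alias. */
void transpose4x4(uint8_t in[BLK][BLK], uint8_t out[BLK][BLK])
{
    for (int r = 0; r < BLK; ++r) {
        for (int c = 0; c < BLK; ++c) {
            out[c][r] = in[r][c];
        }
    }
}
```

In software this costs an extra pass over the block; the point of the register organization described above is to hide that cost by overlapping it with filtering.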