Summary: In H.264/AVC, adapting the transform size to the block size of the motion-compensated prediction residue has proven to be an important coding tool. This paper presents a highly parallel joint circuit architecture for the $8 \times 8$ and $4 \times 4$ adaptive block-size transforms in H.264/AVC. By decomposing the $8 \times 8$ transform into basic $4 \times 4$ transforms, a unified architecture is designed for both the $8 \times 8$ and $4 \times 4$ transforms, and the transform data-path can be efficiently reused for six kinds of transforms, i.e., the $8 \times 8$ forward, $8 \times 8$ inverse, $4 \times 4$ forward, $4 \times 4$ inverse, forward Hadamard, and inverse Hadamard transforms. Linear shift mapping is applied to the memory buffer to support parallel access in both the row and column directions, which eliminates the need for a transpose circuit. For the reusable and configurable transform data-path, a multiple-stage pipeline is designed to reduce the critical path length and increase throughput. The design is implemented in UMC 0.18 µm technology at 200 MHz with 13.651 K logic gates and can support a $1{,}920 \times 1{,}088$, 30 fps H.264/AVC HDTV decoder.
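
To illustrate why a shifted memory layout can remove the need for a transpose circuit, the following is a minimal Python sketch of one common form of linear shift (skewed) mapping. The summary does not give the exact mapping used in the design, so the rule assumed here, bank = (row + column) mod N with the row index as the address, is only an assumption, and all function names (`bank_of`, `store_block`, `read_row`, `read_col`, `conflict_free`) are hypothetical.

```python
N = 4  # block width, also taken as the number of memory banks (assumption)


def bank_of(row, col, n=N):
    """Bank index under the assumed mapping: row r is rotated by r positions."""
    return (row + col) % n


def store_block(block, n=N):
    """Write an n x n block into n banks; the address inside a bank is the row index."""
    banks = [[None] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            banks[bank_of(r, c, n)][r] = block[r][c]
    return banks


def read_row(banks, r, n=N):
    """Fetch row r: the n elements sit in n different banks, all at address r."""
    return [banks[bank_of(r, c, n)][r] for c in range(n)]


def read_col(banks, c, n=N):
    """Fetch column c: the n elements sit in n different banks, at addresses 0..n-1."""
    return [banks[bank_of(r, c, n)][r] for r in range(n)]


def conflict_free(n=N):
    """Check that every row and every column touches each bank exactly once."""
    rows_ok = all(len({bank_of(r, c, n) for c in range(n)}) == n for r in range(n))
    cols_ok = all(len({bank_of(r, c, n) for r in range(n)}) == n for c in range(n))
    return rows_ok and cols_ok


# Example block with distinct entries so that reads are easy to verify.
block = [[4 * r + c for c in range(N)] for r in range(N)]
banks = store_block(block)
```

Under this assumed mapping, any full row or full column maps to N distinct banks, so either can in principle be read in a single parallel access; the second pass of a separable 2-D transform can then read columns directly instead of waiting for a transposed copy.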