Dynamic binary translation (DBT) translates and executes binary programs on the fly, providing compatibility by virtualizing one machine (the guest) on another (the host). However, virtualizing guest memory introduces substantial overhead, because each guest virtual address (GVA) must be translated into a host physical address (HPA). In QEMU, a DBT system with an efficient memory virtualization mechanism, more than 60% of the translated code is devoted to memory virtualization, which leads to poor guest performance. In this paper, we employ a co-design methodology to optimize guest memory access performance. The optimization focuses on two aspects. First, we design and implement hardware extensions that perform GVA-to-HPA translation directly. Second, we modify QEMU to cooperate with the hardware and reduce the amount of translated code. In this way, the cost of memory virtualization is eliminated, significantly improving the performance of the Loongson binary translation system. Experimental results show that, compared with the previous system, guest performance improves by up to 100 times at peak and by 19.12% on average.
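To illustrate why software address translation is costly, the following minimal sketch models a software TLB in front of a guest page-table walk. It is a simplified illustration, not QEMU's actual implementation: the class `SoftMMU`, its fields, the 4 KiB page size, the 256-entry direct-mapped TLB, and the flat `host_base` mapping are all assumptions made for the example. Every guest load or store would have to pass through logic like `translate`, which is the kind of work the proposed hardware extensions are meant to take over.

```python
PAGE_BITS = 12                 # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_BITS) - 1
TLB_SIZE = 256                 # assumed direct-mapped software TLB


class SoftMMU:
    """Hypothetical model of software GVA-to-host address translation."""

    def __init__(self, guest_page_table, host_base):
        # guest_page_table: guest virtual page number -> guest physical page number
        self.guest_page_table = guest_page_table
        # host_base: host address at which guest RAM is assumed to be mapped
        self.host_base = host_base
        self.tlb = [None] * TLB_SIZE
        self.fast_hits = 0
        self.slow_walks = 0

    def translate(self, gva):
        vpn = gva >> PAGE_BITS
        offset = gva & PAGE_MASK
        index = vpn % TLB_SIZE

        # Fast path: a tag compare that runs on every guest memory access.
        entry = self.tlb[index]
        if entry is not None and entry[0] == vpn:
            self.fast_hits += 1
            return entry[1] + offset

        # Slow path: walk the guest page table, then refill the TLB entry.
        self.slow_walks += 1
        if vpn not in self.guest_page_table:
            raise LookupError(f"guest page fault at GVA {gva:#x}")
        host_page = self.host_base + (self.guest_page_table[vpn] << PAGE_BITS)
        self.tlb[index] = (vpn, host_page)
        return host_page + offset


mmu = SoftMMU({0x10: 0x2}, host_base=0x8000_0000)
print(hex(mmu.translate(0x10ABC)))   # 0x80002abc (slow path, fills the TLB)
print(hex(mmu.translate(0x10010)))   # 0x80002010 (fast path, TLB hit)
```

Even on the fast path, the shift, mask, index, and compare are executed for each access, which gives a sense of how memory virtualization can come to dominate the translated code.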
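QEMU can emulate a variety of guest machines (x86, PowerPC, ARM, and Sparc) on a variety of hosts (x86, PowerPC, ARM, Sparc, Alpha, and MIPS). With full system emulation, QEMU can run unmodified operating systems inside a virtual machine; with Linux user-mode emulation, it can also run Linux software built for one CPU on a different CPU.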