As a good real-world benchmark, I suggest you try bzip2.
It is quite a small program, easy to recompile. At runtime, it is
I think the trouble is exactly that it's small: both the 68000 and 68060 loops will fit into the cache, there's little room for 68000 vs 68020 instruction differences to show up, etc. GCC is a perfect benchmark because it is huge and uses a lot of loops, case statements, jumps and subroutines, so the generated code really matters. But I'll try bzip2 anyway; maybe I'm wrong.