Size does matter
Thursday, June 30th, 2005
When I was hacking on the code that I would eventually use as the basis for TurboHex, I experimented with C++ to see what kind of code the compiler would generate. The results were pretty shocking: the generated code is very competitive with assembly, especially if you stay away from the standard library.
TurboHex is coded in structured C++. This works okay. I get the type safety without a lot of side-effect-based programming. There really isn’t much OOP going on, but as I’ve said before, OOP is hard.
Structured code is also friendly to Win32. Combining OOP and Win32 is like hammering a screw: it’s not terribly painful, but the results aren’t perfect. I avoid MFC. Even Microsoft doesn’t use MFC (MFC is dead anyway; use .NET if you want a simple environment for programming your business logic). I’ve dabbled in ATL, but see the comments below on the results of using templates.
In the first version, I used the single-threaded standard library, but preferred Win32 calls wherever possible. STL was avoided.
Early builds had a working hex editor in about 40k. By the time it was complete, it was around 122k, which included high-quality icons and embedded manifests. To reduce the size, I used an EXE trimmer. The trimmer didn’t do compression; it eliminated unnecessary data and padding.
It was important not to use compression. EXE compression works by decompressing a copy of the EXE into memory. The problem is that this memory must now be backed by the page file. So during paging, the decompressed pages have to be written out to the page file instead of simply being discarded, as they are for a normal EXE whose pages can be reloaded from the file on disk. I could reduce paging even further by rebinding the APIs during installation, but this isn’t currently done.
Now enter 2005.
Things are changing in the code. I’ve traditionally been very strict about writing conforming C++, but there is a movement towards more Modern C++. Modern C++ avoids raw arrays and pointer arithmetic, and uses rich abstract data types, exceptions, and generic code (templates). It is safer, and it is a richer programming experience.
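A minimal sketch of that contrast (the function names are hypothetical, not from TurboHex): the same sum written with a raw array and pointer arithmetic, then with `std::vector`, plus `std::vector::at`, which checks the index and throws `std::out_of_range` instead of reading past the end. It uses C++11 features such as range-for, which postdate this post.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Older style: raw array plus pointer arithmetic. Nothing stops a caller
// from passing the wrong count and reading past the end of the buffer.
int sum_raw(const int* data, std::size_t count) {
    int total = 0;
    for (const int* p = data; p != data + count; ++p) {
        total += *p;
    }
    return total;
}

// Modern style: the container carries its own size, so there is no
// separate length to get wrong.
int sum_modern(const std::vector<int>& values) {
    int total = 0;
    for (int v : values) {
        total += v;
    }
    return total;
}

// Checked access: std::vector::at throws std::out_of_range on a bad index.
bool index_is_valid(const std::vector<int>& values, std::size_t i) {
    try {
        (void)values.at(i);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```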
Recently, the TurboHex code base has been becoming “safer.” Previous versions were very lean and tuned for speed. Unchecked array access, smart string copying, and other tricks made for a very fast but very unsafe environment. Newer versions use StrSafe, run-time checks, and buffer overrun detection. I’ve even switched to the multithreaded library.
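StrSafe itself is Windows-only, so here is a portable sketch of the idea behind a function like `StringCchCopy`: the copy is told the destination size, never writes past it, always null-terminates, and reports truncation. `safe_copy` is a hypothetical name, and its `bool` return stands in for StrSafe’s `HRESULT` codes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bounded copy, loosely modeled on StrSafe's StringCchCopy.
// dest_size is the destination capacity in characters, including the
// terminator. Returns false if src was truncated (or dest_size is 0).
bool safe_copy(char* dest, std::size_t dest_size, const char* src) {
    if (dest_size == 0) {
        return false;  // no room even for the terminator
    }
    std::size_t i = 0;
    while (i + 1 < dest_size && src[i] != '\0') {
        dest[i] = src[i];
        ++i;
    }
    dest[i] = '\0';         // the result is always null-terminated
    return src[i] == '\0';  // true only if the whole string fit
}
```

On truncation the destination still holds a valid, shorter string, which matches how StrSafe documents its insufficient-buffer case.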
Switching to the multithreaded library was nice, because it let me do things like write COM classes in C++ instead of C. I’m not sure why the compiler had this restriction, but writing v-tables by hand in C is not fun.
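To show what “writing v-tables in C” means, here is a sketch of a hand-built function-pointer table next to the C++ version, where the compiler builds the table from the virtual functions. It compiles as plain C++ with no COM headers; `CCounter`, `CCounterVtbl`, and `ICounter` are hypothetical names, and a real COM interface would also put `QueryInterface`, `AddRef`, and `Release` at the start of its v-table.

```cpp
#include <cassert>

// --- C style: the v-table is a struct of function pointers, filled in by hand ---
struct CCounter;  // forward declaration so the table can refer to it

struct CCounterVtbl {
    void (*Increment)(CCounter* self);
    int  (*Get)(const CCounter* self);
};

struct CCounter {
    const CCounterVtbl* lpVtbl;  // first member points at the shared table
    int value;
};

static void CCounter_Increment(CCounter* self) { ++self->value; }
static int  CCounter_Get(const CCounter* self) { return self->value; }

// One table shared by every CCounter instance.
static const CCounterVtbl g_counterVtbl = { CCounter_Increment, CCounter_Get };

void CCounter_Init(CCounter* c) {
    c->lpVtbl = &g_counterVtbl;
    c->value = 0;
}

// --- C++ style: the compiler generates the v-table from the virtual functions ---
struct ICounter {
    virtual void Increment() = 0;
    virtual int Get() const = 0;
    virtual ~ICounter() {}
};

struct Counter : ICounter {
    int value = 0;
    void Increment() override { ++value; }
    int Get() const override { return value; }
};
```

Every call in the C version has to go through `lpVtbl` and pass `self` explicitly, which is exactly the bookkeeping the C++ compiler takes over.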
One “modern” decision I made early on was to use throwing new. Exceptions are otherwise unused in TurboHex, but I knew the future was coming, and I didn’t want to get caught with a lot of legacy code that I would have to convert.
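The difference between the two styles, as a small sketch (the helper names are hypothetical): standard `new` reports failure by throwing `std::bad_alloc`, while `new (std::nothrow)` returns a null pointer that every call site must check, which is roughly the behavior some older compilers gave plain `new`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Throwing new (standard behavior): failure raises std::bad_alloc,
// so there is no null pointer to forget to check.
std::unique_ptr<char[]> make_buffer(std::size_t n) {
    return std::unique_ptr<char[]>(new char[n]);
}

// Non-throwing style: returns nullptr on allocation failure, so every
// caller must test the result.
char* make_buffer_nothrow(std::size_t n) {
    return new (std::nothrow) char[n];
}

// Caller side with throwing new: failure handling lives in one catch
// block instead of after every allocation.
bool try_make_buffer(std::size_t n) {
    try {
        std::unique_ptr<char[]> buf = make_buffer(n);
        buf[0] = 0;  // touch the buffer (assumes n > 0)
        return true;
    } catch (const std::bad_alloc&) {
        return false;
    }
}
```

Code written against throwing new never relies on a null check, so it needs no conversion once exceptions are used more widely.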
Of all of the constructs in Modern C++, templates probably scare me the most. But templates are about ready for prime time. There were just so many reasons not to use them in the past:
-Older compilers could not debug into template code.
-Templates tend to have huge symbol names, so big that many compilers couldn’t cope.
-Throwing new often wasn’t the default in compilers.
-Old STL versions weren’t compatible with each other, or with the actual standard.
-For that matter, most compilers implemented the standard so poorly that they couldn’t compile modern libraries such as Boost or Loki.
These problems have mostly been corrected in the two major compilers, GCC and Visual C++. But two problems remain: 1) templates bloat the code (this goes double for STL), and 2) template programming is hard.
Template code bloat is a huge problem, enough so that I won’t be using templates in TurboHex. Adding one STL list to the code grew it by 30k. For a program that is 120k, that’s a big issue. Future projects will not be concerned about code size, so there I won’t blink twice at this.
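Where the bloat comes from, sketched with a hypothetical `Stack` template: a template is compiled not once but once per set of template arguments, and everything it uses (here `std::vector<T>`) is instantiated along with it. This illustrates the mechanism only; it does not reproduce the 30k measurement above.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical container template. Each distinct T used with it makes the
// compiler generate its own copy of push, pop, and size.
template <typename T>
class Stack {
public:
    void push(const T& v) { items_.push_back(v); }
    T pop() {
        T top = items_.back();
        items_.pop_back();
        return top;
    }
    std::size_t size() const { return items_.size(); }
private:
    std::vector<T> items_;  // std::vector<T> is instantiated per T as well
};

// Stack<int> and Stack<long> are unrelated types with separate member code,
// even when the generated machine code may be nearly identical.
static_assert(!std::is_same<Stack<int>, Stack<long>>::value,
              "each instantiation is a distinct type");
```

Some linkers can merge byte-identical functions (Visual C++’s `/OPT:ICF`, for example), but that only helps when the instantiations happen to compile to identical code.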
I’ll blink at the compile time, though. Template code is expensive to process, so for a large project, build time is an issue. Precompiled headers help, but now changing one interface requires you to rebuild the world. Export templates were supposed to help, but the version that made it into the standard sucks so badly that it’s useless.
That still leaves template programming being hard. And it is hard, especially once you get into type lists, policies, and other advanced topics. Most programmers don’t know about Koenig lookup, syntax issues like “.template”, or why “this->” is important. It’s tricky stuff.
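A compact sketch of those three pitfalls. The names (`hexlib`, `Buffer`, `Base`, `Derived`, `Converter`) are hypothetical, chosen only for illustration. Koenig lookup (argument-dependent lookup) finds a function through the namespace of its argument; `this->` is needed to reach members of a base class that depends on a template parameter; and `.template` tells the parser that a member of a dependent object is a template.

```cpp
#include <cassert>

// 1) Koenig lookup: the unqualified call below finds hexlib::describe
//    because its argument type is declared in hexlib.
namespace hexlib {
struct Buffer { int size; };
int describe(const Buffer& b) { return b.size; }
}  // namespace hexlib

int describe_buffer(const hexlib::Buffer& b) { return describe(b); }

// 2) this->: members of a dependent base class are not found by plain
//    unqualified lookup inside the derived template.
template <typename T>
struct Base {
    int count = 0;
    void bump() { ++count; }
};

template <typename T>
struct Derived : Base<T> {
    void bump_twice() {
        this->bump();  // with two-phase lookup, a bare bump() here is an error
        this->bump();
    }
};

// 3) .template: calling a member template on an object of dependent type.
struct Converter {
    template <typename U>
    U as(int v) const { return static_cast<U>(v); }
};

template <typename C>
double to_double(const C& c, int v) {
    // Without ".template", the "<" after "as" would parse as less-than.
    return c.template as<double>(v);
}
```

Older compilers that skipped two-phase lookup often accepted the bare `bump()` and `c.as<double>(v)` forms, which is part of why this trips people up when moving to a more conforming compiler.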
However, the benefits are huge, especially once your library starts growing. I’m anxious to see templates show up in a major way in my future projects.