Windows DLL Relocation and Perl

Recently a nasty bug was discovered in the way Perl builds DLL files on Windows when compiling with GCC. If an application needed to load two DLLs whose address ranges overlapped, the usual OS relocation process would fail because the DLLs contained duplicate relocation tables. This bug was particularly tricky to track down because it generally affected only large applications that embed Perl or that load many XS modules. Thanks to Daniel Dragan and Jan Dubois for locating the problem and coming up with a fix.

Background

Windows DLLs are containers for shared program code and other data, designed to be dynamically loaded into a running process. They contain lists of symbols (functions, variables, and other data) and offset information so that these symbols can be correctly located in memory. A DLL also includes a preferred base memory address at which to load all of this data. This value is only a recommendation: if two or more DLLs want to be loaded at the same base address, or if their address ranges overlap at all, the OS will choose a new address for one of them and rewrite all of that DLL’s symbol offset information. This process is called relocation. During the linking stage, the toolchain will try to avoid this problem by choosing a default base address based on a hash of the file name, which usually does a good job of ensuring there are no overlaps. Perl, however, being a dynamic language, might need to load many more DLLs than a typical Windows application.
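
To make the overlap check and the address adjustment concrete, here is a minimal C sketch. It is not how the Windows loader is implemented; the image_range type and both function names are made up for illustration, and addresses are assumed to be 32-bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: the address range a DLL would like to occupy. */
typedef struct {
    uint32_t base; /* preferred base address */
    uint32_t size; /* size of the loaded image in bytes */
} image_range;

/* True if the half-open ranges [base, base + size) share any address.
 * The end points are computed in 64 bits so they cannot wrap around. */
bool ranges_overlap(image_range a, image_range b) {
    uint64_t a_end = (uint64_t)a.base + a.size;
    uint64_t b_end = (uint64_t)b.base + b.size;
    return a.base < b_end && b.base < a_end;
}

/* Translate one absolute address from the preferred base to the base
 * that was actually chosen. The address must lie inside the image. */
uint32_t rebase_address(uint32_t address, image_range preferred,
                        uint32_t actual_base) {
    assert(address >= preferred.base);
    assert((uint64_t)address < (uint64_t)preferred.base + preferred.size);
    return address - preferred.base + actual_base;
}
```

In this model, two DLLs whose ranges overlap cannot both keep their preferred base, so one of them has every stored address passed through something like rebase_address.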

Perl DLLs

Perl itself is contained within a DLL, perl5xx.dll, and each XS-based module loads its C-based functions from a DLL file of its own. In a large Perl application, the number of DLLs that need to be loaded can easily reach the double digits. It is also possible to embed Perl inside another application. The more DLLs that are loaded together, the more likely it is that a memory overlap will occur and force relocation to take place.

The Problem

Perl’s core Windows build process, as well as the two competing modules responsible for building XS modules, ExtUtils::MakeMaker and ExtUtils::CBuilder, all invoke the GCC linker with the --base-file switch when creating DLLs. This switch causes GCC to add relocation information to the DLL. Unfortunately, it is possible for this relocation information to have already been included. If a program has defined any symbols using the special decorator __declspec(dllexport), the compiler will have already included relocation information, leaving the linker to add duplicate entries. The result is a DLL that works fine most of the time, but if it ever needs to be relocated upon loading due to a conflict, Windows will happily follow both sets of relocation instructions. This leaves the program with invalid addresses for the symbols in the DLL, and it will crash with the error message "Invalid access to memory location".
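
The effect of following two identical sets of relocation instructions can be sketched with a toy model in C. This is an illustration only, not the real PE .reloc format: the image is assumed to be an array of 32-bit slots, the fixup list is an array of slot indexes, and apply_fixups is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add delta (actual base minus preferred base) to every slot named in
 * the fixup list. Like the behaviour described above, this follows the
 * list blindly and does not check for repeated entries. */
void apply_fixups(uint32_t *slots, size_t slot_count,
                  const size_t *fixups, size_t fixup_count,
                  uint32_t delta) {
    for (size_t i = 0; i < fixup_count; i++) {
        assert(fixups[i] < slot_count); /* fixup must point into the image */
        slots[fixups[i]] += delta;
    }
}
```

With a fixup list of {0, 2} and a delta of 0x10000000, a slot holding 0x10001000 becomes 0x20001000 as intended. Give the same function the duplicated list {0, 2, 0, 2} and the slot ends up at 0x30001000, which no longer points into the relocated DLL.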

The Perl core defines several symbols with __declspec(dllexport), and when combined with --base-file this caused perl5xx.dll to receive duplicate relocation data. Since this DLL is loaded before any others, it was unlikely to cause problems except when Perl is embedded into another application. In addition, if an XS module happened to declare any dllexport symbols, the same problem would occur.

The Solution

The fix for this problem is to stop using --base-file in the Perl core Makefile and in both ExtUtils modules. The switch was likely added many years ago for good reason, but it no longer appears to be necessary. Also, to make sure all XS modules are compiled with relocation information, we add __declspec(dllexport) to the XS_EXTERNAL macro:

#if (defined(__CYGWIN__) || defined(WIN32)) && defined(USE_DYNAMIC_LOADING)
#  define XS_EXTERNAL(name) __declspec(dllexport) XSPROTO(name)
#  define XS_INTERNAL(name) STATIC XSPROTO(name)
#endif

Example

Here is an example using the relatively simple Digest::MD5 module. Using the objdump tool that ships alongside GCC (it is part of GNU Binutils), we can inspect a DLL file’s relocation information. When the DLL is compiled using the --base-file switch, we see that the relocation area contains duplicate entries. Note that much of this output has been omitted for clarity, and I suggest running the command yourself to see the full output.

> ...
  dlltool --def MD5.def --output-exp dll.exp
  g++ -o blib\arch\auto\Digest\MD5\MD5.dll -Wl,--base-file -Wl,dll.base -mdll [...] dll.exp
  dlltool --def MD5.def --base-file dll.base --output-exp dll.exp
  g++ -o blib\arch\auto\Digest\MD5\MD5.dll -mdll [...] dll.exp
  ...

> objdump -p blib\arch\auto\Digest\MD5\MD5.dll

PE File Base Relocations (interpreted .reloc section contents)

Virtual Address: 00001000 Chunk size 88 (0x58) Number of fixups 40
Virtual Address: 00002000 Chunk size 156 (0x9c) Number of fixups 74
Virtual Address: 00003000 Chunk size 356 (0x164) Number of fixups 174
Virtual Address: 00004000 Chunk size 84 (0x54) Number of fixups 38
Virtual Address: 00005000 Chunk size 12 (0xc) Number of fixups 2
Virtual Address: 00006000 Chunk size 16 (0x10) Number of fixups 4
Virtual Address: 0000a000 Chunk size 16 (0x10) Number of fixups 4
Virtual Address: 0000b000 Chunk size 16 (0x10) Number of fixups 4
Virtual Address: 00001000 Chunk size 84 (0x54) Number of fixups 38
Virtual Address: 00002000 Chunk size 156 (0x9c) Number of fixups 74
Virtual Address: 00003000 Chunk size 356 (0x164) Number of fixups 174
Virtual Address: 00004000 Chunk size 84 (0x54) Number of fixups 38
Virtual Address: 00005000 Chunk size 12 (0xc) Number of fixups 2
Virtual Address: 00006000 Chunk size 16 (0x10) Number of fixups 4
Virtual Address: 0000a000 Chunk size 16 (0x10) Number of fixups 4
Virtual Address: 0000b000 Chunk size 16 (0x10) Number of fixups 4

If we remove the use of --base-file and simplify our compilation steps, we end up with a correct DLL containing only one set of relocations:

> ...
  dlltool --def MD5.def --output-exp dll.exp
  g++ -o blib\arch\auto\Digest\MD5\MD5.dll -mdll [...] dll.exp
  ...

> objdump -p blib\arch\auto\Digest\MD5\MD5.dll

PE File Base Relocations (interpreted .reloc section contents)

Virtual Address: 00001000 Chunk size 84 (0x54) Number of fixups 38
Virtual Address: 00002000 Chunk size 156 (0x9c) Number of fixups 74
Virtual Address: 00003000 Chunk size 356 (0x164) Number of fixups 174
Virtual Address: 00004000 Chunk size 84 (0x54) Number of fixups 38
Virtual Address: 00005000 Chunk size 12 (0xc) Number of fixups 2
Virtual Address: 00006000 Chunk size 16 (0x10) Number of fixups 4
Virtual Address: 0000a000 Chunk size 16 (0x10) Number of fixups 4
Virtual Address: 0000b000 Chunk size 16 (0x10) Number of fixups 4
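
Comparing listings like these by eye gets tedious for larger DLLs. As a rough sketch, a small C helper could scan text captured from objdump -p and count relocation blocks whose virtual address has already appeared. The function name, the MAX_BLOCKS limit, and the assumption that each block line starts with "Virtual Address:" followed by a hexadecimal number (as in the listings above) are mine and not part of any tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_BLOCKS 1024 /* arbitrary limit for this sketch */

/* Count "Virtual Address:" lines whose address was already seen on an
 * earlier line of the same text. Returns 0 for a clean listing. */
size_t count_repeated_reloc_blocks(const char *text) {
    unsigned int seen[MAX_BLOCKS];
    size_t seen_count = 0;
    size_t repeats = 0;

    assert(text != NULL);
    const char *line = text;
    while (line != NULL && *line != '\0') {
        const char *p = line;
        unsigned int address;

        /* Skip indentation, but never cross into the next line. */
        while (*p == ' ' || *p == '\t') {
            p++;
        }
        if (sscanf(p, "Virtual Address: %x", &address) == 1) {
            int found = 0;
            for (size_t i = 0; i < seen_count; i++) {
                if (seen[i] == address) {
                    found = 1;
                    break;
                }
            }
            if (found) {
                repeats++;
            } else if (seen_count < MAX_BLOCKS) {
                seen[seen_count++] = address;
            }
        }
        /* Advance to the start of the next line, if there is one. */
        const char *newline = strchr(line, '\n');
        line = (newline != NULL) ? newline + 1 : NULL;
    }
    return repeats;
}
```

Any non-zero result suggests the DLL carries more than one relocation block for the same page and is worth a closer look.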

Summary

This is one of those rare bugs that is unlikely to affect most people, but for those it does affect, it is a complete show-stopper.

ActivePerl 5.20.3 and 5.22.1 include fixes for this problem, and we have made sure none of the precompiled modules available through PPM are affected. We expect the patches to be integrated into upstream Perl soon.

More Info

https://rt.cpan.org/Public/Bug/Display.html?id=78395

https://rt.cpan.org/Public/Bug/Display.html?id=103782

http://www.transmissionzero.co.uk/computing/advanced-mingw-dll-topics/