12. Native Images
.NET assemblies are just-in-time compiled to native machine code. This occurs on a method-by-method basis just before the method is called. This compilation takes time, and it also means that the generated code lives in writable memory, so it cannot be shared between processes and it raises the memory usage of the machine. You can pre-JIT an assembly, that is, the JIT compilation can be performed over an entire assembly so that the .NET assembly loads faster and memory is used more efficiently. Such a pre-JITted assembly contains native code and is known as a native image.
12.1 Loading an Assembly
When your code uses an assembly it indicates to the runtime the name of the assembly and the method to call. As you know, Fusion will locate and load that assembly: it may already be in memory, or it may be somewhere on disk or on the network. If the requested method has not been called before, the runtime will just-in-time compile it, that is, it will go through the IL and compile it to native code. This native code will be cached in memory, so that the next time the method is called the native code will be used.
Thus, the runtime is not an interpreter, because the JIT compilation of a method occurs just once for a particular instance of a process. Note, however, that JIT compilation is on a method-by-method basis: it does not JIT compile the entire assembly. The runtime does make intelligent decisions, though: if the method being compiled calls other methods, the JIT compiler will determine the size of those methods and if they are small it may decide to inline them, that is, copy the IL from those called methods into the method being compiled. I explained earlier in this workshop that you can tell the runtime not to inline methods; however, this is only useful for debuggers.
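Setting inlining aside, the compile-once-per-process behaviour can be modelled in a few lines of Python. This is only a toy sketch, not how the CLR is implemented: the class ToyRuntime, its jit_compile method and the "native(...)" strings are hypothetical stand-ins that I have made up for illustration.

```python
class ToyRuntime:
    """Toy model of per-method JIT compilation with an in-memory cache."""

    def __init__(self):
        self.native_cache = {}   # method name -> stand-in for native code
        self.compile_count = 0   # how many times the "JIT compiler" ran

    def jit_compile(self, il):
        # Stand-in for real compilation: just wrap the IL text.
        self.compile_count += 1
        return "native(" + il + ")"

    def call(self, method_name, il):
        # Compile only on the first call; later calls reuse the cached code.
        if method_name not in self.native_cache:
            self.native_cache[method_name] = self.jit_compile(il)
        return self.native_cache[method_name]


runtime = ToyRuntime()
runtime.call("GetVersion", "IL for GetVersion")
runtime.call("GetVersion", "IL for GetVersion")
print(runtime.compile_count)  # the method was compiled once, not twice
```

The cache lives inside the ToyRuntime object, which mirrors the point that the compiled code belongs to one process instance: a second process would start with an empty cache and compile the method again.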
JIT compilation takes time; however, it is not the only time-consuming activity that occurs when an assembly is loaded: there is a lot of security work being performed. I will cover this work in detail in the security workshop, but here I will outline what happens. First, if the assembly is private and has a strong name, the runtime will perform strong name validation, that is, the runtime will create a hash of the assembly and then compare this with the value obtained by decrypting the strong name signature with the embedded public key. If the two hashes are not the same, the runtime will not load the assembly. If the assembly is installed in the GAC this step is omitted because it will have been performed when the assembly was first installed into the GAC, and the runtime assumes that the GAC is a secure location so nothing can have happened to the assembly since installation.
The runtime will now load the assembly in memory and perform PE file validation. Typically Trojans work by changing your code to call their code, but they could also alter internal tables in the PE file to exploit vulnerabilities in the operating system to allow their code to be called. The runtime will check that the PE file containing the assembly is valid, that is, it will check things like unmanaged resources and the table of imported DLLs. These checks ensure that PE file addresses are within the correct range.
Once the PE tables have been verified, the runtime will validate the metadata that is contained within the assembly. This metadata gives information about the types implemented in the assembly, but it also gives information about external types that are used. Metadata is stored in tables, similar to a relational database and an entry in one table may reference an entry in another table. The runtime checks to make sure that such references are valid, that is, the correct table is referenced and the index of the item is within the bounds of the table. For example, the table describing methods will have a Relative Virtual Address (RVA) of the position in the file that contains the method's IL. The runtime checks to make sure that these addresses are held within the part of the file where .NET code resides. The runtime will also inspect the metadata to determine if it refers to a valid .NET type, that is, the type follows the rules of the .NET type system.
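The two kinds of check described above (a table reference must point at an existing row, and a method's RVA must lie inside the region that holds .NET code) can be sketched in Python. This is a simplified illustration under my own assumptions, not the runtime's actual algorithm: the function names, the toy tables dictionary and the RVA values are hypothetical, and I assume that row numbers are 1-based.

```python
def reference_is_valid(tables, table_name, row):
    """A reference is valid if the table exists and the 1-based row is in range."""
    rows = tables.get(table_name)
    return rows is not None and 1 <= row <= len(rows)


def rva_is_valid(rva, code_start, code_size):
    """A method's RVA must fall inside the region that holds .NET code."""
    return code_start <= rva < code_start + code_size


# Toy metadata: one real type with two methods (RVA values are invented).
tables = {
    "TypeDef": ["<Module>", "LibraryCode"],
    "MethodDef": [
        {"name": "GetVersion", "rva": 0x2050},
        {"name": ".ctor", "rva": 0x2068},
    ],
}

print(reference_is_valid(tables, "MethodDef", 2))   # row exists
print(reference_is_valid(tables, "MethodDef", 3))   # index out of bounds
print(rva_is_valid(0x2050, 0x2000, 0x1000))         # inside the code region
```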
At this point the runtime knows that the metadata is correct so it uses some of the metadata that it has validated: the minimum permissions requested by the assembly. First, the runtime determines the security permissions (if any) that the assembly is granted by the machine's security policy. That is, it gathers evidence about the assembly and uses the security policy to obtain the permissions (see the security workshop for more details) that the assembly will be granted. The security policy defines code groups which are collections of permissions. The policy maps evidence to code groups, and so an assembly will be granted zero or more permissions collated from all of the code groups that policy says the assembly is a member of. Permissions allow an assembly to perform some action, and the runtime library will check that an assembly has the necessary permissions before it will execute code requested by that assembly. An assembly can indicate the permissions that it will require to have before it can perform its work, and this will be stored as metadata. So once the runtime has determined the permissions that the assembly is granted (a permission set) it will then check that the required permissions are in the permission set. If not, then the assembly clearly cannot run, so the assembly will not be loaded.
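The grant-then-check logic can be sketched with Python sets. This is a toy model under stated assumptions: the code groups, the evidence key "zone" and the permission names are hypothetical, and the sketch ignores everything the security workshop covers in detail (such as how real policy is organised). It only shows the idea that the granted set is collated from every matching code group and that the required permissions must be a subset of it.

```python
# Hypothetical policy: each code group pairs a membership test on the
# evidence with the permissions that the group contributes.
CODE_GROUPS = [
    (lambda ev: True, {"Execution"}),
    (lambda ev: ev.get("zone") == "MyComputer", {"FileIO", "UnmanagedCode"}),
    (lambda ev: ev.get("zone") == "Internet", {"IsolatedStorage"}),
]


def granted_permissions(evidence):
    """Union of the permissions of every code group the evidence matches."""
    granted = set()
    for is_member, permissions in CODE_GROUPS:
        if is_member(evidence):
            granted |= permissions
    return granted


def can_load(evidence, minimum_required):
    """The assembly loads only if every required permission is granted."""
    return minimum_required <= granted_permissions(evidence)


local = {"zone": "MyComputer"}
remote = {"zone": "Internet"}
print(can_load(local, {"FileIO"}))    # True: FileIO is in the granted set
print(can_load(remote, {"FileIO"}))   # False: the assembly would not be loaded
```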
Although the runtime knows that the metadata is correct, it does not know whether the IL for the methods defined in the assembly is correct. So the next action performed by the runtime is to validate and verify the IL in the assembly. To do this, the runtime walks through all the IL in every method, following every branch of the code. The runtime does not set up a stack, so no data will be processed; instead, it follows every code path and inspects the IL opcodes in each path. IL validation involves checking that the opcodes are valid, that is, it checks that the collection of bytes for each opcode is a valid sequence. During validation the runtime will also check jump opcodes to make sure that they jump within the method. Invalid IL is a symptom of assembly corruption, or of a broken compiler.
Next, the JIT compiler verifies the IL. This is not an exact science, because the JIT compiler has to verify that the code is performing safe operations. Although it is possible to determine when code is performing unsafe operations (for example, calling native code) it is impossible to be 100% sure that code is safe. The JIT compiler takes a conservative attitude and may fail to verify code that is safe. However, an assembly that has code that has not been verified can still be run as long as the assembly has been granted permission to execute unverified code. By default, code that has been installed on the computer has this permission. If an assembly is known to have unsafe code (for example, most code generated by the Managed C++ compiler) then the assembly can have metadata that indicates that this verification step should be skipped.
Now that you know how assemblies are loaded, let's see how native images change this.
12.2 Ngened Assemblies in .NET v1.1
In the previous section you can see that a lot of steps are performed before a method can be executed. If your application has lots of assemblies then these checks are performed for each assembly. These checks are important, because security is the most important aspect of your code. If your code is not secure then there really is no point in writing it.
In version 1.0 and 1.1 of the runtime Microsoft provided a tool called
ngen.exe (Native Image Generator). This tool performs the JIT compilation
step on all methods in an assembly and saves the result as a native image
file.
This image is stored in a location called the native image cache so that
the runtime can use it whenever the original IL assembly is called. The
idea of the native image is to replace the JIT compilation step, but it also has
some deeper ramifications. Although the native image is used as a replacement
for the IL assembly, the IL assembly must still exist. When an
assembly is loaded, the runtime will perform all the initialization steps as
outlined above, but when it comes to JIT compile methods in that assembly it
will first check to see if there is a native image in the cache and if so, this
will be loaded and the JIT compilation step will be omitted. The documentation
is not clear about exactly when the runtime will load the
native image. After all, when the native image is created the metadata and IL
must be validated and the IL must be verified, and since the native image has
neither IL nor metadata (well, it actually has a
little bit of metadata) it means that
these steps cannot be performed on the native image. However, since these steps
will have been performed on the original assembly before the native image was
generated
and the native image is stored in a secure place on the hard disk, there does
not appear to be a reason for these steps to be performed on the original
assembly another time. So logically, the native image cache should be checked before the
metadata tables are validated. However, as I have mentioned, the documentation does not say whether
this is the case.
In addition, the IL assembly must be available on the machine in case the runtime finds that the native image is invalid for some reason. JIT compilation is dependent upon many factors: the runtime version, security policy and binding policy are a few. If any of these factors change, the runtime will revert to normal JIT compilation on a method-by-method basis. For example, a developer can provide link demands which indicate that a security check is performed at JIT compile time. Clearly, a link demand will be performed when the native image generator is run and this does not necessarily represent the situation when the native image assembly is run. Indeed, if the security policy has changed by the time the assembly is run (and it is not a superset of the policy in force when the native image generator was run) then the runtime will ignore the native image assembly and instead it will load the IL assembly as normal. Native images have a very small amount of metadata, but nothing for the types defined in the assembly, so if your code, or some other code, uses reflection on your types then the metadata has to be available, and this means the IL assembly. (This has changed in .NET 3.0/2.0.)
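The fall-back decision can be pictured as a comparison between the conditions recorded when the native image was generated and the conditions in force now. The Python sketch below is a toy model, not the runtime's real check: the function name, the dictionary keys and the permission names are all hypothetical, and the only rules it encodes are the ones stated above (same runtime version, same binding policy, and a current policy that is a superset of the policy at generation time).

```python
def native_image_usable(recorded, current):
    """Decide whether a cached native image may replace JIT compilation.

    Both arguments are dicts with the hypothetical keys 'runtime_version',
    'binding_policy' and 'permissions' (a set of permission names).
    """
    if recorded["runtime_version"] != current["runtime_version"]:
        return False
    if recorded["binding_policy"] != current["binding_policy"]:
        return False
    # The current grant must be a superset of the grant at ngen time.
    return current["permissions"] >= recorded["permissions"]


at_ngen_time = {
    "runtime_version": "v1.1.4322",
    "binding_policy": "default",
    "permissions": {"Execution", "FileIO"},
}
more = dict(at_ngen_time, permissions={"Execution", "FileIO", "UI"})
less = dict(at_ngen_time, permissions={"Execution"})

print(native_image_usable(at_ngen_time, dict(at_ngen_time)))  # True
print(native_image_usable(at_ngen_time, more))                # True
print(native_image_usable(at_ngen_time, less))                # False
```

When the function returns False the runtime, in this model, would simply load the IL assembly and JIT compile as usual, which is why the IL assembly has to remain on the machine.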
JIT compiling methods as they are used has a less than obvious downside. In unmanaged DLLs exported functions are marked in the exports table and if the DLL is loaded at the library's preferred load address then the addresses stored in this table can be used by calling modules. If the DLL is loaded at a different address then the operating system has to change - or fix up - these addresses. This fix up operation takes time and, since these tables will change, they have to be in writable memory pages. Read-only pages are sharable between processes, so once a DLL has been loaded other processes that use these pages will load quicker and the memory usage of the system is reduced. Writable pages are specific to a process, so if another process uses the same DLL it will have its own writable pages, and this increases the overall memory usage of the system.
The same general issues occur with .NET assemblies. Since .NET methods are JIT compiled, the generated code is created at run time, which means that it must be in writable, non-sharable memory pages. This increases the memory footprint of the process. When you create a native image the code that is generated will be loaded into read-only, sharable pages. Thus another bonus of creating a native image is a reduced memory footprint.
Note that the native image generated is essentially the output of the JIT compiler and this output is cached for later use. However, even though this cache looks like a 'central repository' it is not a mechanism to share assemblies. That is the purpose of the GAC. The JIT compiler will compile both process and library methods, and the same is true of the native image generator. So native images can be generated from process assemblies as well as library assemblies; this is in contrast to the GAC, which can only contain libraries.
Another problem with native images is that they cannot be shared across application domains. This means that if your application has more than one application domain then you cannot use native images. The most obvious example of an application with multiple application domains is the ASP.NET worker process: you cannot use native images in ASP.NET 1.1. (This has changed in .NET 3.0/2.0.)
Although the native image generator, ngen, was documented by Microsoft in version 1.0 and 1.1
of the runtime, they hardly encouraged people to use it. This was deliberate on
Microsoft's part. Microsoft created native images of the framework assemblies,
but since they controlled the policies that would apply to their assemblies, and
they controlled the issuing of service packs and hotfixes, the deficiencies of
native images, just outlined, do not affect their assemblies. The problem, of
course, is that
you do not have such control over your assemblies, so Microsoft decided not to
encourage you to use the native image generator. In .NET 3.0/2.0, Microsoft are more
confident about native images and now they do encourage you to use them, or
rather, they encourage you to perform performance tests to see if native images
improve your application.
12.3 Contents of Native Images in .NET 1.1
When a native image is generated the ngen tool will place it in
the native image cache. Information like the IL and metadata has been stripped,
so there is no way that the runtime can perform validation and verification -
those checks have to be performed on the IL assembly. The runtime assumes that
the native image is the output of the JIT compiler, and this assumption can be
accepted if the native image is stored in a secure place so that rogue code
cannot alter the generated native code. The Fusion namespace extension in Windows Explorer will list native
images as if they are in the GAC, and this is partially correct. The native
image cache is in a folder under %windir%\assembly. But the images
are not necessarily shared like GAC libraries are.
Native images are created with ngen.exe. This tool will install and
uninstall assemblies. It can also show the native images that have been
generated. Try this:
This will show all the native images in the native image cache. You will find that most of the framework libraries will have native images.
.NET Version 3.0: The new version shows two groups of assemblies, Native Images and NGEN Roots. An NGEN Root is an assembly that uses other assemblies and this can be a process or a library. A Native Image is just a generic term, and it includes native image roots.
Now take the library,
process and
key file from the previous pages. First,
make sure that the library is not in the GAC (gacutil -u lib) and
make sure that a configuration file does not exist in the folder. Now compile
the library with a strong name and compile the process that uses the library.
Run the application to confirm that it picks up the assembly from the
application folder. Next generate a native image.
Run the process. You'll see that the code base of the library is the same as before - the application folder. As I mentioned earlier, the native image cache is not a mechanism to share code. In this case the library is a private assembly irrespective of whether you have generated a native image. A native image can also be generated for the process, but in this case you should also generate the native images of the libraries it uses, if you don't, then those libraries will use JIT compilation and so you lose the effect of using native images. To try this out, delete the library native image and then generate the native image for the process and library:
ngen app.exe "lib, Version=1.0.0.0, Culture=neutral, PublicKeyToken=3bf941bb1f722efe"
It is important that you give the full name of the library so that the JIT compiler can determine if there are any publisher policy files in force, which may change the version of the library used by the process, and hence the image that is generated.
.NET Version 3.0: If you pass an assembly - a process or a library - to the new version of
ngen the tool will create a native image of this assembly (the root) and
all the libraries it uses.
So where is the native image cache? Well, this is the so-called Zap cache
which you can enumerate with the code given in
Example 10.3. Also, you can get the absolute
path of this cache by calling the
GetCachePath function, which is part of the unmanaged Fusion API. However, it is far simpler just to peek in the assembly
cache folder. Move to this folder under %windir% (pushd %windir%\assembly),
and list the folder contents. You'll find a folder with a name something like
NativeImages1_v1.1.4322 (.NET version 3.0/2.0 has a folder called
NativeImages_v2.0.50727_32). Change to this folder and list its contents.
You'll find a list of the short names of the native image assemblies that have been
generated including the app and lib assemblies
that you just added. Change directory
to the lib folder. There you'll find another folder with a name that
is composed of the version, culture and public key token, similar to how the GAC
stores libraries. Change to this directory and here you'll find the native image
of the library.
.NET Version 3.0: .NET version 3.0/2.0 has a slightly different scheme. The main folder is called
NativeImages_v2.0.50727_32 and again there are folders with the short
names of the assembly native images that have been cached. Under each is a
folder with the name of a 16 byte hex number: a GUID that identifies
the native image.
Within that folder is the cached native image, but note that the runtime does
not use the original name. If the original file was lib.dll the
native image file is called lib.ni.dll. This means that if you
obtain the list of modules loaded by an application (for example, type
tasklist /m /fi "IMAGENAME eq app.exe" where app.exe is the
process built with .NET version 3.0/2.0) you'll be able to identify the native image files
used by the process by their name,
and this includes the framework libraries (for example mscorlib.ni.dll).
Now run dumpbin on this assembly:
Take a look at the data directories (at the end of the Optional Header
Values). There you'll see that the COM
Descriptor Directory has a value. This means that the file is managed!
Write down the RVA of this item (on my machine this is
0x2008), and then write down the virtual and raw addresses of the .text section (on
my machine these are 0x2000 and 0x200), so that you
can convert RVAs to raw addresses.
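The conversion itself is simple arithmetic: subtract the virtual address of the section that contains the RVA and add the section's raw address. Here is a minimal Python sketch using the values from my machine; the function name rva_to_raw is my own, and the sketch assumes that the RVA really does lie inside the section whose addresses you pass in.

```python
def rva_to_raw(rva, section_virtual_address, section_raw_address):
    """Convert an RVA inside a section to an offset in the file on disk."""
    return rva - section_virtual_address + section_raw_address


TEXT_VIRTUAL = 0x2000   # virtual address of .text reported by dumpbin
TEXT_RAW = 0x200        # raw (file) address of .text reported by dumpbin

# The COM Descriptor Directory (the CLR header) at RVA 0x2008:
print(hex(rva_to_raw(0x2008, TEXT_VIRTUAL, TEXT_RAW)))  # 0x208
```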
We need to list the contents of this file.
ILDASM is an obvious choice; however, you'll find that this will only
list the manifest. Further, the advanced option (/adv)
allows you to view the various CLR headers in an assembly (the COR Header option on the View menu).
However, if you try this option on the native image file then you'll
find that ILDASM will hang. Instead, use dumpbin to
list the CLR header:
On my machine I get this:
File Type: DLL
clr Header:
48 cb
2.00 runtime version
2210 [ 43C] RVA [size] of MetaData Directory
6 flags
0 entry point token
0 [ 0] RVA [size] of Resources Directory
0 [ 0] RVA [size] of StrongNameSignature Directory
0 [ 0] RVA [size] of CodeManagerTable Directory
0 [ 0] RVA [size] of VTableFixups Directory
0 [ 0] RVA [size] of ExportAddressTableJumps Directory
Summary
2000 .data
2000 .reloc
2000 .text
This does not give you much information other than there is metadata in the
assembly. Now load the assembly in a hex viewer (like Visual Studio). Since
Windows Explorer uses the Fusion namespace extension you'll not be able to
use the Open File dialog in Visual Studio. The simplest way to change
this is to temporarily change the name of the desktop.ini file in %windir%\assembly:
attrib desktop.ini -s -h -r
rename desktop.ini desktop.ini.old
After you have loaded the file, undo the changes:
rename desktop.ini.old desktop.ini
attrib desktop.ini +s +h +r
popd
The first
thing you'll need to do is convert the RVA of the metadata to a raw address. To do this, subtract
the virtual address of the .text section you recorded earlier from
the RVA and then add the raw address of the .text section. In my
case this gives 0x410. Move to this location and investigate the
values. Here is what I get:
0410 42 53 4a 42 01 00 01 00 00 00 00 00 0c 00 00 00 BSJB............
0420 76 31 2e 31 2e 34 33 32 32 00 00 00 00 00 04 00 v1.1.4322.......
0430 60 00 00 00 78 00 00 00 23 7e 00 00 d8 00 00 00 `...x...#~......
0440 18 00 00 00 23 53 74 72 69 6e 67 73 00 00 00 00 ....#Strings....
0450 f0 00 00 00 10 00 00 00 23 47 55 49 44 00 00 00 ........#GUID...
0460 00 01 00 00 3c 03 00 00 23 42 6c 6f 62 00 00 00 ....<...#Blob...
0470 00 00 00 00 01 00 00 01 05 40 00 00 09 00 00 00 .........@......
0480 00 fa 01 33 00 02 00 00 01 00 00 00 01 00 00 00 ...3............
0490 01 00 00 00 01 00 00 00 01 00 00 00 00 00 00 00 ................
04a0 01 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 ................
04b0 01 00 01 00 0b 00 06 00 ac 00 04 80 00 00 01 00 ................
04c0 00 00 00 00 00 00 01 00 00 00 01 00 0a 00 00 00 ................
04d0 01 00 00 00 88 13 00 00 00 00 00 00 a3 00 0e 00 ................
04e0 00 00 00 00 00 00 00 00 00 3c 4d 6f 64 75 6c 65 .........<Module
04f0 3e 00 6c 69 62 00 6d 73 63 6f 72 6c 69 62 00 00 >.lib.mscorlib..
0500 3a 2e 71 a6 46 08 0b 4c 80 db 00 a6 8b 9c 18 a4 :.q.F..L........
You can use the .NET
ECMA spec to decode all of this. I won't go into too many
details, I will just identify the basics. Location 0x410 is the
start of the Metadata Root (ECMA Spec, Partition II, 23.2.1). It is
followed, at location 0x430, by an array of Stream Headers
(23.2.2), each header has the offset of the stream from the start of the
metadata root, its size, and then the name of the stream (a variable
length string): #~ has values for the metadata tables,
#Strings has values of the string name of types and members, #GUID
has associated GUIDs and #Blob has raw data used by the runtime.
(Another stream, not present here, is #US which has user strings, that
is, string literals in your code.) The #~ stream is at offset 0x60
and its size is just 0x78 bytes. This corresponds to a location of 0x470 (0x410
+ 0x60). The metadata stream is described by the ECMA Spec in section 23.2.6
which shows that the eight bytes at 0x478 are a bitmap where each bit corresponds to a
metadata table. In this case the value indicates that the tables present are
the tables with the following indexes: 0x0, 0x2, 0xe,
0x20 and 0x23. These are the Module
(21.27), TypeDef (21.34), DeclSecurity (21.11),
Assembly (21.2) and AssemblyRef (21.5) tables. Again, I will spare
you the details of decoding these tables, but basically there is just one item
in each table. At first sight these values are understandable; after all, the lib
library has just one type. However, I can tell you that I have performed this
analysis on some of the framework native images and find the same values: there
is just one type defined in every assembly. Furthermore, the type defined in
lib, LibraryCode, has two
methods: GetVersion and its constructor (created by the compiler).
These methods should be described in a MethodDef table (table index
0x6, 21.24 of the spec) but this code has no such table.
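If you would rather not decode the bitmap by hand, the following Python sketch does it for the bytes at 0x470 to 0x49b in the dump above. It is only an illustration under the layout I have just described: I assume that the bitmap is a little-endian 64-bit value at offset 8 of the #~ stream, that it is followed by another 8 bytes that I skip, and that the row counts (one 4-byte value per table present) start at offset 24. The TABLE_NAMES dictionary lists only the tables mentioned in the text.

```python
import struct

# The first 44 bytes of the #~ stream (file offsets 0x470 to 0x49b).
stream = bytes.fromhex(
    "00000000 01000001 05400000 09000000 "
    "00fa0133 00020000 01000000 01000000 "
    "01000000 01000000 01000000"
)

TABLE_NAMES = {
    0x00: "Module", 0x02: "TypeDef", 0x06: "MethodDef",
    0x0E: "DeclSecurity", 0x20: "Assembly", 0x23: "AssemblyRef",
}

# Eight-byte bitmap: bit n is set when table n is present.
(valid,) = struct.unpack_from("<Q", stream, 8)
present = [n for n in range(64) if (valid >> n) & 1]

# One 4-byte row count per table present, in table order.
row_counts = struct.unpack_from("<%dI" % len(present), stream, 24)

for index, rows in zip(present, row_counts):
    print(hex(index), TABLE_NAMES.get(index, "?"), rows)
```

On the bytes shown this lists the five tables 0x0, 0x2, 0xe, 0x20 and 0x23 with one row each, and bit 0x6 (MethodDef) is not set.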
The names of the types defined in the assembly should be listed in the
#Strings stream. This stream starts at 0x4e8 and its size is
0x18 bytes. Here's the relevant data:
04f0 3e 00 6c 69 62 00 6d 73 63 6f 72 6c 69 62 00 00 >.lib.mscorlib..
0500 3a 2e 71 a6 46 08 0b 4c 80 db 00 a6 8b 9c 18 a4 :.q.F..L........
As you can see, there are
just three strings, <Module>, lib and mscorlib.
lib is the name of the module (and hence the only entry in the
Module table), mscorlib is the name of an assembly
that is referenced (and hence the only entry in the AssemblyRef
table), so the remaining string is the name of the only type that is defined in
the assembly. LibraryCode does not exist because it has been
compiled to native code and so its metadata does not exist in this file.
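You can list the strings without reading the ASCII column by splitting the stream on its null terminators. The Python sketch below uses the 0x18 bytes from 0x4e8 to 0x4ff in the dump. Treat it as a simplification: splitting on nulls is good enough for this small stream, but in general the metadata tables refer to strings by their offset into the stream, so this is not a general decoder.

```python
# The 0x18 bytes of the #Strings stream (file offsets 0x4e8 to 0x4ff).
heap = bytes.fromhex(
    "00 3c 4d 6f 64 75 6c 65 "
    "3e 00 6c 69 62 00 6d 73 63 6f 72 6c 69 62 00 00"
)

# Strings are null terminated; drop the empty pieces left by the
# leading null byte and the trailing padding.
names = [piece.decode("utf-8") for piece in heap.split(b"\x00") if piece]

print(len(heap) == 0x18)   # True: the size matches the stream header
print(names)               # ['<Module>', 'lib', 'mscorlib']
```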
Now check the CLR header (24.3.3). Dumpbin prints out some
values, but not all of the values in this structure. You recorded the location of this
structure earlier on, and on my machine I get a value of
0x2008 for the RVA, which corresponds to a raw address of 0x208:
0210 10 22 00 00 3c 04 00 00 06 00 00 00 00 00 00 00 ."..<...........
0220 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0240 00 00 00 00 00 00 00 00 50 20 00 00 40 00 00 00 ........P ..@...
The interesting entry is at 0x248, this is the RVA and size of
the ManagedNativeHeader item (RVA 0x2050, size 0x40).
The name of this data directory implies that it is the location of the native
image code, but unfortunately,
the ECMA spec does not document this structure so we can go no further with this
analysis.
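To see where the value at 0x248 comes from, the part of the header shown in the dump can be unpacked with a short Python sketch. It rests on assumptions that you should check against the spec: I take the fields in the order that dumpbin prints them, with ManagedNativeHeader as the final entry, and I treat each directory as a little-endian 4-byte RVA followed by a 4-byte size. The dump starts at 0x210, so the first 8 bytes of the header (the cb and runtime version values) are not included and the offsets below are relative to 0x210.

```python
import struct

# File offsets 0x210 to 0x24f: the CLR header from MetaData onwards.
dumped = bytes.fromhex(
    "10220000 3c040000 06000000 00000000"
    + "00" * 32
    + "00000000 00000000 50200000 40000000"
)

# (name, offset relative to 0x210) for each RVA/size pair.
DIRECTORIES = [
    ("MetaData", 0),
    ("Resources", 16),
    ("StrongNameSignature", 24),
    ("CodeManagerTable", 32),
    ("VTableFixups", 40),
    ("ExportAddressTableJumps", 48),
    ("ManagedNativeHeader", 56),
]

header = {name: struct.unpack_from("<II", dumped, offset)
          for name, offset in DIRECTORIES}
(flags,) = struct.unpack_from("<I", dumped, 8)

for name, (rva, size) in header.items():
    print("%-24s RVA %#x size %#x" % (name, rva, size))
print("flags", flags)
```

The last directory sits at 0x210 + 56 = 0x248, which is the entry picked out above, and it unpacks to RVA 0x2050 and size 0x40.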
.NET Version 3.0: The version of dumpbin supplied with Visual Studio 2005 gives all
of the values of the CLR Header, including ManagedNativeHeader.
However, there is still no information about the data that it points to.
Finally, load the library again in ILDASM. Take a look at the manifest.
This indicates that there is a code access
security permission set (of the type prejitgrant) which indicates
that verification should be skipped for this prejitted code. (This makes sense
because the code will have been verified when the native image was generated and
anyway, verification of native code will fail.) There is also the public key
which you gave to the IL assembly.
Close down ILDASM, and return to the original folder (popd).
Remove the prejitted application with:
12.4 CLR Optimization Service in .NET 3.0/2.0
The last section identified that there is an inherent brittleness in native images: a native image is heavily dependent upon the OS version and local settings like the security policy and binding configuration. If any of these things change, the native image becomes invalid. Of course, if you know that something has changed that could affect the native image you can always rebuild the affected images. Microsoft decided to make this rebuilding easier in .NET version 3.0/2.0.
Again, use the library, process and key file that you used in the last section. Compile the library with the .NET 3.0/2.0 compiler and then compile the process. Next, generate a native image for the process:
Notice the new syntax: you provide a command followed by the name of the assembly. If that assembly uses other assemblies then their native images will be generated too:
Copyright (C) Microsoft Corporation 1998-2002. All rights reserved.
Installing assembly C:\TestFolder\Fusion\2.0.50727\12.3\app.exe
Compiling 2 assemblies:
Compiling assembly C:\TestFolder\Fusion\2.0.50727\12.3\app.exe ...
app, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null
Compiling assembly lib, Version=1.0.0.0, Culture=neutral, PublicKeyToken=3bf941bb1f722efe ...
lib, Version=1.0.0.0, Culture=neutral, PublicKeyToken=3bf941bb1f722efe
Notice that ngen first determines the libraries used by this
assembly and then it generates the native images. The tool determines the
libraries used by the assembly by reading its metadata. ngen will
not generate native images for assemblies you dynamically load using one
of the Assembly methods. Now list the
assemblies in the native image cache, the new syntax is this:
I have piped the output through more because there is so much
output. Notice that it gives two lists, NGEN Roots and Native Images.
The former lists the top level assemblies in the cache, and you will find that
app.exe will be listed in this group (note that the path to
this assembly will be given). Native Images is a general list of the
native images and in this list you'll find app and lib
(in this case, the full name of the assemblies are given). Now ask ngen to
display a filtered list of the cache:
This will show that app is a root and it will show that the
root, app, depends on, erm, app. OK, so this last
statement seems to be a tautology, but if you try this action on lib,
you'll see that it makes more sense: lib is not a root, but one
root, app, depends on it. This shows that information is stored
about the roots and the libraries they depend on, a mechanism called tracking.
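One way to picture tracking is as a table that maps each root to the native images it depends on, where a root always depends on itself. The Python sketch below is purely a mental model of the output you have just seen: the dictionary and the two function names are hypothetical, and nothing here reflects how ngen actually stores this information.

```python
# Hypothetical tracking table: root -> native images it depends on.
roots = {"app": {"app", "lib"}}


def is_root(image):
    """An image is a root if a native image was explicitly requested for it."""
    return image in roots


def dependent_roots(image):
    """List the roots that depend on the given native image."""
    return sorted(root for root, deps in roots.items() if image in deps)


print(is_root("app"), dependent_roots("app"))   # True ['app']
print(is_root("lib"), dependent_roots("lib"))   # False ['app']
```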
Now let's take a look in the native images. Move to the assembly cache (pushd %windir%\assembly)
and then move to the native image cache (on my machine for .NET version 3.0 this is
called NativeImages_v2.0.50727_32). Move to the folder for the
library (cd lib) and there you'll find a folder that has the name
of a 16 byte hex number. Change the directory to that folder. The folder will
contain a single file called lib.ni.dll. Run dumpbin
on this file to list the headers.
The first difference you notice is that the file has many more sections than
a normal Win32 DLL or a .NET library assembly. A library assembly will normally
have three sections: .reloc (relocation information), .rsrc
(unmanaged resources) and .text (the location of the IL and
metadata). The native image has eight sections, the new sections are:
.data (global variables), .dbgmap, .extrel,
.il and .xdata (unwind information for native SEH
exceptions). Three of these are new section types. Of particular interest is the
.il section, this section is almost as large as the .text
section for this file (with good reason, as you'll see later). The .il section is marked as containing read-only code,
whereas the .text section is marked as being read/execute code. So
the data in the .il section is meant to be read, but not executed.
Even more interesting is the .data section which is read/write
initialized data and is larger than either the .il or .text
sections. I do not know what this section is used for.
Write down the virtual address and the raw address of the .text
and .il sections because you'll use them later. Now use dumpbin to list the CLR
header in the file:
Write down the RVA of the Metadata Directory. Also, notice that
dumpbin lists the undocumented ManagedNativeHeader
directory. Now load the library in the hex editor of Visual Studio (you'll have to
disable the namespace extension using the steps I outlined
earlier). Move to the location
indicated by the Metadata Directory, you'll need to convert from an RVA
to the raw address. On my machine the RVA is 0x23d0 and the
.text section covers the range 0x2000 to 0x28e2,
so the metadata is in the .text section. Since this section starts
at
the raw address 0x400, it means that the metadata is at 0x7d0 (0x400
+ 0x23d0 - 0x2000):
07e0 76 32 2e 30 2e 35 30 37 32 37 00 00 00 00 05 00 v2.0.50727......
07f0 6c 00 00 00 8c 00 00 00 23 7e 00 00 f8 00 00 00 l.......#~......
0800 18 00 00 00 23 53 74 72 69 6e 67 73 00 00 00 00 ....#Strings....
0810 10 01 00 00 08 00 00 00 23 55 53 00 18 01 00 00 ........#US.....
0820 10 00 00 00 23 47 55 49 44 00 00 00 28 01 00 00 ....#GUID...(...
0830 dc 02 00 00 23 42 6c 6f 62 00 00 00 00 00 00 00 ....#Blob.......
0840 02 00 00 01 05 40 00 00 09 00 00 00 00 fa 01 33 .....@.........3
0850 00 16 00 00 01 00 00 00 01 00 00 00 01 00 00 00 ................
0860 01 00 00 00 02 00 00 00 00 00 00 00 01 00 00 00 ................
0870 00 00 00 00 00 00 01 00 00 00 00 00 01 00 01 00 ................
0880 0b 00 06 00 b5 00 04 80 00 00 01 00 00 00 00 00 ................
0890 00 00 01 00 00 00 13 00 13 00 00 00 02 00 00 00 ................
08a0 00 00 00 00 00 00 00 00 01 00 0a 00 00 00 00 00 ................
08b0 01 00 00 00 00 00 00 00 00 00 00 00 0a 00 13 00 ................
08c0 00 00 00 00 00 00 00 00 00 3c 4d 6f 64 75 6c 65 .........<Module
08d0 3e 00 6d 73 63 6f 72 6c 69 62 00 6c 69 62 00 00 >.mscorlib.lib..
The size of the #~ stream is 0x8c bytes and it starts at offset 0x6c from the start
of the metadata root. This is an address of 0x83c and the 8 bytes at
0x844 give the bitmap of the metadata tables present. As before,
these are the Module, TypeDef, DeclSecurity,
Assembly and AssemblyRef tables. These tables have one entry
each, except for the AssemblyRef table which has two entries (this
is different to .NET 1.1, where the AssemblyRef table had one
entry). Again, I don't want to go into the details of how to calculate what
these entries are, but basically one of these entries is a reference to the mscorlib
library and
the other is to lib; in .NET 1.1 the single entry was to
mscorlib. So the metadata in this native image is very similar to 1.1
native images except that there is this extra assembly reference to the assembly
that was used to create the native image.
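The address arithmetic and the table bitmap described above can be checked with a short script. This is only a sketch: the section addresses and the stream bytes are copied from the dump on my machine, the table numbers are the ones given in the ECMA-335 metadata specification, and the helper names (rva_to_raw, tables_present) are made up for this illustration.

```python
import struct

# Values quoted in the text for this particular native image; yours will differ.
TEXT_VIRTUAL_ADDRESS = 0x2000   # virtual address of the .text section
TEXT_RAW_ADDRESS = 0x400        # raw (file) address of the .text section
METADATA_RVA = 0x23D0           # RVA of the Metadata Directory
TILDE_STREAM_OFFSET = 0x6C      # offset of the #~ stream from the metadata root

# Metadata table numbers from ECMA-335 (only the ones that appear in this workshop).
TABLE_NAMES = {
    0x00: "Module", 0x01: "TypeRef", 0x02: "TypeDef", 0x06: "MethodDef",
    0x0A: "MemberRef", 0x0C: "CustomAttribute", 0x0E: "DeclSecurity",
    0x11: "StandAloneSig", 0x20: "Assembly", 0x23: "AssemblyRef",
}

# The start of the #~ stream, copied from the dump (file offsets 0x83c to 0x867).
TILDE_STREAM = bytes.fromhex(
    "00000000 02000001 "                              # reserved, version, heap sizes
    "05400000 09000000 "                              # Valid: bitmap of tables present
    "00fa0133 00160000 "                              # Sorted: bitmap of sorted tables
    "01000000 01000000 01000000 01000000 02000000"    # one row count per table
)


def rva_to_raw(rva, section_va, section_raw):
    """Convert an RVA inside a section to an offset in the file."""
    return rva - section_va + section_raw


def tables_present(tilde_stream):
    """Return {table name: row count} for a #~ stream header."""
    (valid,) = struct.unpack_from("<Q", tilde_stream, 8)
    numbers = [n for n in range(64) if (valid >> n) & 1]
    # The row counts follow the 24-byte fixed part, one 32-bit value per table.
    counts = struct.unpack_from("<%dI" % len(numbers), tilde_stream, 24)
    return {TABLE_NAMES.get(n, hex(n)): c for n, c in zip(numbers, counts)}


metadata_root = rva_to_raw(METADATA_RVA, TEXT_VIRTUAL_ADDRESS, TEXT_RAW_ADDRESS)
tilde_address = metadata_root + TILDE_STREAM_OFFSET
tables = tables_present(TILDE_STREAM)

print(hex(metadata_root), hex(tilde_address))   # 0x7d0 0x83c
for name, rows in tables.items():
    print(name, rows)
```

With the bytes from this dump the script reports one row for each table except AssemblyRef, which has two.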
Now move to the position in the file where the .il section is
stored (you should have recorded this raw address from the data dumpbin
gave). Here is part of the data I see:
2e10 76 32 2e 30 2e 35 30 37 32 37 00 00 00 00 05 00 v2.0.50727......
2e20 6c 00 00 00 38 01 00 00 23 7e 00 00 a4 01 00 00 l...8...#~......
2e30 58 01 00 00 23 53 74 72 69 6e 67 73 00 00 00 00 X...#Strings....
2e40 fc 02 00 00 38 00 00 00 23 55 53 00 34 03 00 00 ....8...#US.4...
2e50 10 00 00 00 23 47 55 49 44 00 00 00 44 03 00 00 ....#GUID...D...
2e60 10 01 00 00 23 42 6c 6f 62 00 00 00 00 00 00 00 ....#Blob.......
2e70 02 00 00 01 47 14 02 00 09 00 00 00 00 fa 01 33 ....G..........3
2e80 00 16 00 00 01 00 00 00 09 00 00 00 02 00 00 00 ................
2e90 02 00 00 00 0b 00 00 00 03 00 00 00 01 00 00 00 ................
2ea0 01 00 00 00 01 00 00 00 00 00 0a 00 01 00 00 00 ................
2eb0 00 00 06 00 2e 00 27 00 06 00 58 00 46 00 06 00 ......'...X.F...
2ec0 71 00 46 00 06 00 aa 00 8a 00 06 00 ca 00 8a 00 q.F.............
This is another metadata root, but notice now that the size of the
#~ stream is 0x0138 bytes compared to 0x8c
bytes in the previous metadata root. Similarly, there are 0x0158
bytes of strings (compared to 0x18). This is totally different
metadata, and if you inspect the strings section (in my case this is at
0x2e00 + 0x01a4 = 0x2fa4) you'll find the strings of the types and type
members used in LibraryCode. In other words the metadata from the
IL assembly has been embedded into the native image in the .il
section.
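If you would rather not read the stream headers by eye, the following sketch walks them for you. The bytes are the first six lines of the dump above (starting at 0x2e10, which is 0x10 bytes into the metadata root), and the script assumes, as the dump suggests, that the version string occupies 12 bytes. The function name parse_stream_headers is invented for this example.

```python
import struct

METADATA_ROOT = 0x2E00   # file offset of the metadata root in the .il section (my machine)

# Bytes 0x2e10 to 0x2e6f from the dump: version string, flags, stream count, stream headers.
DUMP = bytes.fromhex("""
    76 32 2e 30 2e 35 30 37 32 37 00 00 00 00 05 00
    6c 00 00 00 38 01 00 00 23 7e 00 00 a4 01 00 00
    58 01 00 00 23 53 74 72 69 6e 67 73 00 00 00 00
    fc 02 00 00 38 00 00 00 23 55 53 00 34 03 00 00
    10 00 00 00 23 47 55 49 44 00 00 00 44 03 00 00
    10 01 00 00 23 42 6c 6f 62 00 00 00 00 00 00 00
""")

VERSION_LENGTH = 12      # assumption: "v2.0.50727" padded with NULs to 12 bytes


def parse_stream_headers(data, pos, count):
    """Return {stream name: (offset from root, size)} for count stream headers."""
    streams = {}
    for _ in range(count):
        offset, size = struct.unpack_from("<II", data, pos)
        pos += 8
        end = data.index(b"\x00", pos)
        name = data[pos:end].decode("ascii")
        # The name is NUL terminated and padded to a multiple of four bytes.
        pos += (len(name) + 1 + 3) & ~3
        streams[name] = (offset, size)
    return streams


# Two bytes of flags follow the version string, then the number of streams.
(stream_count,) = struct.unpack_from("<H", DUMP, VERSION_LENGTH + 2)
streams = parse_stream_headers(DUMP, VERSION_LENGTH + 4, stream_count)

for name, (offset, size) in streams.items():
    print("%-9s file offset %#06x, size %#x" % (name, METADATA_ROOT + offset, size))

strings_address = METADATA_ROOT + streams["#Strings"][0]
```

For this dump the #Strings stream comes out at 0x2fa4, which agrees with the address calculated by hand above.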
If your 1.1 code used reflection then the 1.1 runtime would have to load the
IL image of the assembly to get access to the metadata. As you can see, this is
not the case for 3.0/2.0 assemblies: the metadata is in the .il section
of the native image. However, it does not stop at metadata. The metadata stream indicates that there are the following tables:
Module, TypeRef, TypeDef, MethodDef,
MemberRef, CustomAttribute, StandAloneSig,
Assembly and AssemblyRef. The MethodDef
table will contain information about the methods defined in the assembly,
including the RVA of the IL for the method. Analysing this metadata shows that there are
two entries in the MethodDef table, and these are for GetVersion and .ctor (the
constructor of LibraryCode). The interesting thing is that the RVAs for these methods are
0x04 and 0x3c respectively. Of course, these values cannot
be converted to raw addresses using the normal mechanism, and so they are not
really RVAs. ILDASM on an IL assembly can show the actual bytes for the IL it is
displaying, so I ran ILDASM on the original assembly and obtained
the sequences of bytes for GetVersion.
I found that these bytes are in the native image at 0x3270. The IL for a method has a header
which indicates information about the method. This can be either tiny format
(1 byte) or fat format (12 bytes). The bytes preceding the IL I
identified at 0x3270 had the information that would be in the fat format header. Taking this into
account, the location of the IL for the method is at 0x3264. So the
RVA entry in the MethodDef table (0x4) must be the offset
from the address 0x3260, which itself is 0xc bytes
after the end of the last stream. There is no relevant documentation about what
these 0xc bytes mean, however, what is clear is that the IL for the
methods is in the native image.
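The reasoning in the last paragraph is just a chain of subtractions, which the following sketch spells out. All of the numbers come from my machine: the #Blob offset and size are read from the stream headers in the dump above, and the variable names are only for illustration.

```python
FAT_HEADER_SIZE = 12           # size in bytes of a fat-format method header

METADATA_ROOT = 0x2E00         # file offset of the metadata root in the .il section
BLOB_OFFSET = 0x344            # offset of #Blob (the last stream) from the root
BLOB_SIZE = 0x110              # size of #Blob

IL_BYTES_FOUND_AT = 0x3270     # where the IL bytes of GetVersion were found
GETVERSION_RVA_ENTRY = 0x04    # the 'RVA' recorded in the MethodDef table

# The fat header sits immediately before the IL bytes.
method_body_at = IL_BYTES_FOUND_AT - FAT_HEADER_SIZE

# If the MethodDef entry is an offset, this is the address it is measured from.
il_base = method_body_at - GETVERSION_RVA_ENTRY

# Where the metadata streams end, and the undocumented gap before the IL.
end_of_streams = METADATA_ROOT + BLOB_OFFSET + BLOB_SIZE
gap = il_base - end_of_streams

print(hex(method_body_at))   # 0x3264
print(hex(il_base))          # 0x3260
print(hex(end_of_streams))   # 0x3254
print(hex(gap))              # 0xc
```

If the same rule holds for the constructor, its method body would start at il_base + 0x3c, but I have only checked GetVersion against the ILDASM output.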
Let me reiterate: the native image contains both the metadata and the IL from the assembly that was used to generate the image.
Now you've finished with this file, so close it in VS and at the command line
you can return to the assembly application folder (popd).
Before leaving this section it is worth pointing out some other features of
ngen. When you create a native image the tool will locate the
libraries that it uses, which the documentation calls dependencies. (This is a
confusing term: one definition of dependency is something that relies on
something else. Clearly this is not the case here: the libraries are not dependent
upon the root, it is the root that is dependent upon them. However, there
is another definition of dependency that means 'a subordinate' which I guess is
the definition used here since a library can be treated as a subordinate to the
assembly that loads it. I wish Microsoft had used the term subordinate rather
than dependency.) The native image generator needs to have the configuration for
the assembly: if you are generating a native image for a library in the GAC then
it needs to have access to any publisher policy files for the library; if the
assembly is a process then the tool needs to get access to the configuration
file. By default ngen will use normal Fusion probing to find the
libraries the assembly uses and will use the current folder
as the application base folder to search for private assemblies. However, you can supply
the /appbase switch to indicate
another folder.
If you are creating the native image for a library then the tool will use the
information in the manifest to determine the libraries it uses, however, if this
library is used with a process, the process configuration may specify version
redirects for the libraries it uses. You can use the /execonfig
switch to give the
configuration file of a process assembly that provides such redirects, but
clearly, the native image library will now be tied to that process, especially
if hard binding is used (see later).
If the assembly is to be debugged then you can use the /debug
switch to tell the native image generator to generate additional debugging
information, otherwise, if a process is run under a debugger the runtime will
load the IL assembly instead. Furthermore, if you want to use the assembly under
a profiler you can use the /profile switch.
12.5 Logging Binding to Native Images
At this point you know that ngen will create native images for
you and install them in a folder under %windir%\assembly. You also
know that the native image contains the metadata and the IL of the
assembly from which it was created. But are you convinced that when you run the
application the native image rather than the IL image is loaded? Well, to
convince you that this is the case, we will use the new version of fuslogvw.
This tool now has
an option, at the bottom right hand side, to view the log file of binds to native images.
Select this option, Native Images. You'll probably see some entries already
for native image binds, but ignore these, indeed, to make sure that you only see
the binds we are to perform, remove these existing entries by clicking on the
Delete All button. Now click on the Settings button. I mentioned
earlier that fuslogvw
appears to work correctly only if you specify a custom log path; if you have not
done that already, do it now. Make sure that you check Log all binds to disk
and click on OK. Now run the process that you have generated a native
image for and then
click on the Refresh button on fuslogvw:

Notice that three binds have occurred, one is called ExplicitBind,
which is for the process itself, and the other two are for the libraries that
the process uses, lib and mscorlib. Take a look at
these files. The contents are similar to those that you would expect for a bind
to an IL assembly, but there are differences. Here is an excerpt:
LOG: IL assembly loaded from C:\TestFolder\Fusion\2.0.50727\12.3\app.exe.
LOG: Start validating native image app, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null.
LOG: Start validating all the dependencies.
LOG: [Level 1]Start validating native image dependency mscorlib, Version=2.0.0.0,
Culture=neutral, PublicKeyToken=b77a5c561934e089.
LOG: Dependency evaluation succeeded.
LOG: [Level 1]Start validating IL dependency lib, Version=1.0.0.0, Culture=neutral,
PublicKeyToken=3bf941bb1f722efe.
LOG: Dependency evaluation succeeded.
LOG: Validation of dependencies succeeded.
LOG: Start loading all the dependencies into load context.
LOG: Loading of dependencies succeeded.
LOG: Bind to native image succeeded.
Native image has correct version information.
Attempting to use native image C:\WINDOWS\assembly\NativeImages_v2.0.50727_32\app\0b1006044ca3f944ab21eb0c07f4d752\app.ni.exe.
Native image successfully used.
Note that it first loads the IL assembly and then it goes through a process of 'validating' the assembly and the libraries it uses. After it has completed this procedure it loads the native image from the cache. The logs for the libraries show that a similar procedure is performed on those, and the log gives the name of the native image that is loaded. At this point remove the native images from the cache using:
12.6 Automatic Redirect to Native Images
In this example, you will create a shared library and create two processes that use it, then you'll create native images for all of the assemblies. Next, you'll create a new version of the library, install that into the GAC and provide a publisher policy assembly to redirect the processes to use the new library.
Use the files from the previous example. First, confirm that there is
no config file and that the library has not been installed in the GAC. Now
confirm that the version of the library code is 1.0.0.0. Compile
the library and the process.
csc app.cs /r:lib.dll
Next, create a second process that uses this library; you can do this by specifying a different output name:
csc app.cs /r:lib.dll /out:app2.exe
Put the library in the GAC and then create native images for both of the processes:
ngen install app.exe
ngen install app2.exe
Run both processes to convince yourself that they work and check the fusion
log to see that the native images are being loaded. Now change the library
source so that the version is 1.1.0.0, compile this library and
then insert it into the GAC. To indicate that this new version should be used
instead of the old version you can create a publisher
policy file. To do this, create a configuration file for the library (I have
called it lib.config) with the redirection information:
<configuration>
<runtime>
<assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
<dependentAssembly>
<assemblyIdentity name="lib" publicKeyToken="3bf941bb1f722efe" />
<bindingRedirect oldVersion="1.0.0.0" newVersion="1.1.0.0" />
</dependentAssembly>
</assemblyBinding>
</runtime>
</configuration>
Compile this to a publisher policy assembly and add it to the GAC:
gacutil -i policy.1.0.lib.dll
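To see what the redirect in lib.config asks the runtime to do, here is a rough model of how a bindingRedirect maps a requested version to a new one. It is a sketch only: the function names are invented, it matches on the assembly name and public key token alone, and it ignores everything else Fusion takes into account (culture, application and machine configuration, and so on).

```python
import xml.etree.ElementTree as ET

# The elements under assemblyBinding live in this XML namespace.
NS = {"asm": "urn:schemas-microsoft-com:asm.v1"}

LIB_CONFIG = """
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="lib" publicKeyToken="3bf941bb1f722efe" />
        <bindingRedirect oldVersion="1.0.0.0" newVersion="1.1.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
"""


def parse_version(text):
    """Turn '1.0.0.0' into (1, 0, 0, 0) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def apply_redirect(config_xml, name, public_key_token, version):
    """Return the version to bind to after applying any matching redirect."""
    requested = parse_version(version)
    root = ET.fromstring(config_xml)
    for dependent in root.iterfind(".//asm:dependentAssembly", NS):
        identity = dependent.find("asm:assemblyIdentity", NS)
        if identity is None:
            continue
        if identity.get("name") != name:
            continue
        if identity.get("publicKeyToken") != public_key_token:
            continue
        for redirect in dependent.iterfind("asm:bindingRedirect", NS):
            # oldVersion is either a single version or a 'low-high' range.
            low, _, high = redirect.get("oldVersion").partition("-")
            if parse_version(low) <= requested <= parse_version(high or low):
                return redirect.get("newVersion")
    return version


print(apply_redirect(LIB_CONFIG, "lib", "3bf941bb1f722efe", "1.0.0.0"))
```

A request for version 1.0.0.0 of lib comes back as 1.1.0.0, while a request for any other version is returned unchanged.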
Now move to fuslogvw and clear the log by clicking on Delete All.
Now run app and confirm that the new version of the library is
loaded. Switch to fuslogvw and look at the log entry for lib;
you will find the following lines:
WRN: No matching native image found.
The log file for the process has these lines:
WRN: [Level 1] Dependency version mismatch.
WRN: No matching native image found.
LOG: Bind to native image assembly did not succeed. Use IL image.
As you can see, it cannot find a native image for version 1.1.0.0,
so it uses the IL file instead. Switch fuslogvw to the
Default view. Here, you'll find three entries, one for the process (marked
WhereRefBind) and the other two for mscorlib and
lib. The log for lib shows that a redirection has occurred:
LOG: No application configuration file found.
LOG: Using machine configuration file from C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\config\machine.config.
LOG: Publisher policy file is found at C:\WINDOWS\assembly\GAC_MSIL\policy.1.0.lib\0.0.0.0__3bf941bb1f722efe\lib.config.
LOG: Publisher policy file redirect is found: 1.0.0.0 redirected to 1.1.0.0.
LOG: ProcessorArchitecture is locked to MSIL.
LOG: Post-policy reference: lib, Version=1.1.0.0, Culture=neutral, PublicKeyToken=3bf941bb1f722efe,
processorArchitecture=MSIL
LOG: Found assembly by looking in the GAC.
Notice that you should use both the Default and Native Images logs when you are trying to debug binding errors with native images.
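If you have many log files to look through, a few lines of script can pick out the outcome of each native image bind. The sketch below relies only on the marker lines quoted in the excerpts in this section; I am assuming (not asserting) that other builds of the runtime word these lines the same way, and the function name is made up.

```python
def native_image_outcome(log_text):
    """Classify a Native Images bind log as (outcome, list of warning lines)."""
    lines = [line.strip() for line in log_text.splitlines()]
    warnings = [line for line in lines if line.startswith("WRN:")]
    if "Native image successfully used." in lines:
        return "native image", warnings
    if any(line.endswith("Use IL image.") for line in lines):
        return "IL image", warnings
    return "undetermined", warnings


# Excerpts taken from the logs shown earlier in this section.
SUCCESS_LOG = """\
LOG: Bind to native image succeeded.
Native image has correct version information.
Native image successfully used.
"""

MISMATCH_LOG = """\
WRN: [Level 1] Dependency version mismatch.
WRN: No matching native image found.
LOG: Bind to native image assembly did not succeed. Use IL image.
"""

for text in (SUCCESS_LOG, MISMATCH_LOG):
    outcome, warnings = native_image_outcome(text)
    print(outcome, warnings)
```

You would read each file from your custom log path and pass its contents to native_image_outcome; remember that the Default log is still needed to see why a redirect occurred.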
Clear both logs and repeat the test for app2.
You should find that the same errors are reported. Clear the log. Now you need
to tell ngen to update the native images. You have two options
here. Firstly, you can call ngen update; this will go through all
native images and check each one to make sure that the libraries it uses are in
the native image cache and recreate the native image if this is not the case.
Since this
will take a long time, we will use the second option: just install the
process a second time into the native image cache. Type the following:
ngen install app.exe
Switch back to fuslogvw and refresh the display. You will find
that there are four new entries from a process called mscorsvw.exe.
This is the native image generator worker process. In fact this file also
doubles up as the native image generator service if you use delayed updates, but
in this case it is run as a process. These binds give information about loading
the process before the new native image was generated. Clear the log again and
run app. Switch to fuslogvw and confirm that the native image of
the new version of the library is being used.
Now run app2. The results here are confusing. The log file for
lib shows that redirection is used and that it successfully loads
the native image. However, the log file for the process shows that there is a
'dependency version mismatch' and so the IL image is loaded. It is simple to fix
this (just install the process again). However, to me this looks like a bug
because the binding occurs correctly for the library but something happens after
that binding that causes the version mismatch error message.
Now uninstall the second process:
ngen uninstall app2.exe
To show that this has succeeded list the contents of the native cache:
You should find folders for app and lib, but not for app2.
Finally, uninstall app and list the contents of the native cache.
You'll find that both app and lib will have been
removed. When you uninstall app2 the ngen tool senses
that lib is being used by another assembly and so it does not
remove it from the cache. However, when you remove the only assembly that uses
this library ngen removes the native image.
Finally clean up by removing the libraries and policy file from the GAC:
gacutil -u lib
gacutil -u policy.1.0.lib
So that you don't generate too many log files, switch to fuslogvw
and use the Settings dialog to
select Log bind failures to disk before closing this tool.
12.7 Deferred Updates
I mentioned earlier that mscorsvw.exe can be run as a service.
The reason for this service is to defer updates to a time when the machine is
idle. You can queue up installs and updates using ngen.
Installs can be given one of three priorities, 1,
2 or 3. The first two priorities mean that the change
must be performed immediately with priority 1 changes being performed first.
Priority 3 changes will be performed when the machine is idle. Microsoft do not
say exactly what 'idle' means but they say that it is triggered if there has
been no user input for a certain amount of time.
You can use deferred actions by using the /queue switch, this
will start the ngen service to perform the native image generation.
Since this is a service it means that if the machine reboots while the native
image generation is being performed, the service will be started when the
machine is booted so that the action is completed. However, once all actions in
the queue have been completed the service stops and it will only be restarted
when another action is added to the queue.
ngen also allows you to perform actions on the queue: you can
pause it, continue it from pause and you can tell it to perform all items in the
queue with a particular priority or higher. For example, use the examples from
the last section. (Note that the processes were compiled for version
1.0.0.0 of the library so make sure that you have changed the source for the
library back to this version and recompiled the library.) Now type the following:
ngen install app.exe /queue:3
ngen install app2.exe /queue:1
Note that app2 has a higher priority than app. Now
start the update procedure:
ngen executequeueditems 3
ngen queue continue
For this test we want to generate native images for all the items in the queue, that
is, we don't want to wait for the machine to become idle. This is why you have
used executequeueditems to perform all actions of priority 3
and higher. The final action tells the service to start work on the queue. You
will not be informed that the actions have completed, however, a short time
after the work has completed the service will be stopped and you can use the
ngen queue status command to test for this.
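The ordering rules described in this section can be modelled with a small priority queue. This is purely an illustration of the behaviour described above and not how the ngen service is implemented: the class and method names are hypothetical, and processing items of equal priority in the order they were queued is my assumption.

```python
import heapq
import itertools


class QueueModel:
    """A toy model of queued native image installs (not the real service)."""

    def __init__(self):
        self._heap = []
        self._sequence = itertools.count()   # keeps equal priorities in FIFO order

    def install(self, assembly, priority):
        """Queue an install with priority 1 (most urgent), 2 or 3."""
        if priority not in (1, 2, 3):
            raise ValueError("priority must be 1, 2 or 3")
        heapq.heappush(self._heap, (priority, next(self._sequence), assembly))

    def execute_queued_items(self, level):
        """Process every item of the given priority or higher, most urgent first."""
        processed = []
        while self._heap and self._heap[0][0] <= level:
            _, _, assembly = heapq.heappop(self._heap)
            processed.append(assembly)
        return processed


queue = QueueModel()
queue.install("app.exe", 3)
queue.install("app2.exe", 1)
print(queue.execute_queued_items(3))   # ['app2.exe', 'app.exe']
```

Asking the model to execute items at level 1 only would process app2.exe and leave app.exe in the queue, which is the equivalent of leaving it until the machine is idle.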
There is no queue option for removing assemblies from the native image cache, so to clean up use the following to remove the two assemblies from the native image cache:
ngen uninstall app.exe
ngen uninstall app2.exe
12.8 Hard Binding
Earlier I mentioned the issues of creating binding to functions. In normal
Win32 DLLs the library provides an export table with the addresses of the
functions exported by the DLL. If the DLL is loaded in its preferred address
then the addresses in the export table can be called by the process. If the DLL
is loaded at another address then the OS has to perform fix ups to change the
address to the actual address where the function resides. When you create a
native image, ngen will create something similar to an export
address table in the native image file (the details of this are not documented).
This means that at run time the .NET runtime must perform some fix ups to
methods that are called. This clearly takes time and it makes some previously
read-only pages writable, which means they cannot be shared between processes.
The solution to this issue is hard binding. When you specify that you
want to use hard binding ngen will put hard bound addresses in the assembly to the methods
in the native image assembly it references. This means that the code pages are
read-only and shareable, with the downside that all the hard bound libraries must be loaded at
the same time. You cannot require that hard binding is performed, you can merely
indicate that you would like to use it. To do this you add assembly level
attributes to the assembly. Again, the documentation uses the confusing term
dependency, it really should be subordinate. A root assembly will use
the [Dependency] attribute to indicate the subordinate assemblies
it will use and to indicate how likely it is that the subordinate will be
loaded. The rationale is that hard binding will lengthen initial application
start up and so you should avoid it if the subordinate assembly is unlikely to
be called. In this case you should use LoadHint.Sometimes. However,
if the code that uses the subordinate assembly will always be called then you
should use LoadHint.Always to indicate that you would like to use
hard binding.
The [DefaultDependency] attribute is applied to subordinate assemblies.
You use this to indicate how likely you think this assembly will be loaded and
it is used by ngen when there is not a [Dependency]
attribute on the root assembly (the one that uses the subordinate), or if the
root has that attribute but specifies LoadHint.Default.
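The way the two attributes combine can be summarised in a few lines. The following is a model of the rule as I have described it, not code taken from ngen: the Python enum simply mirrors the three LoadHint names, None stands for 'attribute not present', and the function names are invented.

```python
from enum import Enum


class LoadHint(Enum):
    """Mirrors the names of the .NET LoadHint values used in the text."""
    DEFAULT = "Default"
    ALWAYS = "Always"
    SOMETIMES = "Sometimes"


def effective_hint(dependency_hint, default_dependency_hint):
    """Combine the root's [Dependency] hint with the subordinate's [DefaultDependency].

    Pass None for an attribute that is not present on the assembly.
    """
    # The hint on the root wins unless it is missing or defers with Default.
    if dependency_hint is not None and dependency_hint is not LoadHint.DEFAULT:
        return dependency_hint
    if default_dependency_hint is not None:
        return default_dependency_hint
    return LoadHint.DEFAULT


def hard_binding_requested(dependency_hint, default_dependency_hint):
    """True if the combined hint asks for hard binding (it is only a request)."""
    return effective_hint(dependency_hint, default_dependency_hint) is LoadHint.ALWAYS


# The root says nothing, the subordinate says it is always loaded.
print(hard_binding_requested(None, LoadHint.ALWAYS))                 # True
# The root explicitly says the subordinate is only sometimes loaded.
print(hard_binding_requested(LoadHint.SOMETIMES, LoadHint.ALWAYS))   # False
```

Remember that even when the combined hint is Always, this is only a request; you cannot require that hard binding is performed.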