Migrating Windows (or any partition) to a Virtual Machine

Introduction

At the company I work for, we got sick of doing clean Windows installs, so we decided to willingly violate Windows XP’s EULA for the greater good: we put together a few open-source tools (basically ntfsclone, ntfsreloc, ntfsresize, gparted and, of course, Linux), wrote a couple of witty scripts and came up with a “free” and nearly legal way of (re)installing Windows on our machines.

The method basically consists of having on each machine, besides the live Windows installation, a stripped-down Linux system with a backup image of its Windows (legally registered!). Of course, to save time, we sometimes use that same image to install Windows on more than one machine; once it’s been installed, we change the license data and create a new internal backup image with its own license info.

So, I have that set up on my machine as well, except that I have a full, lovely, amazingly useful Debian installation. The problem is that I can’t be bothered to close or abandon whatever I’m doing and leave my happy place just to boot into Windows for 5 minutes, figure out how to do something or test a new script, and then boot back into Linux to resume my other activities.

Here comes virtualisation to the rescue. I had played with it in the past, so it wasn’t totally new, and I already knew about the different options out there and their pros and cons.

So I decided to give it a shot, but once again, making a fresh Windows install with all of the software needed to make it useful… is just too much of a burden.

Issue

Summing up, I’ve got a Linux installation (which will be the Host OS) and a gzipped ntfsclone Windows XP image (which will, of course, be the Guest OS).

That is nearly the equivalent of having a live Windows installation and wanting to migrate it to a Virtual Machine.

There are already several articles about how to do that, but none of the ones I found solved a very simple issue:

You can’t just get the image of a disk partition and boot it!

As a matter of fact, the lovely guys at VirtualBox already tell you exactly that:

Either pull the drive from the windows machine or copy the data with a low level image tool (like dd) to a USB drive or other removable media. If making an image, DO NOT image just the partition, this will not work!

What they seem to be suggesting here is that you make an image of the full hard drive (as opposed to just one partition), just to get the piece of it you want. That may mean having 240 GB of raw data instead of just the 25 GB you’re interested in… That’s nearly 90% wasted space.

Solution

Analysis

To get around this, we need to understand why there is this problem to begin with.

You see, it’s not as simple as the virtualisation software refusing to boot your image because it is a faulty, buggy or incomplete program (as many would dare to suggest without digging any further!); the problem here is how hard drives work, how they were designed and how a real computer understands them.

There are many good articles around, and it’s not my intention at all to duplicate that information, so I’m just going to make a quick introduction to the topics, in order for the solution to make a bit of sense.

Hard drives

So, nowadays, everyone knows that a hard drive can have several partitions that look like “different hard drives” inside the operating system, but the information about the different partitions (their format, size, position on the physical hard drive, etc.) has to be stored somewhere.

Where? Well, most likely you’ve heard about the infamous Partition Table or Master Boot Record (MBR), and I say infamous because probably the one time you heard about those things, you had to curse a lot due to data loss or all those wasted hours.

There we have it, there’s a mystical thing at the very beginning of our hard drive describing where our partitions are and how they are!

Then, when we try to get our Virtual Machine to boot that partition image we made, it’ll complain about it not being a properly formatted disk, or something along those lines. Of course, there’s no MBR!

Master Boot Record

It’s pretty well explained on Wikipedia, but the article is full of historical data and things that, whilst interesting, are not related to our goal. I’d recommend, then, taking a look at these articles.

In short, the MBR is a 512 byte long section with a standard structure:

Code area (0x000—0x1BD)
Partition Table (0x1BE—0x1FD)
Boot Record Signature (0x1FE-0x1FF).

Those hex numbers in parentheses are the offsets within the MBR at which each section is located; since the MBR is the very first sector of the disk, they are also the absolute offsets on the disk.
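As a minimal sketch of that layout in Python (the function name split_mbr is just something I made up for illustration), this slices a 512-byte sector at the offsets listed above:

```python
def split_mbr(sector: bytes) -> dict:
    """Slice a 512-byte MBR into its three standard areas."""
    if len(sector) != 512:
        raise ValueError("an MBR is exactly 512 bytes long")
    return {
        "code_area": sector[0x000:0x1BE],        # 0x000 to 0x1BD, 446 bytes
        "partition_table": sector[0x1BE:0x1FE],  # 0x1BE to 0x1FD, 64 bytes
        "signature": sector[0x1FE:0x200],        # 0x1FE to 0x1FF, 2 bytes
    }


# A blank sector that only carries the Boot Record Signature.
demo_sector = bytes(510) + b"\x55\xaa"
parts = split_mbr(demo_sector)
print(len(parts["code_area"]), len(parts["partition_table"]), parts["signature"].hex())
```

To try it on a real disk, read the first 512 bytes of the device (with the same permission caveats as for any raw disk access) and pass them in.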

We’ll see later that we only have to worry about the Partition Table, so let’s take a look into it.

Partition Table

The Partition Table is really where the information about our disk partitions is written. It has enough room to define four partitions, called primary partitions. One of those can be an extended partition, which will contain another partition table with information about all the logical partitions, but we don’t really care much about that right now; if interested, read the articles above or make a quick internet search.

This table also has a standard structure:

Entry for Primary Partition #1 (0x1BE—0x1CD).
Entry for Primary Partition #2 (0x1CE—0x1DD).
Entry for Primary Partition #3 (0x1DE—0x1ED).
Entry for Primary Partition #4 (0x1EE—0x1FD).

That means that whatever defines the first primary partition lives between byte offsets 0x1BE and 0x1CD of the MBR.

Those entries have, of course, a structure that is better explained here, but here it goes for completeness’ sake:

Partition State: 0x80 if it’s the boot partition, 0x00 otherwise (1 byte).
Starting sector CHS coordinates (3 bytes).
Partition Type (1 byte).
Ending sector CHS coordinates (3 bytes).
Starting sector LBA coordinates (4 bytes).
Partition length in sectors (4 bytes).
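The same structure can be unpacked with Python's struct module. This is only a sketch: parse_entry and the dictionary keys are names of my own choosing, and the sample bytes are the first partition entry of my real disk's MBR, which is dumped in full further down.

```python
import struct


def parse_entry(entry: bytes) -> dict:
    """Decode one 16-byte partition table entry."""
    if len(entry) != 16:
        raise ValueError("a partition entry is exactly 16 bytes long")
    # "<" means little-endian without padding: 1 + 3 + 1 + 3 + 4 + 4 bytes.
    state, chs_start, ptype, chs_end, lba_start, sectors = struct.unpack(
        "<B3sB3sII", entry
    )
    return {
        "bootable": state == 0x80,
        "chs_start": chs_start,
        "type": ptype,
        "chs_end": chs_end,
        "lba_start": lba_start,
        "sectors": sectors,
    }


sample = bytes.fromhex("80 20 21 00 17 fe ff ff 00 08 00 00 00 00 80 02")
info = parse_entry(sample)
print(info["bootable"], hex(info["type"]), info["lba_start"], info["sectors"])
```

For that entry the partition starts at sector 2048 and is 41943040 sectors long, which at 512 bytes per sector is 20 GiB.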

What are those CHS and LBA things, you ask? Well, in þe old times, it was actually necessary to refer to a disk sector by its CHS coordinates (Cylinder, Head, Sector), which are hardware-dependent. Nowadays, however, software cares more about LBA (Logical Block Addressing) because it’s easier and the abstraction layers do the hard part.

Also, as Dan Strick said on the FreeBSD mailing list (and I believe it just because my BIOS agrees):

Modern BIOS geometry most frequently uses 255 heads and 63 sectors/track because that maximizes the addressable part of the disk drive using the basic int13 function.
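To make the CHS business a bit more concrete, here is a small Python sketch that turns an LBA sector number into the three CHS bytes of a partition entry, assuming the 255 heads and 63 sectors/track geometry from the quote. The function name is mine, and falling back to fe ff ff when the cylinder does not fit in 10 bits is the usual convention rather than something this article's sources spell out.

```python
def lba_to_chs_bytes(lba: int, heads: int = 255, sectors_per_track: int = 63) -> bytes:
    """Encode an LBA address as the 3 CHS bytes of a partition entry."""
    cylinder, rest = divmod(lba, heads * sectors_per_track)
    head, sector = divmod(rest, sectors_per_track)
    sector += 1  # CHS sectors are numbered from 1, not 0
    if cylinder > 1023:
        # The cylinder only has 10 bits: use the conventional "too big" marker.
        return bytes([0xFE, 0xFF, 0xFF])
    # Byte 0: head. Byte 1: sector in the low 6 bits, cylinder bits 8-9 on top.
    # Byte 2: low 8 bits of the cylinder.
    return bytes([head, sector | ((cylinder >> 8) << 6), cylinder & 0xFF])


start = lba_to_chs_bytes(2048)
print(start.hex())
```

Sector 2048 (1MB into the disk) comes out as cylinder 0, head 32, sector 33, that is, the bytes 20 21 00.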

Real case MBR

Cool, we now know that there are three things we need, and roughly what they look like, but it was all too abstract, so, as an instructive exercise, why don’t you go to your terminal and execute:

$ dd if=/dev/sda count=1 | hd | less

Note that you may run into permission errors; just become root, use sudo, join the disk group or do whatever gets you raw access to the disk. Also note that if you mess up and write of= instead of if=, you may be killing your MBR :), so read man dd for more info.

So, mine looks a bit like this (I skipped a part as it’s mostly incomprehensible):

00000000  eb 63 90 d0 bc 00 7c 8e  c0 8e d8 be 00 7c bf 00  |.c....|......|..|
00000010  06 b9 00 02 fc f3 a4 50  68 1c 06 cb fb b9 04 00  |.......Ph.......|
00000020  bd be 07 80 7e 00 00 7c  0b 0f 85 0e 01 83 c5 10  |....~..|........|
00000170  be 95 7d e8 34 00 be 9a  7d e8 2e 00 cd 18 eb fe  |..}.4...}.......|
00000180  47 52 55 42 20 00 47 65  6f 6d 00 48 61 72 64 20  |GRUB .Geom.Hard |
00000190  44 69 73 6b 00 52 65 61  64 00 20 45 72 72 6f 72  |Disk.Read. Error|
000001a0  0d 0a 00 bb 01 00 b4 0e  cd 10 ac 3c 00 75 f4 c3  |...........<.u..|
000001b0  00 00 00 00 00 00 00 00  f7 a4 85 a3 2f d2 80 20  |.........J..... |
000001c0  21 00 17 fe ff ff 00 08  00 00 00 00 80 02 00 fe  |!...............|
000001d0  ff ff 83 fe ff ff 73 0a  80 02 92 69 04 00 00 fe  |......s....i....|
000001e0  ff ff 17 fe ff ff 05 74  84 02 c1 3e 00 00 00 fe  |.......t...>....|
000001f0  ff ff 05 fe ff ff fe bf  84 02 02 c8 1c 10 55 aa  |..............U.|

The interesting part is the row at offset 0x1b0, in which the partition table starts. Notice the section at 0x1B8, where we see f7 a4 85 a3 2f d2: that’d be this disk’s identifier (I must confess I don’t know if, or how, this is important). Right after that, starting at 0x1BE, we find the start of the partition table.

If we try the same thing (hexdump the first 512 bytes) on our image, we get this (again, some bits have been skipped):

00000000  eb 52 90 4e 54 46 53 20  20 20 20 00 02 08 00 00  |.R.NTFS    .....|
00000010  00 00 00 00 00 f8 00 00  3f 00 ff 00 00 08 00 00  |........?.......|
00000020  00 00 00 00 80 00 80 00  f8 ff 7f 02 00 00 00 00  |................|
00000030  00 00 0c 00 00 00 00 00  8e f0 1b 00 00 00 00 00  |................|
00000180  eb f2 c3 0d 0a 45 72 72  6f 72 20 64 65 20 6c 65  |.....Error de le|
00000190  63 74 75 72 61 20 64 65  20 64 69 73 63 6f 00 0d  |ctura de disco..|
000001a0  0a 46 61 6c 74 61 20 4e  54 4c 44 52 00 0d 0a 4e  |.Falta NTLDR...N|
000001b0  54 4c 44 52 20 63 6f 6d  70 72 69 6d 69 64 6f 00  |TLDR comprimido.|
000001c0  0d 0a 50 72 65 73 69 6f  6e 65 20 43 74 72 6c 2b  |..Presione Ctrl+|
000001d0  41 6c 74 2b 53 75 70 72  20 70 61 72 61 20 72 65  |Alt+Supr para re|
000001e0  69 6e 69 63 69 61 72 0d  0a 00 00 00 00 00 00 00  |iniciar.........|
000001f0  00 00 00 00 00 00 00 00  83 9f ad c0 00 00 55 aa  |..............U.|

That doesn’t look like a partition table at all: it is the NTFS boot sector of the partition itself, error messages included. Now, that’d explain why our virtualisation software refuses to boot it!
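The difference between the two dumps can also be checked programmatically. The following Python sketch is a rough heuristic of my own (not what any virtualisation software actually does): it only looks at the 55 aa signature and at the "NTFS" marker that sits at offset 3 of an NTFS boot sector.

```python
def describe_first_sector(sector: bytes) -> str:
    """Very rough guess at what the first 512 bytes of an image contain."""
    if len(sector) < 512 or sector[0x1FE:0x200] != b"\x55\xaa":
        return "no boot signature"
    if sector[3:11] == b"NTFS    ":
        # The OEM ID of an NTFS boot sector: this is a bare partition image.
        return "NTFS boot sector"
    return "no NTFS header, could be an MBR"


# First row of the image dump above, padded out to a full sector.
first_row = bytes.fromhex("eb 52 90 4e 54 46 53 20 20 20 20 00 02 08 00 00")
partition_sector = first_row + bytes(510 - len(first_row)) + b"\x55\xaa"
blank_mbr = bytes(510) + b"\x55\xaa"

print(describe_first_sector(partition_sector))
print(describe_first_sector(blank_mbr))
```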

Generating the MBR

As mentioned before, we only have to worry about the partition table, but we do need a valid code area; luckily, there is already software available to write one for us: ms-sys. By the way, it is not packaged by Debian due to license issues (citation needed; read it long ago, can’t be arsed to look for it now), but it’s just a matter of downloading the source code and compiling.

ms-sys has several options; the one I’m interested in is -m. Now, it turns out ms-sys needs a file to write the data to, so let’s create a zeroed one.

$ dd if=/dev/zero of=mymbr count=2048
$ ./ms-sys -f -m mymbr

Notice the -f argument; if it weren’t there, ms-sys would complain about the file not being a disk device, but it’s ok, we know (or hope we know) what we’re doing. Also, we created a 1MB zeroed file (count=2048 in dd, with dd’s default block size of 512 bytes) because it’ll be the start of our image, and leaving 1MB at the beginning seems to be a sane thing to do (e.g. gparted does it that way).

Having done that, we get:

00000180  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001b0  00 00 00 00 00 2c 44 63  00 00 00 00 00 00 00 00  |.....,Dc........|
000001c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 55 aa  |..............U.|
00000200  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

That is, no disk identifier, no partition table, but filled in code area and Boot Record Signature; not bad.

We just have to fill in the missing data with a hex editor (e.g. ghex).

As for the disk identifier, I guess we can fill in those bytes (0x1B8—0x1BD) pseudo-randomly, but I really have no clue if that’s what dedicated software does :).

So, as for the Partition Table, we just need to fill in the first entry (0x1BE—0x1CD), as a single partition is all we need.

Partition State: 0x80, since it’ll be the boot partition.
Starting sector (CHS): the partition will start at 1MB, that is, at LBA sector 2048, which with 255 heads and 63 sectors/track is cylinder 0, head 32, sector 33. On disk that is stored as the bytes 20 21 00 (head, sector, cylinder), just like in the first entry of my real MBR above; see details for CHS encoding here. Note that in my own entry below these three bytes ended up as 00 20 21; it booted anyway, presumably because the LBA field is what actually gets used.
Partition Type: 0x07 for NTFS, use fdisk and then l for more options.
Ending sector (CHS): odds are your partition is too big to fit in these three bytes; in that case, fe ff ff is what should be there.
Starting sector (LBA): 2048, stored little-endian as the bytes 00 08 00 00; it’s 2048 because our partition will start after 1MB (2048 sectors of 512 bytes).
Partition length: this will vary depending on your partition size. First find out the size of your partition in bytes and then divide it by 512, or better yet, find the size of your partition in sectors! Hint: man fdisk, man ls. This field must be stored little-endian as well.
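If you’d rather not do this by hand in a hex editor, the entry can be generated. Below is a Python sketch under a few assumptions: the 255/63 geometry mentioned earlier, the CHS byte order head, sector, cylinder (as in the real MBR dump, which stores the start as 20 21 00), and a partition length of 39062488 sectors, which is the value in my own entry. The names build_entry, write_entry and chs_bytes are made up for the example.

```python
import struct


def chs_bytes(lba: int, heads: int = 255, spt: int = 63) -> bytes:
    """3-byte CHS encoding of an LBA address, fe ff ff if it does not fit."""
    cylinder, rest = divmod(lba, heads * spt)
    head, sector = divmod(rest, spt)
    if cylinder > 1023:
        return bytes([0xFE, 0xFF, 0xFF])
    return bytes([head, (sector + 1) | ((cylinder >> 8) << 6), cylinder & 0xFF])


def build_entry(start_lba: int, num_sectors: int, ptype: int = 0x07,
                bootable: bool = True) -> bytes:
    """Build a 16-byte partition table entry (type 0x07 is NTFS)."""
    end_lba = start_lba + num_sectors - 1
    return struct.pack(
        "<B3sB3sII",
        0x80 if bootable else 0x00,
        chs_bytes(start_lba),
        ptype,
        chs_bytes(end_lba),
        start_lba,      # little-endian, 4 bytes
        num_sectors,    # little-endian, 4 bytes
    )


def write_entry(path: str, entry: bytes, slot: int = 0) -> None:
    """Write the entry into partition slot 0..3 of an existing MBR file."""
    with open(path, "r+b") as f:  # "r+b" edits in place, no truncation
        f.seek(0x1BE + 16 * slot)
        f.write(entry)


entry = build_entry(start_lba=2048, num_sectors=39062488)
print(entry.hex(" "))
```

Calling write_entry("mymbr", entry) would then drop those 16 bytes at 0x1BE of the file created earlier; the disk identifier still has to be filled in separately.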

After all that mess, my partition entry looks like this:

000001b0  00 00 00 00 00 2c 44 63  f2 aa cd f3 12 83 80 00  |.....,Dc........|
000001c0  20 21 07 fe ff ff 00 08  00 00 d8 0b 54 02 00 00  | !..........T...|
000001d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Joining the new MBR and the image

So, we make a copy of the mymbr file, called myimg.hdd for example. Remember that we created this file as a zeroed 1MB file and then edited the first 512 bytes, which correspond to the MBR.

And then we create, extract, convert or otherwise obtain the partition image, and add it to that file using dd. In my case:

$ gunzip -c Image.gz | ntfsclone -r -O - - | dd of=myimg.hdd bs=1048576 seek=1

The gunzip and ntfsclone parts are just because that’s how I have the image already; in practice you just need to pass the raw data to dd via standard input, or create the image directly by indicating the input file (see man dd). Notice the arguments bs=1048576 seek=1 for dd: here we’re telling dd to start writing to myimg.hdd after 1MB, leaving our 512 byte long MBR plus some zero data intact. By the way, increasing the block size makes the whole process considerably faster.
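For those who find dd’s seek= a bit opaque, this is roughly what that step does, written as a Python sketch. splice_partition is a name of my own, and the demo uses in-memory stand-ins instead of the real mymbr copy and partition image so that it can be run anywhere.

```python
import io
import shutil

ONE_MIB = 1024 * 1024


def splice_partition(src, disk, offset: int = ONE_MIB) -> None:
    """Copy everything from src into disk starting at offset, like dd seek=."""
    disk.seek(offset)                       # skip the MBR plus the zero padding
    shutil.copyfileobj(src, disk, ONE_MIB)  # copy in 1 MiB chunks
    disk.truncate()                         # drop any stale data past the copy


# Stand-ins: a zeroed 1 MiB "disk" and a tiny fake partition image.
disk = io.BytesIO(bytes(ONE_MIB))
partition = io.BytesIO(b"\xebR\x90NTFS    ")
splice_partition(partition, disk)
image = disk.getvalue()
print(len(image) - ONE_MIB)
```

With real files, disk would be open("myimg.hdd", "r+b") and src could be sys.stdin.buffer at the end of the same gunzip and ntfsclone pipeline.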

Also, do note that we can overwrite the MBR of this newly created virtual hard drive at any time:

$ dd if=mymbr of=myimg.hdd count=1 conv=notrunc

Note that if you leave out the conv=notrunc parameter, dd will truncate myimg.hdd right after the 512 bytes it writes and you’ll lose all your precious data.
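The same in-place overwrite, as a Python sketch: opening the image with "r+b" plays the role of conv=notrunc, whereas "wb" would empty the file first. The function name and the temporary demo files are only there for illustration.

```python
import os
import tempfile


def overwrite_mbr(disk_path: str, mbr_path: str) -> None:
    """Replace the first 512 bytes of disk_path with those of mbr_path."""
    with open(mbr_path, "rb") as mbr:
        sector = mbr.read(512)
    with open(disk_path, "r+b") as disk:  # "r+b": keep everything after the MBR
        disk.write(sector)


with tempfile.TemporaryDirectory() as tmp:
    disk_path = os.path.join(tmp, "myimg.hdd")
    mbr_path = os.path.join(tmp, "mymbr")
    # Fake image: an empty first sector followed by some "partition" data.
    with open(disk_path, "wb") as f:
        f.write(bytes(512) + b"partition data")
    # Fake MBR file: only the boot record signature is set.
    with open(mbr_path, "wb") as f:
        f.write(bytes(510) + b"\x55\xaa")
    overwrite_mbr(disk_path, mbr_path)
    with open(disk_path, "rb") as f:
        result = f.read()

print(result[510:512].hex(), result[512:])
```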

Use the image

Now just use whatever virtualisation solution you want with this raw image, or maybe even convert it to another format (like vdi); it’s a perfectly valid virtual hard drive with a single partition.

Conclusion

It does look like quite a mess, but creating a virtual hard drive with a bootable partition from an image or a real hard drive takes only around two minutes. That is not counting the time-consuming dd step, but then again, there we just have to sit back and relax.

Just as a side note, this does not by any means guarantee a successful migration from a live installation to a Virtual Machine; it does guarantee, however, that the virtualisation software will try to boot that partition.

As a matter of fact, my migration was “unsuccessful” at first, as I got the dreadful BSOD in VirtualBox, but I could boot using kvm, albeit with a pretty slow guest.

The BSOD was caused by the IDE/ATA drivers, so I booted with kvm and applied the MergeIDE solution mentioned here. After doing that, I’m able to boot into that image from VirtualBox, and it actually runs fast in kvm.

Evilham.com