At the company I work for, being sick of making clean Windows installs, we decided to willingly violate Windows XP’s EULA for the greater good and put together a few open-source tools (basically ntfsclone, ntfsreloc, ntfsresize, gparted and, of course, Linux), wrote a couple of witty scripts and came up with a “free” and nearly legal way of (re)installing Windows on our machines.
Basically, the method consists of having on each machine, besides the live Windows installation, a stripped-down Linux system with a backup image of its Windows (legally registered!). Of course, to save time, we sometimes use that same image to install Windows on more than one machine and, once it’s been installed, we change the license data and create a new internal backup image with its own license info.
So, I have that set up on my machine as well, except that I have a full, lovely, amazingly useful Debian installation. The problem is, that I can’t be bothered to close, abandon whatever I’m doing and leave my happy place just to boot 5 minutes into Windows, figure out how to do something or test a new script and boot back into Linux to resume my other activities.
Here comes virtualisation to the rescue. I had played with it in the past, so it wasn’t totally new and I already knew about the different options out there and their pros and cons.
So I decided to give it a shot, but once again, making a fresh Windows install with all of the software needed to make it useful… is just too much of a burden.
Summing up, I’ve got a Linux installation (which will be the Host OS) and a gzipped ntfsclone Windows XP image (which will, of course, be the Guest OS).
That is nearly the equivalent of having a live Windows installation and wanting to migrate it to a Virtual Machine.
There are already several articles about how to do that, but none of the ones I found solved a very simple issue:
You can’t just get the image of a disk partition and boot it!
As a matter of fact, the lovely guys at VirtualBox already tell you exactly that:
Either pull the drive from the windows machine or copy the data with a low level image tool (like dd) to a USB drive or other removable media. If making an image, DO NOT image just the partition, this will not work!
What they seem to be suggesting here is that you make an image of the full hard drive (as opposed to just one partition), just to get the piece of it you want. That may mean having 240 GB of raw data instead of just the 25 GB you’re interested in… That’s nearly 90% inefficiency.
To get around this, we need to understand why there is this problem to begin with.
You see, it’s not as simple as the virtualisation software refusing to boot your image because it is a faulty, buggy or incomplete program (as many would dare to suggest without digging any further!); the problem here is how hard drives work, how they were designed and how a real computer understands them.
There are many good articles around, and it’s not my intention at all to duplicate that information, so I’m just going to make a quick introduction to the topics, in order for the solution to make a bit of sense.
So, nowadays, everyone knows that a hard drive can have several partitions that look like “different hard drives” inside of the Operating System, but the information about the different partitions, their format, size, position in the physical hard drive, etc. has to be stored somewhere.
Where? Well, most likely you’ve heard about the infamous Partition Table or Master Boot Record (MBR), and I say infamous because probably the one time you heard about those things, you had to curse a lot due to data loss or all those wasted hours.
There we have it, there’s a mystical thing at the very beginning of our hard drive describing where our partitions are and how they are!
Then, when we try to get our Virtual Machine to boot that partition image we made, it’ll complain about it not being a properly formatted disk or something along those lines. Of course, there’s no MBR!
It’s pretty well explained on Wikipedia, but it’s full of historical data and things that, whilst being interesting, are not related to our goal. I’d recommend then, taking a look at these articles.
In short, the MBR is a 512 byte long section with a standard structure:
Those hex numbers in parentheses correspond to the offset within the MBR at which the sections are located, which is also the absolute offset on the disk, as the MBR is its very first sector.
We’ll see later that we only have to worry about the Partition Table, so let’s take a look into it.
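To make those offsets a bit more tangible, here is a minimal sketch that slices a 512 byte sector into its regions. It assumes the conventional MBR layout (disk identifier at 0x1B8, partition table at 0x1BE, signature at 0x1FE); the mbr_region helper is a made-up name, and it works on a throwaway file rather than on a real disk.

```shell
#!/bin/sh
# Dump one region of an MBR stored in a regular file.
# usage: mbr_region FILE OFFSET LENGTH  -> hex bytes separated by spaces
mbr_region() {
    dd if="$1" bs=1 skip="$(($2))" count="$(($3))" 2>/dev/null |
        od -An -v -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Build a throwaway 512-byte "MBR": all zeroes plus the 55 aa signature
# (written with octal escapes, \125 = 0x55 and \252 = 0xAA).
demo=$(mktemp)
dd if=/dev/zero of="$demo" bs=512 count=1 2>/dev/null
printf '\125\252' | dd of="$demo" bs=1 seek=$((0x1FE)) conv=notrunc 2>/dev/null

echo "disk identifier: $(mbr_region "$demo" 0x1B8 4)"
echo "first entry:     $(mbr_region "$demo" 0x1BE 16)"
echo "signature:       $(mbr_region "$demo" 0x1FE 2)"
```

Pointing mbr_region at a one-sector copy of a real disk (made with dd and count=1) should work the same way.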
The Partition Table is really where the information about our disk partitions is written. It has enough room to define four partitions, called primary partitions. One of those can be an extended partition, which will contain another partition table with information about all the logical partitions, but we don’t really care much about that right now; if interested, read the articles above or make a quick internet search.
This table also has a standard structure:
That means that whatever defines the first primary partition is between offsets 0x1BE and 0x1CD of the MBR.
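The same arithmetic gives the position of the other three entries, since each one is 16 bytes long. A small sketch (entry_range is just an illustrative name):

```shell
#!/bin/sh
# Print the byte range of primary partition entry N (1 to 4) inside the MBR.
# The table starts at 0x1BE and each entry is 16 bytes long.
entry_range() {
    start=$((0x1BE + 16 * ($1 - 1)))
    printf '0x%03X-0x%03X\n' "$start" "$((start + 15))"
}

for n in 1 2 3 4; do
    echo "entry $n: $(entry_range "$n")"
done
```

The fourth entry ends at 0x1FD, right before the two signature bytes that close the sector.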
Those entries have, of course, a structure that is better explained here, but here it goes for completeness’ sake:
What are those CHS and LBA things, you ask? Well, in þe old times, you actually needed to refer to a disk sector by its CHS coordinates (Cylinder, Head, Sector), which are hardware-dependent. However, nowadays software cares more about LBA (Logical Block Addressing) because it’s easier and the abstraction layers do the hard part.
Also, as Dan Strick said on the FreeBSD mailing list (and I believe it just because my BIOS agrees):
Modern BIOS geometry most frequently uses 255 heads and 63 sectors/track because that maximizes the addressable part of the disk drive using the basic int13 function.
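For the curious, the usual LBA to CHS conversion can be sketched in a few lines of shell. This assumes the 255 heads and 63 sectors/track geometry from the quote; lba_to_chs is a name made up for the example.

```shell
#!/bin/sh
# Convert an LBA sector number to "cylinder head sector",
# assuming a geometry of 255 heads and 63 sectors per track.
HEADS=255
SPT=63

lba_to_chs() {
    lba=$1
    c=$(( lba / (HEADS * SPT) ))
    h=$(( (lba / SPT) % HEADS ))
    s=$(( lba % SPT + 1 ))        # sectors are numbered from 1, not 0
    echo "$c $h $s"
}

lba_to_chs 2048
```

For sector 2048 this gives cylinder 0, head 32 (0x20) and sector 33 (0x21), which are the 20 21 00 bytes that follow the 80 at offset 0x1BE in the dump of my disk further down.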
Cool, we now know that there are three things we need, and roughly what they look like, but it was all too abstract. So, as an instructive exercise, why don’t you go to your terminal and execute
$ dd if=/dev/sda count=1 | hd | less
Note that you may run into permission errors; just become root, use sudo, join the disk group or do whatever helps you get raw access to the disk.
Also note that if you mess up the if= and write of= instead, you may be killing your MBR :), read man dd for more info.
So, mine looks a bit like this (I skipped a part as it’s mostly incomprehensible):
00000000 eb 63 90 d0 bc 00 7c 8e c0 8e d8 be 00 7c bf 00 |.c....|......|..|
00000010 06 b9 00 02 fc f3 a4 50 68 1c 06 cb fb b9 04 00 |.......Ph.......|
00000020 bd be 07 80 7e 00 00 7c 0b 0f 85 0e 01 83 c5 10 |....~..|........|
00000170 be 95 7d e8 34 00 be 9a 7d e8 2e 00 cd 18 eb fe |..}.4...}.......|
00000180 47 52 55 42 20 00 47 65 6f 6d 00 48 61 72 64 20 |GRUB .Geom.Hard |
00000190 44 69 73 6b 00 52 65 61 64 00 20 45 72 72 6f 72 |Disk.Read. Error|
000001a0 0d 0a 00 bb 01 00 b4 0e cd 10 ac 3c 00 75 f4 c3 |...........<.u..|
000001b0 00 00 00 00 00 00 00 00 f7 a4 85 a3 2f d2 80 20 |.........J..... |
000001c0 21 00 17 fe ff ff 00 08 00 00 00 00 80 02 00 fe |!...............|
000001d0 ff ff 83 fe ff ff 73 0a 80 02 92 69 04 00 00 fe |......s....i....|
000001e0 ff ff 17 fe ff ff 05 74 84 02 c1 3e 00 00 00 fe |.......t...>....|
000001f0 ff ff 05 fe ff ff fe bf 84 02 02 c8 1c 10 55 aa |..............U.|
The interesting part is the row at offset 0x1b0, in which the partition table starts. Notice the section at 0x1B8, where we see f7 a4 85 a3 2f d2; that’d be this disk’s identifier (I must confess I don’t know if, or how, this is important). Right after that, starting at 0x1BE, we find the start of the partition table.
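Reading those 16 bytes by eye gets old quickly, so here is a rough sketch that decodes the first entry of the dump above (the bytes from 0x1BE to 0x1CD). The helper names le32 and decode_entry are invented for the example, and only the fields we care about are printed.

```shell
#!/bin/sh
# Decode a little-endian 32-bit value from four hex bytes (lowest byte first).
le32() {
    echo $(( 0x$1 | (0x$2 << 8) | (0x$3 << 16) | (0x$4 << 24) ))
}

# Decode one 16-byte partition entry given as a string of 16 hex bytes.
# Byte 1 is the status, byte 5 the type, bytes 9-12 the first LBA sector,
# bytes 13-16 the number of sectors.
decode_entry() {
    set -- $1      # split the string into positional parameters
    echo "status=0x$1 type=0x$5 lba_start=$(le32 "$9" "${10}" "${11}" "${12}") sectors=$(le32 "${13}" "${14}" "${15}" "${16}")"
}

decode_entry "80 20 21 00 17 fe ff ff 00 08 00 00 00 00 80 02"
```

So my first partition is bootable (0x80), starts at sector 2048 and is 41943040 sectors long, which at 512 bytes per sector is exactly 20 GiB. Type 0x17 is what fdisk lists as hidden HPFS/NTFS; a regular NTFS partition would be 0x07.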
If we try the same thing (hexdump the first 512 bytes) on our image, we get (again, some bits have been skipped):
00000000 eb 52 90 4e 54 46 53 20 20 20 20 00 02 08 00 00 |.R.NTFS    .....|
00000010 00 00 00 00 00 f8 00 00 3f 00 ff 00 00 08 00 00 |........?.......|
00000020 00 00 00 00 80 00 80 00 f8 ff 7f 02 00 00 00 00 |................|
00000030 00 00 0c 00 00 00 00 00 8e f0 1b 00 00 00 00 00 |................|
00000180 eb f2 c3 0d 0a 45 72 72 6f 72 20 64 65 20 6c 65 |.....Error de le|
00000190 63 74 75 72 61 20 64 65 20 64 69 73 63 6f 00 0d |ctura de disco..|
000001a0 0a 46 61 6c 74 61 20 4e 54 4c 44 52 00 0d 0a 4e |.Falta NTLDR...N|
000001b0 54 4c 44 52 20 63 6f 6d 70 72 69 6d 69 64 6f 00 |TLDR comprimido.|
000001c0 0d 0a 50 72 65 73 69 6f 6e 65 20 43 74 72 6c 2b |..Presione Ctrl+|
000001d0 41 6c 74 2b 53 75 70 72 20 70 61 72 61 20 72 65 |Alt+Supr para re|
000001e0 69 6e 69 63 69 61 72 0d 0a 00 00 00 00 00 00 00 |iniciar.........|
000001f0 00 00 00 00 00 00 00 00 83 9f ad c0 00 00 55 aa |..............U.|
Which doesn’t look like a partition table… Now, that’d explain why our virtualisation software refuses to boot it!
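Note that both dumps end in 55 aa, so the signature alone won’t tell a partition image from a full disk image. A quick and dirty check, as a sketch: look for the NTFS marker at offset 3, as seen in the dump above. The function names are hypothetical, and this is a heuristic rather than a proper file type detector.

```shell
#!/bin/sh
# Print COUNT bytes of FILE starting at OFFSET as one lowercase hex string.
hex_at() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | od -An -v -tx1 | tr -d ' \n'
}

# Guess what the first sector of FILE holds.
first_sector_kind() {
    if [ "$(hex_at "$1" 510 2)" != "55aa" ]; then
        echo "no boot signature"
    elif [ "$(hex_at "$1" 3 4)" = "4e544653" ]; then   # "NTFS" in ASCII
        echo "NTFS boot sector"
    else
        echo "possibly an MBR"
    fi
}

# Two throwaway 512-byte sectors to try it on.
work=$(mktemp -d)
for f in disk.img part.img; do
    dd if=/dev/zero of="$work/$f" bs=512 count=1 2>/dev/null
    printf '\125\252' | dd of="$work/$f" bs=1 seek=510 conv=notrunc 2>/dev/null
done
printf 'NTFS' | dd of="$work/part.img" bs=1 seek=3 conv=notrunc 2>/dev/null

first_sector_kind "$work/disk.img"
first_sector_kind "$work/part.img"
```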
As mentioned before, we only have to worry about the partition table, but we do need a valid code area; luckily, there is already some software available to do it for us. That’d be ms-sys, which, by the way is not packaged by Debian due to license issues (citation needed; read it long ago, can’t be arsed to look for it now) but it’s just a matter of downloading the source code and compiling.
ms-sys has several options; the one I’m interested in is -m. Now, as it turns out, ms-sys needs a file to write the data to, so let’s create a zeroed one.
$ dd if=/dev/zero of=mymbr count=2048
$ ./ms-sys -f -m mymbr
Notice the -f argument; if it weren’t there, ms-sys would complain about the file not being a disk device, but it’s ok, we know (or hope we know) what we’re doing. Also, we created a 1MB zeroed file (count=2048 in dd, with dd’s default block size of 512 bytes); that’s because it’ll be the start of our image, and leaving 1MB at the beginning seems to be a sane thing to do (e.g. gparted does it that way).
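If you don’t trust the arithmetic, here is a small sketch that checks it in a temporary directory, and also shows what that 1MB offset looks like once expressed in sectors and in little-endian bytes. The le32_hex helper is a made-up name.

```shell
#!/bin/sh
# Print a 32-bit value as four hex bytes, lowest byte first (little-endian).
le32_hex() {
    printf '%02x %02x %02x %02x\n' \
        $(( $1 & 255 )) $(( ($1 >> 8) & 255 )) \
        $(( ($1 >> 16) & 255 )) $(( ($1 >> 24) & 255 ))
}

work=$(mktemp -d)
dd if=/dev/zero of="$work/mymbr" count=2048 2>/dev/null   # default block size: 512 bytes
size=$(wc -c < "$work/mymbr" | tr -d ' ')
first_sector=$((size / 512))

echo "file size:              $size bytes"
echo "partition starts at:    sector $first_sector"
echo "as little-endian bytes: $(le32_hex "$first_sector")"
```

The 00 08 00 00 it prints is the same value found at offset 0x1C6 in the partition entries shown in this article.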
Having done that, we get:
00000180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001b0 00 00 00 00 00 2c 44 63 00 00 00 00 00 00 00 00 |.....,Dc........|
000001c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa |..............U.|
00000200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
That is, no disk identifier, no partition table, but filled in code area and Boot Record Signature; not bad.
We just have to fill in the missing data with a hex editor (e.g. ghex).
As for the disk identifier, I guess we can fill in those bytes (0x1B8 to 0x1BD) pseudo-randomly, but I really have no clue if that’s what dedicated software does :).
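If you’d rather not invent the bytes by hand, dd can do the pseudo-random part. This is only a sketch, and it assumes the common convention that the identifier proper is the four bytes at 0x1B8, with the two bytes after it left at zero; set_random_disk_id is a made-up name.

```shell
#!/bin/sh
# Overwrite the four bytes at 0x1B8 of FILE with random data.
# conv=notrunc keeps everything else in the file as it was.
set_random_disk_id() {
    dd if=/dev/urandom of="$1" bs=1 count=4 seek=$((0x1B8)) conv=notrunc 2>/dev/null
}

work=$(mktemp -d)
dd if=/dev/zero of="$work/mymbr" count=2048 2>/dev/null
set_random_disk_id "$work/mymbr"

# Show the identifier plus the four bytes that follow it.
dd if="$work/mymbr" bs=1 skip=$((0x1B8)) count=8 2>/dev/null | od -An -v -tx1
```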
So, as for the Partition Table, we just need to fill in the first entry (0x1BE to 0x1CD), as a single partition is all we need.
For the partition type, run fdisk and then l for more options. See also man fdisk and man ls. The starting LBA and the number of sectors must both be encoded as stored on a little-endian computer.
After all that mess, my partition entry looks like this:
000001b0 00 00 00 00 00 2c 44 63 f2 aa cd f3 12 83 80 00 |.....,Dc........|
000001c0 20 21 07 fe ff ff 00 08 00 00 d8 0b 54 02 00 00 | !..........T...|
000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
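For those who dislike hex editors, roughly the same entry can be written with printf and dd. This is a sketch of an alternative, not what I actually did: the helper names are invented, the CHS bytes are fixed values (20 21 00 for a partition starting at sector 2048 in a 255/63 geometry, fe ff ff as the usual “too big for CHS” end), and the sector count is the one from my entry above.

```shell
#!/bin/sh
# Emit one raw byte given its numeric value (via a printf octal escape).
byte() { printf "\\$(printf '%03o' "$(($1))")"; }

# Emit a 32-bit value as four raw bytes, lowest byte first (little-endian).
le32_raw() {
    byte $(( $1 & 255 )); byte $(( ($1 >> 8) & 255 ))
    byte $(( ($1 >> 16) & 255 )); byte $(( ($1 >> 24) & 255 ))
}

# Write the first primary partition entry of FILE.
# usage: write_entry FILE TYPE START_LBA SECTORS
write_entry() {
    {
        byte 0x80                          # status: bootable
        byte 0x20; byte 0x21; byte 0x00    # CHS of the first sector (assumed, see above)
        byte "$2"                          # partition type, e.g. 0x07 for NTFS
        byte 0xfe; byte 0xff; byte 0xff    # CHS of the last sector (assumed, see above)
        le32_raw "$3"                      # first sector, LBA
        le32_raw "$4"                      # number of sectors
    } | dd of="$1" bs=1 seek=$((0x1BE)) conv=notrunc 2>/dev/null
}

img=$(mktemp)
dd if=/dev/zero of="$img" bs=512 count=1 2>/dev/null
write_entry "$img" 0x07 2048 39062488
dd if="$img" bs=1 skip=$((0x1BE)) count=16 2>/dev/null | od -An -v -tx1
```

For a different image, the number of sectors would be the size of the raw partition data in bytes divided by 512.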
So, we make a copy of the mymbr file as myimg.hdd, for example. Remember that we created this file as a zeroed 1MB file and then edited the first 512 bytes corresponding to the MBR.
And then we create, extract, convert or otherwise obtain the image, and add it to that file using dd; in my case:
$ gunzip -c Image.gz | ntfsclone -r -O - - | dd of=myimg.hdd bs=1048576 seek=1
The gunzip and ntfsclone parts are there just because that’s how I have the image already; in practice you just need to pass the raw data to dd via standard input, or create the image directly by indicating the input file (see man dd).
Notice the arguments bs=1048576 seek=1 for dd. Here, we’re telling dd to start writing to myimg.hdd after 1MB, leaving our 512 byte long MBR plus some zero data intact. By the way, increasing the block size makes the whole process considerably faster.
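Before throwing 25 GB at it, the seek trick can be rehearsed with a few bytes. In this sketch the string NTFS-DATA merely stands in for the real partition data, and everything happens in a temporary directory.

```shell
#!/bin/sh
work=$(mktemp -d)

# The 1MB header, created the same way as mymbr.
dd if=/dev/zero of="$work/myimg.hdd" count=2048 2>/dev/null

# Write the stand-in payload one 1MB block into the file.
printf 'NTFS-DATA' | dd of="$work/myimg.hdd" bs=1048576 seek=1 2>/dev/null

size=$(wc -c < "$work/myimg.hdd" | tr -d ' ')
payload=$(dd if="$work/myimg.hdd" bs=1 skip=1048576 count=9 2>/dev/null)

echo "total size: $size bytes"              # 1MB header plus 9 bytes of payload
echo "found at offset 1048576: $payload"
```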
Also, do note that we can overwrite the MBR of this newly created virtual hard drive at any time:
$ dd if=mymbr of=myimg.hdd count=1 conv=notrunc
Note that if you leave out the conv=notrunc parameter, you’ll lose all your precious data, because dd truncates the output file right after what it has just written.
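If you want to see the difference before risking a real image, here is a toy-sized sketch; the strings just stand in for the MBR and the partition data.

```shell
#!/bin/sh
work=$(mktemp -d)
printf 'oldMBR--partition-data' > "$work/with.img"
cp "$work/with.img" "$work/without.img"

# Overwrite the first 6 bytes, once with conv=notrunc and once without.
printf 'newMBR' | dd of="$work/with.img" bs=6 count=1 conv=notrunc 2>/dev/null
printf 'newMBR' | dd of="$work/without.img" bs=6 count=1 2>/dev/null

echo "with conv=notrunc:    $(cat "$work/with.img")"
echo "without conv=notrunc: $(cat "$work/without.img")"
```

The first file keeps everything after the replaced bytes; the second one ends right after them.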
Now just use whatever virtualisation solution you want with this raw image, or maybe even convert it to another format (like vdi); it’s a perfectly valid virtual hard drive with a single partition.
It does look like quite a mess, but creating a virtual hard drive with a bootable partition from an image or a real hard drive takes just around two minutes. That is not taking into account the time-consuming dd step, but then again, there we just have to sit back and relax.
Just as a side note, this does not by any means guarantee a successful migration from a live installation to a Virtual Machine; it does guarantee, however, that the virtualisation software will try to boot that partition.
As a matter of fact, my migration was “unsuccessful” at first, as I got the dreaded BSOD in VirtualBox, but I could boot using kvm, albeit the guest was pretty slow.
It was caused by the IDE/ATA drivers. So I booted with kvm and used the MergeIDE solution mentioned here. After doing that, I’m able to boot into that image from VirtualBox, and it actually runs fast in kvm.