At the company I work for, sick of doing clean Windows installs, we decided to willingly violate Windows XP’s EULA for the greater good: we put together a few open-source tools (basically gparted and, of course, Linux), wrote a couple of witty scripts and came up with a “free” and nearly legal way of (re)installing Windows on our machines.
The method basically consists of having on each machine, besides the live Windows installation, a stripped-down Linux system with a backup image of its Windows (legally registered!). Of course, to save time, we sometimes use that same image to install Windows on more than one machine; once it’s been installed, we change the license data and create a new internal backup image with its own license info.
So, I have that set up on my machine as well, except that I have a full, lovely, amazingly useful Debian installation. The problem is that I can’t be bothered to close or abandon whatever I’m doing and leave my happy place just to boot into Windows for 5 minutes, figure out how to do something or test a new script, and boot back into Linux to resume my other activities.
Here comes virtualisation to the rescue. Being something I had played with in the past, it wasn’t totally new, and I already knew about the different options out there and their pros and cons.
So I decided to give it a shot, but once again, making a fresh Windows install with all of the software needed to make it useful… is just too much of a burden.
Summing up, I’ve got a Linux installation (which will be the Host OS) and a gzipped ntfsclone Windows XP image (which will, of course, be the Guest OS).
That is nearly the equivalent of having a live Windows installation and wanting to migrate it to a Virtual Machine.
There are already several articles about how to do that, but none of the ones I found solved a very simple issue:
You can’t just get the image of a disk partition and boot it!
Either pull the drive from the windows machine or copy the data with a low level image tool (like dd) to a USB drive or other removable media. If making an image, DO NOT image just the partition, this will not work!
What they seem to be suggesting here is that you make an image of the full hard drive (as opposed to just one partition), just to get the piece of it you want. That may mean having 240 GB of raw data instead of just the 25 GB you’re interested in… That’s nearly 90% inefficiency.
To get around this, we need to understand why there is this problem to begin with.
You’ll see, it’s not that the virtualisation software refuses to boot your image because it is a faulty, buggy or incomplete program (as many would dare to suggest without digging any further!); the problem here is how hard drives work, how they were designed and how a real computer understands them.
There are many good articles around, and it’s not my intention at all to duplicate that information, so I’m just going to make a quick introduction to the topics, in order for the solution to make a bit of sense.
So, nowadays, everyone knows that a hard drive can have several partitions that look like “different hard drives” inside the Operating System, but the information about the different partitions, their format, size, position in the physical hard drive, etc. has to be stored somewhere.
Where? Well, most likely you’ve heard about the infamous Partition Table or Master Boot Record (MBR), and I say infamous because probably the one time you heard about those things, you had to curse a lot due to data loss or all those wasted hours.
There we have it, there’s a mystical thing at the very beginning of our hard drive describing where our partitions are and how they are!
Then, when we try to get our Virtual Machine to boot that partition image we made, it’ll complain about it not being a properly formatted disk or something along those lines. Of course, there’s no MBR!
Master Boot Record
In short, the MBR is a 512 byte long section with a standard structure:
- Code area (0x000—0x1BD)
- Partition Table (0x1BE—0x1FD)
- Boot Record Signature (0x1FE-0x1FF).
Those hex numbers in parentheses are the offsets within the MBR at which each section is located; since the MBR is the very first sector of the disk, they are also absolute offsets.
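To see those three regions for yourself, here’s a small sketch that slices a saved copy of an MBR (a plain 512-byte file, not the disk itself) with dd. The function name mbr_regions and the output suffixes are made up for the example; the decimal numbers are just the offsets above (0x1BE is 446, 0x1FE is 510).

```shell
# Split a saved MBR (a 512-byte file) into the three regions listed above.
# mbr_regions and the .code/.ptable/.sig suffixes are names made up for this sketch.
mbr_regions() {
    f=$1
    dd if="$f" of="$f.code"   bs=1 skip=0   count=446 2>/dev/null  # 0x000-0x1BD
    dd if="$f" of="$f.ptable" bs=1 skip=446 count=64  2>/dev/null  # 0x1BE-0x1FD
    dd if="$f" of="$f.sig"    bs=1 skip=510 count=2   2>/dev/null  # 0x1FE-0x1FF
    # Print the signature as hex; a valid MBR gives 55aa.
    od -An -tx1 "$f.sig" | tr -d ' \n'
    echo
}
```

Something like mbr_regions mbr.bin should print 55aa and leave the three pieces next to the original file.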
We’ll see later that we only have to worry about the Partition Table, so let’s take a look into it.
The Partition Table is where the information about our disk partitions is actually written. It has enough room to define four partitions, called primary partitions; one of those can be an extended partition, which will contain another partition table with information about all the logical partitions. We don’t really care much about that right now; if interested, read the articles above or make a quick internet search.
This section, then, also has a standard structure:
- Entry for Primary Partition #1 (0x1BE—0x1CD).
- Entry for Primary Partition #2 (0x1CE—0x1DD).
- Entry for Primary Partition #3 (0x1DE—0x1ED).
- Entry for Primary Partition #4 (0x1EE—0x1FD).
That means that whatever defines the first primary partition sits between byte offsets 0x1BE and 0x1CD of the MBR.
Those entries have, of course, a structure that is better explained here, but here it goes for completeness’ sake:
- Partition State: 0x80 if it’s the boot partition, 0x00 otherwise (1 byte).
- Starting sector CHS coordinates (3 bytes).
- Partition Type (1 byte).
- Ending sector CHS coordinates (3 bytes).
- Starting sector LBA coordinates (4 bytes).
- Partition length in sectors (4 bytes).
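As a rough illustration of that layout, the sketch below reads one 16-byte entry out of a saved MBR file with od and prints the fields that matter here. decode_entry is a name invented for the example, and it ignores the two CHS fields on purpose.

```shell
# Decode partition entry N (1-4) from a saved MBR file.
# decode_entry is a hypothetical helper, not part of any tool mentioned here.
decode_entry() {
    file=$1; n=$2
    off=$(( 446 + 16 * (n - 1) ))          # 0x1BE + 16 bytes per entry
    # Read the 16 bytes as unsigned decimals into $1..$16.
    set -- $(od -An -tu1 -j "$off" -N 16 "$file")
    # Bytes 9-12: starting LBA, bytes 13-16: length, both little-endian.
    lba=$(( $9 + ${10} * 256 + ${11} * 65536 + ${12} * 16777216 ))
    len=$(( ${13} + ${14} * 256 + ${15} * 65536 + ${16} * 16777216 ))
    printf 'state=0x%02x type=0x%02x start_lba=%d sectors=%d\n' \
        "$1" "$5" "$lba" "$len"
}
```

Usage would be along the lines of decode_entry mbr.bin 1, where mbr.bin is whatever file you dumped your MBR into.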
What are those CHS and LBA things, you ask? Well, in þe old times, a disk sector actually had to be referred to by its CHS coordinates (Cylinder, Head, Sector), which are hardware-dependent. However, nowadays software cares more about LBA (Logical Block Addressing) because it’s easier and the abstraction layers do the hard part.
Also, as Dan Strick said on the FreeBSD mail list (and I believe it just because my BIOS agrees):
Modern BIOS geometry most frequently uses 255 heads and 63 sectors/track because that maximizes the addressable part of the disk drive using the basic int13 function.
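To make the CHS/LBA relation a bit more concrete, here’s a sketch of the usual LBA to CHS conversion, assuming the 255 heads and 63 sectors/track geometry from the quote. The function name lba_to_chs is made up; it prints the three bytes in the order they are stored in a partition entry (head, then sector with the two top cylinder bits, then the low cylinder byte).

```shell
# Convert an LBA sector number to the 3 CHS bytes of a partition entry.
# lba_to_chs is a hypothetical helper; geometry defaults to 255 heads, 63 sectors/track.
lba_to_chs() {
    lba=$1; heads=${2:-255}; spt=${3:-63}
    c=$(( lba / (heads * spt) ))
    h=$(( (lba / spt) % heads ))
    s=$(( lba % spt + 1 ))                 # CHS sectors are numbered from 1
    if [ "$c" -gt 1023 ]; then
        # Cylinder does not fit in 10 bits: emit the conventional "too big" marker.
        echo "fe ff ff"
        return
    fi
    # Byte 1: head. Byte 2: sector (bits 0-5) plus cylinder bits 8-9 (bits 6-7).
    # Byte 3: low 8 bits of the cylinder.
    printf '%02x %02x %02x\n' "$h" $(( s | ((c >> 8) << 6) )) $(( c & 255 ))
}
```

For instance, lba_to_chs 2048 prints 20 21 00, the encoding for a partition starting at 1MB.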
Real case MBR
Cool, we now know that there are three things we need, and roughly how they are, but it was all too abstract, so, as an instructive exercise, why don’t you go to your terminal and execute
$ dd if=/dev/sda count=1 | hd | less
Note that you may run into permission errors; just become root, use sudo, join the disk group or do whatever gets you raw access to the disk.
Also note that if you mess up and write of= instead of if=, you may be killing your MBR :). Read man dd for more info.
So, mine looks a bit like this (I skipped a part as it’s mostly incomprehensible):
00000000 eb 63 90 d0 bc 00 7c 8e c0 8e d8 be 00 7c bf 00 |.c....|......|..|
00000010 06 b9 00 02 fc f3 a4 50 68 1c 06 cb fb b9 04 00 |.......Ph.......|
00000020 bd be 07 80 7e 00 00 7c 0b 0f 85 0e 01 83 c5 10 |....~..|........|
[...]
00000170 be 95 7d e8 34 00 be 9a 7d e8 2e 00 cd 18 eb fe |..}.4...}.......|
00000180 47 52 55 42 20 00 47 65 6f 6d 00 48 61 72 64 20 |GRUB .Geom.Hard |
00000190 44 69 73 6b 00 52 65 61 64 00 20 45 72 72 6f 72 |Disk.Read. Error|
000001a0 0d 0a 00 bb 01 00 b4 0e cd 10 ac 3c 00 75 f4 c3 |...........<.u..|
000001b0 00 00 00 00 00 00 00 00 f7 a4 85 a3 2f d2 80 20 |.........J..... |
000001c0 21 00 17 fe ff ff 00 08 00 00 00 00 80 02 00 fe |!...............|
000001d0 ff ff 83 fe ff ff 73 0a 80 02 92 69 04 00 00 fe |......s....i....|
000001e0 ff ff 17 fe ff ff 05 74 84 02 c1 3e 00 00 00 fe |.......t...>....|
000001f0 ff ff 05 fe ff ff fe bf 84 02 02 c8 1c 10 55 aa |..............U.|
The interesting part is at offset 0x1b0, which is the row in which the partition table starts. Notice the section at 0x1B8, where we see f7 a4 85 a3 2f d2; that’d be this disk’s identifier (I must confess I don’t know if, or how, this is important). Right after that, starting at 0x1BE, we find the start of the partition table.
If we try the same thing (hexdump the first 512 bytes) on our image (again, some bits have been skipped):
00000000 eb 52 90 4e 54 46 53 20 20 20 20 00 02 08 00 00 |.R.NTFS    .....|
00000010 00 00 00 00 00 f8 00 00 3f 00 ff 00 00 08 00 00 |........?.......|
00000020 00 00 00 00 80 00 80 00 f8 ff 7f 02 00 00 00 00 |................|
00000030 00 00 0c 00 00 00 00 00 8e f0 1b 00 00 00 00 00 |................|
[...]
00000180 eb f2 c3 0d 0a 45 72 72 6f 72 20 64 65 20 6c 65 |.....Error de le|
00000190 63 74 75 72 61 20 64 65 20 64 69 73 63 6f 00 0d |ctura de disco..|
000001a0 0a 46 61 6c 74 61 20 4e 54 4c 44 52 00 0d 0a 4e |.Falta NTLDR...N|
000001b0 54 4c 44 52 20 63 6f 6d 70 72 69 6d 69 64 6f 00 |TLDR comprimido.|
000001c0 0d 0a 50 72 65 73 69 6f 6e 65 20 43 74 72 6c 2b |..Presione Ctrl+|
000001d0 41 6c 74 2b 53 75 70 72 20 70 61 72 61 20 72 65 |Alt+Supr para re|
000001e0 69 6e 69 63 69 61 72 0d 0a 00 00 00 00 00 00 00 |iniciar.........|
000001f0 00 00 00 00 00 00 00 00 83 9f ad c0 00 00 55 aa |..............U.|
Which doesn’t look like a partition table… Now, that’d explain why our virtualisation software refuses to boot it!
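If you’d rather not eyeball hexdumps, the difference between the two can be checked with a couple of dd reads. This is only a heuristic sketch (sector_kind is an invented name): an NTFS boot sector carries the text “NTFS” followed by four spaces at offset 3, as seen in the dump above, while both kinds of sector end in 55 aa.

```shell
# Guess what the first sector of a file is. sector_kind is a hypothetical helper;
# the checks are heuristics, not a real format detector.
sector_kind() {
    # 8-byte OEM ID at offset 3 (NUL bytes dropped so the shell can hold it).
    oem=$(dd if="$1" bs=1 skip=3 count=8 2>/dev/null | tr -d '\000')
    # Boot signature at offset 510 (0x1FE).
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    if [ "$oem" = "NTFS    " ]; then
        echo "NTFS boot sector (a bare partition, no partition table)"
    elif [ "$sig" = "55aa" ]; then
        echo "boot signature present, possibly an MBR"
    else
        echo "unknown"
    fi
}
```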
Generating the MBR
As mentioned before, we only have to worry about the partition table, but we do need a valid code area; luckily, there is already some software available to do it for us. That’d be ms-sys, which, by the way is not packaged by Debian due to license issues (citation needed; read it long ago, can’t be arsed to look for it now) but it’s just a matter of downloading the source code and compiling.
ms-sys has several options; the one I’m interested in is -m. Now, it turns out ms-sys needs a file to write the data to, so let’s create a zeroed one.
$ dd if=/dev/zero of=mymbr count=2048
$ ./ms-sys -f -m mymbr
Notice the -f argument: if it weren’t there, ms-sys would complain about the file not being a disk device, but it’s ok, we know (or hope we know) what we’re doing. Also, we created a 1MB zeroed file (the dd step: 2048 sectors of 512 bytes); that’s because it’ll be the start of our image, and leaving 1MB at the beginning seems to be a sane thing to do (e.g. gparted does it that way).
Having done that, we get:
00000180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001b0 00 00 00 00 00 2c 44 63 00 00 00 00 00 00 00 00 |.....,Dc........|
000001c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa |..............U.|
00000200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
That is, no disk identifier and no partition table, but a filled-in code area and Boot Record Signature; not bad.
We just have to fill in the missing data with a hex editor (e.g.
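As for the disk identifier, I guess we can fill in those bytes (0x1B8-0x1BD) pseudo-randomly, but I really have no clue if that’s what dedicated software does :). Strictly speaking, the signature itself is only the four bytes at 0x1B8-0x1BB; the two bytes after it are normally left at zero.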
So, as for the Partition Table, we just need to fill in the first entry (0x1BE—0x1CD), as a single partition is all we need.
- Partition State: 0x80, as it’ll be the boot partition.
- Starting sector (CHS): 0x002021, since the partition will start at 1MB; see details for CHS encoding here. Strictly, the on-disk byte order is head, sector, cylinder, which for 1MB (LBA 2048 with 255 heads and 63 sectors/track) gives 20 21 00, exactly what the real MBR above shows; the 00 20 21 used here should be harmless in practice because the LBA fields below are what normally gets read.
- Partition Type: 0x07 for NTFS; fdisk’s l command lists more options.
- Ending sector (CHS): odds are your partition is too big to fit in these three bytes; in those cases 0xFEFFFF is what should be there.
- Starting sector (LBA): bytes 00 08 00 00, that’d be 2048 (0x00000800) as stored on a little-endian computer; it’s 2048 because our partition will start after 1MB (2048 sectors of 512 bytes).
- Partition length: this will vary depending on your partition size. First find out the size of your partition in bytes and then divide it by 512 or, better yet, find the size of your partition in sectors! Hint: man ls. This field must be encoded as stored on a little-endian computer as well.
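The little-endian bit is the easiest one to get wrong by hand, so here’s a sketch that does the arithmetic. Both function names are invented, and size_in_sectors assumes a raw image file (not a compressed or ntfsclone-format one, whose file size is not the partition size); for a real partition, blockdev --getsz on the device gives the sector count directly.

```shell
# Size of a raw partition image in 512-byte sectors.
# size_in_sectors and le32_hex are names invented for this sketch.
size_in_sectors() {
    echo $(( $(wc -c < "$1") / 512 ))
}

# Spell a 32-bit value as little-endian hex bytes, ready to type into a hex editor.
le32_hex() {
    n=$1
    printf '%02x %02x %02x %02x\n' \
        $((  n        & 255 )) $(( (n >> 8)  & 255 )) \
        $(( (n >> 16) & 255 )) $(( (n >> 24) & 255 ))
}
```

For example, le32_hex 2048 gives 00 08 00 00, the starting sector from the list above.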
After all that mess, my partition entry looks like this:
000001b0 00 00 00 00 00 2c 44 63 f2 aa cd f3 12 83 80 00 |.....,Dc........|
000001c0 20 21 07 fe ff ff 00 08 00 00 d8 0b 54 02 00 00 | !..........T...|
000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
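If a hex editor feels like too much clicking, the same entry can be written with printf and dd. This is only a sketch under a few assumptions: write_entry and le32 are made-up names, the partition starts at sector 2048, and the CHS start bytes are written in head, sector, cylinder order (20 21 00), which differs from the 00 20 21 in my dump above.

```shell
# Emit a 32-bit value as 4 raw little-endian bytes (hypothetical helper).
le32() {
    n=$1
    for s in 0 8 16 24; do
        # Build an octal escape such as \330 and let printf turn it into a byte.
        printf "\\$(printf '%03o' $(( (n >> s) & 255 )))"
    done
}

# Write the first partition entry (offset 446 = 0x1BE) into an MBR file.
# write_entry is a hypothetical helper; $2 is the partition length in sectors.
write_entry() {
    img=$1; sectors=$2
    {
        printf '\200'            # 0x80: boot partition
        printf '\040\041\000'    # CHS start: head 0x20, sector 0x21, cylinder 0
        printf '\007'            # 0x07: NTFS
        printf '\376\377\377'    # CHS end: fe ff ff, "too big for CHS"
        le32 2048                # starting LBA
        le32 "$sectors"          # length in sectors
    } | dd of="$img" bs=1 seek=446 conv=notrunc 2>/dev/null
}
```

With my numbers that would be write_entry mymbr 39062488; the conv=notrunc is what keeps dd from chopping the rest of the file off.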
Joining the new MBR and the image
So, we make a copy of the mymbr file as myimg.hdd, for example. Remember that we created this file as a zeroed 1MB file and then edited the first 512 bytes corresponding to the MBR.
And then we create, extract, convert or whatever the image, and add it to that file using dd; in my case:
$ gunzip -c Image.gz | ntfsclone -r -O - - | dd of=myimg.hdd bs=1048576 seek=1
The ntfsclone parts are there just because that’s how I already have the image; in practice you just need to pass the raw data to
dd via standard input or create the image directly by indicating the input file (see
Notice the arguments bs=1048576 seek=1 for dd: here we’re telling dd to start writing to myimg.hdd after 1MB, leaving our 512-byte MBR plus some zero data intact. By the way, increasing the block size also makes the whole process considerably faster.
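The same step, for an image that is already a raw file, can be sketched as a tiny function. assemble_disk and its arguments are placeholders made up for the example; the header is the 1MB file with the MBR in it.

```shell
# Glue a prepared 1MB header (MBR + padding) and a raw partition image together.
# assemble_disk is a hypothetical helper; the file names are up to you.
assemble_disk() {
    header=$1; partition=$2; out=$3
    cp "$header" "$out"
    # seek=1 with a 1MB block size skips the first megabyte of the output, so
    # the MBR is left alone; dd only truncates the output from that point on.
    dd if="$partition" of="$out" bs=1048576 seek=1 2>/dev/null
}
```

The result should be exactly 1MB bigger than the partition image, with the partition data starting at byte 1048576.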
Also, do note that we can overwrite the MBR of this newly created virtual hard drive at any time:
$ dd if=mymbr of=myimg.hdd count=1 conv=notrunc
Note that if you leave out the conv=notrunc parameter, you’ll lose all your precious data, since dd would truncate myimg.hdd right after the 512 bytes it writes.
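If you want to convince yourself before risking a real image, here’s a throwaway demonstration on small temporary files (demo_notrunc is an invented name, and the 4096-byte “image” is just a stand-in):

```shell
# Show what conv=notrunc changes, using small scratch files only.
# demo_notrunc is a hypothetical helper for illustration.
demo_notrunc() {
    dir=$(mktemp -d)
    dd if=/dev/zero of="$dir/mbr" bs=512 count=1 2>/dev/null   # stand-in MBR
    dd if=/dev/zero of="$dir/img" bs=512 count=8 2>/dev/null   # stand-in image
    cp "$dir/img" "$dir/img2"
    # Overwrite the first sector in place, keeping the rest of the file.
    dd if="$dir/mbr" of="$dir/img" count=1 conv=notrunc 2>/dev/null
    # Same command without notrunc: the output is cut down to what was written.
    dd if="$dir/mbr" of="$dir/img2" count=1 2>/dev/null
    echo "with conv=notrunc: $(( $(wc -c < "$dir/img") )) bytes"
    echo "without: $(( $(wc -c < "$dir/img2") )) bytes"
    rm -rf "$dir"
}
```

Running demo_notrunc should report 4096 bytes for the first file and only 512 for the second.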
Use the image
Now just use whatever virtualisation solution you want with this raw image, or maybe even convert it to another format (like vdi); it’s a perfectly valid virtual hard drive with a single partition.
It does look like quite a mess, but creating a virtual hard drive with a bootable partition from an image or a real hard drive takes just around two minutes; that is, not counting the time-consuming dd step, but then again, there we just have to sit back and relax.
Just as a side note, this does not by any means guarantee a successful migration from a live installation to a Virtual Machine; it does guarantee, however, that the virtualisation software will try to boot that partition.
As a matter of fact, my migration was “unsuccessful” at first, as I got the dreadful BSOD in VirtualBox, but I could boot using kvm, albeit the guest was pretty slow.
It was caused by the IDE/ATA drivers, so I booted with kvm and used the
MergeIDE solution mentioned here. After doing that, I’m able to boot into that image from VirtualBox and it actually runs fast in