[philiptellis] /bb|[^b]{2}/
Never stop Grokking


Showing posts with label filesystem.

Wednesday, February 17, 2021

Recovering from Big Sur upgrade snafu

Apple recently pushed out a new release of macOS called Big Sur. Unfortunately, the upgrade process is problematic. Specifically, the upgrader does not check for the required disk space before starting, and if the target system doesn't have enough (35GB or so), the upgrade fails partway through, leaving your system in a mostly unusable state.

This is what happened to me.

My environment

  • The system was a 13" MacBook with a 128GB SSD. 128GB is pretty small and doesn't leave much room for large files.
  • The system had just a single user.
  • At the start of the upgrade, the system had about 13GB of free disk space (>10%).
  • Desktop, Documents and Photos were backed up to iCloud, but Downloads weren't, and some very large photos & videos had been removed from iCloud to save space there, so they only existed locally.

Prior discussion

Mr. Macintosh has published a very detailed explanation of the issue and various ways to get around it without any data loss. This is a very good article that got me very far in my investigation. I was lucky that the latest updates had been posted just a few hours before I hit the problem myself.

Unfortunately, none of the suggested fixes worked for me.

  1. I couldn't mount the drive in Target Disk Mode as my password wouldn't work (the password still worked when logging in locally, but that took me back to the upgrade loop).
  2. I couldn't start up the system in Recovery Mode as it wanted a password, but again, wouldn't accept the password (the same password that worked when fully booting up).
  3. I couldn't access the disk when booting from an external startup disk because of the same issue.

Many posts I found online seemed to suggest that a firmware password was required, but I'd never set this up.

Single User Mode

Eventually, what showed the most promise was booting into Single User Mode and then fiddling around with all available disk devices.

Password worked for Single User Mode
  1. To start up in Single User Mode, press Cmd+S when starting up until the Apple logo shows up.
  2. The system prompts you for a password, and my password did in fact work in this mode.
  3. After signing in, you're dropped into a unix shell.
  4. There's only a basic file system mounted, which contains a limited number of unix commands and none of your data.

Mount the Data partition

Once in single user mode, I had to mount my data partition. I first used the mount command to see what was already mounted. It showed that the only mounted device was /dev/disk1s1. I assumed that my Data partition would be /dev/disk1s2 and that it would have the same filesystem, and I chose a convenient mount point:

# mount -t apfs /dev/disk1s2 /System/Volumes/Data

Miraculously, this did not ask me for a password, and mounted my Data partition. I was able to look through the files and identify potential targets to remove. I also noticed that the disk was now completely full (0 bytes free). This was due to the Big Sur installer, which took up 11GB and then added a few more files, using up the entire 13GB that I had available.
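A quick way to confirm how bad things were (a sketch; the mount point matches the one chosen above, and in single user mode df and awk themselves live on the Data partition, so the PATH needs to include it):

```shell
# check free space on the freshly mounted Data partition
# (mount point /System/Volumes/Data assumed from the mount step above)
df -k /System/Volumes/Data | awk 'NR==2 {printf "free: %d MB\n", $4/1024}'
```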

Things were getting a little cumbersome here as most of the unix commands I needed to use were not on the primary partition, but on the mounted partition, so I added the appropriate folders to the unix PATH environment variable:

PATH="$PATH":/System/Volumes/Data/usr/bin

I was starting to see that choosing a 3 level deep path as my mount point perhaps wasn't a great idea. I also learned that while the screen is quite wide, the terminal environment is set to show 80 columns of text, and goes into very weird line wrapping issues if you type past that. It's even worse if you try tab completion at this point.

Transferring large files

Some of the large files & folders I identified were downloaded packages that could be removed. Unfortunately this only got me 2GB back. To get enough space back, I'd have to remove some photos and videos that weren't stored on iCloud. I figured I'd copy them over to an SD card and then could delete them.

I popped in the SD Card, and the kernel displayed some debug messages on the terminal. It told me that the card was in /dev/disk4, so I tried mounting that at a random empty directory:

# mount -t exfat /dev/disk4 /System/VM

This did not work!

No SD Cards in Single User Mode

By default, SD Cards are formatted with an EXFAT file system (the kind used by Windows and most digital cameras). Unfortunately, you cannot mount an EXFAT filesystem in Single User Mode because the exfatfs driver isn't compiled into the kernel; it's loaded as a dynamic module when required. That only works when booting in standard mode with a kernel that allows dynamic loading, which Single User Mode does not.
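For the curious, on a normally booted system you can see that the exfat driver is a separately loaded extension rather than built in (a sketch; the exact tool depends on the macOS version):

```shell
# macOS only: list loaded kernel extensions and look for the exfat driver
kmutil showloaded 2>/dev/null | grep -i exfat   # Big Sur and later
kextstat 2>/dev/null | grep -i exfat            # older macOS releases
```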

Reformat the SD Card

This was a brand new SD Card, so I decided to reformat it as an Apple file system. I used a different Macbook to do this, however my first attempt didn't work. It isn't sufficient to just format the SD Card, you also need to partition it, and that's where the filesystem is created.

I created a single APFS partition across the entire SD Card and then tried mounting it.
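The repartitioning itself, done on the other MacBook, looked roughly like this (a sketch; the device identifier and volume name are assumptions, so check the output of diskutil list first):

```shell
# macOS only: repartition the whole SD card with a GPT scheme and a
# single APFS volume (this destroys all data on the card!)
diskutil list                                   # find the card's device id
diskutil partitionDisk disk4 1 GPT APFS "SDCARD" 100%
```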

Unfortunately, now it was no longer at /dev/disk4 even though that's what the kernel debug messages said. Looking at /dev/disk* showed me that /dev/disk5s1 was a potential candidate.

# mount -t apfs /dev/disk5s1 /System/VM

Finally, this worked. I was able to copy my files over, and remove them from the Data partition. This freed up about 45GB, which allowed me to continue with the upgrade.

After the upgrade completed, I appear to have 75GB free. I haven't had a chance to check where the space has changed. I also plan to permanently use the SD Card (256GB) as an external hard drive.

Thursday, December 29, 2005

Using the Samsung SP0802N and ASUS K8V-MX together without lockups

Early this year, I started having hard disk problems (signs of an impending crash), and the decision was to replace my old Samsung 40GB with a new Samsung 80GB. The drive installed was a Samsung SP0802N, since I'd heard mostly good reviews of it. I decided to keep both hard disks connected though, just in case.

A few months ago, the computer started showing signs of corrupted RAM. This isn't something that normally happens to two-year-old RAM. Two-day-old RAM, maybe; ten-year-old RAM, maybe; but not two-year-old RAM. Power problems were a possibility, and that's not unexpected in my room. Anyway, the system was checked by a hardware guy, and he said that the motherboard needed to be replaced.

The new motherboard was an ASUS K8V-MX, and along with that, we got an AMD Sempron processor.

On my next trip back home, I noticed problems with the system. It was running slower, and was locking up on disk intensive processes. A power cycle was required to get it back, and then there was a high chance that the BIOS wouldn't recognise my disk, but would grab grub from my old disk. I didn't have time to look at it back in October or November, but in December, I did.

Three things came to mind:
- bad power
- bad hard disk/disk controller
- incompatibility somewhere

We thought the grounding might be bad throughout the house because the stabiliser and spike buster indicated the same at various outlets. I also read through the motherboard manual. I generally do this before installing a new motherboard, but since I hadn't installed this one, I hadn't read it before. The manual said that a BIOS upgrade was required to function correctly, and that MSDOS and a floppy was required to upgrade the BIOS. I had neither, so ignored that for the moment.

Decided to go get a new hard disk and a UPS, but changed my mind about the hard disk at the last moment, and got just the UPS and some more RAM.

The night before I bought the stuff, I moved the PC to a different room to check (I couldn't get it started in my bedroom), and it started up (which further convinced me that it could have been a power problem). I read through /usr/src/linux/Documentation/kernel-parameters.txt for info on what I could do to stabilise the kernel. That pointed me to other docs, one of which told me that a BIOS upgrade was required for certain ASUS motherboards.

Today, I decided to try upgrading the BIOS. I do not have a floppy drive, or MSDOS, so that was a problem. Booted up from the motherboard CD, which started FreeDOS. FreeDOS, however, only recognises FAT16 partitions, and I had none of those.

Switched back to linux, started fdisk, and tried to create a new FAT16 partition 5MB in size. It created one 16MB in size - I guess it's a least count issue. Had to zero out the first 512 bytes of the partition for DOS to recognise it...

dd if=/dev/zero of=/dev/hda11 bs=512 count=1

Then booted back into FreeDOS and formatted the drive:
format c:

Then booted back into linux to copy the ROM image and ROM writing utility to /dev/hda11, and finally, back to FreeDOS to run the utility.
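The linux side of that shuffle looked roughly like this (a sketch; the mount point and file names are made up, since the actual ROM image and flash utility come from the ASUS support site):

```shell
# mount the freshly formatted FAT16 partition and copy the flash files over
mkdir -p /mnt/dos
mount -t vfat /dev/hda11 /mnt/dos
cp K8VMX.ROM AFLASH.EXE /mnt/dos/   # hypothetical file names
umount /mnt/dos
```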

Ran it, and rebooted to get a CMOS checksum error - not entirely unexpected. Went into BIOS setup and reset options that weren't applicable to my box (no floppy drive, no primary slave, boot order, etc.)

Booted into linux and haven't had a problem yet.

Next step - enable ACPI.

Saturday, November 13, 2004

/home is NTFS

A little over a year ago, at my previous company, I had to change the second hard disk on my PC. It was a bit of an experience, because the service engineer who came to do the job had never encountered linux before, but seemed to think he could deal with it just like he did windows.

The engineer put in the new hard disk as a secondary master (my old one was a secondary slave to the CDD).

He then booted using a Win 95 diskette... hmm... what's this? Then started some norton disk copy utility. It's a DOS app that goes into a graphics mode... why?

Then started transferring data... hey, wait a minute, I don't have any NTFS partitions. Hit reset! Ok, cool down for a minute. I've got three ext3 partitions. So, now it's time to assess the damage.

Boot into linux - hmm, /dev/hdd1 (/home) won't mount, down to root shell. Get /home out of /etc/fstab and reboot. Ok, runlevel 3 again. Check other partitions - hdd5 (/usr) ... good, hdd6 (/usr/share) ... good... everything else is on hda... good. all my data, is in /home ... not good

So, I start trying to figure out how to recover. google... no luck. google again... one proprietary app, and a couple of howtos on recovering deleted files from ext2 partitions... no good. google again, get some docs on the structure of ext2, and find a util called e2salvage which won't build. time to start fooling around myself.

start by reading man pages. tune2fs, e2fsck, debugfs, mke2fs... so I know that mke2fs makes backups of the superblock, but where are they?

mke2fs -n /dev/hdd1... ok, that's where
dd if=/dev/hdd5 of=superblock bs=4096 count=2
hmm, so that's what a superblock looks like
dd if=/dev/hdd5 of=superblock2 bs=4096 count=2 skip=32768
hey, that's not a superblock. Ok, try various combinations, finally get this:
dd if=/dev/hdd5 of=superblock2 bs=1024 count=8 skip=131071
that's 32768*4-1
Ok, so that's where the second superblock is.
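As a sanity check on the arithmetic: with a 4k filesystem block size, the backup superblocks at fs blocks 32768 and 98304 translate to these 1k dd blocks (the dd above skips one extra 1k block back, so the 8k read safely straddles the superblock):

```shell
# ext2 backup superblocks (4k fs blocks) converted to 1k dd blocks
echo $(( 32768 * 4 ))   # 131072 ; the dd above used 131071 = 32768*4-1
echo $(( 98304 * 4 ))   # 393216 ; next backup, 98304*4-1 = 393215
```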

Check hdd1 - second superblock blown away as well. Look for the third... 98304*4-1=393215.. ok, that's good. should I dd it to the first? Hmm, no, e2fsck can do that for me... but, I shouldn't work on the original partition. Luckily I have 30GB of free space to play with, and /home is just 6GB.

dd if=/dev/hdd1 of=/mnt/tmp/home.img
cp home.img bak.home.img


Now I start playing with home.img.

The instructions in e2salvage said to try e2fsck before trying e2salvage, so I try that.

e2fsck home.img
no use... can't find superblock
e2fsck -b 32768 -B 4096 home.img
works... starts the fsck, and gets rid of the journal. this is gonna take too long if I do it manually, so I quit, and restart with:
e2fsck -b 32768 -B 4096 -y home.img
The other option would have been to -p(reen) it, but that wouldn't give me any messages on stdout, so I stuck with -y(es to all questions).

2 passes later it says, ok, got whatever I could.

mount -oloop,ro home.img /mnt/home
yippeee, it mounted
cd /mnt/home; ls
lost+found

ok, so everything's in lost+found, and it will take me ages to sift through all this. Filenames might give me some clues.
find . -type f | less
Ok, scroll, scroll, scroll... hmm, this looks like my home directory... yes.
cp -a \#172401 /mnt/home/philip
scroll some more, find /usr/share/doc (which I keep in /home/doc and symlink from /usr/share/doc). move it back to /usr/share/doc. find jdk1.1.8 documentation... pretend I didn't see that.

find moodle home - yay. find yabb home - yay again. Ok, find a bit more that's worth saving, and copy it over. Many files in each of these directories are corrupted, including mailboxes, and some amount of test data, but haven't found anything serious missing.

All code was in CVS anyway, so rebuilt from there where I had to.

Now decided to try e2salvage anyway, on the second copy of hdd1. It wouldn't compile. Changed some code to get it to compile, it ran, found inodes, directories and the works, then segfaulted. The program tries to read from inode 2, which doesn't exist on my partition, and then it tries to printf that inode without checking the return value.

I'd have fixed that, but the result is used in further calculations, so I just left it at that. The old hard disk was taken away, so I don't have anything to play with anymore.

It'll take me a little while to figure out all that was lost, but so far it doesn't look like anything serious.

Friday, August 27, 2004

Undelete in FreeBSD

A colleague of mine deleted a source file he'd been working on for over a week.

How do you undelete a file on a UFS partition? I'd done it before on ext2, I'd also recovered lost files from a damaged FAT32 partition (one that no OS would recognise), heck, I'd even recovered an ext3 file system that had been overwritten by NTFS. Why should undelete be tough?

Well, the tough part was that in all earlier instances, I had a spare partition to play on, _and_ I had root (login, not sudo) on the box, so could effectively boot in single user mode and mount the affected partition read-only. Couldn't do that here. I'd have to work on a live partition. sweet.

The first thing to do was power down the machine (using the power switch) to prevent any writes to the disk via cron or anything else. We then set about trying to figure out our strategy. A couple of websites had tools that could undelete files, but they'd have to be installed on the affected partition, so that was out of the question.

Now the machine has two partitions, one for / and one for /home. / is significantly smaller than /home, but has enough space to play with, about 100MB at a time. Decided to give it a try, copying /home to /tmp 10MB at a time.

Command:
dd if=/dev/ad0s1e of=deleted-file bs=1024k count=10

searched through the 10MB file for a unique string that should have been in the file. No match. Next 10 MB:
dd if=/dev/ad0s1e of=deleted-file bs=1024k count=10 skip=10

This was obviously going to take all night, so we decided to script it. (the code is broken into multiple lines for readability, we actually had it all on one line).
for i in 10 20 30 40 50 60 70 80 90; do
    dd if=/dev/ad0s1e of=deleted-file bs=1024k count=10 skip=$i;
    grep unique-string deleted-file && echo $i
done


We'd note down the numbers that hit a positive and then go back and get those sections again. Painful.

Ran through without luck. Had to then go from 100 to 190, so scripted that too with an outer loop:

for j in 1 2 3 4 5 6 7 8 9; do
    for i in 00 10 20 30 40 50 60 70 80 90; do
        dd ..... of=deleted-file ... skip=$j$i; ...

The observant reader will ask why we didn't just put in an increment like i=$[ $i+10 ]
Well, that runs too fast, and we wouldn't be able to break out easily; we'd have to hit Ctrl+C for every iteration. This way the sheer pain of having to type in every number we wanted was enough to keep the limits low. That wasn't the real reason, though. We did it because it would also be useful when we had to test only specific blocks that weren't in sequence.

IAC, the number of loops soon increased to 3, and the script further evolved to this:

for s in 1 2 3 4; do
    for j in 0 1 2 3 4 5 6 7 8 9; do
        for i in 00 10 20 30 40 50 60 70 80 90; do
            dd if=/dev/ad0s1e of=deleted-file bs=1024k count=10 skip=$s$j$i &>/dev/null;
            grep unique-string deleted-file && echo $s$j$i
        done
    done
done

Pretty soon hit a problem when grep turned up an escape sequence that messed up the screen. Also decided that we may as well save all positive hit files instead of rebuilding them later, so... broke out of the loops, and changed the grep line to this:

grep -q unique-string deleted-file-$s$j$i || rm deleted-file-$s$j$i

Were reasonably happy with the script to leave it to itself now. Might have even changed the iteration to an auto-increment, except there was no point changing it now since what we had would work for the conceivable future (going into the 10's place would be as easy as changing s to 10 11 12... and we didn't expect to have to go much further than 12 because the partition didn't have that much used space).

We finally hit some major positives between 8700 and 8900. Then started the process of extracting the data. 10MB files are too big for editors, and contain mostly binary data that we could get rid of. There were also going to be a lot of false positives, because the unique (to the project) string also showed up in some config files that hadn't been deleted.

First ran this loop:

for i in deleted-file-*; do strings $i | less; done

and tried to manually search for the data. Gave up very soon and changed it to this:

for i in deleted-file-*; do echo $i; strings $i | grep unique-string; done

This showed us all lines where unique-string showed up so we could eliminate files that had no interesting content.

We were finally left with 3 files of 10MB each and the task of extracting the deleted file from here.

The first thing was to find out where in the file the code was. We first tried this:

less deleted-file-8650

search for the unique string and scroll up to the start of the text we wanted. Ctrl+G told us the position into the file that we were at (as a percent of the total). Then scroll to the end and again find the percent.

Now, we were reading 10 blocks of 1MB so using the percentage range, could narrow that down to 1 block.

Again got a percentage value within this 1MB file, and now swapped the block size and count a bit. So we went from 1 block of 1024k to 256 blocks of 4k each. Also had to change the offset from 8650 to 256 times that much. bc came in handy here.
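The block-size conversion, spelled out (bc, as mentioned, or plain shell arithmetic both work):

```shell
# going from bs=1024k to bs=4k multiplies the block count by 256
echo '1024 / 4' | bc        # 256 blocks of 4k per 1MB block
echo '8650 * 256' | bc      # 2214400 ; the new skip value at bs=4k
echo $(( 8650 * 256 ))      # 2214400 ; same thing in pure shell
```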

I should probably mention that at this point we'd taken a break and headed out to Guzzler's Inn for a couple of pitchers and to watch the Olympics. 32/8 was a slightly hard math problem on our return. Yathin has a third party account of that.

We finally narrowed down the search to two 2K sections and one 8K section, with about 100 bytes of binary content (all ASCII NULs) at the end of one of the 2K sections. That section was the end of the file. Used gvim to merge the pieces into one 12K C++ file, complete with copyright notice and all.

If you plan on doing this yourself, then change the three for loops to this (filling in the same dd and grep arguments as above):
i=10;
while [ $i -lt 9000 ]; do
    dd if=/dev/ad0s1e of=deleted-file-$i bs=1024k count=10 skip=$i &>/dev/null;
    grep -q unique-string deleted-file-$i || rm deleted-file-$i;
    i=$[ $i+10 ];
done

Secondly, you could save a lot of time by using grep -ab right up front so you'd get an actual byte count of where to start looking, and just skip the rest. Some people have suggested doing the grep -ab right on the filesystem, but that could generate more data than we could store (40GB partition, and only 200MB of space to store it on).
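A sketch of that shortcut, using a scratch file to stand in for the raw device (grep -b prints the byte offset of each match, -a treats binary data as text, -o prints just the match):

```shell
# find the byte offset of the string, then jump straight there with dd
printf 'xxxxxunique-string tail' > blob
offset=$(grep -abo unique-string blob | head -1 | cut -d: -f1)
echo "$offset"                                        # → 5
dd if=blob bs=1 skip="$offset" count=13 2>/dev/null   # → unique-string
```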
