For the first time, I created a critical WTF-style problem. Luckily it wasn’t on something critical.

After writing some scripts, I’ve been able to recover everything (I think)…

Wtf did I do?
Today, I was working on transferring old backups offsite to our office backup drive. For this I use a nice script I wrote back in 2004, called shifter.sh. It creates the correct dirs (by date) for each reseller on each server and moves the latest backups there. After that, it chowns everything back the way it should be, to allow transport/control by ftp. Well, that’s what it should do.

Due to something called ‘old stuff that shouldn’t be used anymore‘ I recently moved all clients away from a certain server. So I opened up the shifter file and deleted the lines belonging to the /home/backupsys/horus/ dirs (or server). That took about 2 minutes. What I didn’t see, though, was that the code for the next server did a ‘cd ..’ twice, to get back to /home/backupsys from the /home/backupsys/{date}/.
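The real shifter.sh isn’t shown in this post, so the following is only a minimal sketch of one way to avoid this class of bug: build every path from an absolute base instead of relying on where an earlier ‘cd ..’ left the shell. The names BACKUP_ROOT and dated_dir are hypothetical, and the server/date layout is an assumption.

```shell
#!/bin/sh
# Hypothetical sketch; not the original shifter.sh.
# BACKUP_ROOT is an assumed base, taken from the paths mentioned in this post.
BACKUP_ROOT="${BACKUP_ROOT:-/home/backupsys}"

# Create (if needed) and print the dated directory for one server.
# The path is always built from BACKUP_ROOT, so deleting the block for
# one server cannot change where the next block ends up.
dated_dir() {
    server="$1"
    day="$2"
    dir="$BACKUP_ROOT/$server/$day"
    mkdir -p "$dir" || return 1
    printf '%s\n' "$dir"
}
```

A caller would then do something like `target=$(dated_dir someserver 11-18-06) || exit 1` followed by `cd "$target" || exit 1`, so a failed step stops the script instead of letting it carry on from the wrong directory.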

So far so good. I ran the script, but it seemed to be taking longer than I remembered. Having read some stuff on the daily wtf some time ago about moving stuff wrong, I canceled the script and tried to see what the heck went wrong. Well, first of all, the dirs I needed to continue weren’t there. Secondly, I found this disturbing image:

ls -alh /

drwxr-xr-x   23 root     root         4.0K Nov 18 14:19 .
drwxr-xr-x   23 root     root         4.0K Nov 18 14:19 ..
drwxr-xr-x    2 backupsys backupsys     4.0K Nov 18 14:19 11-18-06
-rw-------    1 backupsys backupsys      13K Nov 18 14:21 aquota.group
-rw-------    1 backupsys backupsys      15K Nov 18 14:21 aquota.user
-rw-r--r--    1 root     root            0 Oct 11 18:29 .autofsck
drwxr-xr-x    2 backupsys backupsys     4.0K Apr  2  2005 bin
drwxr-xr-x    3 backupsys backupsys     4.0K Aug 20  2004 boot
drwxr-xr-x   20 backupsys backupsys     116K Oct 11 18:30 dev
drwxr-xr-x   46 backupsys backupsys     4.0K Nov 18 00:10 etc
drwx--x--x   22 backupsys backupsys     4.0K Oct 27 22:21 home
drwxr-xr-x    2 backupsys backupsys     4.0K Jan 25  2003 initrd
drwxr-xr-x    9 backupsys backupsys     4.0K Jun 10  2005 lib
drwx------    2 backupsys backupsys      16K Aug 19  2004 lost+found
drwxr-xr-x    2 backupsys backupsys     4.0K Jan 28  2003 misc
drwxr-xr-x    2 backupsys backupsys     4.0K Aug 18  2004 mnt
drwxr-xr-x    2 backupsys backupsys     4.0K Jan 25  2003 opt
dr-xr-xr-x  101 backupsys backupsys        0 Oct 11 20:29 proc
drwxr-x---   13 backupsys backupsys     4.0K Nov 18 14:22 root
drwxr-xr-x    2 backupsys backupsys     8.0K Jun 10  2005 sbin
drwxrwxrwt   12 backupsys backupsys     4.0K Nov 18 14:23 tmp
drwxr-xr-x   18 backupsys backupsys     4.0K Aug  6 14:38 usr
drwxr-xr-x   19 root     root         4.0K Oct 21  2004 var

What does this mean?
It means that, apart from /var, every dir is owned by the wrong user. I killed the script while it was processing /usr/. I am happy I killed it, because the next step would have been to cd .. a few more times and then start moving everything again. That would have created a situation like the one described here.

So, WTF happened?!!
Well, rather easy: there were too many ‘cd ..’ calls in the shifter.sh code, so ultimately it ended up in /. And because I am usually too lazy to use the right user for these processes, I was running it as root. So the script started to chown EVERY file on / recursively to ‘backupsys:backupsys’. This usually is very bad. Luckily I canceled the process before the shell died, but I was unable to open a new shell over ssh.
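A guard in front of the recursive chown would have caught this even when running as root. The following is a minimal sketch, not code from shifter.sh: BACKUP_ROOT, inside_backup_root and safe_chown are hypothetical names, and the base path is assumed from the dirs mentioned above.

```shell
#!/bin/sh
# Hypothetical guard; not part of the original script.
BACKUP_ROOT="${BACKUP_ROOT:-/home/backupsys}"

# Succeeds only if the directory resolves to BACKUP_ROOT or something below it.
# Both paths are resolved with pwd -P, so '..' and symlinks cannot fool the check.
inside_backup_root() {
    resolved=$(cd "$1" 2>/dev/null && pwd -P) || return 1
    root=$(cd "$BACKUP_ROOT" 2>/dev/null && pwd -P) || return 1
    case "$resolved/" in
        "$root"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Recursive chown that refuses to touch anything outside BACKUP_ROOT.
safe_chown() {
    owner="$1"
    target="$2"
    if ! inside_backup_root "$target"; then
        echo "refusing to chown '$target': not under $BACKUP_ROOT" >&2
        return 1
    fi
    chown -R "$owner" "$target"
}
```

With this in place, `safe_chown backupsys:backupsys .` run from / would print an error and return non-zero instead of walking the whole filesystem.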

Repairing the problem
As I said, I’m lazy, so I needed a fast way to repair this. A friend of mine, while laughing his arse off, came up with a nice idea, the ugly type of idea I’m known for (at least, that’s what some people think of me :( ). I have another old server with the same distro, so I wrote a quick and dirty script in PHP that writes a ‘chown script‘. It generates chown commands based on the ownerships found on the server it runs on, and the result can then be transferred to the server with the ‘owner problem’ and executed there. After running that, I used another script I had created earlier for DirectAdmin to restore the /home ownerships.
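The PHP generator itself isn’t included here. As a rough idea of the technique, this is a shell sketch rather than the PHP original. It assumes GNU find (for -printf) on the reference server and paths without single quotes or newlines; emit_chown_script is a hypothetical name.

```shell
#!/bin/sh
# Sketch only; the generator described in this post was written in PHP.
# Assumes GNU find and paths that contain no single quotes or newlines.

# Print one "chown -h user:group 'path'" line per entry below the given dir.
# -xdev keeps find on one filesystem, so mounts such as /proc are skipped.
# chown -h changes a symlink itself rather than the file it points to.
emit_chown_script() {
    find "$1" -xdev -printf "chown -h %u:%g '%p'\n"
}
```

On the healthy server this could be run as `emit_chown_script / > fix-owners.sh` (file name made up), after which the output is copied to the broken server and run there as root. It only makes sense when both machines have the same users and the same file layout, which is why a second server with the same distro is needed.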

In total, it took about 2 hours to f*ck the server up, and restore it. And I did it without any noticeable downtime on ANY service :) .

Does this still qualify for The Daily WTF?

(second note; this post might look a bit f*cked. This is caused by something in the WordPress editor thingy, it won’t parse my entered crap right)
