{"id":42,"date":"2007-04-19T21:53:49","date_gmt":"2007-04-19T20:53:49","guid":{"rendered":"http:\/\/bergs.biz\/blog\/2007\/04\/19\/a-somewhat-nightmarish-debian-upgrade\/"},"modified":"2008-04-16T16:36:37","modified_gmt":"2008-04-16T14:36:37","slug":"a-somewhat-nightmarish-debian-upgrade","status":"publish","type":"post","link":"https:\/\/bergs.biz\/blog\/2007\/04\/19\/a-somewhat-nightmarish-debian-upgrade\/","title":{"rendered":"A somewhat nightmarish Debian upgrade&#8230;"},"content":{"rendered":"<p>You might remember that a while ago I wrote a post about upgrading my office test server from Debian Sarge to the almost-ready Etch, and that everything went as smoothly as I&#8217;m used to from Debian distribution upgrades.<\/p>\n<p>Well, on 2007-04-10 I upgraded my personal production server (that also hosts this blog) to the finally-released Debian 4.0, code-named &#8220;Etch.&#8221; I stuck pretty closely to the <a href=\"http:\/\/www.debian.org\/releases\/stable\/amd64\/release-notes\/\" target=\"_blank\">release notes<\/a>, which is always a good idea even if you&#8217;re an experienced user.<\/p>\n<p>At first, all seemed to go well. I performed the pre-upgrade step to pull in the new <code>libc6<\/code>, and afterwards performed the dist-upgrade, which pulled in the remaining packages that had to be upgraded or newly installed to satisfy dependencies.<\/p>\n<p><!--more-->BTW, I used <code>aptitude<\/code> for the task, which is now the recommended package manager for the text console; <code>dselect<\/code> has been deprecated. I had used <code>dselect<\/code> until now and could operate it blindfolded, but after using <code>aptitude<\/code> a couple of times I&#8217;m impressed by how easy and logical it is to use, and I now really <em>love<\/em> it.<\/p>\n<p><code>aptitude<\/code> had to upgrade around 580 packages, which went pretty fast since I have a 6 Mbps DSL line. 
Lots of configuration work could be done by answering questions posed by <code>dialog<\/code>, the tool that provides the text console with dialog boxes, entry fields, radio buttons, and most of the other GUI elements you know from X or Windows.<\/p>\n<p>When this main phase of the upgrade was finished, I was about to leave <code>aptitude<\/code> when a nasty error message popped up: something about <code>\/var\/lib\/blah<\/code> not being readable. I wasn&#8217;t sure what this message was trying to tell me, mostly because I was a first-time user of <code>aptitude<\/code>, so I simply confirmed the message and was dropped to the shell. I then wanted to check the directory in question, and to my horror the shell&#8217;s <code>Tab<\/code> completion didn&#8217;t work but just produced the error message <code>Input\/output error<\/code>.<\/p>\n<p>Those of you who are experienced Unix users probably know by now that I was in trouble. \ud83d\ude41<\/p>\n<p>&#8220;Somehow&#8221; my <code>\/var<\/code> XFS filesystem had been damaged, and so the kernel had shut it down. I tried to unmount it so that it would be repaired upon remount, but the kernel wouldn&#8217;t let me do that because the filesystem was still in use by dozens of processes, mostly services running on my box. Shutting them down properly was impossible &#8212; no wonder, since the system is basically unusable without a <code>\/var<\/code> filesystem. So I had to kill the processes, and after that I was able to unmount <code>\/var<\/code>.<\/p>\n<p>I tried to remount it, but couldn&#8217;t, obviously because the damage was more serious than I had hoped. So I tried <code>xfs_repair<\/code> on it &#8212; which also failed. \ud83d\ude41<\/p>\n<p>My last resort was to zap the filesystem log (using <code>xfs_repair<\/code>&#8217;s <code>-L<\/code> option), which was a <em>very<\/em> dangerous thing to do: zeroing the log throws away any metadata changes still recorded in it, and this could have totally destroyed the filesystem beyond repair. 
Fortunately, I <em>was<\/em> able to remount the filesystem afterwards.<\/p>\n<p>The first thing I did after repairing <code>\/var<\/code> was to reboot the machine. I had <em>not<\/em> upgraded the kernel yet, so I was still running 2.6.8, Sarge&#8217;s default kernel. The main reason for <em>not<\/em> upgrading the kernel at that time was that I didn&#8217;t want to change too much in a single step: if the system didn&#8217;t come up properly after the reboot, I could be sure the problem was in one of the new packages and not in the kernel.<\/p>\n<p>Luckily enough, the machine <em>did<\/em> come up properly. I was still hesitant to upgrade to Etch&#8217;s new default kernel 2.6.18, because a) no dependency forced me to, and b) I was scared of the new initial RAM disk procedure now used in Etch: Sarge used <code>initrd-tools<\/code>, while Etch uses <code>initramfs-tools<\/code>.<\/p>\n<p>So I kept 2.6.8 running&#8230;<\/p>\n<p>Until I did some package maintenance with <code>aptitude<\/code> again a few days later&#8230; and ran into the same problem described above &#8212; my <code>\/var<\/code> filesystem was damaged once more. \ud83d\ude41<\/p>\n<p>I used the same procedure to repair it, and I was lucky enough to save it once again. I still have no explanation for why the old kernel might have damaged the filesystem, because I had been running it for about half a year without <em>any<\/em> problems at all. The filesystem code is all within the kernel, so I don&#8217;t think it could be a compatibility problem with a low-level system library such as <code>libc6<\/code>.<\/p>\n<p>Anyway, this repeated disastrous experience made me change my mind, and I performed the upgrade to 2.6.18. To my great surprise, the box came up fine after the reboot. 
I had expected to have to boot into the rescue system to revive the machine, because my root filesystem is on software RAID-1 (an <code>md<\/code> device), which is always a delicate thing. But the Debian guys <em>really<\/em> did an excellent job &#8212; all went smoothly.<\/p>\n<p>So the mystery remains why my <code>\/var<\/code> filesystem was corrupted twice &#8212; both times after using <code>aptitude<\/code>. I know that <code>aptitude<\/code>, as a user-space program, can&#8217;t have destroyed the filesystem by itself, but maybe it tried to write to parts of the filesystem that were already &#8220;semi-broken,&#8221; and that triggered a kernel bug that caused the filesystem driver to crash.<\/p>\n<p>I would be very interested to hear your comments on my experience. Did your upgrade work smoothly? Did you have any unusual problems? Do you have any idea what caused the filesystem problems I was facing?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>You might remember that a while ago I wrote a post about upgrading my office test server from Debian Sarge to the almost-ready Etch, and that everything went as smoothly as I&#8217;m used to from Debian distribution upgrades. 
Well, on 2007-04-10 I upgraded my personal production server (that also hosts this blog) to the finally-released Debian 4.0, code-named [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[20],"class_list":["post-42","post","type-post","status-publish","format-standard","hentry","category-computers","tag-linux"],"_links":{"self":[{"href":"https:\/\/bergs.biz\/blog\/wp-json\/wp\/v2\/posts\/42","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/bergs.biz\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/bergs.biz\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/bergs.biz\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/bergs.biz\/blog\/wp-json\/wp\/v2\/comments?post=42"}],"version-history":[{"count":0,"href":"https:\/\/bergs.biz\/blog\/wp-json\/wp\/v2\/posts\/42\/revisions"}],"wp:attachment":[{"href":"https:\/\/bergs.biz\/blog\/wp-json\/wp\/v2\/media?parent=42"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/bergs.biz\/blog\/wp-json\/wp\/v2\/categories?post=42"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/bergs.biz\/blog\/wp-json\/wp\/v2\/tags?post=42"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}