Category Archives: Linux

A deep-dive into UEFI Booting

[According to my own standards, this post about “UEFI booting” was only like 70% “ready” — I had it pending in “draft” state for many months, because I was lacking the time to finish it… I now decided to release it in its current state, simply because I believe it will still be very useful to many people interested in the topic…]

Before we actually dive deep into how UEFI booting works, a short and simple introduction is due.

Introduction

What is UEFI, anyway?

UEFI could be called the successor of the old BIOS concept. It is a unified version and successor of “EFI”, which was an architecture for a platform firmware used to boot operating systems (in the following abbreviated as “OS”), and the corresponding interface to interact with the firmware and the operating system.

The advantages of UEFI over the traditional BIOS are, among others, the following:

  • Boot disks with large partitions (over 2 TB), using GUID partitioning (GPT),
  • network capabilities already in pre-OS phase, and
  • modular design.

Boot Mechanism

So, how does booting with UEFI work?

When you enter your UEFI, you will find a user interface that shows all devices that have been detected, and that support booting. Usually that includes all your hard drives.

You can freely define a desired boot order (regardless of hardware paths, i.e. the way your drives are connected, be it via an SATA port, be it via an NVME slot), i. e. the primary OS that should be booted, if that fails the next OS that should be tried, etc. That’s called your “boot configuration”, and it resides in your motherboard’s NVRAM. (a concept that’s basically the successor of what used to be called “CMOS” in the old days of the BIOS). More specifically, we speak about “UEFI Variables“, which allow the OS and the firmware to interact.

When UEFI’s Firmware Boot Manager wants to boot an OS, it first needs to load something called the OS “Boot Manager.” Common OS boot managers are:

  • BOOTMGFW.EFI used to load Windows, or
  • SHIMX64.EFI used to load Linux

The OS boot manager is located on the “EFI system partition” (ESP). This is a small partition (usually only a few 100 M) at the start of your hard drive, formatted with basically a FAT filesystem. FAT is a very simple filesystem, so that the code to parse it and load files from it can be reasonably small and fit into a boot firmware.

A typical disk layout for a Windows installation may look as follows:

UEFI /EFI System Partition as seen under Windows

The first partition is the ESP, then comes the Windows boot and system drive (with a drive letter of C:), and then comes the recovery partition.

Apart from boot loaders, the ESP can contain kernel images or (device) drivers, e. g. to support hardware that must be initialized prior to the start of the OS, or to give access to a complex filesystem that holds the actual OS to be booted.

Depending on which OS you want to boot, the OS boot manager then loads

  • in the case of Linux: the OS kernel, and the kernel in turn loads the OS, or
  • in the case of Windows: the Windows Boot Loader (\Windows\system32\winload.efi)

Boot Configuration Details

Now that we got a good overview of the mechanism as a whole, let’s dive into the details. Let’s look at the boot configuration of my machine. To do so, invoke the below command (I did it under Ubuntu 23.04, but it should work the same under any reasonably current Linux distro where the tool is installed):

# efibootmgr -v
BootCurrent: 0003
Timeout: 1 seconds
BootOrder: 0000,0002,0003
Boot0000* Windows Boot Manager    HD(1,GPT,a39d5736-7aaf-4be0-b6b6-0852ba2f7803,0x800,0x32000)/File(\EFI\MICROSOFT\BOOT\BOOTMGFW.EFI)WINDOWS………x…B.C.D.O.B.J.E.C.T.=.{.9.d.e.a.8.6.2.c.-.5.c.d.d.-.4.e.7.0.-.a.c.c.1.-.f.3.2.b.3.4.4.d.4.7.9.5.}…R…………….
Boot0002* ubuntu    HD(1,GPT,280ea55d-c182-5242-bb52-a2b40812190c,0x800,0x219800)/File(\EFI\UBUNTU\SHIMX64.EFI)..BO
Boot0003* ubuntu    HD(1,GPT,91b56f9f-3526-404b-b681-1c684551ec4f,0x800,0xd87be)/File(\EFI\UBUNTU\SHIMX64.EFI)..BO

So, what does the above tell us? First, we see that “(Boot)0003” is the boot entry used to start the currently running system. Secondly, the order in which boot is tried is 0000, 0002, and then 0003. So by default, Windows (by the “Windows Boot Manager”) will be booted. Then we see the three boot entries. The star/asterisk (*) after the boot entry shows that all these entries are “active.”

What about the remaining info in the above command output? Immediately after the boot entry, we see the names that are also displayed on screen by the Firmware Boot Manager (“Windows Boot Manager” and two times “ubuntu”). We then see references to the ESPs used to boot these OS.

HD obviously means “hard drive”, then we see a 1 which refers to the first partition on the respective drive, then we see GPT which refers to the partitioning table format, and then we see a UUID. To find the respective partitions, we can use the below command:

# blkid --match-token TYPE=vfat
/dev/nvme0n1p1: LABEL="UBUNTU_TEMP" UUID="EC76-8E7F" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="280ea55d-c182-5242-bb52-a2b40812190c"
/dev/nvme1n1p1: SEC_TYPE="msdos" LABEL_FATBOOT="UBUNTU_MAIN" LABEL="UBUNTU_MAIN" UUID="8140-E4C0" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI" PARTUUID="91b56f9f-3526-404b-b681-1c684551ec4f"
/dev/sda1: UUID="40AD-1127" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="a39d5736-7aaf-4be0-b6b6-0852ba2f7803"

The PARTUUID values in the above output match the UUIDs in the boot configuration as shown by efibootmgr. So, the ESPs are located by searching for the partitions’ UUIDs. That means that you can replug your drives to different ports, or even copy partitions to different drives, and the UEFI boot mechanism will still find them. That’s a nice and very stable design.

UEFI User Interface

Now, let’s enter the UEFI and look at some of the details there. My PC’s motherboard is an MSI, and to enter the UEFI I need to press “F2” after the beep when powering on the PC (from “off” state, not when suspended to RAM, i.e. “sleeping”!) or restarting it.

# lspci
[...]
21:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN570 NVMe SSD 1TB
2b:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN570 NVMe SSD 1TB

ntp running in chroot stopped working after Debian Stretch upgrade

Today I upgraded my root server from Jessie to Stretch, and suddenly ntp stopped working.

I found errors like follows in the log, which were obviously due to failures in name resolution:

2018-05-31T07:44:48.900756+00:00 myhost ntpd[22855]: giving up resolving host 1.debian.pool.ntp.org: Servname not supported for ai_socktype (-8)

The solution to make this work was to bind-mount some files and directories essential for name resolution into the chroot jail.

First create some directories and some dummy files:

# mkdir /var/lib/ntp/etc /var/lib/ntp/lib /var/lib/ntp/proc
# mkdir /var/lib/ntp/usr /var/lib/ntp/usr/lib
# touch /var/lib/ntp/etc/resolv.conf /var/lib/ntp/etc/services

Then update your /etc/fstab as follows:

...
#ntpd chroot mounts
/etc/resolv.conf  /var/lib/ntp/etc/resolv.conf none bind 0 0
/etc/services	  /var/lib/ntp/etc/services none bind 0 0
/lib		  /var/lib/ntp/lib none bind 0 0
/usr/lib	  /var/lib/ntp/usr/lib none bind 0 0
/proc		  /var/lib/ntp/proc none bind 0 0

Finally mount all these binds:

# mount -a

Thanks to the ArchLinux guys where I found this.

To check whether your ntp is working again, you can use the following command which shows a list of peers known to your ntp server:

# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 0.debian.pool.n .POOL.          16 p    -   64    0    0.000    0.000   0.000
 1.debian.pool.n .POOL.          16 p    -   64    0    0.000    0.000   0.000
 2.debian.pool.n .POOL.          16 p    -   64    0    0.000    0.000   0.000
 3.debian.pool.n .POOL.          16 p    -   64    0    0.000    0.000   0.000
*ptbtime1.ptb.de .PTB.            1 u   46   64  377   11.483   -0.411   0.201
+ptbtime2.ptb.de .PTB.            1 u   52   64  377   11.502   -0.533   1.069
+ptbtime3.ptb.de .PTB.            1 u   47   64  377   11.451   -0.510   3.866
#batleth.sapient 131.188.3.221    2 u   44   64  377    0.188    1.097   0.176
#5.199.135.170 ( 130.149.17.21    2 u   45   64  377   11.271    0.581   0.396
-mail.morbitzer. 193.175.73.151   2 u   47   64  377    2.760    0.556   0.278
#hotel.zq1.de    161.62.157.173   3 u   46   64  377    0.094    1.384   0.261
-ntp2.m-online.n 212.18.1.106     2 u   47   64  377    7.167   -0.333   0.190
#2a03:b0c0:3:d0: 81.94.123.16     3 u   49   64  377    6.288   -2.071   1.760
#touka.thehomeof 130.149.17.21    2 u   48   64  377    0.206    0.932   0.222
+maggo.info      124.216.164.14   2 u   42   64  377    0.278   -0.137   0.436
+weyoun4.cord.de 124.216.164.14   2 u   44   64  377    2.849   -0.255   0.409
+opnsense.sauff. 222.217.153.8    2 u   43   64  377    0.270   -0.617   0.167
-web1.sys.ccs-ba 192.53.103.104   2 u   35   64  377    0.173   -1.251   0.220
#uruz.org        122.227.206.195  3 u   49   64  377    0.216    1.694   0.309
#clients5.arcani 162.23.41.55     2 u   38   64  377    6.120   -1.500   0.130
+stratum2-1.NTP. 129.70.130.71    2 u   47   64  377   14.043    1.625   0.394

The following command confirms that your current time is actually correct (within certain limits, of course):

# ntpstat
synchronised to NTP server (192.53.103.108) at stratum 2
   time correct to within 15 ms
   polling server every 64 s

If this was helpful, I would be happy to hear from you.

Exim malware scanner issue after upgrade from Jessie to Stretch

Today I finally upgraded by personal root server from Debian Jessie to Stretch, thereby upgrading Exim from 4.84 to 4.89.

After the upgrade, I observed the following errors in mainlog:

2018-05-31 08:02:03 +0000 1fOIX5-0001rg-AM malware acl condition: cmdline  : scanner returned error code: 36096
2018-05-31 08:02:03 +0000 1fOIX5-0001rg-AM H=([IPv6:2a00:6020:1efc:ee20:8857:7824:6a49:8368]) [2a00:6020:1efc:ee20:8857:7824:6a49:8368]:48523 I=[2a01:4f8:141:429::2]:465 Warning: ACL "warn" statement skipped: condition test deferred
2018-05-31 08:02:04 +0000 1fOIX5-0001rg-AM malware acl condition: cmdline  : scanner returned error code: 13
2018-05-31 08:02:04 +0000 1fOIX5-0001rg-AM H=([IPv6:2a00:6020:1efc:ee20:8857:7824:6a49:8368]) [2a00:6020:1efc:ee20:8857:7824:6a49:8368]:48523 I=[2a01:4f8:141:429::2]:465 Warning: ACL "warn" statement skipped: condition test deferred
2018-05-31 08:02:05 +0000 1fOIX5-0001rg-AM malware acl condition: cmdline  : scanner returned error code: 13
2018-05-31 08:02:05 +0000 1fOIX5-0001rg-AM H=([IPv6:2a00:6020:1efc:ee20:8857:7824:6a49:8368]) [2a00:6020:1efc:ee20:8857:7824:6a49:8368]:48523 I=[2a01:4f8:141:429::2]:465 Warning: ACL "warn" statement skipped: condition test deferred

Each of the three cmdline scanners caused an error, as shown above.

It seems there was a change in Exim from upstream, as reported by another user. Somehow it seems that if you define a cmdline scanner that uses a chain of commands, when there was an error return code encountered in the middle of the chain, the whole chain is considered failed.

To “fix” this issue (or rather work-around it), I changed the three ACL clauses as follows:

   warn  message                = This message contains malware ($malware_name)
         set acl_m0      = cmdline:\
-                               /usr/lib/AntiVir/guard/avscan -s --batch --scan-mode=all %s; /bin/echo -e \N"\navira_retval $?"\N:\
+                               /usr/local/bin/avscan_wrapper %s:\
                                \N^avira_retval 1$\N:\
                                \N^.*ALERT::[ \t]+([^;]*)[ \t]+;.*$\N
         malware                = *

I created a “wrapper” that effectively hides error return codes, and forces a return code of 0. The above wrapper looks like this:

#!/bin/bash

ARG="$1"

/usr/lib/AntiVir/guard/avscan -s --batch --scan-mode=all "${ARG}"
/bin/echo -e "\navira_retval $?"

exit 0

To make sure I didn’t break the malware scanning by my changes, I downloaded the EICAR test virus and sent it to myself. Exim caught the “virus” and ditched it.

vzlogger 0.3.9 for Raspbian with microhttpd included

I had a hell of a time compiling vzlogger 0.3.9 for Raspbian with microhttpd included — in the end the resulting binary lacked that functionality.

After a lot of trial-and-error and “forced” the code to be included by hard-coding the following define into as follows:

#define LOCAL_SUPPORT 1
#ifdef LOCAL_SUPPORT
#include "local.h"
#endif /* LOCAL_SUPPORT */

As a convenience to those who want that functionality I’ve attached a ready-made package to this post. Let me know if this helps.

Update 2014-12-28: Version 0.4.0 package with uhttpd support available here.

Update 2015-01-05: Version 0.4.0 package based on Git source with SHA d16c0c4c8d83ab9c13f65eb51d931897e7462bc9 available here.

 

How to roll your own DynDNS…

I didn’t want to rely upon services like DynDNS.org (which obviously was a smart decision since they now pretty much closed their free service) so I rolled my own…

What you need is the following:

  1. Host your domain yourself using the popular nameserver “Bind.”
  2. Host a small CGI script that will tell you your external IP (or use one of the many free services available that do the same).
  3. Run a machine within your LAN 24×7 which can detect changes of your external IP and update your hostname accordingly.

Step 1: Setup Bind for Dynamic DNS Update

to do

Step 2: CGI Script

The CGI script that needs to be deployed somewhere in the Internet to tell you your external IP is very simple and tiny and looks like this:

#!/bin/bash
echo "Content-type: text/plain"
echo ""
echo "$REMOTE_ADDR"

Step 3: External IP Probe

Here’s the script that needs to run periodically on a machine (I use Ubuntu server) within your LAN (or on your Internet gateway, although if you have the means to run stuff on your gateway you could employ a more elegant, “proper” solution):

#!/bin/bash
lockfile="/run/extip"
lockfile-check $lockfile
if [ $? -eq 0 ]; then
    echo "Locked, bailing out..."
    exit 1
fi

lockfile-create $lockfile
filename="/var/lib/extip.txt"
logfile="/var/log/extip.log"
keyfile="/root/var/lib/dyndns/Kmyhost.dyn.example.org.+163+56719.key"
cur_ip=`curl -s http://example.org/cgi-bin/myip.sh`
prev_ip=`cat $filename`
if [ $cur_ip != $prev_ip ]; then
    echo "`date --rfc-3339=seconds` IP changed, old IP: $prev_ip, new IP: $cur_ip" >>$logfile
    echo "$cur_ip" >$filename
    # Wait 5 sec to complete, force kill if nsupdate not done after 10 sec
    timeout -k 10s 5s nsupdate -k $keyfile -v<<EOF
server example.org
zone dyn.example.org.
update delete myhost.dyn.example.org. A
update add myhost.dyn.example.org. 60 A $cur_ip
send
EOF
fi
lockfile-remove $lockfile

The above script — even though it’s pretty small — is not a quick’n’dirty hack, but even employs some sanity checks:

  • It makes sure that only one instance is running at any time, and
  • it uses the timeout command from the Linux coreutils package to enforce that the nsupdate command will be terminated if it takes longer than 10 s (e. g. due to network issues).

I run the above script once a minute as follows:

# cat /etc/cron.d/extip 
* * * * * root /usr/local/bin/pubextip.sh

That’s it!

Let me know what you think. Suggestions how to improve things are, as always, very welcome!

Debian 6.0 to 7.0 upgrade issues…

I upgraded from Debian 6.0 (Squeeze) to Debian 7.0 (Wheezy) today. In general the upgrade was relatively painless, but as always some things went worse than they could… 🙁

My local Subversion repository is using a Berkeley DB, and the underlying BDB version went up from 4.8 to 5.1. In consequence I got an error when I wanted to check in a changed config file:

svn: DB_VERSION_MISMATCH: Database environment version mismatch
svn: bdb: Program version 5.1 doesn't match environment version 4.8

I remember that this has already been an issue with the last major Debian upgrade… Did I miss something in the release notes or package doc, or did the Debian folks miss this one?!

Anyway, here’s how to repair the above (based on instructions found here). Install packages db4.8-util and db5.1-util and execute the following commands:

# cd /path/to/repo
# db4.8_checkpoint -1
# db4.8_recover
# db4.8_archive
log.0000000024
# svnlook youngest ..
746
# db5.1_archive -d

Afterwards you can remove the two packages again.

Next thing I noticed Continue reading Debian 6.0 to 7.0 upgrade issues…

Issues in Ubuntu 13.04 after machine froze…

I recently upgraded from Ubuntu 12.10 to 13.04, and everything seemed to be extremely smooth and painless.

However, a day or two later I suddenly noticed that the fan of my Ubuntu laptop (a Dell Latitude D630) was blowing like hell — the machine had stalled, I couldn’t wake up the desktop again, the screen remained black… I think the hang occurred after I had installed the first updates after upgrading to 13.04… Anyway, I had to hard-reboot the box… And this is when the trouble started… 🙁

Dunno what exactly happened, but the first thing I noticed was boot issues, something like “Cannot mount /boot; ext2: no such filesystem” or something close to that. And indeed the kernel in Ubuntu 13.04 seems to lack support for that admittedly ancient filesystem (cat /proc/filesystems). I fixed that by creating a journal on my /boot filesystem as follows (obviously if you have similar issues, you need to substitute your actual UUID in the command below), thereby migrating the filesystem to ext3:

sudo tune2fs -j UUID=b8ad9dbd-a514-46c8-86af-d2a9cafe3d0c

I also had to update /etc/fstab accordingly, of course:

UUID=b8ad9dbd-a514-46c8-86af-d2a9cafe3d0c /boot           ext3    defaults        0       2

Next thing I noticed that a couple of devices suddenly didn’t work: The touchpad, the touchpoint, my WiFi interface, etc. I quickly found out that obviously the modules required to support the devices weren’t loaded, and it was due to missing/broken modules dependencies. The following file which keeps those dependencies

/lib/modules/3.8.0-19-generic/modules.dep.bin

was truncated (size of 0 bytes).

So I removed it and recreated it by

sudo depmod -a

After I rebooted everything seemed to be fine again.

I hope that this concludes my negative experiences with 13.04, and that the laptop will runs as rock solid again as it used to be under 12.10.

Solved: WordPress on Debian and media uploads not working

I was struggling with a problem that I couldn’t upload media files to my blog — which I host myself on a Debian box in a multi-site configuration. When I tried to do so I got an error message saying that WordPress couldn’t write to the directory. Actually I had this very problem since 2005 when I started this blog — I never took the time to really investigate it (but used the work-around to refer to static content I manually uploaded to my web-server document directory)… 😉

Now that my wife wanted to start blogging, I finally had a good reason to fix this issue. It took me about an hour, and it was done. What I had to do was just two entries in my admin’s Settings (I might not have set up WordPress cleanly when I started it, so you might not have to manually set this) as follows:

And then I needed one Alias in my Apache configuration to redirect the URL prefix to a directory under my web-server document root:

# "Intercepts" files prefix and redirect to "local" (user's) directory
Alias /blog/files /home/rabe/var/www/bergs.biz/blog/files
# This is the prefix to the root of my blog
Alias /blog /usr/share/wordpress

That’s it! Sometimes things are so easy, and it only takes a short while to fix them… 😉

Integrate “AVG Anti-Virus Free Edition” into Exim with ExiScan patch

If you would like to integrate “AVG Anti-Virus Free Edition” into Exim, using the ExiScan patch, this is actually quite easy.

Just insert the following fragment into your Exim config:

  # Reject virus infected messages.
  # Add message to implicit X-ACL-Warn: header
  warn  message         = This message contains malware ($malware_name)
        set acl_m0      = cmdline:\
                          /usr/bin/avgscan --arc %s; echo -e \N"\navg_retval $?"\N:\
                          avg_retval 5:\
                          \NVirus identified *(.*)$\N
        malware         = *
        log_message     = This message contains malware (avg:$malware_name)

Let me know if this works for you — I hacked this up quite quickly, but it seems to do its job…

Courier IMAP: Could not log in after Debian 5.0 upgrade

After I had upgraded my server to Debian 5.0, I found that I could no longer log in via IMAP. I turned authentication debugging on by changing /etc/courier/authdaemonrc as follows:

DEBUG_LOGIN=1

This did not reveal any problems. Here’s an excerpt from mail.info:

authdaemond: received auth request, service=imap, authtype=cram-md5
authdaemond: authmysql: trying this module
authdaemond: cram: challenge=[...], response=[...]
authdaemond: cram: decoded challenge/response, username 'user@example.org'
authdaemond: SQL query: SELECT username, crypt, clear, uid, gid, pop, "",
  "", realname, "" FROM users WHERE username = 'user@example.org'
authdaemond: cram validation succeeded
authdaemond: Authenticated: sysusername=<null>, sysuserid=1000,
  sysgroupid=1000, homedir=/home/user/var/mail/example.org/user,
  address=user@example.org, fullname=Joe User, maildir=<null>,
  quota=<null>, options=<null>

Even though all seemed fine, Thunderbird complained about “server doesn’t support secure authentication.”

So I telnetted into my IMAP server by issuing telnet localhost imap and manually logged in as follows:

a login user@example.org thePass

Now I noticed immediately what was wrong:

* BYE [ALERT] Fatal error: Account's mailbox directory is not owned
  by the correct uid or gid:

The solution is that Courier now by default performs stricter checks on the “sanity” of your setup. I changed /etc/courier/imapd as follows, and all was fine again:

IMAP_MAILBOX_SANITY_CHECK=0