Saturday, 23 July 2016

"$@" is a shell idiom

TL;DR: Use "$@" when you want to pass arguments unchanged to a function or program.

When you read the shell documentation you will see that there are two main ways to refer to all the arguments for passing to a function or program: $* and "$@". What is the difference? This test script will demonstrate it:

#!/bin/sh

testargs() {
       echo testargs: $# arguments
       showargs $*
       showargs "$@"
}

showargs() {
       echo showargs: $# arguments
       for i
       do
               echo $i
       done
}

testargs 1 2 3
testargs word 'quoted phrase' word

The result should be:
testargs: 3 arguments
showargs: 3 arguments
1
2
3
showargs: 3 arguments
1
2
3
testargs: 3 arguments
showargs: 4 arguments
word
quoted
phrase
word
showargs: 3 arguments
word
quoted phrase
word
As you can see, the difference shows up when an argument contains whitespace. "$@" preserves the original arguments instead of re-splitting them at whitespace. Think of it as an idiom meaning: pass the arguments on verbatim.

Why would you ever use $* though? Here's a place where you shouldn't use "$@".
su -c "$*" user
If you were to use "$@" and it contained multiple arguments, only the first would be taken as the argument to -c and the rest would follow as separate arguments to su, causing a syntax error. This does mean, however, that if you want to pass arguments containing whitespace to -c, you have to quote them and escape the quotes too.
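To see why $* is the right choice there, here is a small stand-in that you can run without privileges: a hypothetical run function that uses sh -c the way su -c "$*" uses su.

```shell
#!/bin/sh
# Hypothetical wrapper: "$*" joins all the arguments into one
# string, which becomes the single command string given to sh -c,
# just as su -c "$*" user hands one command string to su.
run() {
    sh -c "$*"
}

run echo hello world    # runs: sh -c 'echo hello world'
```

With "$@" instead, sh -c would receive only `echo` as its command and `hello world` as stray extra arguments.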

Saturday, 9 July 2016

Installing Linuxmint 18 on RAID 1

TL;DR: Can't be done easily.

I had a PC with dual disks that I was running under openSUSE Leap 42.1 in RAID 1. I wanted to install Linuxmint 18 on it before giving it away.

The first problem I encountered was that the partitioner in the installer knows nothing about RAID.

After some reading I found that the solution is to use gparted to create the RAID partitions first. But gparted didn't know how to do this. I figured out that the mdadm tool was missing, so I did:

apt-get install mdadm

Note that this installs into RAM, as the installer runs from a live filesystem, so it needs to be redone if the install is restarted. There are also some post-install script errors, but mdadm gets installed anyway.

Now with gparted I could create the RAID partitions and assemble them. Alternatively it can be done from the command line:

mdadm --assemble --scan

Now md0 and md1 appeared on the list of "disks" in the partitioner and I could assign them to / and /home.
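For reference, creating the two arrays from the command line might look something like this. The device names are illustrative only, and these commands destroy data, so double-check them against your own disks before running anything.

```shell
# Create two RAID1 arrays from matching partitions on both disks
# (illustrative device names: adjust to your layout).
# /dev/md0 will hold / and /dev/md1 will hold /home.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
```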

The system installation went fine until it was time to install GRUB. It failed when it couldn't do

grub-install /dev/sdb /dev/sda

Changing the target to just /dev/sda and clicking Continue did nothing. This seems to be an installer bug.

So I did this from the command line:

mount /dev/md0 /mnt
grub-install --root-directory=/mnt /dev/sda
umount /mnt

When I rebooted from disk it stopped at the GRUB prompt. So I tried to boot manually.

grub> linux /boot/vmlin.... root=/dev/md0
grub> initrd /boot/initrd...
grub> boot

The ... are where you should use TAB completion to select the kernel and initramfs images respectively. Unfortunately booting failed when systemd tried to mount the root partition. The reason is that the RAID modules are not present in the stock initramfs built by the Linuxmint 18 distributors.

At this point I gave up. Obviously Linuxmint 18 isn't designed to support this kind of RAID installation. If you want to go further you should rebuild the initramfs with the RAID modules included. Do let us know if you overcome this next hurdle.
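If you do attempt it, a rough sketch of the rebuild from the live session might look like this. I have not tested it; the device name and mount points are assumptions carried over from this install, and apt-get inside the chroot will need working networking.

```shell
# From the live session: mount the installed system, chroot in,
# and rebuild its initramfs with the RAID modules available.
mount /dev/md0 /mnt
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
chroot /mnt sh -c 'apt-get install -y mdadm && update-initramfs -u'
umount /mnt/dev /mnt/proc /mnt/sys /mnt
```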

And remember if you get it to work, you should install the mdadm package permanently, as it's not part of the default package set.

Monday, 19 October 2015

Enable EXIF rotation for geeqie on openSUSE

I found that while I could rotate a JPEG image from the Edit > Orientation menu, I could not Apply Orientation to Image Content. For this, extra programs are required: for JPEG it is exiftran, so I installed that package. Other image formats might require other packages.

However that by itself is not enough, because geeqie uses a desktop spec file and a helper script to do the rotation for a variety of image types. In the desktop spec file, the helper script is specified as geeqie-rotate. Unfortunately this is in /usr/lib/geeqie and not on the search path.

My solution was to make symlinks from /usr/lib/geeqie/* to /usr/local/bin so that the desktop spec file could invoke the helper script.
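The symlinking step is just a loop. Here is a self-contained demonstration using temporary directories in place of /usr/lib/geeqie and /usr/local/bin, with a dummy stand-in for the real geeqie-rotate helper.

```shell
#!/bin/sh
# Temporary directories stand in for /usr/lib/geeqie (the helpers)
# and /usr/local/bin (a directory on $PATH).
src=$(mktemp -d)
dst=$(mktemp -d)

# A dummy helper standing in for the real geeqie-rotate script.
printf '#!/bin/sh\necho rotating "$@"\n' > "$src/geeqie-rotate"
chmod +x "$src/geeqie-rotate"

# The actual fix: symlink every helper into a directory on the path.
for helper in "$src"/*; do
    ln -sf "$helper" "$dst/"
done

PATH="$dst:$PATH" geeqie-rotate photo.jpg   # prints: rotating photo.jpg
```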

Now I can easily fix up those vertical pictures generated by a camera without an orientation sensor.

Thursday, 28 May 2015

Adding fonts to OpenNX

One of my users needed to use NX (version 3) due to working over a lower bandwidth connection. I followed the instructions to install a FreeNX server on CentOS. The user followed the instructions to install an OpenNX client on Windows.

Unfortunately many glyphs came out as squares in CAD applications. After some research, I arrived at these conclusions:

  1. NX version 3 uses client-side fonts. It is also possible to use an X font server, but that partly defeats the purpose of NX, since the fonts would have to be served over the connection. The reason OpenNX couldn't render all glyphs is that it is distributed with only a basic set of fonts.
  2. NX version 4 doesn't require client-side fonts. Unfortunately it is not free software.

Looking into OpenNX, I saw that it uses Xming underneath, with a font directory. Well, what if I installed more fonts there?

I fetched the Xming-fonts installer package from Sourceforge and ran it, selecting the dejavu fonts as well.

I renamed the misc and TTF font directories under OpenNX (typically C:\Program Files\OpenNX\share\Xming\fonts), then copied the misc, TTF and dejavu font directories from Xming (typically C:\Program Files\Xming\fonts) into the OpenNX directory.

I edited (as administrator) the font-dirs file in the Xming root directory under OpenNX to add the paths of the extra font directories for TTF and dejavu.

I started an NX session, opened a terminal window and ran xlsfonts, and voilà, I had a much larger set of fonts. The CAD applications ran without missing fonts.

PS: A caveat: when you paste the client DSA key from the server into OpenNX, make sure you end it with a newline, or the key cannot be parsed.

Friday, 10 April 2015

Found duplicate PV: using /dev/... not /dev/...

When you mix software RAID1 (md) and LVM, in some situations you can get this message:

Found duplicate PV: using /dev/sdb1 not /dev/md0 ...

and the LVM doesn't assemble. The exact device names may differ, of course. But how does this happen?

What happened is that at some point vgscan was run, read partition(s) that were later made into RAID1 members, and recorded the Physical Volume (PV) UUID found on them. Since the PV UUID of a RAID1 array is identical to the PV UUID of its members, you get duplicate(s).

RAID1 members are not normally candidates for PVs, as vgscan excludes such devices from consideration. However, there is a cache, /etc/lvm/cache/.cache, which may contain outdated entries. In the example above it contained an entry for /dev/sdb1, which should have been filtered out by virtue of being in a RAID array. The solution is simple: just run vgscan again to update the cache. You may have a problem, though, if the device is needed for booting. If the root device is on a different partition, or you have a rescue DVD, you might be able to mount the root filesystem containing /etc read-write and refresh the cache.
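From a rescue system, that refresh might look roughly like this (the device name is illustrative; running vgscan inside the chroot rewrites the cache file on the mounted root filesystem):

```shell
# Mount the root filesystem read-write, then refresh the LVM cache
# from inside it so /etc/lvm/cache/.cache is rewritten.
mount /dev/sda2 /mnt
chroot /mnt vgscan
umount /mnt
```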

Some articles suggest editing the lvm.conf file to specify a filter to exclude the RAID1 members. Try refreshing the cache first before you resort to this as it should just work.

This problem occurred in the context of converting a filesystem on a single disk in situ to reside in a RAID1.

Thursday, 9 April 2015

Converting single disk to RAID1 in-situ

You have this Linux system that doesn't use RAID. You start to worry about the loss of files (from the last backup; you do backups, right?) and downtime should the disk fail. Maybe it is a good idea to have RAID. But how to retrofit RAID1 without a lot of downtime backing up, reformatting the disks and restoring the data?

I suspected there might be a way to start off with a degraded RAID1 array on the second, new disk, copy the data from the old disk's partitions onto it, change the partition type on the old disk to RAID member, add it to the array and let it resync. Sure enough it can be done, and François Marier has blogged about it. In fact he goes further and shows how to reinstall the boot loader. I didn't have to do this because my partition is /home. The critical tip is the use of the keyword missing to create the degraded array without issues.
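The degraded-array trick looks roughly like this; the device names are illustrative, and you should follow François Marier's post for the full procedure.

```shell
# Create a RAID1 with only the new disk; the keyword "missing"
# holds the second slot open, so the array starts degraded.
mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/sdb1

# ...copy the data onto /dev/md0, then attach the old partition
# and let the array resync:
mdadm --add /dev/md0 /dev/sda1
```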

In my case, the decision to go RAID1 was made after a failed disk caused loss of files. Not using RAID1 in the first place was an unwise choice by the system builder.

I've varied the procedure a little. Instead of putting ext4 directly on the RAID partition, I put a logical volume on it, and then created an ext4 filesystem inside that. This will let me migrate the content to a larger disk with little downtime, using logical volume operations, if expansion is needed in future.

There's one thing you should do if you decide to use logical volumes on the RAID. After you have assembled the RAID array, run vgscan. This reinitialises the cache in /etc/lvm/cache/.cache. Otherwise the cache will contain entries for the components of the array and cause a failure to assemble later on, with a mysterious (to me at first) duplicate PV error, because LVM thinks the array components are candidates for PVs. LVM is normally configured to ignore components of RAID arrays, but only if the cache is up to date. See here for more details.

A couple of caveats: on other Linux systems mdadm.conf may be in /etc, not /etc/mdadm. Also, the output of mdadm --detail --scan (used to generate the mdadm.conf line) will contain a spares=1 directive if run while the array is resyncing. Remove it, or you will have problems on the next boot.
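For example, the scan output during a resync might look like this (the array name is hypothetical and the UUID is elided):

```
# mdadm --detail --scan, run while the array is resyncing:
ARRAY /dev/md0 metadata=1.2 spares=1 name=myhost:0 UUID=...

# Drop "spares=1" before adding the line to mdadm.conf:
ARRAY /dev/md0 metadata=1.2 name=myhost:0 UUID=...
```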

Saturday, 17 January 2015

ssh hangs at SSH2_MSG_KEX_DH_GEX_GROUP trying to connect to servers behind Cisco firewall

Today I was unable to ssh to some CentOS servers behind a Cisco firewall. I was connected using AnyConnect. When I ran ssh with -v, it showed me that it stopped at expecting SSH2_MSG_KEX_DH_GEX_GROUP.

A search on the Internet turned up this article: Natty Narwhal: Problems connecting to servers behind (Cisco) firewalls using ssh. Shortening the Cipher and MAC lists as suggested solved the problem; apparently the long lists overflow some packet size limit somewhere. I'll leave it to the experts to work out what it is about Cisco and ssh.
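The workaround amounts to an ssh_config entry along these lines. The host pattern and algorithm choices here are illustrative; pick ciphers and MACs that your servers actually support.

```
# ~/.ssh/config
Host *.internal.example.com
    Ciphers aes128-ctr,aes256-ctr
    MACs hmac-sha2-256,hmac-sha1
```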