Technicalfeline: 2015

Monday, 19 October 2015

Enable EXIF rotation for geeqie on openSUSE

I found that while I could rotate a JPEG image from the Edit > Orientation menu, I could not Apply Orientation to Image Content. For this extra programs are required. For JPEG it is exiftran, so I installed that package. For other image formats other packages might be required.

However that by itself is not enough, because geeqie uses a desktop spec file and a helper script to do the rotation for a variety of image types. In the desktop spec file, the helper script is specified as geeqie-rotate. Unfortunately this is in /usr/lib/geeqie and not on the search path.

My solution was to make symlinks from /usr/lib/geeqie/* to /usr/local/bin so that the desktop spec file could invoke the helper script.

Now I can easily fix up those vertical pictures generated by a camera without an orientation sensor.

Thursday, 28 May 2015

Adding fonts to OpenNX

One of my users needed to use NX (version 3) due to working over a lower bandwidth connection. I followed the instructions to install a FreeNX server on CentOS. The user followed the instructions to install an OpenNX client on Windows.

Unfortunately many glyphs came out as squares in CAD applications. After some research, I arrived at these conclusions:

NX version 3 uses client side fonts. It is also possible to use an X font server, but that partly defeats the purpose of NX since the fonts will have to be served over the connection. The reason OpenNX couldn't render all glyphs is because it is distributed with only a basic set of fonts.
NX version 4 doesn't require client side fonts. Unfortunately it is not free software.

Looking into OpenNX, I saw that it uses Xming underneath, with a font directory. Well, what if I installed more fonts there?

I fetched the Xming-fonts installer package from Sourceforge and ran it, selecting the dejavu fonts as well.

I renamed the misc and TTF font directories under OpenNX (typically C:\Program Files\OpenNX\share\Xming\fonts), then copied the misc, TTF and dejavu font directories from Xming (typically C:\Program Files\Xming\fonts) into the OpenNX directory.

I edited (as administrator) the font-dirs file in the Xming root directory under OpenNX to add the paths of the extra font directories for TTF and dejavu.

I started a NX session, opened a terminal window and ran xlsfonts, and voila, I had a much larger set of fonts. The CAD applications ran without missing fonts.

PS: A caveat, when you paste the client DSA key from the server into OpenNX, make sure you end it with a newline or the key cannot be parsed.

Friday, 10 April 2015

Found duplicate PV: using /dev/... not /dev/...

When you mix software RAID1 (md) and LVM, in some situations you can get this message:

Found duplicate PV: using /dev/sdb1 not /dev/md0 ...

and the LVM doesn't assemble. The exact device names may differ, of course. But how does this happen?

What happened that at some point vgscan was run and read partition(s) that were later made into RAID1 members and saw a Physical Volume (PV) UUID on it. Since the PV UUID of a RAID1 array is identical to the PV UUID of the members, you get duplicate(s).

RAID1 members are usually not candidates for PVs, as vgscan normally excludes such devices from consideration. However there is a cache: /etc/lvm/cache/.cache which may contain outdated entries. In the example above it contained an entry for /dev/sdb1 which should have been filtered out by virtue of being in a RAID array. The solution is simple: just run vgscan again to update the cache. But you may have a problem if the device is needed for booting up. If the root devices is on a different partition or you have a rescue DVD you might be able to mount the root filesystem containing /etc read-write and refresh the cache.

Some articles suggest editing the lvm.conf file to specify a filter to exclude the RAID1 members. Try refreshing the cache first before you resort to this as it should just work.

This problem occurred in the context of converting in-situ a filesystem on a single disk to reside in a RAID1.

Thursday, 9 April 2015

Converting single disk to RAID1 in-situ

You have this Linux system that doesn't use RAID. You start to worry about the loss of files (from the last backup; you do backups, right?) and downtime should the disk fail. Maybe it is a good idea to have RAID. But how to retrofit RAID1 without a lot of downtime backing up, reformatting the disks and restoring the data?

I suspected there might be a way to start off with a degraded RAID1 array on the second, new disk, copy the partitions on the old disk onto it, change the type of the old disk to RAID element, add it to the array and let it resync. Sure enough it can be done, and François Marier has blogged it. In fact he goes further and shows how to reinstall the boot loader. I didn't have to do this because my partition is /home. The critical tip is the use of the keyword missing to create the degraded array without issues.

In my case the decision to go RAID1 was done after a failed disk caused loss of files. It was not a wise decision by the system builder to not use RAID1 in the first place.

I've varied the procedure a little. Instead of putting ext4 directly on the RAID partition, I put a logical volume on it, and then created an ext4 partition inside that. This allows me to migrate the content to a larger disk if expansion is needed in future, using logical volume operations, with little downtime.

There's one thing you should do if you decide to use logical volumes on the RAID. After you have assembled the RAID array, run vgscan. This will reinitialise the cache in /etc/lvm/cache/.cache. Otherwise it will contain entries for the components of the array and cause failure to assemble later on with a mysterious (to me at first) duplicate PV error because it thinks the array components are candidates for volumes. LVM is normally configured to ignore components of RAID arrays but only if the cache is up to date. See here for more details.

A couple of caveats: On other Linux systems mdadm.conf may be in /etc, not /etc/mdadm. Also the mdadm --detail --scan command to get the mdadm.conf line will contain a spares=1 directive if run while the array is resyncing. Remove it, or you will have problems next boot.

Saturday, 17 January 2015

ssh hangs at SSH2_MSG_KEX_DH_GEX_GROUP trying to connect to servers behind Cisco firewall

Today I was unable to ssh to some CentOS servers behind a Cisco firewall. I was connected using AnyConnect. When I ran ssh with -v, it showed me that it stopped at expecting SSH2_MSG_KEX_DH_GEX_GROUP.

A search on the Internet turned up this article: Natty Narwhal: Problems connecting to servers behind (Cisco) firewalls using ssh. Shortening the Cipher and MAC list as suggested solved the problem. Apparently due to overflowing some packet size limit somewhere. I'll leave it to the experts to work out what it is about Cisco and ssh.

Friday, 16 January 2015

NVIDIA driver, libglx.so and hardware acceleration crashing X server

My users on CentOS complained that Firefox would crash the X server after I updated the package xorg-x11-server-Xorg.

A search returned suggestions to disable hardware acceleration in Firefox. However a user mentioned that when he reinstalled the NVIDIA driver, it gave the warning libglx.so is not a symbolic link before setting things right.

An investigation showed that the NVIDIA installer replaces the file /usr/lib64/xorg/modules/extensions/libglx.so with a symlink to /usr/lib64/xorg/modules/extensions/libglx.so.X.Y where X.Y is the NVIDIA package version. So every time the xorg-x11-server-Xorg package is reinstalled, it replaces the symlink and hardware acceleration fails, crashing the X server.

The same problem for Debian is documented in this forum thread.

I could blacklist xorg-x11-server-Xorg in yum but I rather not do that. Since I may forget when an update of that package happens, I wrote a shell script to restore the symlink if removed and a cron job to call it periodically. But I'm also looking to make a Puppet stanza to do this.

Thursday, 15 January 2015

Squid failing to fetch IPv6 web resources and dns_v4_first

I had a strange symptom on my Chromebook on my home LAN. A particular Internet web page would not work because a jQuery file distributed by cdnjs.cloudflare.com could not be fetched. When I disabled the use of a proxy in my Chromebook, the web page worked. I have a squid caching proxy on my LAN, partly to be able to zap advertisments. So I knew it had something to do with squid.

Maybe it was a bad object in the cache? I knew there was a way to delete cached objects from the command line, and a search showed me that all I had to do was add these lines to /etc/squid/squid.conf:

acl purge method PURGE

http_access deny purge !localhost

and then use the command:

squidclient -m PURGE https://cdnjs.cloudflare.com/blah.js

to remove it. However when I ran

squidclient https://cdnjs.cloudflare.com/blah.js

to fetch it again, I saw that squid was trying to use the IPv6 address of cdnjs.cloudflare.com to get the resource and failing.

My LAN is fully IPv6 enabled because my Linux machines all support the IPv6 stack, I have a BIND server giving out IPv6 addresses, and in fact a lot of the internal traffic such as Apache goes over IPv6 transparently. I wish I had an IPv6 broadband connection but I don't so I cannot use IPv6 addresses in the outside world.

So the problem boiled down to: how can I prevent squid from trying to reach IPv6 sites. It turns out that the directive dns_v4_first is intended for this. Just adding:

dns_v4_first on

to /etc/squid/squid.conf worked and now I can view that Internet web page.

One symptom remained unexplained though, why didn't the Chrome browser on the host running squid suffer the same problem? I can only surmise that it's because in that case the proxy is specified as an IPv4 address and port so squid thinks, the request is coming in from a IPv4 host so I'll forward this request to an IPv4 origin. The Chromebook however, sends the request to the (internal) IPv6 address of squid so squid thinks it's allowed to forward the request to an IPv6 origin.

Friday, 9 January 2015

Apache won't start on CentOS 5 because self-signed certificate used by mod_nss expired

You are not likely to hit this error with the usual configuration since mod_ssl is the one normally used. However mod_nss is used by the Fedora Directory Server (aka 389 Directory Server now) in CentOS for the console for LDAP authentication (centos-idm-console). Possibly also used by RHEL.

One day I found that Apache would not start. The error log indicated that the certificate had expired. I searched and searched for how to generate a new certificate and tried various things. Too many steps and too hard.

Then I had an idea. The certificate must have been generated when the package mod_nss was installed. Let's try reinstalling it. First we move the old database directory out of the way:

mv /etc/httpd/alias /etc/httpd/alias.old

Then reinstall mod_nss:

yum reinstall mod_nss

and voila, after a while, a new database with a fresh certificate was generated and I could start Apache.