Monday 26 December 2022

Reducing the size of virtual machines exported from VirtualBox

VirtualBox can export virtual machine images which is handy if you want to give someone a ready to use OS for example.

I had an old scanner which I was driving under XP, using the USB forwarding feature. I wanted to give a friend the scanner and an OVA image containing the ready to go OS. The problem is that unused blocks on the NTFS filesystem will affect the compression of the image. Even though I had defragmented the virtual disk and moved the used blocks to the front using a pre-8.0 version of Ultradefrag, the unused blocks after the used area still contained old random data.

Enter sdelete. This is a free utility from Microsoft that wll zero unused blocks. You use it like this:

sdelete -z c:
I ran this once on the virtual disk, then shutdown the virtual machine and exported it to OVA format. The result was quite satisfactory; what was once a 5.1 GB dump turned out more like 2.1 GB.

Sunday 7 August 2022

OpenVPN newer versions require new configuration to use longer Diffie-Hellman keys

I found that after an upgrade, my openvpn setup no longer worked with my client. The message in the log file was:

OpenSSL: error:1408518A:SSL routines:ssl3_ctx_ctrl:dh key too small

Long story short, you need to use dh2048 keys now. Run the command in the sample server.conf file in the sample-config-files of your OpenVPN distribution to generate a dh2048.pem, put it in the same directory where you have dh1024.pem and edit the config file for the parameter dh.

Other changes which I needed to make which you might or might not were:

I turned off comp-lzo compression as suggested. This change also needs to be made in the client config files.

I had the parameter: cipher AES-256-CBC Just let it autonegotiate to AES-256-GCM now. If you wish, turn that paramter to data-ciphers-fallback if you wish AES-256-CBC to be still considered.

Fortunately, aside from disabling comp-lzo, I didn't need to regenerate the client keys or change the rest of the config.

Saturday 16 July 2022

Prefer IPv4 addresses to IPv6

This started when I could not refresh a repo on my Linux installation. Doing a manual wget --spider on the metadata file that it could not retrieve revealed that it was trying to connect to the IPv6 address of the mirror site.

My Internet provider does offer a dual stack IP service but I have not activated the IPv6 portion yet. So the question was how to make sure any http/https clients prefer the IPv4 address.

A web search gave the answer, you edit the file /etc/gai.conf which controls the behaviour of the getaddrinfo(3) function. In it you will find a comment to uncomment the line:

precedence ::ffff:0:0/96  100

if you want IPv6 addresses to sort lower than IPv4 addresses. I also reloaded the nscd service to flush any cached entries.

I suspect I hit this problem because I run my own DNS resolver. If you are relying on a DNS relay from your Internet provider they may have filtered out IPv6 answers for the sake of compatibility.

Monday 30 May 2022

Alias for hexdump for canonical output

The man page for hexdump from util-linux states:

-C, --canonical

Canonical hex+ASCII display. Display the input offset in hexadecimal, followed by sixteen space-separated, two-column, hexadecimal bytes, followed by the same sixteen bytes in %_p format enclosed in '|' characters. Invoking the program as hd implies this option.

This also happens to be my preferred format for working with microprocessor code. However my Linux distro's package chose not to provide a link or alias from hd to hexdump.

No matter, I just made a symlink called hd in my private bin directory to /usr/bin/hexdump and it works as described.

Friday 8 April 2022

Using fail2ban with systemd and firewalld

I installed fail2ban to watch over an openvpn service on a system that uses systemd (and hence journald), as well as firewalld. These are the changes I had to make. Most of it is derived from this wiki entry.

The filter /etc/fail2ban/filter.d/openvpn.local unchanged from the wiki:

# Fail2Ban filter for selected OpenVPN rejections
#
#

[Definition]

# Example messages (other matched messages not seen in the testing server's logs):
# Fri Sep 23 11:55:36 2016 TLS Error: incoming packet authentication failed from [AF_INET]59.90.146.160:51223
# Thu Aug 25 09:36:02 2016 117.207.115.143:58922 TLS Error: TLS handshake failed

failregex = ^ TLS Error: incoming packet authentication failed from \[AF_INET\]<HOST>:\d+$
           ^ <HOST>:\d+ TLS Auth Error
           ^ <HOST>:\d+ TLS Error: TLS handshake failed$
           ^ <HOST>:\d+ VERIFY ERROR
           ^ <HOST>:\d+ Connection reset, restarting

ignoreregex =

The jail file /etc/fail2ban/jail.d/openvpn.local enables the jail for openvpn and the wiki missed out the .local extension without which it will not be registered:

# Fail2Ban configuration fragment for OpenVPN

[openvpn]
enabled  = true
port     = 1194
protocol = udp
filter   = openvpn
journalmatch = _SYSTEMD_UNIT=openvpn@yourhost.service  
backend  = systemd
logpath  = /var/log/fail2ban.log
maxretry = 3

The important line is the backend = systemd. The journalmatch makes scanning the journal more efficient. Usually the openvpn service has @hostname appended as there could be more than one instance. You could also log it to the system journal but here I use a separate log file.

And finally you need to link this jail to firewalld in /etc/fail2ban/jail.local:

# Do all your modifications to the jail's configuration in jail.local!
[DEFAULT]
banaction = firewallcmd-ipset
This calls firewall-cmd to use an ipset to ban the hosts.



Wednesday 2 March 2022

Whole array operations on bash arrays

Today I wanted to take all the elements of a bash array and append /** to each one. This was for a script that syncs selected photo albums to cloud storage using rclone. As you know, rclone syncs directories, so to limit the transfer to a subset of of the directories, I used an --include pattern. Say the albums are cny13 fiji13 misc13 under archived. Then the command required is:

rclone sync archived mydrive:Photos --include "{cny13/**,fiji13/**,misc13/**}"

So the question is how to get this from an array containing:

declare -a ALBUMS=(cny13 fiji13 misc13)

Of course, I could run a small loop where I append /** to each element and accumulate in another array. This does work and efficiency isn't really an issue. But I guess the old APL fan in me was awakened and I wondered if I could transform the whole array in one fell swoop.

I tried:

echo "${ALBUMS[@]}/**"

but this only got me:

cny13 fiji13 misc13/**

The clue was supplied in an online tutorial on bash arrays in which an example was shown of parameter substitution. This in fact works just as well on arrays element by element.

echo "${ALBUMS[@]/%/\/**}"

This substitutes the end of line with /** escaping the leading slash, so we get:

cny13/** fiji13/** misc13/**

I haven't explained how the , is inserted between items, but the entire script using another trick to create a join function shows it:

#!/bin/bash

function join { local IFS="$1"; shift; echo "$*"; }

declare -a ALBUMS=(cny13 fiji13 misc13)
declare -a includes=("${ALBUMS[@]/%/\/**}")
cd ~/Albums || exit 1
albums=$(join , ${includes[@]})
rclone sync archived mydrive:Photos --include "{$albums}"

Wednesday 19 January 2022

Scp cannot handle file times before the Unix epoch

Native Linux filesystems have been able to store file timestamps before the Unix epoch, 1 Jan 1970, for some time now, due to use of 64-bit time_t.

Today I discovered that scp cannot transfer timestamps before the epoch. Here I copied a file which contains a scan of an old photo that I have timestamped back to the day it was sent:

-rw-r--r-- 1 me users 245768 Jan  1  1970 /tmp/1946-07-31-myrtle.pdf

By comparison, rsync does the right thing:

-rw-r--r-- 1 me users 245768 Jul 31  1946 /tmp/1946-07-31-myrtle.pdf

I suppose I could look into the scp protocol to discover why this is.

Watch out for non-breaking spaces on screen scrapes

I have been ripping DVDs I own to MP4 files for convenience of viewing on a tablet. Sometimes I need to get additional information about the episodes. For example I wanted to name the episodes of the Granada Sherlock Holmes TV series with the title of the tale for easy selection. For example 17-The_Musgrave_Ritual.mp4 is much preferred to 17.mp4.

On Wikipedia, episodes of many TV series are tabulated. You can highlight the contents of the table, and paste into a Libreoffice spreadsheet. This can then be exported as a CSV file for futher processing, e.g. with a Python program to generate a shell script that will rename the files the desired way.

This blog post is to point out that screen scraping will also capture the underlying characters in the tables, including extended characters in UTF-8 encoding. No surprise that this includes the non-breaking space: &nbsp; or 0xA0 in 8-bit encoding or \uC2A0 in UTF-8 encoding. So when processing the CSV file, this needs to be converted to a space or your shell scripts won't work. Here's an example of the conversion needed.

datestring = row[5].replace(u"\xa0", " ")

This was to generate a touch -d 'date' episode.mp4 command. Touch kept telling me the date format was invalid until I investigated the date string and found a non-breaking space in it.