Wednesday, 2 March 2022

Whole array operations on bash arrays

Today I wanted to take all the elements of a bash array and append /** to each one. This was for a script that syncs selected photo albums to cloud storage using rclone. As you know, rclone syncs directories, so to limit the transfer to a subset of the directories, I used an --include pattern. Say the albums are cny13 fiji13 misc13 under archived. Then the command required is:

rclone sync archived mydrive:Photos --include "{cny13/**,fiji13/**,misc13/**}"

So the question is how to get this from an array containing:

declare -a ALBUMS=(cny13 fiji13 misc13)

Of course, I could run a small loop where I append /** to each element and accumulate in another array. This does work and efficiency isn't really an issue. But I guess the old APL fan in me was awakened and I wondered if I could transform the whole array in one fell swoop.
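For comparison, the loop approach might look something like this. It is only a minimal sketch; the loop variable album is my own choice of name.

```bash
#!/bin/bash
# Sketch of the loop approach: append /** to each element in turn.
declare -a ALBUMS=(cny13 fiji13 misc13)
declare -a includes=()
for album in "${ALBUMS[@]}"; do
    # Double quotes keep the shell from glob-expanding the ** pattern.
    includes+=("$album/**")
done
printf '%s\n' "${includes[@]}"
```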

I tried:

echo "${ALBUMS[@]}/**"

but this only got me:

cny13 fiji13 misc13/**

The clue was supplied in an online tutorial on bash arrays in which an example was shown of parameter substitution. This in fact works just as well on arrays element by element.

echo "${ALBUMS[@]/%/\/**}"

The % anchors the (empty) pattern at the end of each element, so this appends /** to every element, with the leading slash of the replacement escaped. We get:

cny13/** fiji13/** misc13/**
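Here is a small variation, as a sketch: holding the suffix in a variable means no slash needs escaping. The variable name suffix and the album name "my album" are made up for illustration; the latter is there to show that elements containing spaces stay intact.

```bash
#!/bin/bash
# Sketch: whole-array substitution with the suffix held in a variable.
# "my album" is a made-up name containing a space.
declare -a ALBUMS=(cny13 "my album" misc13)
suffix='/**'
# The quoted [@] expansion yields one word per element, each with the
# suffix appended at its end.
declare -a includes=("${ALBUMS[@]/%/$suffix}")
printf '%s\n' "${includes[@]}"
```

The original ALBUMS array is left unchanged; only the expanded copy carries the suffix.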

I haven't yet explained how the commas are inserted between items. The complete script shows it, using another trick to create a join function:

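#!/bin/bash

# Join the remaining arguments, using the first argument as the separator.
# "$*" joins the positional parameters with the first character of IFS.
function join { local IFS="$1"; shift; echo "$*"; }

declare -a ALBUMS=(cny13 fiji13 misc13)
# Append /** to every element in one go.
declare -a includes=("${ALBUMS[@]/%/\/**}")
cd ~/Albums || exit 1
# Quote the expansion so the shell does not glob-expand the ** patterns.
albums=$(join , "${includes[@]}")
rclone sync archived mydrive:Photos --include "{$albums}"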
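To see the join trick in isolation, here is a short sketch. The arguments a b c are placeholders. One thing to be aware of: "$*" uses only the first character of IFS, so a multi-character separator such as ", " is effectively cut down to ",".

```bash
#!/bin/bash
# Sketch: the join function on its own, with placeholder arguments.
function join { local IFS="$1"; shift; echo "$*"; }

simple=$(join , a b c)
# Only the first character of the separator is used by "$*".
truncated=$(join ', ' a b c)
echo "$simple"
echo "$truncated"
```

Because IFS is declared local, the caller's word splitting is unaffected once the function returns.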

Wednesday, 19 January 2022

Scp cannot handle file times before the Unix epoch

Native Linux filesystems have been able to store file timestamps before the Unix epoch, 1 Jan 1970, for some time now, due to use of 64-bit time_t.

Today I discovered that scp cannot transfer timestamps before the epoch. Here is what I got when I used scp to copy a scan of an old photo, which I had timestamped back to the day it was sent:

-rw-r--r-- 1 me users 245768 Jan  1  1970 /tmp/1946-07-31-myrtle.pdf

By comparison, rsync does the right thing:

-rw-r--r-- 1 me users 245768 Jul 31  1946 /tmp/1946-07-31-myrtle.pdf

I suppose I could look into the scp protocol to discover why this is.
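To reproduce this, you first need a file with a pre-epoch modification time. The following is a sketch that assumes GNU touch and stat and a filesystem that can store such times; the variable names and the use of a scratch file are my own choices.

```bash
#!/bin/bash
# Sketch: create a scratch file with a pre-epoch modification time.
# Assumes GNU touch and stat, and a filesystem that stores such times.
scratch=$(mktemp)
touch -d '1946-07-31 12:00:00 UTC' "$scratch"
# %Y prints the mtime in seconds since the epoch; negative means pre-1970.
mtime=$(stat -c %Y "$scratch")
echo "mtime in seconds since the epoch: $mtime"
rm -f "$scratch"
```

Copying such a file once with scp and once with rsync, then comparing the listings, should show the difference described above.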

Watch out for non-breaking spaces on screen scrapes

I have been ripping DVDs I own to MP4 files for convenience of viewing on a tablet. Sometimes I need to get additional information about the episodes. For example, I wanted to name the episodes of the Granada Sherlock Holmes TV series with the title of the tale for easy selection: 17-The_Musgrave_Ritual.mp4 is much preferable to 17.mp4.

On Wikipedia, episodes of many TV series are tabulated. You can highlight the contents of the table, and paste into a LibreOffice spreadsheet. This can then be exported as a CSV file for further processing, e.g. with a Python program to generate a shell script that will rename the files the desired way.

This blog post is to point out that screen scraping will also capture the underlying characters in the tables, including extended characters in UTF-8 encoding. No surprise that this includes the non-breaking space: &nbsp; in HTML, 0xA0 in 8-bit (Latin-1) encoding, U+00A0 in Unicode, or the two bytes 0xC2 0xA0 in UTF-8 encoding. So when processing the CSV file, this needs to be converted to a space or your shell scripts won't work. Here's an example of the conversion needed.

datestring = row[5].replace(u"\xa0", " ")

This was to generate a touch -d 'date' episode.mp4 command. Touch kept telling me the date format was invalid until I investigated the date string and found a non-breaking space in it.
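If the date string has already reached a shell script, the same clean-up can be done there. This is a sketch assuming bash and UTF-8 encoded input; the date value and the variable names are made up for illustration.

```bash
#!/bin/bash
# Sketch: replace UTF-8 non-breaking spaces (bytes 0xC2 0xA0) with spaces.
# The date below is a made-up example containing two non-breaking spaces.
nbsp=$'\xc2\xa0'
datestring=$'24\xc2\xa0April\xc2\xa01984'
# The // form replaces every occurrence, not just the first.
clean="${datestring//$nbsp/ }"
echo "$clean"
```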

Tuesday, 28 December 2021

Text formatting on a mainframe

Today I recalled that as a student in the 70s I discovered that the line printer attached to the Univac 1108 mainframe at my university's computer centre could print lower case characters. So I modified the Ratfor workalike of the Unix roff program that I had entered into the computer to treat all alphabetic characters as lower case, and implemented escape codes to raise (and lower later) the case for capitalised words and acronyms. This was to be able to use punch cards as input, as the terminals were not always available. I don't remember if I implemented auto-detection of beginning of sentences, probably not. I even typeset my undergraduate thesis this way.

At that time I was using Ratfor as a structured Fortran, having been introduced to it and the book that described it, Software Tools, at a work experience stint, and had not yet encountered Unix in person. It was only when I did my masters that I learnt Unix and C.

I have wondered if the computer operators were surprised by the appearance of lower case printout since everybody else seemed to accept that UPPER CASE was the only case available.

Monday, 4 October 2021

Have you tried turning it (VirtualBox) off and on again?

I had a strange problem where USB devices were detected on the host system (Linux) but did not appear as available devices for a VirtualBox guest (XP). The strange thing was that I could use the devices normally on Linux, but they just didn't appear to the guests, not just XP but also a Debian guest.

I searched high and low for reports of the problem but nobody else seemed to have had it. And it was working before, although I was not sure if there was an intervening Linux kernel upgrade.

Finally in desperation I decided to reboot the system. Lo and behold, after that the devices became available to the guest. My best guess is that somehow the part of VirtualBox handling the USB forwarding got wedged. So give the old The IT Crowd advice a try.

Saturday, 27 March 2021

My first and only postcardware

Back in the 90s I wrote a couple of DOS programs that turned a PC, even an ancient model like the XT, into a diskless printer spooler, using the LPR or JetDirect protocols over Ethernet. It was my attempt to repurpose superannuated PCs for useful work. The PC could even be network booted, thus not requiring even a floppy drive.

Anyway, I released the program as open source postcardware, which is a variant of shareware, where the user is encouraged to send a postcard to the author if they like the program. I got a few postcards, mostly from Europe, in the years that followed. Here are the 4 I found in my old letters. I think I have another from New Caledonia but I haven't found it yet.

Since then I have released open source software under more conventional but less interesting licenses.

A Bonn scene

Same author, also Bonn

Heidelberg in winter

Lisbon, but I think the author was Dutch

Tuesday, 8 December 2020

Get gitlab browsing statistics for your project

Unlike github, gitlab doesn't provide a web page to see the browsing activity on your project. But there is an API to fetch this information in JSON format. I explain how it's done so you can avoid the pitfalls I encountered.

In the documentation, all the API calls are with respect to a base URL, which is currently:

https://gitlab.com/api/v4

This StackOverflow post confirms this base URL. It also shows how the access token is passed to the GET request, as an extra header.

Getting a personal access token is described here with screenshots. You can set an expiry date for the token. Make sure you save a copy of it as you cannot see it again once generated.

Now if you look at the API documentation, the relative URL to get the statistics for the last 30 days is described here and is:

/projects/:id/statistics

This is where I got fooled. The : is not to be entered literally. Instead, replace the whole of :id, colon included, with the numeric ID of your project.

Putting it all together, here is the curl command to get the statistics for a fictional project ID of 1234.

curl -H 'PRIVATE-TOKEN: yourtokenhere' https://gitlab.com/api/v4/projects/1234/statistics
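In a script, the substitution might be sketched as below. The variable names BASE_URL, PROJECT_ID and GITLAB_TOKEN are my own inventions, and 1234 is the same fictional project ID. The curl call is left as a comment since it needs a real token.

```bash
#!/bin/bash
# Sketch: build the statistics URL by substituting the project ID for :id.
# BASE_URL, PROJECT_ID and GITLAB_TOKEN are hypothetical names.
BASE_URL="https://gitlab.com/api/v4"
PROJECT_ID=1234
stats_url="$BASE_URL/projects/$PROJECT_ID/statistics"
echo "$stats_url"
# With a real token stored in GITLAB_TOKEN, the request would be:
#   curl -H "PRIVATE-TOKEN: $GITLAB_TOKEN" "$stats_url"
```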

And here is typical output:

{"fetches":{"total":3,"days":[{"count":3,"date":"2020-11-23"}]}}

There is no newline at the end, as JSON doesn't need one. But you'll be using some other utility to parse and display the statistics in a user-friendly form.
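One such utility is jq. The following sketch assumes jq is installed and works on the sample output above rather than a live response; the variable names are my own.

```bash
#!/bin/bash
# Sketch: summarise the sample output, assuming jq is installed.
json='{"fetches":{"total":3,"days":[{"count":3,"date":"2020-11-23"}]}}'
# -r prints raw strings without JSON quoting.
total=$(jq -r '.fetches.total' <<<"$json")
# One line per day, formatted as "date count".
per_day=$(jq -r '.fetches.days[] | "\(.date) \(.count)"' <<<"$json")
echo "total fetches: $total"
echo "$per_day"
```

In real use the JSON would come from the curl command instead of a literal string, for example by piping curl's output into jq.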

Now you can use this information to get all the other information exposed by the API. Some relative URLs don't require authentication if the project is public.