Back in 2002 I wanted to time-shift radio programs for later listening and also to listen in the car. Since a computer can automatically record at the appropriate times, this was the obvious choice of method. All I had to do was to connect up the sound card to the tuner, set up Linux cron jobs and I would have WAV files which I could then convert to MP3. After a few months it occurred to me that I should also save details of what was broadcast so that I could later find out details of the program, the songs and the artists. So after the recording I would save a copy of the web page of the program details. So began a practice that has continued to the present day. I even recorded programs during vacations as my Linux workhorse is on 24/7 so I only needed to keep the tuner on.
Some comments on how the technology I used has evolved over the years. In the early days I played some of the files at more convenient times. I also copied others onto cassette tape to play in the car. When car cassette players dwindled I switched to copying the programs onto a personal MP3 player connected to the AUX input of the car CD player. At the time car MP3 CD players were rare and audio CD R/Ws too slow to burn. Eventually I got a car MP3 CD player and I started making MP3 data CDs to listen to in the car. But that is also going the way of the dodo and I'm down to my last CD R/W discs. I'll probably switch to one of: 1. a flash drive containing the files (the player has a USB slot), 2. a personal player connected to the AUX input, 3. a smartphone beaming the audio via Bluetooth to the player. The first method would seem to be a direct replacement, but for some reason with flash drives it takes much longer for the player to find the restart point after starting the car.
The sound source also changed over the years. In the beginning I used a hifi tuner. Then FM radio cards became available and I installed one in my PC. The signal path was still analog though: a jumper cable from the audio out of the tuner to the audio in of the sound card. Finally, when the programs were moved off the AM and FM bands to DAB+, I bought DAB+ radios to feed the sound card.
A few years ago I changed the compression method from MP3 to AAC for better quality, but I still generate MP3s on the side for consumption in the car.
So my history is a reflection of the progress in technology over the last 2 decades.
When I saved the MP3 files the naming scheme I used was: program-YYYYMMDD.mp3. Later on, when more than one program could be recorded in a day, I appended HH. So it's possible to tell when a program was recorded without looking at the file timestamps, which can be altered by copying. That said, it appears I have been careful to preserve the modification time (mtime) through generations of workhorses copying from HD to HD.
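As a rough illustration of reading the recording time back out of a filename, here is a minimal Python sketch. It assumes the hour is appended directly after the date (program-YYYYMMDDHH.mp3), which the description above suggests but does not spell out. The function name `parse_recording_name` and the sample filenames are made up for the example.

```python
import re
from datetime import datetime

# Assumed layout: program-YYYYMMDD.mp3 or program-YYYYMMDDHH.mp3.
# The exact position of HH is an assumption, not confirmed by the text.
NAME_RE = re.compile(r"^(?P<program>.+)-(?P<date>\d{8})(?P<hour>\d{2})?\.mp3$")


def parse_recording_name(filename):
    """Return (program, datetime) for a recording name, or None if it does not fit."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    # Older files carry no hour; treat them as hour 0.
    hour = int(m.group("hour")) if m.group("hour") else 0
    try:
        when = datetime.strptime(m.group("date"), "%Y%m%d").replace(hour=hour)
    except ValueError:
        # Eight or ten digits that do not form a real date and hour.
        return None
    return m.group("program"), when


# Hypothetical filenames, for illustration only.
print(parse_recording_name("jazz-20021105.mp3"))
print(parse_recording_name("late-night-2015031422.mp3"))
```

Because the date lives in the name, this works even for files whose timestamps were lost in a copy.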
The HTML files are saved with the htm extension, originally as named on the website, but later I switched to saving them with the date and hour encoded in the filename. Fortunately, in most cases (though not all) there is metadata in the HTML that records the date.
The desired output is an index.html file containing a table with one row for each program, with links to the playlist HTML file and the audio MP3 file(s), the date and a description of the program. Note that we are not doing a high volume of data processing, since we only parse the HTML files and only handle the MP3 files by name, so run time isn't an issue. Here is an example of desired output:
We need two main data structures. First, a dictionary of MP3 files, keyed by date. Note that a date could have more than one MP3 file, as some programs had two segments broadcast at different times on the day, so the mp3files attribute is actually a list. Second, as each HTML file is parsed, create an associative array of the attributes (title, HTML file link, description). The keys of this associative array are fixed, i.e. "title", "htmfile", "description".
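The two structures could look something like the following Python sketch. This is only one way to lay them out, not necessarily how the actual script does it. The helper names (`build_mp3_index`, `make_program_record`, `attach_mp3files`), the extra "date" key and the sample filenames are all assumptions for illustration. The keys "title", "htmfile", "description" and the mp3files list come from the description above.

```python
from collections import defaultdict


def build_mp3_index(mp3_names):
    """Map 'YYYYMMDD' -> sorted list of MP3 filenames recorded on that date."""
    index = defaultdict(list)
    for name in mp3_names:
        stem = name.rsplit(".", 1)[0]        # drop the extension
        digits = stem.rsplit("-", 1)[-1]     # trailing YYYYMMDD or YYYYMMDDHH
        if len(digits) in (8, 10) and digits.isdigit():
            index[digits[:8]].append(name)   # a date may have several segments
    return {date: sorted(files) for date, files in index.items()}


def make_program_record(title, htmfile, description):
    """Associative array with the fixed keys filled in while parsing an HTML file."""
    return {"title": title, "htmfile": htmfile, "description": description}


def attach_mp3files(record, date, mp3_index):
    """Return a copy of the record with its date and the list of matching MP3 files."""
    return {**record, "date": date, "mp3files": mp3_index.get(date, [])}


# Hypothetical data, for illustration only.
mp3_index = build_mp3_index(
    ["jazz-20021105.mp3", "jazz-2002110621.mp3", "jazz-2002110609.mp3"]
)
record = make_program_record("Jazz Hour", "jazz-20021106.htm", "Two-part special")
row = attach_mp3files(record, "20021106", mp3_index)
print(row["mp3files"])
```

Each resulting `row` holds everything needed for one table row of index.html: the playlist link, the audio link(s), the date and the description.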