Saturday.
Aug 10

14:45 Time for a Negroni and a Ichnusa? Not quite, but it is time to check in with some Half Baked Ideas while at the beach

LocalJournal / CTG

These two projects are being mentioned together as there are several overlapping / conflicting core components. The main focus of LocalJournal is to publish local info: local as in 'per-village' – the sort of stuff that would appear in village/parish magazines. The 'info' itself is predominantly made up of news and events, and events almost always include venue data.

The Cambridge Travel Guide is meant to be all about pubs and restaurants – a resource for tracking places you've visited and sharing photos and opinions with friends. As with LocalJournal, there's a key 'venues' component.

Open Pubs and scraped restaurant data.

When looking for a postcode dataset, I found Open Pubs containing about 51K listings of pubs including postcode and geolocation. I then also found a larger dataset of over 1M pubs and restaurants across 20 or so European countries, with similar location data. I thought about matching pubs to villages as part of the Local Journal project, but the plan now is to merge the two pubs and restaurants datasets into one which I'll call 'pubs and restaurants' and publish on the CTG domain.

Merging 1M+ records
This could take a while

The Open Pubs data has C50K records, and the other dataset over 1M, so I adapted the pubs schema to represent how I want the final structure. Now I just need some 'outer join' magic to import the bigger set ignoring items where the name matches. Non-simple queries on large tables using text fields and string matching can be demanding and slow, so yesterday I changed the column type of the venue name in the larger dataset from 'text' to 'varchar', assuming I'd the be able to index the name and speed up any outer join queries. The alteration of the column type caused everything to timeout except for the database process itself, which I kept an eye on for over an hour. 1M records, on a laptop on a beach, it could have been worse. I just hope I haven't got sand in the keyboard. Time to try some outer joining...