Generating London Locality Data

Using the TfL Santander public bicycle hire scheme as a proxy for active travel presents several problems. There is, for example, uncertainty as to how representative public bicycle hire use might be of more general trends in active travel. Others relate to how best to process bicycle docking station data to produce meaningful measures of activity.

TfL release live data on the status of approximately 800 docking stations around London. Each station belongs to a ‘village’ describing its local neighbourhood (click on station to see others in the same village; double-click to see all stations):

Groupings vary considerably in size and there is some inconsistency in geographic placement and in naming, so to reduce the number of localities, the following changes were made:

Naming Consistency & Abbreviation

Old name New name
Kings Cross King’s Cross
Parsons Green Parson’s Green
Queen Elizabeth Olympic Park Olympic Park
St. James’s St James’s
St. John’s Wood St John’s Wood
St.John’s Wood St John’s Wood
St Lukes St Luke’s
St. Luke’s St Luke’s
St. Paul’s St Paul’s
The Borough Borough
The Regent’s Park Regent’s Park


id Station Old village New locality
5 Sedding Street Sloane Square Knightsbridge
8 Maida Vale Maida Vale St John’s Wood
25 Doric Way Somers Town Euston
27 Bouverie Street Temple Holborn
47 Warwick Avenue Station Maida Vale St John’s Wood
79 Arundel Street Temple Strand
80 Webber Street Southwark Elephant & Castle
104 Crosswall Tower Aldgate
107 Finsbury Leisure Centre St Luke’s Finsbury
116 Little Argyll Street West End Soho
120 The Guildhall Guildhall Bank
124 Eaton Square Belgravia Knightsbridge
127 Wood Street Guildhall Bank
130 Tower Gardens Tower Aldgate
140 Finsbury Square Moorgate Liverpool Street
149 Kennington Road Post Office Oval Kennington
160 Waterloo Place St James’s West End
181 Belgrave Square Belgravia Knightsbridge
183 Riverlight North Nine Elms Battersea Park
199 Great Tower Street Monument Bank
203 West Smithfield Rotunda Farringdon Barbican
207 Grosvenor Crescent Belgravia Knightsbridge
215 Moorfields Moorgate Barbican
224 Queensway Kensington Gardens Hyde Park
228 St. James’s Square St James’s West End
247 St. John’s Wood Church Regent’s Park St John’s Wood
255 Clifton Road Maida Vale St John’s Wood
259 Bourne Street Belgravia Pimlico
267 Regency Street Westminster Pimlico
276 Lower Thames Street Monument Bank
298 Curlew Street Shad Thames Bermondsey
302 Putney Pier Wandsworth Putney
307 Black Lion Gate Kensington Gardens Hyde Park
331 Bunhill Row Moorgate Barbican
350 Queen’s Gate Kensington Gardens Hyde Park
393 Snow Hill Farringdon Holborn
404 Palace Gate Kensington Gardens Hyde Park
423 Eaton Square (South) Belgravia Knightsbridge
440 Kennington Oval Oval Kennington
452 St. Katharine’s Way Tower Wapping
459 Gunmakers Lane Old Ford Victoria Park
467 Southern Grove Bow Mile End
487 Canton Street Poplar Limehouse
509 Fore Street Guildhall Bank
521 Driffield Road Old Ford Victoria Park
527 Hansard Mews Holland Park Shepherd’s Bush
559 Abbotsbury Road Holland Park Kensington
566 Westfield Ariel Way White City Avondale
587 Monument Street Monument Bank
601 BBC White City White City Avondale
606 Addison Road Holland Park Ladbroke Grove
611 Princedale Road Holland Park Ladbroke Grove
612 Wandsworth Rd Isley Court Wandsworth Road
631 Battersea Park Road Nine Elms Battersea Park
634 Brook Green South Brook Green Hammersmith
637 Spencer Park Wandsworth Common Clapham
648 Peterborough Road Sands End Parson’s Green
650 St. Mark’s Road North Kensington Portobello
653 Simpson Street Clapham Battersea
654 Ashmole Estate Oval Kennington
673 Hibbert Street Battersea Wandsworth
675 Usk Road Clapham Junction Wandsworth
689 Spanish Road Clapham Junction Wandsworth
704 Mexfield Road East Putney Wandsworth
706 Snowsfields London Bridge Bermondsey
707 Barons Court Station West Kensington Hammersmith
718 Ada Street Hackney Central Haggerston
719 Victoria Park Road Hackney Central Victoria Park
728 Putney Bridge Road East Putney Putney
732 Duke Street Hill London Bridge Borough
739 Hortensia Road West Brompton West Chelsea
748 Hertford Road De Beauvoir Town Haggerston
742 Blenheim Crescent Ladbroke Grove Portobello
757 Harcourt Terrace West Brompton Earl’s Court
768 Clapham Common North Side Clapham Common Clapham
769 Sandilands Road Walham Green Sands End
773 Tallis Street Temple St Paul’s
775 Little Brook Green Brook Green Hammersmith
783 Monier Road Hackney Wick Olympic Park
790 Stratford Station Olympic Park Olympic Park
793 Cromer Street Bloomsbury Clerkenwell
797 Ossulston Street Somers Town Euston
807 Bevington Road West North Kensington Portobello
817 Riverlight South Nine Elms Battersea Park
838 Fore Street Avenue Guildhall Bank

This reduces the number of localities to 84:

The table of modified station localities is stored in an SQLite database:

CREATE TABLE stations(id INTEGER NOT NULL, name TEXT, village TEXT, lon REAL, lat REAL);
.mode csv
.import tflBicycleStationsWithLocalities.csv stations

From these we can create a set of locality centroids that we use as the basis of activity mapping, where each locality is associated with the bicycle activity of all the docking stations within it (noting the names station_id and station_name are used for consistency with other city data even though each is a collection of docking stations):

SELECT village AS station_id, village AS station_name, AVG(lat) AS lat, AVG(lon) AS lon
FROM stations
GROUP BY village;

Generating Locality Activity Values with SQLite

Bicycle activity is generated from the TfL feed that indicates the status of each docking station. We log this feed at 10 minute intervals storing the data in an SQL table:

  availableBikes INTEGER,
  availableDocks INTEGER,
  installed INTEGER,
  locked INTEGER,
  temporary INTEGER,
  t TEXT);

To update with more recent data, ensure the most recent data are at the end of a day and then update from the following day onwards. For example

DELETE FROM usage WHERE t > '2020-05-05';

Download the current month from remote database, and add to usage:

.mode csv
.import myUpdatedData.csv usage

Join locality to usage

Because docking stations are referenced only by their id, we need to join the locality to each reading. We also create a new day_of_year column for temporal aggregation by truncating the timestamp of each reading:

  SELECT stationId, village, availableBikes, t, substr(t,0,11) AS day_of_year
  FROM usage
  LEFT JOIN stations ON stationId = id;

Calculate activity index

For each station: accumulate the number of times number of docked bikes changes in consecutive readings on a given day. Can use the SQLite window operations and then sum with an aggregate function.

Additionally we group the results by village and day of year and create an id that combines village name with day of week.

Note also that to be consistent with other datasets, we rename ‘village’ as ‘station’ even though it is an aggregation of several stations in a locality.

CREATE TABLE station_daily_time_series AS
    day_of_year AS date,
    village AS station,
    village||'_'||strftime('%w',day_of_year) AS id,
    SUM(activity) AS count
    ( SELECT
        abs(first_value(availableBikes) OVER win - last_value(availableBikes) OVER win) AS activity
      FROM usage2 WINDOW win AS
  GROUP BY village, day_of_year
  ORDER BY day_of_year,village;

Calculate benchmark data

To calculate expected activity for each locality we take the mean activity for the period Monday 6th January to Sunday 1st March grouped by day of week:

CREATE TABLE station_reference_bicycle AS
  SELECT id,AVG(count) AS value
  FROM station_daily_time_series
  WHERE date > '2020-01-06' AND date < '2020-03-02'
  GROUP BY id;

And then export:

.headers on
.mode csv
.output StationReference-Bicycle.csv
SELECT id,count FROM station_reference_bicycle ORDER BY id;

Export station daily time series

.headers on
.mode csv
.output StationDailyTimeSeries-Bicycle.csv
SELECT date,station,id,count FROM station_daily_time_series ORDER BY date,id;

Hourly Usage

We can perform a similar process of comparing temporally adjacent docking counts for deriving hourly usage, except this time don’t need to export an id to compare with reference data:

  SELECT stationId, village, availableBikes, t, substr(t,0,14) AS hour_of_year
  FROM usage
  LEFT JOIN stations ON stationId = id;
CREATE TABLE hourly_time_series AS
    hour_of_year AS date,
    village AS station,
    SUM (activity) AS count
    ( SELECT
        abs(first_value(availableBikes) OVER win - last_value(availableBikes) OVER win) AS activity
      FROM usageHourly WINDOW win AS
  GROUP BY village, hour_of_year
  ORDER BY hour_of_year,village;

To derive the average hourly activity across the whole scheme:

SELECT date, AVG(count) AS count
FROM hourly_time_series
GROUP BY date;


Differences from expectation

We can now use the generated files in the visualization specification just as we would for more direct sensor measurements.

Geographic Patterns

Drag slider to see change over time.