Manual Down/Uploading: The Last Barrier to Open Data Integration


Much of the open data that clients request through our concierge service has to be manually downloaded in CSV format. Our client companies then have to load these CSVs into their own databases, analyze the information, and integrate the data into their products.

This process has to be repeated each time the original datasets are updated, so that models and products use the most up-to-date information. As expected, this leads to long delays when trying to commercialize open data; the process is costly, labour-intensive, and difficult. Thankfully, there is now a way to automate loading open datasets into your systems.

The Solution:


On June 4, 2018, after a five-month testing period, Statistics Canada released its Web Service API. Initial public responses were positive. Work continued over the fall, and after conducting user consultations, Statistics Canada also opted to provide API access to census profile data. It plans to extend the API with more functionality in the future.

Through this new web service, developers now have code-level access to large, high-quality datasets that they can integrate right into their products, without any manual downloading or uploading. Compared with the manual process, a much larger amount of information can be consumed automatically, and in less time, when developing models, visualizations and other data-driven products. This lowers the effective cost of development and drives open data commercialization.

Getting Started With Statistics Canada’s Web Service API:
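So how do you use this API? We’ve attached some simple sample code to get you started. To access any of the Statistics Canada tables in full, you can use any language that can send HTTP requests. Send a GET request to the getFullTableDownloadCSV method, with the table's product ID and a language code (en or fr) in the URL. For table 14100287 in English, the request is:

https://www150.statcan.gc.ca/t1/wds/rest/getFullTableDownloadCSV/14100287/en

Result: the response contains a link, in its object field, to a zip file that holds the table as CSV.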


Below you can find the R code for such a request:

----------------------

# Load the httr library to send GET requests
library(httr)

# Send the GET request. The API responds with a link through which you can download your data.
# Note that 14100287 is the table code (product ID). Change it to download a different table.
# You can find product IDs by matching them to the CANSIM numbers you come across on the StatsCan website.
# A full mapping from CANSIM numbers to IDs is available here: https://www.statcan.gc.ca/eng/developers/concordance
# This specific request downloads the entire table. The data is large and takes time to read into memory.
Files <- GET("https://www150.statcan.gc.ca/t1/wds/rest/getFullTableDownloadCSV/14100287/en")

# Check the status of the request to ensure it worked
http_status(Files)

# Download the zip file from the link that the Statistics Canada API has generated (this is content(Files)$object).
# Save it as download.zip so the unzip step can find it; mode = "wb" keeps the binary file intact on Windows.
download.file(content(Files)$object, file.path(getwd(), "download.zip"), method = "auto", quiet = FALSE, mode = "wb")

# List the files inside the zip (some zips contain multiple files, metadata CSV files, etc.)
unzippedist <- unzip("download.zip", list = TRUE)
unzippedist$Name

# Unzip the files into your working directory (list = TRUE above only lists them)
unzip("download.zip")

# Read the relevant CSV file
table_data <- read.csv(unzippedist$Name[1])

----------------------
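Since the download has to be repeated whenever a table is updated, it can help to wrap the steps above in a function. The following is a minimal sketch under stated assumptions, not a definitive implementation: the function names build_table_url and fetch_statcan_table are hypothetical, and it assumes the first file listed in the zip is the data CSV, as in the sample above.

```r
library(httr)

# Hypothetical helper: build the getFullTableDownloadCSV URL for a product ID.
build_table_url <- function(product_id, lang = "en") {
  paste0("https://www150.statcan.gc.ca/t1/wds/rest/getFullTableDownloadCSV/",
         product_id, "/", lang)
}

# Hypothetical helper: download a full table and return it as a data frame.
fetch_statcan_table <- function(product_id, lang = "en") {
  response <- GET(build_table_url(product_id, lang))
  stop_for_status(response)  # stop with an error if the request failed

  # Download the zip into a temporary file; mode = "wb" keeps it binary-safe
  zip_path <- tempfile(fileext = ".zip")
  download.file(content(response)$object, zip_path, mode = "wb", quiet = TRUE)

  # Extract into a temporary directory and read the first listed file
  # (assumed here to be the data CSV rather than a metadata file)
  out_dir <- tempfile()
  dir.create(out_dir)
  file_names <- unzip(zip_path, list = TRUE)$Name
  unzip(zip_path, exdir = out_dir)
  read.csv(file.path(out_dir, file_names[1]))
}
```

With this in place, a scheduled script could call fetch_statcan_table("14100287") and load the result into a database. Passing the product ID as a string avoids any number formatting surprises when the URL is built.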

There are many other requests that you can use, and they return data in various ways. For additional instructions, see https://www.statcan.gc.ca/eng/developers
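As one example of another request, the web service also has a getChangedCubeList method that lists the tables changed on a given date. This could help decide when a table needs to be downloaded again. The sketch below assumes that the method takes a date in YYYY-MM-DD form in the URL and that each returned entry carries a productId field; check both against the developer documentation before relying on them. The function names are hypothetical.

```r
library(httr)

# Hypothetical helper: build the getChangedCubeList URL for a given date.
build_changed_url <- function(date) {
  paste0("https://www150.statcan.gc.ca/t1/wds/rest/getChangedCubeList/",
         format(as.Date(date), "%Y-%m-%d"))
}

# Hypothetical helper: TRUE if the table appears in the list of tables
# changed on the given date (response shape is an assumption, see above).
table_changed_on <- function(product_id, date) {
  response <- GET(build_changed_url(date))
  stop_for_status(response)  # stop with an error if the request failed

  # Collect the product IDs of every changed table as strings
  changed <- content(response)$object
  changed_ids <- vapply(changed,
                        function(entry) as.character(entry$productId),
                        character(1))
  as.character(product_id) %in% changed_ids
}
```

A daily job could then call table_changed_on("14100287", Sys.Date()) and only repeat the full download when it returns TRUE.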

