Canadian station data rescue

Created by gilbert.p.comp… on - Updated on 07/18/2016 10:13

This page is for discussing sources of Canadian station data and planning for future data recovery.


A list of stations having data before 1930 in Canada or its vicinity that are expected to be in ISPD version 4 is here.

A list of stations digitized by Environment Canada's Climate Research Division is here. Not all of these stations are included in ISPDv4.

Apologies for being a bit late in contributing... I would have been tempted to agree with Vicky about the overhead of correcting poor OCR of tabular data. We made several attempts using ABBYY FineReader, considered one of the best OCR engines, but gave up because of the same issue. However, I've discovered Bytescout, which has a module designed specifically to convert tabular data (in .pdf form) to .xls. We ran it across some images of Antarctic data and got manageable results. As it turns out, the Antarctic data had already been digitised and included in the ISPD, so we don't have extended experience with Bytescout.

Bytescout is pretty pricey stuff, but they make the PDF-to-XLS module available for free at
http://bytescout.com/?q=/download/download_freeware.html (look for PDF Viewer).

If your images are not in PDF, most image viewers will save a JPG (or whatever) as a PDF. We use the freeware IrfanView for all our image work, and its extended capabilities have never let us down.

And BTW, Vicky, Bytescout are fellow Canucks (Vancouver).

Mac
Todd Weather Folios Team
Adelaide, South Australia

victoria.slonosky

Sun, 09/07/2014 - 13:12

From what I understood of the presentation on Tuesday at ACRE, we would have to request the archival documents from the data and archives department of the Meteorological Service of Canada (MSC), and they would retrieve the files from the archive at Western Ontario where they're being stored.

victoria.slonosky

Thu, 09/04/2014 - 19:22

A few months ago, a shortlist was put together of Canadian locations that we think may have long series of observations and/or lie in data-sparse regions. I'll update it based on Gil's station list and notes from last week and post it here. The original files can be retrieved from the current archives by the MSC under the long-term loan agreement for the papers. However, once they're retrieved, something (probably digital scanning or photography?) needs to be done with the paper copies as a first step to recovering the information. Depending on the number of pages to be scanned, a formal request and funding structure may need to be considered.

One question to consider is whether to take the information from the printed annual books, from the original station records, or from a combination of both. I'm inclined to look at the original records, as I think they may contain more information and fewer errors (printing, transcription, etc.), but the printed tables might be easier for volunteer digitization.

Vicky, original sources are always best. I think volunteers working on the original sources will be OK; we need them more for the handwritten records. One can always make the argument that optical character recognition would work on the scanned tables, even if the success rate isn't great. What is MSC? Best wishes, Gil
