The Ultimate Guide to Historical Weather Data

Want to know what the weather conditions were like on October 11th last year? 

You’re looking for a data set that includes the temperature, wind speed, wind direction, peak gust, humidity, precipitation, and similar parameters for that specific day, as well as on that same date for each of the previous 15 years.

Then, assume the desired locations for this query are in the heart of downtown Boston and throughout urban and suburban reaches of Tokyo, Singapore, Mumbai, and Houston. 

These types of requests are common from various industry leaders seeking to protect assets and manage commercial operations around the world:

  • Global commercial air traffic delays can be correlated to the onset, duration, and severity of winter weather events for more efficient diversions in the future
  • Construction site safety hazards can be discovered and effectively avoided by taking historical weather into account
  • Energy infrastructure faults can be avoided with better qualification of historical power demand statistics based on weather
  • Outdoor events coordinators and logistics specialists can use previous weather data to determine staffing needs weeks or even months ahead

And of course, emerging modern technologies can benefit by compiling mass amounts of data to train machine learning models with the best weather data available. 

Where can you access these kinds of weather data, with such degrees of specificity, in relatively little time?

The short answer is that you have to sift through historical weather data archives to find what suits a given use case. Many different data sources are available, from ground- and space-based observing platforms as well as from numerical weather model systems called “reanalyses”.

Since sourcing, collecting, and gaining access to both the raw and extensive collections of gridded data can be a daunting process without the requisite know-how, we are providing a complete guide for all things historical weather to make things simple and eliminate the guesswork. Here’s what you need to know to get access to the best possible historical weather data around the globe.

The Brief History of Archived Weather Data

In order to understand the enormous value of a historical weather archive, it is first necessary to understand the history of weather data archiving. So let’s go back in time.

Though weather has been happening for millennia on Earth, scientists only became interested in creating records within the last two hundred years or so. Today, we collect weather data with:

  • About 800 weather balloons launched every 12 hours worldwide
  • Thousands of commercial planes and ships in transit gathering weather data
  • Automated surface weather observation stations in the field at airports and atop ocean buoys
  • Millions of satellite pixels and radar scans retrieving multiple weather parameters every few minutes, or even every few seconds

The vast number of weather data sources raises the question: how do we effectively blend inputs arising from so many different kinds of instruments? How do we deal with such voluminous feeds of incoming weather data? Where does all that detailed atmospheric information ultimately end up?

At the very beginning, painstaking efforts were taken to staff each weather observing post and generate records, by hand in many cases. The accepted norm was to house these tangible records within the walls of National Weather Bureau facilities. The hours required were staggering, and the payoff was understood to be incommensurate with the effort. However, with the advent of radio, remote cellular communications, and eventually the internet came a rapid improvement in meteorological observation and storage. The concept of maintaining virtual archives was born.

The Current State of Historical Weather Data

Nowadays, you can navigate directly to a government web server like the United States’ National Centers for Environmental Information, where the majority of public weather data has been funneled since the beginning of dedicated weather record keeping. But you may only be able to access the most recent several years of a chosen weather dataset. After all, the federal budget for critical observations occasionally lapses, and the weather itself sometimes intervenes to knock out meteorological instrument systems. In some cases, a comprehensive selection of weather data for the period of record is possible only by special order. It can take weeks as information is pulled from offline storage and assembled before delivery to your mailbox. There is also the queue of numerous other users with similar requests to contend with. It can take quite a while.

But meteorologists and computer scientists have been working together for some time to develop ways to collect, aggregate, assimilate, impute, and deliver research-quality weather data to the end user. This is especially true in the private weather sector. The most forward-thinking companies continue to refine sophisticated atmospheric data algorithms that ingest the myriad weather inputs and combine them with a background “best guess” climatology of some kind to produce a smoothed, continuous representation of the multi-dimensional atmosphere (that is, parsing the atmosphere in the horizontal, in the vertical, and in time).

We refer to this special technique as “reanalysis,” and only the most powerful computing systems are capable of carrying it out routinely over the entire globe. Occasionally, the reanalysis is refreshed as novel, faster, more physically consistent numerical model systems for weather analysis are developed. As an aside, by feeding the entire history of weather observations through each new revision of a weather analysis framework, marked fluctuations in the data are more likely to represent real patterns in nature rather than artifacts that arise from changing analysis approaches.

The end result of these strategic technical collaborations is the instantiation of vast data repositories, which are continuously populated with ever-increasing stores of atmospheric observations. 

Better Historical Weather Data 

As mentioned before, weather information, be it from observations or reanalysis models, consists of many parameters, each possibly with its own physical sampling approach, unit conventions, and time(s) of delivery. And to reiterate, it is common that observations are ingested by a weather reanalysis model to produce regular output before the data itself is archived. Table 1 below provides examples of the types of information available from a basic weather store as part of a standard weather data archive.

Table 1. Example parameters, archive codes, and their descriptions from a basic weather data store.

Parameter | Code | Description
Eastward Wind at 10 m | U10 | Wind component in the east-west direction near the surface
Northward Wind at 10 m | V10 | Wind component in the north-south direction near the surface
Instantaneous Wind Gust at 10 m | FG10 | Momentary wind gust speed near the surface
Temperature at 2 m | T2m | The temperature of the air near the surface
Relative Humidity | RH | The ratio of the water vapor present in the air to the maximum the air can hold at a given temperature
Surface Pressure | Psfc | Pressure at the local elevation
Mean Sea Level Pressure | Pmsl | Pressure reduced to mean sea level when the local elevation is greater than 0 meters
Accumulated Precipitation | Ptot | The total precipitation that falls within a given time interval
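As an illustration of how the U10/V10 components in Table 1 are typically used, here is a minimal sketch (the function name and conventions are ours, not part of any archive API) that derives scalar wind speed and meteorological wind direction from the two components:

```python
import math

def wind_speed_dir(u10: float, v10: float) -> tuple[float, float]:
    """Derive scalar wind speed (m/s) and meteorological direction
    (degrees the wind blows FROM, 0 = north) from u/v components."""
    speed = math.hypot(u10, v10)
    # atan2 gives a mathematical angle; negate both components to get
    # the meteorological convention (direction of origin, clockwise
    # from north), then wrap into [0, 360).
    direction = math.degrees(math.atan2(-u10, -v10)) % 360.0
    return speed, direction

# A pure westerly wind (blowing from the west toward the east):
speed, direction = wind_speed_dir(10.0, 0.0)
print(round(speed, 1), round(direction))  # 10.0 270
```

This kind of derivation is why archives store the two components rather than speed and direction directly: components can be averaged and interpolated safely, while angles cannot.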

Sometimes, though, reanalysis models, which bring together sources of surface, upper-air, satellite, and radar observations to provide the “core” meteorological variables for regular output, deliver a little extra too. What we mean is that the weather reanalysis model produces a whole assortment of other diagnostics that can be more thoroughly interrogated. To the trained atmospheric scientist who is well versed in analyzing multiple weather parameters at once, these extra model outputs constitute a gold mine of potential insights. Thus, value-added metrics can aid meteorologists in contributing critical decision-making criteria for a client.

For example, the most likely height of the base of the clouds can be gleaned from near-surface temperature and moisture. Cloud base height can also communicate information about low-level visibility. Vertical wind shear over different altitude intervals can be determined using wind, pressure, and temperature characteristics across multiple levels of the atmosphere, which is crucial in hindsight for airlines looking to schedule service at a given airport throughout the different months of the year. Pair temperature output with the amount of liquid precipitation and you can estimate the depth of frozen precipitation, which can guide snow removal operations or inform future orders of road treatment products. These extras are “derived” quantities that come from proprietary post-processing techniques, and selected examples are briefly given in Table 2 below.

Table 2. Example parameters, archive codes, and their descriptions from a derived weather data store.

Parameter | Code | Description
Cloud-top Height | CTH | The maximum elevation of the top of storm clouds
High Cloud Fraction | HCAP | The percentage of area covered by cloud at high altitude
Low Cloud Fraction | LCAP | The percentage of area covered by cloud at low altitude
Average Vertical Velocity | W | The approximate speed of upward and downward motions
Precipitation Type | Ptype | Categorical indicator of precipitation type (rain, ice pellets, snow)
Snow Accumulation | Snow | The total frozen precipitation that falls in a given time interval
Downwelling Solar Flux | DSF | The amount of solar radiation reaching the surface
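The cloud-base derivation mentioned above can be sketched with a classic rule of thumb: the lifting condensation level rises roughly 125 meters for every degree Celsius of spread between temperature and dew point. The function below is our own illustrative stand-in; production archives use far more sophisticated physics.

```python
def cloud_base_height_m(t2m_c: float, dewpoint_c: float) -> float:
    """Rough lifting-condensation-level estimate of cloud base
    (meters above ground), using the classic ~125 m per degree C of
    temperature/dew-point spread. Illustrative only."""
    spread = max(t2m_c - dewpoint_c, 0.0)  # spread can't be negative
    return 125.0 * spread

# An 8 degC spread suggests a cloud base near 1000 m above ground:
print(cloud_base_height_m(20.0, 12.0))  # 1000.0
```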

To be sure, weather data in its purest form is messy and almost always requires some form of quality control by the analyst, short of running a full-blown weather reanalysis. There are very important considerations at this stage, prior to interpreting any weather data statistics or drawing conclusions thereafter.

For one, the units of a weather parameter matter considerably! Oftentimes the native or “raw” data will come in one form, such as meters per second for wind speed (common in scientific research applications), whereas the rest of us expect wind speeds in miles per hour. The rub is that converting between these units changes the magnitude by a factor of roughly two. That delta could be the difference between hoisting the large crane for hauling materials at a construction site, or not, if the units are not correctly accounted for!

Next, the data that comes out of the archive for a given position could represent a combination of observations from adjacent locations, because none were available at the exact reference point; the archive then presents a result by weighting the available observations in space, in time, or both. High-end archive products will qualify these data with associated confidence metrics that say, “there is a 95% chance that the true value falls between X and Y.”

Lastly, it is common practice that each data product or family of products is stamped with an identifier that corresponds to a set of tags and attributes (that is, “metadata”). So it is best to question whether the 1’s in the data are truly 1’s and not 1.0 × 10³ kg m⁻³; any required data conversions can readily be determined by unloading the auxiliary information from the metadata.
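Both of these pitfalls, unit conversion and metadata-driven scaling, are mechanical once you know to look for them. A minimal sketch (the scale/offset attribute names follow common NetCDF-style packing conventions; the function names are ours):

```python
MS_TO_MPH = 2.23694  # 1 m/s is approximately 2.23694 mph

def ms_to_mph(speed_ms: float) -> float:
    """Convert a wind speed from meters per second (the common
    scientific convention) to miles per hour."""
    return speed_ms * MS_TO_MPH

def unpack_value(raw_value: float, scale_factor: float = 1.0,
                 add_offset: float = 0.0) -> float:
    """Recover a physical value from a packed archive value using the
    scale/offset attributes commonly carried in dataset metadata
    (e.g. NetCDF-style scale_factor and add_offset)."""
    return raw_value * scale_factor + add_offset

# 20 m/s is nearly 45 mph -- roughly that factor-of-two difference:
print(round(ms_to_mph(20.0), 1))  # 44.7

# A raw "1" packed with a scale factor of 1000 is really 1000:
print(unpack_value(1.0, scale_factor=1.0e3))  # 1000.0
```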

To this point, we have covered the essential aspects of weather data, but we have not even touched on how downscaling methods can be incorporated into the production of veritable bleeding-edge historical weather archive products. For this, we confront the notion that scientists cannot possibly observe the atmosphere everywhere and at every instant in time. Yet with the increasing demand for weather intelligence at micro-scales in the commercial and industrial sectors (like providing conditions from one farm plot to the next or on opposite faces of a ski resort), Tomorrow.io has pioneered new techniques for increasing the granularity of data using a mix of physical reasoning and proven statistics. In essence, these downscaling techniques are a means to translate coarsely spaced data from global weather models and sparse observations into fine-mesh detail pinpointed every couple of miles over almost the entire surface of Earth. Such data science innovation has prompted Tomorrow.io to fundamentally redesign the paradigm for how clients interact with weather data archives.
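As a toy illustration of what downscaling accomplishes (Tomorrow.io’s actual techniques are proprietary and considerably more sophisticated), bilinear interpolation fills in values between coarse grid points, which is the simplest way to translate widely spaced model output into finer detail:

```python
def bilinear(grid: list[list[float]], x: float, y: float) -> float:
    """Bilinearly interpolate a value at fractional grid coordinates
    (x, y) from a coarse 2-D grid (list of rows). A toy stand-in for
    the statistical/physical downscaling described above."""
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(grid[0]) - 1)
    y1 = min(y0 + 1, len(grid) - 1)
    fx, fy = x - x0, y - y0
    # Blend along x on the two bracketing rows, then blend along y.
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bot = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bot * fy

# Coarse 2x2 temperature grid; sample the midpoint between cells:
coarse = [[10.0, 14.0],
          [12.0, 16.0]]
print(bilinear(coarse, 0.5, 0.5))  # 13.0
```

Real downscaling goes well beyond interpolation, folding in terrain, land cover, and observed statistics, but the input/output shape of the problem is the same: coarse grid in, fine-mesh values out.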

Access to Historical Weather Data

OUT with static database granules and IN with scalable, flexible archive data objects in the commercial data cloud. Traditionally, the massive weather collections discussed thus far have been hosted within a number of separate digital silos on the internet. In practice, this means many clients attempt to access a given data subset through a single-host server bastion. Read: bottleneck.

The wall-clock time adds up quickly when the job entails logging on to the data portal, copying individual files in their native formats (like gridded binary, GRIB, or Network Common Data Form, NetCDF) to a local directory, and building software to unpack the data. The next steps in the lengthy process involve applying any necessary transformations to properly display the information before porting it to a downstream application on the given network infrastructure.

By contrast, Tomorrow.io’s new-normal weather data archiving methodology allows multiple remote clients to reference the same instance of all data within its cloud repository at once. As an added bonus, this decreases system memory overhead, and all clients are permitted to draw from the same data objects simultaneously, which turns out to be quite efficient in terms of data transfer. Here is a critical point of differentiation between the traditional and cloud-based data archive paradigms: whether the input-output processes are done serially or in parallel. Think of individual ships entering a harbor by transiting a channel one by one vs. many ships mooring along the shoreline simultaneously. The latter is optimal, by a long shot, for getting the maximum number of vessels to a safe, secure harbor. The same can be said for parallel cloud data-store access compared to the traditional alternative. New schemes for weather data archiving can decrease time-to-delivery by a factor of 10 to 100. In other words, that means getting automatic access to archived weather data in seconds or minutes, as opposed to the user manually facilitating data transfer over multiple hours or days (let alone weeks) the old way.
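The serial-vs-parallel distinction is easy to demonstrate. In this sketch, the fetch function is a hypothetical stand-in (a short sleep simulating network latency, not a real Tomorrow.io endpoint), but the speedup pattern is exactly what parallel object access buys:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_chunk(chunk_id: int) -> str:
    """Hypothetical stand-in for pulling one archive data object;
    the 0.2 s sleep simulates network latency."""
    time.sleep(0.2)
    return f"chunk-{chunk_id}"

chunks = range(8)

# Serial access: one ship through the channel at a time (~1.6 s total).
start = time.perf_counter()
serial = [fetch_chunk(c) for c in chunks]
serial_s = time.perf_counter() - start

# Parallel access: all ships moor at once (~0.2 s total).
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(fetch_chunk, chunks))
parallel_s = time.perf_counter() - start

assert serial == parallel  # same data either way; only the wait differs
print(f"serial {serial_s:.1f}s vs parallel {parallel_s:.1f}s")
```

With more chunks and real network latencies, the gap widens toward the 10–100x range cited above.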

With the modern cloud-computing paradigm, historical weather data is easily accessible through the Tomorrow.io Platform. Tomorrow.io’s engineers have embedded an intuitive web interface layer within the company’s platform which enables the user to request historical data parameters for any combination of locations and times from the archive. To assist the user, a full catalog of data stores is provided as a virtual “yellow pages” directory, so to speak, succinctly summarizing Tomorrow.io’s vast trove of weather information spanning multiple decades. For basic queries, including the core set of weather parameters at a handful of locations from a few selected days, the Tomorrow.io Platform backend rapidly consolidates data chunks from the archive and delivers them just in time. This is often done in just milliseconds, and on output the data is organized in a list for ease of readability. For more involved inquiries that specify geographic limits or require gridded data partitions over multiple years, for example, the data are intelligently aggregated on the backend at lightning speed. These data cubes are then made available in several ways to suit the user’s choice, like a weblink or a batch download option. All this, done with the ease and speed to be expected from top-shelf cloud-engineered solutions. Less time is spent collecting and making sense of the data, yielding more time for taking decisive action based on data insights from the Tomorrow.io Platform.

On that last point, we deeply appreciate that many of our global business partners are tapping into Tomorrow.io’s historical weather data to drive their enterprises forward in a big way. For reference, by the time the clock strikes midnight each day (or 00:00 UTC, whatever the preference for tracking time may be), Tomorrow.io appends terabytes of new data to its historical stores. There is no need for clients to update the configuration of a given Tomorrow.io data feed as we pipe that tremendous quantity of new data through production (as well as our regimen of multi-layered quality-control processes). The data is immediately available to train new iterations of machine learning models and power downstream applications; realistically speaking, performance, operations, and logistics models are getting smarter overnight by virtue of their uninterrupted connection to Tomorrow.io, the most comprehensive historical weather data archive currently available on the market.

Why choose the Tomorrow.io Historical Archive?

From the very early stages, Tomorrow.io has repeatedly recognized the opportunity to better serve its clients by providing access to years, even decades, of weather data. Remarkably, the historical weather database is fine-tuned for every location around the world, so no matter where business assets or key facilities are located, Tomorrow.io has them covered.

Tomorrow.io’s proprietary techniques also combine various sources of historical data, because raw, global meteorological data are often incomplete or inconsistent; data may arrive at random or uncoordinated times, rarely covering the same ground location twice in the same day, or be subject to outages and observing system malfunctions. Tomorrow.io’s meteorological operations teams expertly manage the ingest process for both traditional and non-traditional types of data before merging these inputs into the stable, hardened archive. Meanwhile, Tomorrow.io’s research scientists remain dedicated to the mission of providing the best technology and software improvements to evolve Tomorrow.io’s data serving capabilities and overall functionality. As Tomorrow.io adds more data from an increasingly diverse assortment of Earth-observing platforms in the future, it will eventually be host to thousands of terabytes of data. Business stakeholders and company partners from all over the world can then leverage these highly accurate meteorological data, ad infinitum.

Tomorrow.io’s historical weather archive has an uptime of 99.9% and it is hosted by only the most trusted enterprise cloud service providers. The Tomorrow.io subscription plans are customizable to accommodate multiple echelons of user requirements. Because let’s face it, sometimes there might not be an imminent need for weather input data every couple of miles over the entire globe, from thirty years ago until now (but we can still quickly deliver on that type of request!). Subscription tiers are as follows:

  • Tier-3 affords the client access to most weather parameters from single stations for selected years
  • Tier-2 access unlocks archive data stores featuring basic gridded meteorological data with resolutions as fine as approximately 30 km globally, every hour
  • Tier-1 subscribers gain unlimited access to Tomorrow.io’s flagship, proprietary observation and modeling archive data, which includes hyperlocal regional and 5-km global data with timesteps as short as 15 minutes

The beauty of Tomorrow.io’s cloud-based historical weather archive is that users can opt to take as much or as little data as needed–all on demand.

Get access to Tomorrow.io’s historical weather data archive now.
