07 February 2013
NOAA/NWS/NCEP/Environmental Modeling Center has a request for data out, one which gives anyone near water who can read a thermometer a chance to do some science. There's some science history behind why this request exists, and I'll give my own, biased, view of it.
All data have errors and are messy. Though George Box's comment is usually aimed at modelers ("All models are wrong, some models are useful."), it is equally applicable to data and data analysis. All data are wrong, some data are useful.
In the case of sea surface temperatures (sst), efforts to analyze global ocean sst started with badly distributed data sources -- ships. They give you a fair idea of what the temperature is along the paths the ships take. So the ship route between New York and London is pretty well-observed, and has been for a long time. But not many ships go through the south Pacific towards Antarctica. If you want to know what's happening down there, you need a different data source. One such is buoys. Though, again, buoys are distributed in a biased way, being mostly near shore (so that they can be maintained and repaired).
Then came satellites and all was good, eventually, for a while. Polar orbiting satellites see the entire globe. Starting with instruments launched in the early 1980s, it has been possible to make pretty good analyses of global sst, at least on grid cells 50-200 km on a side. Since that is as good as or better than any of the ship+buoy analyses could do, that was a great triumph. The ship and buoy data, though, remained and remain important. One of the problems satellite information faces is that the instruments can 'drift', that is, read progressively too warm, or too cold. To counter that possibility and other issues, the surface data (in situ data) are used as a reference. So for a time in, say, the early 2000s, all was good.
But both scientists and users of scientific information are never satisfied for long. For sst, some of the users are fishermen -- some fish have very particular temperature preferences. As it became possible to do a pretty good global 50 km analysis, with new data over about 2/3rds of the ocean every day, scientists and users started demanding more frequent updates of information, and on a finer grid. They also got increasingly annoyed about the parts of the ocean that only got new observations every 5-20 days. This includes areas like the Gulf Stream, where it is often cloudy for extended periods. The traditional satellites are great, but they don't see through clouds.
Another major user of sst information is numerical weather prediction. When weather models were using cells 80-200 km on a side, an sst analysis at (say) 100 km was a pretty good match. But weather models continued to push to higher resolution, so that by the early 2000s, 10 km grids weren't unheard of. The reason for such small grid spacing in weather prediction models is that weather 'cares' about events at very small scales. If weather cares about those smaller scales, then it becomes important to provide information about sst at the smaller scales too. An inadvertent proof of that came when a model made a bad forecast for a December 2000 storm, and the cause was traced back to an sst analysis that was too coarse. See Thiebaux and others, 2003, for the full analysis.
Plus, of course, there is interesting oceanography that requires much finer scale observations than 100 km. So a couple of different efforts developed. One was to use microwave data to derive sea surface temperatures. AMSR-E was the first microwave instrument used for sst in operations, as far as I know. (Sea ice isn't the only thing you can see with microwaves!) That addressed the issue of seeing the Gulf Stream (and other cloudy areas) most days. The other was to start pushing for higher resolution sst analyses. This led to an international effort to analyze the global ocean at high (say 25 km and finer, sometimes 10 km and finer) grid spacing. More is involved in that than just changing a parameter in the program. (You'll get an answer if you do that, but it won't be as good as what you had at the coarser grid spacing.)
On the ocean side, the quality of the high resolution analyses is holding up relatively well. But as you go to finer grid spacings, new matters appear. The Great Lakes are very large, so they can be seen easily by satellite, and they have buoy data through at least part of the year, so the satellite observations can be corrected as needed. But ... go to a finer grid spacing weather model and you discover that there are a lot of lakes smaller than the Great Lakes. For a 4 km model, there are some thousands of lakes just in North America. None of them have buoys, and almost none even have climatologies. Also at this grid spacing, you start seeing the wider parts of rivers.
Here's where an opportunity arises for people who live near a shore (whether river, lake, or ocean). NOAA/NWS/NCEP/Environmental Modeling Center is requesting observations of water surface temperatures to use as a check on its analysis of temperatures in areas close to shore ('close' meaning, say, within 50 km (30 miles) of shore, with observations taken at least 400 m (a quarter mile) out from the shoreline). Check out the project's web page at Near Shore Lake Project.
As always, I don't speak for my employer or any groups I might be a member of. I'm pretty certain that all people who work on sst would disagree with at least parts of my above mini-history. Be that as it may, it should be a fun project.
28 September 2010
Does Lake Superior Remember the Last Ice Age?
I'm more than a little surprised by this post by Steven Goddard. His answer to my title question is yes. That he's wrong isn't very interesting. We all make mistakes, and particularly so when speaking outside areas that we've studied. The two main physical processes which show his error are interesting in their own right, and I'll take this chance to discuss them -- they are rivers (which say that any memory of 200 years ago should be noticeable by now), and what happens to fresh water at 4 C (which says the memory is really about 6 months).
First, I'll take a look at a less interesting error, one that minimal self-checking would have flagged. But it introduces a useful tool -- the 'sanity check'. Namely, he suggests that the reason Lake Superior is still cold is that it's so large it is still adjusting to the end of the last ice age. That's about 10,000 years ago. Ok, suppose this line of reasoning is true. While Superior is large, it is tiny compared to the oceans. If Superior takes 10,000+ years to adjust, something 10 times bigger should take 100,000+ years to adjust. The ocean is about 100,000 times larger (in volume) than Lake Superior -- so by this reasoning the ocean would still be adjusting to events from a billion years ago, which is absurd.
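The sanity check is quick enough to do as arithmetic. A minimal sketch, assuming (as the argument implicitly does) that adjustment time scales linearly with volume:

    # Sanity check: if adjustment time scaled with volume, the ocean
    # (roughly 100,000 Lake Superiors by volume) would still be
    # adjusting to events from a billion years ago.
    superior_adjust_years = 10_000       # claimed adjustment time for Superior
    ocean_to_superior_volume = 100_000   # approximate volume ratio
    print(superior_adjust_years * ocean_to_superior_volume)  # 1,000,000,000 years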
16 November 2009
Where is the surface?
I just commented on my facebook status that I'm at a meeting about sea surface temperature. That part was safe. The rest of the comment was to observe that I'm now back to wondering whether the sea has a surface, where that surface is if it exists, and, if so, whether it has a temperature. That prompted a friend to comment 'Great ... this is going to bug me now.' So for him, here's a longer version.
This sort of question is very common in science. Of course my musing for facebook is overstated. But there is usually a real question about what exactly it is you've observed when you take an observation. When you have very different observing methods, they may well observe things that are different from each other. There are, let's say, 4 different ways of observing the sea surface's temperature. For a diagram, see the wikipedia article on sea surface temperature.
The standard method, and reference for others, is calibrated buoys that carry a thermometer at a known depth, typically 1 meter. A major drawback to this method (all methods of observing have drawbacks!) is that you need a buoy. They're not cheap, and it would take several million of them to give us a high resolution data set for global sea surface temperature (acronymed SST).
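As a rough check on 'several million', here's a back-of-envelope sketch; the 9 km spacing is an assumption, borrowed from the high resolution analysis discussed in the post below:

    # The ocean covers roughly 360 million square km. One buoy per
    # cell of a ~9 km grid is one buoy per ~81 square km.
    ocean_area_km2 = 360e6
    cell_area_km2 = 9 * 9
    print(ocean_area_km2 / cell_area_km2)   # about 4.4 million buoys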
03 April 2009
How much detail is there really?
I'm thinking about sea surface temperature (SST) these days, but the approach here is one that can be applied to many situations, even ones outside weather and climate. A common, important, and not always easy, question is -- just how much detail do you need? The more detail, the more expensive it is to make a good product, whether that's an analysis of sea surface temperature, a climate model, or a surface in a video game. Of course, what I'd like is the sea surface temperature every few meters over the entire globe. If that's more than necessary at some time, I could average it down. But ... it would take an awful lot of storage to save temperatures every few meters (my back yard, my neighbor's, my front yard, ...) over the whole globe.
Let's start by looking at an actual high resolution global product, though not every few meters! The SST analysis at http://polar.ncep.noaa.gov/sst/ gives a value every 1/12th of a degree in latitude and longitude, about one every 9 km (6 miles). That's about 9 million values (4320 x 2160 = 9,331,200 grid points). Let's also suppose that this is fine enough resolution that everything important is represented.
The worst resolution is to use 1 number for the entire globe, the average for all ocean points. To measure how bad this is, I'm going to compute the root mean square error. (Those who know what this is can skip to the next paragraph.) It is often abbreviated rmse. To find it, we go through every ocean point in the grid and find the difference between the value there and the average. Then we multiply this difference by itself (square it -- this avoids the statistician marksmen story*). Then we add up these squares for every ocean point. This is a big and not very interesting number. More interesting is the average value of the squared error -- the mean square error. So we divide by the number of points involved. This also gives us the error variance. Since we think more in terms of temperatures and temperature changes than squares of temperature changes, we take the square root of the mean square error to get the rmse. This figure represents a typical magnitude of how far off we expect to be. We could be either warmer or colder by this much, but this is the magnitude.
* Two statisticians went to a shooting range and each fired at the target. The first missed by 1 meter to the left (-1 meter). The second missed by 1 meter to the right (+1 meter). They then congratulated each other on their fine marksmanship because on average they had hit the bullseye. Their average error was indeed zero. But their rms error was 1 meter.
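In code, that recipe is short. Here's a minimal sketch in Python -- not the program I actually used -- assuming the ocean-point temperatures have already been loaded into a numpy array:

    import numpy as np

    def rmse_about_mean(sst):
        # error of using a single global number for every ocean point
        diff = sst - sst.mean()
        # average of the squared errors: the mean square error,
        # which here is also the error variance
        mse = (diff * diff).mean()
        # square root brings us back to degrees, not degrees squared
        return np.sqrt(mse)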
When I compute the RMSE for using the global mean temperature instead of the full resolution grid, I find 12 C. That's ... enormous. The difference between water at 20 C (68 F) and 32 C (90 F) is pretty large! So, clearly, we can't be satisfied with an RMSE of 12 C. But now we have a method for looking at the resolution we need, and a notion of how bad things can get.
Then I made my program average over smaller boxes than the whole globe, say 90 degrees on a side -- London to Chicago, equator to pole -- and found the RMSE comparing those box averages to the original temperatures in the full resolution grid. No surprise that boxes that large were pretty bad. But ... once I got down to boxes 2 degrees on a side (which is something like 200 km, or 120 miles), the RMSE was down to 0.5 degrees.
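Here's a minimal sketch of that box-averaging experiment, again in Python rather than my original program; loading the 1/12th-degree grid and its land mask is assumed to have happened already, and the names sst and ocean_mask are placeholders:

    import numpy as np

    NLON, NLAT = 4320, 2160   # 1/12th degree grid: 360*12 by 180*12

    def box_average_rmse(sst, ocean_mask, box):
        # RMSE of replacing each (box x box)-cell block of the grid
        # by the average over its ocean points.
        sum_sq, count = 0.0, 0
        for j in range(0, NLAT, box):
            for i in range(0, NLON, box):
                m = ocean_mask[j:j+box, i:i+box]
                if not m.any():
                    continue          # all-land box, nothing to average
                vals = sst[j:j+box, i:i+box][m]
                sum_sq += ((vals - vals.mean()) ** 2).sum()
                count += vals.size
        return np.sqrt(sum_sq / count)

    # 2 degree boxes are 24 cells of 1/12th degree on a side:
    # print(box_average_rmse(sst, ocean_mask, 24))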
This is still definitely not zero, but it isn't bad. When a typical satellite used for the job -- such as the AVHRR instrument on NOAA-18 -- is used to make an observation, it has an RMSE (compared to a buoy's thermometer at about the same location at about the same time) of about 0.5 degrees. In other words, with boxes 2 degrees on a side, the average represents what is happening in sea surface temperature about as well as getting a single observation from satellite. We've also managed to reduce our RMS error by about 95% as compared to using only a single number. On the other hand, even though we've captured 95% of what's going on, we only need to use 16,200 numbers -- instead of the 9,331,200 we started with. 95% of the information of the full grid, with only 0.2% as much data.
We've caught 90% of what is happening (reduced the rmse by 90%) when the boxes are 6 degrees on a side (600 km, 360 miles). And it's 99% once we're down to boxes only 0.5 degrees (50 km, 30 miles) on a side (which means only about 3% as many data points are needed to represent the full data set to 99% accuracy).
Now, let's translate this back to some situations we might care about. In trying to construct climatologies of sea surface temperature, we run into the problem that as we go back in time, there are fewer and fewer data points. On the other hand, if we have 1 observation in each box 6 degrees on a side, we've managed to capture 90% of what is happening in the sea surface temperature. In other words, a much sparser data set than we might imagine could indeed represent an awful lot of what is happening in the ocean. A global grid at 6 degrees resolution has only 1800 points (60 boxes around by 30 boxes pole to pole), so we need only 1800 observations to fill it in our simple-minded way.
At 2 degrees resolution, we've captured 95% of what happens in sea surface temperature (at least to this quick little glance -- I only looked at 1 day, as analyzed by 1 center, etc.) So, if we had a good global ocean model at 2 degree resolution, we'd actually be pretty far along in being able to predict sea surface temperatures (model climate, etc.) well. In practice, there are processes that happen in smaller areas than the 2 degree box which can change the whole box's average and we, therefore, want finer resolution than 2 degrees. More about that in a different post.
In thinking about observing systems, if we only 'need' 1 observation every 200 km or so, and we have satellites that can take an observation every 4 km (like the one above), we're all done, right? Unfortunately, no. The problem is that the satellite can't see through clouds. If there are clouds -- and cloudy areas can easily stretch for 1000 km -- the satellite can't see the sea surface to tell us what the temperature is down there. So we need other data sources -- ships, buoys, other sorts of satellites (ones that can see through clouds) -- to fill in even just the 200 km (2 degrees latitude-longitude) boxes each day. Plus we need to observe the detail in the oceans involved in those other processes I mentioned. It isn't just models they're important for -- fishing also cares.