SR SSD 2000-24
Experiences from SCRAPE 99
The National Weather Service Strategic Plan through 2005 places emphasis on providing improved mesoscale warning and forecast products. This is sensible, given the majority of the nation's destructive weather evolves over life cycles of less than 12 hours. While much of the NWS mission has been assumed by techniques that are largely automated (medium and long-range NWP, for example), the arena of short-term prediction has remained a secure province of the forecasters. The determination of quantitative precipitation within the 0-18 hr time frame in particular remains a crucial forecaster pursuit at both NCEP and WFO levels.
AWIPS, WSR-88D, GOES imagery and derived products, profilers, surface mesonets, ACARS soundings, RUC/MAPS assimilation systems ... the list goes on and on. Thanks to NWS modernization, WFO forecasters can choose from a veritable smorgasbord of observation, analysis and display resources when assessing the mesoscale environment. More important, high-resolution model output is providing some tantalizing glimpses beyond the 0-3 hr time frame during which convection can be realistically predicted by means of extrapolation alone.
Concurrent with these new tools come increased expectations regarding the accuracy of mesoscale-oriented products and services. Today's line forecaster is being asked to provide longer-lead warnings, more accurate QPFs for significant rainfall, and better all-around short-term forecasts. But is this improvement in fact happening, or is it even realistic to expect it to happen given the observational limitations that currently exist at these temporal and spatial scales? Before we develop long-term plans and build user expectations, it is important that we quantify the accuracy of the short-term forecasts which WFOs currently provide, particularly those related to convective precipitation. How best do we improve the process by which we generate and disseminate these products so that we can maximize: a) user value, and b) cost-effectiveness as we enter this era of shrinking human and financial resources? As we explore this issue, the following are some other questions we might consider.
To answer questions such as these, WFO Birmingham has undertaken a multi-year operational exercise known as SCRAPE, the Summer Convective Rainfall in Alabama Precipitation Experiment. This project is part of a broader COMET-sponsored collaboration between the Birmingham office and the Global Hydrology and Climate Center (GHCC) in Huntsville, Alabama. The primary long-term goals of SCRAPE are as follows.
This paper briefly discusses some of the methodology and preliminary findings from SCRAPE during 1999 (hereafter referred to as SCRAPE99), as well as future plans for the project.
2. Forecast Methodology
SCRAPE is a multi-year operational project being conducted at WFO Birmingham during the summers of 1999, 2000, and 2001. Essentially, SCRAPE99 was a shakedown for the overall project; its main purpose was to expose any bugs existing in the project methodology. Between early June and late August 1999, five meteorologists produced 24 short-term precipitation forecasts for the Birmingham county warning area (CWA). Each package was prepared around midday, and consisted of a pair of probabilistic QPFs valid for the 12-hr period from 1800 to 0600 UTC. The first of these, referred to as CAT1, graphically depicted the probability of measurable precipitation at any given spot within the CWA. The second product (CAT2) represented probabilities of precipitation meeting or exceeding 1.00" at any given point across the area.
Each forecaster was asked to follow a standard methodology in preparing the QPF products. Utilizing an auxiliary AWIPS workstation adjacent to the operations area, he or she first evaluated all available observed data, including upper air analyses and soundings, surface analyses, and satellite and radar imagery. This was followed by an assessment of all available model output, including Eta, RUC, and MM5 products. In support of this project, the GHCC established an MM5 mesoscale model domain over Alabama and central Tennessee with a grid spacing of 12 km. Output from these real-time runs was made available to WFO forecasters via FTP during the late morning hours, and was viewed using NTRANS software. After perusing the observed and modeled data, the forecaster prepared a graphical composite chart, similar to the "mesocasts" produced by the Olympic support group at the 1996 Summer Games in Atlanta. This map delineated all features which might play a role in the formation and evolution of convection during the subsequent hours, including boundaries, cloudy/clear interfaces, upper-level features, etc. The forecasters also wrote a short synopsis to accompany this graphic. As a final step after completing the graphical QPFs, the participants filled out a questionnaire soliciting their opinions on the value of the Eta, RUC, and MM5 output available for their forecasts, as well as their confidence in that day's ZFP (zone forecasts) and SCRAPE products.
The SCRAPE99 QPF products were verified using 6-hr Stage III precipitation data provided in graphical form by the Lower Mississippi River Forecast Center (LMRFC) on their Internet Web page. These graphical products were archived and later utilized to retrieve precipitation amounts on a 240-point grid encompassing the SCRAPE forecast domain. Due to archiving problems, only 15 of the original 24 SCRAPE99 forecasts could be verified. Point values from the CAT1 and CAT2 probabilistic QPFs for each of the 15 available days were manually extracted at each of the grid points, and paired with the verifying precipitation amounts. These data were then compiled to produce the Brier scores and reliability diagrams discussed in Section 3. Responses to the survey questions regarding model confidence were also compiled and averaged.
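The pairing step described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used in SCRAPE99 (where values were extracted manually). The function name, the sample values, and the 0.01" threshold for "measurable" precipitation are assumptions; the 1.00" CAT2 threshold comes from the product definition in Section 2. Observed amounts are assumed to have already been totaled over the 12-hr valid period.

```python
from typing import List, Tuple

# Thresholds in inches. The 1.00" CAT2 value comes from the text; 0.01" for
# "measurable" precipitation (CAT1) is an assumption of this sketch.
CAT1_THRESHOLD_IN = 0.01
CAT2_THRESHOLD_IN = 1.00


def pair_forecasts(probs: List[float], amounts_in: List[float],
                   threshold_in: float) -> List[Tuple[float, int]]:
    """Pair each grid-point forecast probability with a binary outcome.

    The outcome is 1 when the verifying amount meets or exceeds the
    threshold, and 0 otherwise.
    """
    if len(probs) != len(amounts_in):
        raise ValueError("forecast and observation grids differ in size")
    return [(p, 1 if amt >= threshold_in else 0)
            for p, amt in zip(probs, amounts_in)]


# Hypothetical values at four grid points (not SCRAPE99 data).
cat1_probs = [0.2, 0.6, 0.4, 0.1]
observed_in = [0.00, 1.25, 0.05, 0.00]
cat1_pairs = pair_forecasts(cat1_probs, observed_in, CAT1_THRESHOLD_IN)
```

The same observed amounts yield different binary outcomes for CAT1 and CAT2, which is why each product has to be paired and scored separately.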
In addition to the SCRAPE forecasts, QPF output from the MM5 and Eta models was also archived throughout the exercise. The models' QPFs for each of the 15 relevant SCRAPE99 days were subjectively rated from 1 (poor) to 10 (excellent) for their ability to replicate both a) areas of precipitation, and b) rain-free regions. These data were compiled and averaged, and also appear in Section 3.
3. Preliminary Results
Probabilistic precipitation forecasts can be quantitatively assessed in a number of ways. To gain some preliminary insights into the quality of the SCRAPE99 QPFs, two simple measures were employed: Brier scores and reliability diagrams (both of which are treated in thorough detail by Wilks (1995)). The Brier score (BS) represents the mean-squared difference between the probability forecast and a binary precipitation observation (1 = rain; 0 = no rain), and ranges from 0 (perfect forecast) to 1 (worst possible forecast). A typical reliability diagram contains a plot of an event's observed relative frequency as a function of its forecast probability, and depicts the "calibration" of a forecast set. Perfect calibration would be represented as a diagonal line across the diagram, denoting a 1:1 correspondence between the forecasts and observed relative frequencies. In practice, the plot departs from this ideal trajectory at one or more points across the range of forecast probabilities, revealing either a dry or wet bias for those PoP values.
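As a minimal sketch of the two measures just described, the Python below computes a Brier score and the binned values that would be plotted on a reliability diagram. The function names, the choice of ten equal-width probability bins, and the sample data are assumptions made for illustration; they are not taken from the SCRAPE99 processing.

```python
from typing import Dict, List, Tuple


def brier_score(probs: List[float], outcomes: List[int]) -> float:
    """Mean squared difference between forecast probability and outcome."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def reliability_table(probs: List[float], outcomes: List[int],
                      n_bins: int = 10) -> Dict[int, Tuple[float, float, int]]:
    """Group forecasts into equal-width probability bins.

    Returns {bin index: (mean forecast probability,
                         observed relative frequency,
                         number of forecasts in the bin)}.
    """
    bins: Dict[int, List[Tuple[float, int]]] = {}
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins.setdefault(idx, []).append((p, o))
    table = {}
    for idx, members in sorted(bins.items()):
        n = len(members)
        mean_p = sum(p for p, _ in members) / n
        obs_freq = sum(o for _, o in members) / n
        table[idx] = (mean_p, obs_freq, n)
    return table


# Hypothetical forecast/outcome pairs (not SCRAPE99 data).
sample_probs = [0.1, 0.1, 0.3, 0.3, 0.7, 0.7]
sample_outcomes = [0, 0, 0, 1, 1, 1]
sample_bs = brier_score(sample_probs, sample_outcomes)
sample_table = reliability_table(sample_probs, sample_outcomes)
```

Plotting observed relative frequency against mean forecast probability for each bin gives the reliability curve. Bins where the observed frequency exceeds the forecast probability lie above the diagonal and indicate a dry bias; bins below it indicate a wet bias.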
Figure 1 depicts BS values and fractional precipitation coverages for each of the available SCRAPE99 CAT1 forecasts. Clearly, the BS corresponded extremely well with precipitation coverage; days with greater observed coverage were associated with higher BS values, implying less accurate QPFs during these regimes. Conversely, predominantly dry forecasts (which usually yield low BS values) always preceded dry days. While it's obvious that QPFs become more difficult during periods of widespread coverage, one might hope to see less correlation between the two quantities, implying some evidence of skill during these more challenging regimes. An important aim of the study was to gauge the relative accuracy of the SCRAPE forecasts vs. the Birmingham zone forecasts in effect simultaneously. The average CAT1 BS derived from the values in Fig. 1 is 0.15. In comparison, the average BS derived from the ZFP over the same 15 days was 0.16, indicating a slight increase in skill by the SCRAPE forecasters. One might have expected a greater improvement in skill with the SCRAPE products, but the sample size may limit the significance of this finding.
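One common way to express such a comparison is a Brier skill score, which gives the fractional improvement of one forecast set over a reference set. The sketch below applies it to the two average values quoted above, treating the ZFP as the reference. This calculation is an illustration added here and was not part of the original SCRAPE99 analysis; the function name is hypothetical.

```python
def brier_skill_score(bs_forecast: float, bs_reference: float) -> float:
    """Fractional improvement over a reference forecast.

    1 means a perfect forecast, 0 means no improvement over the reference,
    and negative values mean the forecast is worse than the reference.
    """
    if bs_reference <= 0:
        raise ValueError("reference Brier score must be positive")
    return 1.0 - bs_forecast / bs_reference


# Average values quoted in the text: SCRAPE CAT1 (0.15) vs. the ZFP (0.16).
scrape_vs_zfp = brier_skill_score(0.15, 0.16)
```

With these inputs the score works out to roughly a 6% improvement over the zone forecasts. Because both averages are quoted to only two decimal places and come from 15 days, the figure should be read as a rough indication rather than a precise measure.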
The reliability diagram for the CAT1 forecast group is shown in Fig. 2. Overall, the SCRAPE forecasts are well calibrated (if a bit wet), except in the 30-39% range where a distinct dry bias appears. By comparison, the ZFP PoPs display a broad dry bias through most of the range from 20% to 70%. These results are not surprising, given that the 20-40% range mirrors climatology and tends to be a common PoP choice. In both cases, the forecasters might be better served by adjusting their PoPs upward slightly from climatology during times of perceived scattered coverage. SCRAPE CAT2 reliabilities are presented in Fig. 3. In contrast to the CAT1 results, these findings are uniformly wet-biased and show essentially no reliability. This illustrates the difficulty (or folly?) of forecasting heavier precipitation amounts during summer convective situations when areal coverage is usually quite limited.
Subjective evaluations of the various models and forecast products were also conducted during SCRAPE99. It is interesting that the highest levels of confidence (Fig. 4) were reserved for the two human-generated products - SCRAPE and ZFP. Average confidence was ascribed to each of the three models, with the MM5 outpacing its RUC and Eta counterparts slightly. A special qualitative comparison of the MM5 and Eta QPFs was also performed. Figure 5 depicts ratings of forecast "quality" by the two models in areas where precipitation subsequently occurred. The MM5 was judged better than the Eta on most days, with the greatest differences occurring when coverage was sparse. This is an expected result, given the higher resolution of the MM5 (12 km vs. the Eta's 32 km), and its use of the Kain-Fritsch cumulus parameterization scheme, which typically generates wider precipitation coverage than the Eta's Betts-Miller method. Similar ratings of forecast quality in non-precipitation areas are presented in Fig. 6. In these cases, the Eta routinely outperformed the MM5, due again to the latter system's tendency to produce a spatially excessive QPF (at least over the Southeast states).
4. Discussion and Future Plans
Despite the limited results, SCRAPE99 demonstrated the feasibility of a larger operational study of this type. SCRAPE-2000, being conducted during the current summer, incorporates a number of changes. This year's exercise will involve seven forecaster participants instead of five, and it will attempt to produce forecasts on at least 50 separate days. This will result in a much larger sample set encompassing a wider variety of weather regimes. In cooperation with the GHCC, model and precipitation archiving methods will also become more rigorous. These steps should allow for a comprehensive analysis of both the SCRAPE and model forecasts, hopefully leading to more definitive results.
The overriding motive behind SCRAPE is to probe the limits of our ability to produce detailed short-term convective precipitation forecasts, and to determine the optimal method for presenting this information to our users. How detailed can these forecasts become? Are graphical short-term forecasts (NOWs) in our future? What is the proper role of mesoscale modeling, given that local models of various kinds are being investigated at more and more WFOs? It is imperative that additional studies of this type be conducted at forecast offices, and soon. The 2-5 day forecasts and associated objective guidance systems provided by NCEP models, though far from perfect, already possess skill levels (vs. climatology) comparable to those of most human forecasters. Given the expected incremental increase in these systems' performance during the coming decade, it seems realistic to assume many of the longer-term forecast tasks currently performed by WFO personnel will eventually be surrendered to automated means. This will leave the mesoscale as an area for special attention by the WFO forecaster, an arena where the meteorologist can exploit his or her training and experience to provide significant "added value" to a baseline computer-generated product. In fact, it might be argued our continued ability to provide value in the form of precise short-term predictions will ultimately determine the viability of forecasting as a profession in the next two decades, if not sooner.
Acknowledgments. The first author would like to express his appreciation to the following members of the WFO Birmingham staff who have participated in the SCRAPE project so far: Jim Westland, Mark Linhares, Patricia Hart, Roger McNeil, Bob Kilduff, Ron Murphy, Kevin Pence, and Greg Machala. Thanks are also due to Dr. Bill Lapenta of the GHCC for providing the MM5 data, and to Dan Smith of Southern Region Scientific Services Division, whose support made this project possible.
Wilks, D.S., 1995: Statistical Methods in the Atmospheric Sciences. Academic Press, San Diego, 467 pp.