
Flood Categories
Minor - Some public inconvenience, but minimal or no property damage likely.
Moderate - Closure of secondary roads. Transfer to higher elevation may be necessary to save property. Some evacuations may be required.
Major - Extensive inundation and property damage. Usually characterized by the evacuation of people and livestock and the closure of primary and secondary roads.
Record - The highest observed river stage or discharge at a given site during the period of record keeping.
Categorical Flood Forecast Verification System - History and Information

1. Introduction
The National Weather Service (NWS) River Forecast Centers (RFCs) are responsible for providing flood forecasts for major rivers and streams throughout the country. For some time, there has been a recognized need for a river forecast verification system to evaluate the RFCs' skill in the delivery of this service. The agency cites two goals for verification: (1) improve the accuracy and timeliness of river forecasts, and (2) document overall trends in forecast performance (OHD, 2001). Verification of river forecasts at individual locations is required to identify areas where forecast skill is lacking. Hydrologic development efforts can then be concentrated on the areas where an RFC consistently does a poor job of forecasting. By accumulating pertinent verification information over a period of time, an RFC can document trends in its overall river prediction performance. As technology and science continue to advance, "improvements" will continue to be made to our hydrologic models and their underlying infrastructure. We are now capable of generating river forecasts more quickly and at higher resolution than ever before. Without concrete performance measures, there is no way of knowing how these changes affect the accuracy of our forecasts.

With these two goals in mind, the Southern Region Categorical Flood Forecast Verification System (CFFVS) was developed. The system is designed to provide a meaningful analysis of flood forecast performance at the individual forecast point scale, as well as over a larger scale, such as RFC-wide or even nationwide.

2. Verification History
There has never been a national river forecast verification system in place until the recent initiative by the Office of Hydrologic Development (OHD), which was fueled by the agency's increased emphasis on "performance measures". Verification methodology has been debated over the years, but chiefly due to the complexity of the problem, a consensus was never reached on the proper metrics for measuring forecast performance.

Several river forecast verification initiatives have been implemented over the years. Most of these schemes, including the current NWS national verification software, measure forecast skill based on statistical error and bias. Commonly expressed in terms of absolute error or root mean square error (RMSE), these methods measure the difference between forecast and observed stages. Such statistics serve the useful purpose of comparing forecasts at a single location or tracking model performance. For instance, the Arkansas-Red Basin River Forecast Center (ABRFC) used RMSE to evaluate model performance with and without QPF (ABRFC, 1997). This information can be valuable when comparing different forecast techniques, but is meaningless when aggregated for multiple sites. RMSE is low during recession and baseflow conditions, but increases greatly during periods of heightened hydrologic activity. The ABRFC has shown that over a three-year period, there was a 0.86 correlation between monthly flows and forecast error (ABRFC, 2001). Simply put, RMSE applied to river forecast verification shows when it rained, and when it didn't. Clearly, forecast error statistics alone in no way convey how well an RFC did in providing a river forecast service.

In 1988, the original concept of a categorical flood forecast verification system was presented in NOAA Technical Memorandum NWS HYDRO-43 (Morris, 1988). Morris suggested that floods are events, similar to weather phenomena such as tornadoes and hurricanes. As events, floods can be classified by magnitude: minor, moderate, or major. With categories defined, we can readily verify the magnitude of a flood and gauge how accurate our forecasts were with respect to these established categories. For instance, if we observed a major flood, did we forecast a major flood? If so, how much lead time did the forecast provide? If we didn't, how badly did we miss forecasting a major flood? By answering these basic questions we "frame the service", or provide statistics relative to the hydrologic significance of the flood event.

In December 1999, under the direction of Southern Region Hydrologic Services, a verification team was formed with members from ABRFC, WGRFC, and SRH. The team was tasked with implementing a categorical flood verification scheme based on HYDRO-43. Additional requirements were placed on the team to keep the process "simple and automated". After several iterations, the verification team agreed on a proposed set of verification metrics in late 2000. In January 2001, the team was expanded to include the other SR RFCs and the Norman, Oklahoma Service Hydrologist, and a final consensus was reached on verification procedures.

We are currently in the implementation stage and are preparing to install the software at all SR RFCs by June 2001.

3. Categorical Flood Forecast Verification System
The CFFVS is set up to run on the Advanced Weather Interactive Processing System (AWIPS) at an RFC. The verification process can be broken down into three discrete steps:
1. Data Assimilation
2. Forecast Verification
3. Executive Summaries

Data assimilation begins with the compilation of the category thresholds to verify against. It also includes the ongoing archival of forecast and observed river stages. The forecast verification step consists of the computation and output of categorical statistics for each individual forecast point. The final step produces executive summaries, which are computed from overall statistical scores for a designated time period. These aggregated scores can then be compiled and presented to upper management to track RFC performance. Each of these steps is discussed in detail below.

3.1 Categorical Data Assimilation
Before implementation of a categorical verification scheme, the category levels for each forecast point must be determined. The CFFVS uses the flood severity categories of minor, moderate, and major to classify forecasts. We have further stratified forecast locations by adding an action stage category, since many RFCs use this stage as a threshold to initiate flood forecasts, and a record flood category, to provide specific verification information for the most extreme events. Table 1 lists each verification category and where the category levels are accessed by the CFFVS.

Table 1: Flood Categories.
Category   Severity                        Reference Source
0          Action Stage                    OFS Rating Curve File
1          Minor Flooding (Flood Stage)    OFS Rating Curve File
2          Moderate Flooding               IHFS Floodcat Table
3          Major Flooding                  IHFS Floodcat Table
4          Record Flooding                 OFS Rating Curve File

In the Southern Region, the Service Hydrologists set action stages and flood stages, as well as the threshold stages for minor, moderate, and major flooding. Interactive Hydrologic Forecast System (IHFS) database fields are available for each of these stages. The Service Hydrologists are also responsible for determining the flood of record for all locations. Each Service Hydrologist tabulates this pertinent data for all forecast points in their Hydrologic Service Area (HSA) and provides it to the proper RFC. The RFC then enters the data into its IHFS database and OFS Rating Curve files. Assimilation of this categorical information requires a considerable amount of effort on the front end, but once the categories have been established, only infrequent updates are required.

Although desirable, it is not imperative to have valid entries for each category. CFFVS will stratify forecasts into any and all categories that are defined. For instance, if no action stage is defined in the OFS rating curve file, the lowest verification category will be minor flooding. There may be instances where the flood of record is lower than some of the other categorical stages. This is most likely to occur at forecast points with a short historical record. In this case, the record flood category is ignored. The record flood level must exceed the major flood stage to be considered a valid flood category.
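The rules above can be sketched as a small validation routine. This is an illustrative helper, not part of CFFVS; the stage values are hypothetical, and in the real system the thresholds come from the OFS rating curve file and the IHFS floodcat table.

```python
def valid_categories(stages):
    """Return the categories defined for a forecast point, lowest to highest.

    `stages` maps category name -> stage in feet, or omits undefined
    categories. Undefined categories are simply skipped; the record flood
    category is ignored unless it exceeds the major flood stage.
    """
    order = ["action", "minor", "moderate", "major", "record"]
    cats = [(name, stages[name]) for name in order if stages.get(name) is not None]
    # Drop a record stage that does not exceed the major flood stage
    # (typical of points with a short historical record).
    major = stages.get("major")
    cats = [(n, s) for n, s in cats
            if not (n == "record" and major is not None and s <= major)]
    # Category stages must increase from action stage upward.
    values = [s for _, s in cats]
    assert values == sorted(values), "category stages must increase"
    return cats

# Hypothetical point: record (21.5 ft) does not exceed major (23.0 ft),
# so the record category is dropped.
print(valid_categories(
    {"action": 10.0, "minor": 14.0, "moderate": 18.0, "major": 23.0, "record": 21.5}))
```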

3.2 Archive Database
The second, ongoing data assimilation requirement is the archival of the forecast data to be verified and the observed data to verify against.

Currently, all SR RFCs prepare their forecasts using the XSETS software. XSETS, which stands for X-windows based SHEF Encoded Time-Series, was originally developed at ABRFC and is now supported nationally by the Office of Hydrologic Development (OHD). This software extracts forecast time series generated by the NWS River Forecast System (NWSRFS) and reformats them into the Standard Hydrometeorological Exchange Format (SHEF). An example of an XSETS forecast is shown in Appendix A. The SHEF products are disseminated to Weather Service Forecast Offices (WSFOs), where they are used as guidance in the preparation of public river forecasts. The products are also passed to a SHEF decoder at the local RFC, where they are stored in the IHFS database. Each forecast ordinate is stored as a separate record in the IHFS fcstheight table. Each record in fcstheight contains a gauge height value, a valid time, which is the date and time of the forecast ordinate, and a basis time, which is the date and time the forecast was issued.

Observed gauge height data is also ingested into the IHFS database. Products containing SHEF encoded observations are processed through the SHEF decoder and posted to the height table in the IHFS database. As with the forecast data, each individual observation is stored as a record in the database.

Before we can verify a forecast, both the forecast and observed data must be moved from the operational IHFS database to the archive database. The archive database structure we have implemented simply mimics the structure of the fcstheight and height tables in the IHFS database. Appendix XX shows the database schema for the fcstheight and height tables. Structured Query Language (SQL) procedures are used to copy data to the archive database. In our implementation, we initiate the SQL script daily to populate the archive database. The script archives all forecast time series and all hourly gauge height observations.
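As a rough illustration of the daily archival step, the sketch below copies forecast and observed records into archive tables with the same layout. It uses an in-memory SQLite database for demonstration only; the simplified column names, the `arc_` table prefix, and the sample station ID are assumptions, not the actual IHFS or archive schema.

```python
import sqlite3

op = sqlite3.connect(":memory:")
op.executescript("""
    -- Simplified stand-ins for the IHFS fcstheight and height tables,
    -- plus archive tables that mimic their structure.
    CREATE TABLE fcstheight (lid TEXT, validtime TEXT, basistime TEXT, value REAL);
    CREATE TABLE height     (lid TEXT, obstime TEXT, value REAL);
    CREATE TABLE arc_fcstheight (lid TEXT, validtime TEXT, basistime TEXT, value REAL);
    CREATE TABLE arc_height     (lid TEXT, obstime TEXT, value REAL);

    INSERT INTO fcstheight VALUES ('TULO2', '2001-05-01 12:00', '2001-05-01 00:00', 14.2);
    INSERT INTO height     VALUES ('TULO2', '2001-05-01 12:00', 13.8);
""")

# Daily archival: copy every forecast ordinate and every hourly
# observation, skipping rows already present in the archive.
op.executescript("""
    INSERT INTO arc_fcstheight
        SELECT * FROM fcstheight AS f WHERE NOT EXISTS (
            SELECT 1 FROM arc_fcstheight AS a
            WHERE a.lid = f.lid AND a.validtime = f.validtime
              AND a.basistime = f.basistime);
    INSERT INTO arc_height
        SELECT * FROM height AS h WHERE NOT EXISTS (
            SELECT 1 FROM arc_height AS a
            WHERE a.lid = h.lid AND a.obstime = h.obstime);
""")

print(op.execute("SELECT COUNT(*) FROM arc_fcstheight").fetchone()[0])
```

Running the second script again would archive nothing new, which is what makes a daily cron-style invocation safe.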

Even though we archive all hourly observations, the CFFVS examines only the observations whose times match the forecast ordinates. Since most of our forecasts produce six-hour ordinates at synoptic times, six-hour observations at 00, 06, 12, and 18Z would suffice in most cases. The intermediate observations may be useful for estimating missing synoptic time data, but are not required if disk space is limited. It should be noted that if an observation does not exist in the archive database at the same time as a forecast ordinate, no verification is attempted. Also, the CFFVS does not provide any data quality control. It is up to the RFC to ensure that bad observations have been removed and any interpolated stages have been entered. ABRFC has developed a couple of utility programs to aid in data inspection, but currently there are no user applications to correct erroneous data in the archive database.
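The matching rule described here, verify only where an observation exists at exactly the forecast ordinate's valid time, can be sketched as follows. The times and stages are hypothetical, and `pair_for_verification` is an illustrative helper, not a CFFVS routine.

```python
def pair_for_verification(forecast, observed):
    """Pair forecast ordinates with same-time observations.

    `forecast` and `observed` map valid time -> stage. Ordinates with no
    observation at the identical time are skipped (no verification attempted).
    """
    return {t: (fcst, observed[t])
            for t, fcst in forecast.items() if t in observed}

fcst = {"01/12Z": 12.0, "01/18Z": 14.5, "02/00Z": 16.0}
obs  = {"01/12Z": 11.8, "02/00Z": 16.4}   # 01/18Z observation is missing
pairs = pair_for_verification(fcst, obs)
print(sorted(pairs))  # only times with both a forecast and an observation
```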

4.0 Forecast Verification
The CFFVS is a derivative of the methods outlined in Dave Morris' HYDRO-43. In order to make the process "automated and simple", the methods were adapted to utilize time series forecasts and observations. When the original concept was introduced in 1988, forecast and observed data were not readily available in a format that could be quickly and easily processed by automated methods. RFCs prepared river forecasts in a text format, using qualitative terms to describe the shape and magnitude of the forecast hydrograph. Commonly the forecasts contained statements such as "crest near" or "rise to", or were bracketed between an upper and lower stage using terminology like "crest between". Timing of the crest was also generalized, using broad terms such as "midday", "late evening", or even "next Tues". Other terms were included in the forecast to describe the shape of the hydrograph. These were usually expressed as times to rise above or fall below critical stages, such as bankfull or flood stage. While these statements legitimately expressed the uncertainty in the flood prediction process, they made verification subject to interpretation of what the forecaster was trying to convey. The sparsity of observed data also made verification difficult to implement. While real-time data at frequent intervals was quickly becoming available, daily gauge height readings from COOP observers provided the only observations at many locations.

4.1 Verification Definitions
Before explaining the methodology behind CFFVS, it is important to understand the following definitions:

Forecast : Each forecast ordinate of each forecast time series is verified as its own forecast. If a three day forecast is issued with six hour ordinates, 12 separate forecasts will be evaluated. If the forecast is updated six hours later, another 12 forecasts will be verified. Any reference to a forecast in these definitions should be construed as an individual forecast ordinate.

Flood Category : Each individual forecast point is stratified into flood categories ranging from action stage to flood of record. Forecasts below action stage are not verified. Category stages must increase as categories progress from action stage to flood of record.

Hit : If a forecast and corresponding observation at the same valid time are in the same flood category, the forecast is credited with a hit.

Lead Time : A categorical lead time is the number of hours from the time of forecast issuance to the valid time of the forecast hit. A lead time is only computed when (1) the ordinate's forecast and observation are in the same category, and (2) the previous ordinate's observation was lower than the current category. This restricts lead time calculations to instances where the stage is rising and crossing categories.

Miss : Conversely, a miss is given when a forecast is in one category, and the corresponding observation is in a different category.

Categorical Error : This is the amount the forecast would have to be changed to reach the observed category. Categorical error is only computed for misses.

False Alarm: A false alarm is given when the forecast is above the lowest defined category, and the observation is below the lowest category.

No Forecast Miss: A no forecast miss is the least desirable score. In this case, the observation is above the lowest defined category, and there is no corresponding forecast ordinate. It can be thought of as an unforecasted event.

Non-Flood Forecast: Any forecast that is below the lowest defined category with an observed stage also below the lowest defined stage is a non-flood forecast. Non-flood forecasts are not counted in the verification process.

Number of Events: The number of events is determined by summing the total number of hits, misses, false alarms, and no forecast misses for a specified period.

Probability of Detection (POD): The number of categorical flood forecast hits divided by the total number of categorical flood forecasts observed.
POD = (Hits / (Hits + Misses + No Forecast Misses) ) {1.00 is best}

False Alarm Ratio (FAR): The number of categorical flood false alarms divided by the sum of false alarms and hits.
FAR = (False Alarms / (False Alarms + Hits) ) {0.00 is best}
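Taken together, the definitions above can be sketched as a small scoring routine. The thresholds are hypothetical (in CFFVS they come from the OFS rating curve file and IHFS floodcat table), and scoring a below-category forecast paired with an observed flood as a miss is our reading of the definitions, not confirmed CFFVS behavior.

```python
# Hypothetical station thresholds, lowest to highest (feet).
THRESHOLDS = [("action", 10.0), ("minor", 14.0), ("moderate", 18.0), ("major", 23.0)]

def category(stage):
    """Highest category whose threshold the stage reaches, else None."""
    best = None
    for name, level in THRESHOLDS:
        if stage >= level:
            best = name
    return best

def score(fcst_stage, obs_stage):
    """Classify one ordinate; fcst_stage is None when no ordinate exists."""
    obs_cat = category(obs_stage)
    if fcst_stage is None:
        # An unforecast event, or nothing to verify at all.
        return "no forecast miss" if obs_cat else "non-flood"
    fcst_cat = category(fcst_stage)
    if fcst_cat is None and obs_cat is None:
        return "non-flood"       # both below the lowest category; not counted
    if obs_cat is None:
        return "false alarm"     # flood forecast, no flood observed
    # Different categories (including a below-category forecast) score a miss.
    return "hit" if fcst_cat == obs_cat else "miss"

def pod(hits, misses, no_fcst_misses):
    return hits / (hits + misses + no_fcst_misses)   # 1.00 is best

def far(false_alarms, hits):
    return false_alarms / (false_alarms + hits)      # 0.00 is best

print(score(16.0, 15.5), "/", score(15.0, 9.0))
```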

4.2 Verification Example
Multiple forecasts are often issued for a particular flood event. These forecasts may be updated every 6 to 12 hours, depending on office policy and how the forecast is tracking. In this example three time series forecasts were issued for this event.

Figure 1: Figure of multiple forecast issuances

Each forecast time series is verified with observed data. For our example we will examine Fcst 1.

Figure 2: Example of a single forecast issuance (Fcst 1)

Each forecast ordinate for Fcst 1 is evaluated against the observed stage at the same valid time.

Figure 3: Example of forecast verification for Fcst 1

Each forecast ordinate is classified as a hit, miss, no forecast miss, or false alarm within the pertinent category. Lead times are calculated for hits where the previous observation was below the current category, and categorical error is calculated for misses (Table 2). The verification procedure is repeated for Fcst 2 and Fcst 3, and the results are tallied for each category. Figure 4 contains a generalized process flow.

Table 2: Tabulation of verification results for forecast example.
Period  Forecast     Observed   Category   Result           Comment
1       No forecast  Minor      Minor      No Fcst Miss     Unforecast flood
2       Minor        Minor      Minor      Hit              No lead time computation
3       Moderate     Moderate   Moderate   Hit              Calculate lead time
4       Moderate     Major      Major      Miss             Calculate categorical error
5       Moderate     Moderate   Moderate   Hit              No lead time computation
6       Minor        Moderate   Moderate   Miss             Calculate categorical error
7       Minor        Minor      Minor      Hit              No lead time computation
8       Minor        Minor      Minor      Hit              No lead time computation
9       Minor        No Flood   Minor      False Alarm
10      No Flood     No Flood   No Flood   No Verification
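Tallying Table 2 and applying the POD and FAR formulas from section 4.1: the example yields 5 hits, 2 misses, 1 no forecast miss, and 1 false alarm (period 10 is not verified), for 9 events, a POD of 5/8 = 0.625, and a FAR of 1/6, roughly 0.17. A quick sketch of that arithmetic:

```python
from collections import Counter

# Results transcribed from Table 2; period 10 is a non-flood (not counted).
results = ["no fcst miss", "hit", "hit", "miss", "hit",
           "miss", "hit", "hit", "false alarm"]
counts = Counter(results)

events = sum(counts.values())
pod = counts["hit"] / (counts["hit"] + counts["miss"] + counts["no fcst miss"])
far = counts["false alarm"] / (counts["false alarm"] + counts["hit"])
print(events, round(pod, 3), round(far, 3))  # 9 0.625 0.167
```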

Much thought was put into the lead time computation. After several iterations, we arrived at the current method which seems to provide a realistic value of lead time. The requirement that the previous ordinate's observation fall in a lower category isolates the cases where the river has changed categories, and is on the rising limb. This limits the number of cases where lead time is calculated, but provides an accurate depiction of categorical lead time.
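The lead time rule can be sketched as follows. The thresholds and stages are hypothetical, and the `(hours, category)` return format is an illustration of the rule, not CFFVS output.

```python
THRESHOLDS = [10.0, 14.0, 18.0, 23.0]  # action, minor, moderate, major (ft)

def cat(stage):
    """Category index: 0 = below action, 1 = action, ... 4 = major."""
    return sum(stage >= t for t in THRESHOLDS)

def lead_times(ordinates):
    """ordinates: list of (hours_after_issuance, forecast_stage, observed_stage).

    A lead time is computed only for a hit whose previous observation fell in
    a lower category, i.e. the river is rising and has just crossed into the
    category. Returns (hours of lead time, category index entered) pairs.
    """
    out = []
    prev_obs_cat = None
    for hours, fcst, obs in ordinates:
        f, o = cat(fcst), cat(obs)
        if f == o and o > 0 and prev_obs_cat is not None and prev_obs_cat < o:
            out.append((hours, o))
        prev_obs_cat = o
    return out

# Hypothetical rising hydrograph: lead times are credited at hours 12 and 18,
# where the observed stage crosses into the minor and moderate categories.
series = [(6, 12.0, 11.5), (12, 15.0, 14.6), (18, 19.2, 18.4), (24, 19.0, 18.9)]
print(lead_times(series))
```

The hour-24 hit earns no lead time because the river was already in the moderate category, which is exactly the restriction described above.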