1. Introduction
The National Weather Service (NWS) River Forecast Centers (RFC) are
responsible for providing flood forecasts for major rivers and streams
throughout the country. For some time now, there has been a recognized
need for a river forecast verification system to evaluate the RFC's
skill in the delivery of this service. The agency cites two goals
for verification: (1) Improve accuracy and timeliness of river forecasts
and (2) document overall trends in forecast performance (OHD, 2001).
Verification of river forecasts at individual locations is required
to identify areas where forecast skill is lacking. Hydrologic development
efforts can then be concentrated on the areas where an RFC consistently
does a poor job of forecasting. By accumulating pertinent verification
information over a period of time, an RFC can show trends
in its overall river prediction performance. As technology and
science continue to advance, "improvements" will continue to be
made to our hydrologic models and their underlying infrastructure.
We are now capable of generating river forecasts quicker and at
higher resolution than ever before. Without concrete performance
measures, there is no way of knowing how these changes affect the
accuracy of our forecasts.
With
these two goals in mind, the Southern Region Categorical Flood Forecast
Verification System (CFFVS) was developed. The system is designed
to provide a meaningful analysis of flood forecast performance at
the individual forecast point scale, as well as over a larger scale,
such as RFC-wide or even nationwide.
2. Verification History
No national river forecast verification system was in place
until the recent initiative by the Office of Hydrologic Development
(OHD), which was fueled by the agency's increased emphasis on "performance
measures". Verification methodology has been debated over the years,
but chiefly due to the complexity of the problem, a consensus has
never been reached on the proper metrics necessary to measure forecast
performance.
Several
river forecast verification initiatives have been implemented over
the years. Most of these schemes, including the current NWS national
verification software, measure forecast skill based on statistical
error and bias. Commonly expressed in terms of absolute error or
root mean square error (RMSE), these methods measure the difference
between forecasted and observed stages. Such statistics are
useful for comparing forecasts at a single location,
or for tracking model performance. For instance, the Arkansas-Basin
River Forecast Center (ABRFC) used RMSE to evaluate model performance
with and without QPF (ABRFC, 1997). This information can be valuable
when comparing different forecast techniques, but is meaningless
when aggregated for multiple sites. RMSE is low during recession
and baseflow conditions, but increases greatly during increased
hydrologic activity. The ABRFC has shown that over a three year
period, there was a 0.86 correlation between monthly flows and forecast
error (ABRFC, 2001). Simply put, RMSE applied to river forecast
verification shows when it rained, and when it didn't. Clearly,
forecast error statistics alone in no way convey how well an RFC
did in providing a river forecast service.
In
1988, the original concept of a categorical flood forecast verification
system was presented in NOAA Technical Memorandum NWS HYDRO-43 (Morris, 1988).
Morris suggested that floods are events, similar to weather phenomena
such as tornadoes and hurricanes. As events, floods can be classified
as to magnitude: minor, moderate, or major. With categories defined,
we can readily verify the magnitude of a flood, and verify how accurate
our forecasts were with respect to these established categories.
For instance, if we observed a major flood, did we forecast a major
flood? If so, how much lead time did the forecast provide? If we
didn't, how badly did we miss forecasting a major flood? By answering
these basic questions we "frame the service", or provide statistics
relative to the hydrologic significance of the flood event.
In
December 1999, under the direction of Southern Region Hydrologic
Services, a verification team was formed with members from ABRFC,
WGRFC, and SRH. The team was tasked with implementation of a categorical
flood verification scheme based on HYDRO-43. Additional requirements
were placed on the team to keep the process "simple and automated".
After several iterations, the verification team agreed on a proposed
set of verification metrics in late 2000. In January 2001, the team
was expanded to include the other SR RFCs and the Norman, Oklahoma
Service Hydrologist, and a final consensus was reached on verification
procedures.
We
are currently in the implementation stage, and preparing to install
the software at all SR RFCs by June 2001.
3. Categorical Flood Forecast Verification System
CFFVS is set up to run on the Advanced Weather Interactive Processing
System (AWIPS) at an RFC. The verification process can be broken
down into three discrete steps:
1. Data Assimilation
2. Forecast Verification
3. Executive Summaries
Data
assimilation begins with the compilation of category thresholds
to verify against. It also includes the ongoing archival of forecast
and observed river stages. The forecast verification step consists
of the computation and output of categorical statistics for each
individual forecast point. The final step produces executive summaries
which are computed from overall statistical scores for a designated
time period. These aggregated scores can then be compiled and presented
to upper management to track RFC performance. Each of these steps
is discussed in detail below.
3.1 Categorical Data Assimilation
Before implementation of a categorical verification scheme, the category
levels for each forecast point must be determined. The CFFVS uses
the flood severity categories of minor, moderate, and major to classify
forecasts. We have further stratified forecast locations by adding
an action stage category, since many RFCs use this stage as a threshold
to initiate flood forecasts, and a record flood category, to provide
specific verification information for the most extreme events. Table
1 lists each verification category, and where the category levels
are accessed by the CFFVS.
Table 1: Flood Categories

Category | Severity                     | Reference Source
---------+------------------------------+----------------------
0        | Action Stage                 | OFS Rating Curve File
1        | Minor Flooding (Flood Stage) | OFS Rating Curve File
2        | Moderate Flooding            | IHFS Floodcat Table
3        | Major Flooding               | IHFS Floodcat Table
4        | Record Flooding              | OFS Rating Curve File
In
the Southern Region, the Service Hydrologists set action stages
and flood stages as well as the threshold stages for minor, moderate,
and major flooding. There are Interactive Hydrologic Forecast System
(IHFS) database fields available for each of these stages. The Service
Hydrologists also are responsible for determining the flood of record
for all locations. Each Service Hydrologist tabulates this pertinent
data for all forecast points in their Hydrologic Service Area (HSA)
and provides it to the proper RFC. The RFC then enters the data
into their IHFS database and OFS Rating Curve files. Assimilation
of this categorical information requires a considerable amount of
effort on the front end, but once the categories have been established,
only infrequent updates are required.
Although
desirable, it is not imperative to have valid entries for each category.
CFFVS will stratify forecasts into any and all categories that are
defined. For instance, if no action stage is defined in the OFS
rating curve file, the lowest verification category will be minor
flooding. There may be instances where the flood of record is lower
than some of the other categorical stages. This is most likely to
occur at forecast points that have a short historical record. In
this case, the record flood category is ignored. Record flood level
must exceed the major flood category to be considered as a valid
flood category.
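The category rules just described can be sketched in a few lines. This is an illustrative outline under stated assumptions, not CFFVS code: the function name, the dictionary input, and the stage values are hypothetical.

```python
# Hypothetical sketch of assembling the valid category list for one
# forecast point. A category is used only if it is defined and its
# stage exceeds every lower defined category; in particular, a record
# flood below the major flood stage is ignored.

CATEGORY_NAMES = ["action", "minor", "moderate", "major", "record"]

def valid_categories(stages):
    """Return (name, stage) pairs for defined, strictly increasing categories.

    `stages` maps category name -> threshold stage (feet), or None if
    the category is undefined at this forecast point.
    """
    result = []
    highest = float("-inf")
    for name in CATEGORY_NAMES:
        stage = stages.get(name)
        if stage is not None and stage > highest:
            result.append((name, stage))
            highest = stage
    return result

# A point with no action stage and a record flood below the major stage:
stages = {"action": None, "minor": 20.0, "moderate": 24.0,
          "major": 28.0, "record": 26.5}
print(valid_categories(stages))
# [('minor', 20.0), ('moderate', 24.0), ('major', 28.0)]
```

Here the lowest verification category falls back to minor flooding, and the record flood category drops out, matching the behavior described above.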
3.2 Archive Database
The second, and ongoing, data assimilation requirement is the archival
of the forecast data to verify and the observed data to verify
against.
Currently,
all SR RFCs prepare their forecasts using the XSETS software. XSETS,
which stands for Xwindows based SHEF Encoded Time-Series, was originally
developed at ABRFC, and is now supported nationally by the Office
of Hydrologic Development (OHD). This software extracts forecast
time-series generated by the NWS River Forecast System (NWSRFS),
and reformats them into the Standard Hydrometeorological Exchange
Format (SHEF). An example of an XSETS forecast is shown in Appendix
A. The SHEF products are disseminated to Weather Service Forecast
Offices (WSFOs) where they are used as guidance in the preparation
of public river forecasts. The products are also passed to a SHEF
decoder at the local RFC where they are stored in the IHFS database.
Each forecast ordinate is stored in a separate database record in
the IHFS fcstheight table. Each record in fcstheight contains a
gauge height value, a valid time which is the date and time for
the forecast ordinate, and a basis time, which is the date and time
the forecast was issued.
Observed
gauge height data is also ingested into the IHFS database. Products
containing SHEF encoded observations are processed through the SHEF
decoder and posted to the height table in the IHFS database. As
with the forecast data, each individual observation is stored as
a record in the database.
Before
we can verify a forecast, both the forecast and observed data must
be moved from the operational IHFS database to the archive database.
The archive database structure we have implemented simply mimics
the database structure of the fcstheight and height tables in the
IHFS database. Appendix XX shows the database schema for the fcstheight
and height tables. Structured Query Language (SQL) procedures are
used to copy data to the archive database. In our implementation,
we initiate the SQL script daily to populate the archive database.
The SQL script archives all forecast time series, and all hourly
gauge height observations.
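The daily archive step amounts to a straight SQL copy between identically structured tables. The sketch below uses Python's sqlite3 module as a stand-in for the operational database; the column names (the location id in particular) are assumptions based on the fcstheight description above, and the actual RFC script runs against the IHFS database, not sqlite3.

```python
# Illustrative sketch of the daily archive copy, not the actual RFC
# SQL script. Columns: location id, valid time (date/time of the
# ordinate), basis time (date/time the forecast was issued), value.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Operational table (one record per forecast ordinate) and its
# identically structured archive twin.
schema = "(lid TEXT, validtime TEXT, basistime TEXT, value REAL)"
cur.execute(f"CREATE TABLE fcstheight {schema}")
cur.execute(f"CREATE TABLE arch_fcstheight {schema}")

cur.executemany(
    "INSERT INTO fcstheight VALUES (?, ?, ?, ?)",
    [("TULO2", "2001-05-01 06:00", "2001-05-01 00:00", 21.3),
     ("TULO2", "2001-05-01 12:00", "2001-05-01 00:00", 23.8)])

# The daily script simply copies the records; a real implementation
# would need a primary key or WHERE clause to avoid duplicates.
cur.execute("INSERT INTO arch_fcstheight SELECT * FROM fcstheight")
conn.commit()

print(cur.execute("SELECT COUNT(*) FROM arch_fcstheight").fetchone()[0])  # 2
```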
Even
though we archive all hourly observations, the CFFVS examines only
the observations that have the same valid times as the forecast ordinates.
Since most of our forecasts produce six-hour ordinates at synoptic
times, six-hour observations at 0, 6, 12, and 18Z would suffice in
most cases. The intermediate observations may be useful to estimate
missing synoptic time data, but are not required if disk space is
limited. It should be noted that if an observation does not exist
in the archive database at the same time as a forecast ordinate,
no verification is attempted. Also, CFFVS does not provide any data
quality control. It is up to the RFC to ensure that bad observations
have been removed, and any interpolated stages have been entered.
ABRFC has developed a couple of utility programs to aid in data
inspection, but currently there are no user applications to correct
erroneous data in the archive database.
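The matching rule described above, verifying only where an observation exists at a forecast ordinate's valid time, can be illustrated with a simple pairing. The stage values and valid times here are hypothetical.

```python
# Illustrative pairing of forecast ordinates with archived observations
# keyed by valid time; ordinates without a matching observation are
# simply skipped, mirroring the "no verification attempted" rule.
forecasts = {            # valid time -> forecast stage (ft)
    "2001-05-01 06:00": 21.3,
    "2001-05-01 12:00": 23.8,
    "2001-05-01 18:00": 25.1,
}
observations = {         # valid time -> observed stage (ft)
    "2001-05-01 06:00": 20.9,
    "2001-05-01 12:00": 24.2,
    # 18Z observation missing: that ordinate is not verified
}

pairs = [(t, f, observations[t])
         for t, f in sorted(forecasts.items())
         if t in observations]
print(pairs)
# [('2001-05-01 06:00', 21.3, 20.9), ('2001-05-01 12:00', 23.8, 24.2)]
```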
4. Forecast Verification
The CFFVS is a derivative of the methods outlined in Dave Morris'
HYDRO-43. In order to make the process "automated and simple" the
methods were adapted to utilize time series forecasts and observations.
When the original concept was introduced in 1988, forecast and observed
data were not readily available in a format which could quickly
and easily be processed by automated methods. RFCs prepared river
forecasts in a text format, using qualitative terms to describe
the shape and magnitude of the forecast hydrograph. Commonly the
forecasts contained statements such as "crest near", "rise to",
or were bracketed between an upper and lower stage using terminology
like "crest between". Timing of the crest was also generalized,
using broad terms such as "midday" or "late evening", or even "next
Tues". Other terms were included in the forecast to describe the
shape of the hydrograph. These were usually expressed as times to
rise above or fall below critical stages, such as bankfull or flood
stage. While these statements legitimately expressed the uncertainty
in the flood prediction process, they made verification subject
to interpretation of what the forecaster was trying to convey. The
sparsity of observed data also made verification difficult to implement.
While real-time data at frequent intervals was quickly becoming
available, daily gage height readings from COOP observers provided
the only observations at many locations.
4.1 Verification Definitions
Before explaining the methodology behind CFFVS, it is important
to understand the following definitions:
Forecast: Each forecast ordinate of each forecast time series is verified
as its own forecast. If a three day forecast is issued with six
hour ordinates, 12 separate forecasts will be evaluated. If the
forecast is updated six hours later, another 12 forecasts will be
verified. Any reference to a forecast in these definitions should
be construed as an individual forecast ordinate.
Flood Category: Each individual forecast point is stratified into
flood categories ranging from action stage to flood of record. Forecasts
below action stage are not verified. Category stages must increase
as categories progress from action stage to flood of record.
Hit: If a forecast and corresponding observation at the same valid
time are in the same flood category, the forecast is credited with
a hit.
Lead Time: A categorical lead time is the number of hours from the
time of forecast issuance to the time of the forecast hit. A lead
time is only computed when, (1) the ordinate's forecast and observation
are in the same category, and (2) the previous ordinate's observation
was lower than the current category. This restricts lead time calculations
to instances where the stage is rising, and crossing categories.
Miss: Conversely, a miss is given when a forecast is in one category,
and the corresponding observation is in a different category.
Categorical Error: This is the amount the forecast would have to be changed
to reach the observed category. Categorical error is only computed
when you have a miss.
False Alarm: A false alarm is given when the forecast is above the
lowest defined category, and the observation is below the lowest
category.
No Forecast Miss: A no forecast miss is the least desirable score.
In this case, the observation is above the lowest defined category,
and there is no corresponding forecast ordinate. It can be thought
of as an unforecasted event.
Non-Flood Forecast: Any forecast that is below the lowest defined category
with an observed stage also below the lowest defined stage is a
non-flood forecast. Non-flood forecasts are not counted in the verification
process.
Number of Events: The number of events is determined by summing the
total number of hits, misses, false alarms, and no forecast misses
for a specified period.
Probability of Detection (POD): The number of categorical flood forecast
hits divided by the total number of categorical flood forecasts
observed.

POD = Hits / (Hits + Misses + No Forecast Misses)   {1.00 is best}
False Alarm Ratio (FAR): The number of categorical flood false alarms
divided by the total number of categorical flood forecasts issued.

FAR = False Alarms / (False Alarms + Hits)   {0.00 is best}
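Written as code, the two scores above are simple ratios. The counts in the example call are hypothetical.

```python
# Direct translation of the POD and FAR formulas defined above.
def pod(hits, misses, no_forecast_misses):
    """Probability of Detection: fraction of observed flood ordinates hit."""
    return hits / (hits + misses + no_forecast_misses)

def far(false_alarms, hits):
    """False Alarm Ratio: fraction of flood forecasts that did not verify."""
    return false_alarms / (false_alarms + hits)

# Hypothetical seasonal counts for one forecast point:
print(pod(hits=40, misses=8, no_forecast_misses=2))  # 0.8
print(far(false_alarms=10, hits=40))                 # 0.2
```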
4.2 Verification Example
Multiple forecasts are often issued for a particular flood event.
These forecasts may be updated every 6 to 12 hours, depending on
office policy and how the forecast is tracking. In this example
three time series forecasts were issued for this event.
Figure 1: Multiple forecast issuances
Each
forecast time series is verified with observed data. For our example
we will examine Fcst 1.
Figure 2: Example of a single forecast issuance (Fcst 1)
Each
forecast ordinate for Fcst 1 is evaluated against the observed stage
at the same valid time.
Figure 3: Example of forecast verification for Fcst 1
Each
forecast ordinate is classified as a hit, miss, no forecast miss,
or false alarm within the pertinent category. Lead times are calculated
for hits where the previous observation was below the current category,
and categorical error is calculated for misses (Table 2). The verification
procedure is repeated for Fcst 2 and Fcst 3, and the results are tallied
for each category. Figure 4 contains a generalized process flow.
Table 2: Tabulation of verification results for forecast example.

Period | Forecast Category | Observed Category | Categorical Result | Comment
-------+-------------------+-------------------+--------------------+--------------------------
1      | No forecast       | Minor             | Minor No Fcst Miss | Unforecast flood
2      | Minor             | Minor             | Minor Hit          | No Lead Time Computation
3      | Moderate          | Moderate          | Moderate Hit       | Calculate Lead Time
4      | Moderate          | Major             | Major Miss         | Calculate Cat. Error
5      | Moderate          | Moderate          | Moderate Hit       | No Lead Time Computation
6      | Minor             | Moderate          | Moderate Miss      | Calculate Cat. Error
7      | Minor             | Minor             | Minor Hit          | No Lead Time Computation
8      | Minor             | Minor             | Minor Hit          | No Lead Time Computation
9      | Minor             | No Flood          | Minor False Alarm  |
10     | No Flood          | No Flood          | No Flood           | No Verification
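The per-ordinate logic illustrated in Table 2 can be sketched as follows. The thresholds and stage values are hypothetical, and the convention for categorical error (distance from the forecast stage to the nearest edge of the observed category's stage band) is an assumption, not the exact CFFVS arithmetic.

```python
# Illustrative per-ordinate classification following the 4.1 definitions.
THRESHOLDS = [("minor", 20.0), ("moderate", 24.0), ("major", 28.0)]

def category(stage):
    """Index of the highest category reached, or None if below all of them."""
    cat = None
    for i, (_, level) in enumerate(THRESHOLDS):
        if stage >= level:
            cat = i
    return cat

def verify_ordinate(fcst, obs, prev_obs, lead_hours):
    """Classify one ordinate; returns (result, lead time or cat. error)."""
    obs_cat = category(obs)
    if fcst is None:
        # Observation in a flood category with no forecast ordinate.
        return ("no_forecast_miss", None) if obs_cat is not None \
            else ("non_flood", None)
    fcst_cat = category(fcst)
    if fcst_cat is None and obs_cat is None:
        return ("non_flood", None)          # not counted in verification
    if fcst_cat is not None and obs_cat is None:
        return ("false_alarm", None)
    if fcst_cat == obs_cat:
        # Lead time only when the previous observation was in a lower
        # category, i.e. the river is rising and crossing categories.
        prev_cat = category(prev_obs) if prev_obs is not None else None
        rising = prev_cat is None or prev_cat < obs_cat
        return ("hit", lead_hours if rising else None)
    # Miss: categorical error is the stage change needed to reach the
    # observed category band (assumed convention; see lead-in above).
    lower = THRESHOLDS[obs_cat][1]
    upper = THRESHOLDS[obs_cat + 1][1] if obs_cat + 1 < len(THRESHOLDS) \
        else float("inf")
    err = lower - fcst if fcst < lower else fcst - upper
    return ("miss", round(err, 2))

# Period 3 of Table 2: moderate hit while rising from the minor category.
print(verify_ordinate(fcst=25.0, obs=24.6, prev_obs=22.0, lead_hours=18))
# ('hit', 18)
# Period 4: moderate forecast against a major observation.
print(verify_ordinate(fcst=26.5, obs=28.4, prev_obs=24.6, lead_hours=24))
# ('miss', 1.5)
```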
Much
thought was put into the lead time computation. After several iterations,
we arrived at the current method which seems to provide a realistic
value of lead time. The requirement that the previous ordinate's
observation fall in a lower category isolates the cases where the
river has changed categories, and is on the rising limb. This limits
the number of cases where lead time is calculated, but provides
an accurate depiction of categorical lead time.