MW&A

Hydroclimatology and Solar Explorations

Attribution Detection of Confirmation Bias Correction

I think that both the growing use of large author cohorts and the growing use of jargon can obscure problems and challenges in the hydroclimatologic sciences.  In this example, I’ve simply worked informally to verify the Western US runoff reporting by the cluster of authors of [1] (GOC).  Each of the terms in my title had to be combed through, starting with this reference, if I remember correctly.  My focus is consistently on reproduction of records and reproducibility of process, along with skill assessment.  The jargon was a big part of my initial barrier to entry.  The overall moral, I think, is to stay focused on the reproducibility of the records (the records of the skill of the forecast), regardless of the favored terms.

When I first read about bias correction in the context of [1], I initially thought that the authors’ intent was to correct for measured bias in their model results.  But once I recognized the full extent of the modeling span, I realized that bias correction, although an interesting point that is often equated with jargon, was not a suitable term for their exercise.

The reference trails took me through mysterious topics, including attribution detection, so I also took another look at actual data to confirm whether they detected the appropriate attribution.  In fact, I think attribution detection of confirmation bias correction is a good example of how, sometimes, truth about jargon is stranger than fiction.  Yes, that is a joke about jargon.

In my featured image excerpt from [1], one sees the GOC authors’ simulation of past flows of the Colorado River, along with their future projections starting at about 2001.  I’ve added overlays of actual data from the Animas River and the Bear Mountain stream near the Utah-Wyoming border of the upper watershed network of the Colorado River.  The data seem comparable over the gray zone of their training period (their period of history matching).  Once past the publication date, everything beyond appears to be more mystery-matching than history matching.
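The overlay comparison above amounts to reducing a monthly gauge record to the April–July seasonal means that the figure plots, so observed values can be placed against the simulated traces.  A minimal sketch of that reduction follows; the input layout (a dictionary keyed by year and month) is my own assumption, and the values are fabricated for illustration.

```python
# Sketch: reduce monthly gauge records to April-July mean runoff per year,
# the seasonal quantity plotted in the figure, so observations can be
# overlaid on simulated traces.  The data layout here is an assumption.

def apr_jul_mean(monthly):
    """monthly: dict mapping (year, month) -> runoff (inches)."""
    years = sorted({y for (y, m) in monthly})
    return {y: sum(monthly[(y, m)] for m in range(4, 8)) / 4.0
            for y in years
            if all((y, m) in monthly for m in range(4, 8))}

# Fabricated example year, for illustration only.
obs = {(1950, m): v for m, v in zip(range(1, 13),
       [0.5, 0.6, 1.0, 2.0, 3.0, 2.5, 1.5, 0.8, 0.6, 0.5, 0.5, 0.5])}
print(apr_jul_mean(obs))  # {1950: 2.25}
```

With the same reduction applied to both the observed gauge series and each simulated trace, a relative goodness-of-fit assessment becomes straightforward.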

I also found an actual error, in addition to the poor skill so far.  The email excerpt which follows captures some of the latest status of that error.  I reported the error to Dr. Cayan, an author of [1], a few years ago.  I’m still hoping for some attribution.  The error appears important, but the continued public messaging for content premiered in [1] appears to maintain high confidence in its alarming climate projections.  Dr. Cayan, I appreciate that you acknowledged the error below, and I encourage you to bring this forward through the quality control component of the enterprise which published [1].

References

[1] Cayan, D., M. Tyree, K. E. Kunkel, C. Castro, A. Gershunov, J. Barsugli, A. J. Ray, J. Overpeck, M. Anderson, J. Russell, B. Rajagopalan, I. Rangwala, and P. Duffy. 2013. “Future Climate: Projected Average.” In Assessment of Climate Change in the Southwest United States: A Report Prepared for the National Climate Assessment, edited by G. Garfin, A. Jardine, R. Merideth, M. Black, and S. LeRoy, 101–125. A report by the Southwest Climate Alliance. Washington, DC: Island Press.

[2] Personal communication with an author of [1], reproduced below.

From: Daniel Cayan <dcayan@ucsd.edu>
Sent: Saturday, June 30, 2018 11:30:37 AM
To: Michael Gary Wallace
Subject: Re: I’m trying to trace some data in a work product of yours

hi Michael     we have chased this farther      Dave is putting a description together–will send nxt wk

rgrds Dan

On Sat, Jun 30, 2018 at 9:36 AM, Michael Gary Wallace <mwa@unm.edu> wrote:

Hi Daniel, I haven’t heard anything and so am checking in.  I know that this is an important concern, if only because I served as the lead technical checker for …… I’m curious to know if there is any kind of a plan to resolve the irreproducibility, and in the meantime to inform the numerous downstream customers …

If you have already resolved, I’m very interested to learn that as well.

Thanks in advance,

Mike Wallace

From: Michael Gary Wallace
Sent: Monday, June 11, 2018 9:10 AM

To: Daniel Cayan
Subject: Re: I’m trying to trace some data in a work product of yours

Many Thanks Daniel,

.. If I happen to run across something that might be helpful I’ll also share.

..  Your feedback is also very welcome news to me since it demonstrates that I may not be crazy after all at least on this topic.

From: Daniel Cayan <dcayan@ucsd.edu>
Sent: Sunday, June 10, 2018 5:35:02 PM
To: Michael Gary Wallace
Subject: Re: I’m trying to trace some data in a work product of yours

hi Michael      i believe you have identified a problem, but awaiting some data–my former programmer responded last wkend and said she would search for files, but nothing back from her yet.

in meantime, Dave Pierce accessed the GDO VIC basin data, which we believe is the source of fig 6.10, and tried to replicate the Rio Grande projections on fig 6.10 — his result didn’t match, and i am concerned that fig 6.10 might be wrong, though the model ensemble membership that Dave included also may not match our original.   Dave’s note is pasted below and a couple of graphics are attached.   i know your focus is Colorado Basin, so this is only a clue.

more as soon as it emerges, sorry this is painfully slow

Dan

……………

from Dave Pierce, 5 days ago–

Hi Dan,

I poked around a bit regarding that email you mentioned to me querying about the Southwest chapter of the national climate assessment Figure 6.10, which shows Apr 1 SWE, Apr – Jul runoff, and Jun 1 soil moisture for a bunch of B1 and A2 models. To that end, I downloaded the Rio Grande data from the USBR green data oasis CMIP3 VIC runs. These are downscaled with BCSD.

I was not able to match Figure 6.10 with what I downloaded and how I processed the data. My version of the plot, and the original panel from Fig. 6.10, are attached. The mismatch includes an overall different scaling as well as a different temporal pattern. There do seem to be similarities in the temporal patterns (I’m looking at the peaks in my version vs. Fig 6.10), so I think Fig 6.10 was calculated with something similar to the data I downloaded, but not exactly the same thing.

I’m not sure what to say about the overall scaling mismatch. Perhaps there was a mistake in the units conversion? Assuming the big peak at year 2031 seen in both my plot and the original Figure 6.10 is the same thing, which I think is extremely likely, then the difference in scaling between the two figures is such that my values are about 1.39 times larger than the values shown in Fig 6.10, or equivalently the values in Fig 6.10 are 0.72 times the values i see. Neither number is bringing to mind any particular value for units conversion.

Another thing that you notice on my plot vs. Fig 6.10 is that my plot has more orange (sres A2) peaks interspersed between the green peaks. I find this particularly noticeable between years  2040-2055, and 2070-2090. This suggests to me that there is one or more A2 model that I downloaded and am including that was not included in Fig. 6.10.

One odd difference between the two plots is that Fig 6.10 clearly shows some individual future traces falling to about zero value, which I do not see in my plot. In Fig 6.10, the overall decline in the A2 multi-model ensemble average runoff value is, I would estimate, about 50% by the end of the century (heavy orange line). The drop I calculate is nowhere near that much. It’s almost as if Fig. 6.10 somehow has its zero value shifted.

Anyways, for what it’s worth,

–Dave

 

On Sun, Jun 10, 2018 at 1:29 PM, Michael Gary Wallace <mwa@unm.edu> wrote:

Can you update me if you can?  I am under some pressure in a paper in review to explain or somehow account for your method, which I compare to mine.  The reviewer wants to know if the poor VIC skill I identified for a stream in my Western US study area is typical or atypical.

My paper is nominally accepted so I don’t think the answer I provide will impact its publication, but I’d like to be responsive and accurate.

Mike Wallace

From: Michael Gary Wallace
Sent: Saturday, June 2, 2018 1:34 PM

To: Daniel Cayan
Subject: Re: I’m trying to trace some data in a work product of yours

Hi Dan,

I mean no offense, and I’m certainly not perfect, but when it comes to reproducible data I always try to be slavish.  I’m moving forward with a preliminary conclusion that your resource cannot be reproduced.  I can certainly post a correction at a later date.

In addition to more reproducible data, I think it would be a good improvement to transparency for your group to find a way to overlay the observed runoff for the same time series, and to issue a corrected report which shows such overlays for all time series you feature.  Anyone can then develop a relative assessment of the goodness of fit.  That data also would need to be in a table(s) so more quantitative performance measures could be readily applied by outsiders.

That added information might help others such as me to best interpret the bias correction methodology which is hard to understand without the observations.  I don’t know the difference now between bias-correction and re-initialization, both of which come up at times associated with CMIP products.

Mike W.

 

From: Michael Gary Wallace
Sent: Sunday, May 27, 2018 10:23:13 AM
To: Daniel Cayan
Subject: Re: I’m trying to trace some data in a work product of yours

Ok thanks Dan.  If you have also moved on to another publication which points to similar themed time series, I’m happy to start from there if you can point me to the document(s).

Otherwise, I’ll wait to hear back from you later this week.  I’ve dug around a few times through similar file sets and perhaps one possibility to address my question on that specific run (sresa1b.bccr_bcm2_0.1.monthly.runoff.1950.nc)  is that only some of the 112 historical sets were included in the chart.  Then to know which sets were used would be a great help.   I can’t find a description in your report which clarifies this but again I could have overlooked an appendix or other resource.

From: Daniel Cayan <dcayan@ucsd.edu>
Sent: Sunday, May 27, 2018 10:03:57 AM
To: Michael Gary Wallace
Subject: Re: I’m trying to trace some data in a work product of yours

 

hi Mike      we’ve moved on from SRES model runs and the programmer who produced most of this has retired, so i’m not sure how easily recovered this is;    back to you later this wk    Dan

 

On Sun, May 27, 2018 at 8:05 AM, Michael Gary Wallace <mwa@unm.edu> wrote:

Hello Dr. Cayan,

I’m working to compare some of my streamflow work to yours and am hoping to save some time with a very specific thread.  If I can confirm that thread it would be minimally helpful.  However, in the hopes that you have an Excel or .csv table handy, or a summary .nc file, I’m making my first request in that context.

 

I am seeking to confirm some data contained in the document:

Cayan, D., M. Tyree, K. E. Kunkel, C. Castro, A. Gershunov, J. Barsugli, A. J. Ray, J. Overpeck, M. Anderson, J. Russell, B. Rajagopalan, I. Rangwala, and P. Duffy. 2013. “Future Climate: Projected Average.” In Assessment of Climate Change in the Southwest United States: A Report Prepared for the National Climate Assessment, edited by G. Garfin, A. Jardine, R. Merideth, M. Black, and S. LeRoy, 101–125. A report by the Southwest Climate Alliance. Washington, DC: Island Press.

 

You are listed as the contact person for Chapter 6 where my interests are currently focused.  In particular I am interested in the center chart of Figure 6-10 for Apr-Jul runoff in inches from 1950 through the present and beyond to 2100 for the Colorado Basin.

 

I’m interested particularly in the gray “individual historical simulations”.  I would most like to obtain a table listing for each [April through July] period of each year, the model run ID and the runoff result. I assume each runoff result is a four month (April through July) average over the watershed you defined and that this average value is in units of inches.

 

This would among other things allow me to quickly verify broad items including the apparent median calculation which you indicate by the black curve.  I’m not certain what the flat black line is and would also like to know that.

 

I’ve started to crunch through this material by accessing the .nc files, but I’ve already hit a snag or a discrepancy and so I’m motivated to communicate with you and ideally to obtain the table I just requested above.

 

But in any case, I did download a file from your first historical hindcast run for 1950.  The file is:

sresa1b.bccr_bcm2_0.1.monthly.runoff.1950.nc

 

From that file I extracted the runoff values for the months of April through July.

I calculated the average runoff value to be 7.13.

I can’t find any value that high in the gray series for this year 1950 for Colorado case from the center chart of Figure 6-10.

 

If I understand, the chart includes averages for the April through July runoff for each run case and so that value I calculated should be shown by a point along a gray line.  The fact that the chart only extends up to 4 inches ensures that the value I tried to confirm cannot be found.  This suggests to me that I’m doing something wrong or that a mistake or oversight was made by others in development of the chart.

 

I would prefer to obtain the comprehensive table I requested at first.  There must be such a table or the chart could not have been produced.  It would save much time for me as well.    If you also can address the discrepancy I am interested to know how I can avoid problems on my part.

 

Thanks in advance,

Mike Wallace
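The check described in the last email above, extracting April–July runoff from one year of a monthly .nc file and averaging it over the watershed, can be sketched roughly as follows.  The array shapes, the basin mask, and the grids are my own assumptions (fabricated for illustration); a real run would read the fields from a file such as the sresa1b.bccr_bcm2_0.1.monthly.runoff.1950.nc file named above.

```python
# Sketch of the extraction step: given one year of monthly runoff fields,
# average the April-July months over the watershed cells.  The shapes and
# the basin mask are assumptions; the values below are fabricated.
import numpy as np

def apr_jul_basin_mean(runoff, basin_mask):
    """runoff: (12, ny, nx) monthly fields; basin_mask: (ny, nx) bool."""
    apr_jul = runoff[3:7]                 # months 4..7 (0-indexed slice)
    return float(apr_jul[:, basin_mask].mean())

runoff = np.ones((12, 3, 3))
runoff[3:7] *= 7.13                       # fabricated Apr-Jul values
mask = np.ones((3, 3), dtype=bool)        # fabricated basin mask
print(apr_jul_basin_mean(runoff, mask))   # approximately 7.13
```

If the chart’s gray traces are indeed such per-run seasonal averages, a value computed this way should appear as a point on one of them, which is the discrepancy the email raises.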

 

 

 
