The most important thing to keep in mind is that you can't directly compare scores between different regions and/or different times of year! A score close to 100 only means that, for the region and time of year you're looking at, that year outperformed most of the other years since 1951. If you're looking at March in North Dakota, a score of 100 may mean that there were two fluke severe weather days, while none of the other years had any at all. If you're looking at May in Kansas, a score of 25 may mean that there were "only" four decent chase days instead of 15-20 in the best years.
For a given region and time of year, scores are computed based on seven metrics:

- total tornado count
- count of tornadoes with path length >4 mi.
- count of tornadoes with path length >15 mi.
- number of days with at least one tornado
- number of days with a >4 mi. path tornado
- number of days with a >15 mi. path tornado
- number of days with giant hail reports
Notice that there are three tornado-count metrics. Each is scored in two ways: (1) the total tornado count over the period, and (2) the number of days containing a tornado over the period. This is a crude method for giving more weight to a month with, e.g., 30 tornadoes spread out over 10 days than a month that was quiet except for a single 30-tornado outbreak day. My rationale is that, as a chaser, you can only be on so many storms in one day. A prolific outbreak like April 14, 2012, or May 23, 2008, has a huge impact on tornado count for the month (or even year), but you can only see a few of those tornadoes (if you're lucky)! On the other hand, a month like May 2010 -- with few outbreaks, but endless localized setups -- gives you a reasonable chance to see numerous tornadoes and enjoy many good chases.
To be clear: a long-track tornado is counted in all of the tornado metrics, not just the most stringent category it meets. If Oklahoma has one tornado in January and it's on the ground for 30 miles, each of the three total-count metrics will be assigned a value of 1. This means that tornadoes with path lengths over 4 miles, and particularly over 15 miles, carry far more weight than short-track tornadoes.
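The counting rule above can be sketched as follows. Everything here is illustrative: the report list, dates, and function name are my own, not the actual dataset or code behind the scores.

```python
from datetime import date

# Hypothetical reports as (date, path_length_mi) pairs; a single
# long-track tornado counts toward every threshold it exceeds.
reports = [
    (date(2011, 1, 15), 30.0),   # one 30-mile January tornado
]

def tornado_metrics(reports, thresholds=(0.0, 4.0, 15.0)):
    """Return (total count, day count) per path-length threshold."""
    counts = {t: 0 for t in thresholds}
    days = {t: set() for t in thresholds}
    for day, path_mi in reports:
        for t in thresholds:
            if path_mi > t:
                counts[t] += 1
                days[t].add(day)
    return counts, {t: len(d) for t, d in days.items()}

counts, day_counts = tornado_metrics(reports)
# The lone 30-mile tornado registers a 1 in all three total-count
# metrics and all three day-count metrics.
```

Note how the day counts use a set of dates, which is what separates a 30-tornado outbreak day (one chaseable day) from 30 tornadoes spread over 10 days.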
Why include days with giant hail reports as a chasing metric? My intent was to give some "credit" to days with intense supercells, since many chasers enjoy storm structure and/or large hail. As you'll see below, the weighting for this metric is relatively small. Ultimately, a month with lots of nontornadic storm days should score higher than a month dominated by Gulf-wiping cold fronts or a death ridge, and the hail metric was my way of trying to ensure that.
Without a doubt, people will have reasonable objections to the metrics I've chosen. Unfortunately, they're probably close to the best I can do with the dataset we have. I would love to substitute tornado duration for track length, but duration is not recorded in this dataset (nor is it even reasonably known for most tornadoes prior to the advent of WSR-88D). Some may suggest that F/EF rating be used as a metric. I can see some arguments for that, but I find the cons too costly: namely, it would tend to "reward" tornadoes striking populated areas that are less desirable to chase (e.g., the I-35 corridor from DFW to Wichita). If you have other ideas, I'd love to hear them!
Right now, the seven metrics are weighted as follows:
| Metric | Weight |
| --- | --- |
| # tor >4 mi. | 10% |
| # tor >15 mi. | 5% |
| days tor >4 mi. | 25% |
| days tor >15 mi. | 20% |
Note that the "day counts" are given much more weight than the "total counts." Thus, a huge single-day outbreak is strongly inhibited from inflating a score on its own, provided there are other years with activity spread out more evenly.
So, how exactly are the scores computed after accounting for the weights? For each metric, the maximum and minimum values over 1951-present are determined, and each year's value is normalized over that range. Consider the category "total tornado count" for Kansas in May. Suppose hypothetically that the most active year had 200 tornadoes, while the least active had 20. Then each year's score for that category would be computed as: 100 * [(tornado_count - 20) / (200 - 20)]. The best year(s) receive a 100 for that category (provided not every year is zero), the worst year(s) receive a 0, and the remaining years receive a score proportional to their placement in the range [20, 200]. So a year with 110 tornadoes, exactly halfway between the minimum (20) and maximum (200), would receive a 50 for the total tornado count category (which carries a 5% weight in the total score).
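That min-max normalization can be sketched in a few lines, using the hypothetical Kansas-in-May numbers above (the function name and the 5% weighting example are mine, not the actual scoring code):

```python
def category_score(value, all_years_values):
    """Min-max normalize one year's metric value to a 0-100 score
    over every year in the 1951-present record."""
    lo, hi = min(all_years_values), max(all_years_values)
    if hi == lo:               # no spread (e.g. every year is zero)
        return 0.0
    return 100.0 * (value - lo) / (hi - lo)

# Hypothetical Kansas-in-May totals: least active year 20, most active 200.
counts = [20, 110, 200]
scores = [category_score(c, counts) for c in counts]   # 0, 50, 100

# Each category score then contributes its weight to the year's total,
# e.g. total tornado count at 5%:
total_contribution = 0.05 * category_score(110, counts)
```

A year's final score is simply the weighted sum of its seven category scores.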
Scores have been computed for several regions and states within the Plains. This is a first stab at a scoring system, so there's plenty of room for improvement. In states where only parts are chaseable (NM, CO, WY, MT due to mountains; TX, OK, IA due to forests and hills), lines of latitude or longitude are used to remove activity in the unchaseable portion. More precise delineations could be made in future revisions. Also, I completely omitted states like MN and MO where only a sliver is chaseable, so they could be added in later.
On each chart, the states included in the scoring are listed on the right-hand side. If an included state has portions removed, the lat/lon criteria for the area included in the scoring are specified.
I've arbitrarily required that tornado reports fall between 11am and 9pm LST, which is equivalent to noon-10pm LDT during Daylight Saving Time. This is also quite crude; some well-after-dark reports are no doubt included in the early spring and fall, while a few "dusk" reports from the northern Plains in the summer may be unnecessarily removed. This is another aspect to be refined in the future when I have more time.
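The time-of-day filter might look something like this, assuming report times have already been converted to LST (the function name and the inclusive treatment of the bounds are my assumptions):

```python
from datetime import time

def in_chase_window(report_time_lst, start=time(11, 0), end=time(21, 0)):
    """Keep only reports between 11am and 9pm LST, which is
    noon-10pm local daylight time while DST is in effect."""
    return start <= report_time_lst <= end
```

Reports outside the window are simply dropped before any metric is counted.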
To account for report inflation over the past several decades, a linear regression was performed on scores for each of the seven criteria for the entire Plains over March-June. A significant upward trend with time was found for three metrics: (1) total tornado count, (2) total >=4 mi. path tornado count, and (3) days with significant hail. The other metrics are too sparse (i.e., the events are too rare) to identify a stable trend over a 60-year dataset; plus, longer-track tornadoes were more likely to be noticed even in the old days, so inflation should not be too big a problem.
For the three criteria with an identifiable, stable trend, the linear regression equations were noted. The trendlines from these equations were applied to all regions and times of year, despite being valid specifically for the Plains over the spring season. It is not feasible to calculate linear regressions for smaller spatiotemporal windows, because again, the data becomes too sparse.
However, the inflation-normalization was relaxed by 50%, so calculating a new linear regression on the final scores presented here would still show modest inflation over time. Why did I do this? Because, unfortunately, there are tradeoffs that must be made when choosing how to account for inflation. A raw linear regression suggests, for example, that we should almost double the tornado count from 1951 in order to make direct comparisons with the count from 2010. However, there are several years in the 1950s and 1960s with tornado counts rivaling the most active recent years. Is it likely that such years (see 1965, for example) were twice as active as, say, 2007? Probably not; again, the more active and significant an event or period, the more likely it would be well-reported even in the distant past. (One could argue that this is only justification for relaxing the trendlines if most of the active early years owed mainly to big tornadoes and big outbreaks, rather than incessant mediocre events; I've decided to assume this is the case). My suspicion is that the bottom-ranked early years weren't quite as bad as their scores indicate, but scores for the higher-ranked early years are fairly realistic.
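One way the 50%-relaxed correction could be implemented is to apply only half of the regression-implied scaling. Every number below (slope, intercept, baseline year) is a made-up illustration, not the article's actual fitted trend:

```python
def adjust_for_inflation(count, year, slope, intercept,
                         base_year=2010, relax=0.5):
    """Scale an early year's count toward the modern baseline, but only
    partway: relax=0.5 applies half of the regression-implied inflation.
    slope/intercept would come from a linear fit of the metric vs. year."""
    expected_then = slope * year + intercept
    expected_now = slope * base_year + intercept
    full_factor = expected_now / expected_then      # full trendline correction
    relaxed_factor = 1.0 + relax * (full_factor - 1.0)
    return count * relaxed_factor

# With this made-up trend, a 1951 count of 100 is scaled up by only half
# of the roughly 2x the raw regression would suggest:
adjusted_1951 = adjust_for_inflation(100, 1951, slope=1.0, intercept=-1901)
```

Setting relax=1.0 would recover the full trendline correction; relax=0.0 would leave the raw counts untouched.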
In the end, I decided to play it safe and use a middle-of-the-road approach to accounting for inflation. It's possible that early years in the dataset deserve to be scored even higher, in some cases, but choosing a formula that would fill the top 5 rankings with early years seemed suspect to me.