WO2017083568A1

WO2017083568A1 - Estimating or forecasting health condition prevalence in a definable area and associated costs and return on investment of interventions

Info

Publication number: WO2017083568A1
Application number: PCT/US2016/061407
Authority: WO
Inventors: James Alexander PHILP; Brian Margulies STEELE; Frederick L. SPATARO; Taurean Jerome WEBER; Grant MacDonald FRAME; Jordan Scott LARSON
Original assignee: Upstream Health Systems, Inc.
Priority date: 2015-11-13
Filing date: 2016-11-10
Publication date: 2017-05-18

Abstract

A system provides estimates and forecasts for prevalence and associated cost in a population per user selectable criteria including user selected characteristics, conditions or traits, and a user definable geographic area. The system employs data from data sets of different granularities, for example a first data set that represents data at a first geographic level (e.g., census blocks) and a second data set that represents data at a second geographic level (e.g., county, parish). The system can generate a third or static data set, searchable by a given key (e.g., census block group code). Data can be related to health conditions (e.g., type 2 diabetes, lung cancer), and can include demographic and/or non-demographic data, as well as representing immutable and/or mutable characteristics or traits. This advantageously allows automated or autonomous analysis across non-homogeneous data sets.

Description

ESTIMATING OR FORECASTING HEALTH CONDITION PREVALENCE IN A DEFINABLE AREA AND ASSOCIATED COSTS AND RETURN ON INVESTMENT OF INTERVENTIONS

BACKGROUND

Technical Field

The present disclosure generally relates to systems, methods and articles that automate estimations.

Description of the Related Art

There are numerous collections of data (i.e., data sets) that represent physical characteristics, conditions or traits of people and/or objects. The data sets typically represent a sampling of a subset of a general population, the sampling being representative of physical characteristics, conditions or traits of the sampled population at the time of the sampling. One example of a sampling is the United States Census, in which data is collected from a wide sampling of the general population of the United States. The smallest unit at which the collected data is typically represented is referred to as a "census block" or "census block group." Census blocks are identified by the U.S. Census Bureau as statistical areas bounded by visible features such as roads, streams, and railroad tracks, and by nonvisible boundaries such as property lines, city, township, school district, and county limits, and short line-of-sight extensions of roads. In urban areas, a census block can comprise a city block, while in rural areas a census block is often an irregular area bounded by natural features (e.g., streams) and roads. Census blocks are not delineated by population, and some may even have no inhabitants. There were 11,155,486 census blocks in the 2010 U.S. Census.

The data sets are typically stored in or on nontransitory computer- or processor-readable media, allowing various statistical analyses to be performed via programmed computers or processors (e.g., digital microprocessors, digital signal processors, graphics processing units). A collection of people or objects can be referred to as a population. A population can be spatially distributed, for example across various geographic regions or sub-regions. Populations may be divided into various sub-populations based on some defining characteristic, condition or trait.

Various approaches can be employed in estimating the prevalence of a certain physical characteristic, condition or trait in a population or sub-population, the estimate representing a number or percentage of the population or sub-population estimated to have or possess the characteristic, condition or trait at the time of the sampling. Various approaches can likewise be employed in forecasting the prevalence of a certain characteristic, condition or trait in a population or sub-population, the forecast representing a number or percentage of the population or sub-population forecasted to have or possess the characteristic, condition or trait at a time other than the time of the sampling (e.g., in the future).

Two or more data sets often have levels of granularity that differ from one another, for example covering different populations or sub-populations, or different geographic regions. Despite the different granularities, at least some of the information or data from one of the data sets may be of interest or potentially relevant to information from another one of the data sets. For example, there may be a known or even a suspected correlation between a first physical characteristic, condition or trait represented in a first data set and a second physical characteristic, condition or trait represented in a second data set.

The ability to automatically perform analysis using data from "heterogeneous" data sets is highly desirable.

BRIEF SUMMARY

Various apparatus, methods and articles described herein allow or facilitate analysis using data from "heterogeneous" data sets. Such analysis can be implemented via the provision of services in a hosted service or software as a service (SaaS) networked environment. Alternatively, it can be implemented on end-user-operated processor-based devices, e.g., desktop computers, workstation computers, laptop computers, tablet computers, etc. Data sets can be stored locally on, or remotely from, a processor-based device. A system can, for example, generate a "unified" third data set from two or more heterogeneous data sets, or implement a federated data set or database from two or more heterogeneous and autonomous data sets or databases.

A method of operation in a system that comprises at least one processor and at least one non-transitory processor-readable medium communicatively coupled to the at least one processor and that stores at least one of processor-executable instructions or data may be summarized as including: accessing a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region (e.g., Census block group); accessing a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region (e.g., county), where the second type of geographic region is different from the first type of geographic region for at least one instance of a pair of geographic regions of the first and second types; and generating a third (e.g., static) data set, by the at least one processor, the third data set searchable by a geographic region key (e.g., Census block group code) that corresponds to a geographic region identifier that uniquely identifies the geographic regions of the first type of geographic region (e.g., Census block group), the data set including data regarding at least one demographic characteristic and at least one health-related characteristic representative of a population associated with the respective geographic region, where the at least one health-related characteristic is different from the at least one demographic characteristic. The characteristics, conditions or traits can include demographic characteristics, conditions or traits, non-demographic characteristics, conditions or traits, immutable characteristics, conditions or traits, and/or mutable characteristics, conditions or traits.
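The keyed third data set can be pictured with a short Python sketch. It assumes the 12-digit Census block group GEOID layout (2-digit state FIPS, 3-digit county FIPS, 6-digit tract, 1-digit block group), so the county enclosing each block group is given by the first five digits of its key. The record fields, function names, and values are hypothetical. For simplicity, the county-level health value is copied directly down to each block group; the summary contemplates a fitted predictive model in place of this direct copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockGroupRecord:
    """One row of a hypothetical third data set, keyed by block group GEOID."""
    geoid: str                  # 12-digit block group code (the search key)
    population: int             # demographic characteristic (first data set)
    median_age: float           # demographic characteristic (first data set)
    county_obesity_rate: float  # health-related characteristic (second data set)


def county_key(block_group_geoid: str) -> str:
    """Return the 5-digit state+county FIPS prefix of a 12-digit block group GEOID."""
    if len(block_group_geoid) != 12 or not block_group_geoid.isdigit():
        raise ValueError(f"not a 12-digit block group GEOID: {block_group_geoid!r}")
    return block_group_geoid[:5]


def build_third_data_set(demographics, county_health):
    """Join block-group demographics with county-level health data.

    demographics:  dict of block group GEOID -> (population, median_age)
    county_health: dict of 5-digit county FIPS code -> obesity rate
    Returns a dict keyed by block group GEOID, so records are found by key lookup.
    """
    third = {}
    for geoid, (population, median_age) in demographics.items():
        county = county_key(geoid)
        if county not in county_health:
            continue  # no health data for the enclosing county; skip rather than guess
        third[geoid] = BlockGroupRecord(geoid, population, median_age, county_health[county])
    return third


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    demographics = {
        "530330001001": (1200, 38.5),
        "530330001002": (950, 41.0),
        "410510001001": (800, 35.0),
    }
    third = build_third_data_set(demographics, {"53033": 0.24})
    print(third["530330001002"])
```

Using a plain dictionary keyed by the block group code keeps each lookup a single key access, which matches the "searchable by a geographic region key" property without committing to any particular database.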

Generating a third data set may include fitting a predictive model to the data of the first and the second data sets, by the at least one processor. Fitting a predictive model to the data of the first and the second data sets may include performing a binomial regression on the data of the first and the second data sets, by the at least one processor. The geographic regions of the second type may encompass respective sets of the geographic regions of the first type. Accessing a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region may include accessing a first data set that comprises demographic data for a respective population of each of a plurality of U.S. census block-groups. Accessing a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region may include accessing a second data set that comprises health-related data for a respective population of each of a plurality of counties. Accessing a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region may include accessing a first data set that comprises data regarding at least one immutable characteristic. Accessing a first data set that comprises data regarding at least one immutable characteristic may include accessing a first data set that comprises data regarding at least one of: a gender, an ethnicity, or an age class for individuals in the respective population of each of a plurality of geographic regions of a first type of geographic region. 
Accessing a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region may include accessing a second data set that comprises data regarding at least one non-demographic mutable characteristic. Accessing a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region may include accessing a second data set that comprises data regarding at least one mutable characteristic. The data regarding at least one mutable characteristic may include data regarding at least one of: an obesity level or an activity level for individuals in the respective population of the geographic region of the second type of geographic region. Accessing a second data set that comprises data regarding at least one mutable characteristic may include accessing a second data set that comprises data regarding at least one of: a percentage of the respective geographic area of the second type of geographic area that is rural, a total population of the respective geographic area of the second type of geographic area, or a total number of individuals by age class that have a defined condition. The method may further include: determining a bias correction for the data of the third data set, by the at least one processor. The method may further include: determining at least one confidence interval for the data of the third data set, by the at least one processor. The demographic data of the first data set may represent a baseline period of time, and the method may further include: receiving user input that specifies an inquiry period of time, the inquiry period of time different from the baseline period of time; and determining a prevalence of a health condition in at least a portion of the population for the inquiry period of time. 
Determining a prevalence of a health condition in at least a portion of the population for the inquiry period of time may include accumulating a prevalence over multiple sub-periods of the inquiry period. The method may further include: receiving user input that specifies a geographic area that overlaps a boundary of at least one of the geographic regions of the first type of geographic region; and determining a prevalence of a health condition in at least a portion of the population for the geographic area specified by the user input. Receiving user input may include receiving a plurality of selections of points or lines on a map that define a polygon, and the method may further include converting the user input that defines a polygon into a number of geographic region key values. The method may further include: receiving user input that specifies a geographic area that encompasses at least a portion of at least two of the geographic regions of the first type of geographic region; and determining a prevalence of a health condition in at least a portion of the population for the geographic area specified by the user input. Receiving user input may include receiving a plurality of selections of points or lines on a map that define a polygon, and the method may further include converting the user input that defines a polygon into a number of geographic region key values. The demographic data of the first data set may represent a baseline period of time, and the method may further include: receiving user input that specifies an inquiry period of time, the inquiry period of time different from the baseline period of time; and determining a rate of change in a prevalence of a health condition in at least a portion of the population for the inquiry period of time.
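The binomial regression step can be illustrated by a Newton-Raphson fit (equivalent here to iteratively reweighted least squares) of a logistic model, logit(p) = b0 + b1*x, to county-level case counts. This is a minimal sketch under stated assumptions, not the disclosed model: the single covariate x (for example, a hypothetical share of adults aged 65 or older), the logit link, and all names are illustrative. The fitted coefficients are then applied to the same covariate measured at block-group level to produce block-group prevalence estimates.

```python
import math


def fit_binomial_logit(x, cases, totals, max_iter=50, tol=1e-10):
    """Fit logit(p) = b0 + b1*x to aggregated binomial counts by Newton-Raphson.

    x:      covariate value per county (hypothetical, e.g. share aged 65+)
    cases:  number of individuals with the condition per county
    totals: population at risk per county
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # score (gradient of the log-likelihood)
        i00 = i01 = i11 = 0.0    # entries of the Fisher information matrix
        for xi, yi, ni in zip(x, cases, totals):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = yi - ni * p
            w = ni * p * (1.0 - p)
            g0 += resid
            g1 += xi * resid
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        if det <= 0.0:
            raise ValueError("singular information matrix; covariate needs variation")
        # Newton step: solve the 2x2 system I * step = g in closed form.
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            return b0, b1
    raise RuntimeError("Newton-Raphson did not converge")


def predict_prevalence(b0, b1, x_block_group):
    """Apply fitted county-level coefficients to a block-group covariate value."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x_block_group)))


if __name__ == "__main__":
    # Synthetic counties generated from known coefficients (-3, 8), for illustration.
    xs = [0.1, 0.2, 0.3, 0.4]
    totals = [1e6] * 4
    cases = [n / (1.0 + math.exp(-(-3.0 + 8.0 * xi))) for xi, n in zip(xs, totals)]
    b0, b1 = fit_binomial_logit(xs, cases, totals)
    print(b0, b1, predict_prevalence(b0, b1, 0.25))
```

Applying county-fitted coefficients to block groups is an ecological assumption: it presumes the covariate-to-prevalence relationship holds at the finer geographic level.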

A system may be summarized as including: at least one processor; and at least one non-transitory processor-readable medium communicatively coupled to the at least one processor and that stores at least one of processor-executable instructions or data, execution of which causes the at least one processor to: access a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region; access a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region, where the second type of geographic region is different from the first type of geographic region for at least one instance of a pair of geographic regions of the first and second types; and generate a third data set, the third data set searchable by a geographic region key that corresponds to a geographic region identifier that uniquely identifies the geographic regions of the first type of geographic region, the data set including data regarding at least one demographic characteristic and at least one health-related characteristic representative of a population associated with the respective geographic region, where the at least one health-related characteristic is different from the at least one demographic characteristic.

To generate a third data set, the at least one processor may fit a predictive model to the data of the first and the second data sets. To fit a predictive model to the data of the first and the second data sets, the at least one processor may perform a binomial regression on the data of the first and the second data sets. The geographic regions of the second type may encompass respective sets of the geographic regions of the first type. The geographic regions of the first type may be U.S. census block-groups. The geographic regions of the second type may be counties. The data regarding at least one demographic characteristic may include data regarding at least one immutable characteristic. The data regarding at least one immutable characteristic may include data regarding at least one of: a gender, an ethnicity, or an age class for individuals in the population or sub-population. The data regarding at least one health-related characteristic may include data regarding at least one non-demographic mutable characteristic. The data regarding at least one health-related characteristic may include data regarding at least one mutable characteristic. The data regarding at least one mutable characteristic may include data regarding at least one of: an obesity level or an activity level for individuals in the respective population of the geographic region of the second type of geographic region. The data regarding at least one mutable characteristic may include data regarding at least one of: a percentage of the respective geographic area of the second type of geographic area that is rural, a total population of the respective geographic area of the second type of geographic area, or a total number of individuals by age class that have a defined condition. The at least one processor may further determine a bias correction for the data of the third data set.
The at least one processor may further determine at least one confidence interval for the data of the third data set. The demographic data of the first data set may represent a baseline period of time, and the at least one processor may further: receive user input that specifies an inquiry period of time, the inquiry period of time different from the baseline period of time; and determine a prevalence of a health condition in at least a portion of the population for the inquiry period of time. To determine a prevalence of a health condition in at least a portion of the population for the inquiry period of time, the at least one processor may accumulate a prevalence over multiple sub-periods of the inquiry period. The at least one processor may further: receive user input that specifies a geographic area that overlaps a boundary of at least one of the geographic regions of the first type of geographic region; and determine a prevalence of a health condition in at least a portion of the population for the geographic area specified by the user input. The at least one processor may receive user input as a plurality of selections of points or lines on a map that define a polygon, and may further convert the user input that defines a polygon into a number of geographic region key values. The at least one processor may receive user input that specifies a geographic area that encompasses at least a portion of at least two of the geographic regions of the first type of geographic region, and may further determine a prevalence of a health condition in at least a portion of the population for the geographic area specified by the user input. The at least one processor may receive user input as a plurality of selections of points or lines on a map that define a polygon, and may further convert the user input that defines a polygon into a number of geographic region key values.
The demographic data of the first data set may represent a baseline period of time, and the at least one processor may further: receive user input that specifies an inquiry period of time, the inquiry period of time different from the baseline period of time; and determine a rate of change in a prevalence of a health condition in at least a portion of the population for the inquiry period of time.
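Accumulating prevalence over sub-periods of an inquiry period can be sketched as a simple year-by-year stock-and-flow update. The update rule below, in which new cases arise at an assumed annual incidence among those without the condition and existing cases leave at an assumed annual exit rate (for example, through death or out-migration), is a hypothetical illustration; the summary does not specify the accumulation rule. The average annual rate of change over the inquiry period then follows directly from the accumulated path.

```python
def project_prevalence(baseline_prevalence, annual_incidence, annual_exit_rate, years):
    """Accumulate prevalence one sub-period (year) at a time.

    Returns [p_baseline, p_year1, ..., p_yearN]. Population size is assumed
    constant (a simplifying assumption), so prevalence is updated as a proportion.
    """
    for name, value in (("baseline_prevalence", baseline_prevalence),
                        ("annual_incidence", annual_incidence),
                        ("annual_exit_rate", annual_exit_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    p = baseline_prevalence
    path = [p]
    for _ in range(years):
        # New cases among those without the condition, minus cases that leave.
        p = p + annual_incidence * (1.0 - p) - annual_exit_rate * p
        path.append(p)
    return path


def average_annual_change(path):
    """Average change in prevalence per sub-period from baseline to end of inquiry."""
    if len(path) < 2:
        raise ValueError("need at least one projected sub-period")
    return (path[-1] - path[0]) / (len(path) - 1)


if __name__ == "__main__":
    path = project_prevalence(0.10, 0.01, 0.02, years=5)  # illustrative rates only
    print(path, average_annual_change(path))
```

Under this rule prevalence drifts toward the steady state incidence / (incidence + exit rate), which gives a quick sanity check on any projected path.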

A method of operation in a system that comprises at least one processor and at least one non-transitory processor-readable medium communicatively coupled to the at least one processor and that stores at least one of processor-executable instructions or data may be summarized as including: accessing a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region (e.g., Census block group); accessing a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region (e.g., county), where the second type of geographic region is different from the first type of geographic region for at least one instance of a pair of geographic regions of the first and second types; and generating a third (e.g., static) data set, by the at least one processor, the third data set searchable by a geographic region key (e.g., Census block group code) that corresponds to a geographic region identifier that uniquely identifies the geographic regions of the first type of geographic region (e.g., Census block group), the data set including data regarding at least one demographic characteristic and at least one health-related characteristic representative of a population associated with the respective geographic region, where the at least one health-related characteristic is different from the at least one demographic characteristic. The characteristics, conditions or traits can include demographic characteristics, conditions or traits, non-demographic characteristics, conditions or traits, immutable characteristics, conditions or traits, and/or mutable characteristics, conditions or traits.

Generating a third data set may include fitting a predictive model to the data of the first and the second data sets, by the at least one processor. Fitting a predictive model to the data of the first and the second data sets may include performing a binomial regression on the data of the first and the second data sets, by the at least one processor. The geographic regions of the second type may encompass respective sets of the geographic regions of the first type. Accessing a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region may include accessing a first data set that comprises demographic data for a respective population of each of a plurality of U.S. census block-groups. Accessing a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region may include accessing a second data set that comprises health-related data for a respective population of each of a plurality of counties. Accessing a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region may include accessing a first data set that comprises data regarding at least one immutable characteristic. Accessing a first data set that comprises data regarding at least one immutable characteristic may include accessing a first data set that comprises data regarding at least one of: a gender, an ethnicity, or an age class for individuals in the respective population of each of a plurality of geographic regions of a first type of geographic region. 
Accessing a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region may include accessing a second data set that comprises data regarding at least one non-demographic mutable characteristic. Accessing a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region may include accessing a second data set that comprises data regarding at least one mutable characteristic. The data regarding at least one mutable characteristic may include data regarding at least one of: an obesity level or an activity level for individuals in the respective population of the geographic region of the second type of geographic region. Accessing a second data set that comprises data regarding at least one mutable characteristic may include accessing a second data set that comprises data regarding at least one of: a percentage of the respective geographic area of the second type of geographic area that is rural, a total population of the respective geographic area of the second type of geographic area, or a total number of individuals by age class that have a defined condition. The method may further include: determining a bias correction for the data of the third data set, by the at least one processor. The method may further include: determining at least one confidence interval for the data of the third data set, by the at least one processor. The demographic data of the first data set may represent a baseline period of time, and the method may further include: receiving user input that specifies an inquiry period of time, the inquiry period of time different from the baseline period of time; and determining an estimated prevalence of a health condition in at least a portion of the population for the inquiry period of time. 
Determining an estimated prevalence of a health condition in at least a portion of the population for the inquiry period of time may include accumulating an estimated prevalence over multiple sub-periods of the inquiry period. The method may further include: receiving user input that specifies a geographic area that overlaps a boundary of at least one of the geographic regions of the first type of geographic region; and determining an estimated prevalence of a health condition in at least a portion of the population for the geographic area specified by the user input. Receiving user input may include receiving a plurality of selections of points or lines on a map that define a polygon, and the method may further include converting the user input that defines a polygon into a number of geographic region key values. The method may further include: receiving user input that specifies a geographic area that encompasses at least a portion of at least two of the geographic regions of the first type of geographic region; and determining an estimated prevalence of a health condition in at least a portion of the population for the geographic area specified by the user input. Receiving user input may include receiving a plurality of selections of points or lines on a map that define a polygon, and the method may further include converting the user input that defines a polygon into a number of geographic region key values. The demographic data of the first data set may represent a baseline period of time, and the method may further include: receiving user input that specifies an inquiry period of time, the inquiry period of time different from the baseline period of time; and determining a rate of change in an estimated prevalence of a health condition in at least a portion of the population for the inquiry period of time.
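Bias correction and confidence intervals for the third data set can be sketched as follows. Neither technique is specified in the summary, so both choices here are assumptions. The bias correction shown is a simple ratio calibration that rescales modeled block-group case counts so that they sum to the observed county total. The interval is the Wilson score interval for a binomial proportion, which treats a block-group prevalence as if it came from n independent observations and therefore ignores model uncertainty. The function names are hypothetical.

```python
import math


def calibrate_to_county_total(block_group_cases, observed_county_cases):
    """Ratio calibration: scale modeled block-group counts to match the county total.

    block_group_cases: dict of block group key -> modeled case count
    observed_county_cases: case count reported for the enclosing county
    """
    modeled_total = sum(block_group_cases.values())
    if modeled_total <= 0:
        raise ValueError("modeled county total must be positive")
    factor = observed_county_cases / modeled_total
    return {key: cases * factor for key, cases in block_group_cases.items()}


def wilson_interval(prevalence, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    denom = 1.0 + z * z / n
    center = (prevalence + z * z / (2.0 * n)) / denom
    half = (z / denom) * math.sqrt(prevalence * (1.0 - prevalence) / n + z * z / (4.0 * n * n))
    return center - half, center + half


if __name__ == "__main__":
    adjusted = calibrate_to_county_total({"bg_a": 10.0, "bg_b": 30.0}, 80.0)
    print(adjusted, wilson_interval(0.12, 950))
```

Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] for small or extreme prevalences, which matters for sparsely populated block groups.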
A system may be summarized as including: at least one processor; and at least one non-transitory processor-readable medium communicatively coupled to the at least one processor and that stores at least one of processor-executable instructions or data, execution of which causes the at least one processor to: access a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region; access a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region, where the second type of geographic region is different from the first type of geographic region for at least one instance of a pair of geographic regions of the first and second types; and generate a third data set, the third data set searchable by a geographic region key that corresponds to a geographic region identifier that uniquely identifies the geographic regions of the first type of geographic region, the data set including data regarding at least one demographic characteristic and at least one health-related characteristic representative of a population associated with the respective geographic region, where the at least one health-related characteristic is different from the at least one demographic characteristic.

To generate a third data set, the at least one processor may fit a predictive model to the data of the first and the second data sets. To fit a predictive model to the data of the first and the second data sets, the at least one processor may perform a binomial regression on the data of the first and the second data sets. The geographic regions of the second type may encompass respective sets of the geographic regions of the first type. The geographic regions of the first type may be U.S. census block-groups. The geographic regions of the second type may be counties. The data regarding at least one demographic characteristic may include data regarding at least one immutable characteristic. The data regarding at least one immutable characteristic may include data regarding at least one of: a gender, an ethnicity, or an age class for individuals in the population or sub-population. The data regarding at least one health-related characteristic may include data regarding at least one non-demographic mutable characteristic. The data regarding at least one health-related characteristic may include data regarding at least one mutable characteristic. The data regarding at least one mutable characteristic may include data regarding at least one of: an obesity level or an activity level for individuals in the respective population of the geographic region of the second type of geographic region. The data regarding at least one mutable characteristic may include data regarding at least one of: a percentage of the respective geographic area of the second type of geographic area that is rural, a total population of the respective geographic area of the second type of geographic area, or a total number of individuals by age class that have a defined condition. The at least one processor may further determine a bias correction for the data of the third data set.
The at least one processor may further determine at least one confidence interval for the data of the third data set. The demographic data of the first data set may represent a baseline period of time, and the at least one processor may further: receive user input that specifies an inquiry period of time, the inquiry period of time different from the baseline period of time; and determine an estimated prevalence of a health condition in at least a portion of the population for the inquiry period of time. To determine an estimated prevalence of a health condition in at least a portion of the population for the inquiry period of time, the at least one processor may accumulate an estimated prevalence over multiple sub-periods of the inquiry period. The at least one processor may further: receive user input that specifies a geographic area that overlaps a boundary of at least one of the geographic regions of the first type of geographic region; and determine an estimated prevalence of a health condition in at least a portion of the population for the geographic area specified by the user input. The at least one processor may receive user input as a plurality of selections of points or lines on a map that define a polygon, and may further convert the user input that defines a polygon into a number of geographic region key values. The at least one processor may receive user input that specifies a geographic area that encompasses at least a portion of at least two of the geographic regions of the first type of geographic region, and may further determine an estimated prevalence of a health condition in at least a portion of the population for the geographic area specified by the user input. The at least one processor may receive user input as a plurality of selections of points or lines on a map that define a polygon, and may further convert the user input that defines a polygon into a number of geographic region key values.
The demographic data of the first data set may represent a baseline period of time, and the at least one processor may further: receive user input that specifies an inquiry period of time, the inquiry period of time different from the baseline period of time; and determine a rate of change in an estimated prevalence of a health condition in at least a portion of the population for the inquiry period of time.
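Converting a user-drawn polygon into a number of geographic region key values can be sketched with the even-odd (ray casting) point-in-polygon test, applied to one representative interior point per block group. Using a single interior point per block group, treating longitude/latitude as planar coordinates, and the function names are all assumptions made for illustration. A production system might instead intersect the full boundary geometries.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule: cast a ray to +x and count edge crossings.

    polygon is a list of (x, y) vertices; the closing edge is implied.
    """
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # The edge straddles the horizontal line through y (so yi != yj here).
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def polygon_to_keys(polygon, interior_points):
    """Return the sorted keys whose representative point lies inside the polygon.

    interior_points: dict of region key (e.g. block group code) -> (x, y)
    """
    return sorted(
        key for key, (x, y) in interior_points.items() if point_in_polygon(x, y, polygon)
    )


if __name__ == "__main__":
    user_polygon = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]  # map clicks
    points = {"A": (5.0, 5.0), "B": (15.0, 5.0), "C": (1.0, 9.0)}  # made-up keys
    print(polygon_to_keys(user_polygon, points))
```

Selecting by representative point assigns each block group wholly in or wholly out; block groups straddling the boundary call for the area-proportional treatment the disclosure describes for partially contained census blocks.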

A system may be summarized as including: at least one processor; and at least one processor-readable medium that stores at least one of processor executable instructions or data, which when executed by the at least one processor cause the at least one processor to: generate at least a first boundary by encoding at least one user definition of at least a first user defined geographical region; compare at least one portion of at least the first boundary of at least the first user defined geographical region to at least population data corresponding to at least the first user defined geographical region to produce at least a first estimated population of at least the first user defined geographical region; and convert at least the first estimated population of at least the first user defined geographical region to at least a first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region, at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region being at least an estimated or forecasted count of at least a first estimated or forecasted number of patients affected by at least a first health condition and that at least reside in at least the first user defined geographical region.
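The conversion from an estimated population to an estimated or forecasted patient count can be written as a sum over the geographical portions of the user-defined region: count = sum over i of (estimated_population_i * rate_i). With a single region-wide rate this reduces to estimated population multiplied by that rate. The sketch assumes each rate is expressed per person for the relevant period (a prevalence, hospitalizations per person, or incidence per person); the Portion type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Portion:
    """A hypothetical geographical portion of a user-defined region."""
    key: str                     # e.g. a census block or block group code
    estimated_population: float  # population estimated to reside in this portion
    rate: float                  # per-person rate of the health condition


def estimated_patient_count(portions):
    """Sum estimated population times per-person rate over all portions."""
    total = 0.0
    for portion in portions:
        if portion.estimated_population < 0 or portion.rate < 0:
            raise ValueError(f"negative population or rate in portion {portion.key}")
        total += portion.estimated_population * portion.rate
    return total


if __name__ == "__main__":
    region = [Portion("A", 1000.0, 0.10), Portion("B", 500.0, 0.20)]  # made-up values
    print(estimated_patient_count(region))
```

Keeping a separate rate per portion lets finer-grained estimates (for example, block-group prevalences) flow through unchanged instead of being averaged away before the multiplication.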

The at least one processor may convert at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region based at least in part on at least a first rate of at least the first health condition in at least the first estimated population of at least the first user defined geographical region. The at least one processor may convert at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region based at least in part on at least respective rates of at least the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region. The respective rates of at least the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region may be at least respective estimated prevalences of the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region. The first health condition may be at least one of type 2 diabetes or chronic lung disease. 
The respective rates of at least the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region may be at least respective numbers of hospitalizations attributable to at least the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region. The first health condition may be at least one of coronary artery disease or stroke. The respective rates of at least the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region may be at least respective incidences of the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region. The first health condition may be prostate cancer. The at least one processor may further select each census block at least partially contained in at least the first user defined geographical region. The at least one processor may further reduce, for each respective census block that is partially yet not fully contained in at least the first user defined geographical region, at least one respective attribute associated with the respective census block by a proportion that at least corresponds to at least a first percentage of area of the respective census block that is contained in at least the first user defined geographical region. The at least one respective attribute associated with the respective census block may be at least one of a respective population of the respective census block and a respective estimated population of the respective census block. 
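
The selection of census blocks and the area-proportional reduction of partially contained blocks can be illustrated with a short Python sketch. It assumes that the share of each block's area lying inside the user defined geographical region has already been computed (for example, by a GIS overlay), that population is spread evenly across a block's area, and that each block carries its own estimated prevalence of the health condition. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

@dataclass
class CensusBlock:
    population: float       # population (or estimated population) of the whole block
    rate: float             # estimated prevalence of the health condition in the block (0..1)
    fraction_inside: float  # share of the block's area inside the user defined region (0..1)

def estimate_region(blocks: Iterable[CensusBlock]) -> Tuple[float, float]:
    """Return (estimated population, estimated patient count) for the user defined region."""
    population = 0.0
    patients = 0.0
    for b in blocks:
        if b.fraction_inside <= 0.0:
            continue  # block is not even partially contained, so it is not selected
        # Reduce the block's population by the share of its area that lies inside the region.
        pop_inside = b.population * b.fraction_inside
        population += pop_inside
        # Apply the block's own rate to its apportioned population.
        patients += pop_inside * b.rate
    return population, patients
```

With one fully contained block of 1,000 people at 10% prevalence and one half-contained block of 400 people at 5%, the sketch yields an estimated population of 1,200 and an estimated patient count of 110.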
The at least one respective attribute associated with the respective census block may be at least one of a respective first health condition rate of the respective census block for at least the first health condition and a respective estimated or forecasted first health condition rate of the respective census block for at least the first health condition. The at least one processor may further convert at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region to at least a first estimated or forecasted annual cost of treating at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region. The at least one processor may convert at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted annual cost of treating at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region based at least in part on at least a first estimated or forecasted cost of treating a given individual affected by at least the first health condition. The at least one processor may further obtain at least the first estimated or forecasted annual cost of treating the given individual affected by at least the first health condition from at least one user. 
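
The conversion from patient count to annual cost amounts to multiplying the estimated or forecasted patient count by an estimated or forecasted annual cost of treating one affected individual, which, as described, may be supplied by a user. A minimal Python sketch, with a hypothetical function name and purely illustrative numbers:

```python
def annual_cost(patient_count: float, cost_per_patient: float) -> float:
    """Estimated annual cost of treating every patient counted in the region (hypothetical helper)."""
    if patient_count < 0 or cost_per_patient < 0:
        raise ValueError("patient count and per-patient cost must be non-negative")
    # Total annual cost = number of affected residents x annual cost per affected individual.
    return patient_count * cost_per_patient
```

For example, 110 estimated patients at an illustrative user-supplied cost of 9,600 per patient per year gives an estimated annual cost of 1,056,000.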
The at least one processor-readable medium may further store at least one of processor executable instructions or data, which when executed by the at least one processor may further cause the at least one processor to: access, at least prior to conversion of at least the first estimated population to at least the first estimated or forecasted health condition patient count, a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region; access, at least prior to conversion of at least the first estimated population to at least the first estimated or forecasted health condition patient count, a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region, where the second type of geographic region is different from the first type of geographic region for at least one instance of a pair of geographic regions of the first and second types; and generate, at least prior to conversion of at least the first estimated population to at least the first estimated or forecasted health condition patient count, a third data set, the third data set searchable by a geographic region key that corresponds to a geographic region identifier that uniquely identifies the geographic regions of the first type of geographic region, the third data set including data regarding at least one demographic characteristic and at least one health-related characteristic representative of a population associated with the respective geographic region, where the at least one health-related characteristic is different from the at least one demographic characteristic, wherein the at least one processor may convert at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region based at least in part on the third data set searchable by the geographic region key that corresponds to the geographic region identifier that uniquely identifies the geographic regions of the first type of geographic region.
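
How a third data set can be keyed by a block-group identifier and linked to the encompassing county is sketched below in Python. A U.S. census block-group GEOID is 12 digits (2-digit state, 3-digit county, 6-digit tract, 1-digit block group), so its first five digits give the state-plus-county FIPS code. The description also provides for fitting a predictive model when generating the third data set; this sketch shows only the keying and linking of records, and the field names and codes are illustrative assumptions.

```python
from typing import Any, Dict

def build_third_data_set(block_groups: Dict[str, Dict[str, Any]],
                         counties: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Key each block group's demographics together with its county's health-related data.

    block_groups: 12-digit block-group GEOID -> demographic fields (hypothetical names)
    counties:     5-digit state+county FIPS  -> health-related fields (hypothetical names)
    """
    merged: Dict[str, Dict[str, Any]] = {}
    for geoid, demographics in block_groups.items():
        if len(geoid) != 12 or not geoid.isdigit():
            raise ValueError(f"not a block-group GEOID: {geoid!r}")
        county_fips = geoid[:5]  # the county that encompasses this block group
        health = counties.get(county_fips)
        if health is None:
            continue  # no health-related data for this county; skip rather than guess
        # Demographic fields come from the block group, health fields from its county.
        merged[geoid] = {**demographics, **health}
    return merged
```

Lookups are then a dictionary access on the block-group key, which matches the idea of a data set searchable by a geographic region key.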

A method may be summarized as including: generating, by at least one processor, at least a first boundary by encoding at least one user definition of at least a first user defined geographical region; comparing, by the at least one processor, at least one portion of at least the first boundary of at least the first user defined geographical region to at least population data corresponding to at least the first user defined geographical region to produce at least a first estimated population of at least the first user defined geographical region; and converting, by the at least one processor, at least the first estimated population of at least the first user defined geographical region to at least a first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region, at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region being at least an estimated or forecasted count of at least a first estimated or forecasted number of patients affected by at least a first health condition and that at least reside in at least the first user defined geographical region.

Converting at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region may include converting, by the at least one processor, at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region based at least in part on at least a first rate of at least the first health condition in at least the first estimated population of at least the first user defined geographical region. Converting at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region may include converting, by the at least one processor, at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region based at least in part on at least respective rates of at least the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region. 
The respective rates of at least the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region may be at least respective estimated prevalences of the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region. The first health condition may be at least one of type 2 diabetes or chronic lung disease. The respective rates of at least the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region may be at least respective numbers of hospitalizations attributable to at least the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region. The first health condition may be at least one of coronary artery disease or stroke. The respective rates of at least the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region may be at least respective incidences of the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region. The first health condition may be prostate cancer.

The method may further include selecting, by the at least one processor, each census block at least partially contained in at least the first user defined geographical region.

The method may further include reducing, by the at least one processor, for each respective census block that is partially yet not fully contained in at least the first user defined geographical region, at least one respective attribute associated with the respective census block by a proportion that at least corresponds to at least a first percentage of area of the respective census block that is contained in at least the first user defined geographical region.

The at least one respective attribute associated with the respective census block may be at least one of a respective population of the respective census block and a respective estimated population of the respective census block. The at least one respective attribute associated with the respective census block may be at least one of a respective first health condition rate of the respective census block for at least the first health condition and a respective estimated first health condition rate of the respective census block for at least the first health condition.

The method may further include converting, by the at least one processor, at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region to at least a first estimated or forecasted annual cost of treating at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region.

Converting at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted annual cost of treating at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region may include converting, by the at least one processor, at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted annual cost of treating at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region based at least in part on at least a first estimated or forecasted cost of treating a given individual affected by at least the first health condition. The at least one processor may further obtain at least the first estimated or forecasted annual cost of treating the given individual affected by at least the first health condition from at least one user.

The method may further include: accessing, at least prior to converting at least the first estimated population to at least the first estimated or forecasted health condition patient count, a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region; accessing, at least prior to converting at least the first estimated population to at least the first estimated or forecasted health condition patient count, a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region, where the second type of geographic region is different from the first type of geographic region for at least one instance of a pair of geographic regions of the first and the second types; and generating, by the at least one processor and at least prior to converting at least the first estimated population to at least the first estimated or forecasted health condition patient count, a third data set, the third data set searchable by a geographic region key that corresponds to a geographic region identifier that uniquely identifies the geographic regions of the first type of geographic region, the third data set including data regarding at least one demographic characteristic and at least one health-related characteristic representative of a population associated with the respective geographic region, where the at least one health-related characteristic is different from the at least one demographic characteristic, wherein converting at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region may be based at least in part on the third data set searchable by the geographic region key that corresponds to the geographic region identifier that uniquely identifies the geographic regions of the first type of geographic region.

Generating a third data set may include fitting a predictive model to the data of the first and the second data sets, by the at least one processor. Fitting a predictive model to the data of the first and the second data sets may include performing a binomial regression on the data of the first and the second data sets, by the at least one processor. The geographic regions of the second type may encompass respective sets of the geographic regions of the first type. Accessing a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region may include accessing a first data set that comprises demographic data for a respective population of each of a plurality of U.S. census block-groups. Accessing a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region may include accessing a second data set that comprises health-related data for a respective population of each of a plurality of counties. Accessing a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region may include accessing a first data set that comprises data regarding at least one immutable characteristic. Accessing a first data set that comprises data regarding at least one immutable characteristic may include accessing a first data set that comprises data regarding at least one of: a gender, an ethnicity, or an age class for individuals in the respective population of each of a plurality of geographic regions of a first type of geographic region. 
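
A binomial regression on grouped counts can be sketched in pure Python as below. It is a minimal sketch, assuming a logit link and rows that each carry a covariate vector (for example, an intercept, an age-class indicator, and a county-level obesity level), a number of people, and a number of people with the condition; the row layout and function names are assumptions, not the described system's model specification. The fit uses Newton-Raphson on the binomial log-likelihood, which is equivalent to iteratively reweighted least squares; in practice a library such as statsmodels (GLM with a Binomial family) would typically be used instead.

```python
import math
from typing import List, Sequence

def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_binomial(X: Sequence[Sequence[float]], trials: Sequence[float],
                 cases: Sequence[float], iters: int = 50, tol: float = 1e-10) -> List[float]:
    """Fit logit(p) = beta . x to grouped binomial counts by Newton-Raphson."""
    dim = len(X[0])
    beta = [0.0] * dim
    for _ in range(iters):
        grad = [0.0] * dim                        # score vector: sum (k - n p) x
        info = [[0.0] * dim for _ in range(dim)]  # information: sum n p (1 - p) x x^T
        for x, n, k in zip(X, trials, cases):
            p = 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
            w = n * p * (1.0 - p)
            for i in range(dim):
                grad[i] += (k - n * p) * x[i]
                for j in range(dim):
                    info[i][j] += w * x[i] * x[j]
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def predict(beta: Sequence[float], x: Sequence[float]) -> float:
    """Predicted prevalence for one covariate vector (e.g., one block group's profile)."""
    return 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
```

Once fitted on county-level rows, the same coefficients can be applied to each block group's covariate vector to produce a block-group-level prevalence estimate.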
Accessing a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region may include accessing a second data set that comprises data regarding at least one non-demographic mutable characteristic. Accessing a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region may include accessing a second data set that comprises data regarding at least one mutable characteristic. The data regarding at least one mutable characteristic may include data regarding at least one of: an obesity level or an activity level for individuals in the respective population of the geographic region of the second type of geographic region. Accessing a second data set that comprises data regarding at least one mutable characteristic may include accessing a second data set that comprises data regarding at least one of: a percentage of the respective geographic area of the second type of geographic area that is rural, a total population of the respective geographic area of the second type of geographic area, or a total number of individuals by age class that have a defined condition. The method may further include: determining a bias correction for the data of the third data set, by the at least one processor. The method may further include: determining at least one confidence interval for the data of the third data set, by the at least one processor. The demographic data of the first data set may represent a baseline period of time, and the method may further include: receiving user input that specifies an inquiry period of time, the inquiry period of time different from the baseline period of time; and determining an estimated prevalence of a health condition in at least a portion of the population for the inquiry period of time.
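
The description does not specify how a confidence interval for the third data set is computed. As one common choice for a prevalence (a binomial proportion), the Wilson score interval is sketched below in Python; using it here, and treating the estimate as `cases` out of an effective sample size `n`, are assumptions for illustration. Bias correction is not shown.

```python
import math
from typing import Tuple

def wilson_interval(cases: float, n: float, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = cases / n
    z2 = z * z
    denom = 1.0 + z2 / n
    center = (p + z2 / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z2 / (4.0 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot at the extremes.
    return max(0.0, center - half), min(1.0, center + half)
```

For 100 cases in a sample of 1,000, the interval is roughly 0.083 to 0.120. Unlike the simple normal approximation, the Wilson interval stays within [0, 1] and behaves sensibly for small block-group samples or rare conditions.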
Determining the estimated prevalence of a health condition in at least a portion of the population for the inquiry period of time may include accumulating an estimated prevalence over multiple sub-periods of the inquiry period. The method may further include: receiving user input that specifies a geographic area that overlaps a boundary of at least one of the geographic regions of the first type of geographic region; and determining an estimated prevalence of a health condition in at least a portion of the population for the geographic area specified by the user input. Receiving user input may include receiving a plurality of selections of points or lines on a map that define a polygon, and the method may further include converting the user input that defines a polygon into a number of geographic region key values. The method may further include: receiving user input that specifies a geographic area that encompasses at least a portion of at least two of the geographic regions of the first type of geographic region; and determining an estimated prevalence of a health condition in at least a portion of the population for the geographic area specified by the user input. Receiving user input may include receiving a plurality of selections of points or lines on a map that define a polygon, and the method may further include converting the user input that defines a polygon into a number of geographic region key values. The demographic data of the first data set may represent a baseline period of time, and the method may further include: receiving user input that specifies an inquiry period of time, the inquiry period of time different from the baseline period of time; and determining a rate of change in an estimated prevalence of a health condition in at least a portion of the population for the inquiry period of time.
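
Accumulating an estimated prevalence over sub-periods, and deriving a rate of change from it, could look like the Python sketch below. The update rule is an assumption made for illustration: in each sub-period, a share `incidence[t]` of people who do not yet have the condition acquire it, while deaths and migration are ignored. The function names are hypothetical.

```python
from typing import List, Sequence

def accumulate_prevalence(p0: float, incidence: Sequence[float]) -> List[float]:
    """Return prevalence at the start of the inquiry period and after each sub-period.

    Hypothetical update: a share incidence[t] of the people without the condition
    acquire it during sub-period t; deaths and migration are ignored.
    """
    path = [p0]
    p = p0
    for inc in incidence:
        p = p + (1.0 - p) * inc  # only the unaffected share can become new cases
        path.append(p)
    return path

def rate_of_change(path: Sequence[float], years_per_step: float = 1.0) -> float:
    """Average change in prevalence per year between the first and last points of path."""
    steps = len(path) - 1
    if steps < 1:
        raise ValueError("need at least one sub-period")
    return (path[-1] - path[0]) / (steps * years_per_step)
```

Starting from a baseline prevalence of 10% with 1% annual incidence for two years, the sketch gives 10.9% after one year and 11.791% after two, an average change of about 0.9 percentage points per year.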

A system may be summarized as including: at least one processor; and at least one non-transitory processor-readable medium communicatively coupled to the at least one processor and that stores at least one of processor-executable instructions or data, execution of which causes the at least one processor to: access a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region; access a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region, where the second type of geographic region is different from the first type of geographic region for at least one instance of a pair of geographic regions of the first and second types; and generate a third data set, the third data set searchable by a geographic region key that corresponds to a geographic region identifier that uniquely identifies the geographic regions of the first type of geographic region, the third data set including data regarding at least one demographic characteristic and at least one health-related characteristic representative of a population associated with the respective geographic region, where the at least one health-related characteristic is different from the at least one demographic characteristic.


A system may be summarized as including: at least one processor; and at least one processor-readable medium that stores at least one of processor executable instructions or data, which when executed by the processor cause the at least one processor to: estimate or forecast a prevalence of a first health condition in a population of a first user selected or defined geographic region for a first time period; estimate or forecast a prevalence of the first health condition in the population of the first user selected or defined geographic region for a second time period, the second time period different than the first time period; estimate or forecast a total number of new cases of the first health condition that will occur during the second period; estimate or forecast a portion of the first population prone to the first health condition based on the estimated or forecasted total number of new cases of the first health condition that will occur during the second period; and estimate or forecast a net present value of a defined intervention, the defined intervention which inhibits an onset of the first health condition in at least the portion of the first population prone to the first health condition.

The first health condition may be type 2 diabetes, and the at least one processor may estimate or forecast a portion of the first population that has prediabetes based on an estimated or forecasted total number of new cases of type 2 diabetes that will occur during the second period. The at least one processor may estimate or forecast a portion of the first population that has prediabetes based on a percentage of the estimated or forecasted total number of new cases of type 2 diabetes that will occur during the second period, the percentage being a value between 5% and 10% inclusive. To estimate or forecast a total number of new cases of the first health condition that will occur during the second period, the at least one processor may determine a difference between an estimated or forecasted prevalence in the second period and an estimated or forecasted prevalence in the first period, and may determine a product of the difference and a size of the population. To estimate a net present value of a defined intervention, the at least one processor may determine a per capita net present value. To determine a per capita net present value the at least one processor may

where y is a total number of years and i is the discount rate applied to future money, and f₁ through f_y = c*r*a, where c is an annual cost of treating the first health condition, r is the defined intervention's rate of risk reduction, and a is a rate at which people with a second health condition that is a precursor to the first health condition develop the first health condition. The at least one processor may further determine an estimated or forecasted return on investment for the defined intervention. To determine an estimated return on investment for the defined intervention, the at least one processor may further determine h = v/f₀, where h is return on investment, v is the per capita net present value, and f₀ is a cost per person of the defined intervention. The at least one processor may further determine w = v*b, where w is total present savings of the defined intervention, v is the per capita net present value, and b is a number of people with the second health condition that is the precursor to the first health condition. The at least one processor may further receive user input that specifies the first user selected or defined geographic region. The first user selected or defined geographic region may be a user defined geographic area. The at least one processor may receive a plurality of user selections of points or lines on a map that define a polygon, and may further convert the user input that defines a polygon into a number of geographic region key values. The first health condition may be chronic lung disease. The at least one processor may further: obtain at least one mutable ancillary characteristic; and convert the at least one mutable ancillary characteristic into an estimated or forecasted portion of the population of the first user selected or defined geographic region with a particular set of attribute values and with the first health condition.
The at least one mutable ancillary characteristic may include at least one of an obesity level and an exercise level. The at least one processor may obtain the at least one mutable ancillary characteristic from a user. The at least one processor may further convert the estimated or forecasted portion of the population of the first user selected or defined geographic region into an estimated or forecasted prevalence of the first health condition in the subpopulation of the population of the first user selected or defined geographic region for the second time period. The at least one processor-readable medium may further store at least one of processor executable instructions or data, which when executed by the at least one processor may further cause the at least one processor to: generate, at least prior to estimation or forecast of the prevalence of the first health condition in the population of the first user selected or defined geographic region for the first time period, at least a first boundary by encoding at least one user definition of at least a first user defined geographical region; compare, at least prior to estimation or forecast of the prevalence of the first health condition in the population of the first user selected or defined geographic region for the first time period, at least one portion of at least the first boundary of at least the first user defined geographical region to at least population data corresponding to at least the first user defined geographical region to produce at least a first estimate population of at least the first user defined geographical region; and convert, at least prior to estimation or forecast of the prevalence of the first health condition in the population of the first user selected or defined geographic region for the first time period, at least the first estimated population of at least the first user defined geographical region to at least a first estimated or forecasted health condition patient count in at 
least the first estimated population of at least the first user defined geographical region, at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region being at least an estimated or forecasted count of at least a first estimated or forecasted number of patients affected by at least a first health condition and that at least reside in at least the first user defined geographical region, wherein the at least one processor may estimate or forecast the prevalence of the first health condition in the population of the first user selected or defined geographic region for the first time period based at least in part on at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region. The at least one processor-readable medium may further store at least one of processor executable instructions or data, which when executed by the at least one processor may further cause the at least one processor to: access, at least prior to conversion of at least the first estimated population to at least the first estimated or forecasted health condition patient count, a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region; access, at least prior to conversion of at least the first estimated population to at least the first estimated or forecasted health condition patient count, a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region, where the second type of geographic region is different from the first type of geographic region for at least one instance of a pair of geographic regions of the first and second types; and generate, at least prior to conversion of at least the first estimated population to at least the first estimated or forecasted health condition patient count, a third data set, by at least one processor, the third data set searchable by a geographic region key that corresponds to a geographic region identifier that uniquely identifies the geographic regions of the first type of geographic region, the third data set including data regarding at least one demographic characteristic and at least one health-related characteristic representative of a population associated with the respective geographic region, where the at least one health-related characteristic is different from the at least one demographic characteristic, wherein the at least one processor may convert at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region based at least in part on the third data set searchable by the geographic region key that corresponds to the geographic region identifier that uniquely identifies the geographic regions of the first type of geographic region.
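A minimal sketch of how such a third data set might be generated, assuming the first type of geographic region is the census block group and the second is the county. A 12-digit block-group code begins with the 2-digit state and 3-digit county FIPS codes, so each block group can be matched to its county by prefix. The record fields (`population`, `median_age`, `county_prevalence`), the function name, and the sample codes are hypothetical; the disclosure does not specify the join, and copying the county value onto every block group is only the crudest possible combination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockGroupRecord:
    """Hypothetical row of the third data set, keyed by block-group code."""
    geoid: str                # 12-digit block-group code (the geographic region key)
    population: int           # demographic characteristic (first data set)
    median_age: float         # demographic characteristic (first data set)
    county_prevalence: float  # health-related characteristic (second data set)


def build_third_data_set(block_groups, county_health):
    """Join block-group demographics to county-level health data.

    block_groups: 12-digit code -> {"population": int, "median_age": float}
    county_health: 5-digit county FIPS code -> prevalence as a fraction
    Returns a dict searchable by block-group code.
    """
    third = {}
    for geoid, demo in block_groups.items():
        county_fips = geoid[:5]  # state (2 digits) + county (3 digits)
        if county_fips not in county_health:
            continue  # no health data for this county; leave the block group out
        third[geoid] = BlockGroupRecord(
            geoid=geoid,
            population=demo["population"],
            median_age=demo["median_age"],
            county_prevalence=county_health[county_fips],
        )
    return third


if __name__ == "__main__":
    # Illustrative values only.
    block_groups = {
        "530330001001": {"population": 1200, "median_age": 34.5},
        "530330001002": {"population": 950, "median_age": 41.0},
    }
    table = build_third_data_set(block_groups, {"53033": 0.08})
    print(table["530330001002"].county_prevalence)
```

Keying the result by the block-group code keeps lookups constant-time once a user defined region has been resolved into a set of keys.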

A system may be summarized as including: at least one processor; and at least one processor-readable medium that stores at least one of processor executable instructions or data, which when executed by the at least one processor cause the at least one processor to: estimate or forecast a first health condition patient count in a population of a user selected or defined geographic region for a first time period, the first health condition patient count being a count of a number of patients affected by a first health condition; estimate or forecast a first health condition patient count in the population of the user selected or defined geographic region for a second time period, the second time period different than the first time period; and estimate or forecast a net present value of a defined intervention, the defined intervention inhibiting an onset of the first health condition in at least a portion of the population of the user selected or defined geographic region.
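The net present value can be illustrated with the standard discounted-cash-flow formula, $\mathrm{NPV} = \sum_{t=0}^{T} CF_t / (1 + r)^t$. The sketch below assumes the yearly net cash flows of the defined intervention are already known and that the first year is undiscounted; the discount rate, the per capita variant, and all names are illustrative assumptions rather than the disclosed method.

```python
def net_present_value(cash_flows, discount_rate):
    """Discounted sum of yearly cash flows; year 0 is not discounted."""
    if discount_rate <= -1.0:
        raise ValueError("discount_rate must be greater than -1")
    return sum(cf / (1.0 + discount_rate) ** t for t, cf in enumerate(cash_flows))


def per_capita_npv(cash_flows, discount_rate, population):
    """NPV divided by the population of the region (assumed definition)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return net_present_value(cash_flows, discount_rate) / population


if __name__ == "__main__":
    # Hypothetical flows: an up-front program cost followed by net savings.
    flows = [-100_000.0, 30_000.0, 50_000.0, 50_000.0]
    print(f"NPV at 3%: {net_present_value(flows, 0.03):,.2f}")
    print(f"Per capita NPV: {per_capita_npv(flows, 0.03, 25_000):,.4f}")
```

A positive NPV indicates that, at the chosen discount rate, the discounted savings from inhibited onsets exceed the discounted cost of the intervention.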

The at least one processor may further: estimate or forecast a total number of new cases of the first health condition that will occur during the second time period; and estimate or forecast a portion of the first population prone to the first health condition based on the estimated or forecasted total number of new cases of the first health condition that will occur during the second time period. The at least one processor may estimate or forecast the first health condition patient count in the population of the user selected or defined geographic region for the second time period by converting a rate of the first health condition in the population of the user selected or defined geographic region into the first health condition patient count in the population of the user selected or defined geographic region for the second time period.
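One way to read the conversion of a rate into a patient count for the second time period: multiply each region's forecasted rate (expressed as a fraction of its population) by its forecasted population, and sum over the region keys that make up the user selected or defined area. The dictionary layout and function name are hypothetical, and the sketch assumes both forecasts are already available per region key.

```python
def forecast_patient_count(forecast_rates, forecast_populations):
    """Sum of rate x population over every region key in the selected area.

    forecast_rates: key -> forecasted rate for the second time period (fraction, 0..1)
    forecast_populations: key -> forecasted population for the same period
    """
    missing = forecast_rates.keys() - forecast_populations.keys()
    if missing:
        raise KeyError(f"no population forecast for keys {sorted(missing)}")
    total = 0.0
    for key, rate in forecast_rates.items():
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"rate for {key} is not a fraction: {rate}")
        total += rate * forecast_populations[key]
    return total
```

For example, rates of 0.10 and 0.20 over forecasted populations of 1,000 and 500 give 100 + 100 = 200 patients.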

The rate of the first health condition in the population of the user selected or defined geographic region may be a prevalence of the first health condition in the population of the user selected or defined geographic region. The first health condition may be at least one of type 2 diabetes or chronic lung disease. The rate of the first health condition in the population of the user selected or defined geographic region may be a number of hospitalizations attributable to the first health condition in at least a portion of the population of the user selected or defined geographic region. The first health condition may be at least one of coronary artery disease or stroke. The rate of the first health condition in the population of the user selected or defined geographic region may be an incidence of the first health condition in the population of the user selected or defined geographic region. The first health condition may be prostate cancer. The at least one processor may further estimate an annual cost of treating every individual affected by the first health condition in the population of the user selected or defined geographic region by converting at least an estimated or forecasted annual treatment cost per individual into the annual cost of treating every individual affected by the first health condition in the population of the user selected or defined geographic region. The at least one processor may further convert at least the annual cost of treating every individual affected by the first health condition in the population of the user selected or defined geographic region and an estimated or forecasted cost of the defined intervention into an estimated or forecasted cost savings of implementing the defined intervention in the population of the user selected or defined geographic region. 
The at least one processor may further convert at least the annual cost of treating every individual affected by the first health condition in the population of the user selected or defined geographic region and the estimated or forecasted net present value of the defined intervention into an estimated or forecasted return on investment for implementing the defined intervention in the population of the user selected or defined geographic region. The at least one processor may further: obtain at least a first mutable ancillary characteristic; and convert at least the first mutable ancillary characteristic into a first estimated or forecasted portion of the population of the user selected or defined geographic region, the first estimated or forecasted portion being a first estimated or forecasted subpopulation of the population of the user selected or defined geographic region, the first estimated or forecasted subpopulation having both a first particular set of attribute values and the first health condition. At least the first mutable ancillary characteristic may include at least one of an obesity level and an exercise level. The at least one processor may obtain at least the first mutable ancillary characteristic from a user. The at least one processor may further convert the first estimated or forecasted portion of the population of the user selected or defined geographic region into an estimated or forecasted first health condition patient count in the first subpopulation of the population of the user selected or defined geographic region for the second time period. The at least one processor may estimate the net present value of the defined intervention by converting the first estimated or forecasted first health condition patient count in the first subpopulation of the population of the user selected or defined geographic region for the second time period into the net present value of the defined intervention. 
The at least one processor may convert the net present value of the defined intervention into a return on investment of the defined intervention. The at least one processor may further: obtain at least a second mutable ancillary characteristic; convert at least the second mutable ancillary characteristic into a second estimated or forecasted portion of the population of the user selected or defined geographic region, the second estimated or forecasted portion being a second estimated or forecasted subpopulation of the population of the user selected or defined geographic region, the second estimated or forecasted subpopulation having both a second particular set of attribute values and the first health condition; convert the second estimated or forecasted portion of the population of the user selected or defined geographic region into an estimated or forecasted first health condition patient count in the second subpopulation of the population of the user selected or defined geographic region for the second time period; convert the estimated or forecasted first health condition patient count in the second subpopulation of the population of the user selected or defined geographic region for the second time period into a net present value of another defined intervention; and convert the net present value of the other defined intervention into a return on investment of the other defined intervention. The at least one processor may further: compare the return on investment of the defined intervention with the return on investment of the other defined intervention to produce a comparison result; and convert the comparison result into a humanly perceptible indication of the comparison result. 
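A sketch of the comparison step, assuming a simple undiscounted ROI of (savings - cost) / cost, where savings are the forecasted cases prevented in a subpopulation multiplied by the annual cost of treating one patient. The class, field names, ROI definition, and example programs are hypothetical; the disclosure does not state which ROI formula is used, and a discounted ROI would substitute present values.

```python
from dataclasses import dataclass


@dataclass
class InterventionEstimate:
    name: str
    prevented_cases: float          # forecasted cases averted in the subpopulation
    annual_cost_per_patient: float  # annual cost of treating one affected individual
    intervention_cost: float        # cost of implementing the intervention

    def annual_savings(self) -> float:
        # Treatment cost avoided: cases averted x annual cost per patient.
        return self.prevented_cases * self.annual_cost_per_patient

    def roi(self) -> float:
        # Simple ROI = (savings - cost) / cost; an assumed definition.
        if self.intervention_cost <= 0:
            raise ValueError("intervention_cost must be positive")
        return (self.annual_savings() - self.intervention_cost) / self.intervention_cost


def compare(a: InterventionEstimate, b: InterventionEstimate) -> str:
    """Return a human-readable statement comparing the two ROIs."""
    ra, rb = a.roi(), b.roi()
    if ra == rb:
        return f"{a.name} and {b.name} have the same ROI ({ra:.1%})"
    better, worse = (a, b) if ra > rb else (b, a)
    return f"{better.name} ROI {better.roi():.1%} exceeds {worse.name} ROI {worse.roi():.1%}"
```

With 20 cases prevented at 10,000 per patient for a 150,000 program (ROI 33.3%) versus 10 cases at 10,000 per patient for a 50,000 program (ROI 100.0%), `compare` reports the second program as the better return.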
The at least one processor-readable medium may further store at least one of processor executable instructions or data, which when executed by the at least one processor may further cause the at least one processor to: generate, at least prior to estimation or forecast of the first health condition patient count in the population of the user selected or defined geographic region for the first time period, at least a first boundary by encoding at least one user definition of at least a first user defined geographical region; compare, at least prior to estimation or forecast of the first health condition patient count in the population of the user selected or defined geographic region for the first time period, at least one portion of at least the first boundary of at least the first user defined geographical region to at least population data corresponding to at least the first user defined geographical region to produce at least a first estimated or forecasted population of at least the first user defined geographical region; and convert, at least prior to estimation or forecast of the first health condition patient count in the population of the user selected or defined geographic region for the first time period, at least the first estimated population of at least the first user defined geographical region to at least a first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region for the first time period, at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region for the first time period being at least an estimated or forecasted count of at least a first estimated or forecasted number of patients affected by at least a first health condition and that at least reside in at least the first user defined geographical region for the first time period, wherein
the at least one processor may estimate or forecast the first health condition patient count in the population of the user selected or defined geographic region for the first time period based at least in part on at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region for the first time period. The at least one processor-readable medium may further store at least one of processor executable instructions or data, which when executed by the at least one processor may further cause the at least one processor to: access, at least prior to conversion of at least the first estimated population to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region for the first time period, a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region; access, at least prior to conversion of at least the first estimated population to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region for the first time period, a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region, where the second type of geographic region is different from the first type of geographic region for at least one instance of a pair of geographic regions of the first and second types; and generate, at least prior to conversion of at least the first estimated population to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region for the first time period, a 
third data set, by at least one processor, the third data set searchable by a geographic region key that corresponds to a geographic region identifier that uniquely identifies the geographic regions of the first type of geographic region, the third data set including data regarding at least one demographic characteristic and at least one health-related characteristic representative of a population associated with the respective geographic region, where the at least one health-related characteristic is different from the at least one demographic characteristic, wherein the at least one processor may convert at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region for the first time period based at least in part on the third data set searchable by the geographic region key that corresponds to the geographic region identifier that uniquely identifies the geographic regions of the first type of geographic region.
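Figure 9 describes reducing a region's attributes by the proportion of its area that lies inside the user defined geographical region. A minimal sketch of that apportionment follows, assuming the overlap areas have already been computed by a GIS step and that people are spread uniformly within each region; the tuple layout of the third data set and the function name are hypothetical.

```python
def apportion(third_data_set, overlap_areas):
    """Estimate population and patient count inside a user defined region.

    third_data_set: key -> (population, total_area, rate), rate as a fraction
    overlap_areas: key -> area of that region lying inside the user boundary
    Returns (estimated_population, estimated_patient_count).
    """
    est_population = 0.0
    est_patients = 0.0
    for key, inside_area in overlap_areas.items():
        population, total_area, rate = third_data_set[key]
        if total_area <= 0:
            raise ValueError(f"region {key} has no area")
        # Share of the region inside the boundary, clamped against rounding noise.
        share = min(max(inside_area / total_area, 0.0), 1.0)
        est_population += population * share  # uniform-density assumption
        est_patients += population * share * rate
    return est_population, est_patients
```

A region half inside the boundary thus contributes half its population and half its estimated patients; regions with no overlap are simply absent from `overlap_areas`.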

The at least one processor may receive at least one user selection of at least one county, census tract, zip code or designated place. The at least one processor may receive at least one user selection of at least one county, census tract, zip code or designated place via user selection of at least one location on a map which corresponds to the at least one county, census tract, zip code or designated place.

The at least one processor may: receive a user selection of one or more user selected or defined geographic regions; generate a geographic area report for each of the one or more user selected or defined geographic regions; and present the generated geographic area reports to the user. The at least one processor may generate a geographic area report which includes a health condition by demographics report for at least one health condition and at least one demographic.

The at least one processor may: receive a user selection of claims data for a claims population; and present the selected claims data for the claims population to a user on a map.

The at least one processor may: receive a user selection of health-related data; and present the health-related data to a user on a map. The health-related data may relate to at least one of political data, physical data, social data, hazards data, or disease data. The at least one processor may: receive a user selection of live data; and present the live data to a user on a map. The live data may include at least one of live air quality data or live hazards data. The at least one processor may determine $S(n'_g) = (p \cdot d)\, n'_g \cdot C(l)$, where $S(n'_g)$ is the total annual associated health care savings, $C(l)$ is the estimated annual per capita health care cost, $p$ is a rate by which the intervention reduces incidence, and $n'_g$ is the estimated count of the population afflicted with the first health condition in the user selected or defined region. The at least one processor may determine yearly cash flows based at least in part on an estimated cost associated with the defined intervention and an estimated health care savings associated with the defined intervention. The at least one processor may receive a user indication of at least one of a cost of treating the first health condition or a cost of implementing the defined intervention.
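The savings expression and the yearly cash flows can be sketched directly. The factor $d$ appears in the formula but is not defined in this passage, so the sketch takes it as a caller-supplied input. The function names and the convention of subtracting the intervention cost from the savings year by year are assumptions.

```python
def annual_savings(p, d, n_g, c):
    """S(n'_g) = (p * d) * n'_g * C(l), as stated in the text.

    p   -- rate by which the intervention reduces incidence (fraction)
    d   -- factor appearing in the formula; not defined in this passage
    n_g -- estimated count of the population afflicted with the condition (n'_g)
    c   -- estimated annual per capita health care cost, C(l)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    if n_g < 0:
        raise ValueError("n_g must be non-negative")
    return (p * d) * n_g * c


def yearly_cash_flows(savings_by_year, intervention_cost_by_year):
    """Net yearly cash flow: estimated health care savings minus intervention cost."""
    if len(savings_by_year) != len(intervention_cost_by_year):
        raise ValueError("both series must cover the same years")
    return [s - k for s, k in zip(savings_by_year, intervention_cost_by_year)]
```

The resulting yearly cash flows are the natural input to a discounted net present value calculation.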

The at least one processor may further receive user input that specifies the first user selected or defined geographic region.

The at least one processor may receive a plurality of user selections of points or lines on a map that define a polygon, and may further convert the user input that defines a polygon into a number of geographic region key values. The at least one processor may receive at least one user selection of at least one county, census tract, zip code or designated place. The at least one processor may receive at least one user selection of at least one county, census tract, zip code or designated place via user selection of at least one location on a map which corresponds to the at least one county, census tract, zip code or designated place.
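One simple way to convert a user-drawn polygon into geographic region key values is to keep every region whose representative point (for example, a block-group centroid) falls inside the polygon, using the even-odd ray-casting test. This is a hypothetical stand-in, not the disclosed method: it treats coordinates as planar and ignores partial overlap, which the area-proportional approach of Figure 9 handles. The function names and the centroid lookup are assumptions.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right toggles the state
    return inside


def polygon_to_region_keys(polygon, centroids):
    """Return the keys of every region whose centroid lies inside the polygon."""
    if len(polygon) < 3:
        raise ValueError("a polygon needs at least three points")
    return sorted(k for k, (x, y) in centroids.items() if point_in_polygon(x, y, polygon))
```

For a 10 x 10 square, centroids at (5, 5) and (1, 9) are selected while one at (15, 5) is not.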

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

In the drawings, identical reference numbers identify similar elements or acts. The sizes and relative positions of elements in the drawings are not necessarily drawn to scale. For example, the shapes of various elements and angles are not necessarily drawn to scale, and some of these elements may be arbitrarily enlarged and positioned to improve drawing legibility. Further, the particular shapes of the elements as drawn are not necessarily intended to convey any information regarding the actual shape of the particular elements, and may have been solely selected for ease of recognition in the drawings.

Figure 1 is a schematic view of a networked processor-based environment to provide services to a plurality of remotely located users via user operated processor-based devices over a network, according to one illustrated implementation.

Figure 2 is a functional block diagram of a hosted service system to produce estimates of a health condition patient count, a physical characteristics population count, and/or conditions or traits population or object count in a population or sub-population, to produce estimates of costs of treatment for the count(s), and to provide such to one or more end user client processor-based devices, according to one illustrated implementation.

Figure 3A is a schematic view of a portion of a map which can be displayed via a user interface to obtain estimates of a health condition patient count, a physical characteristics population count, and/or conditions or traits population or object count in a population or sub-population, according to one illustrated implementation.

Figure 3B is a screenshot showing a portion of a user interface to obtain estimates of a first health condition patient count (e.g., lung cancer incidence) in a population or sub-population, according to one illustrated implementation.

Figure 3C is a screenshot showing a portion of a user interface to obtain estimates of a second health condition patient count (e.g., type 2 diabetes prevalence) in a population or sub-population, according to one illustrated implementation.

Figure 3D is a screenshot showing a portion of a user interface to obtain forecasts of a prevalence of a second health condition (e.g., type 2 diabetes) in a population or sub-population over a period of time (e.g., 10 years), according to one illustrated implementation.

Figure 3E is a screenshot showing a portion of a user interface to obtain estimates and/or forecasts of a prevalence of a health condition in a population or sub-population, illustrating a population data panel with which an end user can specify various characteristics, conditions or traits as part of a query regarding prevalence of a health condition, according to one illustrated implementation.

Figure 3F is a screenshot showing a portion of a user interface to obtain estimates of a patient count for a population or sub-population of a user defined area to obtain estimates or forecasts of respective patient counts for a plurality of health conditions in the population or sub-population in the user defined area, and to obtain estimates or forecasts of respective costs of treatment for the respective patient counts for the plurality of health conditions, according to one illustrated implementation.

Figure 3G is a screenshot showing a portion of a user interface to obtain estimates or forecasts of respective patient counts for a health condition for first and second time periods and to obtain estimates or forecasts of respective expected new counts of the health condition for the next period without implementing an intervention, expected prevented counts of the health condition after implementing the intervention for the next period, total present savings of implementing the intervention, per capita net present value of implementing the intervention, and discounted return on investment of implementing the intervention, according to one illustrated implementation.

Figure 4 is a flow diagram showing a high-level method of operation of providing services to a particular user based on a set of defined criteria of a plurality of users, according to one illustrated implementation.

Figure 5 is a flow diagram showing a method of operation of providing services to a particular user based on a set of defined criteria of a plurality of users, according to illustrated implementations.

Figure 6 shows a low-level method of operation that converts user input into a polygonal geographic region, according to illustrated implementations.

Figure 7 shows a low-level method of operation that converts user input into a polygonal geographic region, produces an estimated population of the polygonal geographic region, and converts the estimated population to an estimated or forecasted health condition patient count in the polygonal geographic region, according to illustrated implementations.

Figure 8 is a flow diagram showing a low-level method of operation that converts an estimated or forecasted health condition patient count to an estimated or forecasted annual cost of treating the estimated or forecasted health condition patient count, according to one illustrated implementation.

Figure 9 is a flow diagram showing a low-level method of operation that reduces attributes associated with a geographical region of a given geographical region level or type by a proportion that corresponds to an area of the geographical region contained in a user defined geographical region and converts an estimated population of the user defined geographical region to an estimated or forecasted health condition patient count in the estimated population, according to one illustrated implementation.

Figure 10 is a flow diagram showing a low-level method of operation that estimates or forecasts a number of a population that is prone to a health condition, an annual cost of treating every individual affected by the health condition, and costs, savings, net present value, and return on investment, according to one illustrated implementation.

Figure 11 is a flow diagram showing a low-level method of operation that estimates or forecasts a net present value of an intervention, a per person cost of the intervention, a return on investment of the intervention, and a total present savings of the intervention, according to one illustrated implementation.

Figure 12 is a graph showing a plot of fitted versus predicted prevalence.

Figure 13 is a graph showing distributions of prediction errors, summarized by state.

Figure 14 is a graph showing a distribution of effects for each of a plurality of variables.

Figure 15 is a graph showing block-group prevalence estimates for 2012 plotted against the CDC prevalence estimates for each of a plurality of counties.

Figure 16 is a graph showing a set of block-group prevalence estimates for 2012 plotted against the CDC county prevalence estimates for Montana.

Figure 17 is a graph showing distributions of estimated incidence by state as a set of boxplots.

Figure 18 is a screenshot showing a portion of a user interface to obtain estimates, forecasts and/or prevention information for one or more health conditions in a population or sub-population, according to one illustrated implementation.

Figure 19 is a screenshot of a portion of the user interface of Figure 18, showing a geographic area report dialog box, according to one illustrated implementation.

Figure 20 is a screenshot of a portion of the user interface of Figure 18, showing a selection of a plurality of counties for inclusion in a geographic area report, according to one illustrated implementation.

Figure 21 is a screenshot showing an example geographic area report, according to one illustrated implementation.

Figure 22 is a screenshot showing an example geographic area report which provides a graphical display of a disease by demographics, according to one illustrated implementation.

Figure 23 is a screenshot of a portion of the user interface of Figure 18, showing selection of an area of interest, according to one illustrated implementation.

Figure 24 is a screenshot of a portion of the user interface of Figure 18, showing a map which provides claims data from one or more claims populations, according to one illustrated implementation.

Figure 25 is a screenshot of a portion of the user interface of Figure 18, showing a data catalog window, according to one illustrated implementation.

Figure 26 is a screenshot of a portion of the user interface of Figure 18, showing a literature window, according to one illustrated implementation.

Figure 27 is a screenshot of a portion of the user interface of Figure 18, showing a live data window, according to one illustrated implementation.

Figure 28 is a screenshot of a portion of the user interface of Figure 18, showing an Esri ArcGIS™ window, according to one illustrated implementation.

Figure 29 is a graph showing weights assigned to neighbors by the conventional $k$-nearest neighbor and exponentially-weighted $k$-nearest neighbor algorithms, wherein weights are plotted for the conventional $k$-nearest neighbor algorithm for $k \in \{3, 5, 10, 20\}$ (corresponding to weighting 1, 2, 3, and 4), and for the exponentially-weighted $k$-nearest neighbor function for $\alpha \in \{0.4, 0.16, 0.064, 0.0256\}$ (corresponding to $k \in \{1, 2, 3, 4\}$), according to one illustrated implementation.

Figure 30 is a graph showing the distribution of county estimates of adult asthma prevalence for the year 2012, where n = 3221, according to one illustrated implementation.

Figure 31 is a graph showing the distribution of forecasted change in adult asthma prevalence from 2012 to 2025, where n = 3221, according to one illustrated implementation.

Figure 32 is a graph showing the distribution of county mean body mass index (kg/m2) for the year 2012, where n = 3221, according to one illustrated implementation.

Figure 33 is a graph showing the estimated change of county mean body mass index (kg/m2) from 2012 to 2025, where n = 3221, according to one illustrated implementation.

Figure 34 is a graph showing county estimates of obesity rate for the year 2012, where n = 3221, according to one illustrated implementation.

Figure 35 is a graph showing the estimated change in obesity rate from 2012 to 2025, where n = 3221, according to one illustrated implementation.

Figure 36 is a graph showing county estimates of mental illness rate for the year 2012, where n = 3221, according to one illustrated implementation.

Figure 37 is a graph showing estimated change in mental illness rate from 2012 to 2025, where n = 3221, according to one illustrated implementation.
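Figure 29 contrasts conventional and exponentially weighted k-nearest neighbor weights. The conventional weights are standard: each of the k nearest neighbors receives 1/k and the rest receive nothing. The exponential form used below, $w_j \propto \alpha(1-\alpha)^{j-1}$ for the j-th nearest neighbor, is an assumption chosen for illustration because the figure description does not give the formula; the function names are hypothetical.

```python
def knn_weights(k, n_neighbors):
    """Conventional k-NN: weight 1/k on the k nearest neighbors, zero on the rest."""
    if not 1 <= k <= n_neighbors:
        raise ValueError("k must be between 1 and n_neighbors")
    return [1.0 / k if j < k else 0.0 for j in range(n_neighbors)]


def exp_knn_weights(alpha, n_neighbors):
    """Assumed exponential form: w_j proportional to alpha * (1 - alpha)**(j - 1), normalized."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    # Index 0 is the nearest neighbor, so the exponent here is j - 1 in 1-based terms.
    raw = [alpha * (1.0 - alpha) ** j for j in range(n_neighbors)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_estimate(values_by_distance, weights):
    """Weighted average of neighbor values ordered from nearest to farthest."""
    return sum(v * w for v, w in zip(values_by_distance, weights))
```

Under this assumed form, a smaller alpha flattens the weights and spreads influence over more neighbors, loosely analogous to increasing k in the conventional algorithm.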

DETAILED DESCRIPTION

In the following description, certain specific details are set forth in order to provide a thorough understanding of various disclosed implementations. However, one skilled in the relevant art will recognize that implementations may be practiced without one or more of these specific details, or with other methods, components, materials, etc. In other instances, well-known structures associated with computer systems, server computers, and/or communications networks have not been shown or described in detail to avoid unnecessarily obscuring descriptions of the implementations. Unless the context requires otherwise, throughout the specification and claims which follow, the word "comprise" and variations thereof, such as "comprises" and "comprising," are to be construed in an open, inclusive sense, that is as "including, but not limited to."

Reference throughout this specification to "one implementation" or "an implementation" means that a particular feature, structure or characteristic described in connection with the implementation is included in at least one implementation. Thus, the appearances of the phrases "in one implementation" or "in an implementation" in various places throughout this specification are not necessarily all referring to the same implementation. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more implementations.

As used in this specification and the appended claims, the singular forms "a," "an," and "the" include plural referents unless the content clearly dictates otherwise. It should also be noted that the term "or" is generally employed in its sense including "and/or" unless the content clearly dictates otherwise.

Rates of different health conditions, physical characteristics, conditions, or traits may be measured for a given geographical region in different ways. For example, type 2 diabetes and chronic lung disease may be measured in terms of prevalence. Coronary artery disease and stroke may be measured by the number of hospitalizations of Medicare patients per year. Prostate cancer may be measured by incidence, i.e., the number of people diagnosed with prostate cancer per year. Unless the context requires otherwise, any reference throughout this specification to prevalence, or to any other specific manner of measuring a given health condition, physical characteristic, condition, or trait, describes an example implementation; any other way of measuring, such as those discussed herein, may be used instead.

The headings and Abstract of the Disclosure provided herein are for convenience only and do not interpret the scope or meaning of the implementations.

Figure 1 shows a networked environment 100 in which hosted services are provided by a service entity 102 to a plurality of end user client entities 104a-104n (only five shown, collectively 104), according to one illustrated implementation. While generally described herein in terms of hosted services or software as a service (SaaS) delivery, the various apparatus, methods and articles described herein can be provided in other models, for instance as a standalone programmed end user processor-based device that executes the algorithms set out herein using one or more locally stored data sets, or as a programmed end user processor-based device that executes the algorithms set out herein using one or more remotely stored data sets. Thus, the description herein shall not be considered as limiting the various implementations to a hosted services model or infrastructure.

The service entity 102 operates one or more hosted services systems 106.

The hosted services system(s) 106 is (are) communicatively coupled or communicatively coupleable via one or more networks 108 to one or more processor-based devices 110a-110n (five shown, collectively 110) associated with the end user client entities 104.

The service entity 102 may take any of a variety of forms. For example, the service entity 102 may take the form of an individual or business that provides hosted services, for instance hosted services that allow end user entities to specify criteria to generate estimates and/or forecasts of a prevalence of at least one of one or more physical characteristics, conditions or traits of people and/or objects in a population or sub-population for a given period of time or date and allow end user entities to define a geographical region to generate estimates or forecasts of a health condition patient count, a physical characteristics population count, a condition or trait population count or object count, or any combination thereof in a population or sub-population of the user defined geographical region for a first time period and a different second time period to produce estimates or forecasts of a net present value and/or return on investment of one or more defined interventions for the health condition, the physical characteristics, and/or the condition or trait.

The hosted services advantageously estimate or forecast a number of new counts of a health condition, a physical characteristic, and/or a condition or trait of interest in a user defined geographical region for a given time period (e.g., day, week, two weeks, month, two months, quarter, half year, year, two years, five years) out from a first time period (e.g., daily, weekly, biweekly, monthly, bimonthly, quarterly, semi-annually, annually, biannually). Taking the health condition as an example, the hosted services advantageously determine a population or sub-population of the user defined geographical region for the first time period. A health condition patient count in the population or sub-population is determined for the first time period. A forecasted health condition patient count in the population or sub-population is determined for the given time period out from the first time period. The health condition patient count for the first time period is subtracted from the forecasted health condition patient count for the given time period (e.g., one year) out from the first time period to estimate or forecast the number of new counts of the health condition for the population or sub-population over that period.

For example, the health condition patient count in the population or sub-population for the first time period may be determined based on a prevalence of the health condition for the first time period. The forecasted health condition patient count in the population or sub-population for the given time period out from the first time period additionally or alternatively may be determined based on a forecasted prevalence of the health condition for that given time period. In this case, the prevalence of the health condition for the first time period is subtracted from the forecasted prevalence of the health condition for the given time period out from the first time period, and the result is multiplied by the population or sub-population to estimate or forecast the number of new counts of the health condition for the population or sub-population over that period.
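The two ways of computing new counts described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names are hypothetical, and the prevalence-based form assumes that a single population or sub-population figure applies to both periods, as the multiplication by one population implies.

```python
def new_cases_from_counts(count_first_period: float, count_forecast: float) -> float:
    """New counts = forecasted patient count minus the first-period patient count."""
    return count_forecast - count_first_period


def new_cases_from_prevalence(prevalence_first_period: float,
                              prevalence_forecast: float,
                              population: float) -> float:
    """New counts = (forecasted prevalence - first-period prevalence) x population.

    Assumes the same population figure applies to both periods.
    """
    return (prevalence_forecast - prevalence_first_period) * population


if __name__ == "__main__":
    # Illustrative numbers only: 100,000 residents, prevalence rising from 9.0% to 9.5%.
    population = 100_000
    print(new_cases_from_prevalence(0.090, 0.095, population))  # about 500 new cases
    print(new_cases_from_counts(9_000, 9_500))                  # 500
```

With a constant population the two forms agree, since multiplying each prevalence by the population recovers the corresponding patient count.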

The hosted services advantageously determine a prone count for the population or sub-population of the user defined geographical region, that is, an estimate or forecast of the number of people prone to the health condition in the population or sub-population. For example, if the health condition is diabetes and 5%-10% of prediabetics develop full-blown diabetes annually, then the hosted services advantageously estimate or forecast a number of prediabetics in the population or sub-population of the user defined geographical region. For example, the hosted services may divide the number of new counts of the health condition for the population or sub-population for the year out from the first time period by the conservative annual conversion rate of 5% to estimate or forecast the number of prediabetics in the population or sub-population of the user defined geographical region.
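A minimal sketch of the prone-count division described above. The function name and the range check are illustrative assumptions; the reasoning is that if a fraction `rate` of the prone group converts each year, then prone count x rate = new cases.

```python
def estimate_prone_count(new_cases: float, annual_conversion_rate: float) -> float:
    """Estimate how many people are prone to a condition (e.g., prediabetics).

    If a fraction `annual_conversion_rate` of the prone group develops the
    full condition each year, then prone_count * rate = new_cases, so
    prone_count = new_cases / rate.
    """
    if not 0.0 < annual_conversion_rate <= 1.0:
        raise ValueError("annual_conversion_rate must be in (0, 1]")
    return new_cases / annual_conversion_rate


if __name__ == "__main__":
    # 500 new diabetes cases per year at a 5% annual conversion rate.
    print(estimate_prone_count(500, 0.05))  # about 10,000 prediabetics
```

Using the upper end of the 5%-10% range instead (0.10) would give about 5,000 prediabetics, so the 5% figure yields the larger of the two estimates.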

The hosted services may advantageously estimate or forecast a per capita net present value (or total population or sub-population net present value) of one or more interventions (e.g., preventive actions or actions to reduce incidence) for the health condition. For example, if a prediabetic's risk of developing diabetes is reduced by 58% responsive to losing 7% of the prediabetic's weight, then the hosted services advantageously determine that 58% is the reduction rate of an intervention (e.g., 7% weight loss) applied to all prediabetics in the user defined geographical region. The reduction rate may be input by a user. The hosted services determine a total number of years over which the cost of the intervention will be spread, which may be input by the user. The hosted services determine a per person cost of the intervention, which may be input by the user. The hosted services determine an annual cost of treating a given individual with the health condition, which may also be input by the user. From this data, the hosted services advantageously estimate or forecast a net present value of the intervention in the population or sub-population of the user defined geographical region, and convert the net present value of the intervention into a return on investment of the defined intervention. The hosted services advantageously produce an expected number of new diabetics (or new counts of another health condition, physical characteristic, condition, or trait) for the year following the first time period, a number of diabetes (or other health condition, physical characteristic, condition, or trait) cases that the intervention is expected to prevent, total present savings of the intervention, per capita present savings of the intervention, and the return on investment of the intervention, which may be expressed as a percentage.
The necessary data may not be available for the user defined geographical region or for the entirety of the user defined geographical region. In this case, for example, the hosted services may advantageously generate, estimate, and/or forecast the necessary data.
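The specification names the inputs to the net present value and return on investment calculation (reduction rate, years over which cost is spread, per person cost, annual treatment cost) but not the formula. The sketch below is one hypothetical way to combine them, under loudly stated assumptions: the discount rate is an added parameter not found in the source; cases prevented equal new cases times the reduction rate; each prevented case avoids one year of treatment cost in every year of the horizon; and every prone individual receives the intervention, with its total cost spread evenly over the same horizon.

```python
from dataclasses import dataclass


@dataclass
class InterventionEstimate:
    new_cases: float        # expected new cases without the intervention
    cases_prevented: float  # new_cases x reduction_rate
    pv_savings: float       # present value of avoided treatment costs
    pv_cost: float          # present value of intervention costs
    npv: float              # pv_savings - pv_cost
    roi_percent: float      # npv / pv_cost, as a percentage


def present_value(annual_amount: float, years: int, discount_rate: float) -> float:
    """Present value of equal end-of-year amounts over `years` years."""
    return sum(annual_amount / (1.0 + discount_rate) ** t for t in range(1, years + 1))


def evaluate_intervention(new_cases: float, prone_count: float, reduction_rate: float,
                          years: int, cost_per_person: float,
                          annual_treatment_cost: float,
                          discount_rate: float = 0.03) -> InterventionEstimate:
    """Hypothetical NPV/ROI model; discount_rate and cash-flow timing are assumptions."""
    if years < 1:
        raise ValueError("years must be at least 1")
    cases_prevented = new_cases * reduction_rate
    # Assumption: each prevented case avoids a year of treatment cost in every year.
    pv_savings = present_value(cases_prevented * annual_treatment_cost, years, discount_rate)
    # Assumption: all prone individuals are treated; total cost spread evenly over `years`.
    total_cost = prone_count * cost_per_person
    pv_cost = present_value(total_cost / years, years, discount_rate)
    npv = pv_savings - pv_cost
    roi_percent = 100.0 * npv / pv_cost if pv_cost else float("nan")
    return InterventionEstimate(new_cases, cases_prevented, pv_savings, pv_cost, npv, roi_percent)


if __name__ == "__main__":
    # Illustrative inputs only; the cost figures are not from the source.
    result = evaluate_intervention(new_cases=500, prone_count=10_000, reduction_rate=0.58,
                                   years=5, cost_per_person=500.0,
                                   annual_treatment_cost=9_000.0, discount_rate=0.0)
    print(result.cases_prevented, result.npv, result.roi_percent)  # ROI of about 161%
```

Because both the savings and the cost are modeled as level annual amounts over the same horizon, the discount rate scales the NPV in this simplified model but leaves the ROI percentage unchanged; a model with differently timed cash flows would not have this property.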

For example, the service entity 102 may take the form of an individual or business that provides hosted services, for instance hosted services that allow end user entities to specify criteria to generate estimates and/or forecasts of a prevalence of at least one of one or more physical characteristics, conditions or traits of people and/or objects in a population or sub-population for a given period of time or date and allow end user entities to define a geographical region to generate estimates or forecasts of a health condition patient count, a physical characteristics population count, a condition or trait population count or object count, or any combination thereof in a population or sub-population of the user defined geographical region and to produce estimates or forecasts of costs of treatment for the count(s).

The hosted services advantageously generate estimates of respective population sizes or sub-population sizes of user defined geographical regions. For example, a first data set may include samples of population or sub-population counts represented at a first geographical region level or type (e.g., census block group) while the hosted services assist users in determining at least one geographical region of interest. While users may select at least one given geographical region of a given geographical region level or type as the geographical region(s) of interest, the geographical region of interest is not limited to being defined by borders of the given geographical region level or type. Thus, the geographical region(s) of interest may be user defined (e.g., user created or specified boundary). For example, the geographical region of interest may or may not be the given geographical region of the given geographical region level or type. The geographical region of interest may be at least one portion of the given geographical region of the given geographical region level or type; each of the at least one portion may be adjacent to at least another one of the at least one portion or not adjacent to any other one of the at least one portion. The geographical region of interest may be more than one given geographical region of the given geographical region level or type; each of the more than one given geographical region may be adjacent to at least another one of the more than one given geographical region or not adjacent to any other one of the more than one given geographical region.
The geographical region of interest may encompass an entirety of the given geographical region of the given geographical region level or type and encompass at least one respective portion of at least one other given geographical region of the given geographical region level or type; each of the at least one other given geographical region may be adjacent to the given geographical region or not adjacent to the given geographical region and may be adjacent to at least one other one of the at least one other given geographical region or not adjacent to any other one of the at least one other given geographical region; each of the at least one respective portion of the at least one other given geographical region may be adjacent to the given geographical region or not adjacent to the given geographical region, may be adjacent to at least one other one of the at least one other given geographical region or not adjacent to any other one of the at least one other given geographical region, and may be adjacent to at least one other one of the at least one respective portion of the at least one other given geographical region or may not be adjacent to any other one of the at least one respective portion of the at least one other given geographical region.

The hosted services advantageously assist a user in determining a geographical region of interest. For example, the user may select at least one given geographical region of at least one given geographical region level or type as the geographical region of interest, in which case the geographical region of interest is defined by the boundaries of the selected geographical region(s). Additionally or alternatively, the geographical region of interest may be user defined. For example, the user may draw on a map, thereby circumscribing the geographical region of interest. When the user completes the drawing, the hosted services advantageously determine polygon vertex coordinates that define the geographical region of interest; for example, the polygon vertex coordinates may define boundaries of at least one polygon, the boundaries defining the user defined geographical region. The boundaries of the region of interest (whether user selected, user defined, or both) are compared to a map of the geographical regions of the at least one geographical region level or type to identify those geographical regions that are either completely or partially included in the region of interest. If a geographical region of the at least one geographical region level or type is completely contained in the user defined or selected geographical region, then the population estimate for that geographical region is added to a population estimate for the user defined or selected geographical region.
If only a portion of a geographical region of the at least one geographical region level or type is included in the user defined or selected geographical region, then it is determined what percentage of the total area of that geographical region the included portion constitutes. The same percentage is applied to the total population of that geographical region to produce an estimated population of the portion that is included in the user defined or selected geographical region. This estimated population of the included portion is added to the population estimate for the user defined or selected geographical region. This is performed for each geographical region of the at least one geographical region level or type that is entirely or partially included in the user defined or selected geographical region to generate estimates of respective population sizes or sub-population sizes of user defined or selected geographical regions.
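The area-weighted apportionment described above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: coordinates are treated as planar (real latitude/longitude boundaries would first need an area-preserving projection), each census block group is assumed convex with counter-clockwise vertices as Sutherland-Hodgman clipping requires, and all names are hypothetical. A production system would more likely use a GIS geometry library such as Shapely.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def polygon_area(poly: Sequence[Point]) -> float:
    """Shoelace formula: absolute area of a simple polygon."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def clip_polygon(subject: Sequence[Point], clip: Sequence[Point]) -> Polygon:
    """Sutherland-Hodgman clipping of `subject` against a convex,
    counter-clockwise `clip` polygon; returns the overlap polygon."""

    def inside(p: Point, a: Point, b: Point) -> bool:
        # p lies on or to the left of the directed edge a -> b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersection(p: Point, q: Point, a: Point, b: Point) -> Point:
        # Point where segment p-q crosses the line through a and b
        (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
        den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

    output: Polygon = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        candidates, output = output, []
        for j in range(len(candidates)):
            cur, prev = candidates[j], candidates[j - 1]
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersection(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersection(prev, cur, a, b))
        if not output:
            break  # no overlap left to clip
    return output


def estimate_region_population(region: Sequence[Point],
                               block_groups: Sequence[Tuple[Polygon, float]]) -> float:
    """Area-weighted population of a user drawn region.

    Fully contained block groups contribute their whole population; partially
    contained ones contribute population x (overlap area / block group area).
    """
    estimate = 0.0
    for boundary, population in block_groups:
        overlap = clip_polygon(region, boundary)
        if len(overlap) < 3:
            continue  # no overlap with this block group
        estimate += population * polygon_area(overlap) / polygon_area(boundary)
    return estimate


if __name__ == "__main__":
    # Two adjacent 1 x 1 block groups with illustrative populations.
    bg_a = ([(0, 0), (1, 0), (1, 1), (0, 1)], 1000.0)
    bg_b = ([(1, 0), (2, 0), (2, 1), (1, 1)], 400.0)
    # User drawn rectangle covering all of A and half of B.
    region = [(0, 0), (1.5, 0), (1.5, 1), (0, 1)]
    print(estimate_region_population(region, [bg_a, bg_b]))  # about 1200
```

A fully contained block group yields an overlap fraction of one, so the same loop covers both the complete and the partial inclusion cases.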

A new map layer is created in which each geographical region of at least one geographical region level or type that is partially included in the user defined or selected geographical region has all attributes associated with the entire geographical region of at least one geographical region level or type, yet the attributes are reduced proportionally to the respective partially included portion of each geographical region of at least one geographical region level or type in similar fashion to the estimated population of the portion of the at least one geographical region of at least one geographical region level or type that is included in the user defined or selected geographical region. This is described in further detail below.
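The proportional reduction of attributes in the new map layer can be sketched as follows, assuming (as an illustration) that the attributes are additive counts such as population or households, for which scaling by the included area fraction mirrors the population apportionment. The function name and attribute names are hypothetical.

```python
from typing import Dict


def scale_partial_attributes(attributes: Dict[str, float],
                             area_fraction: float) -> Dict[str, float]:
    """Copy a geographical region's count attributes, reduced in proportion to
    the fraction of its area inside the user defined or selected region."""
    if not 0.0 <= area_fraction <= 1.0:
        raise ValueError("area_fraction must be between 0 and 1")
    return {name: value * area_fraction for name, value in attributes.items()}


if __name__ == "__main__":
    # A block group half inside the region keeps all attributes, each halved.
    print(scale_partial_attributes({"population": 400.0, "households": 150.0}, 0.5))
    # {'population': 200.0, 'households': 75.0}
```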

The hosted services may advantageously include generating an estimate of a count of individuals that have a given disease or condition of interest. The estimate is a sum over each of the at least one geographical region of the at least one geographical region level or type that is entirely or partially included in the user defined or selected geographical region. At least some of the terms contributing to the sum are respective estimates of respective rates of the disease or condition in each of the at least one geographical region of the at least one geographical region level or type that is entirely or partially included in the user defined or selected geographical region. From the respective estimates of respective rates, the hosted services estimate respective numbers of residents that have the disease or condition in the user defined or selected geographical region. For example, the hosted services may apply the respective estimates of the respective rates to the estimated population of each portion of the at least one geographical region of at least one geographical region level or type that is included in the user defined or selected geographical region to produce an estimated patient count for each portion of the at least one geographical region of at least one geographical region level or type that is included in the user defined or selected geographical region. The estimated patient count for each portion of the at least one geographical region of at least one geographical region level or type that is included in the user defined or selected geographical region is added to a total estimated patient count for the given disease or condition of interest in the user defined or selected geographical region. This may be performed for each of the at least one geographical region of at least one geographical region level or type that is partially or entirely included in the user defined or selected geographical region.
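The summation above can be sketched as a weighted sum, assuming each rate is expressed as a proportion (e.g., a prevalence of 0.10) for the entire geographical region that a portion belongs to. The function name and the input shape are hypothetical.

```python
from typing import Iterable, Tuple


def estimate_patient_count(portions: Iterable[Tuple[float, float]]) -> float:
    """Sum of (estimated population of each included portion x rate for its region).

    `portions` yields (portion_population, rate) pairs, where rate is the
    estimated rate of the disease or condition in the entire geographical
    region that the portion belongs to.
    """
    total = 0.0
    for portion_population, rate in portions:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be a proportion between 0 and 1")
        total += portion_population * rate
    return total


if __name__ == "__main__":
    # Illustrative: all of one block group (1,000 people, 10% rate) plus half of
    # another (200 people in the included portion, 12% rate).
    print(estimate_patient_count([(1000.0, 0.10), (200.0, 0.12)]))  # about 124
```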

The respective estimated number of residents that have the disease or condition in the entire at least one geographical region of the at least one geographical region level or type may not be available for the at least one geographical region level or type. In this case, for example, the hosted services may advantageously generate the respective estimated number of residents that have the disease or condition in the entire at least one geographical region of the at least one geographical region level or type.

The hosted services advantageously generate the estimates and/or forecasts from two or more sets of data (i.e., data sets) which have respective granularities that differ from one another, and hence are heterogeneous data sets. For example, a first data set may include samples of a prevalence of physical characteristics, conditions or traits of people and/or objects in a population or sub-population represented at a first geographic region level or type (e.g., census block group), while a second data set may include samples of a prevalence of physical characteristics, conditions or traits of people and/or objects in a population or sub-population represented at a second geographic region level or type (e.g., county); in at least some instances the second geographic region level or type is different from the first geographic region level or type. For example, in some instances the second geographic region level or type may encompass all of one or more geographic regions of the first geographic region level or type. Thus, the population of a specific geographic region of the first geographic region level or type may belong to the population of a specific geographic region of the second geographic region level or type. Also for example, in some instances the second geographic region level or type may encompass a portion of one or more geographic regions of the first geographic region level or type, the geographic regions of the first geographic region level or type being split between two or more geographic regions of the second geographic region level or type.
Thus, a portion of the population of a specific geographic region of the first geographic region level or type may belong to the population of a first specific geographic region of the second geographic region level or type, while another portion of the population of that specific geographic region of the first geographic region level or type may belong to the population of a second specific geographic region of the second geographic region level or type. As yet a further example, in some instances it is possible that a specific geographic region of the second geographic region level or type may be coterminous or coextensive with a specific geographic region of the first geographic region level or type, the populations of the specific geographic regions of the first geographic region level or type and the second geographic region level or type being identical. Nevertheless, the first and second data sets are heterogeneous.

The data sets can represent a variety of physical characteristics, conditions or traits of people and/or objects in a population or sub-population, typically sampled at a defined time or over a defined period. As generally described herein, one or more data sets can include data that represents one or more demographic characteristics, conditions or traits. As generally described herein, one or more data sets can include data that represents one or more non-demographic characteristics, conditions or traits. As generally described herein, one or more data sets can include data that represents one or more immutable characteristics, conditions or traits. As generally described herein, one or more data sets can include data that represents one or more mutable characteristics, conditions or traits.

In some instances, demographic information may take the form of immutable characteristics or traits such as age, race, ethnicity, gender, height, natural hair color, and/or eye color, etc. In some instances, demographic information may additionally or alternatively take the form of mutable characteristics or traits, such as weight, body type, marital status, income, and/or education level, etc.

The data in the data sets can be related directly or indirectly to health and/or incidence or prevalence of a health-related condition or disease. An example used for illustration is data related to the estimation and forecasting of prevalence of type 2 diabetes in a population.

As an example, a first data set comprises data regarding at least one immutable characteristic. For instance, the first data set comprises data regarding at least one of: a gender, an ethnicity, or an age class for individuals in a population or sub-population represented at a geographic region or unit of a first geographic region level or type (e.g., census block). As an example, a second data set comprises data regarding at least one non-demographic mutable characteristic. Also for instance, the second data set comprises data regarding at least one of: an obesity level or an activity level for individuals in the respective population or sub-population represented at a geographic region or unit of a second geographic region level or type (e.g., county, state). Also for instance, the second data set can comprise data regarding one or more of: a percentage of the respective geographic area of the second type of geographic area or level that is rural, a total population of the respective geographic area of the second type of geographic area or level, or a total number of individuals by age class that have a defined condition in the respective geographic area of the second type of geographic area or level. In some instances, the first data set can be sampled, generated and/or made available by a first entity (e.g., U.S. Census Bureau), while the second data set can be sampled, generated and/or made available by a second entity (e.g., private organization, non-profit organization, state public health administration or authority), different from the first entity.

The hosted services may include generating a third data set or federated data set from two or more data sets of different granularities. The hosted services may include generating estimates of a prevalence of one or more physical characteristics, conditions or traits of people and/or objects in a population or sub-population for a given sampling as represented by two or more data sets of different granularities. The hosted services may include generating forecasts of a prevalence of one or more physical characteristics, conditions or traits of people and/or objects in a population or sub-population for a given period of time or date based on two or more data sets of different granularities. The hosted services may include generating estimates and/or forecasts of a prevalence of one or more physical characteristics, conditions or traits of people and/or objects for end user specified geographic regions, which may or may not coincide with geographic regions of the first or the second geographic region levels or type. Thus, an end user can specify a geographic region that is not coextensive with the geographic region of one or more of the data sets or that is not coextensive with the geographic region of any of the source data sets, and the hosted services will generate estimates and/or forecasts of a prevalence of one or more physical characteristics, conditions or traits of people and/or objects for the end user specified geographic region(s). Further, an end user can specify one or more physical characteristics, conditions or traits of people and/or objects and the hosted services will generate estimates and/or forecasts of a prevalence of the end user specified one or more physical characteristics, conditions or traits of people and/or objects.
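One possible way to build a third or federated data set searchable by census block group code is sketched below. It relies on the structure of a 12-digit block group GEOID (2-digit state, 3-digit county, 6-digit tract, 1-digit block group) and on block groups nesting within a single county; the function names, the `county_` prefix, and the attribute values are hypothetical, and the codes in the example are deliberately fictitious.

```python
from typing import Any, Dict

Record = Dict[str, Any]


def county_key(block_group_code: str) -> str:
    """County FIPS code (state + county digits) from a 12-digit block group GEOID."""
    if len(block_group_code) != 12 or not block_group_code.isdigit():
        raise ValueError(f"not a 12-digit block group code: {block_group_code!r}")
    return block_group_code[:5]


def build_federated_data_set(block_groups: Dict[str, Record],
                             counties: Dict[str, Record]) -> Dict[str, Record]:
    """Attach county-level attributes to every block group record.

    County attributes are prefixed with 'county_' so they cannot overwrite
    block group attributes of the same name. Block groups whose county is
    missing from `counties` keep only their own attributes.
    """
    federated: Dict[str, Record] = {}
    for code, bg_attrs in block_groups.items():
        county_attrs = counties.get(county_key(code), {})
        record: Record = {f"county_{name}": value for name, value in county_attrs.items()}
        record.update(bg_attrs)
        federated[code] = record
    return federated


if __name__ == "__main__":
    # Fictitious codes and values, for illustration only.
    block_groups = {"990010001001": {"population": 1200, "median_age": 41.0}}
    counties = {"99001": {"obesity_rate": 0.25, "pct_rural": 0.1}}
    federated = build_federated_data_set(block_groups, counties)
    print(federated["990010001001"]["county_obesity_rate"])  # 0.25
```

Keying every record by the finest-grained code lets the coarser county attributes be looked up for any block group without a separate join at query time.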

As discussed above, the hosted services may include generating the estimate of the count of individuals that have the given disease or condition of interest in the user defined geographical region. The hosted services may, for example, generate the estimate of the count via conversion of the estimated population of the user defined geographical region to the estimate of the count based at least in part on the above discussed generated estimates of respective rates of the disease or condition of interest in the entirety of each at least one geographical region of the at least one geographical region level or type that is entirely or partially included in the user selected or user defined geographical region.

The hosted services may advantageously convert the estimated count of individuals affected by the disease or condition of interest in the user defined geographical region to an estimated or forecasted annual cost (or a cost for another period of interest, e.g., daily, weekly, biweekly, monthly, quarterly, semi-annually, or bi-annually) of treating the estimated count of individuals affected by the disease or condition of interest in the user defined geographical region. This conversion is performed based at least in part on a total cost, for the same period, of treating the disease or condition of interest in the user defined geographical region. That total estimated or forecasted cost may be specified by the user.

Each end user client entity 104 may be logically or otherwise associated with one or more processor-based devices 110a-110n, at least when interacting with the hosted service. The processor-based devices 110 may take any of a large variety of forms, including but not limited to personal computers (e.g., desktop computers 110a, 110c, 110k, or laptop computers 110e), netbook computers 110i, tablet computers 110f, 110h, smart phones 110b, 110d, 110g, 110j, workstation computers 110n, and/or mainframe computers (not shown), and the like.

Notably, some end user client entities 104c, 104n may be logically associated with a single processor-based device 110g, 110n, respectively. In many instances, each respective end user client entity 104a, 104b, 104d may be logically associated with two or more processor-based devices. The logical association may be established via an account record or other data structure which may be set up when the end user client entity 104 registers with the service entity 102. For example, an account may be set up for the end user client entity 104, which specifies device address information (e.g., uniform resource locator or URL, phone number, SIM mobile subscriber identifier, mobile equipment identifier, MAC address) for one or more processor-based devices 110. The logical association may be established on an ad hoc basis, for example in response to an end user client entity 104 logging into a portal (e.g., Web portal) using one or more applications (e.g., browser) executed on or by one of the processor-based devices 110. Such may include the entering of a user name and a password by the end user client entity 104, and verification of the user name and password with an end user client entity account by the hosted services systems 106. Such ad hoc logical associations may be temporary, and may move from one processor-based device 110 to another processor-based device 110 as the particular end user client entity 104 moves.

The processor-based devices 110 are capable of communications, for example via one or more networks 108 (e.g., Wide Area Networks, Local Area Networks), for instance packet switched communications networks, such as the Internet, the World Wide Web portion of the Internet, extranets, intranets, and/or various other types of telecommunications networks such as cellular phone and data networks, and plain old telephone system (POTS) networks. The type of communications infrastructure should not be considered limiting. One or more communications interface devices 112a-112d (four shown, collectively 112) may provide communications between the processor-based devices 110 and the network(s) 108. The communications interface devices 112 may take any of a large variety of forms, including modems (e.g., DSL modem, cable modem), routers, network switches, and/or bridges, etc.

The hosted services system 106 may have one or more hosted services server computers 114 (only one illustrated) to provide electronic communications either externally from and/or internally within the service entity 102. To handle the load of multiple end user client entities 104, the hosted services system 106 will typically have more than one hosted services server computer system 114. The hosted services system 106 may include one or more terminals or personal computers 116 (only one shown), communicatively coupled to the hosted services server computer 114 via one or more wired or wireless networks 118 (only one shown). The terminals or personal computers 116 allow input and output by an end user (e.g., employee or contractor of the hosted services entity 102).

The hosted services system 106 includes at least one nontransitory computer- or processor-readable storage medium 120 (e.g., hard drive, solid state drive, RAID, RAM). The nontransitory computer- or processor-readable storage medium 120 stores processor-executable instructions and/or data, facilitating the generation of an estimated or forecasted health condition patient count in a population of a user selected or defined geographic region for a first time period, generation of an estimated or forecasted health condition patient count in the population for a given time period out from the first time period, and/or generation of an estimated or forecasted net present value, per capita cost, and/or return on investment of one or more interventions for the health condition. The interventions can take one or more of a large variety of forms, for example: weight loss of a given amount or percentage; change in frequency or amount of exercise; change in diet; change in consumption of alcohol; reducing or stopping smoking or chewing of tobacco products; reduction in stress levels via changes in work or lifestyle (e.g., practicing meditation); periodic use of therapeutic agents (e.g., baby aspirin; statins); reduction in exposure to carcinogens or toxic materials, for instance in populations "downwind" or downstream from known or suspected carcinogen or toxic material point sources. The nontransitory computer- or processor-readable storage medium 120 stores further processor-executable instructions and/or data, further facilitating the generation of an estimate of a population of a user defined geographical region, generation of an estimate of a count of individuals that have a given disease or condition of interest in the user defined geographical region, and/or generation of an estimate or forecast of an annual cost of treating the estimated count of individuals that have the given disease or condition of interest in the user defined geographical region.

The nontransitory computer- or processor-readable storage medium 120 stores further processor-executable instructions and/or data, further facilitating the generation of a third or "unified" data set or federated data set, generation of estimates of prevalence of a physical characteristic, condition or trait, and/or generation of forecasts of prevalence of a physical characteristic, condition or trait based on two or more heterogeneous data sets and on end user specified input (e.g., specifying a geographic region, specifying a time or period of time, and/or specifying physical characteristics, conditions or traits).

In most implementations, one or more data sets will be stored by the hosted services server computer 114 and/or computer- or processor-readable storage medium 120, for instance, in a third or "unified" data set, database or other data structure(s) generated from two or more source data sets. The hosted services server computer 114 may, from time to time, import or write data to the third or "unified" data set stored on the computer- or processor-readable storage medium 120. The hosted services server computer 114 may, from time to time, retrieve or extract data from the third or "unified" data set stored on the computer- or processor-readable storage medium 120.

For example, the hosted services server computer 114 may retrieve the aspects, attributes, characteristics, conditions or traits using a key that is common across the data of two or more data sets. The hosted services server computer 114 may retrieve the aspects, attributes, characteristics, conditions or traits in response to a query posed by the end user client entities 104. For example, the hosted services server computer 114 may retrieve the aspects, attributes or characteristics of particular end user client entities 104a in response to a query by one end user client entity seeking to establish a correlation between certain attributes represented in a first data set and certain attributes represented in a second data set of a different granularity than a granularity of the first data set. In response, the hosted services server computer 114 may retrieve the aspects, attributes or characteristics and provide such to the querying end user client entities 104.
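As a non-limiting sketch of relating data sets of different granularity through a common key, the following Python example assumes that the finer data set is keyed by the 12-digit U.S. Census block group code (GEOID), whose first five digits are the state and county FIPS codes, and that the coarser data set is keyed by that five-digit county code. The county-level value is attached to every block group the county contains, yielding a data set searchable by block group code. The data values, record fields and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnifiedRecord:
    """One row of a hypothetical unified data set keyed by block group code."""
    block_group_geoid: str    # 12 digits: state(2) + county(3) + tract(6) + block group(1)
    county_fips: str          # 5-digit state + county code shared with the coarser data set
    population: int           # fine-grained value from the first data set
    county_prevalence: float  # coarse-grained value from the second data set


def build_unified(
    block_group_population: dict[str, int],
    county_prevalence: dict[str, float],
) -> dict[str, UnifiedRecord]:
    """Attach county-level prevalence to each block group via the shared county key."""
    unified: dict[str, UnifiedRecord] = {}
    for geoid, population in block_group_population.items():
        county = geoid[:5]  # the first five digits identify state and county
        rate = county_prevalence.get(county)
        if rate is None:
            continue  # no coarse-level data for this county; leave it out
        unified[geoid] = UnifiedRecord(geoid, county, population, rate)
    return unified


unified = build_unified(
    {"530330001001": 1200, "530330001002": 800, "530610001001": 500},
    {"53033": 0.10},
)
```

Skipping block groups without a matching county row is one design choice; an alternative would be to keep them with a missing-value marker so that gaps remain visible to the querying end user.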

While illustrated as a single nontransitory computer- or processor-readable storage medium 120, in many implementations the nontransitory computer- or processor-readable storage medium 120 may constitute a plurality of nontransitory storage media. The plurality of nontransitory storage media may be located at a common location, or distributed at a variety of remote locations. Thus, the data sets or database may be implemented in one, or across more than one, nontransitory computer- or processor-readable storage media. Such data set(s) or database(s) may be stored separately from one another on separate computer- or processor-readable storage media 120 or may be stored on the same computer- or processor-readable storage medium 120 as one another. The computer- or processor-readable storage medium 120 may be co-located with the hosted services server computer system 114, for example, in the same room, building or facility. Alternatively, the computer- or processor-readable storage medium 120 may be located remotely from the hosted services server computer system 114, for example, in a different facility, city, state or country. Electronic or digital information, files or records or other collections of information may be stored at specific locations in nontransitory computer- or processor-readable media 120, and thus occupy logically addressable portions of such media, which may or may not be contiguous.

While Figure 1 illustrates a representative networked environment 100, typical networked environments may include many additional computer systems and entities. The concepts taught herein may be employed in a similar fashion with more populated networked environments than that illustrated.

Figure 2 shows a networked environment 200 comprising one or more hosted services server computer systems 202 (only one illustrated) and one or more associated nontransitory computer- or processor-readable storage media 204 (only one illustrated). The associated nontransitory computer- or processor-readable storage medium 204 is communicatively coupled to the hosted services server computer system(s) 202 via one or more communications channels, for example, one or more parallel cables, serial cables, or wireless channels capable of high-speed communications, for instance, via FireWire®, Universal Serial Bus® (USB) 2 or 3, Thunderbolt®, and/or Gigabit Ethernet®.

The networked environment 200 also comprises one or more end user client entity associated processor-based systems 206 (only one illustrated). The end user client entity associated processor-based systems 206 are communicatively coupled to the hosted services server computer system(s) 202 by one or more communications channels, for example, one or more wide area networks (WANs) 210, for instance the Internet or Worldwide Web portion thereof.

In operation, the end user client entity associated processor-based systems 206 typically function as clients to the hosted services server computer system 202, and the hosted services server computer system 202 typically functions as a server to receive requests or information from the end user client entity associated processor-based systems 206.

The hosted services server computer systems 202 may access two or more data sets, with different granularities, and generate a third or unified data set or a federated data set using a common key. The hosted services server computer systems 202 may receive end user generated queries, which, for example, specify geographic areas, times or periods of time, physical characteristics, conditions or traits, and/or a population or sub-population. In response to a query, the hosted services server computer systems 202 can generate an estimate of a prevalence of a characteristic, condition or trait in the defined geographic area and/or for a defined population or sub-population. In response to a query, the hosted services server computer systems 202 can generate a forecast of a prevalence of a characteristic, condition or trait at a future time in the defined geographic area and/or for a defined population or sub-population. The population or sub-population can, for example, be specified by one or more demographic characteristics (e.g., age range, gender, ethnicity). The hosted services server computer systems 202 may receive end user input that specifies a geographic area in any of a variety of forms. For example, the user input can identify a census block by a census block identifier, a county by a county identifier, or a state by a state identifier, or can select any of the above from a user interface element such as a pull-down list or table. Advantageously, the user input can identify the geographic region via geometric elements (e.g., points, lines or curves) selected on a representation of a map by the end user via one or more pointing devices (e.g., mouse, trackball, trackpad, joystick, stylus).
The hosted services server computer systems 202 identify a geographic area defined or delineated by the geometric elements, and generate the estimate and/or forecast using the end user specified geographic area, even where the end user specified geographic area is not coextensive with any particular area in either the source data sets or the third or unified data set.
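One simple way, not necessarily the approach of this disclosure, to relate an end user drawn polygon to source data units that are not coextensive with it is to select each census block group whose representative point, for instance a centroid, falls inside the polygon. The following minimal Python sketch assumes planar coordinates and uses the standard even-odd ray casting test; an area-weighted apportionment of partially covered units would be a more precise alternative. The unit codes, coordinates and helper names are hypothetical.

```python
Point = tuple[float, float]


def point_in_polygon(point: Point, polygon: list[Point]) -> bool:
    """Even-odd ray casting: count polygon edges crossed by a ray cast to the right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def block_groups_in_region(centroids: dict[str, Point], polygon: list[Point]) -> list[str]:
    """Hypothetical helper: codes of the units whose centroid lies inside the drawn polygon."""
    return sorted(code for code, c in centroids.items() if point_in_polygon(c, polygon))


drawn_region = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
selected = block_groups_in_region(
    {"bg-1": (5.0, 5.0), "bg-2": (15.0, 5.0), "bg-3": (1.0, 9.0)},
    drawn_region,
)
```

The division is safe because an edge is only tested when its endpoints lie on opposite sides of the horizontal line through the point, so the two y-coordinates differ.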

The networked environment 200 may employ other computer systems and network equipment, for example, additional servers, proxy servers, firewalls, routers and/or bridges. The hosted services server computer systems 202 will at times be referred to in the singular herein, but this is not intended to limit the implementations to a single device, since in typical implementations there may be more than one hosted services server computer systems 202 involved. Unless described otherwise, the construction and operation of the various blocks shown in Figure 2 are of conventional design. As a result, such blocks need not be described in further detail herein, as they will be understood by those skilled in the relevant art.

The hosted services server computer systems 202 may include one or more processing units 212a, 212b (collectively 212), a system memory 214 and a system bus 216 that couples various system components, including the system memory 214 to the processing units 212. The processing units 212 may be any logic processing unit, such as one or more central processing units (CPUs) 212a, digital signal processors (DSPs) 212b, application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), etc. The system bus 216 can employ any known bus structures or architectures, including a memory bus with memory controller, a peripheral bus, and/or a local bus. The system memory 214 includes read-only memory ("ROM") 218 and random access memory ("RAM") 220. A basic input/output system ("BIOS") 222, which can form part of the ROM 218, contains basic routines that help transfer information between elements within the hosted services server computer system(s) 202, such as during start-up.

The hosted services server computer systems 202 may include a hard disk drive 224 for reading from and writing to a hard disk 226, an optical disk drive 228 for reading from and writing to removable optical disks 232, and/or a magnetic disk drive 230 for reading from and writing to magnetic disks 234. The optical disk 232 can be a CD-ROM, while the magnetic disk 234 can be a magnetic floppy disk or diskette. The hard disk drive 224, optical disk drive 228 and magnetic disk drive 230 may communicate with the processing unit 212 via the system bus 216. The hard disk drive 224, optical disk drive 228 and magnetic disk drive 230 may include interfaces or controllers (not shown) coupled between such drives and the system bus 216, as is known by those skilled in the relevant art. The drives 224, 228 and 230, and their associated computer-readable media 226, 232, 234, provide nonvolatile storage of computer-readable instructions, data structures, program modules and other data for the hosted services server computer system 202. Although the depicted hosted services server computer system 202 is illustrated employing a hard disk drive 224, optical disk drive 228 and magnetic disk drive 230, those skilled in the relevant art will appreciate that other types of computer-readable media that can store data accessible by a computer may be employed, such as solid state drives (SSDs), WORM drives, RAID drives, magnetic cassettes, flash memory cards, digital video disks ("DVD"), Bernoulli cartridges, RAMs, ROMs, smart cards, etc.

Program modules can be stored in the system memory 214, such as an operating system 236, one or more application programs 238, other programs or modules 240 and program data 242. Application programs 238 may include instructions that cause the processor(s) 212 to receive or access data in two or more data sets, at least two of the data sets having different levels of granularity from one another, for instance one data set representing information at a census block level and one data set representing information at a county or state level. Application programs 238 may include instructions that cause the processor(s) 212 to generate a third or unified data set or a federated data set, with at least one key in common to perform queries on the data. Application programs 238 may include instructions that cause the processor(s) 212 to automatically store the third or unified data set or federated data set to the associated nontransitory computer- or processor-readable storage medium 204. Application programs 238 may also include instructions that cause the processor(s) 212 to receive end user queries, which specify one or more search criteria. Search criteria can, for example, include a geographic region of interest, a time or time frame of interest (e.g., set of years), characteristics of a population or sub-population of interest (e.g., demographic information such as gender, age range, ethnicity), and/or one or more physical characteristics, conditions or traits that are of interest. Application programs 238 may also include instructions that cause the processor(s) 212 to automatically execute queries on the third or unified data set or federated data set and generate and provide results of queries to end users. 
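The search criteria listed above can be pictured as a small query object applied to the rows of the third or unified data set. The following Python sketch is illustrative only: the row fields, the use of age bands and the exact-match filtering are assumptions, not a schema taken from this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """Hypothetical row of the unified data set."""
    geoid: str        # geographic key, e.g. a census block group code
    year: int
    age_band: str     # e.g. "45-64"
    sex: str          # e.g. "F" or "M"
    condition: str    # e.g. "type 2 diabetes"
    population: int
    prevalence: float


@dataclass(frozen=True)
class Query:
    """End user search criteria; each field narrows the result set."""
    geoids: frozenset[str]
    years: range
    age_bands: frozenset[str]
    sexes: frozenset[str]
    condition: str


def run_query(rows: list[Row], q: Query) -> list[Row]:
    """Return the rows that satisfy every criterion in the query."""
    return [
        r for r in rows
        if r.geoid in q.geoids
        and r.year in q.years
        and r.age_band in q.age_bands
        and r.sex in q.sexes
        and r.condition == q.condition
    ]


rows = [
    Row("530330001001", 2015, "45-64", "F", "type 2 diabetes", 300, 0.11),
    Row("530330001001", 2015, "45-64", "M", "type 2 diabetes", 280, 0.13),
    Row("530330001001", 2010, "45-64", "F", "type 2 diabetes", 290, 0.10),
    Row("530330001002", 2015, "45-64", "F", "lung cancer", 150, 0.002),
]
query = Query(
    geoids=frozenset({"530330001001", "530330001002"}),
    years=range(2011, 2016),  # 2011 through 2015 inclusive
    age_bands=frozenset({"45-64"}),
    sexes=frozenset({"F"}),
    condition="type 2 diabetes",
)
matches = run_query(rows, query)
```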
Application programs 238 may also include instructions that cause the processor(s) 212 to automatically convert or map end user specified geographic areas to a suitable form to execute queries, even where those end user specified geographic areas are not coextensive with geographical units in which the source data is represented. Application programs 238 may further include instructions that cause the processor(s) 212 to receive or access population data, for example, population data estimated from data sets having different levels of granularity from one another. Application programs 238 may include instructions that cause the processor(s) 212 to generate an estimated population of the user defined geographical region. Application programs 238 may include instructions that cause the processor(s) 212 to generate an estimated or forecasted count of individuals affected by the disease or condition of interest in the user defined geographical region. Application programs 238 may include instructions that cause the processor(s) 212 to generate an estimated or forecasted annual (or other period of interest, e.g., daily, weekly, biweekly, monthly, quarterly, semi-annual, biannual) cost of treating the estimated or forecasted count of individuals affected by the disease or condition of interest in the user defined geographical region. Application programs 238 may include instructions that cause the processor(s) 212 to generate an estimated or forecasted health condition patient count in the user defined geographical region for a first time period. Application programs 238 may include instructions that cause the processor(s) 212 to generate an estimated or forecasted health condition patient count in the user defined geographical region for a second time period that is different from the first time period.
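By way of non-limiting example, once the user defined geographical region has been resolved to a set of small-area units, the estimated population, patient count and annual treatment cost could be aggregated as in the following Python sketch. It assumes that each unit carries a population and a prevalence rate and that a single average annual treatment cost per patient applies; these inputs and names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionEstimate:
    population: int
    patient_count: float
    annual_cost: float


def estimate_region(
    units: list[tuple[int, float]],
    annual_cost_per_patient: float,
) -> RegionEstimate:
    """Aggregate (population, prevalence) pairs for the units inside a region."""
    population = sum(pop for pop, _ in units)
    # Expected patients per unit = population x prevalence, summed over the region.
    patients = sum(pop * rate for pop, rate in units)
    return RegionEstimate(population, patients, patients * annual_cost_per_patient)


estimate = estimate_region([(1200, 0.10), (800, 0.125)], annual_cost_per_patient=9_500.0)
```

Summing unit-level expected counts, rather than applying one average rate to the total population, preserves differences in prevalence between the units that make up the region.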
Application programs 238 may include instructions that cause the processor(s) 212 to generate an estimated or forecasted number of new cases of the health condition that will occur during the second time period. Application programs 238 may include instructions that cause the processor(s) 212 to generate an estimated or forecasted portion of the population prone to the health condition based on the estimated or forecasted total number of new cases of the health condition that will occur during the second time period. Application programs 238 may include instructions that cause the processor(s) 212 to generate an estimated or forecasted cost savings of implementing the intervention(s) in the user selected or defined geographical region. Application programs 238 may include instructions that cause the processor(s) 212 to generate an estimated or forecasted net present value of one or more interventions for the health condition in the user defined geographical region. Application programs 238 may include instructions that cause the processor(s) 212 to generate an estimated or forecasted return on investment for implementing the intervention(s) in the user selected or defined geographical region. Application programs 238 may include instructions that cause the processor(s) 212 to generate an estimated or forecasted comparison of two different interventions, in terms of the above discussed cost measures, and/or in terms of rate of risk reduction and convert the comparison result into a humanly perceptible indication of the comparison result. Such is described in detail herein with reference to the various figures.
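The comparison of two interventions can be illustrated, under deliberately simple assumptions, with the following Python sketch. It assumes a constant annual incidence among an at-risk population that is not depleted over the period, a fixed relative risk reduction for each intervention, and a fixed cost per participant, and it converts the comparison into a short human-readable sentence. The intervention names, parameters and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intervention:
    name: str
    relative_risk_reduction: float  # fraction of new cases prevented (assumed constant)
    cost_per_participant: float


def expected_new_cases(at_risk: int, annual_incidence: float, years: int, rrr: float = 0.0) -> float:
    """New cases over the period; ignores depletion of the at-risk pool (simplification)."""
    return at_risk * annual_incidence * years * (1.0 - rrr)


def cost_per_case_averted(iv: Intervention, at_risk: int, annual_incidence: float, years: int) -> float:
    """Total program cost divided by the number of new cases it is expected to prevent."""
    baseline = expected_new_cases(at_risk, annual_incidence, years)
    with_iv = expected_new_cases(at_risk, annual_incidence, years, iv.relative_risk_reduction)
    averted = baseline - with_iv
    if averted <= 0:
        raise ValueError(f"{iv.name} averts no cases under these assumptions")
    return iv.cost_per_participant * at_risk / averted


def compare(a: Intervention, b: Intervention, at_risk: int, annual_incidence: float, years: int) -> str:
    """Return a humanly readable statement of which intervention is cheaper per case averted."""
    ca = cost_per_case_averted(a, at_risk, annual_incidence, years)
    cb = cost_per_case_averted(b, at_risk, annual_incidence, years)
    (best, best_cost), (other, other_cost) = sorted([(a, ca), (b, cb)], key=lambda p: p[1])
    return (f"{best.name} prevents a case for about ${best_cost:,.0f}, "
            f"versus ${other_cost:,.0f} for {other.name}.")


program_a = Intervention("Program A", 0.50, 100.0)
program_b = Intervention("Program B", 0.25, 30.0)
summary = compare(program_a, program_b, at_risk=1000, annual_incidence=0.02, years=5)
```

In this illustration the intervention with the larger risk reduction is not the more cost-effective one, which is the kind of trade-off such a comparison is meant to surface.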

The system memory 214 may also include communications programs, for example, a server 244 that causes the hosted services server computer system 202 to serve electronic information or files via the Internet, intranets, extranets, telecommunications networks, or other networks as described below. The server 244 in the depicted implementation is markup language based, such as Hypertext Markup Language (HTML), Extensible Markup Language (XML) or Wireless Markup Language (WML), and operates with markup languages that use syntactically delimited characters added to the data of a document to represent the structure of the document. A number of suitable servers may be commercially available such as those from Mozilla, Google, Microsoft and Apple Computer.

While shown in Figure 2 as being stored in the system memory 214, the operating system 236, application programs 238, other programs/modules 240, program data 242 and server 244 can be stored on the hard disk 226 of the hard disk drive 224, the optical disk 232 of the optical disk drive 228 and/or the magnetic disk 234 of the magnetic disk drive 230.

An operator can enter queries, commands and information into the hosted services server computer system(s) 202 through input devices such as a touch screen or keyboard 246 and/or a pointing device such as a mouse 248, and/or via a graphical user interface. Other input devices can include a microphone, joystick, game pad, tablet, scanner, etc. These and other input devices are connected to one or more of the processing units 212 through an interface 250 such as a serial port interface that couples to the system bus 216, although other interfaces such as a parallel port, a game port or a wireless interface or a universal serial bus ("USB") can be used. A monitor 252 or other display device is coupled to the system bus 216 via a video interface 254, such as a video adapter. The hosted services server computer system(s) 202 can include other output devices, such as speakers, printers, etc.

The hosted services server computer systems 202 can operate in a networked environment 200 using logical connections to one or more remote computers and/or devices. For example, the hosted services server computer systems 202 can operate in a networked environment 200 using logical connections to one or more end user client associated processor-based systems 206. Communications may be via a wired and/or wireless network architecture, for instance, wired and wireless enterprise-wide computer networks, intranets, extranets, and/or the Internet. Other implementations may include other types of communications networks including telecommunications networks, cellular networks, paging networks, and other mobile networks. There may be any variety of computers, switching devices, routers, bridges, firewalls and other devices in the communications paths between the hosted services server computer systems 202 and the end user client associated processor-based systems 206.

The end user client associated processor-based systems 206 will typically take the form of end user processor-based devices, for instance, personal computers (e.g., desktop or laptop computers), netbook computers, tablet computers, smart phones, personal digital assistants, workstation computers and/or mainframe computers, and the like, executing appropriate instructions. These end user client associated processor-based systems 206 may be communicatively coupled to one or more server computers. For instance, end user client processor-based systems 206 may be communicatively coupled externally via one or more end user client entity server computers (not shown), which may implement a firewall. The end user client entity server computers may execute a set of server instructions to function as a server for a number of end user client processor-based systems 206 (i.e., clients) communicatively coupled via a LAN at a facility or site, and thus act as intermediaries between the end user client processor-based systems 206 and the hosted services server computer system(s) 202. The end user client processor-based systems 206 may execute a set of client instructions to function as a client of the server computer(s), which are communicatively coupled via a WAN.

The end user client processor-based systems 206 may include one or more processing units 268, system memories 269 and a system bus (not shown) that couples various system components including the system memory 269 to the processing unit 268. The end user client processor-based systems 206 will at times each be referred to in the singular herein, but this is not intended to limit the implementations to single end user client processor-based systems 206. In typical implementations, there may be more than one end user client processor-based system 206 and there will likely be a large number of end user client processor-based systems 206.

The processing unit 268 may be any logic processing unit, such as one or more central processing units (CPUs), digital signal processors (DSPs), application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), graphics processing units (GPUs), etc. Non-limiting examples of commercially available processors include an 80x86 or Pentium series microprocessor from Intel Corporation, U.S.A., a PowerPC microprocessor from IBM, a Sparc microprocessor from Sun Microsystems, Inc., a PA-RISC series microprocessor from Hewlett-Packard Company, a 68xxx series microprocessor from Motorola Corporation, an ATOM processor, or an A8 or A9 processor. Unless described otherwise, the construction and operation of the various blocks of the end user client processor-based systems 206 shown in Figure 2 are of conventional design. As a result, such blocks need not be described in further detail herein, as they will be understood by those skilled in the relevant art.

The system bus can employ any known bus structures or architectures, including a memory bus with memory controller, a peripheral bus, and a local bus. The system memory 269 includes read-only memory ("ROM") 270 and random access memory ("RAM") 272. A basic input/output system ("BIOS") 271, which can form part of the ROM 270, contains basic routines that help transfer information between elements within the end user client computer systems 206, such as during start-up.

The end user client processor-based systems 206 may also include one or more media drives 273, e.g., a hard disk drive (HDD), solid state drive (SSD), magnetic disk drive, WORM drive, and/or optical disk drive, for reading from and writing to computer-readable storage media 274, e.g., hard disk, optical disks, and/or magnetic disks. The nontransitory computer-readable storage media 274 may, for example, take the form of removable media. For example, hard disks may take the form of a Winchester drive, and optical disks can take the form of CD-ROMs, while magnetic disks can take the form of magnetic floppy disks or diskettes. The media drive(s) 273 communicate with the processing unit 268 via one or more system buses. The media drives 273 may include interfaces or controllers (not shown) coupled between such drives and the system bus, as is known by those skilled in the relevant art. The media drives 273, and their associated nontransitory computer-readable storage media 274, provide nonvolatile storage of computer readable instructions, data structures, program modules and other data for the end user client processor-based systems 206. Although described as employing computer-readable storage media 274 such as hard disks, optical disks and magnetic disks, those skilled in the relevant art will appreciate that end user client processor-based systems 206 may employ other types of nontransitory computer-readable storage media that can store data accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks ("DVD"), Bernoulli cartridges, RAMs, ROMs, smart cards, etc. Data or information, for example, electronic or digital files or data or metadata related to such can be stored in the nontransitory computer-readable storage media 274.

Program modules, such as an operating system, one or more application programs, other programs or modules and program data, can be stored in the system memory 269. Program modules may include instructions for accessing a Website, extranet site or other site or services (e.g., Web services) and associated Webpages, other pages, screens or services hosted by the hosted services server computer system 114.

In particular, the system memory 269 may include communications programs that permit the end user client processor-based systems 206 to exchange electronic or digital information or files or data or metadata with the hosted services server computer system 202. The communications programs may, for example, be a Web client or browser that permits the end user client processor-based systems 206 to access and exchange information, files, data and/or metadata with sources such as Web sites of the Internet, corporate intranets, extranets, or other networks. Such may require that the end user client processor-based systems 206 have sufficient right, permission, privilege or authority for accessing a given Website, for example, one hosted by the hosted services server computer system(s) 202. The browser may, for example, be markup language based, such as Hypertext Markup Language (HTML), Extensible Markup Language (XML) or Wireless Markup Language (WML), and may operate with markup languages that use syntactically delimited characters added to the data of a document to represent the structure of the document. While described as being stored in the system memory 269, the operating system, application programs, other programs/modules, program data and/or browser can be stored on the computer-readable storage media 274 of the media drive(s) 273. An operator can enter commands and information into the end user client processor-based systems 206 via a user interface 275 through input devices such as a touch screen or keyboard 276 and/or a pointing device 277 such as a mouse. Other input devices can include a microphone, joystick, game pad, tablet, scanner, etc. These and other input devices are connected to the processing unit 268 through an interface such as a serial port interface that couples to the system bus, although other interfaces such as a parallel port, a game port or a wireless interface or a universal serial bus ("USB") can be used.
A display or monitor 278 may be coupled to the system bus via a video interface, such as a video adapter. The end user client processor-based systems 206 can include other output devices, such as speakers, printers, etc.

Figure 3A shows a map 300 which can be displayed as part of a user interface of the hosted services system(s) 106 (Figure 1), according to one illustrated implementation.

The map 300 typically graphically represents a physical geographic area. The map 300 includes markings that delineate various human-defined boundaries. For example, the map 300 includes a first set of markings (solid lines) 302a, 302b, 302c, 302d (only four called out, collectively 302) that delineate a first plurality of geographic regions 304a, 304b, 304c, 304d (only four called out, collectively 304) of a first type or level. Also for example, the map 300 includes a second set of markings (broken lines) 306a, 306b, 306c, 306d (only four called out, collectively 306) that delineate a second plurality of geographic regions 308a, 308b, 308c, 308d (only four called out, collectively 308) of a second type or level. The second type or level is typically different from the first type or level. As a non-limiting example, geographic regions 304 of the first type or level may take the form of census blocks or census block groups (e.g., the smallest unit of representation for a census, for instance the decennial U.S. Census). As a non-limiting example, geographic regions 308 of the second type or level may take the form of counties, parishes or even states. While illustrated in Figure 3A using solid lines and broken lines, other graphical elements or effects can be employed to visually distinguish geographic regions 304 of the first type or level from geographic regions 308 of the second type or level, including but not limited to line weight, color, marqueeing, shading, cross-hatching.

As previously noted, in some instances a geographic region 308a of the second type may encompass one or more geographic regions 304a of the first type. Also as previously noted, in some instances a geographic region 304d of the first type may encompass one or more geographic regions 308b of the second type. Also as previously noted, in some instances a geographic region of the first type may be coextensive with a geographic region of the second type.

The map 300 typically includes markings that represent physical features or structures, which may include both natural features or structures and human-made features or structures. Non-limiting examples include lakes 310a, rivers 310b, streams, geographic topology (e.g., altitude, mountains) 310c, highways and roads 310d, reservoirs, golf courses 310e, and/or buildings.

Figure 3B shows a user interface 312 to obtain estimates of a prevalence of a first health condition (e.g., lung cancer) in a population or sub-population, according to one illustrated implementation. The hosted services system(s) 106 (Figure 1) can display the user interface 312 or cause the user interface 312 to be displayed by an end user processor-based device.

The user interface 312 includes a first set of user selectable icons (along the upper left edge of the screen), in the form of an about tab 314a, manage tab 314b, and log off tab 314c (collectively 314). User selection of tabs 314a, 314b, 314c, for instance via a pointing device (e.g., mouse, trackpad, finger or stylus on touch screen), allows the user to view information about the hosted services or software, manage a user account, or log off from a user session, respectively.

The user interface 312 includes a second set of user selectable icons (along the upper right edge of the screen), in the form of a population data tab 316a, cancer module tab 316b, county statistics tab 316c and international tab 316d. User selection of tabs 316a, 316b, 316c, 316d, for instance via a pointing device (e.g., mouse, trackpad, finger or stylus on touch screen), allows the user to view population data, open a cancer module, view county statistics or view international statistics, respectively.

The user interface 312 includes a map 318 showing a portion of a region, the region including various sub-regions, for instance various counties or parishes. The map 318 can include natural and/or geographic features. The map 318 can, for instance, take the form of a "heat map" where different colors map to respective ranges of incidence of a health condition in respective geographic regions.
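A heat map of this kind can be driven by a small lookup that assigns each region's estimated prevalence to a color band, with the same table feeding the legend. The following Python sketch assumes four bands with hypothetical break points and hexadecimal colors; neither the bands nor the colors are specified by this disclosure.

```python
from bisect import bisect_right

# Hypothetical upper bounds (exclusive) of the first three prevalence bands.
BREAKS = [0.05, 0.10, 0.15]
# One color per band, lightest for the lowest prevalence.
COLORS = ["#ffffb2", "#fecc5c", "#fd8d3c", "#e31a1c"]


def color_for(prevalence: float) -> str:
    """Map a prevalence fraction to the color of the band that contains it."""
    return COLORS[bisect_right(BREAKS, prevalence)]


def legend() -> list[str]:
    """Human-readable legend lines pairing each band with its color."""
    lows = [0.0] + BREAKS
    highs = BREAKS + [None]  # the last band is open-ended
    lines = []
    for low, high, color in zip(lows, highs, COLORS):
        label = f"{low:.0%}-{high:.0%}" if high is not None else f">= {low:.0%}"
        lines.append(f"{color}  {label}")
    return lines
```

Using `bisect_right` places a value equal to a break point in the higher band, so each band is closed at its lower bound and open at its upper bound.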

The user interface 312 includes a third set of user selectable icons (along the lower left edge of the screen), in the form of a close icon 320a, zoom in icon 320b, zoom out icon 320c, information icon 320d and legend icon 320e. User selection of respective icons 320a, 320b, 320c, 320d, 320e, for instance via a pointing device (e.g., mouse, trackpad, finger or stylus on touch screen), allows the user to close a current view, zoom in on a selected portion of the map, zoom out from a selected portion of the map, obtain information about a selected portion of the map, or review a listing of information, maps or prior analyses, respectively. For example, selection of the information icon 320d can cause presentation (e.g., display) of an information pop-up dialog box 322 with information 324, for instance location information, including an estimated prevalence of a health condition (e.g., lung cancer) in the identified location. Also for example, selection of the legend icon 320e can cause presentation (e.g., display) of a legend pop-up dialog box 326 with information 328 mapping estimated prevalence of a health condition (e.g., lung cancer) to respective colors.

Figure 3C shows a user interface 332 to obtain estimates of a prevalence of a second health condition (e.g., type 2 diabetes) in a population or sub-population, according to one illustrated implementation. The hosted services system(s) 106 (Figure 1) can display the user interface 332 or cause the user interface 332 to be displayed by an end user processor-based device.

Many of the user interface elements are similar or even identical to those illustrated and discussed with reference to Figure 3B, and are identified in Figure 3C using the same reference numbers as in Figure 3B. Only significant differences are discussed below. In contrast to Figure 3B, the map 334 in Figure 3C displays an incidence of type 2 diabetes, and the information pop-up dialog box 336 presents information 338, for instance location information, including an estimated prevalence of a health condition (e.g., type 2 diabetes) in the selected location. The user interface 332 further includes a legend pop-up dialog box 340 with information 342 mapping estimated prevalence of a health condition (e.g., type 2 diabetes) to respective colors.

Figure 3D shows a user interface 344 to obtain forecasts of a prevalence of a second health condition (e.g., type 2 diabetes) in a population or sub-population over a period of time (e.g., 10 years), according to one illustrated implementation. The hosted services system(s) 106 (Figure 1) can display the user interface 344 or cause the user interface 344 to be displayed by an end user processor-based device.

Many of the user interface elements are similar or even identical to those illustrated and discussed with reference to Figure 3B, and are identified in Figure 3D using the same reference numbers as in Figure 3B. Only significant differences are discussed below.

The user interface 344 includes a map 346 showing a portion of a region, the region including various sub-regions, for instance various counties. The user can define a polygon geographic region 348 using the map 346, for instance via a pointing device (e.g., mouse, trackpad, finger or stylus on touch screen). In at least some instances, the user defined region 348 is different (e.g., non-coterminous) from governmental authority defined geographic regions (e.g., states, counties, parishes, cities, towns, census block groups). The user interface 344 can visually represent the user defined geographic region 348, for instance with a polygon (e.g., closed solid line 350) enclosing the user defined geographic region 348 and/or with highlighting 352 or other visual effects.

The user interface 344 includes a forecast pop-up dialog box 354 which presents information 356. In particular, the forecast pop-up dialog box 354 can present a forecast of prevalence of a health condition (e.g., type 2 diabetes) for the user defined geographic area over a user specified period of time (e.g., every year for 10 years). Various approaches to generating forecasts are described herein. The forecast pop-up dialog box 354 can be accessed, for instance, via selection of a user selectable tab 358. The forecast pop-up dialog box 354 can include one or more user selectable icons. For example, the forecast pop-up dialog box 354 may include a draw region user selectable icon, selection of which brings up a drawing tool that allows the user to create or define a user defined polygon geographic region 348. Also for example, the forecast pop-up dialog box 354 may include a clear regions user selectable icon, selection of which clears previously created or defined user defined polygon geographic regions 348.

Figure 3E shows a portion of a user interface 360 to obtain estimates and/or forecasts of a prevalence of a health condition in a population or sub-population, illustrating a population data panel 362 with which an end user can specify various characteristics, conditions or traits as part of a query regarding prevalence of a health condition for display on a resulting map 364, according to one illustrated implementation.

The population data panel 362 may be presented in response to selection of the population data tab 316a (Figures 3B and 3C). The population data panel 362 includes a plurality of sets of characteristics, conditions or traits that the end user can select (e.g., check) via corresponding user selectable check boxes to specify a population or sub-population for display. For example, the data panel 362 can include a set of selections 362a to specify a base geography, a set of selections 362b to specify a custom geography, a set of selections 362c to specify regional data geography, a set of selections 362d to specify environmental data, and a set of selections 362e to specify disease clusters, each of which will be displayed on the resulting map 364. In some instances, the data panel 362 can present a metadata pop-up dialog box 366 that provides information about the metadata represented on the map 364.

Figure 3F shows a portion of a user interface 379 to obtain estimates of a population or sub-population 380 of a user defined geographical region 382, estimates or forecasts of at least one count 384 of individuals affected by a disease or condition of interest in the estimated population or sub-population 380, and estimates or forecasts of at least one annual cost 386 of treating the at least one estimated or forecasted count 384 of individuals affected by the disease or condition of interest in the user defined geographical region 382, according to one illustrated implementation. Costs may be expressed per year or per another period of interest (e.g., daily, weekly, biweekly, monthly, quarterly, semi-annually, bi-annually). The user interface 379 includes a disease cost panel 388 with which an end user can specify at least one annual (or other periodic) cost 390 of treating a given individual affected by each of various diseases or conditions of interest in the user defined geographical region 382, as part of a query regarding at least one estimated or forecasted count 384 of individuals affected by those diseases or conditions, the results of which are displayed at least adjacent to a map 392. For example, a data panel 394 may include an estimated or forecasted count for the total population 380 in the user defined geographical region 382 and at least one respective estimated or forecasted count 384 for each disease or condition of interest in that region. Also for example, the data panel 394 can include at least one respective estimated or forecasted annual (or other periodic) cost 386 of treating the diseases or conditions of interest in the user defined geographical region 382.

Figure 3G shows a portion of a user interface 395a to obtain estimates or forecasts of a health condition patient count in a population of a user selected or defined geographical region 396 for a first time period and for a second time period that is different from the first time period, together with estimates or forecasts of results 397 of implementing one or more interventions for that population, according to one illustrated implementation. The results 397 can include, for example, a number of expected new health condition patient counts in the next year (or in any other future year, or over another time period such as a day, week, biweek, month, bimonth, quarter, half year or biyear), a number of health condition patient counts prevented by the intervention(s) over the same horizon, and the total present savings, net present value, and return on investment of the intervention(s) in the user selected or defined geographical region 396. The user interface 395a includes an intervention characteristics panel 395b with which an end user can specify characteristics 398 of the intervention(s), such as a number of years to implement the intervention, a cost of the intervention, a risk reduction rate of the intervention, and a discount rate for investment in the intervention, as part of a query regarding at least one estimated or forecasted cost of the intervention, with results displayed adjacent to a map 395c. For example, a data panel 395d may include an estimated or forecasted new health condition count for each disease or condition of interest in the user selected or defined geographical region 396. Also for example, the data panel 395d may include an estimated or forecasted number of health condition patient counts prevented by the modeled intervention(s), an estimated or forecasted total present savings of implementing the intervention(s) from the first time period through the second time period, an estimated or forecasted per capita net present value of implementing the intervention(s) over that span, and an estimated or forecasted return on investment over that span.

Figure 4 shows a high-level method 400 of operation in a system that includes at least one processor and at least one non-transitory processor-readable medium communicatively coupled to the at least one processor and that stores at least one of processor-executable instructions or data.

The method starts at 402. For example, the hosted services method 400 may start on powering up of a component or device of the hosted services server computer systems 202. Alternatively, the hosted services method 400 may start in response to a launching of an application by an end user client entity, or in response to a call from a calling routine executing on an end user client processor-based system 110.

At 404, at least one processor accesses a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region. For example, the at least one processor can access a first data set that comprises demographic data for a respective population of each of a plurality of U.S. census block-groups. The first data set may, for example, comprise demographic data regarding at least one immutable characteristic of the population of the geographic regions of a first type of geographic region. The at least one immutable characteristic can, for instance, include at least one of: a gender, an ethnicity, or an age class for individuals in the respective population of each of a plurality of geographic regions of a first type of geographic region.

At 406, the at least one processor accesses a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region. For example, the at least one processor can access a second data set that comprises health-related data for a respective population of each of a plurality of counties.

For at least one instance of a pair of geographic regions of the first and second types, the second type of geographic region may be different from the first type of geographic region. For instance, the geographic regions of the first type can constitute census blocks, while the geographic regions of the second type can constitute counties. In some instances, a geographic region of the second type can encompass one or more respective geographic regions of the first type. While not typical, in other instances, a geographic region of the first type can encompass one or more geographic regions of the second type. In yet further instances, it is possible for a geographic region of the first type to be identical (i.e., coextensive) with a respective geographic region of the second type.

The second data set may comprise data regarding at least one non-demographic characteristic. The at least one non-demographic characteristic can take the form of at least one mutable characteristic. The non-demographic or mutable characteristics can, for instance, include an obesity level or an activity level for individuals in the respective population of the geographic region of the second type of geographic region. The non-demographic or mutable characteristics can, for instance, include at least one of: a percentage of the respective geographic area of the second type of geographic area that is rural, a total population of the respective geographic area of the second type of geographic area, or a total number of individuals by age class that have a defined condition.

At 408, the at least one processor generates a third or "unified" data set, or a federated data set. The third or unified data set can also be referred to as a static data set. The third data set is searchable by a geographic region key. The geographic region key can correspond to a geographic region identifier that uniquely identifies each of the geographic regions of the first type of geographic region, for example a census block group code. The third or unified data set includes data regarding at least one demographic or immutable characteristic and at least one health-related characteristic (e.g., non-demographic and/or mutable characteristics) representative of a population associated with the respective geographic region. The at least one health-related characteristic is different from the at least one demographic characteristic.
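As a rough illustration of the keyed layout only, the sketch below joins county-level health data onto block-group demographic records using the fact that a 12-digit census block group GEOID begins with the 2-digit state and 3-digit county FIPS codes. The field names (`median_age`, `obesity_rate`) and the plain join are assumptions; in this naive form each block group simply inherits its county's value, whereas fitting a predictive model, as described next, would replace the copied county values with modeled block-group estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnifiedRecord:
    block_group: str            # 12-digit block group GEOID, the search key
    population: int             # from the first (block-group level) data set
    median_age: float           # hypothetical demographic attribute
    county_obesity_rate: float  # hypothetical health-related attribute (county level)


def county_fips(block_group_geoid: str) -> str:
    """Return the 5-digit state + county prefix of a block group GEOID."""
    if len(block_group_geoid) != 12 or not block_group_geoid.isdigit():
        raise ValueError(f"not a 12-digit block group GEOID: {block_group_geoid!r}")
    return block_group_geoid[:5]


def build_unified(block_groups: dict, counties: dict) -> dict:
    """Join county-level health data onto each block group, keyed by GEOID."""
    unified = {}
    for geoid, demo in block_groups.items():
        county = counties[county_fips(geoid)]
        unified[geoid] = UnifiedRecord(
            block_group=geoid,
            population=demo["population"],
            median_age=demo["median_age"],
            county_obesity_rate=county["obesity_rate"],
        )
    return unified
```

Lookups by block group code are then plain dictionary accesses, which is what makes the third data set "searchable by a geographic region key".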

For example, at least one processor can generate a third data set by fitting a predictive model to the data of the first and the second data sets. For instance, the at least one processor can fit a predictive model to the data by performing a binomial regression on the data of the first and the second data sets.
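The text names binomial regression but not the link function, covariates, or fitting method. The sketch below assumes a logit link fitted by Newton-Raphson (equivalently, iteratively reweighted least squares) to grouped counts, for example cases and population per county and age class from the second data set, with covariate vectors such as an intercept, age-class indicators, and county percent rural. Coefficients fitted at county level could then be applied to block-group covariates with `predict`. The function names are hypothetical, and pure Python is used only to keep the example self-contained; a statistics package's GLM routine would be the practical choice.

```python
import math


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def predict(beta, x):
    """Predicted prevalence: inverse logit of the linear predictor."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-eta))


def fit_binomial_logit(rows, iterations=25):
    """Fit logit(p) = x . beta to grouped rows of (covariates, cases, population)."""
    p = len(rows[0][0])
    beta = [0.0] * p
    for _ in range(iterations):
        grad = [0.0] * p                      # score vector X^T (y - n*mu)
        info = [[0.0] * p for _ in range(p)]  # information matrix X^T W X
        for x, cases, n in rows:
            mu = predict(beta, x)
            w = n * mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (cases - n * mu) * x[j]
                for k in range(p):
                    info[j][k] += w * x[j] * x[k]
        step = _solve(info, grad)             # Newton step
        beta = [b + s for b, s in zip(beta, step)]
    return beta
```

With an intercept plus one indicator and two groups (10 of 100 and 30 of 100), the model is saturated, so the fitted prevalences reproduce the observed rates of 0.10 and 0.30.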

The method 400 terminates at 410, for example until executed again, for instance in response to a calling routine. Alternatively, the method 400 may continuously or periodically repeat.

Figure 5 shows a low-level method 500 of operation in a system that includes at least one processor and at least one non-transitory processor-readable medium communicatively coupled to the at least one processor and that stores at least one of processor-executable instructions or data. The method 500 can be executed to generate at least one of estimates or forecasts, for example, using data in the third or unified data set or a federated data set generated or formed via the method 400 (Figure 4).

The method starts at 502. For example, the method 500 may start on powering up of a component or device of the hosted services server computer systems 202. Alternatively, the method 500 may start in response to a launching of an application by an end user client entity, or in response to a call from a calling routine (e.g., method 400) executing on hosted services server computer systems 202.

At 504, at least one processor receives a query for at least one of an estimate of prevalence or a forecast of prevalence.

At 506, the at least one processor processes the query, determining an estimated prevalence of a health condition in at least a portion of the population.

Optionally at 508, the at least one processor determines a bias correction for the data of the third data set.

Optionally at 510, the at least one processor applies the determined bias correction to the data of the third data set.

Optionally at 512, the at least one processor determines at least one confidence interval for the data of the third data set.
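The text does not say how the confidence interval is computed. For a prevalence observed as a simple proportion of cases in a sampled population, the Wilson score interval is one standard choice, swapped in here purely for illustration; intervals for model-based predictions would instead have to come from the fitted model. The function name and default z value (about 95% coverage) are assumptions.

```python
import math


def wilson_interval(cases: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion cases / n."""
    if n <= 0 or not 0 <= cases <= n:
        raise ValueError("require n > 0 and 0 <= cases <= n")
    p_hat = cases / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return center - half, center + half
```

For 50 cases out of 100, this gives approximately (0.404, 0.596). Unlike the simple normal approximation, the Wilson interval stays within [0, 1] for small counts.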

The method 500 terminates at 514, for example until executed again, for instance in response to a calling routine. Alternatively, the method 500 may continuously or periodically repeat.

Figure 6 shows a low-level method 600 of operation in a system that includes at least one processor and at least one non-transitory processor-readable medium communicatively coupled to the at least one processor and that stores at least one of processor-executable instructions or data. The method 600 can be executed to generate at least one of estimates or forecasts, for example, as part of executing the method 500 (Figure 5).

The method 600 starts at 602. For example, the method 600 may start on powering up of a component or device of the hosted services server computer systems 202. Alternatively, the method 600 may start in response to a launching of an application by an end user client entity, or in response to a call from a calling routine (e.g., method 400, method 500) executing on the hosted services server computer systems 202.

Optionally at 604, at least one processor receives user input that specifies an inquiry period of time.

In the case of an estimate, the inquiry period of time is the same as, or a subset of, a baseline period of time during which the source data was collected or sampled. In the case of a forecast, the inquiry period of time is different from the baseline period of time.
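The estimate/forecast distinction reduces to an interval containment test. The sketch assumes periods are given as hypothetical inclusive (start_year, end_year) pairs. An inquiry that extends even partly beyond the baseline is treated as a forecast, consistent with a forecast being for a period "other than, or in addition to" the baseline.

```python
def classify_inquiry(inquiry: tuple[int, int], baseline: tuple[int, int]) -> str:
    """Return "estimate" if the inquiry period lies within the baseline, else "forecast"."""
    (q_start, q_end), (b_start, b_end) = inquiry, baseline
    if q_start > q_end or b_start > b_end:
        raise ValueError("period start must not be after its end")
    return "estimate" if b_start <= q_start and q_end <= b_end else "forecast"
```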

Optionally at 606, the at least one processor receives user input that specifies a geographic area of the inquiry.

For example, the at least one processor can receive an identifier (e.g., number, name, alpha-numeric identifier) that uniquely identifies a specific non-user defined or predefined geographic region, where the geographic region has been defined independently of the received user input. For instance, the at least one processor can receive a census block identifier that identifies an existing census block, or name of a county and/or state in which the county is located which uniquely identifies an existing county, which was defined by a governmental entity.

More advantageously, the at least one processor can receive a plurality of end user selections of points or lines on a map that define a polygon or other geometric shape, which specifies a user defined geographic region. Thus, not only is the geographic region user selected, but the geographic region is user defined and may not actually be coextensive with any particular existing or governmentally defined geographic region. For instance, the received user input may specify a geographic area that overlaps a boundary of at least one of the geographic regions of the first type of geographic region. Also for instance, the received user input may specify a geographic area that encompasses at least a portion of at least two of the geographic regions of the first type of geographic region.

At 608, if the at least one processor receives end user selections of points or lines, the at least one processor converts the user input that defines a polygon into a number of geographic region key values that represent the boundaries of the user defined geographical area.
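One simple way to turn a drawn polygon into geographic region key values is sketched below. It assumes that each census block group is represented by a single centroid in the same planar coordinate system as the polygon and that a block group is selected when its centroid falls inside the polygon (even-odd ray casting). The centroid rule and the function names are illustrative assumptions; the area-proportional treatment described for method 900 would handle partially covered regions more faithfully.

```python
def point_in_polygon(x: float, y: float, polygon: list) -> bool:
    """Even-odd ray casting: count edge crossings of a ray cast to the right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_to_keys(polygon: list, centroids: dict) -> list:
    """Return the sorted keys whose centroid lies inside the user drawn polygon."""
    return sorted(key for key, (x, y) in centroids.items()
                  if point_in_polygon(x, y, polygon))
```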

At 610, at least one processor determines an estimate and/or a forecast of a prevalence of a health condition in a population or sub-population based on the search or query criteria specified by the end user input. For example, at least one processor determines an estimate and/or a forecast of a prevalence of a health condition in a population or sub-population for the period of time specified by the end user input. As previously noted, if the specified inquiry period is the baseline period (or a subset of it), the at least one processor is estimating the prevalence of the health condition in the population or sub-population; if the specified inquiry period is a period other than, or in addition to, the baseline period, the at least one processor is forecasting that prevalence. When forecasting a prevalence of a health condition in at least a portion of the population, the at least one processor may calculate or otherwise determine an estimated prevalence for the population or sub-population for each of a plurality of sub-periods, then accumulate the estimated prevalence over multiple sub-periods of the inquiry period to determine the estimated prevalence over the inquiry period (e.g., each year for 10 years, then accumulate the ten sub-totals).
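The per-sub-period accumulation can be sketched with a caller-supplied estimator for a single year, which is a hypothetical stand-in for whatever model produces the yearly sub-totals (for instance, new cases per year). `itertools.accumulate` then yields the running totals, whose last element is the accumulated value over the whole inquiry period.

```python
from itertools import accumulate
from typing import Callable


def accumulate_forecast(estimate_for_year: Callable[[int], float],
                        start_year: int, n_years: int) -> tuple[list, list]:
    """Return per-year sub-totals and their running (cumulative) totals."""
    yearly = [estimate_for_year(start_year + k) for k in range(n_years)]
    return yearly, list(accumulate(yearly))
```

For example, with a toy estimator `lambda year: 100 + 10 * (year - 2020)` over three years, the sub-totals are 100, 110 and 120 and the running totals are 100, 210 and 330.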

Additionally or alternatively, the at least one processor determines an estimate and/or a forecast of a prevalence of a health condition in a population or sub-population for the geographic area specified by the user input. For instance, the at least one processor determines an estimate and/or a forecast of a prevalence of a health condition for an end user defined and end user specified geographic area that is different from the geographic areas of the first and the second types of geographic areas of the source data.

Optionally at 612, the at least one processor may determine a rate of change in an estimated prevalence of a health condition in at least a portion of the population for the inquiry period of time. For example, the at least one processor may calculate or otherwise determine estimated prevalence for the population or sub-population for each of a plurality of sub-periods, then compare the change in the estimated prevalence over multiple sub-periods of the inquiry period to determine the rate of change over the inquiry period (e.g., each year for 10 years, then compare the estimated prevalence for each subsequent year to the estimated prevalence of previous years).
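Two common ways to express that rate of change are sketched below: the relative change from each year to the next, and the constant compound annual rate that would carry the first year's prevalence to the last. Which summary the system reports is not specified, so both function names and definitions are assumptions.

```python
def year_over_year_change(prevalence: list) -> list:
    """Relative change between each pair of consecutive yearly estimates."""
    if len(prevalence) < 2 or any(p <= 0 for p in prevalence[:-1]):
        raise ValueError("need at least two years and positive earlier values")
    return [(cur - prev) / prev for prev, cur in zip(prevalence, prevalence[1:])]


def average_annual_rate(prevalence: list) -> float:
    """Compound annual rate from the first to the last yearly estimate."""
    periods = len(prevalence) - 1
    if periods < 1 or prevalence[0] <= 0:
        raise ValueError("need at least two years and a positive first value")
    return (prevalence[-1] / prevalence[0]) ** (1.0 / periods) - 1.0
```

A series of 10%, 11% and 12.1% prevalence grows by 10% per year under both measures; they diverge when growth is uneven.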

The method 600 terminates at 614, for example until executed again, for instance in response to a calling routine. Alternatively, the method 600 may continuously or periodically repeat.

Figure 7 shows a low-level method 700 of operation in a system that includes at least one processor and at least one non-transitory processor-readable medium communicatively coupled to the at least one processor and that stores at least one of processor-executable instructions or data. The method 700 can be executed to generate at least one estimate, for example, as part of executing the method 500 (Figure 5) and/or the method 600 (Figure 6).

The method 700 starts at 702. For example, the method 700 may start on powering up of a component or device of the hosted services server computer systems 202. Alternatively, the method 700 may start in response to a launching of an application by an end user client entity, or in response to a call from a calling routine (e.g., method 400, method 500, method 600) executing on the hosted services server computer systems 202.

Optionally at 704, at least one processor receives user input that specifies a geographic area of inquiry.

At 706, if the at least one processor receives end user selections of points or lines, the at least one processor converts the user input that defines a polygon into a number of geographic region key values, thereby generating the boundaries of the user defined geographic region by encoding the user's definition of that region.

Optionally at 708, at least one processor compares the boundaries of the user defined geographical region to population data corresponding to the user defined geographical region to produce an estimated population of the user defined geographical area.

Optionally, at 710, at least one processor converts the estimated population of the user defined geographical area to an estimated or forecasted patient count for a given disease or condition of interest in the estimated population of the user defined geographical area based at least in part on a respective estimated rate of the given disease or condition of interest in each respective population or sub-population of each geographic region that is at least partially included in the user defined geographical area.
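Acts 708 and 710 amount to a pair of sums over the regions that fall within the user defined area: total estimated population, and population weighted by each region's estimated rate of the condition. The sketch assumes both inputs are already keyed by region (for example, by block group code) and that the population figures already reflect only the portion of each region inside the area; the function name is hypothetical.

```python
def population_and_patient_count(pop_in_area: dict, rate_by_region: dict) -> tuple:
    """Sum population and expected patients over regions inside the user area.

    pop_in_area: estimated population of each region that falls inside the area
    rate_by_region: estimated prevalence of the condition in each region
    """
    population = sum(pop_in_area.values())
    patients = sum(pop * rate_by_region[key] for key, pop in pop_in_area.items())
    return population, patients
```

For two regions with 1,000 and 500 residents inside the area and prevalences of 10% and 20%, the estimated population is 1,500 and the estimated patient count is 200.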

The method 700 terminates at 712, for example until executed again, for instance in response to a calling routine. Alternatively, the method 700 may continuously or periodically repeat.

Figure 8 shows a low-level method 800 of operation in a system that includes at least one processor and at least one non-transitory processor-readable medium communicatively coupled to the at least one processor and that stores at least one of processor-executable instructions or data. The method 800 can be executed to generate one or more estimates, for example, as part of executing at least one of the method 500 (Figure 5), the method 600 (Figure 6), and the method 700 (Figure 7).

The method 800 starts at 802. For example, the method 800 may start on powering up of a component or device of the hosted services server computer systems 202. Alternatively, the method 800 may start in response to a launching of an application by an end user client entity, or in response to a call from a calling routine (e.g., method 400, method 500, method 600, method 700) executing on the hosted services server computer systems 202.

Optionally at 804, at least one processor receives user input that specifies an estimated cost of treating a given individual affected by a disease or condition of interest in a user defined geographical region.

Optionally at 806, the at least one processor converts an estimated or forecasted health condition patient count in the user defined geographical region to an estimated or forecasted annual (or another period of interest, e.g. daily, weekly, biweekly, monthly, quarterly, semi-annually, annually, bi-annually) cost of treating the estimated or forecasted health condition patient count in the user defined geographical region based at least in part on the estimated or forecasted cost of treating the given individual affected by the disease or condition of interest in the user defined geographical region.
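The conversion at 806 can be sketched as a multiplication followed by an optional rescaling to another reporting period. The sketch assumes the per-patient cost is entered as an annual figure and that costs accrue evenly through the year; the period table and function name are hypothetical, and "bi-annually" is omitted because it can mean either twice a year or every two years.

```python
PERIODS_PER_YEAR = {
    "daily": 365, "weekly": 52, "biweekly": 26, "monthly": 12,
    "quarterly": 4, "semi-annually": 2, "annually": 1,
}


def treatment_cost(patient_count: float, annual_cost_per_patient: float,
                   period: str = "annually") -> float:
    """Cost of treating all estimated patients, expressed per reporting period."""
    if period not in PERIODS_PER_YEAR:
        raise ValueError(f"unsupported period: {period!r}")
    annual = patient_count * annual_cost_per_patient
    return annual / PERIODS_PER_YEAR[period]
```

For 200 estimated patients at 9,600 per patient per year, the annual cost is 1,920,000, or 160,000 per month.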

The method 800 terminates at 808, for example until executed again, for instance in response to a calling routine. Alternatively, the method 800 may continuously or periodically repeat.

Figure 9 shows a low-level method 900 of operation in a system that includes at least one processor and at least one non-transitory processor-readable medium communicatively coupled to the at least one processor and that stores at least one of processor-executable instructions or data. The method 900 can be executed to generate at least one estimate, for example, as part of executing at least one of the method 500 (Figure 5), the method 600 (Figure 6), the method 700 (Figure 7), and the method 800 (Figure 8).

The method 900 starts at 902. For example, the method 900 may start on powering up of a component or device of the hosted services server computer systems 202. Alternatively, the method 900 may start in response to a launching of an application by an end user client entity, or in response to a call from a calling routine (e.g., method 400, method 500, method 600, method 700, method 800) executing on the hosted services server computer systems 202.

Optionally at 904, at least one processor generates at least a first boundary of a user defined geographical region by encoding a user definition of the user defined geographical region.

Optionally at 906, at least one processor selects each geographical region of a given geographical region level or type, e.g., census block, that is at least partially contained in the user defined geographical region.

Optionally at 908, at least one processor reduces, for each respective geographical region of the given geographical region level or type that is at least partially contained in the user defined geographical region, at least one respective attribute associated with the respective geographical region by a proportion that corresponds to a percentage of the total area of the respective geographical area that is contained in the user defined geographical region.
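The proportional reduction at 908 can be sketched as follows. It assumes each attribute (e.g., population) is spread uniformly over its region and that the fraction of each region's area lying inside the user defined region has already been computed, for instance with a geometry library. The helper name and inputs are hypothetical.

```python
def apportion(attribute_by_region: dict, overlap_fraction: dict) -> dict:
    """Scale each region's attribute by the fraction of its area inside the user region.

    Regions with no overlap are dropped, mirroring the selection at 906.
    """
    scaled = {}
    for key, fraction in overlap_fraction.items():
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"overlap fraction for {key} must be in [0, 1]")
        if fraction > 0.0:
            scaled[key] = attribute_by_region[key] * fraction
    return scaled
```

A block group of 800 residents with a quarter of its area inside the drawn region therefore contributes 200 residents. The uniform-density assumption is the main source of error when population clusters within a region.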

Optionally at 910, at least one processor compares at least a portion of at least the first boundary of the user defined geographical region at least to population data corresponding to the user defined geographical region to produce at least an estimated population of the user defined geographical region.

Optionally at 912, at least one processor converts at least the estimated population of the user defined geographical region to an estimated or forecasted health condition patient count in the estimated population of the user defined geographical region.

The method 900 terminates at 914, for example until executed again, for instance in response to a calling routine. Alternatively, the method 900 may continuously or periodically repeat.

Figure 10 shows a low-level method 1000 of operation in a system that includes at least one processor and at least one non-transitory processor-readable medium communicatively coupled to the at least one processor and that stores at least one of processor-executable instructions or data. The method 1000 can be executed to generate at least one estimate, for example, as part of executing at least one of the method 500 (Figure 5), the method 600 (Figure 6), the method 700 (Figure 7), the method 800 (Figure 8), or the method 900 (Figure 9).

The method 1000 starts at 1002. For example, the method 1000 may start on powering up of a component or device of the hosted services server computer systems 202. Alternatively, the method 1000 may start in response to a launching of an application by an end user client entity, or in response to a call from a calling routine (e.g., method 400, method 500, method 600, method 700, method 800, method 900) executing on the hosted services server computer systems 202.

Optionally at 1004, at least one processor estimates or forecasts a prevalence of a first health condition in a population of a user selected or defined geographical region for a first time period.

Optionally at 1006, at least one processor estimates or forecasts a prevalence of the first health condition in the population of the user selected or defined geographical region for a second time period that is different from the first time period.

Optionally at 1008, at least one processor estimates or forecasts a total number of new cases of the first health condition that will occur during the second time period, based at least in part on a determined difference between the estimated prevalence in the population of the user selected or defined geographical region in the second time period and the estimated prevalence in that population in the first time period, and on a determined product of that difference and the size of the population of the user selected or defined geographical region.
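The calculation at 1008 is the change in prevalence multiplied by the population size. The sketch below follows only that product. It assumes a single population size for both periods and does not adjust for deaths, migration, or cures, which the text does not address. The function name is hypothetical.

```python
def new_cases(prevalence_first: float, prevalence_second: float,
              population: float) -> float:
    """Expected new cases between two periods: change in prevalence times population."""
    return (prevalence_second - prevalence_first) * population
```

For a region of 50,000 people whose prevalence rises from 10% to 12%, this yields about 1,000 new cases.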

Optionally at 1010, at least one processor estimates or forecasts a portion of the population that is prone to the first health condition based on at least one of (1) an estimated or forecasted total number of new cases of the first health condition that will occur during the second time period and (2) a percentage of the estimated total number of new cases of the first health condition that will occur during the second time period.

Optionally at 1012, at least one processor estimates or forecasts an annual cost of treating every individual affected by the first health condition in the population of the user selected or defined geographical region by converting at least the estimated annual treatment cost per individual into an annual cost of treating every individual affected by the first health condition in the population of the user selected or defined geographical region.

Optionally at 1014, at least one processor converts at least the annual cost of treating every individual affected by the first health condition in the population of the user selected or defined geographical region and estimated or forecasted cost of one or more defined interventions into estimated or forecasted cost savings of implementing the defined intervention(s) in the population of the user selected or defined geographic region.

Optionally at 1016, at least one processor estimates or forecasts a net present value of implementing the intervention(s) in the user selected or defined geographical region.

Optionally at 1018, at least one processor converts at least the annual cost of treating every individual affected by the first health condition in the population of the user selected or defined geographical region and the estimated or forecasted net present value of the defined intervention(s) into an estimated or forecasted return on investment for implementing the defined intervention(s) in the population of the user selected or defined geographical region.
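The text does not give a formula for return on investment. One convention, assumed here only for illustration, divides the net present value (which is already net of the intervention's cost) by the amount invested in the intervention, giving the net return per unit of money spent. The function name is hypothetical.

```python
def return_on_investment(net_present_value: float, intervention_cost: float) -> float:
    """Assumed ROI definition: NPV (net of the intervention cost) per unit invested."""
    if intervention_cost <= 0:
        raise ValueError("intervention_cost must be positive")
    return net_present_value / intervention_cost
```

Under this convention, an intervention costing 1,000,000 with an NPV of 250,000 has an ROI of 0.25 (25%), and a negative NPV yields a negative ROI.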

The method 1000 terminates at 1020, for example until executed again, for instance in response to a calling routine. Alternatively, the method 1000 may continuously or periodically repeat.

Figure 11 shows a low-level method 1100 of operation in a system that includes at least one processor and at least one non-transitory processor-readable medium communicatively coupled to the at least one processor and that stores at least one of processor-executable instructions or data. The method 1100 can be executed to generate at least estimates, for example, as part of executing at least one of the method 500 (Figure 5), the method 600 (Figure 6), the method 700 (Figure 7), the method 800 (Figure 8), the method 900 (Figure 9), or the method 1000 (Figure 10).

The method 1100 starts at 1102. For example, the method 1100 may start on powering up of a component or device of the hosted services server computer systems 202. Alternatively, the method 1100 may start in response to a launching of an application by an end user client entity, or in response to a call from a calling routine (e.g., method 400, method 500, method 600, method 700, method 800, method 900, method 1000) executing on the hosted services server computer systems 202.

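Optionally at 1104, at least one processor determines a summation

$$-f_0 + \sum_{t=1}^{y} \frac{f_t}{(1+i)^t}$$

where $f_0$ is a per person cost of the intervention at period zero, $f_t$ is the per person value for period $t$ (determined at 1106), $y$ is a total number of periods, and $i$ is the discount rate applied to future money.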

Optionally at 1106, at least one processor determines $f_1$ to $f_y$, each equal to $c \cdot r \cdot a$, where $c$ is a cost of treating the first health condition for each period, $r$ is the defined intervention's rate of risk reduction, and $a$ is a rate at which people with a second health condition that is a precursor to the first health condition develop the first health condition. The at least one processor can repeat acts 1104 and 1106 for each of a plurality of interventions.

Optionally at 1108, at least one processor determines a per capita net present value of the defined intervention based on the determined summation and the determined $f_1$ to $f_y$.

Optionally at 1110, at least one processor converts the per capita net present value into an estimated or forecasted net present value of the defined intervention.

Optionally at 1112, at least one processor determines an estimated or forecasted return on investment for the defined intervention(s) by determining $h = v/f_0$, where $h$ is return on investment, $v$ is the per capita net present value, and $f_0$ is the cost per person of the defined intervention(s) at period zero.

Optionally at 1114, at least one processor determines $w = v \cdot b$, where $w$ is total present savings of the defined intervention, $v$ is the per capita net present value, and $b$ is a number of people with a second health condition that is a precursor to the first health condition.
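The arithmetic of acts 1104 through 1114 can be illustrated with a short Python sketch. It is not the system's implementation: it assumes the act 1104 summation is the standard discounted form $-f_0 + \sum_{t=1}^{y} f_t/(1+i)^t$ with $i$ as a per-period discount rate, it uses the same $f_t = c \cdot r \cdot a$ for every period, and the names `InterventionResult` and `intervention_npv_roi` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InterventionResult:
    per_capita_npv: float  # v
    roi: float             # h = v / f0
    total_savings: float   # w = v * b


def intervention_npv_roi(f0: float, c: float, r: float, a: float,
                         y: int, i: float, b: int) -> InterventionResult:
    """Hypothetical sketch of acts 1106-1114.

    f0: per person intervention cost at period zero
    c:  cost of treating the first health condition per period
    r:  intervention's rate of risk reduction
    a:  rate at which people with the precursor condition develop the first condition
    y:  total number of periods
    i:  per-period discount rate (assumed form of the act 1104 summation)
    b:  number of people with the precursor condition
    """
    f_t = c * r * a  # act 1106: same per person value in each period 1..y
    # Assumed act 1104/1108 form: discounted per-period values minus up-front cost.
    v = -f0 + sum(f_t / (1.0 + i) ** t for t in range(1, y + 1))
    h = v / f0  # act 1112: return on investment
    w = v * b   # act 1114: total present savings
    return InterventionResult(per_capita_npv=v, roi=h, total_savings=w)
```

With the illustrative inputs $f_0 = 100$, $c = 1000$, $r = 0.5$, $a = 0.1$, $y = 3$, $i = 0$ and $b = 10$, each $f_t$ is 50, so the sketch returns $v = 50$, $h = 0.5$ and $w = 500$; any positive $i$ lowers $v$.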

The method 1100 terminates at 1116, for example until executed again, for instance in response to a calling routine. Alternatively, the method 1100 may continuously or periodically repeat.

Figure 18 shows a user interface 1800 to obtain various health-related information about one or more health conditions in a population or sub-population. The hosted services system(s) 106 of Figure 1 may display the user interface 1800 or cause the user interface to be displayed by an end user processor-based device.

The user interface 1800 includes a first set of user selectable icons (along the upper left edge of the screen), in the form of a home tab 1802, an about tab 1804, and a log off tab 1806. User selection of the tabs 1802, 1804 and 1806, for instance via a pointing device (e.g., mouse, trackpad, finger or stylus on a touch screen), allows the user to return to a home or starting view, view information about the hosted services or software, manage a user account, or log off from a user session.

The user interface 1800 includes a second set of user selectable icons (along the upper right edge of the screen), in the form of an analytics tab 1808, a data tab 1810, a literature tab 1812, an Internet of Things (IoT) tab 1814, and an Esri ArcGIS™ tab 1816. User selection of the tabs 1808-1816, for instance via a pointing device, allows the user to access the functionality associated with each of the respective tabs, as discussed below.

The user interface 1800 includes a map 1818 showing a portion of a region, the region including various sub-regions, for instance various counties or parishes. The map 1818 may include various natural and/or geographic features. The map 1818 can, for instance, take the form of a "heat map" where different colors map to respective ones of ranges of incidences of a health condition or other information in respective geographic regions.
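One way to implement the heat-map coloring is to classify each region's estimated prevalence into ranges using sorted class breaks. In this minimal sketch the break values, the yellow-to-red palette and the function name `prevalence_color` are hypothetical; the document does not specify how the ranges are chosen.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical class breaks (prevalence as a fraction) and one color per range.
BREAKS = [0.05, 0.10, 0.15]
COLORS = ["#ffffb2", "#fecc5c", "#fd8d3c", "#e31a1c"]


def prevalence_color(prevalence: float,
                     breaks: Sequence[float] = BREAKS,
                     colors: Sequence[str] = COLORS) -> str:
    """Return the color of the range that `prevalence` falls in."""
    if len(colors) != len(breaks) + 1:
        raise ValueError("need exactly one more color than breaks")
    # bisect_right counts the breaks <= prevalence, which is the range index.
    return colors[bisect_right(breaks, prevalence)]
```

A value equal to a break falls into the higher range, so 0.05 maps to the second color.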

The user interface 1800 includes a third set of user selectable icons (along the lower left edge of the screen), in the form of a turn off map layers icon 1820, a zoom in icon 1822, a zoom out icon 1824, a geographic area report icon 1826, an information icon 1828, a legend icon 1830, and a basemaps icon 1832. User selection of respective icons 1820-1832, for instance via a pointing device (e.g., mouse, trackpad, finger or stylus on touch screen), allows the user to turn off map layers, zoom in on a selected portion of the map, zoom out from a selected portion of the map, generate a geographic report, obtain information about a selected portion of the map, display a legend, or select one or more basemaps 1832, respectively. For example, selection of the information icon 1828 can cause presentation (e.g., display) of an information popup dialog box with information, for instance location information, including an estimated prevalence of a health condition (e.g., lung cancer) in the identified location. Also for example, selection of the legend icon 1830 can cause presentation (e.g., display) of a legend pop-up dialog box 326 with information 328 mapping estimated prevalence of a health condition (e.g., lung cancer) to respective colors.

In Figure 18, the basemaps icon 1832 has been selected to provide a basemap selection dialog box 1834 which provides a set of user selectable icons, in the form of a topographic icon 1836, an imagery icon 1838, a physical icon 1840, a terrain icon 1842, a street icon 1844, a NatGeo icon 1846, a light gray icon 1848, and a dark gray icon 1850. User selection of respective icons 1836-1850, for instance via a pointing device (e.g., mouse, trackpad, finger or stylus on touch screen), allows the user to view a topographic basemap, a satellite and aerial imagery basemap, a physical basemap, a terrain basemap, a street basemap, a National Geographic World basemap, a light gray canvas basemap and a dark gray canvas basemap, respectively.

Figure 19 shows the user interface 1800 after the geographic area report icon 1826 has been selected by a user. Upon selection of the geographic area report icon 1826, a geographic area report dialog box 1902 is presented (e.g., displayed) which includes a county report icon 1904 and a Medicare report icon 1906. Selection of the Medicare report icon 1906 causes a prompt to be presented to the user to select a region (e.g., zip code) on the map 1818 using a pointing device (e.g., mouse, trackpad, finger or stylus on touch screen). Selection of a zip code causes presentation of a report which shows annual average Medicare cost data for inpatient procedures for a period of time. In some implementations, a Medicare report may include various attributes including zip code, Medicare Severity Diagnosis Related Groups (MS-DRG), number of claims per year, average covered charges, average total payments, total discharges per year, etc. The report may be displayed to the user (e.g., via a pop-up window, new browser window or tab). The user may additionally or alternatively be provided with the option to download the report as a downloadable file (e.g., PDF, CSV) or to print the report. Upon selection of the county report icon 1904, the user may be prompted to select a number of regions (e.g., counties) displayed on the map 1818. For example, the user may use a pointing device (e.g., mouse, trackpad, finger or stylus on touch screen) to select up to four counties.

Figure 20 shows the user interface 1800 after the user has selected four counties 2002, 2004, 2006 and 2008 to be included in the geographic area report. The counties may be adjacent one another or may be spaced apart (e.g., located within different states or different regions). Once the counties 2002-2008 have been selected, the user may select a generate report icon 2010, which causes a geographic area report to be generated and presented to the user for the selected counties. The report may be displayed to the user (e.g., via a pop-up window, new browser window or tab). The user may additionally or alternatively be provided with the option to download the report as a downloadable file (e.g., PDF, CSV) or to print the report.

Figure 21 shows a user interface 2100 for a geographic area report for one of the counties 2002-2008 selected by the user. In the case where the user selects multiple counties, as shown in Figure 20, a geographic area report may be displayed simultaneously for each of the selected counties (e.g., displayed side-by-side in a single browser window or tab, displayed side-by-side in a single dialog box). The user interface 2100 includes a map 2102 which includes a highlighted region 2104 which corresponds to the one or more counties selected by the user for the geographic area reports. The user interface 2100 for the geographic area report also includes a summary tab 2106, a demographics tab 2108, an epidemiology tab 2110 and a disease (or health condition) by demographics tab 2112.

As shown in Figure 21, when the summary tab 2106 is selected, the user interface 2100 may display the population 2114 of the county. The user interface 2100 may also include a lower region 2116 which displays various summary health and/or demographic information, such as the percentage of the population which is diabetic, the percentage of the population which lives in poverty, the percentage of the population which is elderly, the percentage of the population which has limited access to affordable, healthy food, the percentage of the population which has low access to affordable, healthy food and is low income and the percentage of the population which has limited access to affordable, healthy food and no vehicle, etc.

When the demographics tab 2108 is selected, the user interface 2100 may graphically display (e.g., bar chart, pie chart) demographics for the county. In at least some implementations, the demographics information for the county may be compared to the demographics information for a larger region (e.g., state, country, world). Demographics may include age, race, income and education, for example. When the epidemiology tab 2110 is selected, the user interface 2100 may graphically display (e.g., bar chart, pie chart) the prevalence of various health conditions for the county. Such health conditions may include chronic disease prevalence, cardiovascular hospitalizations, cancer incidence, etc. In at least some implementations, the prevalence of various health conditions for the county may be compared to the prevalence of health conditions for a larger region (e.g., state, country, world).

Figure 22 shows the user interface 2100 for a geographic area report when the disease by demographics tab 2112 has been selected by the user. The user interface 2100 includes a category drop down menu 2202 which allows a user to select from a plurality of health conditions and a plurality of demographics. For example, health conditions may include diabetes, cancer, major depression, etc. Demographics may include age, body mass index (BMI), gender, income, race, etc.

Upon selection of a health condition and a demographic from the category drop down menu 2202, the user interface 2100 may present a chart 2204 which illustrates the prevalence of the selected health condition across the selected demographic. In the example shown in Figure 22, the prevalence of diabetes by age is presented to the user as a result of the user's selection. In at least some implementations, the prevalence of various health conditions for the county may be compared to the prevalence of health conditions for a larger region (e.g., state, country, world).
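The disease-by-demographics chart amounts to a grouped prevalence: cases divided by population within each demographic category. The sketch below assumes individual-level `(age, has_condition)` records, such as survey responses, and hypothetical age bands; the document does not state the data source or band boundaries behind the chart.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical age bands: (label, lower bound inclusive, upper bound exclusive).
AGE_BANDS: List[Tuple[str, int, int]] = [
    ("18-44", 18, 45),
    ("45-64", 45, 65),
    ("65+", 65, 200),
]


def prevalence_by_age(records: Iterable[Tuple[int, bool]]) -> Dict[str, float]:
    """Share of records with the condition in each age band.

    records: (age, has_condition) pairs; ages outside every band are skipped,
    and bands with no records are omitted from the result.
    """
    totals: Dict[str, int] = defaultdict(int)
    cases: Dict[str, int] = defaultdict(int)
    for age, has_condition in records:
        for label, lo, hi in AGE_BANDS:
            if lo <= age < hi:
                totals[label] += 1
                cases[label] += int(has_condition)
                break
    return {label: cases[label] / totals[label] for label in totals}
```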

Figure 23 shows a portion (along the upper right edge of the screen) of the user interface 1800 (Figure 18) after the user has selected the analytics tab 1808. Selection of the analytics tab 1808 causes a health analytics window 2302 to be presented to the user. For example, selection of the analytics tab 1808 may cause the health analytics window 2302 to scroll in an animated manner from the right side of the screen to the position shown in Figure 23.

The health analytics window 2302 includes a select an area of interest tab 2304 which allows a user to specify one or more areas of interest. The select an area of interest tab 2304 includes a create an area of interest icon 2305 which, upon selection, allows the user to define a polygon geographic region using the map 1818, for instance via a pointing device (e.g., mouse, trackpad, finger or stylus on touch screen). In at least some instances, the user defined region is different (e.g., non-coterminous) from governmental authority defined geographic regions (e.g., states, counties, parishes, cities, towns, census block groups). The user interface 2100 can visually represent the user defined geographic region, for instance with a polygon (e.g., closed solid line) enclosing the user defined geographic region and/or with highlighting or other visual effects, as shown in Figures 3D and 3F discussed above.

The select an area of interest tab 2304 also includes a region type drop down menu 2306 which includes a list of region types which may be selected by the user to indicate which one or more regions are to be selected for implementation of the functionality described herein. As an example, the region types in the region type drop down menu 2306 may be US county, designated place, US census tract and zip code.

Selection of one of the region types may cause an overlay of boundaries on the map 1818 which corresponds to the selected region type. For example, if the user selects the US county region type, an overlay of county borders may be displayed on the map 1818. Similarly, if the user selects US census tract region type, an overlay of census tract borders may be displayed on the map 1818 to allow the user to select one or more census tracts for analysis.

User selection of one or more region types, for instance via a pointing device (e.g., mouse, trackpad, finger or stylus on touch screen), may cause the selected one or more region types to be highlighted on the map 1818. Once selected, the user may select any of numerous processes (e.g., estimation, forecasting, prevention) discussed herein to be performed for the selected one or more region types.

Figure 24 shows the user interface 1800 when a claims tab 2402 of the health analytics window 2302 has been selected. The claims tab 2402 includes a select claims population list 2404 which provides a list of populations of claims data which may be selected by the user. In the example shown in Figure 24, the claims populations include First Interstate Bank (FIB) employees, ABC Independent Physicians Association and Oregon State Employees populations.

Upon selection of a particular claims population from the list 2404, the user may select various claims based map data to be displayed on the map 1818. For example, for FIB employees, the user may select FIB covered lives, FIB diabetes or FIB locations. Selection of FIB covered lives causes the addresses of lives covered by FIB to be displayed. Each point displayed on the map 1818 may be a collection of claims tied to an individual patient address. Selection of FIB diabetes causes a display of diabetes disease states of FIB covered lives according to ICD-9 diagnosis codes associated with the claims information for a given individual in a given time period (e.g., year). Selection of FIB locations causes the locations of FIB branches to be displayed on the map 1818.
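Flagging diabetes from claims can be sketched as a per-year scan of each individual's diagnosis codes. This sketch assumes that any claim carrying an ICD-9-CM code in category 250 (diabetes mellitus) marks the patient as diabetic for that year; the `Claim` record layout and the function names are hypothetical, and a production rule might use additional codes or require more than one claim.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Claim:
    patient_id: str
    year: int
    icd9_codes: Tuple[str, ...]  # diagnosis codes, with or without the dot


def is_diabetes_code(code: str) -> bool:
    """ICD-9-CM category 250 covers diabetes mellitus (250.00-250.93)."""
    return code.replace(".", "").startswith("250")


def diabetic_patients(claims: Iterable[Claim], year: int) -> Set[str]:
    """Patients with at least one diabetes diagnosis code on a claim in `year`."""
    return {
        c.patient_id
        for c in claims
        if c.year == year and any(is_diabetes_code(code) for code in c.icd9_codes)
    }
```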

The user may additionally or alternatively select similar claims based map data for other claims populations in the list 2404 to be displayed on the map 1818. In the example shown in Figure 24, the user has selected covered lives, coronary artery disease (CAD), depression and diabetes for the Oregon State employees claims population, which causes color-coded points 2406 to be displayed on the map 1818. Selection of the legend icon 1830 causes a legend 2408 to be presented (e.g., displayed) by the user interface 2100. The legend 2408 provides information to the user regarding the claims data displayed on the map 1818.

Figure 25 shows a portion (along the upper right edge of the screen) of the user interface 1800 (Figure 18) after the user has selected the data tab 1810. Selection of the data tab 1810 causes a data catalog window 2502 to be presented to the user. For example, selection of the data tab 1810 may cause the data catalog window 2502 to scroll in an animated manner from the right side of the screen to the position shown in Figure 25. The data catalog window 2502 includes a plurality of categories of data which may be displayed on the map 1818, for example, as map layers or as points on the map. The data may be obtained from a number of public and/or proprietary sources and stored in a database of databases ("metabase"). In the example shown, the categories of data include political, physical, social, hazards, disease, U.S. states, national and international, which may be selected via a political category icon 2504, a physical category icon 2506, a social category icon 2508, a hazards category icon 2510, a disease category icon 2512, a U.S. states category icon 2514, a national category icon 2516, and an international category icon 2518, respectively.

Selection of the political category icon 2504 may cause a list of political data to be presented for selection by the user to be displayed on the map 1818. For example, the list of political data may include congressional districts, Medicare report regions, metropolitan statistical areas, census place boundaries, census tract boundary, county boundary, zip code boundary, reservation boundaries and tribal lands, healthcare site, and schools.

Selection of the physical category icon 2506 may cause a list of physical data to be presented for selection by the user to be displayed on the map 1818. For example, the list of physical data may include maximum temperature for a period of time, mean temperature for a period of time, minimum temperature for a period of time, precipitation for a period of time, rivers and streams, average ambient temperature for a period of time, and average barometric pressure for a period of time.

Selection of the social category icon 2508 may cause a list of social data to be presented for selection by the user to be displayed on the map 1818. For example, the list of social data may include inactivity rate, median age, Native American population, percent of population without health insurance, percent of population with health insurance, population age 65 and older, prevalence of excessive drinking for a time period, total population and urban and rural areas. Other social data which may be selected by the user include county classifications, crime data, employment data, food access data, food store availability data, poverty data, and race data.

Selection of the hazards category icon 2510 may cause a list of hazards data to be presented for selection by the user to be displayed on the map 1818. For example, the list of hazards data may include toxic chemicals in air data, brownfield sites, superfund sites, EPA toxics release inventory data, air-related data, anticipated carcinogens data, known carcinogens data, water-related data, etc.

Air-related data may include ozone percentile, particulate matter (PM) 2.5 percentile, traffic proximity percentile, average ozone, and average PM 2.5. Anticipated carcinogens data may include data relating to acetaldehyde, acrylonitrile, cobalt, dichloromethane, hexachlorobenzene, lead, polycyclic aromatic hydrocarbons, styrene, tetrachloroethylene, trichloroethylene, etc. Known carcinogens data may include 1,3-butadiene, arsenic, benzene, cadmium, chromium, creosote, ethylene oxide, formaldehyde, nickel, vinyl chloride, etc. Water-related data may include EPA monitored waterbodies, EPA monitored rivers and streams, acetaldehyde in water, ammonia in water, atrazine in water, barium compounds in water, copper compounds in water, ethylene glycol in water, formaldehyde in water, formic acid in water, hydrogen sulfide in water, manganese compounds in water, methanol in water, metolachlor in water, nitrate compounds in water, sodium nitrite in water, uranium in water, vanadium compounds in water, vinyl chloride in water, zinc compounds in water, etc.

Selection of the disease category icon 2512 may cause a list of disease data to be presented for selection by the user to be displayed on the map 1818. For example, the list of disease data may include current adult asthma prevalence, forecasted adult asthma prevalence, average BMI, forecasted average BMI, obesity rate, predicted diabetes growth, Type 2 diabetes rate, cancer data, etc. Cancer data may be selected based on overall rates of cancer, particular types of cancer, cancer hotspots, etc.

Selection of the US states category icon 2514 may allow the user to select various health related data for individual US states to be displayed on the map 1818. Similarly, selection of the national category icon 2516 and the international category icon 2518 may allow the user to select various health related data for the United States and foreign countries, respectively, to be displayed on the map 1818.

Figure 26 shows a portion (along the upper right edge of the screen) of the user interface 1800 (Figure 18) after the user has selected the literature tab 1812. Selection of the literature tab 1812 causes a literature window 2602 to be presented to the user. For example, selection of the literature tab 1812 may cause the literature window 2602 to scroll in an animated manner from the right side of the screen to the position shown in Figure 26.

Generally, the literature window 2602 provides the user with access to relevant health-related literature from one or more data sources. In the illustrated example, the literature window 2602 may include a search PubMed tab 2604, a my library tab 2606 and a system library tab 2608. Selection of the system library tab 2608 may cause a list of resources (e.g., articles) published by the provider of the system to be presented (e.g., displayed) to the user.

Selection of the search PubMed tab 2604 causes a search box 2610 to be presented (e.g., displayed) to the user. The user may input one or more search terms into the search box 2610 to initiate a search of articles in one or more databases (e.g., NIH's PubMed database). The results of the search may be presented to the user in the literature window 2602 (e.g., below the search box 2610) in a suitable format. In at least some implementations, the results are presented to the user in a list format which includes title, authorship, and publication data for each of a plurality of articles found by the search. The list may include links (e.g., hyperlinks) which, upon selection, navigate the user to resources where the user may view and/or purchase the articles or portions (e.g., abstracts) of the articles.

The list of search results may include selectable icons proximate each of the articles which, upon selection, add the associated article to a library ("my library") of articles associated with the user. A list of articles in the user's library may be accessed by the user by selecting the my library tab 2606 of the literature window 2602.

Figure 27 shows a portion (along the upper right edge of the screen) of the user interface 1800 after the user has selected the IoT tab 1814. Selection of the IoT tab 1814 causes a live data window 2702 to be presented (e.g., displayed) to the user. For example, selection of the IoT tab 1814 may cause the live data window 2702 to scroll in an animated manner from the right side of the screen to the position shown in Figure 27.

Generally, the live data window 2702 allows the user to select various live data sources to be displayed (e.g., overlaid as layers) on the map 1818. Such live data may be obtained by the hosted services system(s) in real-time from a number of sensors or sources (e.g., EPA, U.S. Department of the Interior (DOI)) via one or more wired and/or wireless communications networks. The live data may include EPA air quality index data and DOI hazards data. EPA air quality index data may include air quality index forecast for the current day, AirNow Monitoring - All Stations, AirNow Ozone Stations, AirNow PM 2.5 stations, etc. DOI hazards data may include hazard advisories, hazard warnings, hazard watches, severe conditions, earthquakes, flood conditions, etc.

Figure 28 shows a portion (along the upper right edge of the screen) of the user interface 1800 after the user has selected the Esri ArcGIS™ tab 1816. Selection of the Esri ArcGIS™ tab 1816 causes an Esri ArcGIS™ window 2802 to be presented (e.g., displayed) to the user. For example, selection of the Esri ArcGIS™ tab 1816 may cause the Esri ArcGIS™ window 2802 to scroll in an animated manner from the right side of the screen to the position shown in Figure 28.

The Esri ArcGIS™ window 2802 includes an ArcGIS AtlasMaps Health tab 2804 which includes a list of health data which may be selected by the user for display on the map 1818. The health data may include data relating to acute readmissions per 1000 patients, binge drinking, cigarette smoking, emergency department (E.D.) visits per 1,000 beneficiaries, hospital 30-day readmissions - heart attack, hospital 30-day readmissions - heart failure, hospital 30-day readmissions - pneumonia, total standardized costs per capita, etc. Selection of one or more of these types of data causes the data to be presented on the map 1818. For example, the data may be presented as color-coded regions on the map 1818.

A number of illustrative algorithms which may be utilized to implement the features discussed herein are discussed below. Such algorithms include a geoprocessing algorithm, estimation algorithms, forecast algorithms and prevention algorithms. The estimation algorithms include algorithms for estimating the prevalence of diabetes, coronary artery disease (CAD), chronic lung disease (CLD), prostate cancer, stroke, depression, obesity and asthma in a selected region. The forecast algorithms include algorithms for forecasting the prevalence of diabetes, asthma, obesity and depression. The prevention algorithms include algorithms for estimating a net present value (NPV) and return on investment (ROI) for modeled prevention programs for diabetes, asthma, obesity and depression.

Geoprocessing Algorithm

For a particular disease, condition, or event, this algorithm estimates the adult population in user-selected region G that is afflicted, contracts the affliction, or experiences the event. Examples are: diabetes prevalence, diabetes incidence, or hospitalization (an event).

Act 1

Notation:

G = geographic region

g = geographic region index

i = Census block index

j = year index

$d_{ij}$ = estimated Census block disease rate (prevalence, incidence, or event)

$n_{ij}$ = estimated Census block population count

This act gathers the input data for acts 2 and 3 discussed below. These inputs include block-specific disease prevalence, incidence, or event rates for blocks that intersect the user-selected region G. Specifically, the ith U.S. Census block in year j is assigned disease rate $d_{ij}$ and population size $n_{ij}$ (e.g., $d_{i,2010}$ denotes the ith Census block disease rate for 2010, a USCB base count year). Block-specific disease rates are assigned from county or block-group data, depending on the disease, condition, or event. Additionally, to estimate more recent or future block population sizes $n_{ij}$, $n_{i,2010}$ is incremented according to county-level population growth rates. County-level growth rates are estimated by a weighted linear regression of population size on year (discussed in more detail in the Forecast Algorithms section below).
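The population update in act 1 can be sketched as a weighted least-squares fit of county population on year, followed by a linear projection of each block's 2010 count. The document defers the details to the Forecast Algorithms section, so the regression weights, the conversion of the fitted slope into a relative growth rate, and the function names here are assumptions.

```python
from typing import Sequence, Tuple


def weighted_linear_fit(years: Sequence[float], pops: Sequence[float],
                        weights: Sequence[float]) -> Tuple[float, float, float]:
    """Weighted least-squares fit of pop = intercept + slope * year.

    Returns (slope, weighted mean year, weighted mean population).
    """
    sw = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, years)) / sw
    y_bar = sum(w * p for w, p in zip(weights, pops)) / sw
    sxy = sum(w * (x - x_bar) * (p - y_bar) for w, x, p in zip(weights, years, pops))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, years))
    return sxy / sxx, x_bar, y_bar


def project_block_population(n_2010: float, years: Sequence[float],
                             county_pops: Sequence[float], weights: Sequence[float],
                             target_year: int) -> float:
    """Hypothetical projection of a block's 2010 count to `target_year`."""
    slope, x_bar, y_bar = weighted_linear_fit(years, county_pops, weights)
    county_2010 = y_bar + slope * (2010 - x_bar)  # fitted county size in 2010
    growth_rate = slope / county_2010             # assumed: annual growth relative to 2010
    return n_2010 * (1.0 + growth_rate * (target_year - 2010))
```

For a county growing by exactly 20 people per year from 1,000 in 2010, the fitted rate is 2% per year, so a block of 50 residents in 2010 projects to 55 in 2015.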

Act 2

Equation:

$$r_i = \frac{a_i}{a'_i}$$

Notation:

$a_i$ = Census block area within region G

$a'_i$ = total Census block area

$r_i$ = the areal proportion of block i that intersects region G

This act computes an areal proportion for all Census block areas intersecting a user-selected region. Specifically, for any Census block in region G, the geoprocessing algorithm measures $a_i$, the area of the ith Census block intersecting region G.

Area $a_i$ is then divided by the total block area $a'_i$, forming the areal proportion $r_i$, which is used in the next act.
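The areal proportion $r_i = a_i / a'_i$ requires the area of each block that falls inside region G. In practice a GIS library's polygon intersection would usually supply $a_i$; this dependency-free sketch instead uses Sutherland-Hodgman clipping plus the shoelace formula. It assumes planar (projected) coordinates and a convex region G listed counter-clockwise, since Sutherland-Hodgman is only correct for a convex clip polygon; a non-convex user-drawn polygon would need a general clipping routine. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(poly: Sequence[Point]) -> float:
    """Shoelace formula; vertices in order, planar coordinates."""
    s = 0.0
    for k in range(len(poly)):
        x1, y1 = poly[k]
        x2, y2 = poly[(k + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def clip_to_convex(subject: Sequence[Point], clip: Sequence[Point]) -> List[Point]:
    """Sutherland-Hodgman clipping of `subject` against a convex, counter-clockwise `clip`."""

    def inside(p: Point, a: Point, b: Point) -> bool:
        # True when p lies on or to the left of the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def crossing(p: Point, q: Point, a: Point, b: Point) -> Point:
        # Intersection of segment p-q with the infinite line through a and b.
        den = (p[0] - q[0]) * (a[1] - b[1]) - (p[1] - q[1]) * (a[0] - b[0])
        t = ((p[0] - a[0]) * (a[1] - b[1]) - (p[1] - a[1]) * (a[0] - b[0])) / den
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

    output = list(subject)
    for k in range(len(clip)):
        a, b = clip[k], clip[(k + 1) % len(clip)]
        vertices, output = output, []
        if not vertices:
            break  # block lies entirely outside region G
        prev = vertices[-1]
        for cur in vertices:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(crossing(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(crossing(prev, cur, a, b))
            prev = cur
    return output


def areal_proportion(block: Sequence[Point], region: Sequence[Point]) -> float:
    """r_i = a_i / a'_i for one Census block and a convex region G."""
    total = polygon_area(block)
    if total == 0:
        return 0.0
    return polygon_area(clip_to_convex(block, region)) / total
```

For a unit-square block and a region covering its right half, the sketch returns 0.5; a region enclosing the whole block returns 1.0.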

Act 3

Equation:

$$n_g = \sum_{i=1}^{I} d_{ij}\, n_{ij}\, r_i$$

Notation:

$I$ = total number of Census blocks in region G

$n_g$ = computed afflicted population count in region G

$d_{ij}$ = estimated Census block disease rate (prevalence, incidence, or event)

$n_{ij}$ = estimated Census block population

$r_i$ = the areal proportion of block i that intersects region G

This act computes the estimated population count in region G that has a chronic disease (prevalence), or has contracted a disease (incidence), or has experienced an event (e.g. hospitalization). For each intersecting Census block, the block-specific disease rate is multiplied by the block population count and areal proportion to compute the estimated number of afflicted adults within each block intersection. The sum of these estimates is the afflicted population count estimate for the region.

Specifically, for the ith Census block in region G, this act multiplies the estimated disease rate $d_{ij}$, population size $n_{ij}$, and computed areal proportion $r_i$. The products of these terms for all intersecting blocks 1 through $I$ are summed to produce the afflicted adult population estimate $n_g$ for region G.
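Act 3 reduces to a single weighted sum over intersecting blocks. A minimal sketch, with a hypothetical `BlockEstimate` record holding $d_{ij}$, $n_{ij}$ and $r_i$:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class BlockEstimate:
    d: float  # d_ij: estimated block disease rate
    n: float  # n_ij: estimated block population
    r: float  # r_i: areal proportion of block i inside region G


def afflicted_count(blocks: Iterable[BlockEstimate]) -> float:
    """n_g = sum over intersecting blocks of d_ij * n_ij * r_i."""
    return sum(b.d * b.n * b.r for b in blocks)
```

For two blocks with $(d, n, r)$ of $(0.1, 1000, 1.0)$ and $(0.2, 500, 0.5)$, the estimate is $100 + 50 = 150$ afflicted adults.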

Diabetes Estimation Algorithm

This algorithm estimates the size of the adult population in user-selected region G that is afflicted by Type 1 or 2 diabetes; and computes the associated health care costs.

Act 1

Notation:

g = geographic region index

j = year index

b = Census block-group index

i = Census block index

$d_{bj}$ = estimated block-group diabetes prevalence

$d_{ij}$ = unadjusted estimated Census block diabetes prevalence

Diabetes prevalence $d_{bj}$ for the bth block-group is estimated for each year j by a binomial regression model that estimates the proportion of diabetics in a county based on a set of demographic covariates, including state. Data originate from the CDC BRFSS 2004-2012 and the USCB American Community Surveys. Block-specific estimates $d_{ij}$ of the proportion of diabetics are computed from the model. The regression prediction model uses block-group-level covariates to compute the estimates $d_{ij}$. (Prevalence estimates vary at the block-group level.)
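Block-level prevalence from a binomial regression is the model's predicted proportion evaluated at each block group's covariates. The sketch below assumes a logit link (the document says only "binomial regression") and takes already-fitted coefficients as input; the covariate name and `predict_prevalence` are hypothetical, and fitting the model to BRFSS data is not shown.

```python
import math
from typing import Dict


def predict_prevalence(intercept: float, coefs: Dict[str, float],
                       covariates: Dict[str, float]) -> float:
    """Predicted proportion p = 1 / (1 + exp(-eta)) under an assumed logit link.

    eta = intercept + sum_k coef_k * x_k; covariates missing from
    `covariates` are treated as 0.
    """
    eta = intercept + sum(b * covariates.get(name, 0.0) for name, b in coefs.items())
    # Two algebraically equal forms, chosen to avoid overflow in math.exp.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)
```

Every block inherits the prediction made with its block group's covariates, which is why the estimates vary only at the block-group level.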

Act 2

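Equation:

$$\tilde{d}_{ij} = d_{ij} \cdot \frac{n_{ij}}{n_{i,2010}} \cdot \frac{n'_{i,2010}}{n_{i,2010}}$$

Notation:

$n_{ij}$ = estimated Census block population count

$n'_{i,2010}$ = estimated Census block adult population count (2010)

$d_{ij}$ = unadjusted estimated Census block diabetes prevalence

$\tilde{d}_{ij}$ = estimated Census block diabetes prevalence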

In this act, unadjusted prevalence estimates $d_{ij}$ are adjusted for adult population changes from the base year 2010 to the year of interest. The 2010 estimate is scaled by the relative population change (all residents) and by the fraction of the population that are adults (18 years and older).

Specifically, to account for general and adult population change, $d_{ij}$ is multiplied by two separate proportions: block population counts in year j relative to 2010, and adults $n'_{i,2010}$ (age 18 and up) relative to $n_{i,2010}$. (This algorithm act currently provides estimates for the year 2015.)
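The act 2 adjustment multiplies the unadjusted prevalence by two ratios. A minimal sketch with hypothetical names; the guard for blocks with no 2010 residents is an added assumption, since some Census blocks are uninhabited:

```python
def adjust_diabetes_prevalence(d_ij: float, n_ij: float, n_i2010: float,
                               n_adult_i2010: float) -> float:
    """Scale d_ij by population change since 2010 and by the 2010 adult share."""
    if n_i2010 <= 0:
        return 0.0  # assumption: blocks with no 2010 residents contribute nothing
    growth = n_ij / n_i2010                 # block population, year j relative to 2010
    adult_share = n_adult_i2010 / n_i2010   # adults (18+) relative to all residents
    return d_ij * growth * adult_share
```

For example, an unadjusted prevalence of 0.1 in a block that grew from 100 to 110 residents, 75 of whom were adults in 2010, becomes $0.1 \times 1.1 \times 0.75 = 0.0825$.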

Act 3

Notation:

n_g = computed afflicted population count (adult diabetic population size) in region G

In this act, the adjusted, estimated diabetes prevalence \(\hat{d}_{ij}\) is entered into the Geoprocessing Algorithm (discussed above), which then computes the afflicted adult diabetic population count \(n_g\) for region G.

Act 4

Equation:

\[C(n_g) = n_g \cdot C(1)\]

Notation:

\(C(1)\) = annual, estimated health care costs for a diabetic

\(n_g\) = computed afflicted population count in region G

In this final act, the hosted system client-side code multiplies n_g from the previous act by the client-provided diabetes annual care cost per adult. In at least some implementations, the default diabetes annual care cost is set to $7,900, an average from multiple studies.

CAD Estimation Algorithm.

This algorithm estimates the size of the senior population in user-selected region G that is afflicted by CAD and hospitalized as a consequence. It also computes the associated health care costs.

Act 1

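Equation:

\[\hat{d}_{ij} = d_{i,2012} \cdot \frac{n'_{i,2010}}{n_{i,2010}}\]

Notation:

\(n_{i,2010}\) = estimated Census block population count in 2010

\(n'_{i,2010}\) = estimated Census block senior population size (65 years of age or more) in 2010

\(d_{i,2012}\) = unadjusted estimated Census block CAD hospitalization rate

\(\hat{d}_{ij}\) = estimated Census block CAD hospitalization rate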

CAD hospitalization rates have been obtained from the CDC's DHDSP Atlas 2010-2012. The rate of adults over the age of 65 who have been hospitalized with CAD in the \(i\)th block in 2012, \(d_{i,2012}\), is multiplied by the proportion of the senior population count \(n'_{i,2010}\) (age 65 or greater) to the general population in 2010. Unlike in the Diabetes Estimation Algorithm, the resulting CAD senior hospitalization rate \(\hat{d}_{ij}\) is not further adjusted for population change. (CAD hospitalization rates are assigned to each block from the county in which it resides.)

Act 2

Notation:

n_g = computed afflicted population size (seniors hospitalized for CAD) in region G

In this act, the adjusted, estimated senior CAD hospitalization rate \(\hat{d}_{ij}\) is entered into the Geoprocessing Algorithm discussed above, which then computes the afflicted senior CAD population count \(n_g\) for region G.

Act 3

Equation:

\[C(n_g) = n_g \cdot C(1)\]

Notation:

\(C(1)\) = annual, estimated health care costs for a senior hospitalized with CAD

n_g = computed afflicted population count in region G

In this final act, the hosted system client-side code multiplies n_g from the previous act by the client-provided CAD annual care cost per adult. The default CAD annual care cost is set to $16,690, an average from multiple studies.

CLD Estimation Algorithm.

This algorithm estimates the adult population count in user-selected region G afflicted by CLD. It also computes the associated health care costs.

Act 1

Equation:

\[\hat{d}_{ij} = l_{i,2012}\]

Notation:

\(l_{i,2012}\) = Census block CLD prevalence in 2012

\(\hat{d}_{ij}\) = adjusted Census block disease prevalence

We use county-level prevalence estimates of CLD derived from the 2012 CDC BRFSS data set. The CLD prevalence in the \(i\)th block in 2012, \(l_{i,2012}\), is entered directly into the Geoprocessing Algorithm; no adjustment is made for change over time in the rate of CLD.

Act 2

Notation:

\(n_g\) = computed afflicted population count (adults with CLD) in region G

In this act, the adjusted, estimated adult CLD rate \(\hat{d}_{ij}\) is entered into the Geoprocessing Algorithm, which then computes the afflicted adult CLD population count \(n_g\) for region G.

Act 3

Equation:

\[C(n_g) = n_g \cdot C(1)\]

Notation:

\(C(1)\) = annual, estimated health care costs

\(n_g\) = computed afflicted population in region G

In this final act, the hosted system client-side code multiplies n_g from the previous act by the client-provided CLD annual care cost per adult. The default CLD annual care cost per adult is set to $1,482, an average of multiple studies.

Prostate Cancer Estimation Algorithm.

This algorithm estimates the adult male population count in user-selected region G afflicted by prostate cancer. It also computes the associated health care costs.

Act 1

Equation:

\[\hat{d}_{ij} = d_{i,2012} \cdot \frac{n'_{i,2010}}{n_{i,2010}}\]

Notation:

\(n_{i,2010}\) = estimated Census block population count in 2010

\(n'_{i,2010}\) = estimated Census block adult male population count in 2010

\(d_{i,2012}\) = unadjusted estimated Census block prostate cancer incidence

\(\hat{d}_{ij}\) = estimated Census block prostate cancer incidence

Prostate cancer rates have been obtained from State Cancer Profiles 2008-2012. The rate of adult males who have contracted prostate cancer in the \(i\)th block in 2012, \(d_{i,2012}\), is multiplied by the proportion of the adult male population count \(n'_{i,2010}\) to the general population in 2010. Unlike in the Diabetes Estimation Algorithm, the resulting prostate cancer incidence rate \(\hat{d}_{ij}\) is not further adjusted for population change. Prostate cancer rates are assigned to each block from the county in which it resides.

Act 2

Notation:

n_g = computed afflicted population count (number of males who contracted prostate cancer) in region G

In this act, the adjusted, estimated prostate cancer rate \(\hat{d}_{ij}\) is entered into the Geoprocessing Algorithm discussed above, which then computes the afflicted adult male population count \(n_g\) for region G.

Act 3

Equation:

\[C(n_g) = n_g \cdot C(1)\]

Notation:

\(C(1)\) = annual, estimated health care costs per male

n_g = computed afflicted population count in region G

In this final act, the hosted system client-side code multiplies the estimated n_g from the previous act by the client-provided prostate cancer annual care cost per male. The default prostate cancer annual care cost is set to $8,514, an average from multiple studies.

Stroke Estimation Algorithm.

This algorithm estimates the adult population count in user-selected region G that is afflicted by stroke and hospitalized as a consequence. It also computes the associated health care costs.

Act 1

Equation:

\[\hat{d}_{ij} = d_{i,2012} \cdot \frac{n'_{i,2010}}{n_{i,2010}}\]

Notation:

\(n_{i,2010}\) = estimated Census block population count in 2010

\(n'_{i,2010}\) = estimated Census block senior population count (65 years of age or more) in 2010

\(d_{i,2012}\) = unadjusted estimated Census block stroke hospitalization rate

\(\hat{d}_{ij}\) = estimated Census block stroke hospitalization rate

Stroke hospitalization rates have been obtained from the CDC's DHDSP Atlas 2010-2012. The rate of adults over the age of 65 who have been hospitalized with stroke in the \(i\)th block in 2012, \(d_{i,2012}\), is multiplied by the proportion of the senior population count \(n'_{i,2010}\) (age 65 or greater) to the general population in 2010. Unlike in the Diabetes Estimation Algorithm, the resulting stroke senior hospitalization rate \(\hat{d}_{ij}\) is not further adjusted for population change. Stroke hospitalization rates are assigned to each block from the county in which it resides.

Act 2

Notation:

n_g = computed afflicted population count (seniors hospitalized for stroke) in region G

In this act, the adjusted, estimated senior stroke hospitalization rate \(\hat{d}_{ij}\) is entered into the Geoprocessing Algorithm discussed above, which then computes the afflicted senior stroke population \(n_g\) for region G.

Act 3

Equation:

\[C(n_g) = n_g \cdot C(1)\]

Notation:

\(C(1)\) = annual, associated health care costs per adult

\(n_g\) = computed afflicted population count in region G

In this final act, the hosted system client-side code multiplies n_g from the previous act by the client-provided stroke annual care cost per adult. The default stroke annual care cost is set to $20,369, an average from multiple studies.

Major Depression Estimation Algorithm.

This algorithm estimates the adult population size in user-selected region G afflicted by major depression. It also computes the associated health care costs.

Act 1

Notation:

\(d_{ij}\) = unadjusted estimated Census block major depression prevalence

Estimated major depression prevalence \(d_{ij}\) is a forecast obtained from the BRFSS 2000-2012 data sets and discussed in detail in the sourced report.

Act 2

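Equation:

\[\hat{d}_{ij} = d_{ij} \cdot \frac{n_{ij}}{n_{i,2010}} \cdot \frac{n'_{i,2010}}{n_{i,2010}}\]

Notation:

\(n_{ij}\) = estimated Census block population count in year \(j\)

\(n'_{i,2010}\) = estimated Census block adult population count in 2010

\(d_{ij}\) = unadjusted estimated Census block major depression prevalence

\(\hat{d}_{ij}\) = estimated Census block major depression prevalence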

Specifically, to account for general and adult population change, \(d_{ij}\) is multiplied by two separate proportions: the block population count in year \(j\) relative to 2010, and adults \(n'_{i,2010}\) relative to \(n_{i,2010}\). (This algorithm act currently provides estimates for the year 2015.)

Act 3

Notation:

n_g = computed afflicted population count (adult major depressed population size) in region G

In this act, the adjusted, estimated major depression prevalence \(\hat{d}_{ij}\) is entered into the Geoprocessing Algorithm discussed above, which then computes the afflicted adult population count \(n_g\) for region G.

Act 4

Equation:

\[C(n_g) = n_g \cdot C(1)\]

Notation:

\(C(1)\) = annual, associated health care costs per adult

\(n_g\) = computed afflicted population count in region G

In this final act, the hosted system client-side code multiplies \(n_g\) from the previous act by the client-provided major depression annual care cost per adult. The default major depression annual care cost is set to $8,514, an average from multiple studies.

Obesity Estimation Algorithm.

This algorithm estimates the adult population count in user-selected region G afflicted by obesity. It also computes the associated health care costs.

Act 1

Notation:

\(d_{ij}\) = unadjusted estimated Census block obesity prevalence

Estimated adult obesity prevalence \(d_{ij}\) is a forecast obtained from the BRFSS 2000-2012 data sets and discussed in detail in the sourced report.

Act 2

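Equation:

\[\hat{d}_{ij} = d_{ij} \cdot \frac{n_{ij}}{n_{i,2010}} \cdot \frac{n'_{i,2010}}{n_{i,2010}}\]

Notation:

\(n_{ij}\) = estimated Census block population count in year \(j\)

\(n'_{i,2010}\) = estimated Census block adult population count in 2010

\(d_{ij}\) = unadjusted estimated Census block obesity prevalence

\(\hat{d}_{ij}\) = estimated Census block obesity prevalence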

In this act, the unadjusted prevalence estimates \(d_{ij}\) are adjusted for adult population changes from the base year 2010 to the year of interest. The 2010 estimate is scaled by the relative population change (all residents) and by the fraction of the population that are adults (18 years and older).

Specifically, to account for general and adult population change, \(d_{ij}\) is multiplied by two separate proportions: the block population count in year \(j\) relative to 2010, and adults \(n'_{i,2010}\) relative to \(n_{i,2010}\). (This algorithm act currently provides estimates for the year 2015.)

Act 3

Notation:

n_g = computed afflicted population count (adult obese population size) in region G

In this act, the adjusted, estimated obesity prevalence \(\hat{d}_{ij}\) is entered into the Geoprocessing Algorithm, which then computes the estimated afflicted adult obese population count \(n_g\) for region G.

Act 4

Equation:

\[C(n_g) = n_g \cdot C(1)\]

Notation:

\(C(1)\) = annual, associated health care costs per adult

n_g = computed afflicted population count

In this final act, the hosted system client-side code multiplies n_g from the previous act by the client-provided obesity annual care cost per adult. The default obesity annual care cost is set to $1,429, an average of multiple studies.

Asthma Estimation Algorithm.

This algorithm estimates the adult population in user-selected region G afflicted by asthma. It also computes the associated health care costs.

Act 1

Notation:

\(d_{ij}\) = unadjusted estimated Census block asthma prevalence

Estimated adult asthma prevalence \(d_{ij}\) is a forecast obtained from the BRFSS 2000-2012 data sets and discussed in detail in the sourced report.

Act 2

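Equation:

\[\hat{d}_{ij} = d_{ij} \cdot \frac{n_{ij}}{n_{i,2010}} \cdot \frac{n'_{i,2010}}{n_{i,2010}}\]

Notation:

\(n_{ij}\) = estimated Census block population count in year \(j\)

\(n'_{i,2010}\) = estimated Census block adult population count in 2010

\(d_{ij}\) = unadjusted estimated Census block asthma prevalence

\(\hat{d}_{ij}\) = estimated Census block asthma prevalence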

In this act, the unadjusted prevalence estimates \(d_{ij}\) are adjusted for adult population changes from the base year 2010 to the year of interest. The 2010 estimate is scaled by the relative population change (all residents) and by the fraction of the population that are adults (18 years and older). Specifically, to account for general and adult population change, \(d_{ij}\) is multiplied by two separate proportions: the block population count in year \(j\) relative to 2010, and adults \(n'_{i,2010}\) relative to \(n_{i,2010}\). (This algorithm act currently provides estimates for the year 2015.)

Act 3

Notation:

n_g = computed afflicted population count (adult asthma population size) in region G

In this act, the adjusted, estimated asthma prevalence \(\hat{d}_{ij}\) is entered into the Geoprocessing Algorithm, which then computes the estimated afflicted adult asthma population count \(n_g\) for region G.

Act 4

Equation:

\[C(n_g) = n_g \cdot C(1)\]

Notation:

\(C(1)\) = annual, associated health care costs per adult

\(n_g\) = computed afflicted population count

In this final act, the hosted system client-side code multiplies the estimated \(n_g\) from the previous act by the client-provided asthma annual care cost per adult. The default asthma annual care cost is set to $1,132, an average from multiple studies.
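Every estimation algorithm above shares the same cost step, \(C(n_g) = n_g \cdot C(1)\). The algorithms differ only in their default per-person annual cost. The following minimal sketch collects the stated defaults and assumes that a client-provided cost, when present, overrides the default. The dictionary keys and the function name are hypothetical.

```python
from typing import Optional

# Default annual care cost per afflicted person (USD), as stated for each algorithm.
DEFAULT_ANNUAL_COST = {
    "diabetes": 7_900,
    "cad_hospitalization": 16_690,
    "cld": 1_482,
    "prostate_cancer": 8_514,
    "stroke_hospitalization": 20_369,
    "major_depression": 8_514,
    "obesity": 1_429,
    "asthma": 1_132,
}


def annual_care_cost(n_g: float, condition: str, client_cost: Optional[float] = None) -> float:
    """C(n_g) = n_g * C(1), using the client-provided C(1) when one is supplied."""
    per_person = DEFAULT_ANNUAL_COST[condition] if client_cost is None else client_cost
    return n_g * per_person


total = annual_care_cost(250, "asthma")                        # uses the $1,132 default
custom = annual_care_cost(250, "diabetes", client_cost=8_000)  # client override
```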

Diabetes Forecast Algorithm.

This algorithm forecasts the adult population count in a user-selected region G that is afflicted by Type 1 or 2 diabetes.

Act 1

We use weighted linear regression to forecast diabetes prevalence. The slope is the estimated diabetes incidence for the block-group and the intercept is the estimated diabetes prevalence in 2008. Estimates are based on the BRFSS data from years 2004 through 2012.
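The following sketch implements a weighted least squares fit with the year origin placed at 2008. With that origin, the intercept is the 2008 prevalence and the slope is the annual change. The text does not state which weights are used. The `weights` argument (for example, annual sample sizes), the function names, and the synthetic series are assumptions.

```python
def weighted_linear_fit(years, prevalence, weights, origin=2008):
    """Weighted least squares fit of prevalence on (year - origin).

    Returns (intercept, slope): intercept = fitted prevalence in `origin`,
    slope = fitted change in prevalence per year (the estimated incidence).
    """
    x = [year - origin for year in years]
    total_w = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total_w
    y_bar = sum(w * yi for w, yi in zip(weights, prevalence)) / total_w
    s_xx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    s_xy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, prevalence))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def forecast(intercept, slope, year, origin=2008):
    """Evaluate the fitted line for a given calendar year."""
    return intercept + slope * (year - origin)


years = list(range(2004, 2013))
observed = [0.09 + 0.002 * (y - 2008) for y in years]  # synthetic, exactly linear series
sample_sizes = [120, 95, 130, 110, 140, 100, 125, 90, 115]  # hypothetical weights
b0, b1 = weighted_linear_fit(years, observed, sample_sizes)
```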

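Act 2

Equation:

\[\hat{d}_{ij} = d_{ij} \cdot \frac{n_{ij}}{n_{i,2010}} \cdot \frac{n'_{i,2010}}{n_{i,2010}}\]

Notation:

\(n_{ij}\) = estimated Census block population in year \(j\)

\(n'_{i,2010}\) = estimated Census block adult population in 2010

\(d_{ij}\) = unadjusted estimated Census block diabetes prevalence

\(\hat{d}_{ij}\) = estimated Census block diabetes prevalence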

Diabetes prevalence estimates \(d_{ij}\) are entered into the equations used in the Diabetes Estimation Algorithm (act 2). Instead of using diabetes prevalence for only the current year, this is repeated for every forecast year. Currently, forecasts are computed for 2013 through 2025.

Act 3

Notation:

\(n_g\) = computed afflicted population count (adult diabetic population size) in region G

The Geoprocessing Algorithm discussed above is run for each forecasted year and returns the forecasted number of adults with diabetes in region G for each year.

Act 4

The hosted system displays key outputs in tabular form including the disease forecast count n_g alongside the total population forecast by year.

Asthma Forecast Algorithm.

This algorithm forecasts the adult population count in a user-selected region G that is afflicted by asthma.

Act 1

Forecasting models are fit for each county separately. The data used to fit the models are not the original CDC BRFSS annual county means and proportions, but instead are the estimates produced by imputation and spatial smoothing algorithms discussed below. Imputation is used to replace missing data (some counties have no observations in some years) and smoothing is used to reduce variance in the BRFSS annual county prevalence estimates. (The BRFSS annual county prevalence estimates are computed as weighted averages where the weights are the reciprocals of the BRFSS sampling weights as explained below.) When the number of observations in a particular year and county is small, spatial smoothing reduces sample size-related imprecision.

Act 2

Notation:

c = U.S. county index

j = year index

We forecast the future prevalence of adult asthma for county c and year j using the estimated linear rate of change of asthma prevalence during the time span 2000 through 2012. Our adoption of the linear or constant rate of change model yields an approximate but conservative model of trend. We use the model for forecasting over a relatively short series of years, namely the time interval 2013 to 2025.

Act 3

Notation:

c = U.S. county index

j = year index

r = the difference between the forecast year and 2012

The estimator of incidence (rate of change) in asthma for the \(c\)th county is the least squares estimate of the slope coefficient obtained by fitting a simple linear regression model of the expected mean asthma prevalence for county \(c\) and year \(j\). The predictor variable is year, measured relative to 2012. The forecast for a given year \(2012 + r\) is obtained by evaluating the fitted regression equation using \(r\) as the value of the predictor, or explanatory, variable.
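The following sketch fits the per-county least squares trend and evaluates it at \(r = 1, \ldots, 13\), that is, for the years 2013 through 2025. It assumes that the inputs are the already imputed and smoothed county prevalences. The function names and the synthetic series are hypothetical.

```python
def fit_county_trend(years, prevalence, origin=2012):
    """Ordinary least squares fit of prevalence on r = year - origin.

    Returns (intercept, slope); slope is the estimated incidence (rate of change).
    """
    r = [year - origin for year in years]
    n = len(r)
    r_bar = sum(r) / n
    y_bar = sum(prevalence) / n
    s_rr = sum((ri - r_bar) ** 2 for ri in r)
    s_ry = sum((ri - r_bar) * (yi - y_bar) for ri, yi in zip(r, prevalence))
    slope = s_ry / s_rr
    intercept = y_bar - slope * r_bar  # fitted prevalence at r = 0, i.e. 2012
    return intercept, slope


def county_forecast(intercept, slope, first_year=2013, last_year=2025, origin=2012):
    """Evaluate the fitted line at r = year - origin for each forecast year."""
    return {origin + r: intercept + slope * r
            for r in range(first_year - origin, last_year - origin + 1)}


years = list(range(2000, 2013))
smoothed = [0.08 + 0.001 * (y - 2012) for y in years]  # synthetic county series
a, b = fit_county_trend(years, smoothed)
by_year = county_forecast(a, b)
```

The obesity and major depression forecast algorithms below describe the same county-level fit.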

Act 4

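Equation:

\[\hat{d}_{ij} = d_{ij} \cdot \frac{n_{ij}}{n_{i,2010}} \cdot \frac{n'_{i,2010}}{n_{i,2010}}\]

Notation:

\(n_{ij}\) = estimated Census block population count in year \(j\)

\(n'_{i,2010}\) = estimated Census block adult population count in 2010

\(d_{ij}\) = unadjusted estimated Census block asthma prevalence

\(\hat{d}_{ij}\) = estimated Census block asthma prevalence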

Asthma prevalence estimates \(d_{ij}\) are entered into the equations used in the Asthma Estimation Algorithm (act 2). Instead of using asthma prevalence for only the current year, this is repeated for every forecast year. The current forecast years are 2013 through 2025.

Act 5

Notation:

\(n_g\) = computed afflicted population count (adult asthma population size) in region G

The Geoprocessing Algorithm discussed above is run for each forecasted year and returns the forecasted number of adults with asthma in region G for each year.

Act 6

Obesity Forecast Algorithm.

This algorithm forecasts the adult population count in a user-selected region G that is afflicted by obesity.

Act 1

Forecasting models are fit for each county separately. The data used to fit the models are not the original BRFSS annual county means and proportions, but instead are the estimates produced by the imputation and spatial smoothing algorithms described in the sourced report. Imputation is used to replace missing data (some counties have no observations in some years) and smoothing is used to reduce variance in the BRFSS annual county prevalence estimates. (The BRFSS annual county prevalence estimates are computed as weighted averages where the weights are the reciprocals of the BRFSS sampling weights as explained in the sourced report.) When the number of observations in a particular year and county is small, spatial smoothing reduces sample size-related imprecision.

Act 2

Notation:

c = U.S. county index

j = year index

We forecast the future prevalence of obesity for county \(c\) and year \(j\) using the estimated linear rate of change of obesity prevalence during the time span 2000 through 2012. Our adoption of the linear, or constant rate of change, model yields an approximate but conservative model of trend. We use the model for forecasting over a relatively short series of years, namely the time interval 2013 to 2025.

Act 3

Notation:

c = U.S. county index

j = year index

r = the difference between the forecast year and 2012

The estimator of incidence (rate of change) in obesity for the \(c\)th county is the least squares estimate of the slope coefficient obtained by fitting a simple linear regression model of the expected mean obesity prevalence for county \(c\) and year \(j\). The predictor variable is year, measured relative to 2012. The forecast for a given year \(2012 + r\) is obtained by evaluating the fitted regression equation using \(r\) as the value of the predictor, or explanatory, variable.

Act 4

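Equation:

\[\hat{d}_{ij} = d_{ij} \cdot \frac{n_{ij}}{n_{i,2010}} \cdot \frac{n'_{i,2010}}{n_{i,2010}}\]

Notation:

\(n_{ij}\) = estimated Census block population count in year \(j\)

\(n'_{i,2010}\) = estimated Census block adult population count in 2010

\(d_{ij}\) = unadjusted estimated Census block obesity prevalence

\(\hat{d}_{ij}\) = estimated Census block obesity prevalence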

Obesity prevalence estimates \(d_{ij}\) are entered into the equations used in the Obesity Estimation Algorithm (act 2). Instead of using obesity prevalence for only the current year, this is repeated for every forecast year. The current forecast years are 2013 through 2025.

Act 5

Notation:

n_g = computed afflicted population count (adult obesity population size) in region G

The Geoprocessing Algorithm discussed above is run for each forecasted year and returns the forecasted number of adults with obesity in region G for each year.

Act 6

Major Depression Forecast Algorithm.

This algorithm forecasts the adult population in user-selected region G that is afflicted by major depression.

Act 1

Forecasting models are fit for each county separately. The data used to fit the models are not the original BRFSS annual county means and proportions, but instead are the estimates produced by the imputation and spatial smoothing algorithms described below in Example 2. Imputation is used to replace missing data (some counties have no observations in some years) and smoothing is used to reduce variance in the BRFSS annual county prevalence estimates. (The BRFSS annual county prevalence estimates are computed as weighted averages where the weights are the reciprocals of the BRFSS sampling weights as explained below.) When the number of observations in a particular year and county is small, spatial smoothing reduces sample size-related imprecision.

Act 2

Notation:

c = U.S. county index

j = year index

We forecast the future prevalence of major depression for county \(c\) and year \(j\) using the estimated linear rate of change of major depression prevalence during the time span 2000 through 2012. Our adoption of the linear, or constant rate of change, model yields an approximate but conservative model of trend. We use the model for forecasting over a relatively short series of years, namely the time interval 2013 to 2025.

Act 3

Notation:

c = U.S. county index

j = year index

r = the difference between the forecast year and 2012

The estimator of incidence (rate of change) in major depression for the \(c\)th county is the least squares estimate of the slope coefficient obtained by fitting a simple linear regression model of the expected mean major depression prevalence for county \(c\) and year \(j\). The predictor variable is year, measured relative to 2012. The forecast for a given year \(2012 + r\) is obtained by evaluating the fitted regression equation using \(r\) as the value of the predictor, or explanatory, variable.

Act 4

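Equation:

\[\hat{d}_{ij} = d_{ij} \cdot \frac{n_{ij}}{n_{i,2010}} \cdot \frac{n'_{i,2010}}{n_{i,2010}}\]

Notation:

\(n_{ij}\) = estimated Census block population count in year \(j\)

\(n'_{i,2010}\) = estimated Census block adult population count in 2010

\(d_{ij}\) = unadjusted estimated Census block major depression prevalence

\(\hat{d}_{ij}\) = estimated Census block major depression prevalence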

Major depression prevalence estimates \(d_{ij}\) are entered into the equations used in the Major Depression Estimation Algorithm (act 2). Instead of using major depression prevalence for only the current year, this is repeated for every forecast year. The current forecast years are 2013 through 2025.

Act 5

Notation:

\(n_g\) = computed afflicted population count (adult major depressed population size) in region G

The Geoprocessing Algorithm is run for each forecasted year and returns the forecasted number of adults with major depression in region G for each year.

Act 6

Diabetes Prevention Algorithm.

This algorithm estimates a net present value (NPV) and return on investment (ROI) for a modeled prevention program offered to the estimated pre-diabetic population in user-selected region G.

Act 1

Notation:

\(n_g\) = computed afflicted population count (adult diabetic population) in region G

The computed adult diabetic population count n_g from the Diabetes Estimation Algorithm is input to the following acts.

Act 2

Notation:

\(n'_g\) = estimated afflicted population count (adult pre-diabetic population size)

\(n_g\) = estimated afflicted population count (adult diabetic population size)

20 = the expected ratio of adult pre-diabetics to adult diabetics

A pre-diabetic population count for region G is estimated using a constant expected ratio of adult pre-diabetics to adult diabetics, given a published, sourced incidence rate \(d = 0.05\) for adult pre-diabetics; this constant, 20, equals \(1/d\). The constant 20 is multiplied by the estimated adult diabetic population \(n_g\), providing an estimate of the pre-diabetic population size for region G:

\[n'_g = 20\, n_g\]

Act 3

Notation:

\(n'_g\) = estimated afflicted population count (adult pre-diabetic population size) in region G

\(d\) = adult diabetes incidence rate for pre-diabetics

p = user-selected rate which reduces incidence as a result of program weight loss

This act inputs variables based on various assumptions. We assume all adult pre-diabetics in region G are offered the diabetes prevention program and subsequently lose seven percent of their body weight. We equate this loss to a percent reduction in annualized diabetes risk (incidence) d by a user-selected rate p.

Act 4

Equation:

\[S(n'_g) = (p \cdot d)\, n'_g \cdot C(1)\]

Notation:

\(S(n'_g)\) = total annual, associated health care savings

\(C(1)\) = annual, estimated per capita health care costs per adult

\(n'_g\) = estimated afflicted population count (adult pre-diabetic population size) in region G

In this act, annual health care savings \(S\) are computed by multiplying the adult pre-diabetic population count \(n'_g\) by the diabetes incidence rate \(d\) and the program incidence reduction \(p\), giving an estimate of the number of adults the program prevented from contracting diabetes. The cost of treating an individual patient with diabetes, \(C(1)\), is then multiplied by the number of prevented cases of diabetes to arrive at an estimated annual savings.
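The chain from diabetic count to annual savings (acts 2 through 4) can be sketched as follows. The constants 20 and 0.05 and the $7,900 default treatment cost come from the text. The reduction rate `p` is user-selected. The function name and the example inputs are hypothetical.

```python
PREDIABETIC_MULTIPLIER = 20  # constant from act 2: pre-diabetics per diabetic
INCIDENCE_D = 0.05           # published annual diabetes incidence among pre-diabetics


def prevention_savings(n_g, p, treatment_cost=7_900.0):
    """Return (n'_g, prevented cases, annual savings S = p * d * n'_g * C(1))."""
    n_prime_g = PREDIABETIC_MULTIPLIER * n_g  # act 2: estimated pre-diabetic count
    prevented = p * INCIDENCE_D * n_prime_g   # act 4: cases avoided per year
    return n_prime_g, prevented, prevented * treatment_cost


n_prime, avoided, savings = prevention_savings(n_g=1_000, p=0.5)
```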

Act 5

Equation:

\[NPV = \sum_{j=0}^{J} \frac{CF_j}{(1 + r)^j}\]

Where:

\(CF_0 = -\left(C(1) \cdot n'_g\right)\)

\(CF_j = S(n'_g), \quad j > 0\)

Notation:

NPV = net present value of all investment related cash flows

\(S(n'_g)\) = total annual, associated health care savings

\(CF_0\) = estimated program investment

\(CF_j\) = estimated cash flow in year \(j\)

\(C(1)\) = annual, associated per capita health care costs

\(n'_g\) = estimated afflicted population (adult pre-diabetic population) in region G

In this act, cash flows are modeled under a set of assumptions. First, we assume investments are made such that the entire afflicted population in region G undergoes intervention. In addition, investments in the prevention program are expended at the beginning of the first year. Positive cash flows are assumed to occur at the end of years 1 through a user-selected year \(J\). It is assumed that pre-diabetics are prevented from contracting diabetes only in the first year of the program, after which they remain pre-diabetics for the length of the intervention. In other words, the program is designed as a one-year intervention program, but with health care savings that persist for several years. Health care savings \(S(n'_g)\) associated with weight loss are treated as end-of-year cash flows \(CF_j\) and discounted at the user-selected discount rate \(r\). The very first cash flow, \(CF_0\), represents the initial investment, which is the per capita program enrollment cost multiplied by the estimated population of pre-diabetics in region G.

Act 6

Equation:

\[ROI = \frac{NPV}{C(1) \cdot n'_g}\]

Notation:

NPV = net present value of all investment related cash flows

ROI = program total return on investment

In a final act, NPV is converted to discounted program return on investment (ROI). NPV is scaled (divided) by the total initial investment to obtain the total program ROI.
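Acts 5 and 6 together can be sketched as follows. The sketch models an up-front investment at \(j = 0\) and constant savings at the end of years 1 through \(J\), discounted at rate \(r\), with ROI computed as NPV divided by the initial investment. The text uses \(C(1)\) both for the treatment cost and for the per capita program cost, so the sketch names the latter `enrollment_cost`. That name and the example figures are hypothetical.

```python
def npv(cash_flows, rate):
    """NPV = sum over j of CF_j / (1 + r)^j, with CF_0 at j = 0."""
    return sum(cf / (1.0 + rate) ** j for j, cf in enumerate(cash_flows))


def program_npv_roi(n_prime_g, enrollment_cost, annual_savings, horizon, rate):
    """Return (NPV, ROI) for an up-front program followed by `horizon` years of savings."""
    investment = enrollment_cost * n_prime_g            # CF_0 = -(C(1) * n'_g)
    flows = [-investment] + [annual_savings] * horizon  # CF_j = S(n'_g) for j = 1..J
    value = npv(flows, rate)
    return value, value / investment                    # ROI = NPV / initial investment


value, roi = program_npv_roi(n_prime_g=20_000, enrollment_cost=500.0,
                             annual_savings=3_950_000.0, horizon=5, rate=0.03)
```

Since the savings are constant, the NPV equals the annual savings times the annuity factor \((1 - (1 + r)^{-J}) / r\), minus the investment. This gives a closed-form check on the loop.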

Asthma Prevention Algorithm.

This algorithm estimates an NPV and ROI for a modeled intervention program offered to the estimated acute asthmatic population count in user-selected region G.

Act 1

Notation:

\(n_g\) = computed afflicted population (adult asthma population) in region G

The estimated adult asthma population count n_g from the Asthma Estimation Algorithm is inputted to the following acts.

Act 2

Equation:

\[n'_g = f \cdot n_g\]

Notation:

\(n'_g\) = estimated afflicted population (adult acute asthmatic population size)

\(n_g\) = computed afflicted population count (adult asthmatic population size)

\(f\) = proportion of adult asthmatics who are afflicted by acute asthma

The acute asthma population \(n'_g\) is computed as a constant proportion \(f\), seven percent, of total adult asthmatics \(n_g\).

Act 3

Notation:

\(C_b\) = per capita cost of annual health care bundle of services consumed

\(C_b'\) = intervened per capita cost of annual health care bundle of services consumed

This act inputs variables based on assumptions. We assume all adults with acute asthma in region G are offered the intervention program and subsequently consume a smaller bundle of health care services annually following intervention.

Act 4

Equation:

\[S(n'_g) = n'_g \cdot (C_b - C_b')\]

Notation:

\(S(n'_g)\) = total annual, associated health care savings over region G

\(C_b\) = per capita cost of annual health care bundle of services consumed

\(C_b'\) = intervened per capita cost of annual health care bundle of services consumed

\(n'_g\) = estimated afflicted population count (adult acute asthma population size) in region G

We compute per capita health care savings as the net cost difference \((C_b - C_b')\) between the bundle of services consumed by adults who do not receive intervention and the bundle consumed by adults who do.
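Acts 2 and 4 reduce to two multiplications, sketched below. The sketch uses the 7 percent acute share stated above and applies the per capita saving to the acute asthmatics who are offered the intervention. The bundle costs in the example are hypothetical, and the function name is illustrative only.

```python
ACUTE_SHARE_F = 0.07  # f: fraction of adult asthmatics with acute asthma


def asthma_intervention_savings(n_g, bundle_cost, intervened_bundle_cost):
    """Return (n'_g, S) with n'_g = f * n_g and S = n'_g * (C_b - C_b')."""
    n_prime_g = ACUTE_SHARE_F * n_g
    return n_prime_g, n_prime_g * (bundle_cost - intervened_bundle_cost)


acute, savings = asthma_intervention_savings(10_000, bundle_cost=5_000.0,
                                             intervened_bundle_cost=3_500.0)
```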

Act 5

Equation:

\[NPV = \sum_{j=0}^{J} \frac{CF_j}{(1 + r)^j}\]

Where:

\(CF_0 = -\left(C(1) \cdot n'_g\right)\)

Notation:

NPV = net present value of all investment related cash flows

\(S(n'_g)\) = total annual, associated health care savings

\(CF_0\) = estimated program investment

\(CF_j\) = estimated cash flow in year \(j\)

\(n'_g\) = estimated afflicted population count (adult asthma population size) in region G

\(C(1)\) = per capita medical service costs

In this act, cash flows are estimated based on a set of assumptions. First, we assume investments are made such that the entire afflicted population in region G undergoes intervention. In addition, investments in the intervention program are expended at the beginning of the first year. Positive cash flows are then assumed to occur at the end of years 2 through \(J\). Health care savings \(S(n'_g)\) associated with this intervention are treated as end-of-year cash flows \(CF_j\) and discounted at the user-selected discount rate \(r\). The very first cash flow, \(CF_0\), represents the initial investment, which is the per capita program medical service cost multiplied by the estimated population count of asthmatic adults in region G. Medical services are consumed in each following year through year \(J\).

Act 6

Equation:

\[ROI = \frac{NPV}{C(1) \cdot n'_g}\]

Notation:

NPV = net present value of all investment related cash flows

ROI = program total return on investment

Obesity Prevention Algorithm.

This algorithm estimates an NPV and ROI for a modeled intervention program offered to the estimated obese adult population in user-selected region G.

Act 1

From BRFSS data, a distribution of body mass index (BMI) has been established for each county. A BMI distribution is established for region G using the Geoprocessing Algorithm. The set of bins and population proportions is used as an input in act 3.

Act 2

Adult average BMI is forecast \(J\) years ahead for region G using the Obesity Forecast Algorithm.

Act 3

The estimated BMI distribution and the average BMI forecast from acts 1 and 2 are combined by shifting the distribution by the same relative change as the average BMI. The shift for year \(j\) is the relative change in average BMI from 2012 to year \(j\). This shifted distribution is then saved for further acts.

Act 4

Notation:

C(BMI) = annual health care costs estimated for a given BMI

BMI-based health care costs are computed using a cost equation in which monthly costs are a function of current BMI. Specifically, the cost function \(C(BMI)\) is obtained via a linear regression on sourced data, taking into account selected cost categories. Expected costs without intervention are saved for region G and input into the next act.

Act 5

In this act, a user-selected BMI range is cross-referenced to the appropriate forecast BMI distribution, and the relative costs saved from act 4 are applied to each bin.

Act 6

A one-year effect of intervention is modeled by shifting the forecasted distribution according to a user-selected fractional weight reduction (e.g., a user-input weight reduction of 10 percent moves individuals from the 30 BMI bin to the 27 BMI bin).
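Acts 3 and 6 both rescale the BMI bin values while keeping the population shares fixed. The sketch below assumes that the distribution is a list of (BMI, share) pairs, and it interprets the relative change in average BMI as a multiplicative factor. Both are assumptions. For act 6, BMI is weight divided by height squared, so at fixed height a fractional weight loss lowers BMI by the same fraction. This reproduces the 30-to-27 example above. The function names and the baseline distribution are hypothetical.

```python
def scale_bins(bins, factor):
    """Multiply each bin's BMI by `factor`; population shares are unchanged."""
    return [(bmi * factor, share) for bmi, share in bins]


def forecast_shift(bins, avg_bmi_2012, avg_bmi_year_j):
    # Act 3: apply the relative change in average BMI (2012 -> year j) to every bin.
    return scale_bins(bins, avg_bmi_year_j / avg_bmi_2012)


def intervention_shift(bins, weight_reduction):
    # Act 6: a fractional weight reduction lowers BMI by the same fraction.
    return scale_bins(bins, 1.0 - weight_reduction)


baseline = [(25.0, 0.35), (30.0, 0.40), (35.0, 0.25)]  # hypothetical distribution
future = forecast_shift(baseline, avg_bmi_2012=29.5, avg_bmi_year_j=30.4)
treated = intervention_shift(future, weight_reduction=0.10)
```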

Act 7

Forecast the shifted BMI distribution \(J\) years ahead and input it as the forecasted BMI distribution for the next act.

Act 8

Compute the adjusted costs for region G, using act 4 and the adjusted forecasted BMI distribution from act 7.

Act 9

For \(J\) years of intervention, subtract the adjusted cost forecast from the original reference cost forecast. Save these differences as a savings forecast, \(S_j\).
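Acts 4, 8, and 9 can be sketched with a linear cost function. The intercept and slope are hypothetical placeholders for the regression coefficients fitted from sourced data. The sketch treats the fitted function as returning annual cost directly. The population and the distributions are illustrative.

```python
def cost_of_bmi(bmi, intercept, slope):
    """Hypothetical linear annual cost model C(BMI) = intercept + slope * BMI."""
    return intercept + slope * bmi


def expected_cost(bins, population, intercept, slope):
    """Total annual cost for a region given (BMI, share) bins and the adult population."""
    return sum(share * population * cost_of_bmi(bmi, intercept, slope) for bmi, share in bins)


def savings_forecast(reference_bins_by_year, adjusted_bins_by_year, population, intercept, slope):
    """S_j = reference cost forecast - adjusted cost forecast, for each year j."""
    return {
        year: expected_cost(reference_bins_by_year[year], population, intercept, slope)
        - expected_cost(adjusted_bins_by_year[year], population, intercept, slope)
        for year in reference_bins_by_year
    }


reference = {2016: [(30.0, 1.0)], 2017: [(31.0, 1.0)]}
adjusted = {2016: [(27.0, 1.0)], 2017: [(27.9, 1.0)]}
s_j = savings_forecast(reference, adjusted, population=100, intercept=1_000.0, slope=100.0)
```

With a purely linear cost model, the savings depend only on the change in mean BMI. A nonlinear cost curve would make the full bin-level distribution matter.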

Act 10

Equation:

\[NPV = \sum_{j=0}^{J} \frac{CF_j}{(1 + r)^j}\]

Where:

\(CF_0 = -(C)\)

\(CF_j = S_j\)

Notation:

NPV = net present value of all investment related cash flows

\(S_j\) = annual, forecasted health care savings

\(CF_0\) = estimated program investment

\(CF_j\) = estimated cash flow in year \(j\) in region G

\(C(1)\) = per capita costs

In this act, provide the savings forecast \(S_j\) for each year \(j\) as the annual cash flow \(CF_j\), and compute NPV from these BMI-reduction savings.

Act 11

Equation:

\[ROI = \frac{NPV}{C(1) \cdot n'_g}\]

Notation:

NPV = net present value of all investment related cash flows

ROI = program total return on investment

Major Depression Prevention Algorithm.

This algorithm estimates an NPV and ROI for a modeled intervention program offered to the estimated major depressed adult population in user-selected region G.

Act 1

Notation:

n_g = estimated afflicted population count (adult major depressed population size) in region G

The major depressed adult population count n_g computed by the Major Depression Estimation Algorithm is input to the following acts.

Act 2

Notation:

This act sets input variables based on assumptions. We assume that all adults with major depression in region G are offered the intervention program and subsequently consume a smaller bundle of health care services annually following the intervention.

Act 3

Equation:

$$S(n'_g) = n'_g \cdot (C - C')$$

Notation:

S(n'_g) = total annual, associated health care savings

n'_g = computed afflicted population (adult major depressed population) in region G

Act 4

Equation:

Where:

$$CF_0 = -\left(C(l) \cdot n'_g\right)$$

$$CF_j = S(n'_g) - \left(C(l) \cdot n'_g\right)$$

Notation:

NPV = net present value of all investment related cash flows

S(n'_g) = total annual, associated health care savings

CF₀ = estimated program investment

CFj = estimated cash flow in year j

n'_g = estimated afflicted population count (adult major depressed population size) in region G

C(l) = per capita medical service costs

In this act, cash flows are estimated based on a set of assumptions. First, we assume investments are made such that the entire afflicted population in region G undergoes the intervention. In addition, investments in the intervention program are expended at the beginning of the first year. Positive cash flows are assumed to occur at the end of years 2 through j. Health care savings S(n'_g) associated with this intervention are treated as end-of-year cash flows CF_j and discounted at the user-selected discount rate r. The first cash flow, CF_0, represents the initial investment, which is the per capita program medical service cost multiplied by the estimated population count of major depressed adults in region G. Medical services continue to be consumed in each following year through year j.
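A minimal sketch of Acts 3 through 5 of this algorithm, assuming the relationships stated for them: annual savings equal n'_g times a per-person reduction in annual health care cost, CF_0 = -(C(l) · n'_g), CF_j = S(n'_g) - C(l) · n'_g, discounting at rate r, and ROI as NPV divided by the program investment C(l) · n'_g. It also assumes the savings cash flows start at the end of year 1 and run for a chosen number of years; the text's mention of year 2 may imply a different start. All parameter names and values are hypothetical.

```python
def depression_program_returns(n_g, program_cost_per_capita, cost_before,
                               cost_after, discount_rate, years):
    """Cash flows, NPV and ROI for the modeled depression program.

    n_g: estimated adult major-depressed population in region G (n'_g).
    program_cost_per_capita: C(l), per capita program medical service cost.
    cost_before, cost_after: hypothetical per-person annual health care costs
        before and after the intervention.
    years: number of end-of-year cash flows following the initial investment.
    """
    savings = n_g * (cost_before - cost_after)      # S(n'_g)
    program_cost = program_cost_per_capita * n_g    # C(l) * n'_g
    cash_flows = [-program_cost]                    # CF_0, start of year 1
    cash_flows += [savings - program_cost] * years  # CF_j, j = 1..years
    npv = sum(cf / (1.0 + discount_rate) ** j for j, cf in enumerate(cash_flows))
    roi = npv / program_cost                        # NPV / (C(l) * n'_g)
    return cash_flows, npv, roi


flows, npv, roi = depression_program_returns(1000, 500, 12000, 11000, 0.05, 2)
print(flows, round(npv, 2), round(roi, 4))
```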

Act 5

Equation:

$$ROI = \frac{NPV}{C(l) \cdot n'_g}$$

Notation:

EXAMPLE 1

The following non-limiting example is presented to better illustrate a practical implementation of the above described systems, methods and algorithms. The approach can be applied to other health conditions and data sets, and can be implemented using other techniques.

This example discusses estimation and forecasting for type 2 diabetes prevalence at a block-group level.

Applicants have, to date, computed estimates of type 2 diabetes prevalence for 217,716 U.S. Census block-groups using a predictive model of prevalence for 2012, the last year of available data. Model estimation is necessary as prevalence has not been estimated and released for public access by the Centers for Disease Control and Prevention (CDC) at the U.S. Census block-group layer. Further, it is desirable to forecast prevalence beyond 2012 through 2025. The development of this set of estimates and forecasts is a milestone in the effort to analyze and thereby understand type 2 diabetes. The complete set of forecasts constitutes a valuable resource for public health practitioners and researchers.

The methods for prevalence estimation and forecasting are important tools that can be used to estimate the effect of changing two of the predictive variables (i.e., obesity and exercise level) on a closed population (e.g., a set of 100 individuals with metabolic syndrome).

Source data sets are accessed and a unified or "static" data set is constructed, e.g., a dictionary in the Python language, in which every Census block-group code (FIPS) is a key and the items that may be extracted by key are the demographic and ancillary characteristics (e.g., 18.3% Hispanic, 52.1% female, 32.4% obese, 9.8% diabetic). These values correspond to the time period in which the source data were collected, for example the calendar year 2012.
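A sketch of such a static data set, assuming 12-digit block-group FIPS codes (2-digit state, 3-digit county, 6-digit tract, 1-digit block group) as keys. The specific code, the field names and the helper functions are hypothetical; the values are the illustrative percentages from the text. The 5-digit county prefix is what links a block-group to county-level source data.

```python
# Hypothetical static data set: block-group FIPS code -> characteristics (percent).
static_data = {
    "300630001001": {
        "pct_hispanic": 18.3,
        "pct_female": 52.1,
        "pct_obese": 32.4,
        "pct_diabetic": 9.8,
        "source_year": 2012,
    },
}


def split_block_group_fips(code):
    """Split a 12-digit block-group FIPS code into state, county, tract, group."""
    if len(code) != 12 or not code.isdigit():
        raise ValueError(f"expected a 12-digit block-group code, got {code!r}")
    return {"state": code[:2], "county": code[2:5],
            "tract": code[5:11], "block_group": code[11]}


def county_fips(code):
    """5-digit county FIPS (state + county), the key for county-level data."""
    parts = split_block_group_fips(code)
    return parts["state"] + parts["county"]


record = static_data["300630001001"]
print(county_fips("300630001001"), record["pct_obese"])
```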

At least one processor-based device uses the resulting unified data set to fit a predictive model by using binomial regression. The input variables to the predictive model are those contained in the static data set, and the output is the estimated proportion (i.e., estimated prevalence divided by 100) of type 2 diabetic individuals in a subpopulation with a particular set of attribute values (e.g., demographics, obesity and exercise levels). The predictive model is used to compute the block-group level estimates of type 2 diabetes prevalence. A second analysis has led to an algorithm that computes forecasts of type 2 diabetes prevalence for each populated U.S. Census block-group within the states and District of Columbia for a given period (e.g., the year 2020). The predictive function is trained on, or fit to, the county-level data since prevalence is only measured at the county level. The predictive function is then applied to U.S. Census block-group data to estimate prevalence at this finer layer. The basis for this approach is the presumption that if a predictive function can be developed that accurately predicts county-level prevalence from county-level demographics, then it will also accurately predict prevalence at the block-group level given inputs that are measured at the block-group level.

There is a caveat: the predictive function uses obesity and exercise as predictor variables in addition to the demographic variables (measured at the block-group level). Obesity and exercise are not measured at the block-group level, so the predictions for block-group level prevalence use the county-level values for obesity and exercise. This approach is useful since, in most instances, adopting the county-level variables for the block-groups is believed likely to yield more accurate estimates than a model that does not use obesity and exercise.

Given the estimated rate of change in diabetes prevalence for the county (say 0.25% per year), the forecast is a linear function of years since 2012. Obesity rate may be predicted using the same method. Specifically, the linear model forecast of prevalence is the 2012 estimate of prevalence, say $y_{2012}$, plus the estimated incidence (rate of change) multiplied by the number of years beyond 2012. The 2020 forecast is

$$\hat{y}_{2020} = y_{2012} + 8 \times 0.25\% = y_{2012} + 2\%.$$
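The linear-trend forecast just described can be sketched as follows; the function name is hypothetical and the 9.8% base prevalence is an illustrative value, not a result from the source.

```python
def linear_forecast(prevalence_base, incidence_per_year, year, base_year=2012):
    """Linear-trend forecast of prevalence, in percent.

    prevalence_base: estimated prevalence in base_year.
    incidence_per_year: estimated rate of change, percentage points per year.
    """
    if year < base_year:
        raise ValueError("forecast year precedes the base year")
    return prevalence_base + incidence_per_year * (year - base_year)


# Hypothetical county: 9.8% prevalence in 2012, incidence 0.25% per year.
for year in (2015, 2020, 2025):
    print(year, round(linear_forecast(9.8, 0.25, year), 2))
```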

A linear trend is, of course, not sustainable indefinitely, and the actual trend is more complicated. With only 9 years of data, however, a linear trend appears to be nearly as accurate as any other type of trend. The analysis of prediction error is admittedly limited because of the short time span. In fact, some smoothing of the linear regression coefficients reduces prediction error by approximately 10%. Therefore, the hypothetical estimate of incidence (0.25% per year) is, in practice, a weighted average of incidence estimates from many counties.

The key assumptions upon which this forecasting method depends are that whatever factors are driving change at the county level continue to operate beyond 2012 and through 2025, and that they operate in the same manner. Both assumptions are at the same time false and approximately correct for most counties.

It is clear that for type 2 diabetes, obesity drives prevalence (though not solely) and that change in obesity rates is likely to be the latent variable causing most of the change in prevalence. As we have obesity rates by county for 2004-2012, it is possible to forecast obesity beyond 2012 and then use the obesity predictions to predict prevalence. Note, however, that there is prediction error at both stages. Therefore, the first iteration uses the trend in diabetes prevalence at the county level to compute the predictions. The next iteration may use other variables (e.g., obesity and poverty level) as well.

The analysis in this example uses the County Health Rankings Data compiled by the Robert Wood Johnson Foundation, henceforth referred to as source data set A. The demographic variables originate from U.S. Census Bureau files, and the health-related variables (e.g., diabetes, obesity and exercise) are from the CDC's National Center for Chronic Disease Prevention and Health Promotion. The source data are collected via the CDC's Behavioral Risk Factor Surveillance System (BRFSS).

It is implied that the sampling date for the CDC's county-level prevalence estimates is 2012, though the type 2 diabetes prevalence values appear to be averages derived from BRFSS counts from 2010, 2011 and 2012. The sample units are counties, and the demographic variables are the proportions of each county by gender, race, and age class. Also contained in the data set are the prevalence of diabetes (e.g., for three age classes: less than 18, 18 to 64, and greater than 64 years of age), the percent of the county population classified as rural, population size, and the number of diabetics by age class.

A second source data set, on type 2 diabetes prevalence for adults (> 18 years of age) by county, originates from the American Community Survey administered by the US Census Bureau and is contained in DM_PREV_ALL_STATES_2012.CSV (data set B). Companion files contain obesity rates and the percent of respondents who said they spent no time engaged in exercise (i.e., exercise level). In this example, these data span the interval from 2004 to 2012 inclusive. The prevalence values in data set B for 2012 are much different from those in data set A, as B includes adults only. If the number of children is subtracted from the population count in set A (to obtain the number of adults), and prevalence is computed as the ratio of the number of adult type 2 diabetics to the number of adults, then there is close agreement between the average of the years 2010, 2011 and 2012 from set B and the prevalence value assigned to 2012 in A.
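The reconciliation between data sets A and B can be sketched as below, assuming set A supplies total population, the number of children, and diabetic counts for the 18-64 and over-64 age classes. Field names, function names and all numbers are hypothetical.

```python
def adult_prevalence(population, children, diabetics_18_64, diabetics_65_plus):
    """Adult prevalence (percent) from set A counts: adult diabetics / adults."""
    adults = population - children
    if adults <= 0:
        raise ValueError("adult population must be positive")
    return 100.0 * (diabetics_18_64 + diabetics_65_plus) / adults


def three_year_average(values):
    """Average of set B adult prevalence for 2010, 2011 and 2012."""
    return sum(values) / len(values)


# Hypothetical county.
a_2012 = adult_prevalence(10000, 2500, 450, 300)
b_avg = three_year_average([9.7, 10.0, 10.3])
print(a_2012, round(b_avg, 2), round(a_2012 - b_avg, 2))
```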

Subpopulations are defined by three demographic variables, age (e.g., 3 classes), race (e.g., 6 classes) and gender (e.g., two classes), and two ancillary variables, obesity (2 classes) and exercise level (2 classes). Demographic variables are immutable but the ancillary variables are not. Therefore, a user may define a closed subpopulation, say an ACO membership population, or an elemental group such as 100 individuals belonging to a specific demographic subpopulation, for example Hispanic females 18-64 years of age belonging to the diabetic, obese, and no-exercise class. From the closed population demographics, the hosted system can estimate prevalence. Furthermore, a hypothetical experiment can be carried out in which the mutable subpopulation attributes are shifted to the diabetic, non-obese, and no-exercise class and the prevalence estimate re-computed. How the shift is accomplished (e.g., via bariatric surgery or diet control programs) is not relevant to this computation, although it does matter for the next stage, computing ROI.

There are two major efforts discussed herein: estimation of prevalence for each of 217,716 census block-groups for the baseline year of 2012, and forecasting prevalence for each block-group for the years 2013, . . . , 2025. The following discussion begins with the estimation task.

Estimation

The population of interest is the 3138 (out of 3141) geographic units assigned 5-digit FIPS codes (essentially, counties) for which CDC prevalence source data exist for 2004-2012. The prevalence model is a model of the proportion of individuals diagnosed as diabetic among the resident population of each geographic unit. Since there are three layers of interest, i indexes state, j indexes county within state, and k indexes block-group within county. The logistic model of the proportion $\pi_{ij}$ of diagnosed type 2 diabetics in state i and county j is

$$\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = \beta_1 x_{1ij} + \cdots + \beta_p x_{pij} \qquad (1)$$

where $x_{1ij}, \ldots, x_{pij}$ are the predictor variables (i.e., the immutable demographic variables, obesity and exercise level) measured for state i and county j, and the proportion is prevalence divided by 100. Binomial regression is used to compute the maximum likelihood estimates $\hat{\beta}_1, \ldots, \hat{\beta}_p$ of the parameters $\beta_1, \ldots, \beta_p$. The fitted parameters are used to compute a proportion given an input vector $\mathbf{x} = (x_{1ij}, \ldots, x_{pij})^T$ according to

$$\hat{\pi}_{ij} = \frac{\exp(\mathbf{x}^T\hat{\boldsymbol{\beta}})}{1 + \exp(\mathbf{x}^T\hat{\boldsymbol{\beta}})} \qquad (2)$$

The county-level model for estimating diabetes prevalence uses the predictor variables identified in Table 1. There are 61 parameters in the model since a factor with r levels requires r - 1 parameters. The values of these level-specific variables are the county proportions estimated to belong to the respective level. For example, for Missoula county, the value of the American Indian level variable is 2.734%.

Table 1

| Predictor | Number of levels | Number of coefficients |
|---|---|---|
| Exercise | Percent of adults reporting no time for exercise | 1 |
| States | 51 | 51 |

Table 1 shows predictor variables used in the county-level model of diabetes prevalence. The number of levels is shown for the categorical variables (factors); quantitative variables are identified with a brief description. There are n = 3138 counties with data on prevalence.
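A minimal sketch of fitting the county-level binomial regression by Newton-Raphson (equivalently, iteratively reweighted least squares), assuming NumPy is available. The source does not name the software or algorithm used; this is the textbook maximum likelihood solution, and it also returns the inverse Fisher information as the estimated variance-covariance matrix of the coefficients. No intercept is added, matching the model description; an all-ones column plays the role of a single state indicator. The data are hypothetical.

```python
import numpy as np


def fit_binomial_regression(X, cases, population, max_iter=50, tol=1e-10):
    """Binomial (logistic) regression by Newton-Raphson, without an intercept.

    X: (n, p) predictor matrix, one row per county.
    cases: number of diagnosed diabetics per county.
    population: number of residents per county.
    Returns (beta_hat, Sigma), Sigma being the inverse Fisher information.
    """
    X = np.asarray(X, dtype=float)
    cases = np.asarray(cases, dtype=float)
    population = np.asarray(population, dtype=float)
    beta = np.zeros(X.shape[1])

    def information(pi):
        # X' W X with binomial weights W = N * pi * (1 - pi)
        w = population * pi * (1.0 - pi)
        return X.T @ (X * w[:, None])

    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted proportions, formula (2)
        score = X.T @ (cases - population * pi)   # gradient of the log-likelihood
        step = np.linalg.solve(information(pi), score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return beta, np.linalg.inv(information(pi))


# Hypothetical data: one state indicator column and an obesity proportion.
X = [[1.0, 0.25], [1.0, 0.30], [1.0, 0.35], [1.0, 0.40]]
beta_hat, Sigma = fit_binomial_regression(X, [410, 760, 640, 830],
                                          [5000, 8000, 6000, 7000])
print(beta_hat, np.sqrt(np.diag(Sigma)))
```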

Block-group level estimation

A vector of demographic measurements $x_{1ijk}, \ldots, x_{p-2,ijk}$ for block-group k contained in state i and county j is augmented by concatenating the obesity and exercise variable values measured for state i and county j to produce

$$\mathbf{x}_{ijk} = \left(x_{1ijk}, \ldots, x_{p-2,ijk}, x_{p-1,ij}, x_{p,ij}\right)^T,$$

assuming that the first p - 2 variables are the demographic variables. With the vector $\mathbf{x}_{ijk}$, an estimate of prevalence may be computed for the block-group using formula (2).
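The block-group step can be sketched as appending the county-level obesity and exercise values to the block-group demographics and applying the inverse logit of formula (2). The coefficient values and the choice of predictors in the example are hypothetical.

```python
import math


def block_group_prevalence(beta, demographics, county_obesity, county_exercise):
    """Estimated prevalence (percent) for one block-group via formula (2).

    demographics: block-group values of the first p - 2 predictors.
    county_obesity, county_exercise: county-level values appended as the
        last two predictors, since they are not measured for block-groups.
    """
    x = list(demographics) + [county_obesity, county_exercise]
    if len(x) != len(beta):
        raise ValueError("predictor vector and coefficient vector differ in length")
    eta = sum(b * v for b, v in zip(beta, x))  # linear predictor x'beta
    return 100.0 / (1.0 + math.exp(-eta))     # inverse logit, as a percent


# Hypothetical coefficients: state indicator, female proportion,
# obesity proportion, no-exercise proportion.
beta = [-4.2, 0.5, 3.0, 2.0]
print(round(block_group_prevalence(beta, [1.0, 0.521], 0.324, 0.25), 2))
```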

Bias correction

The prediction function defined in formula (2) will not yield estimates that are in complete agreement with the observed prevalence. Specifically, if $N_{ij}$ is the number of residents in county j and state i, and $n_{ij}$ is the estimated number of diagnosed diabetics, then the sample estimate of prevalence is

$$p_{ij} = \frac{n_{ij}}{N_{ij}}.$$

It is almost certain that $p_{ij} \neq \hat{\pi}_{ij}$, for the following reasons. Both estimators are estimating the true proportion of type 2 diabetics in county j. The estimator $p_{ij}$ is subject to error because it is computed from a sample of the population. The estimator $\hat{\pi}_{ij}$ is subject to error because the model is, of course, only a model; on the other hand, $\hat{\pi}_{ij}$ is a function of $\hat{\boldsymbol{\beta}}$, which is computed from a much larger set of data (all states and counties, rather than only county j).

There are two courses of action when using the hosted system to assign a base prevalence to state i and county j.

1. Accept the $\hat{\pi}_{ij}$'s as better estimates than the $p_{ij}$'s, as the $\hat{\pi}_{ij}$'s are computed using data from all counties and a model, whereas $p_{ij}$ is based on a single county. A forecast for the county in year t is computed from an estimated incidence and the difference between year t and 2012. The linear forecast for block-group k is similarly computed as

$$\hat{\pi}^{t}_{ijk} = \hat{\pi}^{2012}_{ijk} + \hat{r}_{ij} \times (t - 2012), \qquad (5)$$

where $\hat{\pi}^{2012}_{ijk}$ is the fitted 2012 value and $\hat{r}_{ij}$ is the estimated incidence for the county.

2. Accept $p_{ij}$ as a better estimate than $\hat{\pi}_{ij}$. In this case, the estimated prevalence for block-group k is computed by adjusting the county sample prevalence. The adjustment is the difference between the model prevalence estimate for the block-group and the model estimate for the county, $\hat{\pi}_{ijk} - \hat{\pi}_{ij}$. The prevalence estimate for the block-group is the county sample estimate $p_{ij}$ plus the adjustment:

$$\tilde{\pi}_{ijk} = p_{ij} + \left(\hat{\pi}_{ijk} - \hat{\pi}_{ij}\right). \qquad (6)$$

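A linear forecast of prevalence in year t for the block-group is

$$\tilde{\pi}^{t}_{ijk} = p_{ij} + \left(\hat{\pi}_{ijk} - \hat{\pi}_{ij}\right) + \hat{r}_{ij} \times (t - 2012). \qquad (7)$$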
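The two courses of action can be sketched side by side. All inputs are in percent and hypothetical; the function names are illustrative only.

```python
def course1_forecast(fitted_bg_2012, county_incidence, year):
    """Course 1, formula (5): model block-group estimate plus a linear trend."""
    return fitted_bg_2012 + county_incidence * (year - 2012)


def course2_estimate(county_sample_prev, fitted_bg, fitted_county):
    """Course 2: county sample prevalence plus the model block-group adjustment."""
    return county_sample_prev + (fitted_bg - fitted_county)


def course2_forecast(county_sample_prev, fitted_bg, fitted_county,
                     county_incidence, year):
    """Course 2 forecast: adjusted estimate plus the same linear trend."""
    base = course2_estimate(county_sample_prev, fitted_bg, fitted_county)
    return base + county_incidence * (year - 2012)


# Hypothetical: county sample 9.0, model county 8.5, model block-group 10.0,
# county incidence 0.25 per year.
print(course1_forecast(10.0, 0.25, 2020))              # 12.0
print(course2_forecast(9.0, 10.0, 8.5, 0.25, 2020))    # 12.5
```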

Given that the model fit is very good (discussed below), the first course of action appears acceptable, in part because the $p_{ij}$'s are themselves estimates of the true prevalence: the CDC source data were obtained from sampling, not from a census. Admittedly, the CDC sample sizes are large and sampling error ought to be small. There are other issues, related to under-reporting of diabetes and obesity, which typically are beyond control.

Confidence Intervals

It is sometimes desirable that an estimate of prevalence be accompanied by a confidence interval. The general form of an approximate $100(1 - \beta)\%$ level confidence interval is

$$\left(100\left[\hat{\pi} - z_{\beta/2}\,\sigma(\hat{\pi})\right],\ 100\left[\hat{\pi} + z_{\beta/2}\,\sigma(\hat{\pi})\right]\right)\% \qquad (8)$$

where $100\hat{\pi}\%$ is the estimated prevalence, $\sigma(\hat{\pi})$ is the standard error of $\hat{\pi}$, and $z_{\beta/2}$ is the upper $\beta/2$ critical value (the $1 - \beta/2$ percentile) of the standard normal distribution. For instance, $z_{\beta/2} = 1.96$ is the critical value used for a 95% confidence interval, since $\beta = 0.05$. The calculation of $\sigma(\hat{\pi})$ depends on the vector $\mathbf{x}$ of predictor variables from which $\hat{\pi}$ was computed (equation 2 of Example 1) and on the estimated variance-covariance matrix of the p-length parameter vector $\hat{\boldsymbol{\beta}}$. Let $\Sigma = \widehat{\mathrm{var}}(\hat{\boldsymbol{\beta}})$ denote the estimated variance-covariance matrix of $\hat{\boldsymbol{\beta}}$. Then

$$\sigma(\hat{\pi}) = \mathbf{x}^T \Sigma\, \mathbf{x} \qquad (9)$$

The matrix Σ is computed in the course of computing the maximum likelihood estimate $\hat{\boldsymbol{\beta}}$ and so usually can be retrieved from the statistical software package that computed $\hat{\boldsymbol{\beta}}$. It may also be computed using $\hat{\boldsymbol{\beta}}$ and the data set, though the details are omitted here.
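Given an estimated proportion and its standard error, the interval in formula (8) can be sketched with the standard library's normal distribution. The standard error is taken as an input here rather than derived from Σ; the function name and the numbers are hypothetical.

```python
from statistics import NormalDist


def prevalence_confidence_interval(pi_hat, se, level=0.95):
    """Approximate confidence interval for prevalence, in percent (formula (8)).

    pi_hat: estimated proportion (prevalence / 100).
    se: standard error sigma(pi_hat), supplied by the caller.
    level: confidence level, i.e., 1 - beta.
    """
    beta = 1.0 - level
    z = NormalDist().inv_cdf(1.0 - beta / 2.0)  # upper beta/2 critical value
    return 100.0 * (pi_hat - z * se), 100.0 * (pi_hat + z * se)


# Hypothetical block-group estimate: 9.8% prevalence, standard error 0.004.
low, high = prevalence_confidence_interval(0.098, 0.004)
print(f"{low:.2f}% to {high:.2f}%")
```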

Suppose that estimated prevalence is to be computed for a spatial polygon that is entirely enclosed in a county for which the estimated prevalence is $100\hat{\pi}$%. Since the polygon region may be smaller than the county, let n denote the number of individuals in the polygon and N the number of individuals in the county; hence, $n \le N$. Because $\sigma^2(\hat{\pi})$ is the estimated variance for the county prevalence and its N residents, the system scales this variance estimate to reflect the additional uncertainty in the estimated proportion of type 2 diabetics in the polygon attributable to the smaller base population (n) compared to the county. (The smaller base population leads to greater variability in the actual proportion of individuals with type 2 diabetes.) Thus, if $\hat{\pi}_{polygon}$ is the estimated proportion of type 2 diabetics in the polygon, we approximate the variance of $\hat{\pi}_{polygon}$ by

$$\sigma^2(\hat{\pi}_{polygon}) = \frac{N}{n}\,\sigma^2(\hat{\pi}). \qquad (10)$$

Note that if the polygon coincides with the county, then n = N and $\sigma^2(\hat{\pi}_{polygon}) = \sigma^2(\hat{\pi})$.

Suppose that the polygon covers all or portions of c counties. Let $N_i$ denote the population size of the ith county, and let $n_i$ denote the number of individuals who are in both county i and the polygon. Let

$$s_i = \frac{N_i}{n_i} \qquad (11)$$

denote the scaling coefficient for county i. Thus, we may write $\sigma^2(\hat{\pi}_{polygon}) = s_i\,\sigma^2(\hat{\pi})$ if the polygon is completely contained in county i.

Returning to a polygon that covers all or part of c > 1 counties, the prevalence estimate for the polygon is a weighted average where the weight for county i is the proportion of the total polygon population size contained in county i; specifically, the county weight is

$$w_i = \frac{N_i}{\sum_{k=1}^{c} N_k}. \qquad (12)$$

If every county were completely contained in the polygon, then the prevalence estimate for the polygon is

$$\hat{\pi}_{polygon} = \sum_{i=1}^{c} w_i\,\hat{\pi}_i = \frac{\sum_{i=1}^{c} N_i\,\hat{\pi}_i}{\sum_{k=1}^{c} N_k}, \qquad (13)$$

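and the variance estimate is

$$\sigma^2(\hat{\pi}_{polygon}) = \sum_{i=1}^{c} w_i^2\,\sigma^2(\hat{\pi}_i).$$

Now, if some counties are only partly contained in the polygon, then it is necessary to re-introduce the scaling coefficients. This leads to the general formula

$$\sigma^2(\hat{\pi}_{polygon}) = \sum_{i=1}^{c} s_i\,w_i^2\,\sigma^2(\hat{\pi}_i). \qquad (16)$$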

For example, suppose that $N_i = 100$ for $i = 1, \ldots, c = 4$, that $n_i = N_i$ for $i = 1, 2, 3$, and that $n_4 = 50$. For simplicity, suppose that $\sigma^2(\hat{\pi}_i) = 1$ for $i = 1, \ldots, 4$. Then $\sigma^2(\hat{\pi}_{polygon}) = .3125$. If $n_4 = 100$ instead, then $\sigma^2(\hat{\pi}_{polygon}) = .25$. Lastly, if c = 3 and $N_i = n_i = 100$ for $i = 1, 2, 3$, then $\sigma^2(\hat{\pi}_{polygon}) = .3333$.

Formula (16) is correct if county-level counts of the number of type 2 diabetics are independent. Since the presumption of independence is false because of spatial correlation, we proceed with the understanding that the variance estimates are approximate as are confidence intervals derived from the variance estimates.
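The polygon aggregation can be sketched directly from the weights $w_i = N_i / \sum N_k$, the scaling coefficients $s_i = N_i / n_i$ and the independence assumption behind the general variance formula; the sketch reproduces the three worked values above. The function name and argument layout are hypothetical, and the prevalence uses the same weights, as for fully contained counties.

```python
def polygon_estimate(pop_county, pop_in_polygon, prevalence, variance):
    """Prevalence and approximate variance for a polygon spanning c counties.

    pop_county[i] = N_i, pop_in_polygon[i] = n_i, prevalence[i] = pi_i,
    variance[i] = sigma^2(pi_i). Counties are treated as independent.
    """
    total = sum(pop_county)
    weights = [N / total for N in pop_county]                        # w_i
    scales = [N / n for N, n in zip(pop_county, pop_in_polygon)]     # s_i
    prev = sum(w * p for w, p in zip(weights, prevalence))
    var = sum(s * w ** 2 * v for s, w, v in zip(scales, weights, variance))
    return prev, var


cases = [
    ([100] * 4, [100, 100, 100, 50]),  # n_4 = 50
    ([100] * 4, [100] * 4),            # n_4 = 100
    ([100] * 3, [100] * 3),            # c = 3
]
for N, n in cases:
    _, v = polygon_estimate(N, n, [0.1] * len(N), [1.0] * len(N))
    print(round(v, 4))  # 0.3125, 0.25, 0.3333
```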

Forecasting

Turning to the forecasting of diabetes rates, it is presumed that forecasting is aimed at producing predictions for the near future (e.g., less than 15 years beyond the range of the source data; in this example, the last year of source data collection is 2012). The block-group identifying subscripts are omitted below for clarity. A convenient forecasting model is

$$\hat{\pi}^{2012+r} = \hat{\pi}^{2012} + \hat{\beta}_1 \times r, \qquad (17)$$

where $\hat{\pi}^{2012}$ is the estimated prevalence in year 2012, $r \in \{1, 2, \ldots, 13\}$, and $\hat{\beta}_1$ is the estimated incidence, or rate of change in prevalence, during the interval 2004-2012.

Alternatively, we may apply the predictive model by projecting forward in time from 2008, the midpoint of the interval 2004-2012, and compute, for county i in year 2012 + r,

$$\hat{\pi}^{2012+r} = \hat{\beta}_0 + \hat{\beta}_1 \times (r + 4), \qquad (18)$$

where $\hat{\beta}_0$ is the estimated mean prevalence for the county over the nine years 2004-2012 (and also the intercept from a centered regression model) and $\hat{\beta}_1$ is the estimated incidence.

The first forecasting method (equation 17 of Example 1) is used herein, as it more accurately predicts 2012 prevalence when the 2012 values are held out of the model fitting and used as test cases. The improved accuracy of this method relative to forecasting forward from the midpoint reflects imprecision in the linear model; hence, there is some evidence of a non-linear trend in the eight years of data. Unfortunately, there are too few data to estimate the trend with a more sophisticated model. To verify this conjecture, we fit the quadratic forecasting model

$$\hat{\pi}^{2012+r} = \hat{\beta}_0 + \hat{\beta}_1 r + \hat{\beta}_2 r^2 \qquad (19)$$

and evaluated its forecasting performance. The quadratic forecasting model is among the simplest nonlinear forecasting models.

k-Nearest Neighbor Regression

A logical starting point for estimating incidence is to conduct separate simple linear regressions for each county using the nine observation pairs $\{(y_1, -4), \ldots, (y_9, 4)\}$ from each county. In terms of the models above, incidence is the annual rate of change in prevalence; incidence is, in effect, a parameter of the forecasting models, namely the slope coefficient. (The integers -4, -3, ..., 4 are the centered years, e.g., -4 = 2004 - 2008.) Fitting a simple linear regression model to each county's data yields estimates of the unknown true parameters $\beta_0$ (the true mean prevalence from 2004 through 2012 inclusive) and $\beta_1$ (the true incidence).
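Because the years are centered, the per-county least squares fit has a closed form: the intercept is the mean of the nine values and the slope is $\sum t y / \sum t^2$. A sketch of that fit and of the midpoint forecast of equation (18) follows; the function names and the example series are hypothetical.

```python
def county_trend(prevalence_2004_2012):
    """Simple linear regression of prevalence on centered year t = year - 2008.

    Returns (beta0, beta1): beta0 is the 2004-2012 mean prevalence and
    beta1 the estimated incidence (percentage points per year).
    """
    values = list(prevalence_2004_2012)
    if len(values) != 9:
        raise ValueError("expected nine annual values, 2004 through 2012")
    years = range(-4, 5)
    beta0 = sum(values) / 9
    beta1 = sum(t * y for t, y in zip(years, values)) / sum(t * t for t in years)
    return beta0, beta1


def midpoint_forecast(beta0, beta1, year):
    """Equation (18): project forward from the 2008 midpoint."""
    return beta0 + beta1 * (year - 2008)


# Hypothetical county rising 0.25 points per year around a mean of 8%.
series = [8 + 0.25 * t for t in range(-4, 5)]
b0, b1 = county_trend(series)
print(b0, b1, midpoint_forecast(b0, b1, 2020))
```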

It is reasonable to expect that simple linear regression estimates computed from the data of a single particular county, say county i, may be improved by using more data than the nine data pairs. A source of additional data is counties that are similar to county i with respect to prevalence. Incorporating additional data from similar counties may be thought of as information-sharing, borrowing, or smoothing. This section discusses a relatively simple but effective method of using all county data within a particular state to estimate the forecasting model for a specific county. For each county, a different model is fit since data from the county in question have the greatest importance in fitting the forecasting model. Other, similar counties (i.e., similar in the sense that the 2007 prevalence is alike) have some influence, but the greatest influence comes from the county in question.

Smoothing is accomplished via k-nearest neighbor regression, which operates as follows. When computing the parameter pair for the ith county, the data from this county are assigned a weight of α, where 0 < α < 1. Data from the county most similar to county i, that is, with a mean prevalence most similar to that of the ith county, receive weight (1 - α)α; the next most-similar county receives the weight α(1 - α)², and so on. All county data from the state in which the county resides are then used, with the counties weighted as described. Separate regressions are computed for each county since the weighting scheme is different for each county. County i is the most important, having the largest weight; hence, the method yields results that are similar to, but more precise than, ordinary least squares regression. Since the weights decrease exponentially as dissimilarity increases, the method is referred to as exponentially weighted k-nearest neighbor regression.

Forecasting Error

Estimation of forecasting error is an essential component of predictive analytics. In this situation, forecasting error is, roughly speaking, the average absolute difference between a forecast value of prevalence and the actual, or realized, value of prevalence for some unit such as a block-group or a county. As there is no opportunity to observe the true prevalence for a unit in the near future for assessing error, we are forced to hold out the last year of data (2012) and use the first eight years to compute a forecast for 2012. The error estimates computed by comparing the actual and forecast prevalence for 2012 will be somewhat optimistic for more distant forecasts; however, estimates of forecasting error provide the opportunity to compare competing forecasting algorithms and objectively select an algorithm for general use. Since data exist only for county prevalence, the analysis of forecasting error uses these data.

Forecasting error is estimated for three forecasting functions:

1. Simple linear regression as described in equation 17 of Example 1. The root mean square prediction error was .960%.

2. Quadratic linear regression as described in equation 19 of Example 1. The root mean square prediction error was 1.117%.

3. The exponentially weighted k-nearest neighbor regression method. The root mean square prediction error was .937%, obtained with the tuning constant value α = .625.
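The comparison above rests on the root mean square prediction error over the held-out 2012 county values. A sketch of that metric follows; the held-out values and forecasts are hypothetical and do not reproduce the reported errors.

```python
import math


def rmse(forecast, actual):
    """Root mean square prediction error, in the units of the inputs."""
    if not forecast or len(forecast) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(f - a) ** 2 for f, a in zip(forecast, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical held-out 2012 county prevalence (percent) and two forecast sets.
actual_2012 = [9.1, 11.4, 7.8, 12.6]
forecast_a = [9.6, 10.9, 8.4, 12.1]
forecast_b = [9.4, 11.0, 8.1, 12.3]
for name, f in (("method A", forecast_a), ("method B", forecast_b)):
    print(name, round(rmse(f, actual_2012), 3))
```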

The exponentially weighted k-nearest neighbor regression forecasting function has smaller prediction error than the simple and quadratic linear regression forecasting functions. Therefore, this method has been adopted for computing forecasts for each block-group and each year 2013-2025.
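A sketch of the exponentially weighted k-nearest neighbor fit for one county, assuming each county is represented by its nine annual prevalence values on centered years and that similarity is the absolute difference in mean prevalence. The rank-m county (m = 0 for the target) gets weight α(1 - α)^m and a weighted least squares line is fit; the function name and data are hypothetical.

```python
def ewknn_trend(target, others, alpha=0.625, years=range(-4, 5)):
    """Exponentially weighted k-nearest neighbor trend for one county.

    target: the county's nine annual prevalence values (2004-2012).
    others: the other counties in the same state, same layout.
    Returns (intercept, slope) of a weighted least squares line on centered
    years; the slope is the estimated incidence.
    """
    years = list(years)

    def mean(values):
        return sum(values) / len(values)

    target_mean = mean(target)
    # Rank neighbours by similarity of mean prevalence to the target county.
    ranked = sorted(others, key=lambda s: abs(mean(s) - target_mean))
    sw = swt = swy = swtt = swty = 0.0
    for m, series in enumerate([target] + ranked):
        w = alpha * (1.0 - alpha) ** m  # exponentially decaying weight
        for t, y in zip(years, series):
            sw += w
            swt += w * t
            swy += w * y
            swtt += w * t * t
            swty += w * t * y
    slope = (sw * swty - swt * swy) / (sw * swtt - swt ** 2)
    intercept = (swy - slope * swt) / sw
    return intercept, slope


ts = range(-4, 5)
target = [5.0 + 0.5 * t for t in ts]
near = [5.2 + 0.1 * t for t in ts]
print(ewknn_trend(target, [near]))
```

Because every county's own data carry the largest weight, the slope is pulled toward, but not replaced by, the slopes of similar counties.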

Results

The parameter estimates for the county-level model of diabetes prevalence (measured as a proportion rather than the usual percent) are discussed first.

These estimates were computed by fitting a binomial regression model to the number of diagnosed type 2 diabetes cases and are shown in Table 2, below. The model does not contain an intercept. The baseline age class is greater than 64 years of age, and the baseline race is Native Hawaiian and Pacific Islander (a single race). Large coefficients imply that the variable is positively associated with diabetes prevalence. Comparing racial groups, the largest coefficient is associated with American Indian. This estimate, -0.00476, is negative, which implies that diabetes prevalence for American Indians is less than that of the baseline group (Native Hawaiians and Pacific Islanders) after accounting for obesity, age, gender, and state differences. Since the estimate is very nearly zero, there is very little difference in estimated mean prevalence between the two groups if all other variables are held the same. The group with the lowest prevalence is Non-Hispanic Whites (if all other variables are held constant), since the coefficient associated with this group is the smallest among the race coefficients. Among states, the coefficient for Alabama is greatest after equalizing differences in obesity, age, race and gender among states. The coefficient associated with female is positive, which implies that, after accounting for all other variables, females have greater prevalence than males (the baseline gender).

Some information on model fit is provided by the pseudo-R² value of .837, which implies that approximately 83.7% of the variation in county-level prevalence is explained by the model. Eighty percent of the prediction errors across all counties are within roughly one percentage point, specifically between -.908% and +1.142% (measuring errors on the prevalence scale as a percent of the population). Additionally, 95% of the prediction errors are between -1.282% and 1.949%.

Figure 12 shows fitted prevalence plotted against observed prevalence for each of n = 3138 counties (pseudo-R² = .837). The line has slope 1 and intercept 0; a perfect model would show all pairs on the line. There is some attenuation: small observed values tend to be over-predicted and large observed values tend to be under-predicted. Generally, however, the predictions and observed values are in very good agreement.

To determine whether there is evidence of systematic over- or under-fitting of prevalence for specific states, the distributions of prediction errors (fitted minus observed prevalence, on the prevalence scale) are summarized by state in Figure 13. The distributions for all states are centered at or near zero, so the model appears to fit well in the sense that there is no evidence of systematic over- or under-fitting by state.

Figure 14 shows the distribution of effects for each variable and provides some insight into the variables that drive differences across counties. A distribution is shown for each variable; the values comprising the distribution are differences between the fitted prevalence and the prevalence estimate obtained by replacing the actual county value of the variable with its mean across all counties. Wider distributions correspond to variables that have more importance in determining the estimated county prevalence. The fitted prevalences vary in response to the importance of the variable as a predictor and the degree of variability in the variable. The second term in the difference is, in effect, the estimate that would be obtained if the variable were not in the model. For instance, Figure 14 shows a substantial degree of variation in the non-Hispanic White distribution, implying that this variable is one of the most important factors affecting prevalence. Further, the distribution is skewed to the left because of unusually high prevalence in some counties with large proportions of non-Hispanic Whites. Some investigation reveals that a substantial fraction of these counties with unusually large values are among the poorest counties in the United States. The obesity and exercise distributions have the next largest variances, which implies that these variables are (roughly) the second- and third-most important factors, followed closely by age.
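The effect values summarized in Figure 14 can be sketched as follows: for each county, the fitted prevalence minus the prevalence obtained after replacing one predictor with its all-county mean. The coefficients and county rows below are hypothetical, and the percent-scale inverse logit stands in for formula (2).

```python
import math


def fitted_prevalence(beta, x):
    """Formula (2) on the percent scale: 100 / (1 + exp(-x'beta))."""
    eta = sum(b * v for b, v in zip(beta, x))
    return 100.0 / (1.0 + math.exp(-eta))


def variable_effects(beta, rows, j):
    """Effect of predictor j for each county, in percentage points."""
    mean_j = sum(row[j] for row in rows) / len(rows)
    effects = []
    for row in rows:
        replaced = list(row)
        replaced[j] = mean_j  # replace the county value with the all-county mean
        effects.append(fitted_prevalence(beta, row) - fitted_prevalence(beta, replaced))
    return effects


# Hypothetical coefficients: a state indicator and an obesity proportion.
beta = [-4.0, 6.0]
counties = [[1.0, 0.20], [1.0, 0.30], [1.0, 0.40]]
print([round(e, 2) for e in variable_effects(beta, counties, 1)])
```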

Incidence estimation

Figure 17 shows the distributions of estimated incidence by state as a set of boxplots 1200. The values comprising the state-specific boxplots 1200 are the estimated incidences obtained from the exponentially weighted k-nearest neighbor regression models (α = .625). States with the largest county estimates of incidence are Alabama, Kentucky, and Florida; states with small estimates of incidence are Hawaii, Utah and Vermont. The mean estimate (across all U.S. counties) was .3747 percent/year and the standard deviation of the estimates was .143 percent/year.

Prevalence estimation at the block-group stratum

To gain some insight into the estimates, the block-group prevalence estimates (2012) are plotted against the CDC prevalence estimates for each of the 3141 counties. Specifically, for county /^', CDC reports an estimate p_t whereas we have computed prevalence estimates π_η , π_η , . . ., π_{ί n} , for each of block groups contained in the county. The distribution of estimates is of some interest since the distributions ought to be centered on the county estimate. Some of the distributions will not be centered and there are several explanations, all of which are attributable either to model inadequacies or variation in the population of county prevalence values. Variation may occur because of sampling variation in the CDC data, or natural variation or differences among counties with respect to the individuals within the county. For example, a county population may be unusual in the sense that the demographic profile of the county does not predict prevalence well so that estimates π_η , . . ., π and p_t will differ systematically. These differences are apparent in Figure 12. Unusual county population attributes will carry down to the block-group level and hence, the block- group estimates will manifest these attributes. Therefore, a plot of the block-group estimates against the CDC prevalence estimates ought to appear much like Figure 12. However, since the block-group populations will exhibit more variability with respect to the predictive variables than the counties (because block-group populations are fractions of the county population), the estimates π_η , π_η , . . ., _{i n} ought to exhibit more variation. If the model is insensitive to variation demographics, obesity and exercise, then the distributions will not exhibit much variation.

Figure 15 shows the block-group prevalence estimates (as a percent of the population) for 2012 plotted against the CDC prevalence estimates for each of the 3141 counties, n = 217,716. The distributions of the 217,716 block-group estimates tend to be centered on the CDC estimates. If there were no sampling error and the predictive model were exact, then every distribution would be centered on the line. (The line has slope 1 and intercept 0.) That is not the case, but the large majority of distributions are centered close to the line. There is visual evidence of a slight tendency for the distributions to be right-skewed (having longer tails toward larger values). That is to be expected when most of the centers are closer to the minimum prevalence (0) than to the maximum prevalence (100): there is more room for variation toward large values than toward small ones. The magnitude of some of the block-group estimates is surprisingly large. The maximum value is 42.5%, whereas the maximum county estimate is 19.8%.
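The centering check described above can also be summarized numerically. The following is a minimal sketch under stated assumptions:

- The center of each county's block-group distribution is taken to be the median. The text does not say which measure of center is plotted.
- The data are made-up toy values, not the Figure 15 data.
- `distribution_centers` is a hypothetical helper name.

```python
from statistics import median


def distribution_centers(block_group_estimates, county_estimates):
    """For each county, return (median block-group estimate, median minus county estimate).

    Both inputs map a county key to prevalence values in percent.
    """
    summary = {}
    for county, estimates in block_group_estimates.items():
        center = median(estimates)
        summary[county] = (center, center - county_estimates[county])
    return summary


# Toy values for two hypothetical counties (not data from the figures).
bg = {"A": [3.5, 4.1, 4.4, 6.0], "B": [8.0, 8.2, 9.1]}
cdc = {"A": 3.9, "B": 8.5}
for county, (center, offset) in distribution_centers(bg, cdc).items():
    print(county, round(center, 2), round(offset, 2))
```

A positive offset means the block-group estimates sit above the county estimate, which is the pattern described below for Gallatin and Missoula counties.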

To take a closer look, we have extracted the data for Montana and reproduced Figure 15 as Figure 16, with the addition of open circles identifying the distribution centers. Figure 16 shows block-group prevalence estimates (as a percent of the population) for 2012 plotted against the CDC county prevalence estimates for Montana, n = 841. The line has slope 1 and intercept 0; thus, the distributions ought to be centered on the line. The red points indicate the centers of the distributions. The two counties with the smallest county estimates are Gallatin and Missoula counties, which are notable for their large university populations.

On the whole, the distributions are centered about the county estimates. There are exceptions, though. For Gallatin County (prevalence = 3.9%) and Missoula County (prevalence = 5.1%), the block-group estimates are consistently larger than the county estimate, presumably because of the universities located in these counties. Most of the university students reside in a few block groups. Their health and behavior may be different enough to affect prevalence at the county level and in the few block groups in which they reside. The same may be said of university employees, who are better educated than the general population.

The counties with the largest county estimates are Glacier, Big Horn, Blaine, and Roosevelt, all of which are contained, or partially contained, in impoverished Indian reservations. Generally, the divergence of the block-group distributions from the county estimates reflects the omission of variables that are important factors (for instance, diet and nutrition) in describing variation in population prevalence.

TABLE 2

Variable | Estimate
Less than 18 | -.01787
18 to 64 | -.01817
Greater than 64 | 0
African American | -.00729
American Indian | -.00476
Asian | -.00913
Non-Hispanic White | -.01005
Hispanic | -.00909
Native Hawaiian and Pacific Islands | 0
Female | .00635
Obesity | .02078
Leisure | .01462
Alabama | -0.87002
Alaska | -1.36062
Arizona | -1.03081
Arkansas | -1.02340
California | -0.99287
Colorado | -1.19636
Connecticut | -1.08321
Delaware | -1.05553
District of Columbia | -0.97850
Florida | -.91958
Georgia | -.90786
Hawaii | -1.17239
Idaho | -.99935
Illinois | -1.05288
Indiana | -.96400
Iowa | -1.15208
Kansas | -1.06389
Kentucky | -.89862
Louisiana | -.95515
Maine | -.97718
Maryland | -1.00554
Massachusetts | -.96690
Michigan | -.96294
Minnesota | -1.12652
Mississippi | -.96392
Missouri | -1.05986
Montana | -1.17951
Nebraska | -1.12229
Nevada | -1.01685
New Hampshire | -1.02307
New Jersey | -1.01899
New Mexico | -1.16753
New York | -1.00184
North Carolina | -.96431
North Dakota | -1.17791
Ohio | -.91946
Oklahoma | -1.06835
Oregon | -1.04028
Pennsylvania | -.96806
Rhode Island | -1.08757
South Carolina | -.91402
South Dakota | -1.21942
Tennessee | -.96874
Texas | -.99060
Utah | -1.08182
Vermont | -1.14922
Virginia | -.94673
Washington | -1.04468
West Virginia | -.86630
Wisconsin | -1.08832
Wyoming | -1.11521

Table 2 shows parameter estimates for the county-level model of type 2 diabetes prevalence. The model does not contain a constant term (intercept). Since mean prevalence across all states is less than 0.5 (as a proportion), the coefficient for each state is negative. The baseline age class is greater than 64 years of age, the baseline gender is male, and the baseline race is Native Hawaiian and Pacific Islands.

EXAMPLE 2

The following example discusses algorithms for reducing Behavioral Risk Factor Surveillance System (BRFSS) data to a form suitable for mapping the prevalence and incidence of chronic diseases and related demographic variables. The process involves two algorithms. Together, their principal functions are data imputation and variance reduction. We also discuss forecasting of county population using U.S. Census Bureau data. As an overview, the process begins by consuming a set of publicly available data files produced by the Centers for Disease Control and Prevention. Each data record consists of a set of responses to questions asked of a survey participant. The data are reduced to a set of summary statistics for each U.S. county and sampling year (years 2000 through 2012). Despite the large total number of records (about 5.6 million), some county and year combinations are data deficient. The algorithms described below are used to ameliorate these data deficiencies by smoothing and pattern matching.

The U.S. Centers for Disease Control and Prevention initiated the Behavioral Risk Factor Surveillance System (BRFSS) in 1984 to collect information on the factors that affect the health and well-being of U.S. residents. The BRFSS is now the largest periodic sample survey in the world. In the course of the survey, a sample of U.S. adult residents is asked questions regarding health and health-related behaviors. To illustrate, one question asks: "Has a doctor, nurse, or other health professional ever told you that you have diabetes?" The possible responses are:

1. yes
2. no
3. no, (but) pre-diabetes or borderline diabetes
4. yes, but only during pregnancy
5. don't know/not sure
6. refused

The interviewer may also enter the code 7, indicating that the question was not asked. The large majority of the responses are informative; for example, response codes 5, 6, and 7 amounted to only .18% of the year 2014 sample.

We use the BRFSS data to estimate the prevalence and incidence of several important chronic diseases. We also use it to estimate several demographic variables that are related to the prevalence and incidence of chronic diseases. Data from the time span 2000 through 2012, inclusive, are used for this purpose. After 2012, the CDC suppressed the respondent's county of residence, so for geographically explicit analysis the later data are of limited value. Data from years preceding 2000 are smaller in volume, and the coding of responses is sometimes inconsistent with the twenty-first century data. Table 3 lists the names of the data files used in our analyses.

Table 3: BRFSS data files for the analysis of disease prevalence and incidence. The year of origin is coded in two-digit form as part of the file name (four-digit form for the LLCP files).

cdbrfs00.ASC
cdbrfs01.ASC
cdbrfs02.ASC
CDBRFS03.ASC
CDBRFS04.AS
CDBRFS05.ASC
CDBRFS06.ASC
CDBRFS07.ASC
CDBRFS08.ASC
CDBRFS09.ASC
CDBRFS10.ASC
LLCP2011.ASC
LLCP2012.ASC

The individual data files are located in sub-directories labeled by year.

The responses to questions asked of a respondent are stored as a single line, referred to herein as a record. Hence, each record consists of the answers to the questions asked of one respondent. A fixed-length format is used to store the answers, and the format varies with year. The CDC maintains annual codebooks that specify the formatting, the questions asked of the respondents, and the coding of answers. The codebooks are available for viewing from the CDC portal.

Because the data format varies with year, we maintain a dictionary, or hash table, that records the field position of each variable in each year. The dictionary keys are years, and the value associated with a particular year is another dictionary that uses variable names as keys. The value associated with a variable name is the field position. For example, a partial entry for year 2012 is fieldDict[12] = {'bmi': (1644, 1647), 'diabetes': 97, 'weight': (1449, 1458), ...}
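The sketch below shows how a year-keyed field dictionary can drive fixed-width parsing, covering the dictionary lookup and substring extraction described in this passage. It makes these assumptions:

- Field positions are 1-based and inclusive.
- The record layout is a hypothetical four-field one. `FIELD_DICT`, `extract`, and `parse_records` are illustrative names, not the real BRFSS positions or the authors' code.
- Response codes 5, 6, and 7, listed earlier, are treated as uninformative.

Responses are grouped under the key (year, county) as lists of (response, sampling weight) pairs.

```python
from collections import defaultdict

# Hypothetical layout for a toy "year 12" file. Positions are assumed to be
# 1-based and inclusive; a bare int denotes a one-column field. These are NOT
# the real BRFSS field positions.
FIELD_DICT = {
    12: {"county": (1, 2), "diabetes": 3, "bmi": (5, 8), "weight": (10, 14)},
}


def extract(record, pos):
    """Return the substring of `record` at a 1-based, inclusive field position."""
    start, end = (pos, pos) if isinstance(pos, int) else pos
    return record[start - 1:end]


def parse_records(records, year, variable, convert=int, informative=lambda v: True):
    """Group (response, sampling weight) pairs by the key (year, county)."""
    fields = FIELD_DICT[year]
    data = defaultdict(list)
    for record in records:
        raw = extract(record, fields[variable]).strip()
        if not raw:
            continue  # blank field: no response recorded
        value = convert(raw)
        if not informative(value):
            continue  # e.g. "don't know", "refused", "not asked"
        county = extract(record, fields["county"])
        weight = float(extract(record, fields["weight"]))
        data[(year, county)].append((value, weight))
    return dict(data)


records = ["071 2750 125.5", "072 2210 080.0", "097 3100 210.0"]
by_county = parse_records(records, 12, "diabetes",
                          informative=lambda v: v not in (5, 6, 7))
print(by_county)  # {(12, '07'): [(1, 125.5), (2, 80.0)]}
```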

Data processing begins with reading each record as a string. Substrings are extracted from the record, translated to a usable form (commonly, but not exclusively, an ordinal variable), and stored. To extract a variable from the record, we take three steps:

1. Look up the variable's field position in the dictionary of field positions.
2. Extract the substring of the record at that position.
3. Assign the substring's contents to the variable, translating the string to either an integer or a floating-point value.

The variable is then stored in a dictionary that uses the pair (year, county) as keys. The dictionary value is a list of pairs. Every respondent from that county and year who provided an informative response is included in the list. Each pair consists of the response to the question (the variable) and the sampling weight assigned to the respondent by the CDC (sampling weights are discussed next).

1.1 Sampling Weights

The Behavioral Risk Factor Surveillance System survey is conducted by telephone; contacts are made via landlines and cell phones. Cell phone numbers are randomly sampled, whereas landline numbers are sampled by stratified sampling. The resulting sample is neither random nor representative, since some sub-groups of the U.S. population are over-sampled at the expense of others. Therefore, conventional estimators such as the sample mean do not necessarily yield unbiased estimates of the estimands. The CDC provides a set of sampling weights that reflect the probability of selecting a respondent belonging to a particular subgroup defined by age, gender, race, and several other demographic variables. Each weight is roughly proportional to the inverse of the estimated probability of selecting the respondent. These weights provide a means by which bias may be reduced or perhaps eliminated.

Sampling weights are used as follows. Let v_k, k = 1, . . ., n, denote the sampling weight assigned to the kth record (or respondent), where n denotes the number of informative responses for a particular year. The weights are scaled to sum to 1 by defining a second set of weights

$$w_k = \frac{v_k}{\sum_{l=1}^{n} v_l}, \qquad k = 1, \ldots, n.$$

We use w to denote the vector of weights. The weights can be incorporated into a linear estimator with no difficulty. (A linear estimator is a linear combination of the observations.) A linear estimator computed from observations contained in the vector y = (y_1 . . . y_n)^T may be expressed as the linear combination

$$\mathbf{y}^T\mathbf{w} = \sum_{k=1}^{n} w_k y_k, \qquad (1)$$

where the w_k are coefficients such that 0 < w_k < 1 for all k and w_1 + . . . + w_n = 1. The sample mean is recognizable as a linear estimator if we set w_k = n^{-1} for each k. Alternatively, an estimator of the mean may be computed by using the BRFSS sampling weights in formula (1). Many, but not all, estimators are linear estimators; examples include histograms and linear regression estimators. The median is a counter-example.
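Formula (1) can be checked on a toy example. The sketch below scales hypothetical raw weights v_k to sum to one, then evaluates the same linear estimator twice:

- with the scaled weights, giving a weighted mean;
- with w_k = 1/n, which reproduces the ordinary sample mean.

```python
def normalize(v):
    """Scale raw sampling weights v_k so they sum to one: w_k = v_k / sum(v)."""
    total = sum(v)
    return [vk / total for vk in v]


def linear_estimate(y, w):
    """Formula (1): the linear combination sum_k w_k * y_k."""
    if abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    return sum(wk * yk for wk, yk in zip(w, y))


y = [1.0, 0.0, 1.0, 1.0]
v = [2.0, 1.0, 1.0, 4.0]  # hypothetical raw sampling weights
print(linear_estimate(y, normalize(v)))           # 0.875 (weighted mean)
print(linear_estimate(y, [1 / len(y)] * len(y)))  # 0.75 (ordinary sample mean)
```

The gap between the two results is the effect of disproportionate sampling that the weights are meant to correct.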

1.2 County-level Data

The first step of analysis is to estimate the mean of a variable for county i in year j. In the next section, it will become clear that estimating the mean of an appropriately defined binary variable yields an estimate of the proportion of individuals in a population who possess the attribute indicated by the binary variable. Hence, this step also includes estimating the proportion of county residents who are obese, are diabetic, or have some other condition.

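The estimator of the mean level for county i and year j is a weighted sum in which the weights are the scaled BRFSS sampling weights:

$$\bar{y}_{ij} = \sum_{k=1}^{n_{ij}} w_{ijk}\, x_{ijk}, \qquad (2)$$

where x_{ijk} is the kth observation from county i for year j, w_{ijk} is its scaled sampling weight, and n_{ij} is the number of informative responses. We may compute an estimate of the variance σ² of the variable using the weighted estimator

$$\hat{\sigma}^2_{ij} = \sum_{k=1}^{n_{ij}} w_{ijk}\, x_{ijk}^2 - \bar{y}_{ij}^2. \qquad (3)$$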
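Formulas (2) and (3) can be sketched as below. The sketch assumes the weights are the scaled sampling weights, so the variance is the weighted second moment minus the squared weighted mean. The function name and toy data are hypothetical.

```python
def weighted_mean_var(x, v):
    """Weighted mean (formula 2) and plug-in weighted variance (formula 3).

    x: observations from one county and year; v: raw sampling weights.
    """
    total = sum(v)
    w = [vk / total for vk in v]  # scaled weights summing to one
    mean = sum(wk * xk for wk, xk in zip(w, x))
    second_moment = sum(wk * xk * xk for wk, xk in zip(w, x))
    return mean, second_moment - mean ** 2


# Two body mass index values; the first respondent carries three times the weight.
print(weighted_mean_var([20.0, 30.0], [3.0, 1.0]))  # (22.5, 18.75)
```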

We work with quantitative variables (e.g., body mass index) and binary variables. A quantitative variable takes on values that can be unambiguously ordered and for which a difference between two values has a clear meaning. For example, body mass index is a measure of body weight scaled by the individual's height for comparability (units are kg/m²). It is quantitative: there is no question that 20 < 30 and that 30 - 20 = 10 = 35 - 25. In comparison, to represent diabetes, we may use values in {0, 1} or {no, yes}. We cannot unambiguously assign a value to yes - no. The binary variables are indicators of a state or condition associated with a BRFSS respondent. In particular, binary variables are used to estimate prevalence (regardless of the condition or disease).

We also estimate the distribution of some quantitative variables (body mass index in particular) for a county i for 2012, the most recent year of data, using a histogram estimator (discussed below).

1.3 Estimation of Prevalence

There are several conditions or diseases for which we estimate the prevalence. Prevalence is the proportion of the adult population that has a particular disease or condition. Equivalently, prevalence is the probability that an individual selected at random from a population has the disease. A common application of the methods described herein is to estimate the prevalence of a disease for each county in the United States for a span of years. As the span usually includes future years, we forecast prevalence.

To simplify notation, we will temporarily ignore county and year and discuss the estimation of prevalence for one specific year and county. In the absence of disproportionate weighting, the usual prevalence estimator is the sample proportion of affirmative binary responses. For the time being, suppose that the responses to the question are coded as 1 for yes and 0 for no. The data consist of n binary observations x_1, . . ., x_n. Then, the sample proportion is computed as

$$\hat{p} = \frac{1}{n}\sum_{k=1}^{n} x_k,$$

where x_k denotes the kth binary response and n denotes the number of responses obtained from the county and the year.

However, because of disproportionate sampling, we use the weighted prevalence estimator

$$\hat{p} = \frac{\sum_{k=1}^{n} v_k x_k}{\sum_{k=1}^{n} v_k}, \qquad (4)$$

where v_k is the BRFSS sampling weight associated with the kth respondent. Formula (4) is essentially the same as formula (2). Therefore, we will use the general notation ȳ_{ij} to refer to the estimated prevalence for county i in year j computed using formula (4).
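Applied to coded survey responses, the weighted prevalence estimator (4) might look like the sketch below. The mapping of response codes to a binary indicator is an assumption made for illustration; the source does not state how codes 3 and 4 are treated:

- code 1 counts as "yes";
- codes 2 through 4 count as "no";
- codes 5, 6, and 7 are dropped as uninformative.

`weighted_prevalence` is a hypothetical name.

```python
# Assumed coding (hypothetical): 1 = yes; 2, 3, 4 = no; 5, 6, 7 = uninformative.
YES_CODES = {1}
NO_CODES = {2, 3, 4}


def weighted_prevalence(pairs):
    """Formula (4): sum(v_k * x_k) / sum(v_k) over informative (code, weight) pairs."""
    numerator = 0.0
    denominator = 0.0
    for code, v in pairs:
        if code in YES_CODES:
            x = 1
        elif code in NO_CODES:
            x = 0
        else:
            continue  # don't know, refused, or not asked
        numerator += v * x
        denominator += v
    return numerator / denominator if denominator > 0 else None


pairs = [(1, 100.0), (2, 300.0), (7, 500.0), (4, 100.0)]
print(weighted_prevalence(pairs))  # 0.2
```

The unweighted sample proportion for the same informative responses would be 1/3. The difference illustrates why the weighted form is used.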

1.4 Notation

From this point forward, we treat the county means as the units of observation. The following notation is used to denote an estimate for county i in year j: y_{ij}. In Section 1.2 of Example 2 above, we described the calculation of the county mean estimates as a linear combination and used the symbol ȳ to represent the mean for some county and year. We change notation now and identify the mean for county i and year j as y_{ij}.

The observations from a single county are collected as the vector

$$\mathbf{y}_i = (y_{i1}\ \ldots\ y_{im})^T, \qquad (5)$$

where 0 < m ≤ 13 is the number of years of observation. Not every variable was measured in every county in every year, so a significant effort was needed to impute the missing values.

2 Analytics

We describe smoothing algorithms for data imputation and variance reduction in this section.

Despite the very large volume of the BRFSS data sets, the U.S. is partitioned into 3221 geographic units (primarily counties, with a few exceptions such as the District of Columbia and some U.S. territories). This partitioning implies that the sample size for some geographic units is insufficient to provide precise estimates of all the estimands of interest. Moreover, some counties (those with populations of less than 10,000) are not sampled in the course of the BRFSS surveys. Imputation is necessary to generate estimates for these small-population counties.

2.1 Smoothing Algorithms

This discussion explains the algorithms used to impute estimates for counties with missing data and to improve the precision of the estimates for counties with small sample sizes.

Generically, the estimator of the estimand of interest is denoted y_{ij}, where i references the county and j references the year. Two stages of smoothing are carried out: pattern matching and spatial smoothing.

Both smoothing algorithms use the same core algorithm, the exponentially weighted k-nearest neighbor prediction function. The next section describes it. Then, we discuss pattern matching and spatial smoothing.

2.1.1 The exponentially weighted k-nearest neighbor smoothing algorithm

The exponentially weighted k-nearest neighbor smoothing algorithm predicts the value of a random variable as a linear combination of observations. We suppose that there is a data set consisting of scalar observations D = {y_1, . . ., y_n}. The values are, of course, subject to variability or error. The objective is to reduce that variability by replacing a value y_0 ∈ D, or by predicting an unobserved value y_0, with a linear combination of observations. The exponentially weighted k-nearest neighbor predictor of the target y_0 is

$$\hat{y}_0 = \sum_{i=1}^{n} w_i\, y_{[i-1]}, \qquad (6)$$

where w_i is a weight (not a BRFSS sampling weight) discussed in detail momentarily. The terms y_[0], y_[1], . . ., y_[n-1] are observations that have been ordered with respect to their distance from the target y_0. For instance, the terms may be county means measured in the same year as y_0. They would be ordered by the Euclidean distance between the centroid of the county from which y_0 originated and the centroids of all counties in the data set. Whenever the target is available, we include it in the estimator (equation 6 of Example 2). It will always be the nearest neighbor, i.e., y_[0] = y_0, since the distance between y_0 and itself is zero; all other counties have a positive distance to the county of y_0. The weights are defined by

$$w_i = \alpha(1-\alpha)^{i-1}, \qquad i = 1, 2, \ldots, \qquad (7)$$

where 0 < α < 1 is a tuning constant. Figure 29 is a graph 2900 which shows the sequence of weights plotted against i for four choices of α (right panel). The left panel shows the weights corresponding to the conventional k-nearest neighbor prediction functions for k ∈ {3, 5, 10, 20}. Large values of α place more weight on the nearest neighbors and produce weights that decay rapidly to zero. Smaller values of α place less weight on the nearest values, and their weights decay more slowly.
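The predictor (6) with the weights (7) can be sketched as follows, under two assumptions:

- Neighbors are ordered by Euclidean distance between toy coordinates.
- The weights are rescaled by default. A finite series of weights sums to 1 - (1 - α)^n rather than exactly one, and the text later applies the same rescaling to its short population series.

The function names are hypothetical.

```python
import math


def exp_weights(alpha, n):
    """Formula (7): w_i = alpha * (1 - alpha)**(i - 1) for i = 1..n."""
    return [alpha * (1 - alpha) ** (i - 1) for i in range(1, n + 1)]


def ew_knn_predict(target, points, values, alpha, normalize=True):
    """Formula (6): sum_i w_i * y_[i-1], with y_[0], y_[1], ... ordered by distance to target."""
    order = sorted(range(len(points)), key=lambda k: math.dist(target, points[k]))
    w = exp_weights(alpha, len(order))
    estimate = sum(wi * values[k] for wi, k in zip(w, order))
    return estimate / sum(w) if normalize else estimate


points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]  # toy coordinates; the first is the target
values = [10.0, 20.0, 40.0]
print(round(ew_knn_predict((0.0, 0.0), points, values, alpha=0.5), 6))  # 17.142857
```

With α = .2, `exp_weights(0.2, 3)` gives approximately .2, .16, .128, matching the relative weights quoted later for pattern matching.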

We turn now to the smoothing algorithms. They are applied in this order: pattern matching followed by spatial smoothing. In both algorithms, smoothing is applied to a vector of observations from a county, so that each element in a vector of county data is smoothed using data from the same set of neighboring counties. The target is then a vector y_0 with length equal to the number of years of observation on the county.

2.1.2 Pattern Matching

The intent of this algorithm is to reduce the effect of anomalous observations and to impute missing data. Every county with data is used as a target and therefore undergoes smoothing by pattern matching. The main idea is to find counties whose pattern of values is similar to that of the target. A linear combination of their values is then computed as the smoothed value, giving the greatest weight to the most similar patterns.

If there is a complete set of annual observations, the data from county i is a vector

$$\mathbf{y}_i = (y_{i1}\ \ldots\ y_{im})^T, \qquad (8)$$

where m = 13 is the number of years of observation. (Not every variable was measured in every county in every year.) To be clear on notation, y_i consists of the annual estimates y_{i1}, . . ., y_{im}, where m is the number of years of observation on the variable in the BRFSS data. Variables vary greatly in completeness. For some variables, very little annual data is missing. For others, many of the 13 years may be missing, and some counties with low populations may be missing entirely. Some data vectors, then, will not look like y_i; they will be missing data.

Suppose the target is y_0. We will impute missing years if there are any, and otherwise reduce variation in the elements of y_0 by smoothing. (It is assumed that there is at least one observation in the target vector; otherwise, there is no pattern to match.) We determine a neighborhood set N_0 = {y_[0], y_[1], . . ., y_[s]}, where s is the number of observation vectors that can be matched with y_0. The count s depends on the number of years for which the BRFSS collected observations on the variable for the counties.

An observation vector is included in the neighborhood set N_0 (i.e., matched) if there is at least one common year of observational data. It is convenient to define the indicator variable I_{0j} according to

$$I_{0j} = \begin{cases} 1, & \text{if } y_{0j} \text{ is observed},\\ 0, & \text{otherwise}. \end{cases} \qquad (9)$$

The variable I_{ij} is similarly defined. The distance from y_0 to y_i is computed from the common years and is

where m = 13 is the total number of years. The denominator is the number of years with data from both counties. Computing distances with equation 10 of Example 2 allows the counties to be ordered so that the data vectors most similar to y_0 with respect to the observed values (the patterns) are nearest (have the smallest distances).

The next step is to order all observations in the neighborhood N_0 according to the distances d_M(y_0, y_1), . . ., d_M(y_0, y_s), from smallest to largest. The ordered vectors are denoted by y_[i], i = 0, 1, . . ., s - 1. Observations with the smallest indices are assigned the greatest weight in the calculation of the smoothed values. The smoothed value for year j is computed as

We compute a smoothed estimate for each of the m years.

Consequently, if there are no data for year j for the county, we are imputing the value y_{0j}. On the other hand, if there are no data for a county in any year, pattern matching is not possible, since we require at least one year of data from the county.

The tuning constant α determines the relative contribution of each neighbor (and of the observation vector y_0) to the smoothed value. We use α = .2, which assigns a relative weight of .2 to y_0 and .16 to the nearest neighbor (besides itself). The weights continue to decrease by a factor of .8 as the index increases. The logic of this weighting scheme (determined by the choice α = .2) is as follows. If a county has most or all years of data, then its nearest neighbors will also have most years. (By definition, a neighbor must have the same years of data as the target but may have more years.) To be a near neighbor, the annual values must be consistently close, as there are typically more than 2500 possible neighbors. The effect of smoothing will therefore not be great, because the nearest annual values y_[0]j, y_[1]j, . . . entering into the estimate are similar to y_0j. On the other hand, if a number of years are missing, then we are imputing the values. In that case we want a relatively large set of values to contribute significantly to the imputed value, which necessitates α < .25.
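Pattern matching can be sketched as below. Equations (10) and (11) are not reproduced above, so the sketch rests on assumptions:

- The distance is the mean squared difference over the years both vectors observe. The text specifies only that the distance uses the common years and divides by their number.
- The smoothed value for year j uses, in distance order, only the matched vectors observed in year j, with exponential weights rescaled to sum to one.
- Missing years are represented by `None`.

The function names are hypothetical.

```python
def pattern_distance(a, b):
    """Assumed form of the pattern distance: mean squared difference over common years.

    Returns None when the two vectors share no observed year (no match).
    """
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not common:
        return None
    return sum((x - y) ** 2 for x, y in common) / len(common)


def pattern_match_smooth(target, others, alpha=0.2):
    """Smooth or impute every year of `target` from itself and its matched neighbors."""
    scored = []
    for vec in [target] + others:
        d = pattern_distance(target, vec)
        if d is not None:
            scored.append((d, vec))
    if not scored:
        raise ValueError("target has no observations, so there is no pattern to match")
    scored.sort(key=lambda pair: pair[0])  # stable sort keeps the target first among ties
    smoothed = []
    for j in range(len(target)):
        values = [vec[j] for _, vec in scored if vec[j] is not None]
        if not values:
            smoothed.append(None)  # no matched vector observed this year
            continue
        w = [alpha * (1 - alpha) ** r for r in range(len(values))]
        smoothed.append(sum(wr * v for wr, v in zip(w, values)) / sum(w))
    return smoothed


target = [1.0, None, 3.0]  # year 2 is missing and will be imputed
others = [[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]]
print([round(v, 4) for v in pattern_match_smooth(target, others, alpha=0.5)])
# [1.5714, 3.3333, 3.5714]
```

The toy call uses α = .5 so the effect is visible with only three vectors. With thousands of candidate counties, α = .2 spreads weight over many close patterns, as the text describes.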

2.1.3 Spatial Smoothing

Spatial smoothing is used to generate estimates for counties with no data and also to reduce variability in the pattern-matched estimates. A very similar neighborhood formation algorithm is used here though distances between counties are computed differently and therefore different neighborhoods are produced.

Distances between counties are computed as the straight-line Euclidean distance between county centroids. The squared distance between county 0 and county i is computed from the latitude (lat) and longitude (lon) of the county centroids. The distance between the data vectors y_0 and y_i may therefore be expressed as

$$d^2(\mathbf{y}_0, \mathbf{y}_i) = (\mathrm{lat}_0 - \mathrm{lat}_i)^2 + (\mathrm{lon}_0 - \mathrm{lon}_i)^2, \qquad (12)$$

where (lat_i, lon_i) is the centroid of the ith county. We order the observation vectors using this metric so that y_0 is nearest (its distance to y_0 is, of course, zero), y_[1] is the next spatially closest observation vector, and so on. The exception occurs when there are no observations at all on a county. In this case, the nearest county is the nearest county for which data are available. The ordered observations are again collected in the neighborhood N_0.

For every county i and every year j, we compute the imputed or smoothed value according to

$$\hat{y}_{0j} = \sum_{i=1}^{s} w_i\, y_{[i-1]j}, \qquad (13)$$

where s is the number of counties for which there is at least one year of data. The value of s depends on the variable. Collecting the annual values yields the spatially smoothed vector ŷ_0.

The tuning constant α is set between .3 and .5, because values in this range force the smoothed vector ŷ_0 to be strongly determined by the nearest vector y_[0]. That vector is the county's own vector y_0, unless there are no data for the county; in that case the nearest county with data strongly determines the estimate ŷ_0.
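Spatial smoothing differs from pattern matching only in the metric. The sketch below orders counties with data by the squared centroid distance of equation (12) and applies exponential weights year by year. It makes these assumptions:

- The input vectors are complete (for example, already pattern-matched).
- The weights are rescaled to sum to one.
- The default α = 0.4 is taken from the stated .3 to .5 range.

Names and coordinates are hypothetical.

```python
def squared_centroid_distance(c0, c1):
    """Equation (12): squared distance between (lat, lon) centroids, in degrees squared."""
    return (c0[0] - c1[0]) ** 2 + (c0[1] - c1[1]) ** 2


def spatial_smooth(target_centroid, counties, alpha=0.4):
    """Exponentially weighted average of complete yearly vectors, nearest county first.

    `counties` is a list of (centroid, vector) pairs for counties with data. If the
    target county has data, it is in the list and sorts first (distance zero);
    otherwise the nearest county with data receives the largest weight.
    """
    ordered = sorted(counties, key=lambda c: squared_centroid_distance(target_centroid, c[0]))
    w = [alpha * (1 - alpha) ** r for r in range(len(ordered))]
    total = sum(w)  # rescale so the finite series of weights sums to one
    m = len(ordered[0][1])
    return [sum(wr * vec[j] for wr, (_, vec) in zip(w, ordered)) / total for j in range(m)]


counties = [((45.0, -110.0), [4.0, 5.0]), ((46.0, -111.0), [8.0, 9.0])]
print(spatial_smooth((45.0, -110.0), counties, alpha=0.5))  # county with data
print(spatial_smooth((44.0, -109.0), counties, alpha=0.5))  # county without data
```

In the toy call, the county without data borrows almost entirely from its nearest neighbor with data. This is the behavior the text describes for unsampled counties.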

3 Forecasting of BRFSS Variables

Forecasting uses a very simple linear model. We forecast the future value of y_{ij} based on two county-specific terms: the estimated mean level for the time span 2000 through 2012, and the estimated rate of change during that same time span. This treatment, or model, is pragmatic and conservative, since 13 years of data are too few to estimate time-varying rates of change reliably in each of 3221 geographic entities.

An estimator of the rate of change for the ith county is the least squares estimator of β_{1i} based on the simple linear regression model

$$E(Y_{ij}) = \beta_{0i} + \beta_{1i}\,\mathrm{year}_j, \qquad (14)$$

where E(Y_{ij}) is the expected mean level of the variable of interest for county i and year j. The intercept β_{0i} has no practical interpretation, since it is the mean level in year 0. The setup is therefore changed so that β_{0i} has a practical interpretation. We set x_j to be the difference between year 2007 and year_j, and replace year_j with x_j in the model. Then β_{0i} is the expected value when x_j = 0, or equivalently, for year_j = 2007. Shifting the year provides an estimate of the mean near the midpoint of the time span, namely β̂_{0i}, that utilizes all 13 years of data instead of the single point estimate y_{i,2007}. If the rate of change is approximately constant over the time span, then β̂_{0i} is a more precise estimator of the expected value at the midpoint of the time span. Since shifting the year variable makes the intercept more informative, we adopt the model

$$E(Y_{ij}) = \beta_{0i} + \beta_{1i}\,x_j, \qquad (15)$$

where x_j = 2007 - year_j. Equation 14 of Example 2 implies a constant rate of change. For some variables, a linear rate of change is not sustainable over a long time interval, as it would yield absurd estimates for years distant from 2007 (unless the estimated rate of change were zero). We therefore use the model only for forecasting over a relatively short series of years, namely the time interval 2013 to 2025.

Forecasting is carried out using the pattern-matched and spatially smoothed data vectors ŷ_1, . . ., ŷ_n, where n = 3221 is the number of geographic entities.
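The per-county trend fit of equation (15) can be sketched with ordinary least squares on x_j = 2007 - year_j, as in the text. With this sign convention, an increasing series yields a negative β_{1i}. The helper names and the exactly linear toy series are illustrative only.

```python
def fit_trend(years, values, ref_year=2007):
    """OLS fit of E(Y_ij) = b0 + b1 * x_j with x_j = ref_year - year_j (equation 15)."""
    x = [ref_year - yr for yr in years]
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(values) / n
    sxx = sum((xj - xbar) ** 2 for xj in x)
    sxy = sum((xj - xbar) * (yj - ybar) for xj, yj in zip(x, values))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar  # fitted level at x = 0, i.e. in ref_year
    return b0, b1


def forecast(b0, b1, year, ref_year=2007):
    """Evaluate the fitted line for a (future) calendar year."""
    return b0 + b1 * (ref_year - year)


years = list(range(2000, 2013))                    # 13 years of toy data
values = [10 + 0.1 * (yr - 2000) for yr in years]  # rises 0.1 per year
b0, b1 = fit_trend(years, values)
print(round(b0, 6), round(b1, 6))        # 10.7 -0.1
print(round(forecast(b0, b1, 2020), 6))  # 12.0
```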

4 Density Estimation

For quantitative variables, we compute the estimates of the distribution of values for each county, an analysis generally referred to as density estimation. We use all years of data to compute a single density estimate for each county having available data. The foundation of the density estimate is a histogram.

Mathematically, a histogram is a set of pairs corresponding to the rectangles that comprise the histogram. One element of the pair is an interval that defines the base, and the second element specifies the height of the rectangle. The height is either the number of observations that fall in the interval or the relative frequency of observations falling into the interval. The union of the intervals spans the observed range of a variable of interest. Accordingly, we define a histogram (mathematically) as the set of pairs

$$H = \{(b_1, p_1), \ldots, (b_h, p_h)\}. \qquad (16)$$

The number of intervals is h, the ith interval is b_i = (l_i, u_i], and p_i is the relative frequency of observations belonging to the interval. The interval b_{i+1} takes its lower bound from the previous upper bound, i.e., l_{i+1} = u_i. The term p_i is often used as an estimator of the proportion of the population belonging to b_i. We show the intervals as open on the left and closed on the right. They could equally be defined as closed on the left and open on the right. In any case, equal-length intervals are formed by dividing the range into h segments. In most applications, every observation in the data set belongs to the base of the histogram, (l_1, u_h]. Here l_1 is the lower bound of the first interval and u_h is the upper bound of the last of the h intervals. For example, for body mass index, the intervals are (10, 11], (11, 12], . . ., (89, 90]. Among the 5.6 million observations on body mass index in the BRFSS database, a few values have been recorded as values between 90 and 99. It is our supposition that these values are recording errors.

The BRFSS sampling weights may be introduced to eliminate bias introduced by disproportionate sampling. To do so, we write the sample proportion as a sum of indicator variables. The indicator variable identifies whether or not the kth sampling unit belongs to a particular interval b_i. The proportion of the population with membership in b_i is usually estimated by the sample proportion of observations belonging to bin b_i. This estimator can be expressed as

$$p_i = \frac{1}{n}\sum_{k=1}^{n} I_i(x_k), \qquad (17)$$

where I_i is an indicator variable defined by

$$I_i(x) = \begin{cases} 1, & \text{if } x \in b_i,\\ 0, & \text{otherwise}. \end{cases} \qquad (18)$$

A histogram may be constructed using sampling weights w_1, . . ., w_n; we define

$$p_i = \sum_{k=1}^{n} w_k\, I_i(x_k). \qquad (19)$$

The notation for completely describing histograms for our purposes uses i and j to designate county and year, as above, and k to designate the histogram interval. The histogram for county i and year j is

$$H_{ij} = \{(b_1, p_{ij1}), \ldots, (b_h, p_{ijh})\}, \qquad (20)$$

where

$$p_{ijk} = \sum_{l=1}^{n_{ij}} w_{ijl}\, I_k(y_{ijl}), \qquad (21)$$

y_{ijl} is observation l from county i and year j, and n_{ij} is the number of observations from county i and year j.
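Formulas (16) through (19) amount to a weighted histogram with intervals open on the left and closed on the right. The sketch below makes two assumptions:

- The base is the body mass index range (10, 90] with 80 unit-width intervals.
- Values outside the base, such as the suspected recording errors between 90 and 99, are discarded before the weights are normalized.

`weighted_histogram` is a hypothetical name.

```python
import math


def weighted_histogram(values, weights, lower=10.0, upper=90.0, h=80):
    """Return [((l_i, u_i), p_i), ...] with p_i = sum_k w_k * I_i(x_k) (formula 19)."""
    width = (upper - lower) / h
    inside = [(x, v) for x, v in zip(values, weights) if lower < x <= upper]
    total = sum(v for _, v in inside)  # normalize over in-range observations only
    p = [0.0] * h
    for x, v in inside:
        i = math.ceil((x - lower) / width) - 1  # x lies in (l_i, u_i]
        p[i] += v / total
    return [((lower + i * width, lower + (i + 1) * width), p[i]) for i in range(h)]


hist = weighted_histogram([20.5, 20.7, 30.0, 95.0], [1.0, 1.0, 2.0, 5.0])
print([(b, p) for b, p in hist if p > 0])
# [((20.0, 21.0), 0.5), ((29.0, 30.0), 0.5)]
```

The value 30.0 falls in (29, 30] rather than (30, 31], which is the left-open, right-closed convention in action.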

The proportion of the county population belonging to interval b_i is estimated by the proportion of sample values belonging to the interval. We apply a smoothing algorithm to reduce estimation error, based on the logic that the proportions filling adjacent intervals ought to be similar. Therefore, for each b_i with associated sample proportion p_i, we determine a neighborhood set. The distance between intervals b_i and b_j is the distance between the midpoints of the intervals. Suppose the target of smoothing is the estimate p_i. We order all estimates p_1, . . ., p_h by these distances, including p_i itself at a distance of zero. We then relabel the estimates as p_[1], . . ., p_[h] and compute the smoothed estimate

$$\tilde{p}_i = \sum_{l=1}^{h} w_l\, p_{[l]}. \qquad (22)$$

Presently, this analysis has been limited to body mass index using years 2008 through 2012, inclusive. If there are no county data, then we substitute a histogram constructed from all data from the state containing the missing county.
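The neighbor smoothing of formula (22) can be sketched by ordering intervals by midpoint distance and applying exponential weights. The following choices are assumptions, since the text does not specify them:

- α = 0.5.
- When two neighbors are equally distant, the lower interval comes first.
- The smoothed proportions are rescaled to sum to one.

When a county has no data, the state-level histogram would simply be passed in instead.

```python
def smooth_histogram(hist, alpha=0.5):
    """Formula (22): smoothed p_i = sum_l w_l * p_[l], ordering intervals by midpoint distance."""
    mids = [(l + u) / 2 for (l, u), _ in hist]
    props = [p for _, p in hist]
    raw = []
    for m in mids:
        # Stable sort: equally distant neighbors keep their original (lower-first) order.
        order = sorted(range(len(mids)), key=lambda k: abs(mids[k] - m))
        raw.append(sum(alpha * (1 - alpha) ** r * props[k] for r, k in enumerate(order)))
    total = sum(raw)  # assumed rescaling so the smoothed proportions sum to one
    return [(b, s / total) for (b, _), s in zip(hist, raw)]


toy = [((0.0, 1.0), 0.0), ((1.0, 2.0), 1.0), ((2.0, 3.0), 0.0)]
print([p for _, p in smooth_histogram(toy)])  # [0.25, 0.5, 0.25]
```

A spike in one interval is spread onto its neighbors. This reflects the stated logic that proportions in adjacent intervals ought to be similar.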

5 Population Forecasting

We forecast population for each county using U.S. Census Bureau data. These data differ from the BRFSS data: they are either census data or estimates derived from census values, and hence are very accurate. The U.S. Census Bureau cannot truly count every resident of a county. Even so, its decadal counts are very close, in a relative sense, to the true counts, which are a moving target because of births, deaths, immigration, and so on. For this reason, smoothing is not necessary. The U.S. Census Bureau also provides county population estimates for recent inter-decadal years. We presume that the accuracy of these estimates is good.

U.S. Census Bureau data from the years 1980, 1981, 1982, 1983, 1984, 1990, 1999, 2000, . . ., 2015 are used for forecasting county populations from 2016 through 2030. We use k-nearest neighbor regression, an adaptation of least squares regression that weights the observation years so that the latest years in the sequence have the greatest influence on the least squares coefficients. This strategy is based on the assumption that the most recent years of data contain more information about the future than earlier years.

An estimator of the population rate of change for the ith county is based on the simple linear regression model shown in formula (14) of Section 3. In this context, $E(Y_{ij})$ is the expected population level for county i and year j. As in the forecasting algorithms discussed previously (Section 3), we measure year from a reference point: we set $x_j$ to be the difference between $\text{year}_j$ and 2012 and replace $\text{year}_j$ with $x_j$ in the model. Then $\beta_{0,i}$ is the expected population of county i in 2012.

Consequently, shifting the year variable yields a population estimate near the end of the time span, namely $\beta_{0,i}$, that uses all years of data. The forecasting model is then

$$E(Y_{ij}) = \beta_{0,i} + \beta_{1,i}\, x_j, \qquad (23)$$

where $x_j = \text{year}_j - 2012$.

Because the latest observations in the sequence of data (ordered by year) are assumed to be more informative both for estimating the rate of population change and for estimating the actual population in 2012, we do not use the ordinary least squares estimators, which do not account for differences in information content. Instead, we compute the k-nearest neighbor regression estimators, which do.

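An adaptation of the least squares estimator of the parameter vector $\beta_i = (\beta_{0,i}, \beta_{1,i})^T$ produces the k-nearest neighbor regression estimator of $\beta_i$. The least squares estimator in matrix form is

$$\hat{\beta}_i = (X^T X)^{-1} X^T y_i, \qquad (24)$$

where the design (or model) matrix $X$ is formed by stacking the row vectors $\mathbf{x}_j^T$, $j = 1, 2, \ldots, n = 22$. The row vectors are

$$\mathbf{x}_1 = (1, -32)^T, \quad \mathbf{x}_2 = (1, -31)^T, \quad \ldots, \quad \mathbf{x}_n = (1, 3)^T. \qquad (25)$$

Stacking the row vectors produces the design matrix

$$X = \begin{pmatrix} 1 & -32 \\ 1 & -31 \\ \vdots & \vdots \\ 1 & 3 \end{pmatrix},$$

where $n = 22$ and $q = 2$, so that $X$ is an $n \times q$ matrix and $y_i$ is the $n$-vector of population values for county i.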

To weight the observations differentially, we construct a diagonal weight matrix $W$. The diagonal elements are proportional to the exponential weights defined in equation 7 of Example 2 with $a = 0.1$. Since the short series of $n = 22$ weights does not sum to one, we rescale them so that they do: if $v_j$ denotes the exponential weight for year j, the scaled weight is

$$w_j = \frac{v_j}{\sum_{k=1}^{n} v_k}.$$

The jth diagonal element of $W$ is $w_j$ (all off-diagonal elements are zero). Finally, the k-nearest neighbor regression estimator of $\beta_i$ (county i) is

$$\hat{\beta}_i = (X^T W X)^{-1} X^T W y_i.$$
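The following Python sketch combines the weighting, estimation, and forecasting steps for one county. Two assumptions are labeled in the code: the raw exponential weight for year j is taken as $a(1-a)^{n-j}$ (equation 7 of Example 2 is not reproduced in this section), and because $X$ has only $q = 2$ columns, $(X^T W X)^{-1} X^T W y_i$ is evaluated through the equivalent closed-form weighted simple-regression formulas rather than a matrix inverse. All function names are hypothetical.

```python
from typing import Sequence


def scaled_exponential_weights(n: int, a: float = 0.1) -> list[float]:
    """Diagonal of W for years j = 1..n (oldest first), scaled to sum to one.

    ASSUMPTION: the raw weight for year j is a * (1 - a) ** (n - j), so the
    most recent year (j = n) gets the largest weight.
    """
    if n < 1 or not 0.0 < a < 1.0:
        raise ValueError("need n >= 1 and 0 < a < 1")
    raw = [a * (1.0 - a) ** (n - j) for j in range(1, n + 1)]
    total = sum(raw)  # a short series of raw weights does not sum to one
    return [v / total for v in raw]


def weighted_fit(
    years: Sequence[int],
    pops: Sequence[float],
    weights: Sequence[float],
    ref_year: int = 2012,
) -> tuple[float, float]:
    """Return (b0, b1) = (X^T W X)^{-1} X^T W y for E(Y) = b0 + b1 * x.

    x = year - ref_year, so b0 estimates the population in ref_year and b1
    the annual rate of change. The 2x2 system is solved in closed form.
    """
    if len(years) < 2 or not len(years) == len(pops) == len(weights):
        raise ValueError("need >= 2 years and equal-length inputs")
    x = [yr - ref_year for yr in years]
    sw = sum(weights)
    xbar = sum(w * xi for w, xi in zip(weights, x)) / sw
    ybar = sum(w * yi for w, yi in zip(weights, pops)) / sw
    sxx = sum(w * (xi - xbar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - xbar) * (yi - ybar) for w, xi, yi in zip(weights, x, pops))
    if sxx == 0:
        raise ValueError("need at least two distinct years with positive weight")
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def forecast(b0: float, b1: float, year: int, ref_year: int = 2012) -> float:
    """Evaluate the fitted model (23) at a forecast year."""
    return b0 + b1 * (year - ref_year)


# Hypothetical county whose population grows by exactly 400 per year.
years = list(range(2005, 2016))
w = scaled_exponential_weights(len(years))
b0, b1 = weighted_fit(years, [50_000 + 400 * (y - 2012) for y in years], w)
print(round(forecast(b0, b1, 2030)))  # 57200
```

Under the assumed weight form, the raw weights sum to $1 - (1-a)^n$; with $n = 22$ and $a = 0.1$ that is about 0.90, which is why the rescaling step is needed.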

Forecasts are computed as discussed previously in Section 3.

6 Data Summaries

This section provides summaries of some variables that are used in the hosted system. We provide three summaries.

1. A histogram showing the distribution of county estimates for year 2012, the most recent year for which the CDC has provided county identifiers.

2. A histogram that shows the distribution of differences between the year 2012 estimates and the 2025 forecasts. We compute the estimated difference (or change) between the year 2012 estimate and the 2025 forecast for the ith county as

$$\delta_i = \hat{\beta}_i \times (2025 - 2012), \qquad (30)$$

where $\hat{\beta}_i$ is the estimated annual change in the variable for county i. $\hat{\beta}_i$ is computed by a simple linear regression of estimated prevalence on year using the county-specific estimates of prevalence.

3. A table that shows selected quantiles of the distribution of estimates (or forecasts) for the years 2000, 2003, ... , 2021, 2024. When the CDC data does not support estimation of prevalence in earlier years, the table will omit those years.
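Equation (30) can be sketched in Python as below: an ordinary (unweighted) simple linear regression of one county's estimated prevalence on year, with the slope multiplied by the 13-year horizon. The function name and inputs are hypothetical.

```python
from typing import Sequence


def forecast_change(
    years: Sequence[int],
    prevalence: Sequence[float],
    start: int = 2012,
    end: int = 2025,
) -> float:
    """delta_i = beta_hat_i * (end - start), where beta_hat_i is the least
    squares slope of estimated prevalence on year for a single county."""
    n = len(years)
    if n < 2 or len(prevalence) != n:
        raise ValueError("need at least two (year, prevalence) pairs")
    xbar = sum(years) / n
    ybar = sum(prevalence) / n
    sxx = sum((x - xbar) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be equal")
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(years, prevalence))
    return (sxy / sxx) * (end - start)
```

For example, county estimates that rise by 0.002 per year give $\delta_i = 0.002 \times 13 = 0.026$.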

6.1 Adult Asthma

Figure 30 is a graph 3000 which shows that the distribution of adult asthma estimates for year 2012 is centered at 12.6%. The distribution is roughly symmetric, with only three outliers present.

Figure 31 is a graph 3100 which shows that the distribution of forecasted changes in adult asthma prevalence from 2012 to 2025 is centered at about 2.74% and that the estimated changes are predominantly positive (in fact, 90.9% are positive). Only a small fraction of the change estimates (7.79%) are greater than 5%.

Table 4 provides summary statistics in greater detail. The median estimate shifts by 2.5 percentage points from 2000 to 2012 and by 3.38 percentage points from 2012 to 2024, indicating that adult asthma prevalence increases at a somewhat greater rate in the forecasts. The extremes of the distribution do not expand appreciably in the forecasts, however. For example, the difference between the .995 quantile and the median in 2012 is 14.0%, whereas the difference between the .995 quantile and the median in 2024 is 11.5%.

Table 4: Quantiles of the distributions of prevalence estimates (adult asthma) for selected years.

       Quantile
Year   .005   .025   .25    .5     .75    .975   .995
2000   .071   .078   .093   .101   .11    .129   .141
2003   .084   .091   .106   .115   .124   .146   .167
2006   .088   .096   .114   .124   .134   .156   .167
2009   .049   .065   .107   .125   .147   .21    .269
2012   .057   .07    .109   .126   .144   .198   .249
2015   .067   .086   .125   .14    .156   .191   .214
2018   .06    .083   .129   .147   .165   .205   .236
2021   .051   .078   .132   .153   .173   .221   .255
2024   .043   .074   .135   .160   .182   .237   .275

7 Body Mass Index

Figure 32 is a graph 3200 which shows that the distribution of county mean body mass index (kg/m²) for year 2012 is centered at 28.3. The distribution is slightly skewed to the left, with a few outlier counties, all having unusually small means (less than 26 kg/m²).

The forecasted change in county mean body mass index from 2012 to 2025 is centered at 1.61 kg/m², as shown in a graph 3300 of Figure 33. All counties have positive forecasted changes, though a few outlier counties have forecasted changes only slightly larger than zero.

Quantiles of estimated county mean body mass index (kg/m²) are shown in Table 5 for selected years.

Table 5: Quantiles of estimated county mean body mass index (kg/m²) for selected years.

         Quantile
Year     .005     .025     .25      .5       .75      .975     .995
2000     25.17    25.749   26.418   26.666   26.884   27.349   27.65
2003     25.445   26.098   26.846   27.119   27.367   27.918   28.198
2006     25.757   26.367   27.241   27.538   27.82    28.433   28.839
2009     24.55    26.38    27.514   27.934   28.389   29.603   30.447
2012     26.102   26.668   27.582   28.024   28.513   29.66    30.67
2015     26.494   27.152   28.222   28.633   29.041   29.961   30.407
2018     26.734   27.406   28.544   28.998   29.462   30.529   31.048
2021     26.975   27.655   28.862   29.371   29.894   31.115   31.692
2024     27.216   27.893   29.184   29.742   30.324   31.684   32.356
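Each row of the quantile tables (such as Table 5) is a set of quantiles of the county-level estimates for one year. The section does not state which quantile definition was used; the sketch below assumes linear interpolation between order statistics (the default "linear" method of NumPy's `quantile`) and is written in plain Python with hypothetical names.

```python
import math
from typing import Sequence

# Quantile levels used in the summary tables.
LEVELS = (0.005, 0.025, 0.25, 0.5, 0.75, 0.975, 0.995)


def quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of already-sorted values (ASSUMPTION)."""
    if not sorted_vals or not 0.0 <= q <= 1.0:
        raise ValueError("need non-empty data and 0 <= q <= 1")
    pos = (len(sorted_vals) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def quantile_row(county_estimates: Sequence[float]) -> list[float]:
    """One table row: the selected quantiles of one year's county estimates."""
    vals = sorted(county_estimates)
    return [quantile(vals, q) for q in LEVELS]
```

Sorting once per row and reusing the sorted list keeps the cost at one sort per year, regardless of how many quantile levels are reported.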

8 Obesity

Obesity for a BRFSS respondent is computed as a binary variable equal to 1 if the respondent's calculated body mass index is at least 30 kg/m² and 0 if the respondent's body mass index is less than 30 kg/m².
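The obesity indicator and a county's obesity rate can be sketched as follows. Body mass index is weight in kilograms divided by the square of height in meters; the rate shown is a simple unweighted share of respondents, which is an assumption (survey weights and the estimation models described elsewhere are not applied). Function names are hypothetical.

```python
from typing import Iterable


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def obese(bmi_value: float) -> int:
    """Binary obesity indicator: 1 if BMI >= 30 kg/m^2, else 0."""
    return 1 if bmi_value >= 30.0 else 0


def obesity_rate(bmis: Iterable[float]) -> float:
    """Unweighted share of respondents classified as obese (ASSUMPTION)."""
    flags = [obese(b) for b in bmis]
    if not flags:
        raise ValueError("no respondents")
    return sum(flags) / len(flags)
```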

Figure 34 is a graph 3400 which shows that the distribution of obesity rate by county for year 2012 is centered at 32.0%. The distribution is roughly symmetric; the outlier counties that are present all have unusually low rates of obesity.

The forecasted change in obesity rate from 2012 to 2025 is centered at about 10.6%, as shown in a graph 3500 of Figure 35. All counties have positive forecasted changes.

Quantiles of the estimated obesity rates are shown in Table 6.

Table 6: Quantiles of the distributions of obesity rate for selected years.


9 Mental Illness

Mental illness is defined in the context of the BRFSS question: "Now thinking about your mental health, which includes stress, depression, and problems with emotions, for how many days during the past 30 days was your mental health not good?" We identify a respondent as having a mental illness if their response was 15 days or more.
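A minimal sketch of this definition, assuming responses have already been cleaned to day counts from 0 to 30 (any non-numeric survey codes would need to be mapped or dropped first). As with obesity, the unweighted rate and the function names are hypothetical.

```python
from typing import Iterable


def mental_illness(days_not_good: int) -> int:
    """1 if the respondent reported 15 or more of the past 30 days with poor
    mental health, else 0."""
    if not 0 <= days_not_good <= 30:
        raise ValueError("expected a day count between 0 and 30")
    return 1 if days_not_good >= 15 else 0


def mental_illness_rate(responses: Iterable[int]) -> float:
    """Unweighted share of respondents meeting the definition (ASSUMPTION)."""
    flags = [mental_illness(d) for d in responses]
    if not flags:
        raise ValueError("no respondents")
    return sum(flags) / len(flags)
```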

Figure 36 is a graph 3600 which shows that the distribution of estimated mental illness rate by county for year 2012 is centered at 11.1%. The distribution is roughly symmetric, with little evidence of outliers. The forecasted change in mental illness rate from 2012 to 2025 is centered at about 2.4%, as shown in a graph 3700 of Figure 37. Most counties have positive forecasted changes.

Quantiles of the estimated mental illness rates are shown in Table 7.

Table 7: Quantiles of the distributions of mental illness rate for selected years.


The various implementations described above can be combined to provide further implementations. To the extent that they are not inconsistent with the specific teachings and definitions herein, all of the U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification and/or listed in the Application Data Sheet, including but not limited to U.S. patent application Serial No. 62/254883 filed November 13, 2015; U.S. patent application Serial No. 62/265890 filed December 10, 2015; U.S. patent application Serial No. 62/265893 filed December 10, 2015; and U.S. patent application Serial No. 62/368827 filed July 29, 2016, are incorporated herein by reference, in their entirety. Aspects of the implementations can be modified, if necessary, to employ systems, circuits and concepts of the various patents, applications and publications to provide yet further implementations.

These and other changes can be made to the implementations in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific implementations disclosed in the specification and the claims, but should be construed to include all possible implementations along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.

Claims

1. A method of operation in a system that comprises at least one processor and at least one non-transitory processor-readable medium communicatively coupled to the at least one processor and that stores at least one of processor-executable instructions or data, the method comprising:

accessing a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region;

accessing a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region, where the second type of geographic region is different from the first type of geographic region for at least one instance of a pair of geographic regions of the first and second types; and

generating a third data set, by at least one processor, the third data set searchable by a geographic region key that corresponds to a geographic region identifier that uniquely identifies the geographic regions of the first type of geographic region, the data set including data regarding at least one demographic characteristic and at least one health-related characteristic representative of a population associated with the respective geographic region, where the at least one health-related characteristic is different from the at least one demographic characteristic.

2. The method of claim 1 wherein generating a third data set includes fitting a predictive model to the data of the first and the second data sets, by the at least one processor.

3. The method of claim 2 wherein fitting a predictive model to the data of the first and the second data sets comprises performing a binomial regression on the data of the first and the second data sets, by the at least one processor.

4. The method of any of claims 1 through 3 wherein the geographic regions of the second type encompass respective sets of the geographic regions of the first type.

5. The method of any of claims 1 through 3 wherein accessing a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region includes accessing a first data set that comprises demographic data for a respective population of each of a plurality of U.S. census block-groups.

6. The method of claim 5 wherein accessing a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region includes accessing a second data set that comprises health-related data for a respective population of each of a plurality of counties.

7. The method of any of claims 1 through 3 wherein accessing a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region includes accessing a first data set that comprises data regarding at least one immutable characteristic.

8. The method of claim 7 wherein accessing a first data set that comprises data regarding at least one immutable characteristic includes accessing a first data set that comprises data regarding at least one of: a gender, an ethnicity, or an age class for individuals in the respective population of each of a plurality of geographic regions of a first type of geographic region.

9. The method of any of claims 1 through 3 wherein accessing a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region includes accessing a second data set that comprises data regarding at least one non-demographic mutable characteristic.

10. The method of any of claims 1 through 3 wherein accessing a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region includes accessing a second data set that comprises data regarding at least one mutable characteristic.

11. The method of claim 10 wherein the data regarding at least one mutable characteristic includes data regarding at least one of: an obesity level or an activity level for individuals in the respective population of the geographic region of the second type of geographic region.

12. The method of claim 10 wherein accessing a second data set that comprises data regarding at least one mutable characteristic includes accessing a second data set that comprises data regarding at least one of: a percentage of the respective geographic area of the second type of geographic area that is rural, a total population of the respective geographic area of the second type of geographic area, or a total number of individuals by age class that have a defined condition.

13. The method of any of claims 1 through 3, further comprising: determining a bias correction for the data of the third data set, by the at least one processor.

14. The method of any of claims 1 through 3, further comprising: determining at least one confidence interval for the data of the third data set, by the at least one processor.

15. The method of claim 1 wherein the demographic data of the first data set represents a baseline period of time, and further comprising:

receiving user input that specifies an inquiry period of time, the inquiry period of time different from the baseline period of time; and

determining a prevalence of a health condition in at least a portion of the population for the inquiry period of time.

16. The method of claim 1 wherein the determining a prevalence of a health condition in at least a portion of the population for the inquiry period of time includes accumulating a prevalence over multiple sub-periods of the inquiry period.

17. The method of any of claims 1 or 15, further comprising: receiving user input that specifies a geographic area that overlaps a boundary of at least one of the geographic regions of the first type of geographic region; and

determining a prevalence of a health condition in at least a portion of the population for the geographic area specified by the user input.

18. The method of claim 17 wherein receiving user input comprises: a plurality of selections of points or lines on a map that define a polygon, and further comprising:

converting the user input that defines a polygon into a number of geographic region key values.

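19. The method of any of claims 1 or 15, further comprising: receiving user input that specifies a geographic area that encompasses at least a portion of at least two of the geographic regions of the first type of geographic region; and

determining a prevalence of a health condition in at least a portion of the population for the geographic area specified by the user input.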

20. The method of claim 19 wherein receiving user input comprises: a plurality of selections of points or lines on a map that define a polygon, and further comprising:

converting the user input that defines a polygon into a number of geographic region key values.

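21. The method of claim 1 wherein the demographic data of the first data set represents a baseline period of time, and further comprising:

receiving user input that specifies an inquiry period of time, the inquiry period of time different from the baseline period of time; and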

determining a rate of change in a prevalence of a health condition in at least a portion of the population for the inquiry period of time.

22. A system, comprising: at least one processor; and

at least one non-transitory processor-readable medium communicatively coupled to the at least one processor and that stores at least one of processor-executable instructions or data, execution of which causes the at least one processor to:

access a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region;

access a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region, where the second type of geographic region is different from the first type of geographic region for at least one instance of a pair of geographic regions of the first and second types; and

generate a third data set, the third data set searchable by a geographic region key that corresponds to a geographic region identifier that uniquely identifies the geographic regions of the first type of geographic region, the data set including data regarding at least one demographic characteristic and at least one health-related characteristic representative of a population associated with the respective geographic region, where the at least one health-related characteristic is different from the at least one demographic characteristic.

23. The system of claim 22 wherein to generate a third data set the at least one processor fits a predictive model to the data of the first and the second data sets.

24. The system of claim 23 wherein to fit a predictive model to the data of the first and the second data sets the at least one processor performs a binomial regression on the data of the first and the second data sets.

25. The system of any of claims 22 through 24 wherein the geographic regions of the second type encompass respective sets of the geographic regions of the first type.

26. The system of any of claims 22 through 24 wherein the geographic regions of the first type are U.S. census block-groups.

27. The system of claim 26 wherein the geographic regions of the second type are counties.

28. The system of any of claims 22 through 24 wherein the data regarding at least one demographic characteristic comprises data regarding at least one immutable characteristic.

29. The system of claim 28 wherein the data regarding at least one immutable characteristic includes data regarding at least one of: a gender, an ethnicity, or an age class for individuals in the population.

30. The system of any of claims 22 through 24 wherein the data regarding at least one health-related characteristic comprises data regarding at least one non-demographic mutable characteristic.

31. The system of any of claims 22 through 24 wherein the data regarding at least one health-related characteristic comprises data regarding at least one mutable characteristic.

32. The system of claim 31 wherein the data regarding at least one mutable characteristic includes data regarding at least one of: an obesity level or an activity level for individuals in the respective population of the geographic region of the second type of geographic region.

33. The system of claim 31 wherein the data regarding at least one mutable characteristic includes data regarding at least one of: a percentage of the respective geographic area of the second type of geographic area that is rural, a total population of the respective geographic area of the second type of geographic area, or a total number of individuals by age class that have a defined condition.

34. The system of any of claims 22 through 24 wherein the at least one processor further:

determines a bias correction for the data of the third data set, by the at least one processor.

35. The system of any of claims 22 through 24 wherein the at least one processor further:

determines at least one confidence interval for the data of the third data set, by the at least one processor.

36. The system of claim 22 wherein the demographic data of the first data set represents a baseline period of time, and the at least one processor further:

receives user input that specifies an inquiry period of time, the inquiry period of time different from the baseline period of time; and

determines a prevalence of a health condition in at least a portion of the population for the inquiry period of time.

37. The system of claim 22 wherein to determine a prevalence of a health condition in at least a portion of the population for the inquiry period of time the at least one processor accumulates a prevalence over multiple sub-periods of the inquiry period.

38. The system of any of claims 22 or 36, wherein the at least one processor further:

receives user input that specifies a geographic area that overlaps a boundary of at least one of the geographic regions of the first type of geographic region; and

determines a prevalence of a health condition in at least a portion of the population for the geographic area specified by the user input.

39. The system of claim 38 wherein the at least one processor receives user input as a plurality of selections of points or lines on a map that define a polygon, and further:

converts the user input that defines a polygon into a number of geographic region key values.

40. The system of any of claims 22 or 36 wherein the at least one processor receives user input that specifies a geographic area that encompasses at least a portion of at least two of the geographic regions of the first type of geographic region, and further:

determines a prevalence of a health condition in at least a portion of the population for the geographic area specified by the user input.

41. The system of claim 40 wherein the at least one processor receives user input as a plurality of selections of points or lines on a map that define a polygon, and further:

converts the user input that defines a polygon into a number of geographic region key values.

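42. The system of claim 22 wherein the demographic data of the first data set represents a baseline period of time, and the at least one processor further:

receives user input that specifies an inquiry period of time, the inquiry period of time different from the baseline period of time; and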

determines a rate of change in a prevalence of a health condition in at least a portion of the population for the inquiry period of time.

43. A system, comprising:

at least one processor; and

at least one processor-readable medium that stores at least one of processor executable instructions or data, which when executed by the at least one processor cause the at least one processor to:

generate at least a first boundary by encoding at least one user definition of at least a first user defined geographical region;

compare at least one portion of at least the first boundary of at least the first user defined geographical region to at least population data corresponding to at least the first user defined geographical region to produce at least a first estimated population of at least the first user defined geographical region; and

convert at least the first estimated population of at least the first user defined geographical region to at least a first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region, at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region being at least an estimated or forecasted count of at least a first estimated or forecasted number of patients affected by at least a first health condition and that at least reside in at least the first user defined geographical region.

44. The system of claim 43 wherein the at least one processor converts at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region based at least in part on at least a first rate of at least the first health condition in at least the first estimated population of at least the first user defined geographical region.

45. The system of claim 43 wherein the at least one processor converts at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region based at least in part on at least respective rates of at least the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region.

46. The system of claim 45 wherein the respective rates of at least the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region are at least respective estimated prevalences of the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region.

47. The system of claim 46 wherein the first health condition is at least one of type 2 diabetes or chronic lung disease.

48. The system of claim 45 wherein the respective rates of at least the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region are at least respective numbers of hospitalizations attributable to at least the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region.

49. The system of claim 48 wherein the first health condition is at least one of coronary artery disease or stroke.

50. The system of claim 45 wherein the respective rates of at least the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region are at least respective incidences of the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region.

51. The system of claim 50 wherein the first health condition is prostate cancer.

52. The system of claim 43 wherein the at least one processor further selects each census block at least partially contained in at least the first user defined geographical region.

53. The system of claim 52 wherein the at least one processor further reduces, for each respective census block that is partially yet not fully contained in at least the first user defined geographical region, at least one respective attribute associated with the respective census block by a proportion that at least corresponds to at least a first percentage of area of the respective census block that is contained in at least the first user defined geographical region.

54. The system of claim 53 wherein the at least one respective attribute associated with the respective census block is at least one of a respective population of the respective census block and a respective estimated population of the respective census block.

55. The system of claim 53 wherein the at least one respective attribute associated with the respective census block is at least one of a respective first health condition rate of the respective census block for at least the first health condition and a respective estimated or forecasted first health condition rate of the respective census block for at least the first health condition.
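Claims 52 through 55, together with claim 44, describe scaling each partially contained census block by the share of its area inside the user-defined region and converting the resulting population to a patient count. A minimal Python sketch, assuming the area fractions have already been computed (for example, by a GIS polygon intersection) and that population is spread uniformly over each block's area; this sketch applies the reduction to the population attribute (claim 54) rather than to the rate (claim 55). The `BlockOverlap` record and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class BlockOverlap:
    """Hypothetical record for one census block touched by the user's region."""
    population: float     # population or estimated population of the block
    prevalence: float     # estimated rate of the health condition in the block
    area_fraction: float  # share of the block's area inside the region (0..1)


def region_estimates(blocks: Iterable[BlockOverlap]) -> tuple[float, float]:
    """Return (estimated population, estimated patient count) for the region."""
    population = 0.0
    patients = 0.0
    for b in blocks:
        if not 0.0 <= b.area_fraction <= 1.0:
            raise ValueError("area_fraction must lie in [0, 1]")
        # Partially contained blocks are reduced in proportion to contained
        # area; fully contained blocks (fraction 1.0) count in full.
        share = b.population * b.area_fraction
        population += share
        patients += share * b.prevalence
    return population, patients
```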

56. The system of claim 43 wherein the at least one processor further converts at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region to at least a first estimated or forecasted annual cost of treating at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region.

57. The system of claim 56 wherein the at least one processor converts at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted annual cost of treating at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region based at least in part on at least a first estimated or forecasted cost of treating a given individual affected by at least the first health condition.

58. The system of claim 57 wherein the at least one processor further obtains at least the first estimated or forecasted annual cost of treating the given individual affected by at least the first health condition from at least one user.
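Claims 56 through 58 convert a patient count into an annual treatment cost using a per-patient cost that may be obtained from a user. Under the simplest reading, a product of the two, a sketch over several forecast years looks like this; the function name and the mapping from year to patient count are hypothetical.

```python
from typing import Mapping


def annual_costs(
    patient_counts: Mapping[int, float], cost_per_patient: float
) -> dict[int, float]:
    """Estimated or forecasted annual cost for each year: count * unit cost.

    ASSUMPTION: a single user-supplied annual cost per patient applies to
    every year; inflation and cost trends are ignored.
    """
    if cost_per_patient < 0:
        raise ValueError("cost_per_patient must be non-negative")
    return {year: count * cost_per_patient for year, count in patient_counts.items()}


print(annual_costs({2020: 150.0, 2025: 180.0}, 9_000.0))
# {2020: 1350000.0, 2025: 1620000.0}
```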

59. The system of claim 43 wherein the at least one processor-readable medium further stores at least one of processor executable instructions or data, which when executed by the at least one processor further cause the at least one processor to:

access, at least prior to conversion of at least the first estimated population to at least the first estimated or forecasted health condition patient count, a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region;

access, at least prior to conversion of at least the first estimated population to at least the first estimated or forecasted health condition patient count, a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region, where the second type of geographic region is different from the first type of geographic region for at least one instance of a pair of geographic regions of the first and second types; and

generate, at least prior to conversion of at least the first estimated population to at least the first estimated or forecasted health condition patient count, a third data set, by at least one processor, the third data set searchable by a geographic region key that corresponds to a geographic region identifier that uniquely identifies the geographic regions of the first type of geographic region, the data set including data regarding at least one demographic characteristic and at least one health-related characteristic representative of a population associated with the respective geographic region, where the at least one health-related characteristic is different from the at least one demographic characteristic,

wherein the at least one processor converts at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region based at least in part on the third data set searchable by the geographic region key that corresponds to the geographic region identifier that uniquely identifies the geographic regions of the first type of geographic region.

60. A method, comprising:

generating, by at least one processor, at least a first boundary by encoding at least one user definition of at least a first user defined geographical region;

comparing, by the at least one processor, at least one portion of at least the first boundary of at least the first user defined geographical region to at least population data corresponding to at least the first user defined geographical region to produce at least a first estimated population of at least the first user defined geographical region; and

converting, by the at least one processor, at least the first estimated population of at least the first user defined geographical region to at least a first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region, at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region being at least an estimated or forecasted count of at least a first estimated or forecasted number of patients affected by at least a first health condition and that at least reside in at least the first user defined geographical region.

61. The method of claim 60 wherein converting at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region includes converting, by the at least one processor, at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region based at least in part on at least a first rate of at least the first health condition in at least the first estimated population of at least the first user defined geographical region.

62. The method of claim 60 wherein converting at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region includes converting, by the at least one processor, at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region based at least in part on at least respective rates of at least the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region.
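The conversion recited in claims 61 and 62 amounts to multiplying each population by its rate and summing over the geographic portions. The sketch below assumes each portion is supplied as a pair of (estimated population, rate); claim 61's single-rate case is the one-element list. The function name `patient_count` and the numbers are hypothetical.

```python
def patient_count(portions: list[tuple[float, float]]) -> float:
    """Sum population * rate over each geographic portion of the region."""
    return sum(population * rate for population, rate in portions)


# Two hypothetical portions: (estimated population, prevalence)
portions = [(1200.0, 0.08), (800.0, 0.10)]
print(round(patient_count(portions), 2))  # 176.0
```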

63. The method of claim 62 wherein the respective rates of at least the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region are at least respective estimated prevalences of the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region.

64. The method of claim 63 wherein the first health condition is at least one of type 2 diabetes or chronic lung disease.

65. The method of claim 62 wherein the respective rates of at least the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region are at least respective numbers of hospitalizations attributable to at least the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region.

66. The method of claim 65 wherein the first health condition is at least one of coronary artery disease or stroke.

67. The method of claim 62 wherein the respective rates of at least the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region are at least respective incidences of the first health condition in each respective population of each respective one of at least one geographical portion of at least the first user defined geographical region.

68. The method of claim 67 wherein the first health condition is prostate cancer.

69. The method of claim 60, further comprising:

selecting, by the at least one processor, each census block at least partially contained in at least the first user defined geographical region.

70. The method of claim 69, further comprising:

reducing, by the at least one processor, for each respective census block that is partially yet not fully contained in at least the first user defined geographical region, at least one respective attribute associated with the respective census block by a proportion that at least corresponds to at least a first percentage of area of the respective census block that is contained in at least the first user defined geographical region.

71. The method of claim 70 wherein the at least one respective attribute associated with the respective census block is at least one of a respective population of the respective census block and a respective estimated population of the respective census block.

72. The method of claim 70 wherein the at least one respective attribute associated with the respective census block is at least one of a respective first health condition rate of the respective census block for at least the first health condition and a respective estimated first health condition rate of the respective census block for at least the first health condition.
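The proportional reduction of claims 69 to 72 resembles area-weighted interpolation, which assumes the attribute is spread evenly over each census block. The minimal sketch below takes the fraction of each block's area lying inside the user defined region as given (computing it would require a polygon geometry library, not shown) and applies the scaling to a population attribute. The function name and inputs are hypothetical.

```python
def apportioned_population(blocks: list[tuple[float, float]]) -> float:
    """Sum block populations, each scaled by the fraction of its area inside the region.

    Each tuple is (population, fraction_of_area_inside), with the fraction in [0, 1].
    Fully contained blocks have fraction 1.0; blocks with fraction 0.0 lie
    outside the region and contribute nothing.
    """
    total = 0.0
    for population, fraction in blocks:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("area fraction must lie in [0, 1]")
        if fraction > 0.0:
            total += population * fraction
    return total


blocks = [(500, 1.0), (300, 0.5), (200, 0.25), (400, 0.0)]
print(apportioned_population(blocks))  # 700.0
```

The same scaling could be applied to a per-block health condition rate, as in claim 72.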

73. The method of claim 60, further comprising:

converting, by the at least one processor, at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region to at least a first estimated or forecasted annual cost of treating at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region.

74. The method of claim 73 wherein converting at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted annual cost of treating at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region includes converting, by the at least one processor, at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted annual cost of treating at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region based at least in part on at least a first estimated or forecasted cost of treating a given individual affected by at least the first health condition.

75. The method of claim 74 wherein the at least one processor further obtains at least the first estimated or forecasted annual cost of treating the given individual affected by at least the first health condition from at least one user.
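Claims 73 to 75 convert a patient count into an annual treatment cost using a per-individual cost, which may come from the user. A minimal sketch, assuming a simple product of the two quantities; the function name and the 9,600 per-patient figure are hypothetical.

```python
def annual_treatment_cost(patient_count: float, cost_per_patient: float) -> float:
    """Annual cost of treating every affected individual in the region."""
    if patient_count < 0 or cost_per_patient < 0:
        raise ValueError("inputs must be non-negative")
    return patient_count * cost_per_patient


# Hypothetical user-supplied per-patient annual cost (claim 75).
print(annual_treatment_cost(176, 9_600.0))  # 1689600.0
```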

76. The method of claim 60, further comprising:

accessing, at least prior to converting at least the first estimated population to at least the first estimated or forecasted health condition patient count, a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region;

accessing, at least prior to converting at least the first estimated population to at least the first estimated or forecasted health condition patient count, a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region, where the second type of geographic region is different from the first type of geographic region for at least one instance of a pair of geographic regions of the first and the second types; and

generating, at least prior to converting at least the first estimated population to at least the first estimated or forecasted health condition patient count, a third data set, by at least one processor, the third data set searchable by a geographic region key that corresponds to a geographic region identifier that uniquely identifies the geographic regions of the first type of geographic region, the third data set including data regarding at least one demographic characteristic and at least one health-related characteristic representative of a population associated with the respective geographic region, where the at least one health-related characteristic is different from the at least one demographic characteristic,

wherein converting at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region is based at least in part on the third data set searchable by the geographic region key that corresponds to the geographic region identifier that uniquely identifies the geographic regions of the first type of geographic region.

77. A system, comprising:

at least one processor; and

at least one processor-readable medium that stores at least one of processor executable instructions or data, which when executed by the processor cause the at least one processor to:

estimate or forecast a prevalence of a first health condition in a population of a first user selected or defined geographic region for a first time period;

estimate or forecast a prevalence of the first health condition in the population of the first user selected or defined geographic region for a second time period, the second time period different than the first time period;

estimate or forecast a total number of new cases of the first health condition that will occur during the second period;

estimate or forecast a portion of the first population prone to the first health condition based on the estimated or forecasted total number of new cases of the first health condition that will occur during the second period; and

estimate or forecast a net present value of a defined intervention, the defined intervention which inhibits an onset of the first health condition in at least the portion of the first population prone to the first health condition.

78. The system of claim 77 wherein the first health condition is type 2 diabetes and the at least one processor estimates or forecasts a portion of the first population which is pre-diabetic based on an estimated or forecasted total number of new cases of type 2 diabetes that will occur during the second period.

79. The system of claim 78 wherein the at least one processor estimates or forecasts a portion of the first population which is pre-diabetic based on a percentage of the estimated or forecasted total number of new cases of type 2 diabetes that will occur during the second period, the percentage being a value between 5% and 10% inclusive.

80. The system of claim 77 wherein to estimate or forecast a total number of new cases of the first health condition that will occur during the second period the at least one processor determines a difference between an estimated or forecasted prevalence in the second period and an estimated or forecasted prevalence in the first period, and determines a product of the difference and a size of the population.
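The new-case calculation of claim 80 is the change in prevalence between the two periods multiplied by the population size. A minimal sketch with hypothetical inputs:

```python
def new_cases(prev_first: float, prev_second: float, population: float) -> float:
    """(second-period prevalence - first-period prevalence) * population size."""
    return (prev_second - prev_first) * population


# Prevalence rising from 8.0% to 8.5% in a population of 20,000.
print(round(new_cases(0.080, 0.085, 20_000)))  # 100
```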

81. The system of claim 77 wherein to estimate a net present value of a defined intervention the at least one processor determines a per capita net present value.

82. The system of claim 81 wherein to determine a per capita net present value the at least one processor determines $v = \sum_{t=1}^{y} \frac{f_t}{(1+i)^t}$, where $v$ is the per capita net present value, $y$ is a total number of years, $i$ is a discount rate applied to future money, and $f_1 = \dots = f_y = c \cdot r \cdot a$, where $c$ is an annual cost of treating the first health condition, $r$ is the defined intervention's rate of risk reduction, and $a$ is a rate at which people with a second health condition that is a precursor to the first health condition develop the first health condition.
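A minimal sketch of the per capita calculation in claim 82, assuming the value takes the standard discounted-sum form $v = \sum_{t=1}^{y} f_t/(1+i)^t$ with every yearly flow equal to $c \cdot r \cdot a$. The function name and all parameter values below are hypothetical.

```python
def per_capita_npv(c: float, r: float, a: float, i: float, years: int) -> float:
    """Discounted sum of yearly per-person savings f_t = c * r * a for t = 1..years."""
    f = c * r * a  # constant yearly savings per person
    return sum(f / (1 + i) ** t for t in range(1, years + 1))


# Hypothetical inputs: $8,000 annual treatment cost, 50% risk reduction,
# 25% progression rate, 5% discount rate, 2-year horizon.
v = per_capita_npv(c=8_000, r=0.5, a=0.25, i=0.05, years=2)
print(round(v, 2))  # 1859.41
```

With a zero discount rate the result reduces to `years * c * r * a`, which is a quick sanity check on the discounting.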

83. The system of claim 81 wherein the at least one processor further determines an estimated or forecasted return on investment for the defined intervention.

84. The system of claim 83 wherein to determine an estimated return on investment for the defined intervention the at least one processor further determines h = v/f₀ where h is return on investment, v is the per capita net present value, and f₀ is a cost per person of the defined intervention.

85. The system of claim 83 wherein the at least one processor further determines w = v*b where w is total present savings of the defined intervention, v is the per capita net present value, and b is a number of people with the second health condition that is the precursor to the first health condition.
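Claims 84 and 85 derive the return on investment as h = v/f₀ and the total present savings as w = v*b. A minimal sketch; the function names and the input values are hypothetical.

```python
def roi(v: float, f0: float) -> float:
    """Return on investment h = v / f0 (per capita NPV over cost per person)."""
    if f0 <= 0:
        raise ValueError("intervention cost per person must be positive")
    return v / f0


def total_present_savings(v: float, b: int) -> float:
    """Total present savings w = v * b (b = people with the precursor condition)."""
    return v * b


v = 1500.0   # hypothetical per capita net present value
print(roi(v, 400.0))                    # 3.75
print(total_present_savings(v, 1_200))  # 1800000.0
```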

86. The system of claim 78 wherein the at least one processor further receives user input that specifies the first user selected or defined geographic region.

87. The system of claim 85 wherein the first user selected or defined geographic region is a user defined geographic area.

88. The system of claim 86 wherein the at least one processor receives a plurality of user selections of points or lines on a map that define a polygon, and further converts the user input that defines a polygon into a number of geographic region key values.
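One way to turn a user-drawn polygon into geographic region key values (claim 88) is to keep every region whose representative point falls inside the polygon. The sketch below uses the even-odd ray casting test on region centroids; this centroid rule is a simplifying assumption (claims 69 and 70 also account for partially contained census blocks), and the keys and coordinates are hypothetical.

```python
Point = tuple[float, float]


def point_in_polygon(pt: Point, polygon: list[Point]) -> bool:
    """Even-odd ray casting test; vertices in order, polygon not explicitly closed."""
    x, y = pt
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Does a horizontal ray to the right of pt cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_to_keys(polygon: list[Point], centroids: dict[str, Point]) -> list[str]:
    """Return region keys whose centroid lies inside the user-drawn polygon."""
    return sorted(key for key, c in centroids.items() if point_in_polygon(c, polygon))


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
centroids = {"bg_a": (2.0, 3.0), "bg_b": (12.0, 5.0), "bg_c": (9.5, 9.5)}
print(polygon_to_keys(square, centroids))  # ['bg_a', 'bg_c']
```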

89. The system of claim 86 wherein the at least one processor receives at least one user selection of at least one county, census tract, zip code or designated place.

90. The system of claim 89 wherein the at least one processor receives at least one user selection of at least one county, census tract, zip code or designated place via user selection of at least one location on a map which corresponds to the at least one county, census tract, zip code or designated place.

91. The system of claim 78 wherein the first health condition is chronic lung disease.

92. The system of claim 78 wherein the at least one processor further:

obtains at least one mutable ancillary characteristic; and

converts the at least one mutable ancillary characteristic into an estimated or forecasted portion of the population of the first user selected or defined geographic region with a particular set of attribute values and with the first health condition.

93. The system of claim 92 wherein the at least one mutable ancillary characteristic includes at least one of an obesity level and an exercise level.

94. The system of claim 92 wherein the at least one processor obtains the at least one mutable ancillary characteristic from a user.

95. The system of claim 92 wherein the at least one processor further converts the estimated or forecasted portion of the population of the first user selected or defined geographic region into an estimated or forecasted prevalence of the first health condition in the subpopulation of the population of the first user selected or defined geographic region for the second time period.

96. The system of claim 78 wherein the at least one processor-readable medium further stores at least one of processor executable instructions or data, which when executed by the at least one processor further cause the at least one processor to:

generate, at least prior to estimation or forecast of the prevalence of the first health condition in the population of the first user selected or defined geographic region for the first time period, at least a first boundary by encoding at least one user definition of at least a first user defined geographical region;

compare, at least prior to estimation or forecast of the prevalence of the first health condition in the population of the first user selected or defined geographic region for the first time period, at least one portion of at least the first boundary of at least the first user defined geographical region to at least population data corresponding to at least the first user defined geographical region to produce at least a first estimated population of at least the first user defined geographical region; and

convert, at least prior to estimation or forecast of the prevalence of the first health condition in the population of the first user selected or defined geographic region for the first time period, at least the first estimated population of at least the first user defined geographical region to at least a first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region, at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region being at least an estimated or forecasted count of at least a first estimated or forecasted number of patients affected by at least a first health condition and that at least reside in at least the first user defined geographical region,

wherein the at least one processor estimates or forecasts the prevalence of the first health condition in the population of the first user selected or defined geographic region for the first time period based at least in part on at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region.

97. The system of claim 96 wherein the at least one processor-readable medium further stores at least one of processor executable instructions or data, which when executed by the at least one processor further cause the at least one processor to:

98. The system of claim 78 wherein the at least one processor:

receives a user selection of one or more user selected or defined geographic regions;

generates a geographic area report for each of the one or more user selected or defined geographic regions; and

presents the generated geographic area reports to the user.

99. The system of claim 98 wherein the at least one processor generates a geographic area report which includes a health condition by demographics report for at least one health condition and at least one demographic.

100. The system of claim 78 wherein the at least one processor:

receives a user selection of claims data for a claims population; and

presents the selected claims data for the claims population to a user on a map.

101. The system of claim 78 wherein the at least one processor:

receives a user selection of health-related data; and

presents the health-related data to a user on a map.

102. The system of claim 101 wherein the health-related data relates to at least one of political data, physical data, social data, hazards data, or disease data.

103. The system of claim 78 wherein the at least one processor:

receives a user selection of live data; and

presents the live data to a user on a map.

104. The system of claim 103 wherein the live data comprises at least one of live air quality data or live hazards data.

105. The system of claim 78 wherein the at least one processor determines $S(n'_g) = (p \cdot d) \cdot n'_g \cdot C(l)$, where $S(n'_g)$ is the total annual associated health care savings, $C(l)$ is the annual estimated per capita health care cost, $p$ is a rate which reduces incidence as a result of the intervention, and $n'_g$ is the estimated count of the population afflicted with the first health condition in the user selected or defined region.

106. The system of claim 78 wherein the at least one processor determines yearly cash flows based at least in part on an estimated cost associated with the defined intervention and an estimated health care savings associated with the defined intervention.
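Claim 106 bases yearly cash flows on the intervention's cost and its health care savings. A minimal sketch, assuming the full cost is an outflow in year 0 and the same savings accrue in each later year; the function names and figures are hypothetical.

```python
def yearly_cash_flows(intervention_cost: float, annual_savings: float,
                      years: int) -> list[float]:
    """Year 0 carries the intervention cost as an outflow; years 1..years carry savings."""
    return [-intervention_cost] + [annual_savings] * years


def npv(flows: list[float], i: float) -> float:
    """Discount each flow by (1 + i) ** t, where t is its year index."""
    return sum(f / (1 + i) ** t for t, f in enumerate(flows))


flows = yearly_cash_flows(400.0, 150.0, 3)
print(flows)            # [-400.0, 150.0, 150.0, 150.0]
print(npv(flows, 0.0))  # 50.0
```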

107. The system of claim 78 wherein the at least one processor receives a user indication of at least one of a cost of treating the first health condition or a cost of implementing the defined intervention.

108. A system, comprising:

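at least one processor; and

at least one processor-readable medium that stores at least one of processor executable instructions or data, which when executed by the at least one processor cause the at least one processor to: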

estimate or forecast a first health condition patient count in a population of a user selected or defined geographic region for a first time period, the first health condition count being a count of a number of patients affected by a first health condition;

estimate or forecast a first health condition patient count in the population of the user selected or defined geographic region for a second time period, the second time period different than the first time period; and

estimate or forecast a net present value of a defined intervention, the defined intervention which inhibits an onset of the first health condition in at least a portion of the population of the user selected or defined geographic region.

109. The system of claim 108 wherein the at least one processor further:

estimates or forecasts a total number of new cases of the first health condition that will occur during the second period; and

estimates or forecasts a portion of the first population prone to the first health condition based on the estimated or forecasted total number of new cases of the first health condition that will occur during the second period.

110. The system of claim 108 wherein the at least one processor estimates or forecasts the first health condition patient count in the population of the user selected or defined geographic region for the second time period by converting a rate of the first health condition in the population of the user selected or defined geographic region into the first health condition patient count in the population of the user selected or defined geographic region for the second time period.

111. The system of claim 110 wherein the rate of the first health condition in the population of the user selected or defined geographic region is a prevalence of the first health condition in the population of the user selected or defined geographic region.

112. The system of claim 111 wherein the first health condition is at least one of type 2 diabetes or chronic lung disease.

113. The system of claim 110 wherein the rate of the first health condition in the population of the user selected or defined geographic region is a number of hospitalizations attributable to the first health condition in at least a portion of the population of the user selected or defined geographic region.

114. The system of claim 113 wherein the first health condition is at least one of coronary artery disease or stroke.

115. The system of claim 114 wherein the rate of the first health condition in the population of the user selected or defined geographic region is an incidence of the first health condition in the population of the user selected or defined geographic region.

116. The system of claim 115 wherein the first health condition is prostate cancer.

117. The system of claim 108 wherein the at least one processor further estimates an annual cost of treating every individual affected by the first health condition in the population of the user selected or defined geographic region by converting at least an estimated or forecasted annual treatment cost per individual into the annual cost of treating every individual affected by the first health condition in the population of the user selected or defined geographic region.

118. The system of claim 117 wherein the at least one processor further converts at least the annual cost of treating every individual affected by the first health condition in the population of the user selected or defined geographic region and an estimated or forecasted cost of the defined intervention into an estimated or forecasted cost savings of implementing the defined intervention in the population of the user selected or defined geographic region.

119. The system of claim 117 wherein the at least one processor further converts at least the annual cost of treating every individual affected by the first health condition in the population of the user selected or defined geographic region and the estimated or forecasted net present value of the defined intervention into an estimated or forecasted return on investment for implementing the defined intervention in the population of the user selected or defined geographic region.

120. The system of claim 108 wherein the at least one processor further:

obtains at least a first mutable ancillary characteristic; and

converts at least the first mutable ancillary characteristic into a first estimated or forecasted portion of the population of the user selected or defined geographic region, the first estimated or forecasted portion being a first estimated or forecasted subpopulation of the population of the user selected or defined geographic region, the first estimated or forecasted subpopulation having both a first particular set of attribute values and the first health condition.

121. The system of claim 120 wherein at least the first mutable ancillary characteristic includes at least one of an obesity level and an exercise level.

122. The system of claim 120 wherein the at least one processor obtains at least the first mutable ancillary characteristic from a user.

123. The system of claim 120 wherein the at least one processor further converts the first estimated or forecasted portion of the population of the user selected or defined geographic region into an estimated or forecasted first health condition patient count in the first subpopulation of the population of the user selected or defined geographic region for the second time period.

124. The system of claim 123 wherein the at least one processor estimates the net present value of the defined intervention by converting the first estimated or forecasted first health condition patient count in the first subpopulation of the population of the user selected or defined geographic region for the second time period into the net present value of the defined intervention.

125. The system of claim 124 wherein the at least one processor converts the net present value of the defined intervention into a return on investment of the defined intervention.

126. The system of claim 120 wherein the at least one processor further:

obtains at least a second mutable ancillary characteristic;

converts at least the second mutable ancillary characteristic into a second estimated or forecasted portion of the population of the user selected or defined geographic region, the second estimated or forecasted portion being a second estimated or forecasted subpopulation of the population of the user selected or defined geographic region, the second estimated or forecasted subpopulation having both a second particular set of attribute values and the first health condition;

converts the second estimated or forecasted portion of the population of the user selected or defined geographic region into an estimated or forecasted first health condition patient count in the second subpopulation of the population of the user selected or defined geographic region for the second time period;

converts the estimated or forecasted first health condition patient count in the second subpopulation of the population of the user selected or defined geographic region for the second time period into a net present value of another defined intervention; and

converts the net present value of the other defined intervention into a return on investment of the other defined intervention.

127. The system of claim 125 wherein the at least one processor further:

compares the return on investment of the defined intervention with the return on investment of the other defined intervention to produce a comparison result; and

converts the comparison result into a humanly perceptible indication of the comparison result.
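The comparison step of claim 127 can be as simple as ranking two return-on-investment values and rendering the result as text. A minimal sketch; the function name, intervention labels, and ROI values are hypothetical, and a real system might instead render a chart or map highlight.

```python
def compare_roi(name_a: str, roi_a: float, name_b: str, roi_b: float) -> str:
    """Return a human-readable comparison of two interventions' ROI."""
    if roi_a == roi_b:
        return f"{name_a} and {name_b} have equal ROI ({roi_a:.2f})"
    if roi_a > roi_b:
        hi_name, hi_roi, lo_name, lo_roi = name_a, roi_a, name_b, roi_b
    else:
        hi_name, hi_roi, lo_name, lo_roi = name_b, roi_b, name_a, roi_a
    return f"{hi_name} ROI {hi_roi:.2f} exceeds {lo_name} ROI {lo_roi:.2f}"


print(compare_roi("weight-loss program", 3.75, "exercise program", 2.10))
# weight-loss program ROI 3.75 exceeds exercise program ROI 2.10
```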

128. The system of claim 108 wherein the at least one processor-readable medium further stores at least one of processor executable instructions or data, which when executed by the at least one processor further cause the at least one processor to:

generate, at least prior to estimation or forecast of the first health condition patient count in the population of the user selected or defined geographic region for the first time period, at least a first boundary by encoding at least one user definition of at least a first user defined geographical region;

compare, at least prior to estimation or forecast of the first health condition patient count in the population of the user selected or defined geographic region for the first time period, at least one portion of at least the first boundary of at least the first user defined geographical region to at least population data corresponding to at least the first user defined geographical region to produce at least a first estimated or forecasted population of at least the first user defined geographical region; and

convert, at least prior to estimation or forecast of the first health condition patient count in the population of the user selected or defined geographic region for the first time period, at least the first estimated population of at least the first user defined geographical region to at least a first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region for the first time period, at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region for the first time period being at least an estimated or forecasted count of at least a first estimated or forecasted number of patients affected by at least a first health condition and that at least reside in at least the first user defined geographical region for the first time period,

wherein the at least one processor estimates or forecasts the first health condition patient count in the population of the user selected or defined geographic region for the first time period based at least in part on at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region for the first time period.

129. The system of claim 108 wherein the at least one processor-readable medium further stores at least one of processor executable instructions or data, which when executed by the at least one processor further cause the at least one processor to:

access, at least prior to conversion of at least the first estimated population to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region for the first time period, a first data set that comprises demographic data for a respective population of each of a plurality of geographic regions of a first type of geographic region;

access, at least prior to conversion of at least the first estimated population to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region for the first time period, a second data set that comprises health-related data for a respective population of each of a plurality of geographic regions of a second type of geographic region, where the second type of geographic region is different from the first type of geographic region for at least one instance of a pair of geographic regions of the first and second types; and

generate, at least prior to conversion of at least the first estimated population to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region for the first time period, based at least in part on the first data set and the second data set, a third data set, by at least one processor, the third data set searchable by a geographic region key that corresponds to a geographic region identifier that uniquely identifies the geographic regions of the first type of geographic region, the third data set including data regarding at least one demographic characteristic and at least one health-related characteristic representative of a population associated with the respective geographic region, where the at least one health-related characteristic is different from the at least one demographic characteristic,

wherein the at least one processor converts at least the first estimated population of at least the first user defined geographical region to at least the first estimated or forecasted health condition patient count in at least the first estimated population of at least the first user defined geographical region for the first time period based at least in part on the third data set searchable by the geographic region key that corresponds to the geographic region identifier that uniquely identifies the geographic regions of the first type of geographic region.
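Claim 129 joins data sets of different granularity into a third data set keyed by the finer region identifier. The sketch below assumes that the first type of region is the census block group, identified by its 12-digit GEOID (2-digit state, 3-digit county, 6-digit tract, 1-digit block group). It also assumes that the second type is the county, identified by its 5-digit state-plus-county FIPS code, so each block group can be matched to its county by prefix. Giving every block group its county's rate is a deliberately simple allocation chosen for illustration, not the method the patent discloses. The function names are hypothetical.

```python
from typing import Dict, Iterable


def build_third_data_set(
    block_group_population: Dict[str, int],
    county_prevalence: Dict[str, float],
) -> Dict[str, Dict[str, float]]:
    """Join block-group demographics with county-level health data.

    Keys of the result are 12-digit block group GEOIDs. The first five
    digits (state + county FIPS) select the county-level rate; giving each
    block group its county's rate is an illustrative simplification.
    """
    third: Dict[str, Dict[str, float]] = {}
    for geoid, population in block_group_population.items():
        if len(geoid) != 12 or not geoid.isdigit():
            raise ValueError(f"not a block group GEOID: {geoid!r}")
        third[geoid] = {
            "population": float(population),
            "prevalence": county_prevalence[geoid[:5]],
        }
    return third


def patient_count_for(keys: Iterable[str], third: Dict[str, Dict[str, float]]) -> float:
    """Estimated patient count for a region assembled from block group keys."""
    return sum(third[k]["population"] * third[k]["prevalence"] for k in keys)
```

Because the third data set is a plain mapping keyed by GEOID, any user defined region that resolves to a list of block group keys can be converted to a patient count with one lookup per key.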

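130. The system of claim 108 wherein the at least one processor: receives a user selection of one or more user selected or defined geographic regions;

generates one or more geographic area reports for the one or more user selected or defined geographic regions; and

presents the generated geographic area reports to the user.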

131. The system of claim 130 wherein the at least one processor generates a geographic area report which includes a health condition by demographics report for at least one health condition and at least one demographic.
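A health condition by demographics report can be pictured as a table of estimated patient counts per demographic group. The sketch below assumes hypothetical per-group populations and per-group prevalence rates (for example, age bands) for one region. It returns one row per group with the estimated patient count and that group's share of all estimated patients.

```python
from typing import Dict, List, Tuple


def condition_by_demographic(
    population_by_group: Dict[str, float],
    prevalence_by_group: Dict[str, float],
) -> List[Tuple[str, float, float]]:
    """Rows of (group, estimated patients, share of all estimated patients)."""
    counts = {
        group: population * prevalence_by_group[group]
        for group, population in population_by_group.items()
    }
    total = sum(counts.values())
    # Guard against division by zero when no patients are estimated.
    return [(g, c, c / total if total else 0.0) for g, c in counts.items()]
```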

132. The system of claim 108 wherein the at least one processor: receives a user selection of claims data for a claims population; and presents the selected claims data for the claims population to a user on a map.

133. The system of claim 108 wherein the at least one processor: receives a user selection of health-related data; and presents the health-related data to a user on a map.

134. The system of claim 133 wherein the health-related data relates to at least one of political data, physical data, social data, hazards data, or disease data.

135. The system of claim 108 wherein the at least one processor: receives a user selection of live data; and

presents the live data to a user on a map.

136. The system of claim 135 wherein the live data comprises at least one of live air quality data or live hazards data.

137. The system of claim 108 wherein the at least one processor determines $S(n'_g) = (p \cdot d)\, n'_g \cdot C(l)$, where $S(n'_g)$ is total annual, associated health care savings, $C(l)$ is the annual, estimated per capita health care costs per person, $p$ is a rate which reduces incidence as a result of the intervention, and $n'_g$ is the estimated count of the population afflicted with the first health condition in the user selected or defined region.
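The savings formula of claim 137 can be evaluated directly. In this sketch `d` is passed through unchanged because the excerpt does not define it. The parameter names mirror the claim's symbols and are otherwise assumptions.

```python
def annual_savings(p: float, d: float, n_g: float, c_l: float) -> float:
    """Evaluate S(n'_g) = (p * d) * n'_g * C(l).

    p:   rate by which the intervention reduces incidence
    d:   factor appearing in the claimed formula; not defined in this
         excerpt, so it is passed through unchanged
    n_g: estimated count of the afflicted population, n'_g
    c_l: annual estimated per capita health care cost, C(l)
    """
    return (p * d) * n_g * c_l


# Illustrative values only: p = 0.1, d = 1, n'_g = 1000, C(l) = 5000.
savings = annual_savings(0.1, 1.0, 1000, 5000)
```

With these illustrative values the arithmetic is S = 0.1 × 1 × 1,000 × 5,000 = 500,000 per year.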

138. The system of claim 108 wherein the at least one processor determines yearly cash flows based at least in part on an estimated cost associated with the defined intervention and an estimated health care savings associated with the defined intervention.
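Claim 138 does not state how the cost and savings estimates are combined into yearly cash flows. The sketch below assumes that each year's net flow is that year's estimated savings minus that year's intervention cost. It adds a discounted net present value and an undiscounted return on investment; these definitions are common financial conventions chosen here as assumptions, not taken from the patent.

```python
from typing import List, Sequence


def yearly_cash_flows(costs: Sequence[float], savings: Sequence[float]) -> List[float]:
    """Net cash flow per year: estimated savings minus intervention cost."""
    if len(costs) != len(savings):
        raise ValueError("costs and savings must cover the same years")
    return [s - c for c, s in zip(costs, savings)]


def net_present_value(flows: Sequence[float], rate: float) -> float:
    """Discount the year-t flow by (1 + rate) ** t; year 0 is undiscounted."""
    return sum(f / (1.0 + rate) ** t for t, f in enumerate(flows))


def simple_roi(costs: Sequence[float], savings: Sequence[float]) -> float:
    """(total savings - total cost) / total cost, undiscounted."""
    total_cost = sum(costs)
    return (sum(savings) - total_cost) / total_cost
```

An up-front implementation cost in year 0 followed by smaller operating costs in later years produces the usual pattern of one negative flow followed by positive ones.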

139. The system of claim 108 wherein the at least one processor receives a user indication of at least one of a cost of treating the first health condition or a cost of implementing the defined intervention.

140. The system of claim 108 wherein the at least one processor further receives user input that specifies the first user selected or defined geographic region.

141. The system of claim 140 wherein the at least one processor receives a plurality of user selections of points or lines on a map that define a polygon, and further converts the user input that defines a polygon into a number of geographic region key values.
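One way to turn a user-drawn polygon into geographic region key values is to keep every region whose representative point lies inside the polygon. The sketch below assumes one centroid per region key, treats longitude and latitude as planar coordinates, and uses the even-odd ray casting test. A production system might instead intersect full region boundaries with a GIS library and weight partially covered regions by area. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude) treated as planar (x, y)


def point_in_polygon(pt: Point, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting test; vertices in order, last edge closes back."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can cross it;
        # this also guarantees y1 != y2, so the division below is safe.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_to_keys(polygon: Sequence[Point], centroids: Dict[str, Point]) -> List[str]:
    """Sorted keys of regions whose centroid falls inside the drawn polygon."""
    if len(polygon) < 3:
        raise ValueError("a polygon needs at least three vertices")
    return sorted(k for k, c in centroids.items() if point_in_polygon(c, polygon))
```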

142. The system of claim 140 wherein the at least one processor receives at least one user selection of at least one county, census tract, zip code or designated place.
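Counties and census tracts nest within the census GEOID hierarchy. A selected county (5-digit code) or tract (11-digit code) can therefore be expanded to block group keys by prefix. ZIP codes and designated places do not nest in that hierarchy, so the sketch below assumes a hypothetical precomputed crosswalk from those codes to block group keys. The function and parameter names are illustrative.

```python
from typing import Dict, Iterable, List, Optional

COUNTY_LEN = 5   # state (2) + county (3)
TRACT_LEN = 11   # state (2) + county (3) + tract (6)


def keys_for_selection(
    kind: str,
    code: str,
    all_block_groups: Iterable[str],
    crosswalk: Optional[Dict[str, List[str]]] = None,
) -> List[str]:
    """Expand a county, tract, ZIP code or place selection to block group keys."""
    if kind in ("zip", "place"):
        # Non-nesting geographies need an external crosswalk (assumed here).
        if crosswalk is None:
            raise ValueError(f"a crosswalk is required for {kind!r} selections")
        return sorted(crosswalk.get(code, []))
    if kind == "county":
        prefix_len = COUNTY_LEN
    elif kind == "tract":
        prefix_len = TRACT_LEN
    else:
        raise ValueError(f"unknown selection kind: {kind!r}")
    if len(code) != prefix_len or not code.isdigit():
        raise ValueError(f"{kind} code must be {prefix_len} digits: {code!r}")
    return sorted(bg for bg in all_block_groups if bg.startswith(code))
```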

143. The system of claim 142 wherein the at least one processor receives at least one user selection of at least one county, census tract, zip code or designated place via user selection of at least one location on a map which corresponds to the at least one county, census tract, zip code or designated place.