Prostate cancer (PCa) is a major cause of disease and death among men, and we aim to identify its explanatory variables and to study how their relationships with PCa vary spatially. In particular, this study focuses on the correlation between county-level PCa and obesity incidence rates. Generalized linear regression (GLR) was used to fit a non-spatial (aspatial) model, and geographically weighted regression (GWR) was then applied to explore spatial variation in the linear relationship between PCa and obesity rates. We found the GLR model to be biased, since its residuals were clustered in space. GWR, in contrast, produced a reliable model and showed a widespread positive correlation between the two variables across the Deep South. We therefore suggest that people in the region adopt a healthier diet and lifestyle to lower obesity rates and, in turn, PCa rates.
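The GLR-then-GWR workflow described above can be illustrated with a minimal sketch. It assumes a single predictor (for example, obesity rate), a fixed Gaussian kernel, and projected county centroid coordinates. The function names (`gaussian_weight`, `local_fit`, `gwr`) and the fixed bandwidth are hypothetical, and the study's actual software, kernel, and bandwidth selection (often chosen by AICc or cross-validation) are not stated in the poster. GWR fits a separate weighted least-squares regression at each county, down-weighting counties that are farther away, so the slope is allowed to vary across space.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: weight decays smoothly with distance from the regression point."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def local_fit(points, x, y, i, bandwidth):
    """Weighted least-squares fit of y = b0 + b1 * x centered on location i.

    points: list of (easting, northing) coordinates, one per county (hypothetical input).
    x, y:   predictor (e.g. obesity rate) and response (e.g. PCa rate) per county.
    """
    # Every county gets a weight based on its distance from county i
    w = [gaussian_weight(math.dist(points[i], p), bandwidth) for p in points]
    sw = sum(w)
    # Weighted means of the predictor and the response
    mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
    my = sum(wj * yj for wj, yj in zip(w, y)) / sw
    # Weighted covariance over weighted variance gives the local slope
    sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
    sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


def gwr(points, x, y, bandwidth):
    """Return one (intercept, slope) pair per location."""
    return [local_fit(points, x, y, i, bandwidth) for i in range(len(points))]
```

With a very large bandwidth every weight approaches 1, so each local slope collapses to the single global (aspatial) least-squares slope; a small bandwidth lets two distant regions with opposite relationships keep their own slopes.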
5 thoughts on “Spatial Heterogeneity of Prostate Cancer Incidence Rate and its Predictive Model – A Geographically Weighted Regression Analysis”
I was curious whether you had looked at any omitted variables that might bias the coefficients of your geographically weighted regression. One that comes to mind is the severity of obesity, which could produce coefficients indicating a greater effect in locations where obesity is more severe.
Yes. For the explanatory regression, we examined more than 25 variables and narrowed them down to 8. On the poster, we condensed these to three predictors: obesity rate, age above 45 years, and African American ethnicity. Other variables, such as diabetes rates, could probably also be fitted into the model, but we do not expect additional predictors to bias our model. With only three predictors, the model achieved a deviance-explained value of 0.899, which we believe is statistically strong enough for this effort. Also, the severity of obesity is reflected in the county-level obesity incidence rate. Obesity status is recorded as a binary variable (obese or not obese), not a continuous one. We are not interested in the degree of obesity of individual people in the U.S.: if you quantified it with BMI, for example, it would not make sense to average everyone’s BMI in a county and treat the mean as the county’s “degree of obesity.” Since our unit of analysis is the county, we quantify the severity of obesity using the obesity incidence rate, which is the number of obesity cases in a county divided by the county’s total population. The same idea applies to the prostate cancer rate. Thank you for your question, and I hope this clarifies it.
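The county-level rate described in that reply is simple arithmetic: cases divided by total population. Here is a minimal sketch; the function names and the optional `per` scaling factor (for example, rates per 100,000 people) are illustrative assumptions, not taken from the study, and the county names in any example input are hypothetical.

```python
def incidence_rate(cases, population, per=1):
    """County-level rate: cases divided by total population, optionally scaled (e.g. per=100_000)."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 <= cases <= population:
        raise ValueError("cases must be between 0 and population")
    return cases / population * per


def county_rates(counts, per=1):
    """Map {county: (cases, population)} to {county: rate}."""
    return {county: incidence_rate(c, p, per) for county, (c, p) in counts.items()}
```

Because each county contributes one rate, the county (not the individual) is the unit that enters the regression.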
Hi guys! Welcome to my poster! My name is Peregrine Liu, and I am a senior at Vanderbilt majoring in BME and Biochem. I conducted a project studying how the prostate cancer rate and its predictors, especially the obesity rate, vary across space. The results were fascinating and could be meaningful to you and your family, so please check them out if you want to advise your older family members on a healthier lifestyle that may minimize their risk of developing prostate cancer! Please let me know what you think, and feel free to leave comments!
Very clear presentation and results, Peregrine, good going. I appreciate that you walked through the process clearly. One question I have is about terminology: in the first part you mention GLR as “Geographically Linear Regression”, but I think you mean *Generalized* Linear Regression (i.e. aspatial)? If so, then the aspatial GLR produced spatially clustered residuals. That suggests the aspatial model is not well parameterized because there is a spatial dependency at work, correct? Then you ran GWR, and the results show clustering of high correlations in the Deep South, etc. Am I reading that correctly, or was your “GLR” actually GWR to begin with? I’m confident you ran the steps correctly; I’m just wondering about the terminology and what is being referred to. In your conclusions you mention testing bivariate relationships to further specify the potential causal mechanism. What variables might these be? I’m curious how you might further specify it. You have a good start on a publishable paper here, good going!
Hi Dr. Wernke, you are absolutely right. It should be “Generalized linear regression,” which is used to model linear, aspatial relationships. That was my mistake, and thank you for pointing it out.
Your interpretation is also accurate: the spatially clustered residuals from GLR prompted us to use GWR, which accounts for location when modeling the relationships.
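The thread does not say which test was used to detect the clustered GLR residuals; one common choice is global Moran’s I, I = (n / S0) · Σi Σj wij zi zj / Σi zi², where zi are residuals minus their mean and S0 is the sum of all weights. A minimal sketch, assuming a binary contiguity weights matrix (1 for neighboring counties, 0 otherwise) supplied by the caller; the names `morans_i` and `expected_i` are hypothetical, and significance testing (z-score or permutation) is omitted.

```python
def morans_i(values, weights):
    """Global Moran's I of `values` under spatial weights `weights[i][j]` (diagonal = 0)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    s0 = sum(sum(row) for row in weights)  # total weight S0
    # Spatially weighted cross-products of deviations between neighbors
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * (num / den)


def expected_i(n):
    """Expected Moran's I when there is no spatial autocorrelation."""
    return -1.0 / (n - 1)
```

Values well above `expected_i(n)` mean that residuals of the same sign sit next to each other, which is the pattern that indicates an aspatial model is missing spatial structure.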
I think it would be interesting to test bivariate relationships at the county level; these could be positive linear, negative linear, convex, or concave. I will start the analysis with the same variables used in this study: age, African American ethnicity, and obesity incidence rate. I will then try to improve the model by incorporating more predictors, such as diabetes incidence rate and BMI.
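The relationship shapes listed above (positive linear, negative linear, convex, concave) can be told apart by fitting a quadratic y = a + b·x + c·x² and inspecting the coefficients: c > 0 suggests a convex relationship, c < 0 a concave one, and c near 0 reduces to a linear fit whose direction is given by the sign of b. A minimal sketch, assuming plain least squares on county-level (x, y) pairs; the names `fit_quadratic` and `classify_shape` and the fixed tolerance are hypothetical. In a real analysis, whether c differs from zero should be decided with a significance test rather than a fixed threshold.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = a + b*x + c*x**2 by solving the 3x3 normal equations."""
    # Power sums of x and cross-products with y
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    # Augmented matrix [X^T X | X^T y] for the design columns 1, x, x^2
    m = [[s[r + c] for c in range(3)] + [t[r]] for r in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]
    # Back substitution
    coef = [0.0] * 3
    for r in (2, 1, 0):
        acc = sum(m[r][k] * coef[k] for k in range(r + 1, 3))
        coef[r] = (m[r][3] - acc) / m[r][r]
    return coef  # [a, b, c]


def classify_shape(x, y, tol=1e-8):
    """Label a bivariate relationship by the signs of its quadratic-fit coefficients."""
    _, b, c = fit_quadratic(x, y)
    if abs(c) > tol:
        return "convex" if c > 0 else "concave"
    if abs(b) <= tol:
        return "flat"
    return "positive linear" if b > 0 else "negative linear"
```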