範例 - NasaWeather::atmos

Data Description

The atmos data set resides in the nasaweather package of the R programming language. It contains a collection of atmospheric variables measured between 1995 and 2000 on a grid of 576 coordinates in the western hemisphere. The data set comes from the 2006 ASA Data Expo.

Some of the variables in the atmos data set are:

  • temp - The mean monthly air temperature near the surface of the Earth (measured in kelvins (K))

  • pressure - The mean monthly air pressure at the surface of the Earth (measured in millibars (mb))

  • ozone - The mean monthly abundance of atmospheric ozone (measured in Dobson units (DU))

You can convert the temperature unit from Kelvin to Celsius with the formula

\[ celsius = kelvin - 273.15 \]

And you can convert the result to Fahrenheit with the formula

\[ fahrenheit = celsius \times \frac{9}{5} + 32 \]

For example, r example_kelvin degrees Kelvin corresponds to r example_kelvin - 273.15 degrees Celsius.

Application Example

Load package & 分組 & 作圖

# install.packages("nasaweather")
# install.packages("ggvis")
library("nasaweather")
library("dplyr")
library("ggvis")
# Set the year variable to 1995
year <- 1995

means <- atmos %>%
  filter(year == year) %>%
  group_by(long, lat) %>%
  summarize(temp = mean(temp, na.rm = TRUE),
         pressure = mean(pressure, na.rm = TRUE),
         ozone = mean(ozone, na.rm = TRUE),
         cloudlow = mean(cloudlow, na.rm = TRUE),
         cloudmid = mean(cloudmid, na.rm = TRUE),
         cloudhigh = mean(cloudhigh, na.rm = TRUE)) %>%
  ungroup()

# Inspect the means variable
means
## # A tibble: 576 x 8
##     long    lat  temp pressure ozone cloudlow cloudmid cloudhigh
##    <dbl>  <dbl> <dbl>    <dbl> <dbl>    <dbl>    <dbl>     <dbl>
##  1 -114. -21.2   296.     1000  268.     37.2     5.78     1.99 
##  2 -114. -18.7   296.     1000  266.     39.4     4.06     1.04 
##  3 -114. -16.2   297.     1000  263.     40.2     3.82     0.688
##  4 -114. -13.7   297.     1000  260.     38.1     3.47     0.660
##  5 -114. -11.2   298.     1000  259.     34.6     3.12     0.847
##  6 -114.  -8.72  298.     1000  258.     31.3     3.22     1.58 
##  7 -114.  -6.23  299.     1000  257.     27.8     3.99     2.77 
##  8 -114.  -3.73  299.     1000  256.     28.1     5.01     3.32 
##  9 -114.  -1.23  298.     1000  257.     26.0     5.30     3.07 
## 10 -114.   1.26  299.     1000  256.     30.9     7.24     4.23 
## # ... with 566 more rows
#plot the temp variable vs the ozone variable
means %>%
  ggvis(x = ~temp, y = ~ozone) %>%
  layer_points()

Modeling 關聯預測

# Change the model: base prediction only on temp
mod <- lm(ozone ~ temp, data = means)

# Generate a model summary and interpret the results
summary(mod)
## 
## Call:
## lm(formula = ozone ~ temp, data = means)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -53.305  -9.587  -3.129   8.074  33.255 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 939.3453    47.5335   19.76   <2e-16 ***
## temp         -2.2562     0.1595  -14.14   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.16 on 574 degrees of freedom
## Multiple R-squared:  0.2584, Adjusted R-squared:  0.2571 
## F-statistic:   200 on 1 and 574 DF,  p-value: < 2.2e-16

looking for the model’s estimates for the intercept and temp coefficients, as well as the p-value associated with each coefficient and the model’s overall Adjusted R-squared.

Reporting 報告撰寫

how to write the narrative sections

Cleaning

For the remainder of the report, we will look only at data from the year 2005. We aggregate our data by location, using the R code below.

means <- atmos %>%
  filter(year == year) %>%
  group_by(long, lat) %>%
  summarize(temp = mean(temp, na.rm = TRUE),
         pressure = mean(pressure, na.rm = TRUE),
         ozone = mean(ozone, na.rm = TRUE),
         cloudlow = mean(cloudlow, na.rm = TRUE),
         cloudmid = mean(cloudmid, na.rm = TRUE),
         cloudhigh = mean(cloudhigh, na.rm = TRUE)) %>%
  ungroup()

where the year object equals 2005.

Ozone and temperature

Is the relationship between ozone and temperature useful for understanding fluctuations in ozone? A scatterplot of the variables shows a strong, but unusual relationship.

We suspect that group level effects are caused by environmental conditions that vary by locale. To test this idea, we sort each data point into one of four geographic regions:

means$locale <- "north america"
means$locale[means$lat < 10] <- "south pacific"
means$locale[means$long > -80 & means$lat < 10] <- "south america"
means$locale[means$long > -80 & means$lat > 10] <- "north atlantic"

Model

We suggest that ozone is highly correlated with temperature, but that a different relationship exists for each geographic region. We capture this relationship with a second order linear model of the form

\[ ozone = \alpha + \beta_{1} temperature + \sum_{locales} \beta_{i} locale_{i} + \sum_{locales} \beta_{j} interaction_{j} + \epsilon\]

This yields the following coefficients and model lines.

lm(ozone ~ temp + locale + temp:locale, data = means)
## 
## Call:
## lm(formula = ozone ~ temp + locale + temp:locale, data = means)
## 
## Coefficients:
##               (Intercept)                       temp  
##                  1336.508                     -3.559  
##      localenorth atlantic        localesouth america  
##                   548.248                  -1061.452  
##       localesouth pacific  temp:localenorth atlantic  
##                  -549.906                     -1.827  
##  temp:localesouth america   temp:localesouth pacific  
##                     3.496                      1.785
## Guessing formula = ozone ~ temp

Diagnostics

An anova test suggests that both locale and the interaction effect of locale and temperature are useful for predicting ozone (i.e., the p-value that compares the full model to the reduced models is statistically significant).

mod <- lm(ozone ~ temp, data = means)
mod2 <- lm(ozone ~ temp + locale, data = means)
mod3 <- lm(ozone ~ temp + locale + temp:locale, data = means)

anova(mod, mod2, mod3)
## Analysis of Variance Table
## 
## Model 1: ozone ~ temp
## Model 2: ozone ~ temp + locale
## Model 3: ozone ~ temp + locale + temp:locale
##   Res.Df   RSS Df Sum of Sq      F    Pr(>F)    
## 1    574 99335                                  
## 2    571 41425  3     57911 706.17 < 2.2e-16 ***
## 3    568 15527  3     25898 315.81 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

收穫

YAML Head

CSS:

  • 創建css檔案放在同一個檔案位置下
  • 在html_notebook: 之後
  • css: faded.css (名稱)

Others

  • number_sections: yes 根據##產生數序階層
  • toc: yes 出現大鋼

Markdown

  • italicized by surrounding it in asterisks
  • bold by surrounding it in two asterisks
  • monospaced (like code) by surrounding it in backticks
  • link by surrounding it in hard brackets and then placing the link behind it in parentheses
    • bulleted list in Markdown,
    • place each item on a new line after an asterisk and a space
  • ordered list by placing each item on a new line
    1. after a number
    1. followed by a period
    1. followed by a space Don't forget to leave a blank line before staring the summation. Otherwise the list will not render correctly.
LaTeX equations
  • To embed an equation \(1*1=1\) in its own centered equation block, surround the equation with two pairs of dollar signs \[1*1=1\]

Code chunk

embed R code into your report. This gives you the best of both worlds: formatted text for narration, and precise R code for reproducible analysis.

knitr syntax

customize each R code chunk in your report by providing optional arguments after the r

  • echo = FALSE 讓程式碼不出現(只執行、秀結果)
  • eval = FALSE 反過來,只讓Code出現(不執行、沒結果) use to display example code that should not be run
  • results = ‘hide’ 會顯示Code並執行(不秀結果)
  • 可以直接給chunk名稱,另外呼叫時用ref.lable=‘名稱’

原則

  • It is common to display figures without the code that generates them (the code is a distraction). 通常有圖會關code (echo)
  • inline R code: 如採用means$locale[1]結果south pacific
  • Each R Markdown document is given a fresh, empty R session to run its code chunks in. This means that you will need to define any R objects that this document uses - and load any packages that it uses - inside the same R Markdown document. The document won’t have access to the objects that exist in your current R session.
  • By default, R Markdown will include message, warning and error in your report. We can use options to prevent from displaying these.

pandoc

Each R Markdown output template is a collection of knitr and pandoc options. You can customize your output by overwriting the default options that come with the template.

  • theme: change the CSS style of HTML output: default, cerulean, journal, flatly, readable, spacelab, united, or cosmo.

Session Info

sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=Chinese (Traditional)_Taiwan.950 
## [2] LC_CTYPE=Chinese (Traditional)_Taiwan.950   
## [3] LC_MONETARY=Chinese (Traditional)_Taiwan.950
## [4] LC_NUMERIC=C                                
## [5] LC_TIME=Chinese (Traditional)_Taiwan.950    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] bindrcpp_0.2.2  ggvis_0.4.4     dplyr_0.7.6     nasaweather_0.1
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.18     knitr_1.20       bindr_0.1.1      magrittr_1.5    
##  [5] tidyselect_0.2.4 xtable_1.8-3     R6_2.2.2         rlang_0.2.2     
##  [9] fansi_0.3.0      stringr_1.3.1    tools_3.5.1      utf8_1.1.4      
## [13] cli_1.0.0        htmltools_0.3.6  lazyeval_0.2.1   yaml_2.2.0      
## [17] rprojroot_1.3-2  digest_0.6.17    assertthat_0.2.0 tibble_1.4.2    
## [21] crayon_1.3.4     shiny_1.2.0      later_0.7.5      purrr_0.2.5     
## [25] promises_1.0.1   mime_0.5         glue_1.3.0       evaluate_0.11   
## [29] rmarkdown_1.10   stringi_1.1.7    compiler_3.5.1   pillar_1.3.0    
## [33] backports_1.1.2  jsonlite_1.5     httpuv_1.4.5     pkgconfig_2.0.2