~/Week 4

Brandon Rozek


PhD Student @ RPI studying Automated Reasoning in AI and Linux Enthusiast.

The cacher Package for R

The value of cacher is that other people can clone the analysis and look at subsets of the code or, more specifically, at individual data objects. People who want to run your code may not have the resources that you have, so they may not want to rerun the entire Markov chain Monte Carlo simulation you used to get the posterior distribution or the histogram you produced at the end.

The idea is to let readers peel the onion a layer at a time rather than go straight to the core.

Using cacher as an Author

  1. Parse the R source file and create the necessary cache directories and subdirectories
  2. Cycle through each expression in the source file
    • If an expression has never been evaluated, evaluate it and store any resulting R objects in the cache database
    • If cached results exist, lazy-load them from the cache database and move on to the next expression
    • If an expression does not create any R objects (i.e., there is nothing to cache), add the expression to the list of expressions whose evaluation needs to be forced
    • Write out metadata for this expression to the metadata file
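
The steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not cacher's implementation: the hypothetical run_with_cache() keys each expression by an MD5 hash of its deparsed text, stores each new object in its own .rds file, writes metadata to a tab-separated log, and only notices objects whose names are new after evaluation.

```r
## Sketch only, not cacher's implementation. Assumptions: expressions are
## keyed by an MD5 hash of their deparsed text, objects are stored one per
## .rds file, and metadata is a tab-separated log. All names are hypothetical.

# Base R's tools::md5sum() hashes files, so hash a temp file of the text
expr_key <- function(expr) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(deparse(expr), tmp)
  unname(tools::md5sum(tmp))
}

# Bind `name` in `envir` to a promise that reads `path` on first access
lazy_load <- function(name, path, envir) {
  force(path)  # fix the path now rather than when the promise runs
  delayedAssign(name, readRDS(path), assign.env = envir)
}

run_with_cache <- function(src_file, cache_dir) {
  # 1. Parse the source file; create the cache directories
  exprs <- parse(src_file, keep.source = FALSE)
  db_dir <- file.path(cache_dir, "db")
  dir.create(db_dir, recursive = TRUE, showWarnings = FALSE)
  meta_file <- file.path(cache_dir, "metadata.tsv")
  obj_file <- function(key, nm) file.path(db_dir, paste0(key, "_", nm, ".rds"))
  envir <- new.env(parent = globalenv())
  forced <- integer(0)

  # 2. Cycle through each expression
  for (i in seq_along(exprs)) {
    expr <- exprs[[i]]
    key <- expr_key(expr)
    names_file <- file.path(db_dir, paste0(key, ".names"))

    if (file.exists(names_file)) {
      # Cached results exist: lazy-load them and move on
      nms <- readLines(names_file)
      for (nm in nms) lazy_load(nm, obj_file(key, nm), envir)
      status <- "lazy-loaded"
      if (length(nms) == 0) eval(expr, envir)  # nothing cached: force it
    } else {
      # Never evaluated: evaluate it and store any new objects
      # (simplification: reassigning an existing name is not detected)
      before <- ls(envir, all.names = TRUE)
      eval(expr, envir)
      nms <- setdiff(ls(envir, all.names = TRUE), before)
      for (nm in nms) saveRDS(get(nm, envir = envir), obj_file(key, nm))
      writeLines(nms, names_file)
      status <- "evaluated"
    }

    # No objects created: record that evaluation must be forced
    if (length(nms) == 0) {
      forced <- c(forced, i)
      status <- "forced"
    }

    # Write out metadata for this expression
    cat(paste(i, key, status, paste(nms, collapse = ","), sep = "\t"), "\n",
        sep = "", file = meta_file, append = TRUE)
  }
  invisible(list(envir = envir, forced = forced))
}
```

For example, calling run_with_cache("top20.R", ".cache_sketch") twice would evaluate every expression the first time. The second time it lazy-loads the stored objects and re-evaluates only the expressions that created nothing, such as print() or library() calls.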

Using cacher as a Reader

A journal article says:

“… the code and data for this analysis can be found in the cacher package 092dcc7dda4b93e42f23e038a60e1d44dbec7b3f”

library(cacher)
clonecache(id = "092dcc7dda4b93e42f23e038a60e1d44dbec7b3f")
clonecache(id = "092d") ## Same as above
# Created cache directory `.cache`
showfiles()
# [1] "top20.R"
sourcefile("top20.R")
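
The shortened call suggests that a unique prefix of the 40-character ID is enough to identify the cache. A minimal sketch of how such a prefix could be expanded, assuming the full IDs are available locally; resolve_id() is a hypothetical helper, not part of cacher, and the second ID below is made up.

```r
## Hypothetical helper, not part of cacher: expand a short ID prefix,
## assuming the full 40-character IDs are available locally.
resolve_id <- function(prefix, known_ids) {
  hits <- known_ids[startsWith(known_ids, tolower(prefix))]
  if (length(hits) == 0) stop("no cache matches prefix '", prefix, "'")
  if (length(hits) > 1) stop("prefix '", prefix, "' is ambiguous")
  hits
}

known <- c("092dcc7dda4b93e42f23e038a60e1d44dbec7b3f",  # ID from the article
           "0fa1c2e3b4d5f60718293a4b5c6d7e8f90a1b2c3")  # made-up ID
resolve_id("092d", known)   # expands to the full article ID
```

A prefix such as "0" matches both IDs here, so the sketch refuses to guess and raises an error instead.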

Cloning an Analysis

graphcode() gives a node graph representing the code

Running Code

Checking Code and Objects

You can inspect data objects with loadcache(). It loads a pointer to each data object into the workspace; the object itself is transferred from the cache only when you first access it.
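
This "pointer now, data on access" behavior resembles R's promises. A base R analogy using delayedAssign(): the name is bound immediately, but the file is read only when the object is first used. The name big_object and the use of mtcars as a stand-in are illustrative assumptions; this is not how loadcache() is implemented.

```r
## Analogy only: loadcache() itself is not implemented this way.
cache_file <- tempfile(fileext = ".rds")
saveRDS(mtcars, cache_file)            # stand-in for a cached data object

state <- new.env()
state$loaded <- FALSE

# Bind a "pointer": a promise that reads the file when first accessed
delayedAssign("big_object", {
  state$loaded <- TRUE                 # record that the transfer happened
  readRDS(cache_file)
})

state$loaded           # FALSE: nothing has been read yet
nrow(big_object)       # first access forces the promise: 32
state$loaded           # TRUE: the data came from the cache file
```

Once forced, the promise keeps its value, so later accesses do not read the file again.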

cacher Summary

Case Study: Air Pollution

Particulate Matter – PM

In air pollution studies, you are looking at particulate matter (PM) pollution. The dust is not one monolithic piece of dirt or soot; it is composed of many different chemical constituents.

These constituents include metals, inert substances like salts, and other kinds of components, so it is possible that only a subset of them are the truly harmful elements.

It is important to understand that the Environmental Protection Agency (EPA) monitors the chemical constituents of particulate matter and has done so on a national basis since 1999 or 2000.

What Causes PM to be Toxic?

NMMAPS

NMMAPS and Reproducibility

What Causes Particulate Matter to be Toxic?

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1665439

A Reanalysis of the Lippmann et al. Study

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2137127

Does Nickel Make PM Toxic?

One of the most important things about the three points on the right is that they are high leverage points, and a regression line can be very sensitive to high leverage points. Removing those three points from the dataset brings the slope of the regression line down a little, producing a line that is no longer statistically significant (p-value about 0.31).
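
A small base R sketch of why such points matter. The data are synthetic (20 points with x between 0 and 1, plus three far to the right) and are not the nickel data from the study. In simple linear regression the leverage of point i is h_ii = 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2, so it depends only on x. The 2p/n cutoff used below is one common rule of thumb, chosen here as an assumption.

```r
## Synthetic illustration, NOT the study's data.
set.seed(1)
x <- c(seq(0, 1, length.out = 20), 5, 6, 7)
y <- 2 + 0.5 * x + rnorm(length(x))
fit <- lm(y ~ x)

# Leverage: diagonal of the hat matrix H = X (X'X)^{-1} X'
h <- hatvalues(fit)
n <- length(x)
p <- length(coef(fit))                 # intercept and slope
high <- unname(which(h > 2 * p / n))   # rule-of-thumb cutoff 2p/n
high                                   # the three right-most points: 21 22 23

# Refit without the high-leverage points and compare slopes
fit_drop <- lm(y ~ x, subset = -high)
c(all = coef(fit)[["x"]], dropped = coef(fit_drop)[["x"]])
summary(fit_drop)$coefficients["x", "Pr(>|t|)"]
```

Because leverage comes from x alone, the same three points are flagged no matter what y is; whether dropping them changes the slope or its significance depends on the data.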

What Have We Learned?

Lessons Learned