Creating p maps using R

Written 11/13/2006 by Liberty Hamilton. Updated 2/17/08 by Owen Phillips. Email Dr. Katherine Narr if you have any questions.

To run R, you will need the ucf files for each subject as well as a text file describing each subject. The ucf files and the text file should be placed in the same directory.

Setting up your account to use cranium

To run on the command line, you must first make sure that your account is set up to use R on cerebro (the grid). To do this, follow these steps (you should only have to do this once!):

  1. In your home directory, make a backup of your existing .Renviron file, which configures R to run on inire (you may not have one).
  2. cp ~woods/.Renviron ~/.
  3. Connect to ssh -X
  4. source /usr/sge/loni/common/settings.csh (if you are using the csh shell)
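
Steps 1 and 2 can be sketched as a small shell function. This is only a sketch, written for sh/bash rather than csh. The function name install_renviron and the backup name .Renviron.backup are illustrative choices, not part of the original setup.

```shell
# Back up an existing .Renviron (if any), then copy in the shared one.
# $1 = the .Renviron to install (for example ~woods/.Renviron)
# $2 = the directory to install into (for example $HOME)
install_renviron() {
    src=$1
    home=$2
    # Only back up when a file exists and no earlier backup would be overwritten.
    if [ -f "$home/.Renviron" ] && [ ! -e "$home/.Renviron.backup" ]; then
        cp "$home/.Renviron" "$home/.Renviron.backup" || return 1
    fi
    cp "$src" "$home/.Renviron"
}
```

Assuming the paths from the steps above, it would be called as: install_renviron ~woods/.Renviron "$HOME"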

Creating a Text File

  1. To create the text file, first decide which variables you'd like to analyze, and which variables you need to covary for. If you want to look at a group effect, covaried for sex and age, your text file should contain data for all of these categories. You could also look at a group by genotype interaction, covaried by brain volume. There are lots of options.
  2. The first row of the text file should be a header row like the one shown below these notes. Be sure to create the text file as unformatted plain text. You can use TextEdit on the Mac, or nedit, vi, or whichever text editor you prefer, as long as you save the file without any formatting.
    • Note that the first column header in the first row must read File, with a capital F. Anything else will cause R to crash.
    • There should be no spaces in the category names, and try to keep them fairly short. For example, use brain_vol instead of Brain Volume.
    • Make the text files tab-delimited. This will tell R which columns are which. Don't worry if the columns don't look like they line up visually. If there is one tab between each column, everything should be fine.
    • Also note that if you are missing data in a cell of the table, R will not run. To avoid this problem, you can create separate text files eliminating those rows that are missing data. For example, if you are missing genotyping data for a subject, you will have to exclude that subject from your text file for the genotype analysis.

File group sex age genotype brain_vol

The first column should contain the full names of the ucf files that you are analyzing. These can include the path, but if you run R in the same directory as your ucf files, you do not need to specify the path.

Remember: You will need separate text files if you want to look at male controls versus male patients, or female controls versus female patients. All of the files in the text file will be used in the analysis; there isn't a way to specify only certain lines in the text file.
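
One way to produce such subgroup files from a master text file is to filter its rows with awk. The sketch below is written for sh/bash. The function name subset_table and the file names in the usage line are hypothetical, and which value of sex codes for male depends on your own coding scheme.

```shell
# Print the header row plus every row whose named column equals a given value.
# $1 = tab-delimited text file, $2 = column header, $3 = value to keep
subset_table() {
    awk -F '\t' -v col="$2" -v val="$3" '
        NR == 1 {
            # Find the position of the requested column in the header row.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "no column named " col > "/dev/stderr"; exit 1 }
            print
            next
        }
        $idx == val    # keep only the matching rows
    ' "$1"
}
```

For example, subset_table text_file_LEFT.txt sex 0 > text_file_LEFT_sex0.txt writes a new text file holding only the rows where sex is 0.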

Example Text File: (note that columns don't line up, but there is a tab between each field)

File group sex age genotype brain_vol
/mydirectory/project/1234_hippo_L.ucf 1 0 23 aa 1803
/mydirectory/project/1254_hippo_L.ucf 0 1 42 bb 1799
/mydirectory/project/1412_hippo_L.ucf 1 1 55 ab 1782
/mydirectory/project/1223_hippo_L.ucf 1 0 29 bb 1833
/mydirectory/project/1255_hippo_L.ucf 0 1 42 aa 1792
/mydirectory/project/4921_hippo_L.ucf 0 0 24 ab 1822
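
The formatting rules above (a first header of File, no spaces in the headers, one tab between columns, no empty cells) can be checked before a job is submitted. The following is a rough sketch for sh/bash, not part of the R scripts themselves. The function name check_table is hypothetical.

```shell
# Report formatting problems in a tab-delimited text file.
# Returns 0 when no problems are found, 1 otherwise.
# $1 = the text file to check
check_table() {
    awk -F '\t' '
        NR == 1 {
            ncol = NF
            if ($1 != "File") { print "line 1: first header must be File"; bad = 1 }
            for (i = 1; i <= NF; i++)
                if ($i ~ / /) { printf "line 1: header \"%s\" contains a space\n", $i; bad = 1 }
            next
        }
        # Every data row needs the same number of tab-separated fields as the header.
        NF != ncol { printf "line %d: expected %d fields, found %d\n", NR, ncol, NF; bad = 1; next }
        {
            for (i = 1; i <= NF; i++)
                if ($i == "") { printf "line %d: column %d is empty\n", NR, i; bad = 1 }
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Running check_table text_file_LEFT.txt prints one line per problem found and prints nothing when the file passes these checks.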

R command line explanation

Running R on the command line is pretty simple, even though the command line arguments are very long. To run on the grid, be sure to put the "qsub" command before your command line so that your job will be submitted to the cluster queue.

  1. First connect to
    > ssh -X
  2. Run command lines as shown below:

Here is an example:

Group Effects:

qsub -b y -q long.q /ifs/woods/R/R-2.7.2/bin/R CMD BATCH --no-save --no-restore --quiet --args -table/mydirectory/project/text_file_LEFT.txt -formulay~sex+age+group+genotype -reducedy~sex+age+genotype -output/mydirectory/project/L_groupeffect.ucf /ifs/woods/rshape/batch_commands/anova_shape_with_sign_batch.R /mydirectory/project/L_groupeffect_error.log

So, you will have to change these parts: -table, -formulay, -reducedy, -output

-table The text file you created in the first section.
-formulay The full model: every variable included in the analysis, both the effect you are testing and its covariates. Each name must match a column header in your text file (File is never included).
-reducedy The reduced model: the full model minus the effect you are testing, i.e. your covariates only. If you want to see group effects, put everything but group in your -reducedy.
-output The name of the ucf file to be written. Call this something like group_effects_L.ucf or groupfemalesL.ucf.

Also remember to change the error log at the very end, so you will have an error log for each instance of R that you run. You can use this error log to see where things may have gone wrong.
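
Because only five pieces of the command line change between runs, the rest can be kept in one place. The sketch below is written for sh/bash and simply mirrors the Group Effects example above; it prints the command rather than submitting it. The function name build_anova_cmd and the variable names R_BIN and ANOVA_SCRIPT are hypothetical.

```shell
# Paths copied from the example command line above.
R_BIN=/ifs/woods/R/R-2.7.2/bin/R
ANOVA_SCRIPT=/ifs/woods/rshape/batch_commands/anova_shape_with_sign_batch.R

# Print (do not run) the qsub command line for one analysis.
# $1 = text file               $2 = full model terms, e.g. sex+age+group+genotype
# $3 = reduced model terms     $4 = output ucf file      $5 = error log
build_anova_cmd() {
    printf '%s\n' "qsub -b y -q long.q $R_BIN CMD BATCH --no-save --no-restore --quiet --args -table$1 -formulay~$2 -reducedy~$3 -output$4 $ANOVA_SCRIPT $5"
}
```

Printing the command first lets you compare it against a known working example before pasting it into the terminal.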

More Examples:

Sex Effects:

qsub -b y -q long.q /ifs/woods/R/R-2.7.2/bin/R CMD BATCH --no-save --no-restore --quiet --args -table/mydirectory/project/text_file_LEFT.txt -formulay~sex+age+group+genotype -reducedy~age+group+genotype -output/mydirectory/project/L_sexeffect.ucf /ifs/woods/rshape/batch_commands/anova_shape_with_sign_batch.R /mydirectory/project/L_sexeffect_error.log

Group by genotype interaction:

qsub -b y -q long.q /ifs/woods/R/R-2.7.2/bin/R CMD BATCH --no-save --no-restore --quiet --args -table/mydirectory/project/text_file_LEFT.txt -formulay~sex+age+group+genotype+group:genotype -reducedy~sex+age+group+genotype -output/mydirectory/project/L_groupbygenoeffect.ucf /ifs/woods/rshape/batch_commands/anova_shape_with_sign_batch.R /mydirectory/project/L_groupbygenoeffect_error.log
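
In all three examples, the -reducedy model is the -formulay model with the term being tested removed. As a small illustration for sh/bash (the function name drop_term is hypothetical), that rule can be written as:

```shell
# Remove one term from a "+"-separated model and print what is left.
# $1 = full model terms, $2 = the term being tested
drop_term() {
    printf '%s\n' "$1" |
        tr '+' '\n' |          # one term per line
        grep -Fxv -- "$2" |    # drop the exact term being tested
        paste -s -d '+' -      # join the remaining terms with "+"
}
```

For instance, drop_term sex+age+group+genotype sex gives the reduced model used in the Sex Effects example, and drop_term sex+age+group+genotype+group:genotype group:genotype gives the one used for the interaction.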

[Figure: pmaps and rmaps of the thalamus]

Running R in the Pipeline

To run an ANOVA in R using the Pipeline, download the anova_signed.pipe module and create the text files as detailed above.

Other Statistical Models:

The following statistical models are also possible using the scripts found in /ifs/woods/rshape/batch_commands.