
Help plotting PC-AiR outputs

Hello,


We would like to plot the PC-AiR results in the TOPMed Combined Exchange Area. The goal is to create plots like those in the PC-AiR vignette.




The PC-AiR outputs in the TOPMed Combined Exchange Area are not objects of class "pcair". Has anyone had experience plotting them?



We might also use PCAtools for more advanced plotting, but this is not essential.


Thank you.



Hi Dave,


The PC-AiR results for freeze8 are stored as a list, and the PCs for each sample are stored as a matrix in the `vectors` element of the list. You can plot them using the following example code:


```
library(tidyverse)

# Load PC-AiR results into R.
pcair <- get(load("freeze8_pcair_results.RData"))

# Convert the PC matrix to a data frame, keeping sample IDs as a column.
x <- pcair$vectors %>% as.data.frame() %>% rownames_to_column("sample.id")

# With base R:
plot(x$V1, x$V2)

# With ggplot2:
ggplot(x, aes(x = V1, y = V2)) + geom_point()
```


The point style in the plots from the GENESIS vignette indicates whether a sample is in the unrelated or the related set. The `pcair$unrels` and `pcair$rels` elements list the samples in the unrelated and related sets, respectively. If you'd like to include that in the plot:

```
x <- x %>% mutate(type = ifelse(sample.id %in% pcair$rels, "related", "unrelated"))

# Map the related/unrelated label to point shape.
ggplot(x, aes(x = V1, y = V2, shape = type)) + geom_point()
```



Thanks, Adrienne! We ran the code above, but the plots are still hard to interpret. Should we rerun KING and PC-AiR ourselves to get a clearer view of the separation along each PC? FYI, we are focusing only on MESA for this GWAS. Ultimately, we are trying to determine how many PCs to include when fitting the null model.




The TOPMed DCC already ran KING and PC-AiR to produce the PC-AiR results in the Exchange Area ("freeze8_pcair_results.RData"), so I don't think you need to rerun them. You could try making a pairs plot of different PCs against each other (generic example of a pairs plot: https://statisticsglobe.com/r-pairs-plot-example/), as well as a parallel coordinates plot (generic example of a parallel coordinates plot: https://r-graph-gallery.com/parallel-plot.html). For the parallel coordinates plots, we scale each PC to the range 0-1 to highlight separation. You may also need to make the plotted lines or points partly transparent, since there is often a lot of overlap in these plots.
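Here is a minimal base-R sketch of both plots. It assumes the list layout described earlier in the thread (PCs in `vectors`, related sample IDs in `rels`). The `pcair` list below is a hypothetical stand-in filled with random numbers, so it has no real structure but lets the code run on its own. With the real data, drop the simulation and load the `.RData` file as in the earlier example. Using the first six PCs, the red/black colors, and `alpha.f = 0.3` are illustrative choices, not the DCC's settings.

```r
# Hypothetical stand-in for the freeze8 PC-AiR list (random data).
# With the real file, use instead: pcair <- get(load("freeze8_pcair_results.RData"))
set.seed(1)
n <- 200
ids <- paste0("S", seq_len(n))
vectors <- matrix(rnorm(n * 12), nrow = n, dimnames = list(ids, NULL))
pcair <- list(vectors = vectors, rels = ids[1:20], unrels = ids[21:n])

# Inspect the first few PCs (an arbitrary choice for illustration).
pcs <- pcair$vectors[, 1:6]
colnames(pcs) <- paste0("PC", 1:6)

# Color related samples differently; semi-transparent colors help with overlap.
related <- rownames(pcs) %in% pcair$rels
col <- adjustcolor(ifelse(related, "red", "black"), alpha.f = 0.3)

# Pairs plot: every PC plotted against every other PC.
pairs(pcs, col = col, pch = 19, cex = 0.5)

# Parallel coordinates: rescale each PC to [0, 1], then draw one line per sample.
scale01 <- function(v) (v - min(v)) / (max(v) - min(v))
pcs01 <- apply(pcs, 2, scale01)
matplot(t(pcs01), type = "l", lty = 1, col = col,
        xaxt = "n", xlab = "", ylab = "PC value (scaled to 0-1)")
axis(1, at = seq_len(ncol(pcs01)), labels = colnames(pcs01))
```

Base graphics keep the sketch dependency-free. The same scaled matrix can also be reshaped to long format and drawn with `ggplot2` using `geom_line(alpha = ...)` if you prefer to stay in the tidyverse.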


When running PC-AiR, we found that the first 11 PCs are needed to control for population structure, and we didn't see much separation along PC12 or higher. We therefore recommended including PC-AiR PCs 1-11 when fitting a null model for GWAS using freeze8 data.
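To carry the PC 1-11 recommendation into a null model, the PCs need to be joined to the phenotype data by sample ID. The sketch below uses simulated stand-ins: a `vectors` matrix shaped like `pcair$vectors`, and a hypothetical `pheno` table with a made-up `trait` column. It only builds the merged data frame and the vector of covariate names. It does not fit the model.

```r
# Hypothetical stand-ins: replace with the real pcair$vectors and your phenotype table.
set.seed(2)
ids <- paste0("S", 1:50)
vectors <- matrix(rnorm(50 * 20), nrow = 50, dimnames = list(ids, NULL))
pheno <- data.frame(sample.id = ids[1:40], trait = rnorm(40))  # hypothetical trait

# Keep PCs 1-11, following the freeze8 recommendation.
n_pcs <- 11
pcs <- as.data.frame(vectors[, seq_len(n_pcs)])
colnames(pcs) <- paste0("PC", seq_len(n_pcs))
pcs$sample.id <- rownames(vectors)

# Inner join: keeps only samples that have both phenotype data and PCs.
dat <- merge(pheno, pcs, by = "sample.id")

# Covariate names, e.g. for the `covars` argument of GENESIS::fitNullModel().
covars <- paste0("PC", seq_len(n_pcs))
```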


You can find an example of the parallel coordinates plots in the TOPMed design paper (using freeze5, not freeze8, but results are similar): https://www.nature.com/articles/s41586-021-03205-y#Fig6



