---
title: "cgdsr to cBioPortalData: Migration Tutorial"
author: "Karim Mezhoud & Marcel Ramos"
date: "`r format(Sys.time(), '%B %d, %Y')`"
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{cgdsr to cBioPortalData Migration}
  %\VignetteEncoding{UTF-8}
output:
  BiocStyle::html_document:
    code_folding: show
    number_sections: no
    toc: yes
    toc_depth: 4
---

```{r, setup, include=FALSE}
knitr::opts_chunk$set(cache = TRUE)
```

# Introduction

This vignette aims to help developers migrate from the now defunct `cgdsr`
CRAN package. Note that the `cgdsr` package code is shown for comparison but it
is not guaranteed to work. If you have questions regarding the contents,
please create an issue at the GitHub repository:
https://github.com/waldronlab/cBioPortalData/issues

# Loading the package

```{r,include=TRUE,results="hide",message=FALSE,warning=FALSE}
library(cBioPortalData)
```

# Discovering studies {.tabset .tabset-fade .tabset-pills}

## `cBioPortalData` setup

Here we show the default inputs to the cBioPortal function for clarity.

```{r , comment=FALSE, warning=FALSE, message=FALSE}
cbio <- cBioPortal(
    hostname = "www.cbioportal.org",
    protocol = "https",
    api. = "/api/v2/api-docs"
)
getStudies(cbio)
```

Note that the `studyId` column is important for further queries.

```{r}
head(getStudies(cbio)[["studyId"]])
```

## `cgdsr` setup

```{r,eval=FALSE}
library(cgdsr)
cgds <- CGDS("http://www.cbioportal.org/")
getCancerStudies.CGDS(cgds)
```

# Obtaining Cases {.tabset .tabset-fade .tabset-pills}

## `cBioPortalData` (Cases)

### Notes

* Each patient is identified by a `patientId`.
* `sampleListId` identifies groups of `patientId` based on profile type
* The `sampleLists` function uses `studyId` input to return `sampleListId`

### sampleLists

For the sample list identifiers, you can use `sampleLists` and inspect the
`sampleListId` column.

```{r}
samps <- sampleLists(cbio, "gbm_tcga_pub")
samps[, c("category", "name", "sampleListId")]
```

### samples from sampleLists

It is possible to get `case_ids` directly when using the `samplesInSampleLists`
function. The function handles multiple `sampleList` identifiers.

```{r}
samplesInSampleLists(
    api = cbio,
    sampleListIds = c(
        "gbm_tcga_pub_expr_classical", "gbm_tcga_pub_expr_mesenchymal"
    )
)
```

### getSampleInfo

To get more information about patients, we can query with `getSampleInfo`
function.

```{r}
getSampleInfo(api = cbio,  studyId = "gbm_tcga_pub", projection = "SUMMARY")
```

## `cgdsr` (Cases)

### Notes

* 'Cases' and 'Patients' are synonymous.
* Each patient is identified by a `case_id`.
* Queries only handle a single `cancerStudy` identifier
* The `case_list_description` describes the assays

### `getCaseLists` and `getClinicalData`

We obtain the first `case_list_id` in the `cgds` object from above and the
corresponding clinical data for that case list (`gbm_tcga_pub_all` as the case
list in this example).

```{r,eval=FALSE}
clist1 <-
    getCaseLists.CGDS(cgds, cancerStudy = "gbm_tcga_pub")[1, "case_list_id"]

getClinicalData.CGDS(cgds, clist1)
```

# Obtaining Clinical Data {.tabset .tabset-fade .tabset-pills}

## `cBioPortalData` (Clinical)

### All clinical data

Note that a `sampleListId` is not required when using the
`fetchAllClinicalDataInStudyUsingPOST` internal endpoint. Data for all
patients can be obtained using the `clinicalData` function.

```{r}
clinicalData(cbio, "gbm_tcga_pub")
```

### By sample data

You can use a different endpoint to obtain data for a single sample.
First, obtain a single `sampleId` with the `samplesInSampleLists` function.

```{r}
clist1 <- "gbm_tcga_pub_all"
samplist <- samplesInSampleLists(cbio, clist1)
onesample <- samplist[["gbm_tcga_pub_all"]][1]
onesample
```

Then we use the API endpoint to retrieve the data. Note that you would run
`httr::content` on the output to extract the data.

```{r}
cbio$getAllClinicalDataOfSampleInStudyUsingGET(
    sampleId = onesample, studyId = "gbm_tcga_pub"
)
```

## `cgdsr` (Clinical)

### Notes

* `getClinicalData` uses `case_list_id` as input without specifying the
`study_id` as case list identifiers are unique to each study.

### getClinicalData

We query clinical data for the `gbm_tcga_pub_expr_classical` case list
identifier which is part of the `gbm_tcga_pub` study.

```{r, eval = FALSE}
getClinicalData.CGDS(x = cgds,
    caseList = "gbm_tcga_pub_expr_classical"
)
```

# Clinical Data Summary

`cgdsr` allows you to obtain clinical data for a case list subset
(54 cases with `gbm_tcga_pub_expr_classical`) and `cBioPortalData` provides
clinical data for all 206 samples in `gbm_tcga_pub` using the `clinicalData`
function.

* `cgdsr` returns a `data.frame` with `sampleId` (TCGA.02.0009.01)  but not
`patientId` (TCGA.02.0009)
* `cBioPortalData` returns `sampleId` (TCGA-02-0009-01) and `patientId`
(TCGA-02-0009).
* Note the differences in identifier delimiters between the two packages,
`cgdsr` provides `case_id`s with `.` and `cBioPortalData` returns `patientId`s
with `-`.

You may be interested in other clinical data endpoints. For a list, use
the `searchOps` function.

```{r}
searchOps(cbio, "clinical")
```

# Molecular or Genetic Profiles {.tabset .tabset-fade .tabset-pills}

## `cBioPortalData` (molecularProfiles)

```{r}
molecularProfiles(api = cbio, studyId = "gbm_tcga_pub")
```

Note that we want to pull the `molecularProfileId` column to use in other
queries.

## `cgdsr` (getGeneticProfiles)

```{r,eval=FALSE}
getGeneticProfiles.CGDS(cgds, cancerStudy = "gbm_tcga_pub")
```

# Genomic Profile Data for a set of genes {.tabset .tabset-fade .tabset-pills}

## `cBioPortalData` (Indentify samples and genes)

Currently, some conversion is needed to directly use the `molecularData`
function, if you only have Hugo symbols. First, convert to Entrez gene IDs
and then obtain all the samples in the sample list of interest.

### Convert `hugoGeneSymbol` to `entrezGeneId`

```{r}
genetab <- queryGeneTable(cbio,
    by = "hugoGeneSymbol",
    genes = c("NF1", "TP53", "ABL1")
)
genetab
entrez <- genetab[["entrezGeneId"]]
```

### Obtain all samples in study

```{r}
allsamps <- samplesInSampleLists(cbio, "gbm_tcga_pub_all")
```

In the next section, we will show how to use the genes and sample identifiers
to obtain the molecular profile data.

## `cgdsr` (Profile Data)

The `getProfileData` function allows for straightforward retrieval of the
molecular profile data with only a case list and genetic profile identifiers.

```{r,eval=FALSE}
getProfileData.CGDS(x = cgds,
    genes = c("NF1", "TP53", "ABL1"),
    geneticProfiles = "gbm_tcga_pub_mrna",
    caseList = "gbm_tcga_pub_all"
)
```

# Molecular Data with `cBioPortalData`

`cBioPortalData` provides a number of options for retrieving molecular profile
data depending on the use case. Note that `molecularData` is mostly used
internally and that the `cBioPortalData` function is the user-friendly method
for downloading such data.

## `molecularData`

We use the translated `entrez` identifiers from above.

```{r}
molecularData(cbio, "gbm_tcga_pub_mrna",
    entrezGeneIds = entrez, sampleIds = unlist(allsamps))
```

## `getDataByGenes`

The `getDataByGenes` function automatically figures out all the sample
identifiers in the study and it allows Hugo and Entrez identifiers, as well
as `genePanelId` inputs.

```{r}
getDataByGenes(
    api =  cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"),
    by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_mrna"
)
```

## `cBioPortalData`: the main end-user function

It is important to note that end users who wish to obtain the data as
easily as possible should use the main `cBioPortalData` function:

```{r}
gbm_pub <- cBioPortalData(
    api = cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"), by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_mrna"
)

assay(gbm_pub[["gbm_tcga_pub_mrna"]])[, 1:4]
```

# Mutation Data {.tabset .tabset-fade .tabset-pills}

## `cBioPortalData` (mutationData)

Similar to `molecularData`, mutation data can be obtained with the
`mutationData` function or the `getDataByGenes` function.

```{r}
mutationData(
    api = cbio,
    molecularProfileIds = "gbm_tcga_pub_mutations",
    entrezGeneIds = entrez,
    sampleIds = unlist(allsamps)
)
```

```{r}
getDataByGenes(
    api = cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"),
    by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_mutations"
)
```

## `cgdsr` (getMutationData)

```{r,eval=FALSE}
getMutationData.CGDS(
    x = cgds,
    caseList = "getMutationData",
    geneticProfile = "gbm_tcga_pub_mutations",
    genes = c("NF1", "TP53", "ABL1")
)
```

# Copy Number Alteration (CNA) {.tabset .tabset-fade .tabset-pills}

## `cBioPortalData` (CNA)

Copy Number Alteration data  can be obtained with the `getDataByGenes` function
or by the main `cBioPortal` function.

```{r}
getDataByGenes(
    api = cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"),
    by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_cna_rae"
)
```

```{r}
cBioPortalData(
    api = cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"),
    by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_cna_rae"
)
```


## `cgdsr` (CNA)

```{r, eval=FALSE}
getProfileData.CGDS(
    x = cgds,
    genes = c("NF1", "TP53", "ABL1"),
    geneticProfiles = "gbm_tcga_pub_cna_rae",
    caseList = "gbm_tcga_pub_cna"
)
```


# Methylation Data  {.tabset .tabset-fade .tabset-pills}

## `cBioPortalData` (Methylation)

Similar to Copy Number Alteration, Methylation can be obtained by
`getDataByGenes` function or by 'cBioPortalData' function.

```{r}
getDataByGenes(
    api = cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"),
    by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_methylation_hm27"
)
```


```{r}
cBioPortalData(
    api = cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"),
    by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_methylation_hm27"
)
```

## `cgdsr` (Methylation)

```{r,eval=FALSE}
getProfileData.CGDS(
    x = cgds,
    genes = c("NF1", "TP53", "ABL1"),
    geneticProfiles = "gbm_tcga_pub_methylation_hm27",
    caseList = "gbm_tcga_pub_methylation_hm27"
)
```

# sessionInfo

```{r}
sessionInfo()
```