Given a repository of gene-pathway associations, either as a tab-delimited file with three columns (pathwayID, pathway description, Gene) or as a corresponding data frame, this function identifies all Gene Pair Signatures (pairs of genes that, as a combination, are unique to a single pathway) and Pathway Unique Genes (genes that are associated with only a single pathway) and stores them in a format that is usable by sigora. Please also see the "Details" and "Note" sections below.

makeGPS(
  pathwayTable = NULL,
  fn = NULL,
  maxLevels = 5,
  saveFile = NULL,
  repoName = "userrepo",
  maxFunperGene = 100,
  maxGenesperPathway = 500,
  minGenesperPathway = 10
)

Arguments

pathwayTable

A data frame describing gene-pathway associations in the following format: pathwayID, pathwayName, Gene. Either pathwayTable or fn should be provided.

fn

Where to find the repository (a tab-delimited file). Either pathwayTable or fn should be provided.

maxLevels

For hierarchical repositories, the number of levels to consider.

saveFile

Where to save the object as an rda file.

repoName

Repository name.

maxFunperGene

A cutoff threshold: genes with more than this number of associated pathways are excluded to speed up the GPS identification process.

maxGenesperPathway

A cutoff threshold: pathways with more than this number of associated genes are excluded to speed up the GPS identification process.

minGenesperPathway

A cutoff threshold: pathways with fewer than this number of associated genes are excluded to speed up the GPS identification process.
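
The effect of the three cutoffs can be illustrated on a toy table. The following is a minimal base R sketch, not the code that makeGPS runs internally: the helper filter_associations and all IDs are hypothetical, and the order of the steps (genes first, then pathways) is an assumption made for illustration.

```r
# Toy gene-pathway table in the three-column layout (all IDs are made up).
toy <- data.frame(
  pathwayId   = c("P1", "P1", "P1", "P2", "P2", "P3"),
  pathwayName = c("Path one", "Path one", "Path one",
                  "Path two", "Path two", "Path three"),
  gene        = c("g1", "g2", "g3", "g1", "g4", "g1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the three cutoffs; not part of sigora.
filter_associations <- function(tab, maxFunperGene,
                                maxGenesperPathway, minGenesperPathway) {
  tab <- unique(tab)
  # Drop genes that are associated with too many pathways.
  pathways_per_gene <- lengths(split(tab$pathwayId, tab$gene))
  keep_genes <- names(pathways_per_gene)[pathways_per_gene <= maxFunperGene]
  tab <- tab[tab$gene %in% keep_genes, , drop = FALSE]
  # Drop pathways that are too large or too small after the gene filter.
  genes_per_pathway <- lengths(split(tab$gene, tab$pathwayId))
  keep_paths <- names(genes_per_pathway)[
    genes_per_pathway >= minGenesperPathway &
      genes_per_pathway <= maxGenesperPathway
  ]
  tab[tab$pathwayId %in% keep_paths, , drop = FALSE]
}

# Small cutoffs so that the toy data actually triggers the filters.
filtered <- filter_associations(toy, maxFunperGene = 2,
                                maxGenesperPathway = 500,
                                minGenesperPathway = 2)
print(filtered)
```

With these toy cutoffs, g1 is dropped because it belongs to three pathways, and only P1 retains enough genes to pass the minimum size.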

Value

A GPS repository, to be used by sigora and ora.

Details

The primary purpose of makeGPS is to convert a user-supplied gene-pathway association table to a repository of weighted Gene Pair Signatures (GPS) that are unique features of pathways. Such GPS can then be used for signature (gene-pair) based analyses using sigora. Additionally, the resulting object retains the original "single gene"-"pathway" associations for follow-up analyses, such as comparing sigora results to traditional methods. ora is an implementation of the traditional (individual gene) Overrepresentation Analysis.
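
To make the two concepts concrete, here is a minimal base R sketch that applies the definitions literally to a toy table. It is an illustration only and rests on several assumptions: it ignores the weighting and the hierarchy levels that makeGPS handles, the pathway and gene IDs are made up, and the object names are hypothetical rather than taken from sigora.

```r
# Toy associations: two pathways sharing genes "b" and "c" (made-up IDs).
toy <- data.frame(
  pathwayId = c("P1", "P1", "P1", "P2", "P2", "P2"),
  gene      = c("a", "b", "c", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Pathway Unique Genes: genes associated with exactly one pathway.
pathways_per_gene <- lengths(split(toy$pathwayId, toy$gene))
pugs <- names(pathways_per_gene)[pathways_per_gene == 1]

# Enumerate every gene pair that co-occurs within a pathway.
genes_by_pathway <- split(toy$gene, toy$pathwayId)
pairs <- do.call(rbind, lapply(names(genes_by_pathway), function(p) {
  g <- sort(unique(genes_by_pathway[[p]]))
  if (length(g) < 2) return(NULL)  # a single gene cannot form a pair
  cmb <- t(combn(g, 2))
  data.frame(pathwayId = p,
             pair = paste(cmb[, 1], cmb[, 2], sep = "|"),
             stringsAsFactors = FALSE)
}))

# Gene Pair Signatures: pairs that occur in exactly one pathway.
pathways_per_pair <- lengths(split(pairs$pathwayId, pairs$pair))
unique_pairs <- names(pathways_per_pair)[pathways_per_pair == 1]
gps <- pairs[pairs$pair %in% unique_pairs, , drop = FALSE]

print(pugs)
print(gps)
```

In this toy case the pair b|c occurs in both pathways and is therefore not a signature, whereas every pair involving "a" or "d" points to a single pathway.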

Note

This function relies on the package slam, which should be installed from CRAN. It is fairly memory-intensive, so running it on a machine with at least 6GB of RAM is recommended. Also, make sure to save and reuse the resulting GPS repository in future analyses!
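
One way to reuse a saved repository in a later session is sketched below. The helper read_rda_object is hypothetical (not part of sigora), and a small stand-in list is used in place of a real repository so that the snippet runs without sigora installed. In practice the rda file would be the one written via the saveFile argument; the helper deliberately makes no assumption about the name under which the object was stored.

```r
# Hypothetical helper (not part of sigora): return the first object stored
# in an rda file without assuming the name it was saved under.
read_rda_object <- function(path) {
  env <- new.env()
  object_names <- load(path, envir = env)  # load() returns the stored names
  get(object_names[1], envir = env)
}

# Stand-in for a repository, used only to keep this sketch self-contained.
# With sigora, the file would instead be produced by something like
#   makeGPS(pathwayTable = myTable, saveFile = rda_path)
rda_path <- file.path(tempdir(), "userrepo.rda")
stand_in_repo <- list(repoName = "userrepo")
save(stand_in_repo, file = rda_path)

# In a later session: read the object back instead of rebuilding it.
reloaded <- read_rda_object(rda_path)
print(reloaded$repoName)
```

The reloaded object could then be passed to sigora through its GPSrepo argument, which avoids repeating the memory-intensive build step.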

References

Foroushani AB, Brinkman FS and Lynn DJ (2013). “Pathway-GPS and SIGORA: identifying relevant pathways based on the over-representation of their gene-pair signatures.” PeerJ, 1.

Examples


data(nciTable); data(idmap)
## what the input looks like:
head(nciTable)
#>           pathwayId                                           pathwayName
#> 1 pi3kplctrkpathway Trk receptor signaling mediated by PI3K and PLC-gamma
#> 2 pi3kplctrkpathway Trk receptor signaling mediated by PI3K and PLC-gamma
#> 3 pi3kplctrkpathway Trk receptor signaling mediated by PI3K and PLC-gamma
#> 4 pi3kplctrkpathway Trk receptor signaling mediated by PI3K and PLC-gamma
#> 5 pi3kplctrkpathway Trk receptor signaling mediated by PI3K and PLC-gamma
#> 6 pi3kplctrkpathway Trk receptor signaling mediated by PI3K and PLC-gamma
#>              gene
#> 1 ENSG00000140992
#> 2 ENSG00000196689
#> 3 ENSG00000142208
#> 4 ENSG00000145675
#> 5 ENSG00000138741
#> 6 ENSG00000152495
## Create a SigObject (GPS repository). Use the saveFile parameter to keep it for reuse.
nciH <- makeGPS(pathwayTable = load_data('nciTable'))
#> Time difference of 1.044506 secs
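## Query list: all gene symbols in idmap that start with "IL".
ils <- grep("^IL", idmap[, "Symbol"], value = TRUE)
## Signature-based analysis of the query list against the new repository.
ilnci <- sigora(queryList = ils, GPSrepo = nciH, level = 3)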
#> [1] "Mapped identifiers from Symbol  to  Ensembl.Gene.ID ..."
#>       pathwy.id                    description   pvalues Bonferroni successes
#> 1   il23pathway IL23-mediated signaling events 5.494e-64  1.049e-61     36.27
#> 2   il27pathway IL27-mediated signaling events 3.164e-34  6.043e-32     18.14
#> 3 il12_2pathway IL12-mediated signaling events 3.188e-12  6.089e-10     13.20
#> 4    il1pathway  IL1-mediated signaling events 1.115e-09  2.130e-07      8.42
#> 5  il4_2pathway  IL4-mediated signaling events 1.070e-05  2.044e-03      9.03
#>   PathwaySize        N sample.size
#> 1      172.95 46257.95       93.08
#> 2       65.51 46257.95       93.08
#> 3      420.16 46257.95       93.08
#> 4      156.05 46257.95       93.08
#> 5      687.89 46257.95       93.08