Yasui Biostatistics Research Team   


Yutaka Yasui


SNP-SNP Interactions (Dinu et al., 2012)

SAM-GS (Dinu et al., 2007)


PBEE (Martinez et al., 2007)

LCT for multiple continuous phenotypes (Wang et al., 2014)

Cumulative Burden (Mean Cumulative Count) estimation (Dong et al., 2014)


SNP-SNP Interactions Discovered by Logic Regression

• Paper:

Irina Dinu, Surakameth Mahasirimongkol, Qi Liu, Hideki Yanai, Noha Sharaf Eldin, Erin Kreiter, Xuan Wu, Shahab Jabbari, Katsushi Tokunaga, and Yutaka Yasui, "SNP-SNP Interactions Discovered by Logic Regression Explain Crohn's Disease Genetics", PLoS ONE 2012, 7(10): e43035. doi:10.1371/journal.pone.0043035.

• Software:

The code and example datasets can be downloaded here.

SAM-GS: Significance Analysis of Microarrays for Gene Sets

Gene Set Analysis Software

• Paper:

Irina Dinu, John D. Potter, Thomas Mueller, Qi Liu, Adeniyi J. Adewale, Gian S. Jhangri, Gunilla Einecke, Konrad S. Famulski, Philip Halloran, and Yutaka Yasui, "Improving GSEA for Analysis of Biologic Pathways for Differential Gene Expression across a Binary Phenotype" (January 2007). COBRA Preprint Series. Article 16.

Dinu I, Potter JD, Mueller T, Liu Q, Adewale AJ, Jhangri GS, Einecke G, Famulski KS, Halloran P, and Yasui Y. "Improving Gene Set Analysis of Microarray Data by SAM-GS", BMC Bioinformatics 2007, 8:242

Liu Q, Dinu I, Adewale AJ, Potter JD, and Yasui Y. "Comparative Evaluation of Gene-set Analysis Methods", BMC Bioinformatics 2007, 8:431

• Software:

To obtain the free Excel Add-In software (last modified on September 5, 2012), please download EdmontonMethods.rar and the Documentation.

The first version of SAM-GS was made available on May 28, 2007.

R is downloadable from http://cran.r-project.org/.

R code of SAM-GS (programmed by Irina Dinu; modified and improved by Dr. Aurelien de Reynies at Ligue Nationale Contre le Cancer, Paris, who kindly revised the code; reposted on April 4, 2011).

Python version of SAM-GS (developed by Dr. Simone Leo at the Advanced Computing and Communications Program of CRS4, Italy).

R code of Linear Combination Test for Hierarchical Gene Set Analysis.  

• Example datasets:

Gene-expression dataset:       p53.csv     

Gene-set-definition dataset:  c2.csv     c2part1.csv    c2part2.csv    c2.v2.symbols.gmt

SAM-GSR: Significance Analysis of Microarrays for Gene Set Reduction

• Code:

R code of SAM-GSR. (Note: this is not the R code of SAM-GS, which is listed above.)

PBEE: Population Based Estimating Equation

Integrated Estimating Equations of Individual and Aggregated Health Data

• Paper:

Jose Miguel Martinez, Joan Benach, Josep Ginebra, Fernando G. Benavides, and Yutaka Yasui (2007) "An Integrated Analysis of Individual and Aggregated Health Data Using Estimating Equations", The International Journal of Biostatistics: Vol. 3: Iss. 1, Article 10.

• Three R programs used for the simulation work:

ECC.txt    NCC.txt    SCC.txt

LCT for multiple continuous phenotypes

• Paper:

Xiaoming Wang, Saumyadipta Pyne, and Irina Dinu, "Gene set enrichment analysis for multiple continuous phenotypes", BMC Bioinformatics 2014, 15:260. doi:10.1186/1471-2105-15-260.

• Code:

R code for analyzing multiple continuous phenotypes using the LCT method

Mean Cumulative Count

• Paper:

Dong H, Robison LL, Leisenring WM, Martin LJ, Armstrong GT, Yasui Y. Estimating the burden of recurrent events in the presence of competing risks: The method of mean cumulative count. American Journal of Epidemiology.

• Code:

R code for Cumulative Burden (Mean Cumulative Count) estimation: This version of the code illustrates the difference between "sum of CumIs" and "MCC" that is discussed in the 7th and 8th paragraphs of the manuscript.

R code for Cumulative Burden estimation through "sum of CumIs" with 95% confidence interval: This version of the code calculates MCC through "sum of CumIs", with the option to obtain 95% confidence intervals by bootstrap.

Example datasets used in the above code: "dat for MCC14.csv"     "smndata.csv"

SAS code for Cumulative Burden (Mean Cumulative Count) estimation

SAS code for Cumulative Incidence estimation with left truncation and right censoring, and example dataset
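
For readers who want a feel for what the Mean Cumulative Count code above computes, the following is a minimal Python sketch. It is not the team's R or SAS code. It assumes the estimator takes the form MCC(t) = sum over event times t_j <= t of (e_j / n_j) x KM(t_{j-1}), where e_j is the number of events of interest at t_j, n_j is the number of subjects at risk at t_j, and KM is the Kaplan-Meier survival estimate with respect to the competing-risk event. The function name, the record layout (subject_id, time, status) and the status codes are hypothetical choices made for this illustration only. It does not handle left truncation or confidence intervals.

```python
from collections import defaultdict


def mean_cumulative_count(records):
    """Illustrative MCC estimate (hypothetical helper, not the posted R code).

    records: iterable of (subject_id, time, status) rows, where
      status 1 = event of interest (may recur for the same subject),
      status 2 = competing-risk event (e.g., death),
      status 0 = censoring.
    Assumption: every subject has exactly one terminating row
    (status 0 or 2) at or after all of that subject's event rows.
    Returns a list of (time, MCC at that time) pairs.
    """
    events = defaultdict(int)    # events of interest per time point
    deaths = defaultdict(int)    # competing-risk events per time point
    censored = defaultdict(int)  # censorings per time point
    subjects = set()
    for subject_id, time, status in records:
        subjects.add(subject_id)
        if status == 1:
            events[time] += 1
        elif status == 2:
            deaths[time] += 1
        else:
            censored[time] += 1

    times = sorted(set(events) | set(deaths) | set(censored))
    at_risk = len(subjects)
    surv = 1.0  # KM estimate for the competing risk just before the current time
    mcc = 0.0
    curve = []
    for t in times:
        if events[t]:
            # Weight the event rate by the KM estimate from the previous time point.
            mcc += surv * events[t] / at_risk
        if deaths[t]:
            surv *= 1.0 - deaths[t] / at_risk
        # Subjects leave the risk set only after their terminating row.
        at_risk -= deaths[t] + censored[t]
        curve.append((t, mcc))
    return curve


if __name__ == "__main__":
    # Toy data: subject A has two events and is censored at 5; subject B dies at 2.
    toy = [("A", 1, 1), ("A", 3, 1), ("A", 5, 0), ("B", 2, 2)]
    print(mean_cumulative_count(toy))
    # -> [(1, 0.5), (2, 0.5), (3, 1.0), (5, 1.0)]
```

In the toy data there is no censoring before the last event, so the final value equals the total number of events divided by the number of subjects (2 / 2 = 1.0), even though subject B left the risk set early because of the competing event.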