SHIRIN'S PLAYGROUND
18 Dec 2016 » How to build a Shiny app for disease- & trait-associated locations of the human genome. This app is based on the gwascat R package and its ebicat38 database and shows trait-associated SNP locations of the human genome. You can visualize and compare the genomic locations of up to 8 traits simultaneously.
DESEQ2 COURSE WORK
AUTOENCODERS AND ANOMALY DETECTION WITH MACHINE LEARNING
Autoencoders and anomaly detection with machine learning in fraud analytics. 01 May 2017. All my previous posts on machine learning have dealt with supervised learning. But we can also use machine learning for unsupervised learning. The latter is used, for example, for clustering and (non-linear) dimensionality reduction.
DEALING WITH UNBALANCED DATA IN MACHINE LEARNING
SEE MORE ON SHIRING.GITHUB.IO
DATA SCIENCE FOR BUSINESS
NETWORK ANALYSIS OF GAME OF THRONES FAMILY TIES
In this post, I am exploring network analysis techniques in a family network of major characters from Game of Thrones. Not surprisingly, we learn that House Stark (specifically Ned and Sansa) and House Lannister (especially Tyrion) are the most important family connections in Game of Thrones; they also connect many of the storylines and are central parts of the narrative.
PLOTTING TREES FROM RANDOM FOREST MODELS WITH GGRAPH
Preparing the data and modeling. The data set I am using in these example analyses is the Breast Cancer Wisconsin (Diagnostic) Dataset. The data was downloaded from the UC Irvine Machine Learning Repository. The first data set looks at the predictor classes:
FEATURE SELECTION IN MACHINE LEARNING (BREAST CANCER DATASETS)
Machine learning uses so-called features (i.e. variables or attributes) to generate predictive models. Using a suitable combination of features is essential for obtaining high precision and accuracy. Because too many (unspecific) features pose the problem of overfitting the model
HOW TO BUILD A SHINY APP FOR DISEASE- & TRAIT-ASSOCIATED
This app is based on the gwascat R package and its ebicat38 database and shows trait-associated SNP locations of the human genome. You can visualize and compare the genomic locations of up to 8 traits simultaneously. The National Human Genome Research Institute (NHGRI) catalog of Genome-Wide Association Studies (GWAS) is a curated resource of
R VS PYTHON
I’m an avid R user and rarely use anything else for data analysis and visualisations. But while R is my go-to, in some cases, Python might actually be a better alternative. That’s why I wanted to see how R and Python fare in a one-on-one comparison of an analysis that’s representative of what I would typically work with.
ABOUT ME
Welcome to my page! I’m Shirin, a biologist turned bioinformatician turned data scientist. I’m especially interested in machine learning and data visualization.
DATA SCIENCE FOR BUSINESS
Training and test data. My input data is the tibble retail_p_day, which was created in my last post. I am splitting this dataset into training (all data points before/on Nov. 1st 2011) and test samples (all data points after Nov. 1st 2011).
DATA ON TOUR: PLOTTING 3D MAPS AND LOCATION TRACKS
Hiking tracks. The hiking tracks we followed came mostly from a German hiking guide-book, the Rother Wanderführer, 7th edition from 2016. They were in standard .gpx format and could be read with readGPX(). Only one of our hikes did not come from this book, but from Wikiloc. It could be treated the same way as the other hiking tracks, though, so I combined all hiking tracks.
CHARACTERIZING TWITTER FOLLOWERS WITH TIDYTEXT
Now, we can access information from Twitter, like timeline tweets, user timelines, mentions, tweets & retweets, followers, etc. All the following datasets were retrieved on June 7th 2017, converted to a data frame for tidy analysis and saved for later use:
CREATING A NETWORK OF HUMAN GENE HOMOLOGY WITH R AND D3
Identifying human gene homologs. Protein coding genes. The majority of human genes are protein coding genes. This means that their DNA sequence will be translated into a protein with specific cellular functions.
HOW TO MAP YOUR GOOGLE LOCATION HISTORY WITH R
##     timestampMs latitudeE7 longitudeE7 accuracy                 activitys
## 1 1482393378938  519601402    76004708       29                      NULL
## 2 1482393333953  519601402    76004708       29                      NULL
## 3 1482393033893  519603616    76002628       20 1482393165600, still, 100
## 4 1482392814435  519603684    76001572       20 1482392817678, still, 100
## 5 1482392734911  519603684    76001572       20                      NULL
## 6
SOCIAL NETWORK ANALYSIS AND TOPIC MODELING OF CODECENTRIC
I have written the following post about Social Network Analysis and Topic Modeling of codecentric’s Twitter friends and followers for codecentric’s blog. Recently, Matthias Radtke has written a very nice blog post on Topic Modeling of the codecentric Blog Articles, where he is giving a comprehensive introduction to Topic Modeling.
CAN WE PREDICT FLU DEATHS WITH MACHINE LEARNING AND R?
Among the many R packages, there is the outbreaks package. It contains datasets on epidemics, one of which is from the 2013 outbreak of influenza A H7N9 in China, as
CONDITIONAL GGPLOT2 GEOMS IN FUNCTIONS (QTL PLOTS)
The first example uses the hyper data set and builds a simple QTL model with three modeling functions: the EM algorithm, Haley-Knott regression and multiple imputation. The genome wide LOD threshold is calculated with permutation. Feeding this LOD threshold into the summary output gives us the markers with a significant phenotype association (i.e. the QTL).
CATEGORIES - GITHUB PAGES
Dealing with unbalanced data in machine learning. Building meaningful machine learning models for disease prediction. Plotting trees from Random Forest models with ggraph. Hyper-parameter Tuning with Grid Search for Deep Learning. Building deep neural nets with h2o and rsparkling that predict arrhythmia of
EXPLORING THE HUMAN GENOME (PART 1)
The narrow traditional definition of a gene is that it is a hereditary unit of information, meaning that it is a unit of DNA that encodes the production of a protein. The Human Genome Project has estimated that the human genome comprises 20,000 to 25,000 genes. However, if we take the definition of gene more liberally, we could also
MIGRATING FROM GITHUB TO GITLAB WITH RSTUDIO (TUTORIAL)
GitHub vs. GitLab. Git is a distributed implementation of version control. Many people have written very eloquently about why it is a good idea to use version control, not only if you collaborate in a team but also if you work on your own; one example is this article from RStudio’s Support pages. In short, its main feature is that version control allows you to keep track of the changes you
DATA SCIENCE FOR FRAUD DETECTION
I have written the following post about Data Science for Fraud Detection at my company codecentric’s blog. Fraud can be defined as “the crime of getting money by deceiving people” (Cambridge Dictionary); it is as old as humanity: whenever two parties exchange goods or conduct business there is the potential for one party scamming the other.
HYPER-PARAMETER TUNING WITH GRID SEARCH FOR DEEP LEARNING
Hyper-parameter tuning with grid search allows us to test different combinations of hyper-parameters and find one with improved accuracy. Keep in mind, though, that hyper-parameter tuning can only improve the model so much without overfitting. If you can’t achieve sufficient accuracy, the input features might simply not be adequate for the
EXPLORE PREDICTIVE MAINTENANCE WITH FLEXDASHBOARD
I have written the following post about Predictive Maintenance and flexdashboard at my company codecentric’s blog. Predictive Maintenance is an increasingly popular strategy associated with Industry 4.0; it uses advanced analytics and machine learning to optimize machine costs and output (see Google Trends plot below).
EXTREME GRADIENT BOOSTING AND PREPROCESSING IN MACHINE
In last week’s post I explored whether machine learning models can be applied to predict flu deaths from the 2013 outbreak of influenza A H7N9 in China. There, I compared random forests, elastic-net regularized generalized linear models, k-nearest neighbors, penalized discriminant analysis, stabilized linear discriminant analysis, nearest shrunken centroids, single C5.0 tree and partial
DESEQ2 COURSE WORK
DESeq2 Course Work. 29 September 2016. The following workflow has been designed as teaching instructions for an introductory course to RNA-seq data analysis with DESeq2. The course is designed for PhD students and will be given at the University of Münster from 10th to 21st of October 2016. For questions or other comments, please
EXPRANALYSIS PACKAGE
exprAnalysis package. I created the R package exprAnalysis designed to streamline my RNA-seq data analysis pipeline. Below you find the vignette for installation and usage of the package. This package combines functions from various packages used to analyze and visualize expression data from NGS or expression chips.
BUILDING DEEP NEURAL NETS WITH H2O AND RSPARKLING THAT
The R package h2o provides a convenient interface to H2O, which is an open-source machine learning and deep learning platform. H2O can be integrated with Apache Spark (Sparkling Water) and therefore allows the implementation of complex or big models in a fast and scalable manner. H2O distributes a wide range of common machine learning
BUILDING MEANINGFUL MACHINE LEARNING MODELS FOR DISEASE
Webinar for the ISDS R Group. This document presents the code I used to produce the example analysis and figures shown in my webinar on building meaningful machine learning models for disease prediction.
SCRATCHING THE SURFACE OF GENDER BIASES
The world map. The map has been downloaded from the Natural Earth Data website. The country borders were reduced by 200 meters with ArcGIS Pro, so that clicking within any country on the map would show the corresponding country’s border as the nearest point. ArcGIS Pro was also used to convert the map to Mercator projection. The changed shapefiles can be downloaded from my Github
ARCHIVE - SHIRING.GITHUB.IO
May 28, 2017 » Data Science for Business - Time Series Forecasting Part 1: EDA & Data Preparation.
May 20, 2017 » New R Users group in Münster!
May 15, 2017 » Network analysis of Game of Thrones family ties.
May 2, 2017 » Update to autoencoders and anomaly detection with machine learning in
CATEGORIES - GITHUB PAGES Dealing with unbalanced data in machine learning. Building meaningful machine learning models for disease prediction. Plotting trees from Random Forest models with ggraph. Hyper-parameter Tuning with Grid Search for Deep Learning. Building deep neural nets with h2o and rsparkling that predict arrhythmia of MIGRATING FROM GITHUB TO GITLAB WITH RSTUDIO (TUTORIAL) GitHub vs. GitLab. Git is a distributed implementation of version control. Many people have written very eloquently about why it is a good idea to use version control, not only if you collaborate in a team but also if you work on your own; one example is this article from RStudio’s Support pages.. In short, its main feature is that version control allows you to keep track of the changes you CONDITIONAL GGPLOT2 GEOMS IN FUNCTIONS (QTL PLOTS) The first example uses the hyper data set and builds a simple QTL model with three modeling functions: the EM algorithm, Haley-Knott regression and multiple imputation. The genome wide LOD threshold is calculated with permutation. Feeding this LOD threshold into the summary output gives us the markers with a significant phenotype association (i.e. the QTL). EXPLORING THE HUMAN GENOME (PART 1) The narrow traditional definition of a gene is that it is a hereditary unit of information, which meant that it is a unit of DNA which encodes for the production of a protein. The Human Genome Project has estimated that the human genome comprises 20000 to 25000 genes. However, if we take the definition of gene more liberally, we couldalso
FEATURE SELECTION IN MACHINE LEARNING (BREAST CANCER DATASETS)
Machine learning uses so-called features (i.e. variables or attributes) to generate predictive models. Using a suitable combination of features is essential for obtaining high precision and accuracy. Because too many (unspecific) features pose the problem of overfitting the model
BUILDING DEEP NEURAL NETS WITH H2O AND RSPARKLING THAT
The R package h2o provides a convenient interface to H2O, which is an open-source machine learning and deep learning platform. H2O can be integrated with Apache Spark (Sparkling Water) and therefore allows the implementation of complex or big models in a fast and scalable manner. H2O distributes a wide range of common machine learning
EXPLORE PREDICTIVE MAINTENANCE WITH FLEXDASHBOARD
I have written the following post about Predictive Maintenance and flexdashboard at my company codecentric’s blog: Predictive Maintenance is an increasingly popular strategy associated with Industry 4.0; it uses advanced analytics and machine learning to optimize machine costs and output (see Google Trends plot below).
SHIRIN'S PLAYGROUND
18 Dec 2016 » How to build a Shiny app for disease- & trait-associated locations of the human genome. This app is based on the gwascat R package and its ebicat38 database and shows trait-associated SNP locations of the human genome. You can visualize and compare the genomic locations of up to 8 traits simultaneously.
DATA SCIENCE FOR BUSINESS
DESEQ2 COURSE WORK
DEALING WITH UNBALANCED DATA IN MACHINE LEARNING
SEE MORE ON SHIRING.GITHUB.IO
NETWORK ANALYSIS OF GAME OF THRONES FAMILY TIES
In this post, I am exploring network analysis techniques in a family network of major characters from Game of Thrones. Not surprisingly, we learn that House Stark (specifically Ned and Sansa) and House Lannister (especially Tyrion) are the most important family connections in Game of Thrones; they also connect many of the storylines and are central parts of the narrative.
FEATURE SELECTION IN MACHINE LEARNING (BREAST CANCER DATASETS)
Machine learning uses so-called features (i.e. variables or attributes) to generate predictive models. Using a suitable combination of features is essential for obtaining high precision and accuracy. Because too many (unspecific) features pose the problem of overfitting the model
CONDITIONAL GGPLOT2 GEOMS IN FUNCTIONS (QTL PLOTS)
The first example uses the hyper data set and builds a simple QTL model with three modeling functions: the EM algorithm, Haley-Knott regression and multiple imputation. The genome-wide LOD threshold is calculated with permutation. Feeding this LOD threshold into the summary output gives us the markers with a significant phenotype association (i.e. the QTL).
HOW TO BUILD A SHINY APP FOR DISEASE- & TRAIT-ASSOCIATED
This app is based on the gwascat R package and its ebicat38 database and shows trait-associated SNP locations of the human genome. You can visualize and compare the genomic locations of up to 8 traits simultaneously. The National Human Genome Research Institute (NHGRI) catalog of Genome-Wide Association Studies (GWAS) is a curated resource of
HOW TO MAP YOUR GOOGLE LOCATION HISTORY WITH R
##     timestampMs latitudeE7 longitudeE7 accuracy                 activitys
## 1 1482393378938  519601402    76004708       29                      NULL
## 2 1482393333953  519601402    76004708       29                      NULL
## 3 1482393033893  519603616    76002628       20 1482393165600, still, 100
## 4 1482392814435  519603684    76001572       20 1482392817678, still, 100
## 5 1482392734911  519603684    76001572       20                      NULL
## 6
PLOTTING TREES FROM RANDOM FOREST MODELS WITH GGRAPH
Today, I want to show how I use Thomas Lin Pedersen’s awesome ggraph package to plot decision trees from Random Forest models. I am very much a visual person, so I try to plot as much of my results as possible because it helps me get a better feel for what is going on with my data.
HYPER-PARAMETER TUNING WITH GRID SEARCH FOR DEEP LEARNING
Hyper-parameter tuning with grid search allows us to test different combinations of hyper-parameters and find one with improved accuracy. Keep in mind though, that hyper-parameter tuning can only improve the model so much without overfitting. If you can’t achieve sufficient accuracy, the input features might simply not be adequate for the
EXPLAINING COMPLEX MACHINE LEARNING MODELS WITH LIME
How LIME works
1. Permutation of each test case to explain
2. Complex model predicts all permuted test cases
3. Distance between permutations and original test case is
PLOTTING TREES FROM RANDOM FOREST MODELS WITH GGRAPH
Preparing the data and modeling. The data set I am using in these example analyses is the Breast Cancer Wisconsin (Diagnostic) Dataset. The data was downloaded from the UC Irvine Machine Learning Repository. The first data set looks at the predictor classes:
DATA ON TOUR: PLOTTING 3D MAPS AND LOCATION TRACKS
Hiking tracks. The hiking tracks we followed came mostly from a German hiking guide-book, the Rother Wanderführer, 7th edition from 2016. They were in standard .gpx format and could be read with readGPX(). Only one of our hikes did not come from this book, but from Wikiloc. It could be treated the same way as the other hiking tracks, though, so I combined all hiking tracks.
DATA SCIENCE FOR BUSINESS
Training and test data. My input data is the tibble retail_p_day, that was created in my last post. I am splitting this dataset into training (all data points before/on Nov. 1st 2011) and test samples (all data points after Nov. 1st 2011).
R VS PYTHON
I’m an avid R user and rarely use anything else for data analysis and visualisations. But while R is my go-to, in some cases, Python might actually be a better alternative. That’s why I wanted to see how R and Python fare in a one-on-one comparison of an analysis that’s representative of what I would typically work with.
DATA SCIENCE FOR FRAUD DETECTION
I have written the following post about Data Science for Fraud Detection at my company codecentric’s blog: Fraud can be defined as “the crime of getting money by deceiving people” (Cambridge Dictionary); it is as old as humanity: whenever two parties exchange goods or conduct business there is the potential for one party scamming the other.
SHIRIN'S PLAYGROUND
EXPLORING AND PLAYING WITH DATA IN R
*
02 NOV 2017 » EXPLORE PREDICTIVE MAINTENANCE WITH FLEXDASHBOARD
Shirin Glander
I have written the following post about Predictive Maintenance and flexdashboard at my company codecentric’s blog:
Continue reading...
*
28 SEP 2017 » BLOCKCHAIN & DISTRIBUTED ML - MY REPORT FROM THE DATA2DAY CONFERENCE
Shirin Glander
Continue reading...
*
20 SEP 2017 » FROM BIOLOGY TO INDUSTRY. A BLOGGER’S JOURNEY TO DATA SCIENCE.
Shirin Glander
Today, I have given a webinar for the Applied Epidemiology Didactic of the University of Wisconsin - Madison titled “From Biology to Industry. A Blogger’s Journey to Data Science.”
Continue reading...
*
19 SEP 2017 » WHY I USE R FOR DATA SCIENCE - AN ODE TO R
Shirin Glander
I have written a blog post about why I love R and prefer it to other languages. The post is on my new site, but since it isn’t on R-bloggers yet I am also posting the link here:
Continue reading...
*
14 SEP 2017 » MOVING MY BLOG TO BLOGDOWN
Shirin Glander
It’s been a long time coming but I finally moved my blog from Jekyll/Bootstrap on GitHub Pages to blogdown, Hugo and Netlify! Moreover, I also now have my own domain name www.shirin-glander.de. :-)
Continue reading...
*
06 SEP 2017 » DATA SCIENCE FOR FRAUD DETECTION
Shirin Glander
I have written the following post about Data Science for Fraud Detection at my company codecentric’s blog:
Continue reading...
*
04 SEP 2017 » MIGRATING FROM GITHUB TO GITLAB WITH RSTUDIO (TUTORIAL)
Shirin Glander
GITHUB VS. GITLAB
Continue reading...
*
28 JUL 2017 » SOCIAL NETWORK ANALYSIS AND TOPIC MODELING OF CODECENTRIC’S TWITTER FRIENDS AND FOLLOWERS
Shirin Glander
I have written the following post about Social Network Analysis and Topic Modeling of codecentric’s Twitter friends and followers for codecentric’s blog:
Continue reading...
*
17 JUL 2017 » HOW TO DO OPTICAL CHARACTER RECOGNITION (OCR) OF NON-ENGLISH DOCUMENTS IN R USING TESSERACT?
Shirin Glander
One of the many great packages of rOpenSci has implemented the open source engine Tesseract.
Continue reading...
*
28 JUN 2017 » CHARACTERIZING TWITTER FOLLOWERS WITH TIDYTEXT
Shirin Glander
Lately, I have been more and more taken with tidy principles of data analysis. They are elegant and make analyses clearer and easier to comprehend. Following the tidyverse and ggraph, I have been quite intrigued by applying tidy principles to text analysis with Julia Silge and David Robinson’s tidytext.
Continue reading...
*
13 JUN 2017 » DATA SCIENCE FOR BUSINESS - TIME SERIES FORECASTING PART 3: FORECASTING WITH FACEBOOK'S PROPHET
Shirin Glander
In my last two posts (Part 1 and Part 2), I explored time series forecasting with the timekit package.
Continue reading...
*
09 JUN 2017 » DATA SCIENCE FOR BUSINESS - TIME SERIES FORECASTING PART 2: FORECASTING WITH TIMEKIT
Shirin Glander
In my last post, I prepared and visually explored time series data.
Continue reading...
*
28 MAY 2017 » DATA SCIENCE FOR BUSINESS - TIME SERIES FORECASTING PART 1: EDA & DATA PREPARATION
Shirin Glander
Data Science is a fairly broad term and encompasses a wide range of techniques from data visualization to statistics and machine learning models. But the techniques are only tools in a - sometimes very messy - toolbox. And while it is important to know and understand these tools, here, I want to go at it from a different angle: What is the task at hand that data science tools can help tackle, and what question do we want to have answered?
Continue reading...
*
20 MAY 2017 » NEW R USERS GROUP IN MÜNSTER!
Shirin Glander
This is to announce that Münster now has its very own R users group!
Continue reading...
*
15 MAY 2017 » NETWORK ANALYSIS OF GAME OF THRONES FAMILY TIES
Shirin Glander
In this post, I am exploring network analysis techniques in a family network of major characters from Game of Thrones.
Continue reading...
*
02 MAY 2017 » UPDATE TO AUTOENCODERS AND ANOMALY DETECTION WITH MACHINE LEARNING IN FRAUD ANALYTICS
Shirin Glander
This is a reply to Wojciech Indyk’s comment on yesterday’s post on autoencoders and anomaly detection with machine learning in fraud analytics:
Continue reading...
*
01 MAY 2017 » AUTOENCODERS AND ANOMALY DETECTION WITH MACHINE LEARNING IN FRAUD ANALYTICS
Shirin Glander
All my previous posts on machine learning have dealt with supervised learning. But we can also use machine learning for unsupervised learning. The latter are e.g. used for clustering and (non-linear) dimensionality reduction.
Continue reading...
*
23 APR 2017 » DOES MONEY BUY HAPPINESS AFTER ALL? MACHINE LEARNING WITH ONE RULE
Shirin Glander
This week, I am exploring Holger K. von Jouanne-Diedrich’s OneR package for machine learning. I am running an example analysis on world happiness data and compare the results with other machine learning models (decision trees, random forest, gradient boosting trees and neural nets).
Continue reading...
*
23 APR 2017 » EXPLAINING COMPLEX MACHINE LEARNING MODELS WITH LIME
Shirin Glander
The classification decisions made by machine learning models are usually difficult - if not impossible - to understand by our human brains. The complexity of some of the most accurate classifiers, like neural networks, is what makes them perform so well - often with better results than achieved by humans. But it also makes them inherently hard to explain, especially to non-data scientists.
Continue reading...
*
16 APR 2017 » HAPPY EASTER: PLOTTING HARE POPULATIONS IN GERMANY
Shirin Glander
For Easter, I wanted to have a look at the number of hares in Germany. Wild hare populations have been rapidly declining over the last 10 years but during the last three years they have at least been stable.
Continue reading...
*
09 APR 2017 » DATA ON TOUR: PLOTTING 3D MAPS AND LOCATION TRACKS
Dr. Shirin Glander
Recently, I was on Gran Canaria for a vacation. So, what better way to keep up the holiday spirit a while longer than to visualize all the places we went in R!?
Continue reading...
*
02 APR 2017 » DEALING WITH UNBALANCED DATA IN MACHINE LEARNING
Shirin Glander
In my last post, where I shared the code that I used to produce an example analysis to go along with my webinar on building meaningful models for disease prediction, I mentioned that it is advised to consider over- or under-sampling when you have unbalanced data sets. Because my focus in this webinar was on evaluating model performance, I did not want to add an additional layer of complexity and therefore did not further discuss how to specifically deal with unbalanced data.
Continue reading...
*
31 MAR 2017 » BUILDING MEANINGFUL MACHINE LEARNING MODELS FOR DISEASE PREDICTION
Shirin Glander
WEBINAR FOR THE ISDS R GROUP
Continue reading...
*
16 MAR 2017 » PLOTTING TREES FROM RANDOM FOREST MODELS WITH GGRAPH
Shirin Glander
Today, I want to show how I use Thomas Lin Pedersen’s awesome ggraph package to plot decision trees from Random Forest models.
Continue reading...
*
07 MAR 2017 » HYPER-PARAMETER TUNING WITH GRID SEARCH FOR DEEP LEARNING
Shirin Glander
Last week I showed how to build a deep neural network with H2O and rsparkling. As we could see there, it is not trivial to optimize the hyper-parameters for modeling. Hyper-parameter tuning with grid search allows us to test different combinations of hyper-parameters and find one with improved accuracy.
Continue reading...
*
27 FEB 2017 » BUILDING DEEP NEURAL NETS WITH H2O AND RSPARKLING THAT PREDICT ARRHYTHMIA OF THE HEART
Shirin Glander
Last week, I introduced how to run machine learning applications on Spark from within R, using the sparklyr package. This week, I am showing how to build feed-forward deep neural networks or multilayer perceptrons. The models in this example are built to classify ECG data into being either from _healthy_ hearts or from someone suffering from _arrhythmia_. I will show how to prepare a dataset for modeling, setting weights and other modeling parameters and finally, how to evaluate model performance with the H2O package via rsparkling.
Continue reading...
*
19 FEB 2017 » PREDICTING FOOD PREFERENCES WITH SPARKLYR (MACHINE LEARNING)
Shirin Glander
This week I want to show how to run machine learning applications on a Spark cluster. I am using the sparklyr package, which provides a handy interface to access Apache Spark functionalities via R.
Continue reading...
*
12 FEB 2017 » CONDITIONAL GGPLOT2 GEOMS IN FUNCTIONS (QTL PLOTS)
Shirin Glander
When running an analysis, I am usually combining functions from multiple packages. Most of these packages come with their own plotting functions. And while they are certainly convenient in that they allow me to get a quick glance at the data or the output, they all have their own style. If I want to prepare a report, proposal or a paper though, I want all my plots to come from a single cast so that they give a consistent feel to the story I want to tell with my data.
Continue reading...
*
06 FEB 2017 » SCRATCHING THE SURFACE OF GENDER BIASES
Shirin Glander
Today, I want to share my analysis of the World Gender Statistics dataset.
Continue reading...
*
30 JAN 2017 » NEW FEATURES IN WORLD GENDER STATISTICS APP
Shirin Glander
In my last post, I built a shiny app to explore World Gender Statistics.
Continue reading...
*
29 JAN 2017 » EXPLORING WORLD GENDER STATISTICS WITH SHINY
Shirin Glander
This week I explored the World Gender Statistics dataset. You can look at 160 measurements over 56 years with my Shiny app here.
Continue reading...
*
22 JAN 2017 » R VS PYTHON - A ONE-ON-ONE COMPARISON
Shirin Glander
I’m an avid R user and rarely use anything else for data analysis and visualisations. But while R is my go-to, in some cases, Python might actually be a better alternative.
Continue reading...
*
15 JAN 2017 » FEATURE SELECTION IN MACHINE LEARNING (BREAST CANCER DATASETS)
Shirin Glander
Machine learning uses so-called features (i.e. variables or attributes) to generate predictive models. Using a suitable combination of features is essential for obtaining high precision and accuracy. Because too many (unspecific) features pose the problem of overfitting the model, we generally want to restrict the features in our models to those that are most relevant for the response variable we want to predict. Using as few features as possible will also reduce the complexity of our models, which means they need less time and computer power to run and are easier to understand.
Continue reading...
*
05 JAN 2017 » GENE HOMOLOGY PART 3 - VISUALIZING GENE ONTOLOGY OF CONSERVED GENES
Shirin Glander
WHICH GENES HAVE HOMOLOGS IN MANY SPECIES?
Continue reading...
*
30 DEC 2016 » HOW TO MAP YOUR GOOGLE LOCATION HISTORY WITH R
Shirin Glander
It’s no secret that Google Big Brothers most of us. But at least they allow us to access quite a lot of the data they have collected on us. Among this is the Google location history.
Continue reading...
*
22 DEC 2016 » ANIMATING PLOTS OF BEER INGREDIENTS AND SIN TAXES OVER TIME
Shirin Glander
With the upcoming holidays, I thought it fitting to finally explore the ttbbeer package. It contains data on beer ingredients used in US breweries from 2006 to 2015 and on the (sin) tax rates for beer, champagne, distilled spirits, wine and various tobacco items since 1862.
Continue reading...
*
18 DEC 2016 » HOW TO BUILD A SHINY APP FOR DISEASE- & TRAIT-ASSOCIATED LOCATIONS OF THE HUMAN GENOME
Shirin Glander
This app is based on the gwascat R package and its _ebicat38_ database and shows trait-associated SNP locations of the human genome. You can visualize and compare the genomic locations of up to 8 traits simultaneously.
Continue reading...
*
14 DEC 2016 » GENE HOMOLOGY PART 2 - CREATING DIRECTED NETWORKS WITH IGRAPH
Shirin Glander
In my last post I created a gene homology network for human genes. In this post I want to extend the network to include edges for other species.
Continue reading...
*
11 DEC 2016 » CREATING A NETWORK OF HUMAN GENE HOMOLOGY WITH R AND D3
Shirin Glander
EDITED ON 20 DECEMBER 2016
Continue reading...
*
04 DEC 2016 » HOW TO SET UP YOUR OWN R BLOG WITH GITHUB PAGES AND JEKYLL BOOTSTRAP
Shirin Glander
THIS POST IS IN REPLY TO A REQUEST: HOW DID I SET UP THIS R BLOG?
Continue reading...
*
02 DEC 2016 » EXTREME GRADIENT BOOSTING AND PREPROCESSING IN MACHINE LEARNING - ADDENDUM TO PREDICTING FLU OUTCOME WITH R
Shirin Glander
In last week’s post I explored whether machine learning models can be applied to predict flu deaths from the 2013 outbreak of influenza A H7N9 in China. There, I compared random forests, elastic-net regularized generalized linear models, k-nearest neighbors, penalized discriminant analysis, stabilized linear discriminant analysis, nearest shrunken centroids, single C5.0 tree and partial least squares.
Continue reading...
*
27 NOV 2016 » CAN WE PREDICT FLU DEATHS WITH MACHINE LEARNING AND R?
Shirin Glander
EDITED ON 26 DECEMBER 2016
Continue reading...
*
20 NOV 2016 » ANALYSING THE GILMORE GIRLS' COFFEE ADDICTION WITH R
Shirin Glander
Last week’s post showed how to create a Gilmore Girls character network.
Continue reading...
*
13 NOV 2016 » CREATING A GILMORE GIRLS CHARACTER NETWORK WITH R
Shirin Glander
With the impending (and by many - including me - much awaited) Gilmore Girls Revival, I wanted to take a somewhat different look at our beloved characters from Stars Hollow.
Continue reading...
*
06 NOV 2016 » IS 'YEAH' JOSH AND CHUCK'S FAVORITE WORD?
Shirin Glander
TEXT MINING AND SENTIMENT ANALYSIS OF A STUFF YOU SHOULD KNOW PODCAST
Continue reading...
*
01 NOV 2016 » EXPLORING THE HUMAN GENOME (PART 2) - TRANSCRIPTS
Shirin Glander
HOW MANY TRANSCRIPTS AND PROTEINS DO GENES HAVE?
Continue reading...
*
23 OCT 2016 » EXPLORING THE HUMAN GENOME (PART 1) - GENE ANNOTATIONS
Shirin Glander
When working with any type of genome data, we often look for annotation information about genes, e.g. what’s the gene’s full name, what’s its abbreviated symbol, what ID it has in other databases, what functions have been described, how many and which transcripts exist, etc.
Continue reading...
*
16 OCT 2016 » USA/ CANADA ROADTRIP 2016
Shirin Glander
MAPPING GPS DATA FROM OUR USA/ CANADA ROADTRIP
Continue reading...
*
29 SEP 2016 » DESEQ2 COURSE WORK
Shirin Glander
Continue reading...
*
28 SEP 2016 » EXPRANALYSIS PACKAGE
Shirin Glander
I created the R package exprAnalysis designed to streamline my RNA-seq data analysis pipeline. Below you find the vignette for installation and usage of the package.
Continue reading...
Also check out R-bloggers for lots of cool R stuff!
2019 Shirin Elsinghorst with help from Jekyll Bootstrap and Bootstrap