A complete backup of napitupulu-jon.appspot.com

DATA SCIENCE, AI, & BUSINESS

When you have a single evaluation metric, you have to know what impact your results have on the business side. Analytically speaking, you want to find out whether your results are significantly different.

DATASETS AND QUESTION

The Enron fraud is a big, messy and totally fascinating story about corporate malfeasance of nearly every imaginable type. The Enron email and financial datasets are also big, messy treasure troves of information, which become much more useful once you know your way around them a bit.

K-MEANS WITH SCIKIT-LEARN

This post introduces K-Means, one of the most famous unsupervised learning algorithms. First we randomly place two cluster centers (any other number works, as long as it is less than the number of data points; in this example we intuitively pick two). Next, we assign every data point to whichever cluster center is closest to it.

CONDITIONS AND INFERENCE OF LINEAR REGRESSION

By now you should know intuitively that linear regression is the least squares line, the line that minimizes the sum of squared residuals. We can check the conditions for linear regression by looking at linearity, nearly normal residuals, and constant variability. We also look at linear regression with categorical variables, and how to make inference on it.

A/B TESTING OVERVIEW

A/B testing is a test that we run for a particular product. Usually A/B testing works for testing changes to elements of a web page. The A/B testing framework follows this sequence: Design a r

DECISION TREES WITH SCIKIT-LEARN

Decision trees are one of the oldest machine learning algorithms. They are extremely robust and can be traced back decades. A decision tree can solve a non-linear problem with linear decision surfaces, without the expensive kernel of an SVM. For other information, please check this link.

TEXT LEARNING WITH SCIKIT-LEARN

Text learning is the broad area of machine learning that works with text. Many search giants, like Google, Yahoo, and Baidu, try to learn from the text of various searches. In this example we take a look at bag of words, which takes the words in the data and counts how often each word occurs in the text.

OUTLIERS WITH SCIKIT-LEARN

Outliers in data points occur normally. An outlier is probably some mistyped input from other people (e.g. 200 instead of 20). In this plot we see that there are outliers drawn outside the trend of the d

FEATURE SELECTION WITH SCIKIT-LEARN

%%writefile visualize_new_feature.py
import pickle
from get_data import getData

def computeFraction( poi_messages, all_messages ):
    """ given a number messages to/from POI (numerator)
        and number of all messages to/from a person (denominator),
        return the fraction of messages to/from that person
        that are from/to a POI
    """
    ### you fill in this code, so that it returns either ##

COMPARING CATEGORICAL PROPORTIONS AND CHI-SQUARE

So eleven out of twelve people correctly guess the back of their hand. Since this is the data from our study, the sample proportion (p hat) is 0.9167. The proportion that we are going to test is the population proportion under random guessing, 0.1 (1/10 proportion of successfully predicting 1
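The numbers in the COMPARING CATEGORICAL PROPORTIONS teaser (11 correct guesses out of 12, tested against a chance proportion of 0.1) can be checked with a short calculation. The teaser is cut off before it says which test the post uses, so the exact binomial tail below is only an assumption, and `binomial_tail` is a hypothetical helper name.

```python
from math import comb

def binomial_tail(successes, trials, p_null):
    """P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Observed data from the teaser: 11 of 12 people guess correctly.
p_hat = 11 / 12

# Probability of seeing 11 or more correct guesses if everyone
# were guessing at random with a success probability of 0.1.
p_value = binomial_tail(11, 12, 0.1)

print(round(p_hat, 4), p_value)
```

With only 12 observations and an expected number of successes of 1.2, an exact or simulation-based approach is usually preferred over a normal approximation, which is why this sketch avoids the z-test.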

INTRODUCTION TO LINEAR REGRESSION

We can replace the formula with $\bar{y}$, which denotes the average of the response variable, and $\bar{x}$, which denotes the average of the explanatory variable. The intercept then follows by simply swapping the parameters. You can see that in linear regression the line is expected to go through the center of the data.

A/B TESTING OVERVIEW

A/B testing is used online as a general method to test features, to decide on the audience for the control and experiment sets, and to find out which is better. A/B testing is used to find out whether a single change produces a significant difference between the control group and the experiment group.
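The claim in the INTRODUCTION TO LINEAR REGRESSION teaser, that the least squares line goes through the center of the data, can be illustrated with a small sketch. The function name `least_squares_line` and the sample points are made up for illustration; the formulas are the standard slope $b_1 = S_{xy}/S_{xx}$ and intercept $b_0 = \bar{y} - b_1\bar{x}$.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations in x.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    # The intercept is chosen so the line passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data, not taken from the post.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = least_squares_line(xs, ys)

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
# Predicting at x_bar gives back y_bar (up to floating point error).
print(intercept + slope * x_bar, y_bar)
```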

DIAMONDS-ANALYSIS

Diamonds increase significantly in price from the mine to the market. At each step in the process there may be a big jump in price. Finally, as we stated earlier, there are diamond cartels (e.g. De Beers) that may monopolize some diamond prices in the market. It is not just them, but also the other major players in the diamond market.

A/B TESTING SANITY CHECK

Besides population sizing as an invariant metric, there could also be other things to pick depending on your case. Suppose Audacity is running two experiments: changing the order of the course list, and changing infrastructure to reduce the load time.

HYPOTHESIS TESTING AND CONFIDENCE INTERVAL FOR CATEGORICAL

For a numerical variable, you take the mean and make inference about the average and the differences. For a categorical variable, you take the proportion of each frequency, and you may want to build a contingency table. Studies that report percentages are likely about categorical variables (XX% support vs XX% oppose same sex marriage).

A/B TESTING SINGLE METRIC

When you have a single evaluation metric, you have to know what impact your results have on the business side. Analytically speaking, you want to find out whether your results are significantly different. You then also want to know

A/B TESTING MULTIPLE METRICS

We can use multiple comparisons when, for example, we have automated alerting, to see whether a metric suddenly behaves differently. Or, if we use an automated framework in exploratory data analysis, we want to make sure that the difference in the metric actually occurs and is repeatable.

PCA WITH SCIKIT-LEARN

PCA is used most of the time for data visualization, alongside feature set compression. It's hard (otherwise impossible) to interpret

USING MAPREDUCE AND DESIGN PATTERN

MapReduce can handle all your processing in the Hadoop File System. It breaks your data into chunks that reside on each node of the cluster, then processes your data in parallel. Suppose we need a hashtable and we use it to add key-value pairs. If there are millions of records, we can run low on memory, or perhaps out of memory.
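The memory concern in the USING MAPREDUCE AND DESIGN PATTERN teaser can be made concrete with a minimal sketch. It assumes hypothetical (store, amount) records and uses an in-memory `sorted` call as a stand-in for Hadoop's shuffle-and-sort phase; the `mapper` and `reducer` names are illustrative, not the post's code. The point is that the reducer sees keys in sorted order, so it only has to hold one running total at a time instead of a hashtable of every key.

```python
from itertools import groupby
from operator import itemgetter

def mapper(records):
    """Emit one (key, value) pair per input record."""
    for store, amount in records:
        yield store, amount

def reducer(sorted_pairs):
    """Sum the values of consecutive pairs that share a key.

    Because the input is sorted by key, only the current group
    is held at any moment; no table of all keys is needed.
    """
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, sum(value for _, value in group)

# Hypothetical sales records, not taken from the post.
records = [("Miami", 12.5), ("NYC", 3.0), ("Miami", 7.5), ("NYC", 4.0)]

# Stand-in for the shuffle-and-sort step between map and reduce.
shuffled = sorted(mapper(records), key=itemgetter(0))
totals = dict(reducer(shuffled))
print(totals)
```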

NAIVE BAYES

Let's dig deeper into Naive Bayes. Bayes was actually a religious man trying to prove the existence of God; it is the algorithm built on his work that is naive. Naive Bayes will later produce a decision boundary like the one in the picture. So the incoming sample

FUNDAMENTALS OF DATA VISUALIZATION

Visualization is important. "A picture is worth a thousand words" is no joke. Data visualization is about how we turn raw data, numbers in a table, row,

PAIRED DATA AND BOOTSTRAPPING

Let's take a look at the example of the 2010 GSS, where the variables include highest degree (categorical) and hours (numerical discrete). We can use side-by-side boxplots to plot a categorical variable against a numerical one. But what we're currently concerned about is whether the respondents have a college degree or not.

EXPLORATORY DATA ANALYSIS

Exploratory Data Analysis is important when you want to get a better understanding of your data. This data comes from GapMinder, and consists of salary and life expectancy for each year in a given country. It's clear that the GapMinder data comes from an observational study, so one shouldn't infer causation, only observe the correlation.

WRANGLING WITH VARIOUS DATA FORMATS

# Your task is to read the input DATAFILE line by line, and for the first 10 lines (not including the header)
# split each line on "," and then for each line, create a dictionary
# where the key is the header title of the field, and the value is the value of that field in the row.
# The function parse_file should return a list of dictionaries,
# each data line in the file being a single list
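The task in the WRANGLING WITH VARIOUS DATA FORMATS teaser can be sketched as follows. Only the name `parse_file` and the "first 10 data lines, split on commas" rule come from the teaser; the `max_rows` parameter, the whitespace stripping, and the temporary demo file are assumptions added for illustration. A plain split on "," does not handle quoted fields that contain commas.

```python
import os
import tempfile

def parse_file(datafile, max_rows=10):
    """Read the header plus up to max_rows data lines into a list of dicts."""
    data = []
    with open(datafile, "r") as f:
        # The first line holds the field titles.
        header = [title.strip() for title in f.readline().split(",")]
        for line in f:
            if len(data) >= max_rows:
                break
            if not line.strip():
                continue  # skip blank lines
            values = [value.strip() for value in line.split(",")]
            data.append(dict(zip(header, values)))
    return data

# Demo on a small temporary file with hypothetical content.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("Title,Year\n")
    for i in range(12):
        tmp.write("Song {},{}\n".format(i, 1960 + i))
    demo_path = tmp.name

rows = parse_file(demo_path)
os.remove(demo_path)
print(len(rows), rows[0])
```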


EXPLORING-TWO-VARIABLES

Exploring two variables in R with scatterplot, jitter and smoothing to handle overplotting. In this lesson we


REGRESSION WITH SCIKIT-LEARN

Supervised learning is divided into two major categories, classification and regression. Classification is where the machine learning algorithm predicts a discrete output, the

USING MAPREDUCE AND DESIGN PATTERN

Suppose we have mappers and reducers that each act as a person. We want to collect from our files, which contain all the sales in 2012. We want to know what is


NAIVE BAYES

Sebastian Thrun is the head of Google's self-driving car project. He uses supervised classification to train the car. Supervised means that we give the system lots of correct examples as lesson material, as if it were a student.

VALIDATION WITH SCIKIT-LEARN

Another method is K-Fold, where you split the dataset into K units. You use 1 unit as the test set and the other K-1 units as the training set. Then you iterate K times with a different test bin at each step, yielding K test results.
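The K-Fold procedure from the VALIDATION WITH SCIKIT-LEARN teaser (split into K units, hold out one unit per step, repeat K times) can be sketched without any library. The generator `k_fold_indices` is a hypothetical helper written for this illustration; it is not scikit-learn's API, and it does not shuffle the data.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# 10 samples split into 3 folds: every sample is tested exactly once.
folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

In practice the data is usually shuffled before splitting, since sorted data can otherwise put a single class into one fold.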


DEEP LEARNING FOR LETTER RECOGNITION WITH TENSORFLOW

3 years ago

Tags : * Data Science

* Deep Learning

* machine learning

* Neural Networks

* Tensorflow

* Udacity

It's been 6 months since my last blog. There are multiple blog drafts, to be honest, but many of them took a really long time to finish. I guess I should have sliced the material into multiple blogs. Anyway, in this blog I want to show how to achieve more than 95% accuracy with just a MacBook Air. So let's dig into the details. This material originally comes from Deep Learning Lecture 4 from Udacity.

In this blog, the pickling, reformat, accuracy, and session code are theirs, but the architecture, which is the core of Deep Learning, is my own. It's kind of refreshing because I have experienced it before (yes, Andrew Ng's Coursera Machine Learning on Neural Networks).

3 years ago

Tags : * Data Science

* machine learning

* Statistics

In a Data Science process, after the data scientist has questioned the data and extracted much useful information, it's time to get into the modeling process.

4 years ago

Tags : * Data Science

* machine learning

* Statistics

I just followed the Executive Data Science course by the Johns Hopkins team. In the first chapter of the course, Jeff Leek said,

> In Data Science, the importance is science and not data. Data
> Science is only useful when we use data to answer the question.

EDA ON PROSPER LOAN

4 years ago

Tags : * Data Analysis

A fintech company is a place where you can borrow and lend money. The power to send is limitless. As one Forbes article described, “Fintech companies, as they’ve come to be called, are easing payment processes, reducing fraud, saving users money, promoting financial planning, and ultimately moving a giant industry forward.” When talking about fintech companies, one that comes to mind is Prosper. In this blog, I will use their data to perform the analysis.

TITANIC

4 years ago

In this article, I will try to investigate the following question,

> Looking at socio-economic status, gender, and age, who did and who did
> not survive the Titanic?

Below is the description of the Titanic data, from the original link, Kaggle.

4 years ago

Tags : * A/B Testing

* Data Analysis

* Statistics

* Udacity

One thing changes when you evaluate multiple metrics instead of a single metric: some metric may come out significantly different purely by chance. That is, if you choose a fixed 5% significance level, one metric may appear significant, but only once. When you rerun the experiment on another day, the result shouldn't recur. One thing we can do is perform multiple comparisons and see which of the metrics behaves differently. We can use multiple comparisons when, for example, we have automated alerting, to see whether a metric suddenly behaves differently. Or, if we use an automated framework in exploratory data analysis, we want to make sure that the difference in the metric actually occurs and is repeatable.
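To see why chance findings become likely with several metrics, here is a minimal sketch. It assumes the metrics are independent and uses the Bonferroni correction as one possible multiple comparison method; the post does not say which correction it uses, and the p-values below are invented for illustration.

```python
def familywise_error(alpha, n_metrics):
    """Chance of at least one false positive across independent metrics."""
    return 1 - (1 - alpha) ** n_metrics

def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value as significant at the corrected level alpha / m."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

# With 10 independent metrics at the 5% level, a false alarm is likely.
print(familywise_error(0.05, 10))

# Hypothetical p-values for four metrics.
p_values = [0.04, 0.20, 0.003, 0.60]
flags = bonferroni_significant(p_values)
print(flags)
```

Bonferroni is conservative when metrics are correlated, which is common for metrics computed from the same users, so treat the corrected threshold as a cautious bound.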

4 years ago

Tags : * A/B Testing

* Data Analysis

* Statistics

* Udacity

When you have a single evaluation metric, you have to know what impact your results have on the business side. Analytically speaking, you want to find out whether your result is significantly different. You then also want to know the magnitude and direction of the change. If your result is statistically significant, you can interpret it based on how you characterized the metric and build intuition from it, just as we discussed in the previous blog. You also want to check the variability of the metric in your experiment. If your result is not statistically significant when it really should be, you can do two things. You could subset your experiment by platform or time (day of the week) to see what went wrong, or whether the difference becomes significant within those subsets. This could lead you to a new hypothesis test and help you understand how your participants react. If you are just beginning your experiment, you should cross-check your parametric hypothesis test against a non-parametric hypothesis test.
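As an illustration of the non-parametric cross-check, here is a minimal sketch of a two-sided sign test on day-by-day results. The sign test is my choice of example (the post only says "non-parametric"), and the daily values are hypothetical.

```python
from math import comb


def sign_test_p_value(control, experiment):
    """Two-sided sign test on paired daily values.

    Counts the days on which the experiment beats the control and asks how
    likely a split at least that lopsided is if each day were a fair coin flip.
    """
    # Days with identical values carry no sign information, so drop them.
    diffs = [e - c for c, e in zip(control, experiment) if e != c]
    n = len(diffs)
    if n == 0:
        return 1.0
    wins = sum(1 for d in diffs if d > 0)
    extreme = max(wins, n - wins)
    tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical daily click-through rates over one week.
control = [0.101, 0.098, 0.110, 0.105, 0.099, 0.102, 0.097]
experiment = [0.109, 0.104, 0.115, 0.111, 0.103, 0.108, 0.100]

print(sign_test_p_value(control, experiment))
```

If the sign test and the parametric test disagree strongly, that is a hint to look more closely at the distribution of the metric before trusting either one.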

4 years ago

Tags : * A/B Testing

* Data Analysis

* Statistics

* Udacity

In this blog series, we're going to talk about analyzing our results: what we can interpret from the results of the experiment, and what we can and can't conclude. We will use invariant metrics for sanity checks, as we will discuss in this blog. We will also evaluate single and multiple metrics, and cover some gotchas in analysis.
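A sanity check on an invariant metric can be sketched as follows. This is a minimal example that assumes the invariant is the number of units assigned to each group, that the intended split is 50/50, and that a normal approximation to the binomial is acceptable. The counts are hypothetical.

```python
from math import sqrt


def invariant_sanity_check(n_control, n_experiment, expected_fraction=0.5, z=1.96):
    """Check whether the observed control fraction falls inside the
    confidence interval built around the expected fraction."""
    total = n_control + n_experiment
    se = sqrt(expected_fraction * (1 - expected_fraction) / total)
    lower = expected_fraction - z * se
    upper = expected_fraction + z * se
    observed = n_control / total
    return lower <= observed <= upper, (lower, upper), observed


# Hypothetical counts of users assigned to each group.
passed, interval, observed = invariant_sanity_check(64454, 61818)
print(passed, interval, observed)
```

If the check fails, it usually makes more sense to debug the experiment setup than to go on and analyze the evaluation metrics.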

4 years ago

Tags : * A/B Testing

* Data Analysis

* Statistics

* Udacity

What is the duration of the experiment? Does it need to run for a long time? How long does it take before participants give any feedback? This is the final topic in designing the experiment. We will also be talking about exposure: the fraction of users you expose to your experimental feature will affect the duration of your experiment.
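The relation between exposure and duration can be sketched with simple arithmetic. This is only a rough estimate that assumes constant daily traffic and no returning users. The function name and the numbers are hypothetical.

```python
from math import ceil


def days_needed(required_units, daily_traffic, exposure):
    """Rough number of days to collect the required units when only a
    fraction (exposure) of the daily traffic enters the experiment."""
    if not 0 < exposure <= 1:
        raise ValueError("exposure must be in (0, 1]")
    return ceil(required_units / (daily_traffic * exposure))


# Hypothetical: 100,000 units needed in total, 20,000 visitors per day.
for exposure in (1.0, 0.5, 0.25):
    print(exposure, days_needed(100_000, 20_000, exposure))
```

Lower exposure limits the risk of showing a bad change to many users, at the price of a longer experiment.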

SIZE OF EXPERIMENT

4 years ago

Tags : * A/B Testing

* Data Analysis

* Statistics

* Udacity

There are many things to take into account when choosing the size of your experiment. The practical significance level, statistical significance level, sensitivity, metric, cohort, and population will each result in different variability. Variability in turn affects the size and duration of your experiment. Suppose you want to run an experiment that will affect users globally. Running an experiment worldwide is time consuming since you have to observe a lot of users. What you can do instead is take a subset of the population, for example a cohort. This will give you a much smaller size and a different variability, but it will give you some intuition about whether your experiment actually has an effect. Suppose you know, from the video latency example in the previous blog, that what you really care about is the 90th percentile, that is, people with slower internet connections. And because you want immediate feedback, you build a cohort from users whose last activity was seen within the past 2 months. This experiment could help you decide whether to continue with a worldwide experiment.
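To show how practical significance, statistical significance, and sensitivity drive the size, here is a minimal sketch using the standard normal-approximation formula for comparing two proportions. It assumes the metric is a probability (such as click-through probability). The baseline rate and minimum detectable effect are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(baseline_rate, min_effect, alpha=0.05, power=0.8):
    """Approximate units needed per group to detect an absolute change of
    min_effect in a proportion, using a two-sided test."""
    p1 = baseline_rate
    p2 = baseline_rate + min_effect
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # statistical significance
    z_beta = NormalDist().inv_cdf(power)           # sensitivity
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / min_effect ** 2)


# Hypothetical: 10% baseline rate, 2 percentage point practical significance.
print(sample_size_per_group(0.10, 0.02))
```

A smaller minimum effect or a higher power makes the required size grow quickly, which is one reason a cohort with lower variability can be attractive for a first run.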

