README.md: 54 additions & 1 deletion
@@ -680,7 +680,46 @@ A test of a statistical hypothesis, where the region of rejection is on both sid
680
680
**For example**, suppose the null hypothesis states that the mean is equal to 10. The alternative hypothesis would be that the mean is less than 10 or greater than 10. The region of rejection would consist of a range of numbers located on both sides of the sampling distribution; that is, the region of rejection would consist partly of numbers that were less than 10 and partly of numbers that were greater than 10.
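The two-sided rejection region described above can be sketched in Python. This is a minimal illustration, not part of the original README: it assumes a known population standard deviation (`sigma=2`), a sample size of 25, and a significance level of 0.05, all chosen only for the example. The function names are hypothetical.

```python
from statistics import NormalDist

def two_tailed_critical_values(mu0, sigma, n, alpha=0.05):
    """Return (lower, upper); sample means outside this range fall in the rejection region."""
    standard_error = sigma / n ** 0.5
    # Split alpha evenly between the two tails of the sampling distribution.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return mu0 - z_crit * standard_error, mu0 + z_crit * standard_error

def in_rejection_region(sample_mean, lower, upper):
    """True when the sample mean lies in either tail."""
    return sample_mean < lower or sample_mean > upper

# Assumed values for illustration: H0 says the mean equals 10.
lower, upper = two_tailed_critical_values(mu0=10, sigma=2, n=25)
print(f"Reject H0 if the sample mean is below {lower:.3f} or above {upper:.3f}")
print(in_rejection_region(10.9, lower, upper))
```

Under these assumptions the rejection region lies partly below 10 and partly above 10, as the text describes.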
681
681
682
682
683
-
# 12. Testing the Data
683
+
# 12. Statistical Testing
684
+
685
+
Statistical tests are intended to decide whether a hypothesis about the distribution of one or more populations should be accepted or rejected.
686
+
687
+
There are two types of statistical tests:
688
+
#### (1) Parametric Tests
689
+
#### (2) Non-Parametric Tests
690
+
691
+
#### Why Use Statistical Testing?
692
+
* To test the difference between a sample mean and the population mean
693
+
* To test the difference between two sample means
694
+
* To test the significance of association between two variables
695
+
* To compare several population means
696
+
* To test the difference in proportions between two independent populations
697
+
* To test the difference in proportion between a sample and the population
698
+
699
+
#### What are parameters?
700
+
* Parameters are numbers that summarize the data for the entire population, while statistics are numbers that summarize the data from a sample
701
+
* Parametric testing is used for quantitative data and continuous variables
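The distinction between a parameter and a statistic can be shown with a small Python sketch. The population here (the integers 1 to 100) and the sample size of 10 are assumptions made only for this illustration.

```python
import random
from statistics import mean

# Hypothetical population: every value is known, so its mean is a parameter.
population = list(range(1, 101))
population_mean = mean(population)

# A random sample of 10 values; its mean is a statistic that estimates the parameter.
rng = random.Random(0)  # fixed seed so the sample is reproducible
sample = rng.sample(population, 10)
sample_mean = mean(sample)

print(f"Parameter (population mean): {population_mean}")
print(f"Statistic (sample mean):     {sample_mean}")
```

The sample mean will usually differ somewhat from the population mean, which is why statistical tests are needed to judge whether a difference is meaningful.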
702
+
703
+
#### (1) Parametric Tests: A parametric test makes assumptions about the population parameters and the distribution of the data
704
+
##### (a) Z Testing
705
+
##### (b) Student T-Testing
706
+
##### (c) P Testing
707
+
##### (d) ANOVA Testing
708
+
709
+
#### (a) Z Testing:
710
+
The Z test is used to test whether the difference between two point estimates is statistically significant.
711
+
##### Assumptions for Z Test
712
+
* The sample must be randomly selected and data must be quantitative
713
+
* The sample size should be large (a common rule of thumb is n ≥ 30)
714
+
* Data should follow a normal distribution
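As a rough sketch of how a Z test compares two point estimates, the Python snippet below computes a two-sample z-score and a two-tailed P-value. It assumes the population standard deviations are known and the samples are independent. The function name and all numeric inputs are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_test(mean1, mean2, sigma1, sigma2, n1, n2):
    """Return (z, two-tailed p-value) for the difference between two sample means."""
    # Standard error of the difference, assuming known population standard deviations.
    standard_error = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z = (mean1 - mean2) / standard_error
    # Two-tailed P-value: probability of a z-score at least this extreme in either tail.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Assumed example values: both samples are large (n >= 30).
z, p_value = two_sample_z_test(mean1=52, mean2=50, sigma1=4, sigma2=3, n1=64, n2=36)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

If the P-value is below the chosen significance level (for example 0.05), the difference between the two estimates would be treated as statistically significant.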
715
+
716
+
#### (2) Non-Parametric Tests:
717
+
718
+
### A/B Testing:
719
+
720
+
721
+
722
+
684
723
685
724
##### Problem 1: Two-Tailed Test
686
725
@@ -748,6 +787,20 @@ Since we have a one-tailed test, the P-value is the probability that the z-score
748
787
Interpret results. Since the P-value (0.04) is less than the significance level (0.05), we cannot accept the null hypothesis.
749
788
Note: If you use this approach on an exam, you may also want to mention why this approach is appropriate. Specifically, the approach is appropriate because the sampling method was simple random sampling, the sample included at least 10 successes and 10 failures, and the population size was at least 10 times the sample size.
750
789
790
+
751
791
# 13. Data Clustering
752
792
793
+
#### Introduction to Data Clustering
794
+
A cluster is a group of objects that belong to the same class. In other words, similar objects are grouped in one cluster and dissimilar objects are grouped in another cluster.
795
+
796
+
#### What is Clustering?
797
+
798
+
Clustering is the process of grouping a set of abstract objects into classes of similar objects.
799
+
800
+
#### Points to Remember
801
+
* A cluster of data objects can be treated as one group.
802
+
* While doing cluster analysis, we first partition the set of data into groups based on data similarity and then assign the labels to the groups.
803
+
* The main advantage of clustering over classification is that it is adaptable to changes and helps single out useful features that distinguish different groups.
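The "partition first, then label" idea from the points above can be sketched with a very small k-means routine on one-dimensional data. The README does not name a specific algorithm; k-means is used here only as one common choice. The function name, data points, starting centroids, and label names are all assumptions for illustration.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Partition 1-D points around the given starting centroids (simple k-means)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: each point joins the cluster with the nearest centroid.
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(members) / len(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters

# Step 1: partition the data into groups based on similarity.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])

# Step 2: only after grouping, assign labels to the groups.
labels = ["low", "high"]
labeled = dict(zip(labels, clusters))
print(labeled)
```

Note that no labels are needed to form the groups; they are attached afterwards, which is the key difference from classification.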