# Statistics

## U-Shape Test using "Two Lines" - A Simple Solution for Discrete IV

Motivation Two-lines Approaches Issue with Discrete Dependent Variable Motivation In this paper by Uri Simonsohn (2017), the author proposed a noval method to test U-Shape relationship. In the literature, the popular way of testing U-shapeness relationship between x and y is to add a quadratic term in the regression (y=\beta_0+\beta_1 x + \beta_2 x^2 +\epsilon) ((\epsilon) is an i.i.d noise). If (\beta_1) is statistitally significant, then the relationship betewen x and y are U-shape.

## Personalized Data Summary Function Using "data.table"

One function I miss about Stata is its tabstat. By using just one line code, it can produce very useful summary statistics such as mean, and standard error by groups by conditions. R has its own built-in summary function – summary(), too, but in most cases in my research, I found the summaries produced is barely useful. Consider the following pseudo-data: library(data.table) set.seed(10) N = 120 DT = data.table(x = rnorm(N,1), y = rnorm(N,2), category = sample(letters[1:3], N, replace = T)) DT[1:10] ## x y category ## 1: 1.

## Is it the end of the world if $\alpha=0.005$ is the new norm?

In this paper by Benjamin et al (2017) on redefining statistical significance, they proposed to change the default P-value threshold for statistical significance from 0.05 to 0.005 for claims of new discoveries. That is the proposed p-value is one tenth of the conventional one!! Suppose the world changed to p=0.005. Do we need 10X more sample? As a researcher without sufficient funding, we care about how much additional sample we need suppose our hypothesis is true.