R

Personalized Data Summary Function Using "data.table"

One function I miss about Stata is its tabstat. By using just one line code, it can produce very useful summary statistics such as mean, and standard error by groups by conditions. R has its own built-in summary function – summary(), too, but in most cases in my research, I found the summaries produced is barely useful. Consider the following pseudo-data: library(data.table) set.seed(10) N = 120 DT = data.table(x = rnorm(N,1), y = rnorm(N,2), category = sample(letters[1:3], N, replace = T)) DT[1:10] ## x y category ## 1: 1.

Is it the end of the world if $\alpha=0.005$ is the new norm?

In this paper by Benjamin et al (2017) on redefining statistical significance, they proposed to change the default P-value threshold for statistical significance from 0.05 to 0.005 for claims of new discoveries. That is the proposed p-value is one tenth of the conventional one!! Suppose the world changed to p=0.005. Do we need 10X more sample? As a researcher without sufficient funding, we care about how much additional sample we need suppose our hypothesis is true.

Anonymize Individuals using digest()

When requesting individual level data from others (a company or a government agency), we usually need to properly anomymize the individuals to protect their privacy. The following is an example: (Data = data.frame(Name = c("John Smith", "Jenny Ford","Vivian Lee"), Secret = c("Hate dog","Afraid of ghost","A bathroom dancer"))) ## Name Secret ## 1 John Smith Hate dog ## 2 Jenny Ford Afraid of ghost ## 3 Vivian Lee A bathroom dancer One simple way is we can just drop the Name, and only keep the Secret since we are more interested in their secrets.