Data Science

U-Shape Test using "Two Lines" - A Simple Solution for Discrete IV

In this paper by Uri Simonsohn (2017), the author proposes a novel method to test for a U-shaped relationship. In the literature, the popular way of testing for a U-shaped relationship between x and y is to add a quadratic term to the regression \(y=\beta_0+\beta_1 x + \beta_2 x^2 +\epsilon\) (where \(\epsilon\) is i.i.d. noise). If \(\beta_2\), the coefficient on the quadratic term, is statistically significant, then the relationship between x and y is taken to be U-shaped.
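As a rough illustration of the quadratic-term approach described above (not the two-lines method itself), the regression could be fitted in R as follows. The simulated data, the seed, and the variable names are assumptions made only for this sketch.

```r
# Simulated data (hypothetical): a true U-shape plus i.i.d. noise
set.seed(1)
n <- 500
x <- runif(n, min = -2, max = 2)
y <- x^2 + rnorm(n)

# Quadratic specification: y = beta_0 + beta_1 * x + beta_2 * x^2 + epsilon
fit <- lm(y ~ x + I(x^2))
coefs <- summary(fit)$coefficients

# Estimate and p-value for the coefficient on the squared term
beta2_hat <- coefs["I(x^2)", "Estimate"]
beta2_p <- coefs["I(x^2)", "Pr(>|t|)"]
print(coefs)
```

A positive and significant estimate on the squared term is what this approach reads as evidence of a U-shape.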

Fake News Consumption and Segregation on Twitter

To form accurate beliefs about the world (e.g., whether the earth is flat or a sphere, whether vaccination causes autism, etc.), people must encounter diverse views and opinions, which will sometimes contradict their pre-existing views. Many scholars are concerned that the emergence of the internet, and especially of social media, reduces the cost of acquiring information from a wide range of sources, making it easier for consumers to self-segregate and limit themselves to information sources that are likely to confirm their views.

Anonymize Individuals using digest()

When requesting individual-level data from others (a company or a government agency), we usually need to properly anonymize the individuals to protect their privacy. The following is an example:

    (Data = data.frame(Name = c("John Smith", "Jenny Ford", "Vivian Lee"),
                       Secret = c("Hate dog", "Afraid of ghost", "A bathroom dancer")))
    ##         Name            Secret
    ## 1 John Smith          Hate dog
    ## 2 Jenny Ford   Afraid of ghost
    ## 3 Vivian Lee A bathroom dancer

One simple way is to drop the Name column and keep only Secret, since we are more interested in their secrets.
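Going by the title, an alternative to dropping the names is to replace each one with a hash from digest(), so that records for the same person can still be linked. The following is a minimal sketch under stated assumptions, not necessarily the approach the full post takes: the helper anonymize_name, the salt value, and the choice of SHA-256 are all hypothetical. It requires the digest package.

```r
library(digest)

Data <- data.frame(Name = c("John Smith", "Jenny Ford", "Vivian Lee"),
                   Secret = c("Hate dog", "Afraid of ghost", "A bathroom dancer"))

# Hypothetical secret salt: without one, hashes of common names could be
# recovered by hashing a list of candidate names and comparing.
salt <- "replace-with-a-private-random-string"

# Hypothetical helper: digest() hashes one object at a time, so apply it
# to each name separately. serialize = FALSE hashes the raw string itself.
anonymize_name <- function(name) {
  digest(paste0(salt, name), algo = "sha256", serialize = FALSE)
}

Anon <- data.frame(
  ID = vapply(as.character(Data$Name), anonymize_name, character(1),
              USE.NAMES = FALSE),
  Secret = Data$Secret
)
print(Anon)
```

The same name always maps to the same ID as long as the salt is unchanged, so the data provider can apply the function across several files before sharing them.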

How much we can learn from Google search data

I just finished the book Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz, which is a highly rated book. The author devotes a great deal of text to Google Trends data. The fun part of reading this book for me was that I could dig up the results on the Google Trends website myself. Here is one example: in the book, the author argues that Google searches reveal that contemporary American parents are far more focused on their sons' intelligence than on their daughters'.