# Data Science

## U-Shape Test using "Two Lines" - A Simple Solution for Discrete IV

In this paper, Uri Simonsohn (2017) proposes a novel method for testing a U-shaped relationship. In the literature, the popular way to test for a U-shaped relationship between \(x\) and \(y\) is to add a quadratic term to the regression \(y = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon\), where \(\epsilon\) is i.i.d. noise. If \(\beta_2\), the coefficient on the quadratic term, is statistically significant, the relationship between \(x\) and \(y\) is taken to be U-shaped.
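The quadratic-term test described above can be sketched in a few lines of R. The data here are simulated purely for illustration (the sample size, coefficients, and variable names are assumptions, not values from the paper), and the sketch shows only the conventional quadratic regression, not the two-lines method itself.

```r
# Simulated data with a true U-shape (illustrative assumption)
set.seed(1)
x <- runif(500, min = -2, max = 2)
y <- 1 + 0.5 * x + 2 * x^2 + rnorm(500)

# Quadratic regression: y = b0 + b1 * x + b2 * x^2 + e
fit <- lm(y ~ x + I(x^2))

# Estimate and p-value for the coefficient on the squared term
quad <- summary(fit)$coefficients["I(x^2)", ]
quad[c("Estimate", "Pr(>|t|)")]
```

A positive and significant coefficient on the squared term is what the conventional approach reads as evidence of a U-shape.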

## Fake News Consumption and Segregation on Twitter

To form accurate beliefs about the world (e.g., whether the earth is flat or a sphere, or whether vaccination causes autism), people must encounter diverse views and opinions, which will sometimes contradict their pre-existing views. Many scholars are concerned that the emergence of the internet, and especially of social media, reduces the cost of acquiring information from a wide range of sources, making it easier for consumers to self-segregate and limit themselves to the information sources that are likely to confirm their views.

## Anonymize Individuals using digest()

When requesting individual-level data from others (a company or a government agency), we usually need to properly anonymize the individuals to protect their privacy. The following is an example:

```r
(Data = data.frame(Name = c("John Smith", "Jenny Ford", "Vivian Lee"),
                   Secret = c("Hate dog", "Afraid of ghost", "A bathroom dancer")))
##         Name            Secret
## 1 John Smith          Hate dog
## 2 Jenny Ford   Afraid of ghost
## 3 Vivian Lee A bathroom dancer
```

One simple approach is to drop the Name column and keep only Secret, since the secrets are what we are interested in.
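When the records still need to be told apart, an alternative to dropping the names is to replace each one with a hash, which is what the `digest()` function in the title is for. The following is a minimal sketch, assuming the `digest` package is installed; the SHA-256 algorithm and the `ID` column name are illustrative choices, not necessarily the ones used in the full post.

```r
library(digest)  # assumes the digest package is installed

Data <- data.frame(
  Name = c("John Smith", "Jenny Ford", "Vivian Lee"),
  Secret = c("Hate dog", "Afraid of ghost", "A bathroom dancer"),
  stringsAsFactors = FALSE
)

# digest() hashes one object at a time, so apply it name by name.
# serialize = FALSE hashes the raw string rather than the R object.
Data$ID <- vapply(
  Data$Name,
  function(name) digest(name, algo = "sha256", serialize = FALSE),
  character(1),
  USE.NAMES = FALSE
)

# Drop the original names and keep only the hashed identifier
Data$Name <- NULL
Data
```

Because the same name always produces the same hash, records for one individual can still be linked across tables without exposing the name itself.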

## How much we can learn from Google search data

I just finished the book Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz, which is a highly rated book. The author devotes a great deal of the text to Google Trends data. The fun part of reading this book, for me, was that I could dig up the results on the Google Trends website myself. Here is one example: in the book, the author argues that Google searches reveal that contemporary American parents are far more focused on their sons' intelligence than on their daughters'.