How to split a sample according to a certain variable in Stata?

I'd like to split a sample according to a specific variable, creating four sub-samples, one for each quartile of the variable's distribution. The aim is to show that the level of this variable influences the outcome of a regression, making it significant or not.

asked Sep 25, 2012 at 16:20

2 Answers

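The easiest way to do this is to use the cut() function of the egen command with the group(4) option, which splits your variable into four groups of roughly equal frequency (quartile groups) rather than four equally spaced intervals.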

    . sysuse auto, clear
    (1978 Automobile Data)

    . sum price, detail

                                Price
    -------------------------------------------------------------
          Percentiles      Smallest
     1%         3291           3291
     5%         3748           3299
    10%         3895           3667       Obs                  74
    25%         4195           3748       Sum of Wgt.          74

    50%       5006.5                      Mean           6165.257
                            Largest       Std. Dev.      2949.496
    75%         6342          13466
    90%        11385          13594       Variance        8699526
    95%        13466          14500       Skewness       1.653434
    99%        15906          15906       Kurtosis       4.819188

    . egen price_cut = cut(price), group(4)

    . table price_cut, contents(n price min price max price)

    ----------------------------------------------
    price_cut |   N(price)  min(price)  max(price)
    ----------+-----------------------------------
            0 |         18       3,291       4,187
            1 |         19       4,195       4,934
            2 |         18       5,079       6,303
            3 |         19       6,342      15,906
    ----------------------------------------------
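Once the grouping variable exists, the regression from the question can be run separately within each group. The following is only a sketch under assumptions: the model `regress mpg weight` and the variable name `price_q` are illustrative stand-ins, not anything the question specifies. It uses `xtile` as an alternative to `egen ... cut()`; `xtile` codes the groups 1 to 4 instead of 0 to 3, and the two commands may assign observations at the boundaries slightly differently.

```stata
sysuse auto, clear

* Quartile groups of price, coded 1-4 (price_q is a hypothetical name)
xtile price_q = price, nq(4)

* Check group sizes and price ranges
tabstat price, by(price_q) statistics(n min max)

* Illustrative model only: fit the same regression within each quartile group
forvalues q = 1/4 {
    display _newline "Price quartile group `q'"
    regress mpg weight if price_q == `q'
}
```

Comparing the coefficient and its p-value across the four sets of output shows whether the relationship changes with the level of the splitting variable. Keep in mind that each sub-sample here has only about 18 or 19 observations, so a loss of significance may partly reflect the smaller sample rather than a real difference between groups.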