* user = your-id
* password = your-password
* package = stata
* project = lis

***Variable selection and data preparation***
local ccyy ""   // enter here the country-year identifier, e.g. "us19"
local hhvars "did hid hwgt nhhmem nhhmem17 dhi"
use `hhvars' using $`ccyy'h, clear

* select only records if dhi filled
drop if dhi==.

***Bottom and top coding / outlier detection***
* create disposable household income in logs
gen dhi_log=log(dhi)
* keep negatives and 0 in the overall distribution of non-missing dhi
replace dhi_log=0 if dhi_log==. & dhi!=.
* detect interquartile range
qui sum dhi_log [w=hwgt],de
gen iqr=r(p75)-r(p25)
* detect upper and lower bounds for extreme values
gen upper_bound=r(p75) + (iqr * 3)
gen lower_bound=r(p25) - (iqr * 3)
* top code income at upper bound for extreme values
replace dhi=exp(upper_bound) if dhi>exp(upper_bound)
* bottom code income at lower bound for extreme values
replace dhi=exp(lower_bound) if dhi<exp(lower_bound)

* [...] lines missing from the source: ey and cwt are created here, and ey is
* summarized with the detail option so that r(p50) holds its median

* flag records at or below 50%, 75% and 150% of the median; each flag is 0/1
* so that its weighted mean is the share of children at or below that line
quietly generate pov50=(ey<=r(p50)*.5) if ey!=.
quietly generate pov75=(ey<=r(p50)*.75) if ey!=.
quietly generate pov150=(ey<=r(p50)*1.5) if ey!=.

* store values for the percentage poor
quietly sum pov50 [w=cwt]
scalar define percpoor50=r(mean)
quietly sum pov75 [w=cwt]
scalar define percpoor75=r(mean)
quietly sum pov150 [w=cwt]
scalar define percpoor150=r(mean)

***Distribution of Children by Income Group***
display "Distribution of Children by Income Group (50-75%) = " percpoor75-percpoor50
display "Distribution of Children by Income Group (75-150%) = " percpoor150-percpoor75
display "Distribution of Children by Income Group (above 150%) = " 1-percpoor150
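
***Sketch: equivalised income and child weights (assumptions)***
* The poverty flags above use ey and cwt and read the median from r(p50), but
* the lines that create them are not shown in this file. The sketch below is
* one way they could be defined from the variables loaded in hhvars. The
* square-root equivalence scale, the person weight wt (a hypothetical name)
* and the child weight formula are all assumptions, not taken from this file.
* The lines are left commented out; if used, they belong before the pov50 line.
* gen ey=dhi/sqrt(nhhmem)      // assumed: square-root equivalence scale
* gen wt=hwgt*nhhmem           // assumed: household weight times household size
* gen cwt=hwgt*nhhmem17        // assumed: household weight times members aged 17 or younger
* quietly sum ey [w=wt], de    // leaves the weighted median of ey in r(p50)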