


Maybe I should point out that I'm using Stata 13. To summarize what I wanted to do: I checked a number of files and noticed that "key variables" like HHID or CASEID contained some mistyped observations, so I decided to eliminate them because they would probably cause problems when merging these files into a single one.

Let me give you more details. HHID is a household identification variable that uses six numeric digits, stored in a string format of 15 characters, while CASEID is a personal identification variable whose first part uses six digits, followed by a blank, and whose last part uses one or two digits (these two positions can be filled with a number between 0 and 20), all stored in a string format of 18 characters. The HHID variable has fewer observations than CASEID because CASEID refers in detail to household members. In both cases, the first six positions on the left are blank.

Afterwards, when I merge them, I will only require the first digits of CASEID to match HHID. That is why I thought it would be good to keep the six digits from each "key variable" but also change the format in which they were stored.

    gen byte HHID_ok = inrange(HHID_check, 0, 999999) & HHID_check == int(HHID_check)
    gen byte CASEID_ok = inrange(part1, 0, 999999) & part1 == int(part1) ///
        & inrange(part2, 1, 20) & part2 == int(part2)

The logic is this. The first two commands eliminate any leading or trailing blanks, and reduce any sequences of internal blanks to a single blank. This eliminates the difficulties created by varying numbers of blank spaces. The second two commands deal with HHID by forcing Stata to interpret it as a number. The HHID is valid if the resulting number is an integer between 0 and 999999. The final two commands similarly deal with CASEID, except that now there are two "parts" to deal with. The first is checked for being an integer between 0 and 999999 (just like HHID), and the second part is checked for being an integer between 1 and 20.

Note that this is not tested, and it may contain some typos.
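One possible full sequence matching the six commands described above is sketched below. It is a reconstruction from the description, not the original code, and it has not been run on the actual data. It assumes HHID and CASEID are string variables. The names HHID_check, part1 and part2 come from the lines quoted in the thread; the use of trim(), itrim(), destring and split for the remaining steps is an assumption.

```stata
* Sketch only: assumes HHID and CASEID are string variables.
* The trim/destring/split steps are assumed, not taken from the thread.

* 1. Remove leading/trailing blanks; collapse internal runs of blanks to one.
replace HHID   = itrim(trim(HHID))
replace CASEID = itrim(trim(CASEID))

* 2. Force HHID to numeric; values that are not numbers become missing.
destring HHID, generate(HHID_check) force
gen byte HHID_ok = inrange(HHID_check, 0, 999999) & HHID_check == int(HHID_check)

* 3. Split CASEID at the blank into part1 and part2, forcing each to numeric.
split CASEID, generate(part) destring force
gen byte CASEID_ok = inrange(part1, 0, 999999) & part1 == int(part1) ///
    & inrange(part2, 1, 20) & part2 == int(part2)
```

The `///` line continuation only works in a do-file, so this should be run from one rather than typed at the command line. Afterwards, something like `list HHID if !HHID_ok` or `tabulate CASEID_ok` shows which observations failed the check. Missing values fail automatically, because inrange() returns 0 for them. If no CASEID contains a blank, split will not create part2 and the last command will stop with an error.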
