SAS Final

Card Set Information

2014-10-18 13:23:06
Programming SAS

For Judy

  1. A SAS statement begins with a _____ and ends with a _______.
    • keyword
    • ; (semicolon)
  2. The three SAS programming windows are:
    • Log
    • Editor
    • Output
  3. LS (LINESIZE=), PS (PAGESIZE=), NUMBER are:
    statements or system options?
    system options
  4. What are the two types of SAS comments?
    Statement and group
  5. A comment statement starts with ____ and ends with ____
    A group comment starts with ____ and ends with ____
    • *      ;
    • /*   */
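A short sketch of both comment styles inside a small program. The data set name `work.example` is only illustrative:

```sas
* This is a comment statement: it starts with an asterisk and ends with a semicolon;

/* This is a group (block) comment.
   It can span several lines and may contain semicolons; */

data work.example;   /* a group comment can also follow code on the same line */
  x = 1;
run;
```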
  6. A ViewTable shows SAS data with rows corresponding to _____ and columns corresponding to _______.
    • observations
    • variables
  7. T/F: A SAS program is a sequence of statements executed in order.
    True
  8. T/F: A SAS statement must always start on a new line.
    False (statements are free-format; several can share one line)
  9. T/F: SAS keywords may be written in upper case only.
    False (SAS keywords are not case-sensitive)
  10. T/F: A SAS library is where SAS statements are entered.
    False (statements are entered in the Editor; a library stores SAS data sets)
  11. T/F: Related SAS statements are grouped together in a SAS step.
    True
  12. T/F: An OPTIONS statement does not produce output directly.
    True
  13. T/F: SAS variable names may contain any printable character found on the PC keyboard.
    False (by default, only letters, digits, and underscores)
  14. T/F: A SAS program always ends with an ENDSAS statement.
    False
  15. T/F: Forgetting a semicolon (;) at the end of a SAS statement is a common mistake when writing SAS programs.
    True
  16. Two basic parts of a SAS program
    Data step & Procedure step
  17. Data step does what? (6)
    • Inputs raw data
    • processes data observation-wise
    • creates new variables
    • performs calculations
    • creates a SAS data set
    • statements executed in order
  18. Procedure step does what?
    • Processes data
    • produces output or other results
    • all statements are taken together to define the task, then executed
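A minimal sketch of the two parts working together. The data values and the names `work.plots`, `plot`, `height`, `yield`, and `ratio` are hypothetical, chosen only for illustration:

```sas
/* DATA step: read raw data (illustrative values), create a new variable, build a SAS data set */
data work.plots;
  input plot height yield;
  ratio = yield / height;     /* new variable from a calculation */
  datalines;
1 12.5 30.1
2 9.8 28.0
3 11.2 31.4
;
run;

/* PROC step: process the data set and produce output */
proc print data=work.plots;
run;
```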
  19. SAS data sets:
    • are stored in libraries
    • consist of variables and observations
  20. Variable name rules:
    • up to 32 characters
    • letters (upper/lower case), digits, or underscores
    • start with _ or a letter
    • (variables SAS creates automatically, e.g. _N_ and _ERROR_, usually begin with _)
  21. Variable types:
    • character (text)
    •   default length = 8 bytes (list input)
    • numeric
    •   default length = 8 bytes = 64 bits
  22. Numeric variables
    • length 64 bits (8 bytes)
    • Numbers are stored in floating point: a mantissa (significant digits) and an exponent (a power of 2)
    • Precision: ~15 significant decimal digits; range: roughly 10^-308 to 10^308
    • Some values can't be stored exactly, e.g. SQRT(2) or 1/3
  23. Special-case numeric variables
    • Dates: number of days since January 1, 1960
    • Times: number of seconds since midnight
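A minimal sketch of how the special numeric values described above look in practice (the variable names are illustrative). Date and time constants are stored as day and second counts, and SQRT(2) is compared with a tolerance because it cannot be stored exactly:

```sas
data _null_;
  /* Date values count days from January 1, 1960 */
  d0 = '01JAN1960'd;        /* stored as 0 */
  d1 = mdy(1, 2, 1960);     /* stored as 1 */
  /* Time values count seconds from midnight */
  t  = '01:00:00't;         /* stored as 3600 */
  put d0= d1= t=;           /* written to the Log */

  /* SQRT(2) is stored approximately, so compare with a tolerance, not with = */
  x = sqrt(2);
  if abs(x*x - 2) < 1e-12 then put 'x*x equals 2 within the tolerance';
run;
```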
  24. formdlim='-'
    formdlim=''
    • Row of dashes (-) instead of a page break
    • Normal page break (the default)
  25. leftmargin=n<in|cm> (e.g. leftmargin=1in)
    sets the width of the left margin
  26. RUN; or QUIT;
    statement at the end of a step (and of the program) so that the last statements in the file are executed; QUIT; ends procedures that stay active after RUN; (e.g. GPLOT, GLM, REG)
  27. PROC OPTIONS GROUP=LISTCONTROL;
    prints option names, descriptions, and current values to the Log window
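A sketch of setting the system options mentioned above in an OPTIONS statement, then listing the LISTCONTROL group. The option values shown are illustrative, not recommendations:

```sas
/* System options are set with an OPTIONS statement (illustrative values) */
options ls=80 ps=60 nonumber formdlim='-' leftmargin=1in;

/* Write LISTCONTROL option names, descriptions, and current values to the Log */
proc options group=listcontrol;
run;
```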
  28. DATA step processing: general process
    • Execution phase:
    • Implied DO loop: initializes the iteration counter (_N_)
    • End of file?
    • Yes: close the file, advance to the next step
    • No: read a record, perform additional processing, output the observation to the data set, return to the top of the DATA step
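A minimal sketch of the implied DO loop: each pass reads one record, and `_N_` counts the passes. The data set name and values are hypothetical:

```sas
data work.demo;
  input x;
  iteration = _n_;    /* _N_ = number of times the DATA step has iterated */
  put _n_= x=;        /* one Log line per iteration */
  datalines;
5
7
9
;
run;
```

When no records remain, the step stops and SAS moves on to the next step.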
  29. How SAS reads data
    Converts the characters in the input buffer into character and numeric variable values
  30. Data processing: reading data (INFILE options)
    • dsd: comma-delimited file; two consecutive commas mean a missing value
    • firstobs=n: start reading data at line n
    • truncover: when a line has fewer values than variables, keeps a partial (short) last value and sets the rest to missing
    • missover: when a line has fewer values than variables, sets the remaining variables to missing
    • lrecl=n: sets the logical record length, so longer lines (more columns) can be read
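A sketch of several of these INFILE options applied to in-stream data. The data set, variables, and values are hypothetical. LRECL= matters mainly for external files, so it appears only in a comment with a made-up file name:

```sas
data work.scores;
  /* DSD: commas separate values; two commas in a row mean a missing value */
  /* FIRSTOBS=2: skip the header line                                       */
  /* TRUNCOVER: a short line leaves the remaining variables missing instead */
  /*            of continuing onto the next line                            */
  /* External-file form (hypothetical path):                                */
  /*   infile 'scores.csv' dsd firstobs=2 truncover lrecl=500;              */
  infile datalines dsd firstobs=2 truncover;
  input name $ score1 score2;
  datalines;
name,score1,score2
Ann,90,85
Bob,,77
Cal,88
;
run;
```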
  31. INPUT statement forms
    • List: var1 var2 var3 (put $ after a character variable)
    • Column: var1 $ col1-col2 var2 col3-col4
    • Formatted (informat after each variable): var1 informat1 var2 informat2
    • Formatted (grouped): (var1 var2) (informat1 informat2)
    • Delimited: INFILE CARDS DSD; with list input
    • ** Don't mix forms
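A sketch of column input and formatted input reading the same hypothetical records. The variety names and values are made up. With these two forms, the spacing of the data lines matters, so the records are aligned to the stated columns (variety in 1-6, height in 8-11, yield in 13-16):

```sas
/* Column input: name, $ for character, then start-end columns */
data work.col;
  input variety $ 1-6 height 8-11 yield 13-16;
  datalines;
Alpha1 12.5 30.1
Beta2  9.75 28.0
;
run;

/* Formatted input: an informat after each variable (5. also reads the leading blank) */
data work.fmt;
  input variety $6. height 5. yield 5.;
  /* grouped equivalent: input variety $6. (height yield) (5. 5.); */
  datalines;
Alpha1 12.5 30.1
Beta2  9.75 28.0
;
run;
```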
  32. @ & @@
    • @ (trailing @): input for the observation is not complete; hold the line for another INPUT statement
    • @@ (double trailing @): the observation is complete, but keep reading from the same line for the next observation
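A minimal sketch of both line-hold specifiers, using hypothetical in-stream data:

```sas
/* Trailing @: hold the line so a second INPUT in the same iteration can finish it */
data work.mixed;
  input type $ @;
  if type = 'A' then input a1 a2;
  else input b1;
  datalines;
A 1 2
B 3
;
run;

/* Double trailing @@: hold the line across iterations (several observations per line) */
data work.pairs;
  input x y @@;
  datalines;
1 2 3 4 5 6
;
run;   /* 3 observations: (1,2) (3,4) (5,6) */
```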
  33. Informats and formats
    • all names include a period (.)
    • Character: start with $; can add a width, e.g. $10.
    • Numeric: w.d, e.g. 10. or 10.2; COMMA10.2
    • Date: DDMMYY6.
  34. Label
    LABEL var1 = 'blah blah';
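A sketch, with one made-up record, of informats reading raw text, formats controlling the printed value, and a LABEL statement. `work.sales`, `saledate`, and `amount` are hypothetical names:

```sas
data work.sales;
  /* informats read raw text: DDMMYY10. for the date, COMMA10. for 1,234.50 */
  input saledate ddmmyy10. +1 amount comma10.;
  /* formats control how the stored values are written */
  format saledate date9. amount comma10.2;
  label saledate = 'Date of sale'
        amount   = 'Sale amount ($)';
  datalines;
18/10/2014 1,234.50
;
run;

/* LABEL option: print column headings from the labels */
proc print data=work.sales label;
run;
```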
  35. Expressions
    • Define new variables
    • use current data
    • operators: * + - /
    • functions: LOG, SQRT, ROUND, etc.
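    • missing values in an expression give a missing result (with a note in the Log)
    • a math error (e.g. SQRT of a negative number) gives a missing value and a message in the Log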
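A minimal sketch of expressions creating new variables, including what happens with a missing value and with a math error (hypothetical data):

```sas
data work.calc;
  input a b;
  total = a + b;              /* missing if a or b is missing */
  ratio = round(a / b, 0.01); /* ROUND to the nearest 0.01 */
  root  = sqrt(a);            /* negative a: missing value plus a Log note */
  datalines;
10 4
. 3
-4 2
;
run;
```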
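  36. Array
    • an ordered group of similar variables (all numeric or all character) referenced by one name
    • used to process the variables with the same statements (usually in a DO loop); can also set the order of new variables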
  37. DO:
    provides grouping of statements for (repeated) execution; the group ends with END;
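A sketch combining an ARRAY with an iterative DO group, converting three hypothetical Fahrenheit readings to Celsius. The array and variable names are illustrative:

```sas
data work.converted;
  input t1 t2 t3;               /* illustrative temperatures in degrees F */
  array tf{3} t1 t2 t3;         /* existing variables */
  array tc{3} c1 c2 c3;         /* new variables, created in this order */
  do i = 1 to 3;                /* DO group repeated for each array element */
    tc{i} = (tf{i} - 32) * 5 / 9;
  end;
  drop i;
  datalines;
32 50 212
;
run;
```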
  38. IF (comparison expression) THEN (statement); ELSE (statement);
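A minimal sketch of IF-THEN/ELSE; ELSE begins its own statement. The cutoff of 60 and the data are hypothetical:

```sas
data work.graded;
  input score;
  if score >= 60 then result = 'pass';
  else result = 'fail';
  datalines;
75
42
;
run;
```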
  39. DATA AllYears;
    SET Year1 Year2;
    • concatenates: all observations from Year1, followed by all observations from Year2, make AllYears
    • adding a BY statement interleaves the observations in BY order (both data sets must be sorted by the BY variables)
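A sketch of concatenating and interleaving with SET, using two small hypothetical data sets:

```sas
data work.year1;
  input year plot yield;          /* illustrative data */
  datalines;
2013 1 30
2013 2 28
;
run;

data work.year2;
  input year plot yield;
  datalines;
2014 1 33
2014 2 31
;
run;

/* Concatenate: all of Year1, then all of Year2 */
data work.allyears;
  set work.year1 work.year2;
run;

/* Interleave: with BY, observations come out in PLOT order
   (each input data set must already be sorted by PLOT) */
proc sort data=work.year1; by plot; run;
proc sort data=work.year2; by plot; run;

data work.interleaved;
  set work.year1 work.year2;
  by plot;
run;
```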
  40. DATA AllData;
    MERGE Height Yield;
    one-to-one merge: reads Height and Yield observations in parallel (1st with 1st, 2nd with 2nd, ...)
  41. Data AllData;
    Merge Height Yield;
    By Year Variety Plot;
    • Usually has a by statement
    • input data sets must be sorted in BY variable order
    • The BY variable values must match to combine observations
    • can combine: one-to-one, one-to-many, or many-to-one
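A sketch of a match-merge using hypothetical Height and Yield data sets whose rows are stored in different orders. Sorting both by the BY variables first is what makes the observations line up. A one-to-one MERGE without BY would pair rows purely by position:

```sas
data work.height;
  input year variety $ plot height;   /* illustrative data */
  datalines;
2014 A 1 12.5
2014 A 2 11.8
2014 B 1 9.8
;
run;

data work.yield;
  input year variety $ plot yield;
  datalines;
2014 B 1 28.0
2014 A 1 30.1
2014 A 2 29.4
;
run;

/* Here a one-to-one MERGE (no BY) would pair rows by position,
   so HEIGHT and YIELD from different plots would be combined */

/* Match-merge: sort both by the BY variables, then MERGE ... BY */
proc sort data=work.height; by year variety plot; run;
proc sort data=work.yield;  by year variety plot; run;

data work.alldata;
  merge work.height work.yield;
  by year variety plot;
run;
```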
  42. PUT variable=;
    • good way to find errors
    • writes the variable name and value to the Log (or to the file named in a FILE statement)
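A minimal sketch of PUT for checking values, assuming the SASHELP.CLASS sample data set that ships with SAS (variables Name, Sex, Age, Height, Weight):

```sas
data _null_;
  set sashelp.class(obs=3);   /* first three observations only */
  put name= age= height=;     /* name=value pairs written to the Log */
  put _all_;                  /* every variable, plus _N_ and _ERROR_ */
run;
```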
  43. CLASS variables
    identify groups, categories, or classes of data (treatments) which are considered together, as parts of a whole to compute statistics
  44. VAR variables
    numeric variables to analyze; values are usually continuous dependent (response) variables
  45. CONTENTS procedure
    describes features of variables in a SAS data set or objects in a class library
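A sketch of PROC CONTENTS, assuming the SASHELP.CLASS sample data set. The second step lists every member of the WORK library:

```sas
/* Variables, types, lengths, formats, and labels of one data set */
proc contents data=sashelp.class;
run;

/* Every member of the WORK library; NODS suppresses per-data-set detail */
proc contents data=work._all_ nods;
run;
```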
  46. FORMAT procedure
    • defines informats for reading variables or formats for writing variables
    • run it before any DATA or PROC step that uses the informats or formats
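A sketch of PROC FORMAT defining two hypothetical formats (`$sexfmt` and `agegrp`) and applying them in PROC PRINT, assuming the SASHELP.CLASS sample data set:

```sas
proc format;
  value $sexfmt 'F' = 'Female'      /* character format: name starts with $ */
                'M' = 'Male';
  value agegrp  low-12  = 'Child'   /* numeric ranges */
                13-high = 'Teen';
run;

proc print data=sashelp.class;
  format sex $sexfmt. age agegrp.;  /* use the formats with a trailing period */
run;
```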
  47. Where
    • selects input observations from an existing data set
    • WHERE conditional expression
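A minimal sketch of WHERE in a PROC step and in a DATA step, assuming the SASHELP.CLASS sample data set. The selection conditions are arbitrary:

```sas
proc print data=sashelp.class;
  where sex = 'F' and age >= 13;
run;

/* WHERE also selects observations read with SET in a DATA step */
data work.teens;
  set sashelp.class;
  where age >= 13;
run;
```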
  48. ID statement
    identifies observations on some output reports
  49. FREQ statement
    each observation represents n observations, where n is the value of the FREQ variable
  50. SORT procedure
    • sorts SAS data sets
    • changes the order of the observations in a data set
    • missing values sort low
  51. PRINT procedure
    • prints a SAS data set
    • can have VAR, BY, PAGEBY, ID, FORMAT statements
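A sketch combining PROC SORT with PROC PRINT and its BY, ID, VAR, and FORMAT statements, assuming the SASHELP.CLASS sample data set. SASHELP is read-only, so OUT= writes the sorted copy to a hypothetical WORK data set:

```sas
proc sort data=sashelp.class out=work.class_sorted;
  by sex descending height;
run;

proc print data=work.class_sorted;
  by sex;                      /* a separate group for each value of SEX */
  id name;                     /* NAME replaces the Obs column */
  var age height weight;
  format height weight 6.1;
run;
```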
  52. MEANS procedure
    computes simple statistics to the Output window (default) and/or a SAS data set
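A sketch of PROC MEANS with the CLASS and VAR roles described above, assuming the SASHELP.CLASS sample data set. The statistic keywords and output names are illustrative choices:

```sas
proc means data=sashelp.class n mean std min max;
  class sex;                   /* groups */
  var height weight;           /* numeric analysis variables */
  output out=work.class_means mean=mean_height mean_weight;
run;
```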
  53. PLOT procedure
    prints a character-based scatter plot (scattergram) to the Output window
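A minimal sketch of PROC PLOT, assuming the SASHELP.CLASS sample data set:

```sas
proc plot data=sashelp.class;
  plot weight*height = sex;    /* vertical*horizontal; symbol = first character of SEX */
run;
quit;
```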
  54. GOPTIONS statement
    • sets global options for graphics output
    • options include RESET=ALL|GOPTIONS, DEVICE=WIN, COLORS=(color-list), HSIZE= and VSIZE=n <unit>, HTEXT=n <unit>, FTEXT=font-name
  55. AXIS statement
    • global statement for graphics
    • defines axis parameters
    • Options (list ORDER= first): ORDER=(value-list), LABEL=NONE | (ANGLE=degrees COLOR=color FONT=font 'text')
  56. TITLE statement
    adds a title to output; for graphics output it also accepts options such as HEIGHT=, FONT=, and COLOR=
  57. SYMBOL statement
    • global statement for graphics
    • defines symbol parameters
    • options: VALUE=symbol-name, INTERPOL=NONE | JOIN | RL (regression line; RL0 forces it through the origin) | etc., LINE=line-type number
  58. GPLOT procedure
    • produces scatter plots in the Graph window
    • options: OVERLAY, REGEQN, HAXIS=AXISn | (value-list)
    • RUN; (required)
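A sketch tying together GOPTIONS, TITLE, AXIS, SYMBOL, and PROC GPLOT, assuming SAS/GRAPH is licensed and the SASHELP.CLASS sample data set is available. The sizes, axis ranges, and titles are illustrative:

```sas
goptions reset=all hsize=6in vsize=4in htext=1.2;

title1 height=1.5 'Weight vs. height';
axis1 order=(50 to 75 by 5) label=('Height (in)');
axis2 order=(50 to 150 by 25) label=(angle=90 'Weight (lb)');
symbol1 value=dot interpol=rl line=1;   /* RL = linear regression line */

proc gplot data=sashelp.class;
  plot weight*height / haxis=axis1 vaxis=axis2 regeqn;
run;
quit;
```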
  59. UNIVARIATE procedure
    • output sample distribution statistics
    • class
    • var
    • histogram(opt: normal, kernel)
    • CDFPLOT(opt: normal)
    • probplot(opt: normal, sigma, line)
    • QQPLOT(opt: normal, sigma, line)
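A sketch of PROC UNIVARIATE with the plot statements listed above, assuming the SASHELP.CLASS sample data set. Estimating MU and SIGMA from the data is one of several possible choices:

```sas
proc univariate data=sashelp.class;
  class sex;
  var height;
  histogram height / normal kernel;           /* fitted normal and kernel density */
  cdfplot   height / normal;
  probplot  height / normal(mu=est sigma=est); /* reference line from estimates */
  qqplot    height / normal(mu=est sigma=est);
run;
```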
  60. STANDARD procedure
    • Reassign variable values according to a specified mean and/or standard deviation
    • Options: out = output data set, mean = n, std = n
    • VAR variable-list
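A minimal sketch, assuming the SASHELP.CLASS sample data set, that rescales two variables to mean 0 and standard deviation 1 (the output name is hypothetical):

```sas
proc standard data=sashelp.class out=work.class_std mean=0 std=1;
  var height weight;
run;
```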
  61. Rank procedure
    • assign rank values to a variable
    • options: out = out data set, descending, ties = high|low|mean
    • VAR
    • RANKS
    • missing data is not ranked
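A minimal sketch, assuming the SASHELP.CLASS sample data set, that ranks heights from tallest to shortest and averages tied ranks (the output names are hypothetical):

```sas
proc rank data=sashelp.class out=work.class_ranked descending ties=mean;
  var height;
  ranks height_rank;           /* new variable holding the ranks */
run;
```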
  62. Ttest Procedure
    • compares the means of 2 groups
    • compares the difference between two means (or one mean) to 0 or a specified value
    • Class
    • Var
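A sketch of a two-sample and a one-sample t test, assuming the SASHELP.CLASS sample data set. The test value H0=60 is arbitrary:

```sas
/* Two-sample t test: mean height of the two SEX groups */
proc ttest data=sashelp.class;
  class sex;
  var height;
run;

/* One-sample t test against a specified value */
proc ttest data=sashelp.class h0=60;
  var height;
run;
```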
  63. Analysis of variance - General Concept
    • Isolate sums of squares, variance due to treatments (class variable values) from the total sums of squares of dependent (response) variable values
    • Leaves pooled error
    • requires homogeneous variance components in the pooled error variation (MS)
  64. GLM procedure
    • General Linear Models
    • solves linear models
    • models may contain class and/or continuous numeric variables
    • GLM can be used for ANOVA, regression, and covariance analysis
    • if model only has class variables = ANOVA
    • if model has no class variables = regression
    • model may contain products of numeric variables or interaction of class variables
  65. PROC GLM
    • Class
    • model
    • contrast
    • random
    • test
    • means
    • LSmeans
    • output
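A sketch of PROC GLM with several of these statements, assuming the SASHELP.CLASS sample data set. Because the model mixes a class variable (SEX) and a continuous one (HEIGHT), this is a covariance analysis. The contrast label and output names are illustrative:

```sas
proc glm data=sashelp.class;
  class sex;
  model weight = sex height;            /* class + continuous effects */
  contrast 'F vs M' sex 1 -1;
  lsmeans sex / pdiff stderr;           /* least squares means, pairwise tests */
  output out=work.glm_out p=predicted r=residual;
run;
quit;
```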
  66. CORR procedure
    • outputs Pearson correlations
    • includes other simple statistics
    • options: nomiss, nosimple
    • Var
    • with variable-list
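A minimal sketch, assuming the SASHELP.CLASS sample data set, that correlates AGE with HEIGHT and WEIGHT:

```sas
proc corr data=sashelp.class nosimple nomiss;
  var height weight;
  with age;                    /* correlations of AGE with each VAR variable */
run;
```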
  67. Regression- general concept
    • determine the relationship between multiple variables
    • how much of the total variation in the dependent variable is explained by the variation in the independent variables
    • independent variables must be continuous numeric
    • least squares: choose the line with the smallest sum of squared residuals (errors)
  68. Assumptions with regression
    • X values were measured without error
    • errors are independent
    • errors are normally distributed with a mean of 0 and variance of σ²
  69. PROC REG
    • model dependents = independents
    • options: NOINT, SS1, SS2, ALPHA=p, CLB, P, CLM, R
    • for multiple regression: SELECTION=BACKWARD | FORWARD | STEPWISE | RSQUARE
    • output
    • plot
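A sketch of PROC REG, assuming the SASHELP.CLASS sample data set: a simple regression with several of the listed options, an OUTPUT data set, and a residual plot, then a second run group with stepwise selection. The variable choices are illustrative:

```sas
proc reg data=sashelp.class;
  /* CLB: confidence limits for parameters; P, CLM, R: predicted values,
     confidence limits for the mean, and residuals */
  model weight = height / clb p clm r;
  output out=work.reg_out p=predicted r=residual;
  plot residual.*predicted.;
run;
  model weight = height age / selection=stepwise;
run;
quit;
```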
  70. MIXED procedure
    • solves linear models with fixed and random effects
    • can solve ANOVA, regression, covariance, etc.
    • computes the expected mean squares and uses them to correctly compute all tests: F-tests, least squares means separation, etc.
    • correctly accommodates unbalanced data, as long as it isn't too unbalanced
    • computes only least squares means, not arithmetic means
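A sketch of PROC MIXED for a randomized block design, with BLOCK as a random effect. The data set `work.trial` and its values are hypothetical:

```sas
data work.trial;
  input block variety $ yield;   /* illustrative data */
  datalines;
1 A 30.1
1 B 28.0
1 C 31.4
2 A 29.5
2 B 27.2
2 C 30.8
3 A 31.0
3 B 28.9
3 C 32.2
;
run;

proc mixed data=work.trial;
  class block variety;
  model yield = variety;         /* fixed effect */
  random block;                  /* random effect */
  lsmeans variety / pdiff;       /* least squares means and their differences */
run;
```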