SAS Final
The flashcards below were created by user MRK on FreezingBlue Flashcards.

A SAS statement begins with a _____ and ends with a _______.

The three SAS programming windows are:

ls, ps, number are:
statements or system options?
system options

What are the two types of SAS comments?
Statement and group

A comment statement starts with ____ and ends with ____
A group comment starts with ____ and ends with ____

A view table shows SAS data with rows corresponding to _____ and columns corresponding to _______.

T/F: A SAS program is a sequence of statements executed in order.
True

T/F: A SAS statement must always start on a new line
False

T/F: SAS keywords may be written in upper case only
False

T/F: A SAS library is where SAS statements are entered
False

T/F: Related SAS statements are grouped together in a SAS step
True

T/F: An OPTIONS statement does not produce output directly
True

T/F: SAS variable names may contain any printable character found on the PC keyboard
False

T/F: A SAS program always ends with an ENDSAS statement
False

T/F: Forgetting a semicolon (;) at the end of a SAS statement is a common mistake when writing SAS programs
True

Two basic parts of a SAS program
Data step & Procedure step

Data Step does what? (5)
 Inputs raw data
 processes data observation by observation
 creates new variables
 performs calculations
 creates a SAS data set
 statements executed in order

Procedure step does what?
 Processes data
 produces output or other results
 all statements are taken together to define the task, then executed

SAS data sets:
 are stored in libraries
 consist of variables and observations

Variable name rules:
 up to 32 characters
 letters (upper/lower case), numbers, or underscores
 start with _ or a letter
 (variables that SAS creates usually have _ around the name, e.g. _N_)

Variable types:
 text
 Default length = 8
 8 bytes = 64 bits
 numeric

Numeric Variables
 length 64 bits (bits = binary digits, powers of 2)
 numbers stored in floating point: mantissa (significant digits) and exponent, in powers of 2
 range: ~15 significant decimal digits, magnitudes up to about 10^{+/-300}
 some values can't be stored exactly: SQRT(2) or 1/3
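
The inexact storage noted above can be seen in a short DATA step. This is only a sketch, assuming a platform that stores numbers as IEEE doubles (Windows or Unix SAS); the variable names and the 1e-9 rounding unit are made up for illustration.

```sas
data _null_;
  third = 1/3;
  total = 0.1 + 0.2;

  /* Show as many digits of 1/3 as the format will print */
  put third= best32.;

  /* A direct comparison can fail because 0.1 and 0.2 are not exact in binary */
  if total = 0.3 then put 'total equals 0.3';
  else put 'total differs from 0.3 in the last bits';

  /* Rounding both sides is one common way to compare such values */
  if round(total, 1e-9) = round(0.3, 1e-9) then put 'equal after rounding';
run;
```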

Special case numeric variables
 Dates: days since January 1, 1960
 Time: seconds since midnight
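
A small sketch of how dates and times are just numbers underneath; the variable names are illustrative only.

```sas
data _null_;
  day0 = '01JAN1960'd;      /* date literal: stored as 0 */
  day1 = mdy(1, 2, 1960);   /* one day later: stored as 1 */
  minute = '00:01:00't;     /* time literal: stored as 60 seconds */

  put day0= day1= minute=;  /* raw numeric values */
  put day1= date9.;         /* same value shown through a date format */
run;
```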

formdlim=''
formdlim=' '
 Row of **** instead of page break
 page break

leftmargin=nnunit
how wide to make the left margin

RUN; or QUIT;
statement at the end of the SAS program to execute the last statements in the file

PROC Options group=listcontrol;
run;
prints option names, descriptions, and current values to the log window

Data step processing: General process
 execution phase:
 implied DO loop: initializes the iteration counter (_N_)
 EOF?
 Yes: close the file, advance to the next step
 No: read a record, perform additional processing, output the observation to the data set, return to the top of the DATA step
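
The implied loop described above can be watched with a tiny DATA step. This is a sketch with made-up variable names: each pass reads one record, runs the statements in order, writes one observation, and returns to the top until the data run out.

```sas
data squares;
  input x;            /* read one record per iteration */
  iteration = _n_;    /* copy the automatic iteration counter */
  square = x**2;      /* additional processing */
  /* the observation is output automatically at the bottom of the step */
  datalines;
2
3
5
;
run;

proc print data=squares;
run;
```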

How SAS reads data
Converts characters in the input buffer to text and numeric variables

Data processing: reading data
Options:
dsd
firstobs=n
truncover
missover
lrecl=n
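 dsd: comma-delimited file (a single comma separates values, so two commas in a row mark a missing value)
 firstobs=n: skip to line n before starting to read data
 truncover: when a line has fewer values than variables, keeps a short final value as partial data and sets the rest to missing
 missover: when a line has fewer values than variables, sets the remaining variables to missing
 lrecl=n: sets the logical record length, so longer lines (more columns) can be read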
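
A sketch of several of these INFILE options used together on in-stream data. The data set name, variables, and values are invented for illustration.

```sas
data crops;
  /* dsd: commas delimit, ",," means missing
     truncover: short lines leave trailing variables missing
     firstobs=2: skip the header line */
  infile datalines dsd truncover firstobs=2;
  input variety :$10. height yield;
  datalines;
variety,height,yield
Alpha,102,5.1
Beta,,4.7
Gamma,98
;
run;

proc print data=crops;
run;
```

In this sketch Beta should get a missing height and Gamma a missing yield rather than SAS moving on to the next line to find a value.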

Input statement forms
 List: var1 var2 var3
 Column: var1 $ coln1-coln2 var2 coln3-coln4
 Formatted (informat): var1 informat1 var2 informat2
 Formatted (informat), grouped: (var1 var2) (informat1 informat2)
 Delimited: infile cards dsd, or list values
 ** Don't mix forms
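
The three main forms side by side, as a sketch with invented names and data. Each step uses a single form, in keeping with the note above.

```sas
/* List input: values separated by blanks */
data list_in;
  input name $ age weight;
  datalines;
Ann 34 130.5
Bob 41 182
;
run;

/* Column input: values taken from fixed column ranges */
data col_in;
  input name $ 1-5 age 7-8;
  datalines;
Ann   34
Bobby 41
;
run;

/* Formatted input: informats give the width and type of each field */
data fmt_in;
  input name $5. +1 visit date9.;
  format visit date9.;
  datalines;
Ann   01JAN2020
Bobby 15MAR2021
;
run;
```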

@ & @@
 @: input is not complete; hold the line so a later INPUT statement keeps reading it for the same observation
 @@: completed observation, but keep reading from the same line for the next observation
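
A sketch of both line-hold specifiers; data set names, variables, and values are illustrative.

```sas
/* @@ : several observations per line */
data scores;
  input id score @@;
  datalines;
1 90 2 85 3 77
4 68
;
run;

/* @ : read part of the line, decide, then finish reading the same line */
data measures;
  input type $ @;
  if type = 'H' then input height;
  else if type = 'W' then input weight;
  datalines;
H 172
W 68
;
run;
```

The first step should produce four observations from two lines of data.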

Informats and formats
 all contain a period (.)
 text: starts with $, can add a length, e.g. $10.
 numeric: width or width.decimals, e.g. 10., 10.2, COMMA10.2
 date: DDMMYY6.
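
The same stored value can be written through different formats. A short sketch using the formats named above, with made-up values:

```sas
data _null_;
  amount = 1234567.891;
  visit = '15MAR2021'd;

  put amount= comma12.2;   /* 1,234,567.89 */
  put amount= 10.2;        /* 1234567.89 */
  put visit= ddmmyy6.;     /* 150321 */
run;
```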

Label
LABEL var1 = 'bla blah'

Expressions
 Define new variables
 use current data
 operators: * + - /
 functions: LOG, SQRT, ROUND, etc.
 missing data = missing results
 error if math error
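
A sketch of expressions creating new variables, including what happens with a missing value. Names and data are invented for illustration.

```sas
data calc;
  input x y;
  total = x + y;                 /* missing if x or y is missing */
  ratio = x / y;
  root = sqrt(x);
  rounded = round(ratio, 0.01);  /* round to two decimals */
  datalines;
4 2
9 .
;
run;

proc print data=calc;
run;
```

For the second record, total, ratio, and rounded should come out missing because y is missing, and SAS writes a note about missing values to the log.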

Array
 is an ordered group of similar items
 used to control the order of variables

DO;
...
END;
provides grouping of statements for repeated execution
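
Arrays and DO loops are usually used together. A minimal sketch, with hypothetical variables t1-t3 holding Fahrenheit readings that are converted in place:

```sas
data converted;
  input t1-t3;
  array temps{3} t1-t3;      /* group the three variables under one name */
  do i = 1 to 3;             /* repeat the same statement for each element */
    temps{i} = (temps{i} - 32) * 5 / 9;
  end;
  drop i;
  datalines;
32 212 98.6
;
run;
```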

IF (comparison expression) THEN (statement); ELSE (statement);

DATA AllYears;
SET Year1 Year2;
 combines observations from Year1, followed by those from Year2, to make AllYears
 can add a BY statement to interleave them if both are sorted by that variable
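
A sketch of both uses of SET. The Year1 and Year2 contents are invented, and ByPlot is a hypothetical name for the interleaved result.

```sas
data Year1;
  input plot yield;
  year = 1;
  datalines;
1 5.1
2 4.8
;
run;

data Year2;
  input plot yield;
  year = 2;
  datalines;
1 5.6
2 5.0
;
run;

/* Concatenate: all of Year1, then all of Year2 */
data AllYears;
  set Year1 Year2;
run;

/* Interleave: both inputs are already sorted by plot */
data ByPlot;
  set Year1 Year2;
  by plot;
run;
```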

DATA AllData;
MERGE Height Yield;
reads Height and Yield observations in parallel (one-to-one, no BY statement)

Data AllData;
Merge Height Yield;
By Year Variety Plot;
 Usually has a BY statement
 input data sets must be sorted in BY variable order
 the BY variables must match to combine
 can combine: one-to-one, one-to-many, or many-to-one
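
A match-merge sketch using the data set and BY variable names from the card; the values themselves are invented.

```sas
data Height;
  input Year Variety $ Plot Height;
  datalines;
2020 A 1 101
2020 A 2 98
2020 B 1 110
;
run;

data Yield;
  input Year Variety $ Plot Yield;
  datalines;
2020 A 1 5.1
2020 A 2 4.7
2020 B 1 6.0
;
run;

/* Both inputs must be in BY variable order before merging */
proc sort data=Height; by Year Variety Plot; run;
proc sort data=Yield;  by Year Variety Plot; run;

data AllData;
  merge Height Yield;
  by Year Variety Plot;
run;
```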

Put variable=
 good way to find errors
 writes variable=value to the log

CLASS variables
identify groups, categories, or classes of data (treatments) which are considered together, as parts of a whole to compute statistics

VAR variables
values are continuous numeric values usually dependent or response variables

CONTENTS procedure
describes features of variables in a SAS data set or lists the members of a SAS library

FORMAT procedure
 defines informats for reading variables or formats for writing variables
 place it before any DATA or PROC step that uses the informats or formats
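
A sketch of defining a format and then using it in a later step. The format name agegrp and its cutoffs are made up, and the example assumes the SASHELP.CLASS sample data set that ships with SAS is available.

```sas
/* Define the format first */
proc format;
  value agegrp low -< 13 = 'child'
               13 - high = 'teen';
run;

/* Then use it in a later step */
proc print data=sashelp.class;
  var name age;
  format age agegrp.;
run;
```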

Where
 selects input observations from an existing data set
 WHERE conditional expression

ID statement
identifies observations on some output reports

FREQ
names a variable whose value tells how many observations each row represents

SORT procedure
 sorts SAS data sets
 changes the order of the observations in a data set
 missing values sort low

PRINT procedure
 prints a SAS data set
 can have VAR, BY, PAGEBY, ID, FORMAT statements

MEANS procedure
computes simple statistics to the output window (default) and/or a SAS data set
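
A sketch showing both destinations, printed output and an output data set. It assumes the SASHELP.CLASS sample data set is available; class_stats, mean_height, and mean_weight are hypothetical names.

```sas
proc means data=sashelp.class n mean std min max;
  class sex;                 /* groups to summarize separately */
  var height weight;         /* numeric analysis variables */
  output out=class_stats mean=mean_height mean_weight;
run;
```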

PLOT procedure
prints a character scattergram to the output window

GOPTIONS statement
 sets global options for graph output control
 options may be: reset = all/goptions, device = win, colors = list/black, hsize, vsize = nn (in/cells/cm/pt), htext = nn (unit), ftext = fontname

AXIS statement
 global statement for graphics
 defines axis parameters
 Options: put order first, ORDER = value list, LABEL = none/angle = degrees/color = x/font = x

Title statement
adds titles and additional text features to graphics output

Symbol statement
 global statement for graphics
 defines symbol parameters
 options: value = symbolname, interpol = none/join/force through origin/etc., line = line type number

GPLOT procedure
 produces a scatter plot in the graph output window
 options: overlay, RegEqn, haxis = axisn or values list
 RUN; (must have)
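
The global graphics statements and GPLOT fit together roughly as follows. This is a sketch only: it assumes SAS/GRAPH is licensed and that the SASHELP.CLASS sample data set is available, and the labels and symbol choices are illustrative.

```sas
goptions reset=all;                          /* start from default graphics options */

symbol1 value=dot interpol=rl color=black;   /* dots plus a linear regression line */
axis1 label=('Height');                      /* horizontal axis */
axis2 label=(angle=90 'Weight');             /* vertical axis, label rotated */

proc gplot data=sashelp.class;
  plot weight*height / haxis=axis1 vaxis=axis2;
run;
quit;
```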

UNIVARIATE procedure
 outputs sample distribution statistics
 CLASS
 VAR
 HISTOGRAM (opt: normal, kernel)
 CDFPLOT (opt: normal)
 PROBPLOT (opt: normal, sigma, line)
 QQPLOT (opt: normal, sigma, line)
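
A sketch combining several of these statements, assuming the SASHELP.CLASS sample data set is available.

```sas
proc univariate data=sashelp.class;
  class sex;
  var height;
  histogram height / normal kernel;             /* fitted normal and kernel density */
  qqplot height / normal(mu=est sigma=est);     /* reference line from estimates */
run;
```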

STANDARD procedure
 reassigns variable values according to a specified mean and/or standard deviation
 options: out = output data set, mean = n, std = n
 VAR variable list
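
For instance, standardizing two variables to mean 0 and standard deviation 1 might look like this (SASHELP.CLASS is assumed to be available; class_std is a hypothetical output name):

```sas
proc standard data=sashelp.class out=class_std mean=0 std=1;
  var height weight;
run;
```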

Rank procedure
 assign rank values to a variable
 options: out = output data set, descending, ties = high/low/mean
 VAR
 RANKS
 missing data is not ranked
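
A sketch of ranking one variable into a new variable, again assuming SASHELP.CLASS; ranked and height_rank are made-up names.

```sas
proc rank data=sashelp.class out=ranked descending ties=mean;
  var height;            /* variable to rank */
  ranks height_rank;     /* new variable that holds the ranks */
run;
```

Without the RANKS statement the ranks would replace the original values of the VAR variable in the output data set.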

Ttest Procedure
 compares the means of 2 groups
 compares the difference between two means (or one mean) to 0 or a specified value
 Class
 Var
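
A two-group comparison sketch, assuming SASHELP.CLASS is available (its sex variable has exactly two levels):

```sas
proc ttest data=sashelp.class;
  class sex;      /* the two groups */
  var height;     /* the response being compared */
run;
```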

Analysis of variance  General Concept
 Isolate sums of squares, variance due to treatments (class variable values) from the total sums of squares of dependent (response) variable values
 Leaves pooled error
 requires homogeneous variance components in the pooled error variation (MS)

GLM procedure
 general linear models
 solves linear models
 models may contain class and/or continuous numeric variables
 GLM can be used for ANOVA, regression, and covariance analysis
 if model only has class variables = ANOVA
 if model has no class variables = regression
 model may contain products of numeric variables or interaction of class variables

PROC GLM
 Class
 model
 contrast
 random
 test
 means
 LSmeans
 output
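
A sketch of a model with one class variable and one continuous variable (a covariance analysis), assuming SASHELP.CLASS is available. Dropping height from the MODEL statement would make it an ANOVA; dropping sex and the CLASS statement would make it a regression.

```sas
proc glm data=sashelp.class;
  class sex;
  model weight = sex height;
  lsmeans sex / pdiff;      /* adjusted means and their pairwise difference */
run;
quit;
```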

CORR procedure
 outputs Pearson correlations
 includes other simple statistics
 options: nomiss, nosimple
 VAR
 WITH variable list
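
A sketch using VAR and WITH together, assuming SASHELP.CLASS is available. The WITH variables form the rows and the VAR variables the columns of the correlation table.

```sas
proc corr data=sashelp.class nosimple;
  var height weight;
  with age;
run;
```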

Regression general concept
 determine the relationship between multiple variables
 how much of the total variation in the dependent variable is explained by the variation in the independent variables
 independent variables must be continuous numeric
 choose the line with the smallest residual sum of squares

Assumptions with regression
 X values were measured without error
 errors are independent
 errors are normally distributed with a mean of 0 and variance of sigma^{2}

PROC REG
 model dependents = independents
 options: noint, ss1, ss2, alpha=p, clb, p, clm, r
 for multiple regression: selection = backward, forward, stepwise, rsquare
 output
 plot
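
A multiple regression sketch with a few of the options listed above. It assumes SASHELP.CLASS is available; reg_out, predicted, and residual are hypothetical names.

```sas
proc reg data=sashelp.class;
  model weight = height age / ss1 ss2 clb p r;
  output out=reg_out p=predicted r=residual;
run;
quit;
```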

Mixed Procedure
 solves linear models with fixed and random effects
 can solve ANOVA, regression, covariance, etc.
 computes the expected mean squares and uses them to correctly compute all tests: F test, least squares means separation, etc.
 correctly accommodates unbalanced data as long as it isn't too bad
 computes only least squares means not arithmetic means
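
A sketch of a model with one fixed and one random effect. The trial data set, its variables, and all values are invented for illustration (a small randomized block layout).

```sas
data trial;
  input block variety $ yield;
  datalines;
1 A 5.1
1 B 6.0
1 C 5.5
2 A 4.8
2 B 5.7
2 C 5.6
3 A 5.3
3 B 6.2
3 C 5.4
;
run;

proc mixed data=trial;
  class block variety;
  model yield = variety;     /* fixed effect */
  random block;              /* random effect */
  lsmeans variety / pdiff;   /* least squares means and their differences */
run;
```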