SAS Final
Card Set Information
Author:
MRK
ID:
285650
Filename:
SAS Final
Updated:
2014-10-18 13:23:06
Tags:
Programming SAS
Folders:
Description:
For Judy
A SAS statement begins with a _____ and ends with a _______.
keyword
;
The three SAS programming windows are:
Log
Editor
Output
ls, ps, number are:
statements or system options?
system options
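A minimal sketch of how these system options might be set with an OPTIONS statement. The values 80 and 60 are arbitrary choices for illustration.

```sas
/* ls = line size, ps = page size, number = print page numbers */
options ls=80 ps=60 number;
```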
What are the two types of SAS comments?
Statement and group
A comment statement starts with ____ and ends with ____
A group comment starts with ____ and ends with ____
* ;
/* */
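A short sketch showing both comment styles in one program. The PUT text is only for illustration.

```sas
* This is a comment statement: it ends at the semicolon;

/* This is a group comment:
   it can span several lines and may contain semicolons; */

data _null_;
  put 'Comments are ignored when the program runs.';
run;
```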
A view table shows SAS data with rows corresponding to _____ and columns corresponding to _______.
observations
variables
T/F: A SAS program is a sequence of statements executed in order.
True
T/F: A SAS statement must always start on a new line
False
T/F: SAS keywords may be written in upper case only
False
T/F: A SAS library is where SAS statements are entered
False
T/F: Related SAS statements are grouped together in a SAS step
True
T/F: An OPTIONS statement does not produce output directly
True
T/F: SAS variable names may contain any printable character found on the PC keyboard
False
T/F: A SAS program always ends with an ENDSAS statement
False
T/F: Forgetting a semicolon (;) at the end of a SAS statement is a common mistake when writing SAS programs
True
Two basic parts of a SAS program
Data step & Procedure step
Data Step does what? (5)
Inputs raw data
processes data observation-wise
creates new variables
performs calculations
creates a SAS data set
statements executed in order
Procedure step does what?
Processes data
produces output or other results
all statements are taken together to define the task, then executed
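The two parts can be sketched as one small program. The data set name plots and its variables are hypothetical.

```sas
/* DATA step: reads raw data, computes a new variable, creates a data set */
data plots;
  input plot height_cm;
  height_m = height_cm / 100;   /* calculated for each observation in turn */
  datalines;
1 152
2 168
3 147
;
run;

/* PROC step: processes the data set and produces output */
proc print data=plots;
run;
```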
SAS data sets:
are stored in libraries
consist of variables and observations
Variable name rules:
up to 32 characters
letters (upper/lower case), numbers, or _
start with _ or a letter
(variables that SAS creates usually start with _, e.g. _N_)
Variable types:
text
Default length = 8
8 bytes = 64 bits
numeric
Numeric Variables
length 64 bits (8 bytes, stored in binary)
Numbers - stored in floating point: mantissa (significant digits) and exponent in powers of 2
Range
: ~15 decimal digits, 10^{+/-300}
Some can't be stored exactly
: SQRT(2) or 1/3
Special case numeric variables
Dates
: days since January 1, 1960
Time
: seconds from midnight
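A minimal sketch that shows the numbers stored behind date and time values. The variable names are made up for the example.

```sas
data _null_;
  d0 = '01JAN1960'd;          /* date constant: day 0 */
  d1 = '02JAN1960'd;          /* one day later */
  t  = '00:01:00't;           /* time constant: one minute after midnight */
  put d0= d1= t=;             /* unformatted: the stored numbers */
  put d0= date9. t= time8.;   /* the same values written with formats */
run;
```

The first PUT should write d0=0 d1=1 t=60 to the log, since the date is a day count and the time is a count of seconds.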
formdlim='-'
formdlim=' '
Row of ---- instead of page break
page break
leftmargin=nnunit
width of the left margin (a number plus a unit, e.g. 1in)
RUN; or QUIT;
statement at the end of the SAS program to execute the last statements in the file
PROC Options group=listcontrol;
run;
prints option names, descriptions, and current values to the Log window
Data step processing: General process
execution phase:
Implied DO loop - initializes the iteration counter (_N_)
EOF?:
Yes - close file - advance to the next step
No - read a record - perform additional processing, output observation to data set, return to top of DATA step
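The implied loop can be watched with a small sketch that writes _N_ to the log on each pass. The data set name lines and variable x are hypothetical.

```sas
data lines;
  input x;          /* reads one record per iteration */
  put _N_= x=;      /* _N_ counts iterations of the implied loop */
  datalines;
10
20
30
;
run;
```

Each record produces one log line, and the step stops when no records remain.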
How SAS reads data
Converts characters in the input buffer to text and numeric variables
Data processing: reading data
Options:
dsd
firstobs=n
truncover
missover
lrecl=n
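dsd
: comma-delimited file; each single comma is a delimiter, so two commas in a row mean a missing value
firstobs=n
: start reading data at line n (skips the lines before it)
truncover
: when a line is shorter than the INPUT statement expects, keeps the partial field that was read and sets the remaining variables to missing
missover
: when a line has fewer values than variables, sets the remaining variables to missing instead of reading the next line
lrecl=n
: sets the logical record length (n characters per line), so wider lines can be read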
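A sketch of several of these options on in-stream data, assuming a header line that should be skipped. The data set and variable names are hypothetical; lrecl= is left out because it matters mainly for external files.

```sas
data scores;
  infile datalines dsd firstobs=2 truncover;
  input name :$20. test1 test2;
  datalines;
name,test1,test2
Ann,90,85
Bob,,70
Cy,88
;
run;

proc print data=scores;
run;
```

Here dsd makes the empty field for Bob a missing test1, and truncover leaves test2 missing for Cy instead of reading into the next line.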
Input statement forms
List
: var1 var2 var3
formatted column
: var1 $ coln1-coln2 var2 coln3-coln4
formatted informat
: var1 informat1 var2 informat2
formatted informat
: (var1 var2) (informat1 informat2)
Delimited
: infile cards dsd OR List values
** Don't mix forms
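The forms above can be sketched on the same two records. All data set and variable names are hypothetical.

```sas
/* List input: values separated by blanks */
data list_in;
  input name $ age weight;
  datalines;
Ann 34 130
Bob 41 182
;
run;

/* Column input: each variable is read from fixed columns */
data column_in;
  input name $ 1-5 age 6-7 weight 9-11;
  datalines;
Ann  34 130
Bob  41 182
;
run;

/* Formatted input: an informat follows each variable; +1 skips a column */
data formatted_in;
  input name $5. age 2. +1 weight 3.;
  datalines;
Ann  34 130
Bob  41 182
;
run;
```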
@ & @@
@
: input is not complete; hold the line and keep reading it for the same observation
@@
: observation is complete, but keep reading from the same line for the next observation
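A minimal sketch of both line-hold specifiers. The data sets pairs and adults, and the type code 'A', are invented for the example.

```sas
/* Trailing @@: several observations on one line */
data pairs;
  input x y @@;
  datalines;
1 2 3 4 5 6
;
run;

/* Trailing @: read part of the line, test it, then keep reading the same line */
data adults;
  input type $ @;           /* hold the line */
  if type = 'A' then do;
    input name $ age;       /* continues on the held line */
    output;
  end;
  datalines;
A Ann 34
C Cal 9
A Bob 41
;
run;
```

The first step should create three observations from one line; the second keeps only the records whose type is A.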
Informats and formats
all have a period (.)
text
: has $; can add a length, e.g. $10.
Numeric
: width and decimals, e.g. 10., 10.2, COMMA10.2
Date
: DDMMYY6.
Label
LABEL var1 = 'bla blah'
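A sketch that reads values with informats, then displays them with formats and a label. The data set sales and its variables are hypothetical.

```sas
data sales;
  /* the colon lets list input use an informat for each value */
  input region :$10. amount :comma10. sold :ddmmyy6.;
  format amount comma10.2 sold date9.;
  label amount = 'Sale amount';
  datalines;
North 1,234.50 181014
South 987.00 010914
;
run;

proc print data=sales label;
run;
```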
Expressions
Define new variables
use current data
operators
: * + - /
functions
: LOG, SQRT, ROUND, etc.
missing data = missing results
a math error (e.g. dividing by zero) gives a missing result and a note in the log
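A small sketch of expressions that define new variables, including a missing input and a division by zero. Names are hypothetical.

```sas
data calc;
  input x y;
  total = x + y;               /* missing if x or y is missing */
  ratio = x / y;               /* y = 0 gives a missing result and a log note */
  root  = sqrt(x);
  rnd   = round(ratio, 0.01);
  datalines;
4 2
9 .
5 0
;
run;

proc print data=calc;
run;
```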
Array
is an ordered group of similar items
used to control the order of variables
DO;
...
END;
provide grouping of statements for repeated execution
IF (comparison expression) THEN (statement); ELSE (statement);
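Arrays, DO groups, and IF-THEN/ELSE often appear together. This is one possible sketch; the value 99 is a made-up missing-value code and the variable names are hypothetical.

```sas
data cleaned;
  input q1 q2 q3;
  array q{3} q1-q3;                /* group q1, q2, q3 under one name */
  do i = 1 to 3;                   /* repeat the same statements for each item */
    if q{i} = 99 then q{i} = .;    /* recode 99 to missing */
    else q{i} = q{i} * 10;
  end;
  drop i;
  datalines;
1 99 3
4 5 99
;
run;
```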
DATA AllYears;
SET Year1 Year2;
concatenates the observations from Year1 and Year2 to make AllYears
add a BY statement to interleave them if both are sorted by that variable
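A runnable sketch of the SET example, with two tiny input data sets invented so the step has something to read.

```sas
data Year1;
  input plot yield;
  year = 1;
  datalines;
1 40
2 45
;
run;

data Year2;
  input plot yield;
  year = 2;
  datalines;
1 42
2 47
;
run;

data AllYears;
  set Year1 Year2;    /* all of Year1 first, then all of Year2 */
run;
```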
DATA AllData;
MERGE Height Yield;
reads height and yield observations in parallel
Data AllData;
Merge Height Yield;
By Year Variety Plot;
Usually has a by statement
input data sets must be sorted in BY variable order
The BY variables must match to combine
can combine
: one-to-one, one-to-many, or many-to-one
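A sketch of the match-merge with made-up Height and Yield data. Both inputs are sorted first because the BY statement requires it.

```sas
data Height;
  input Year Variety $ Plot Height;
  datalines;
1 B 1 60
1 A 2 55
1 A 1 50
;
run;

data Yield;
  input Year Variety $ Plot Yield;
  datalines;
1 A 1 3.1
1 A 2 3.4
1 B 1 2.9
;
run;

proc sort data=Height; by Year Variety Plot; run;
proc sort data=Yield;  by Year Variety Plot; run;

data AllData;
  merge Height Yield;
  by Year Variety Plot;    /* rows combine where all BY values match */
run;
```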
Put variable=
good way to find errors
writes variable=value to the Log window (by default)
CLASS variables
identify groups, categories, or classes of data (treatments) which are considered together, as parts of a whole to compute statistics
VAR variables
values are continuous numeric values usually dependent or response variables
CONTENTS procedure
describes features of variables in a SAS data set or lists the members of a SAS library
FORMAT procedure
defines informats for reading variables or formats for writing variables
place it before any data or proc step that uses the informats or formats.
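A minimal sketch of a user-defined format placed before the steps that use it. The format name trtfmt and the data set trial are hypothetical.

```sas
proc format;
  value trtfmt 1 = 'Control'
               2 = 'Treated';
run;

data trial;
  input trt response;
  format trt trtfmt.;     /* display the codes with the new format */
  datalines;
1 10.2
2 12.9
;
run;

proc print data=trial;
run;
```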
Where
selects input observations from an existing data set
WHERE conditional expression
ID statement
identifies observations on some output reports
FREQ
each observation represents many (the FREQ variable gives how many)
SORT procedure
sorts SAS data sets
changes the order of the observations in a data set
missing values sort low
PRINT procedure
prints a SAS data set
can have VAR, BY, PAGEBY, ID, FORMAT statements
MEANS procedure
computes simple statistics to the output window (default) and/or a SAS data set
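SORT, PRINT, and MEANS can be sketched together on one small data set. The names growth, trt, and height are hypothetical.

```sas
data growth;
  input trt $ height;
  datalines;
B 12
A 10
B 14
A 11
;
run;

proc sort data=growth;
  by trt;                      /* reorders the observations */
run;

proc print data=growth;
  var trt height;
run;

proc means data=growth n mean std;
  class trt;                   /* groups to summarize */
  var height;                  /* numeric response variable */
  output out=growth_stats mean=mean_height;   /* also save results to a data set */
run;
```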
PLOT procedure
prints a character scattergram to the output window
GOPTIONS statement
set global options for graph output control
options may be: reset = all | goptions, device = win, colors = (list) | (black), hsize = / vsize = nn (in | cells | cm | pt), htext = nn (unit), ftext = font-name
AXIS statement
global statement for graphics
defines axis parameters
Options
: put ORDER first; ORDER = (value-list), LABEL = none | (angle=degrees color=x font=x)
Title statement
add additional features for graphics output
Symbol statement
global statement for graphics
defines symbol parameters
options
: value = symbol-name, interpol = none | join | force through origin | etc., line = line number
GPLOT procedure
produces a scatter plot in the graph output window
options
: overlay, RegEqn, haxis = axisn | values-list
RUN; (must have)
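A sketch of how the global graphics statements and GPLOT might fit together, assuming SAS/GRAPH is available. The data set dose and the axis settings are made up for illustration.

```sas
data dose;
  input dose response;
  datalines;
0 2
10 5
20 9
30 12
;
run;

goptions reset=all;                           /* clear earlier graphics settings */
symbol1 value=dot interpol=join line=1;       /* points joined by a solid line */
axis1 order=(0 to 30 by 10) label=('Dose');
axis2 label=(angle=90 'Response');
title 'Response by dose';

proc gplot data=dose;
  plot response*dose / haxis=axis1 vaxis=axis2;
run;
quit;
```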
UNIVARIATE procedure
outputs sample distribution statistics
class
var
histogram (opt: normal, kernel)
CDFPLOT (opt: normal)
probplot (opt: normal, sigma, line)
QQPLOT (opt: normal, sigma, line)
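A minimal sketch of PROC UNIVARIATE with two of the plot statements. The data set sample and variable weight are hypothetical.

```sas
data sample;
  input weight @@;
  datalines;
61 64 58 70 66 63 59 68 65 62
;
run;

proc univariate data=sample;
  var weight;
  histogram weight / normal kernel;            /* overlay normal and kernel curves */
  qqplot weight / normal(mu=est sigma=est);    /* reference line from estimates */
run;
```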
STANDARD procedure
Reassign variable values according to a specified mean and/or standard deviation
Options
: out = output data set, mean = n, std = n
VAR variable-list
Rank procedure
assign rank values to a variable
options
: out = out data set, descending, ties = high|low|mean
VAR
RANKS
missing data is not ranked
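The STANDARD and RANK procedures can be sketched on the same small data set. The names scores, zscores, ranked, and score_rank are hypothetical.

```sas
data scores;
  input id score;
  datalines;
1 70
2 85
3 85
4 .
;
run;

/* rescale score to mean 0 and standard deviation 1 */
proc standard data=scores out=zscores mean=0 std=1;
  var score;
run;

/* rank from highest to lowest; tied values get the lower rank */
proc rank data=scores out=ranked descending ties=low;
  var score;
  ranks score_rank;    /* new variable that holds the ranks */
run;
```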
Ttest Procedure
compares the means of 2 groups
compares the difference between two means (or one mean) to 0 or a specified value
Class
Var
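A sketch of a two-group test and a one-sample test. The data set trial and the null value 10 are invented for the example.

```sas
data trial;
  input trt $ response @@;
  datalines;
A 10.1 A 11.4 A 9.8 A 10.9
B 12.3 B 13.0 B 11.7 B 12.8
;
run;

/* two groups: CLASS names the grouping variable */
proc ttest data=trial;
  class trt;
  var response;
run;

/* one mean against a specified value */
proc ttest data=trial h0=10;
  var response;
run;
```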
Analysis of variance - General Concept
Isolate sums of squares, variance due to treatments (class variable values) from the total sums of squares of dependent (response) variable values
Leaves pooled error
requires homogeneous variance components in the pooled error variation (MS)
GLM procedure
general linear models
solves linear models
models may contain class and/or continuous numeric variables
GLM can be used for ANOVA, regression, and covariance analysis
if model only has class variables = ANOVA
if model has no class variables = regression
model may contain products of numeric variables or interaction of class variables
PROC GLM
Class
model
contrast
random
test
means
LSmeans
output
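A minimal one-way ANOVA sketch with PROC GLM, using a few of the statements listed. The data set fieldtrial and its variables are hypothetical.

```sas
data fieldtrial;
  input variety $ yield @@;
  datalines;
A 3.1 A 3.4 A 3.0
B 2.9 B 2.7 B 3.0
C 3.8 C 3.6 C 3.9
;
run;

proc glm data=fieldtrial;
  class variety;
  model yield = variety;       /* class variables only, so this is an ANOVA */
  means variety / tukey;       /* arithmetic means with Tukey comparisons */
  lsmeans variety / pdiff;     /* least squares means with pairwise p-values */
run;
quit;
```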
CORR procedure
output pearson correlations
includes other simple statistics
options
: nomiss, nosimple
Var
with variable-list
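A sketch of PROC CORR with both options and a WITH statement. The data set body and its variables are hypothetical.

```sas
data body;
  input height weight age;
  datalines;
160 55 23
172 70 31
181 82 45
168 64 28
175 77 39
;
run;

proc corr data=body nomiss nosimple;
  var height weight;    /* columns of the correlation table */
  with age;             /* rows of the correlation table */
run;
```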
Regression- general concept
determine the relationship between multiple variables
how much of the total variation in the dependent variable is explained by the variation in the independent variables
independent variables must be continuous numeric
choose the line with the smallest residual sum of squares
Assumptions with regression
X values were measured without error
errors are independent
errors are normally distributed with a mean of 0 and variance of sigma^{2}
PROC REG
model dependents = independents
options
: noint, ss1, ss2, alpha=p, clb, p, clm, r
for multiple regression (selection =)
: backward, forward, stepwise, rsquare
output
plot
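A minimal PROC REG sketch using some of the MODEL options above. The data set body2, its variables, and the output names are hypothetical.

```sas
data body2;
  input weight height age;
  datalines;
55 160 23
70 172 31
82 181 45
64 168 28
77 175 39
60 165 52
;
run;

proc reg data=body2;
  model weight = height age / ss1 ss2 clb;        /* sums of squares, CIs for estimates */
  output out=pred p=predicted r=residual;         /* save predictions and residuals */
run;
  model weight = height age / selection=stepwise; /* variable selection */
run;
quit;
```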
Mixed Procedure
solves linear models with fixed and random effects
can solve ANOVA, regression, covariance, etc.
computes the expected mean squares and uses them to correctly compute all tests
: F-test, least squares means separation, etc.
correctly accommodates unbalanced data as long as it isn't too bad
computes only least squares means not arithmetic means
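A sketch of a model with one fixed and one random effect, assuming a randomized block layout. The data set blocks and its variables are hypothetical.

```sas
data blocks;
  input block trt $ yield @@;
  datalines;
1 A 3.1 1 B 3.6 1 C 4.0
2 A 2.8 2 B 3.3 2 C 3.9
3 A 3.0 3 B 3.7 3 C 4.2
;
run;

proc mixed data=blocks;
  class block trt;
  model yield = trt;         /* fixed effect */
  random block;              /* random effect */
  lsmeans trt / pdiff;       /* least squares means and pairwise differences */
run;
```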