Basic Science - Biostatistics - pvalue sampling

The flashcards below were created by user prem77 on FreezingBlue Flashcards.

  1. Difference between rate and proportion. Explain with examples. [TU 2067/2]

    What is rate, ratio and proportion?
    Ratios: ratios are one number expressed in relation to another by dividing the one number by the other. For example, the sex ratio of Delaware in 1990 was: 343,200 females to 322,968 males or 1.06.

    Proportions: proportions are special kinds of ratios where the denominator is the total while the numerator is a subpart of the total. This tells us what part the numerator is of the total. Thus, while the ratio of females to males in Delaware is 1.06, females represent .515 proportion of the total. Percentages are just a form of the proportion based on 100 people. To calculate a percentage we simply multiply a proportion by 100 (females are 51.5% of the total).

    Rates: rates are a special form of a ratio which represents the probability of a certain event. The numerator is the number of occurrences of an event during a time period, and the denominator is the number of persons exposed to that event in the time period. To be a true rate we must try to have only those at risk in the denominator. Sometimes this is difficult to do so we use approximations or use the total population. In the latter case we generally call it a crude rate.
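    As a quick check of the arithmetic above, the following minimal sketch recomputes the ratio, proportion and percentage from the Delaware figures quoted in the card. The event count used for the crude rate is a hypothetical number, not taken from the source.

```python
# Figures quoted in the card: Delaware, 1990.
females = 343_200
males = 322_968
total = females + males

sex_ratio = females / males            # ratio: one number divided by another
female_proportion = females / total    # proportion: a subpart over the total
female_percentage = 100 * female_proportion

# Crude rate: events in a period over the total population.
# The event count is HYPOTHETICAL, for illustration only.
hypothetical_events = 5_000
crude_rate_per_1000 = 1000 * hypothetical_events / total

print(round(sex_ratio, 2), round(female_proportion, 3), round(female_percentage, 1))
```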
  2. Explain different methods of epidemiological studies. [TU 2073]
    1. Observational study

    • a. Descriptive studies
    • b. Analytical studies:
    •  Ecological,
    •  Cross-sectional study
    •  Case-control study
    •  Cohort (longitudinal) study

    • 2. Experimental study/Intervention study
    •  Randomized controlled trials/Controlled Clinical trial: patients as unit of study
    •  Field trials  – with healthy people as unit of study
    •  Community trials
  3. Types of Scales in Statistics?
    A. Qualitative/Categorical data - 

    - Nominal or categorical scale
    - A nominal scale puts people into boxes, without specifying the relationship between the boxes. Gender is a common example of a nominal scale with two groups, male and female. Anytime you can say, "It's either this or that," you are dealing with a nominal scale. Other examples: cities, drug versus control group.

    - Ordinal scale - Numbers can also be used to express ordinal or rank order relations. For example, we say Ben is taller than Fred. Now we know more than just the category in which to place someone. We know something about the relationship between the categories (quality). What we do not know is how different the two categories are (quantity). Class rank in medical school and medals at the Olympics are examples of ordinal scales.

    B. Numerical/Quantitative data - 

    - Interval scale - 
    Uses a scale graded in equal increments. In the scale of length, we know that one inch is equal to any other inch. Interval scales allow us to say not only that two things are different, but by how much. If a measurement has a mean and a standard deviation, treat it as an interval scale. It is sometimes called a "numeric scale."

    - Ratio scale - The best measure is the ratio scale. This scale orders things and contains equal intervals, like the previous two scales. But it also has one additional quality: a true zero point. In a ratio scale, zero is a floor, you can't go any lower. Measuring temperature using the Kelvin scale yields ratio scale measurement. [@ Rati0 contains 0]

    [@ NOIR, DisCo] 

  4. Measure of variability or dispersion. [TU 2070/12]
    • - Range 
    • - Quartile deviation
    • - Mean deviation 
    • - Root mean square deviation
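    The four measures listed above can be sketched with Python's standard library. The data values are hypothetical, and "root mean square deviation" is taken here to mean the standard deviation computed with n in the denominator, which is an assumption about the card's intent.

```python
import statistics

values = [4, 8, 15, 16, 23, 42]  # hypothetical observations

# Range: largest minus smallest value.
value_range = max(values) - min(values)

# Quartile deviation: half the distance between the first and third quartiles.
q1, _median, q3 = statistics.quantiles(values, n=4)
quartile_deviation = (q3 - q1) / 2

# Mean deviation: average absolute distance from the mean.
mean = statistics.fmean(values)
mean_deviation = sum(abs(x - mean) for x in values) / len(values)

# Root mean square deviation: square root of the mean squared distance from the mean.
rms_deviation = statistics.pstdev(values)

print(value_range, quartile_deviation, mean_deviation, rms_deviation)
```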
  5. What are the different measures of disease frequency in cross-sectional, case-control and cohort studies? 2063/12
  6. What statistical tools are suitable to measure association among different variables? Give an example. 2056
    What statistical tools are suitable to measure association among different variables? Give examples. 2054
  7. What do you mean by normal curve. Explain its properties and importance. [TU 2073,68/7]
    What is a normal curve? Describe its features and applications. [TU 2063,64/12]
    List the different characteristics of a normal curve. [TU 2073]
    • The standard distribution curve (Normal distribution) is a perfectly symmetrical, bell shaped curve such that the mean, median and mode, all have the same value and coincide at the center. 50% of data fall on right and 50% on left of the mean.
    • The normal curve is also called the Gaussian curve. 

    The characteristic features of standard normal distribution curve are

    • Bell shaped - The two tails never touch the x-axis because there is always some probability that more extreme values will occur
    • Continuous
    • Normal distributions are symmetric around their mean.
    • The mean, median, and mode of a normal distribution are equal.
    • The area under the normal curve is equal to 1.0.
    • Normal distributions are denser in the center and less dense in the tails.
    • Normal distributions are defined by two parameters, the mean (μ) and the standard deviation (σ).
    • About 68.3%, 95.4% and 99.7% of values lie within 1, 2 and 3 SD of the mean, respectively. 
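
    The share of a normal distribution lying within 1, 2 and 3 SD of the mean can be checked directly from the error function. This is a minimal sketch; the helper name coverage_within is made up for this example.

```python
import math

def coverage_within(k_sd: float) -> float:
    """Probability that a normal variable lies within k_sd standard deviations of its mean."""
    return math.erf(k_sd / math.sqrt(2))

for k in (1, 2, 3):
    print(k, "SD:", round(100 * coverage_within(k), 1), "%")
```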


    • Uses  of normal curve -
    • 1. To calculate confidence interval in research. 
    • 2. To calculate probability in research
    • 3. To calculate statistical inference in research

    • Importance of normal distribution:
    • 1) It underlies the central limit theorem, which relates the shape of the population distribution to the shape of the sampling distribution of the mean: the sampling distribution of the mean approaches normal as the sample size increases.

    2) In case the sample size is large the normal distribution serves as good approximation.

    3) Due to its mathematical properties it is more popular and easy to calculate.

    4) It is used in statistical quality control in setting up of control limits.

    5) The theory behind the t, F and chi-square tests is based on the normal distribution.
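
    The behaviour of the sampling distribution of the mean described in point 1 can be illustrated with a small simulation. This is a sketch under assumed settings: the exponential population, the sample size of 50, the 2000 repetitions and the seed are all arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
sample_size = 50
repetitions = 2000

# Population: exponential with mean 1 and SD 1, which is strongly skewed (not normal).
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(repetitions)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = 1.0 / sample_size ** 0.5  # population SD divided by sqrt(n)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_sd, 3))
```

    The sample means cluster around the population mean with a spread close to SD/√n, even though the individual observations are skewed.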
  8. What is p-value? [TU 2054,56,60,63,64,67,68,70,73]
    The P value, or calculated probability, is the probability of finding the observed, or more extreme, results when the null hypothesis (H0) of a study question is true – the definition of ‘extreme’ depends on how the hypothesis is being tested. P is also described in terms of rejecting H0 when it is actually true, however, it is not a direct probability of this state.

    The null hypothesis is usually an hypothesis of "no difference" e.g. no difference between blood pressures in group A and group B. Define a null hypothesis for each study question clearly before the start of your study.

    If your P value is less than the chosen significance level then you reject the null hypothesis i.e. accept that your sample gives reasonable evidence to support the alternative hypothesis. It does NOT imply a "meaningful" or "important" difference; that is for you to decide when considering the real-world relevance of your result.

    The choice of significance level at which you reject H0 is arbitrary. Conventionally the 5% (a less than 1 in 20 chance of rejecting a null hypothesis that is true), 1% and 0.1% (P < 0.05, 0.01 and 0.001) levels have been used. These numbers can give a false sense of security.

    At this point, a word about error. Type I error is the false rejection of the null hypothesis and type II error is the false acceptance of the null hypothesis. As an aide-mémoire: think that our cynical society rejects before it accepts.
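
    To make the definition concrete, here is a minimal sketch of a two-sided p-value for the blood pressure example (no difference between group A and group B under H0). The summary figures are hypothetical, and a large-sample z-test is assumed; with small samples a t-test would be the usual choice.

```python
import math

def two_sided_p_from_z(z: float) -> float:
    """Probability of a result at least this extreme, in either direction, if H0 is true."""
    return math.erfc(abs(z) / math.sqrt(2))

# HYPOTHETICAL summary data: mean systolic blood pressure in groups A and B.
mean_a, sd_a, n_a = 132.0, 15.0, 100
mean_b, sd_b, n_b = 127.0, 15.0, 100

# Standard error of the difference between two independent means.
standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
z = (mean_a - mean_b) / standard_error
p_value = two_sided_p_from_z(z)

significance_level = 0.05
reject_h0 = p_value < significance_level
print(round(z, 2), round(p_value, 4), reject_h0)
```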
  9. What is the significance of p-value? Explain with suitable example. [TU 2056]
  10. How is p-value derived from different tests of significance. [TU 2067,68/2]
    How is p-value derived from different test of significance for differences of two means and proportions. [TU 2064,65/12]
  11. Discuss in short the significance of p-value in the analysis of clinical data with a suitable example. [TU 2060/4]
    How is p-value derived for different test of significance for difference of two means and proportions. [TU 2063/2]
  12. How is p-value applied in research? [TU 2070,73/5]
    What is the significance of p-value? Explain with suitable example. [TU 2054]
    One should keep in mind the difference between statistical significance and clinical significance. The results of a study can be statistically significant but the effect can still be too small to be of any practical value.

    We might encounter a situation where a new treatment shows a reduction in pain that is statistically significant, with a p value of 0.0001. Such a low p value indicates that a result this extreme would be very unlikely if the treatment had no effect, so the improvement is unlikely to be due to chance alone. However, it is important to note the magnitude of the improvement when interpreting it clinically.

    Usually, the extremes are easy to recognize and agree upon. If with the new treatment being investigated, patients get 90% pain relief, we can all agree that this is an effective, worthwhile treatment. But if reduction in pain is just 3%, such a paltry effect may not be of any importance.

    It is thus important to decide whether the observed treatment effect is large enough to make a difference.

    But what would constitute the smallest amount of improvement (say δ) that would be considered worthwhile? After all, we want the new treatment to make a difference. This is tricky. The researchers doing the study should explicitly state what this minimal amount of clinically important improvement is.

    One can examine the absolute amount of improvement in the critical outcome measure and cross-reference it with other sources to decide whether it meets the Medically Important Clinical Difference (MICD). The null and alternative hypotheses can be framed so that a difference of at least the MICD is what is detected as significant. It is therefore advisable not to look at the p value alone, but also to examine whether the results are robust enough to be clinically significant, that is, whether they reach the MICD.
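
    The gap between statistical and clinical significance can be shown with a short sketch. Every number below is hypothetical (a pain score, a δ of 10 points, 5000 patients per group), and a large-sample z-test is assumed.

```python
import math

# All numbers are HYPOTHETICAL, chosen only to illustrate the point.
minimal_important_difference = 10.0  # delta: smallest worthwhile improvement (pain score points)
observed_difference = 3.0            # extra mean pain reduction with the new treatment
sd_per_group = 20.0
patients_per_group = 5000

# With very large groups the standard error is tiny, so even a small effect gives a tiny p.
standard_error = math.sqrt(2 * sd_per_group**2 / patients_per_group)
z = observed_difference / standard_error
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation

statistically_significant = p_value < 0.05
clinically_important = observed_difference >= minimal_important_difference
print(statistically_significant, clinically_important)  # prints: True False
```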
  13. What is confidence interval? Why is confidence interval preferred to P-value in statistical inference? [TU 2072]
    A range of values so constructed that there is a specified probability of including the true value of a parameter within it. 

    • A confidence interval specifies a range, from a possible low to a possible high, around a sample-based value within which the population value is expected to lie. The true mean, therefore, is most likely to be somewhere within the specified range.
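
    A 95% confidence interval for a mean can be sketched as follows. The readings are hypothetical, and the multiplier 1.96 assumes a normal approximation; for a sample this small a t multiplier would give a somewhat wider interval.

```python
import math
import statistics

readings = [118, 125, 130, 122, 127, 135, 121, 129, 124, 126]  # hypothetical systolic BP values
n = len(readings)

sample_mean = statistics.fmean(readings)
standard_error = statistics.stdev(readings) / math.sqrt(n)

z_95 = 1.96  # assumption: normal approximation for a 95% interval
lower = sample_mean - z_95 * standard_error
upper = sample_mean + z_95 * standard_error

print(f"mean {sample_mean:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```

    Unlike a p-value alone, the interval shows both the size of the estimate and how precisely it has been measured.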
  14. Dr Singh is conducting a trial to decide whether a particular type of pneumonia responds to injections of (a) long acting penicillin (b) crystalline penicillin. What will be the null hypothesis and its alternatives? [TU 2066/1]

    Test of significance [TU 2070/6]

    Steps of Hypothesis testing?
    Data are collected and analyzed by the appropriate statistical test.

    • a. p-Value: to interpret output from a statistical test, focus on the p-value. The term p-value refers to two things. In its first sense, the p-value is a standard against which we compare our results. In the second sense, the p-value is a result of computation.
    • i. The computed p-value is compared with the p-value criterion to test statistical significance. If the computed value is less than the criterion, we have achieved statistical significance. In general, the smaller the p the better.
    • ii. The p-value criterion is traditionally set at p ≤0.05. (Assume that these are the criteria if no other value is explicitly specified.) Using this standard
    • - If p ≤ 0.05, reject the null hypothesis (reached statistical significance)
    • - If p > 0.05, do not reject the null hypothesis (has not reached statistical significance).

    b. Types of errors

    • c. Meaning of the p-value
    • i. Provides criterion for making decisions about the null hypothesis
    • ii. Quantifies the chances that a decision to reject the null hypothesis will be wrong
    • iii. Tells statistical significance, not clinical significance, nor likelihood of benefit. 

    • d. Limits to the p-value: the p-value does NOT tell us
    • i. The chance that an individual patient will benefit
    • ii. The percentage of patients who will benefit
    • iii. The degree of benefit expected for a given patient

    e. Statistical power

    f. Statistical tests
  15. Short note on errors in statistics. [TU 2068,70/5]

    Type I error?
    • is the incorrect rejection of the null hypothesis
    • maximum probability is set in advance as alpha
    • is not affected by sample size as it is set in advance
    • increases with the number of tests or end points (i.e. perform 20 tests of H0 at alpha = 0.05 and, on average, 1 will be wrongly significant even if every H0 is true)
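
    The effect of multiple testing on the type I error can be worked out directly. This sketch assumes the tests are independent and that every null hypothesis is true; the function name is made up for this example.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one false rejection across n_tests independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests

alpha = 0.05
n_tests = 20

expected_false_positives = alpha * n_tests
chance_of_at_least_one = familywise_error(alpha, n_tests)

print(expected_false_positives, round(chance_of_at_least_one, 3))
```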
  16. Type II error?
    • is the incorrect acceptance of the null hypothesis
    • probability is beta
    • beta depends upon sample size and alpha
    • can't be estimated except as a function of the true population effect
    • beta gets smaller as the sample size gets larger
    • beta gets smaller as the number of tests or end points increases

    [@ Think that our society first rejects before it accepts, Type I - Incorrectly rejecting null hypothesis, Type II - Incorrectly accepting null hypothesis]
  17. Types of Scales and Basic Statistical Tests?
    • For 2 groups only, t-test, for 2 or more groups, one way ANOVA test.

    • Remember, default choices are:
    • Correlation for interval data
    • Chi-square for nominal data
    • t-test for a combination of nominal and interval data

    Fisher's exact test - used in the same situations as the Chi-square test, but when the expected frequency in any cell is < 5.

    • Nominal data 
    • - Chi-square test 
    • - Fisher's exact test 
    • - McNemar test 
    • - Odds ratio/Relative risk 

    • Numerical data 
    • - t-test 
    • - Pearson correlation coefficient (for parametric)
    • - Spearman's rho test (for non-parametric)
    • - Wilcoxon test (for non-parametric)
    • - Matched pair test (for parametric)
    • - Mann-Whitney U test (non-parametric; for data that do not follow a normal distribution)
  18. What is chi square test. Give suitable examples being used in surgery. [TU 2061/5]
    • Chi-square test offers an alternate method of testing the significance of difference between two proportions. 
    • Chi-square test is a non-parametric test. It follows a specific distribution known as Chi-square distribution.

    • The three essential requirements for Chi-square test are:
    • A random sample
    • Qualitative data
    • Lowest expected frequency not less than 5

    • The calculation of Chi-square value is as follows:
    • - Make the contingency tables
    • - Note the frequencies observed (O) in each class of one event, row-wise and the number in each group of the other event, column-wise.
    • - Determine the expected number (E) in each group of the sample or the cell of table on the assumption of null hypothesis.
    • - The null hypothesis is the assumption that there is no difference between the two groups (no association between the two events); it is then tested in quantitative terms.
    • - Find the difference between the observed and the expected frequencies in each cell (O – E).
    • - Calculate the Chi-square value of each cell by the formula (O – E)²/E.
    • - Sum up the Chi-square values of all the cells to get the total Chi-square value: χ² = Σ(O – E)²/E.
    • - Calculate the degrees of freedom which are related to the number of categories in both the events.
    • - The formula adopted in case of contingency table is Degrees of freedom (d.f.) = (c – 1 ) (r – 1) Where c is the number of columns and r is the number of rows. 
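
    The calculation steps above can be sketched for a 2 × 2 contingency table. The counts are hypothetical (wound healing under two treatments), and the closing p-value line uses a shortcut that holds only for 1 degree of freedom.

```python
import math

# HYPOTHETICAL observed counts: rows are treatments, columns are healed / not healed.
observed = [[30, 10],
            [20, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count in each cell under the null hypothesis of no association.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum (O - E)^2 / E over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
degrees_of_freedom = (len(col_totals) - 1) * (len(row_totals) - 1)

# Valid for 1 degree of freedom only: chi-square with 1 d.f. is a squared standard normal.
p_value = math.erfc(math.sqrt(chi_square / 2))

print(round(chi_square, 2), degrees_of_freedom, round(p_value, 3))
```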

    • Applications of Chi-square
    • Chi-square test is most commonly used when data are in frequencies such as the number of responses in two or more categories.
    • Chi-square test is very useful in research. The important applications of Chi-square in medical statistics are :
    • - Test of proportion
    • - Test of association
    • - Test of goodness of fit
  19. What is Correlation?
    • Correlation is a statistical technique that can show whether and how strongly pairs of variables are related.
    • It shows the degree of relationship between two quantitative variables such as change in one is directly or inversely followed by change in the other. 

    For example, height and weight are related; taller people tend to be heavier than shorter people. The relationship isn't perfect. People of the same height vary in weight, and you can easily think of two people you know where the shorter one is heavier than the taller one. Nonetheless, the average weight of people 5'5'' is less than the average weight of people 5'6'', and their average weight is less than that of people 5'7'', etc. Correlation can tell you just how much of the variation in peoples' weights is related to their heights.

    The main result of a correlation is called the correlation coefficient (or "r"). It ranges from -1.0 to +1.0. The closer r is to +1 or -1, the more closely the two variables are related.

    If r is close to 0, it means there is little or no linear relationship between the variables. If r is positive, it means that as one variable gets larger the other gets larger. If r is negative it means that as one gets larger, the other gets smaller. 
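
    The correlation coefficient r can be computed by hand as in the sketch below. The height and weight values are hypothetical, and the function name pearson_r is made up for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

heights_cm = [150, 160, 165, 170, 180]  # hypothetical
weights_kg = [52, 58, 63, 70, 77]       # hypothetical

r = pearson_r(heights_cm, weights_kg)
print(round(r, 2))
```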

    • Types of correlation
    • 1. Positive correlation - If the data points make a straight line going from near the origin out to high y-values, then the variables are said to have a positive correlation (perfect when r = +1), e.g. height and weight.

    2. Negative correlation - If the data points run from high y-values near the y-axis down to low values, the variables have a negative correlation (perfect when r = -1), e.g. age and vital capacity. 

    • 3. No correlation - e.g. height and pulse rate.
  20. What is Simple Linear Regression?
    Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables.
  21. Briefly describe the terms correlation and simple linear regression and explain their differences. [TU 2066/1]
    Difference between correlation and simple linear regression

    • Correlation quantifies the degree to which two variables are related. Correlation does not fit a line through the data.
    • With correlation you don't have to think about cause and effect. You simply quantify how well two variables relate to each other. With regression, you do have to think about cause and effect as the regression line is determined as the best way to predict Y from X.
    • With correlation, it doesn't matter which of the two variables you call "X" and which you call "Y". You'll get the same correlation coefficient if you swap the two. With linear regression, the decision of which variable you call "X" and which you call "Y" matters a lot, as you'll get a different best-fit line if you swap the two. The line that best predicts Y from X is not the same as the line that predicts X from Y (unless you have perfect data with no scatter.)
    • Correlation is almost always used when you measure both variables. It rarely is appropriate when one variable is something you experimentally manipulate. With linear regression, the X variable is usually something you experimentally manipulate (time, concentration...) and the Y variable is something you measure.
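
    The point that swapping X and Y changes the regression line, but not the correlation, can be checked with a least-squares sketch. The data are hypothetical and the function name is made up for this example.

```python
def least_squares_line(xs, ys):
    """Slope and intercept of the line that best predicts ys from xs (ordinary least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

heights_cm = [150, 160, 165, 170, 180]  # hypothetical
weights_kg = [52, 58, 63, 70, 77]       # hypothetical

# Line predicting weight from height.
slope_w_on_h, intercept_w_on_h = least_squares_line(heights_cm, weights_kg)
# Line predicting height from weight.
slope_h_on_w, intercept_h_on_w = least_squares_line(weights_kg, heights_cm)

# If the two lines were the same, 1 / slope_h_on_w would equal slope_w_on_h.
print(round(slope_w_on_h, 3), round(1 / slope_h_on_w, 3))
```

    The two slopes differ because there is scatter in the data; they coincide only when the correlation is perfect.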
  22. Sample size. [TU 2070/12]

    Explain the difference between one tail and two tail in statistical test of significance. [TU 2072]
  23. What do you mean by sampling technique. Describe briefly two sampling technique. [TU 2054,56, 61,63,64,65/12,66/6,70/5,73/5]
    Sampling is the process of selecting a finite subset (sample) from population with objective of investigating its properties. 

    • Importance of sampling - 
    • - When properly selected, it gives idea of the whole population on the basis of small part taken out of it. 
    • - Sampling saves time, effort and money in collecting data, as well as processing them. 
    • - It is the only suitable method when the population is very large or infinite. 

    Sampling is of two types - probability and non-probability.
  24. What is probability sampling?
    Probability samples are samples in which the researcher can specify the probability of any one element in the population being included.

    • There are four basic kinds of Probability samples.
    • - Simple Random samples
    • - Stratified Random samples
    • - Cluster samples
    • - Systematic samples
  25. What is simple random sampling?
    • This is the most basic (simplest) kind of probability sample. 
    • Elements are selected at random from the entire sampling frame, without any stratification or segregation of the sampling frame into subgroups or strata with similar characteristics.
    • Requires a complete list of items within the sampling frame.
    • Suitable for small homogeneous populations (large heterogeneous populations will require stratification).
    • Every element in the population has an equal probability of being included.
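
    A simple random sample can be drawn from a complete sampling frame as sketched below. The patient identifiers, the frame size and the seed are hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed so the draw is repeatable

# HYPOTHETICAL sampling frame: a complete list of 200 patient identifiers.
sampling_frame = [f"patient_{i:03d}" for i in range(1, 201)]

# Every element has the same chance of selection; sampling is without replacement.
simple_random_sample = rng.sample(sampling_frame, k=20)

print(simple_random_sample[:5])
```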
  26. What is stratified random sampling?
    A random sample of a population in which the population is first divided into distinct subpopulations or strata, and random samples are then taken separately from each stratum. For example, the population of a city is divided into groups according to their characteristics, and a sample is taken from each group.
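
    Stratified random sampling can be sketched as a separate random draw within each stratum. The strata, their sizes and the 20% sampling fraction are hypothetical, and proportional allocation is an assumption; other allocation schemes exist.

```python
import random

rng = random.Random(7)  # fixed seed so the draw is repeatable

# HYPOTHETICAL strata of different sizes.
strata = {
    "ward_A": [f"A{i}" for i in range(60)],
    "ward_B": [f"B{i}" for i in range(30)],
    "ward_C": [f"C{i}" for i in range(10)],
}
sampling_fraction = 0.2  # assumption: the same fraction is drawn from every stratum

# Draw a separate simple random sample inside each stratum.
stratified_sample = {
    name: rng.sample(members, k=round(len(members) * sampling_fraction))
    for name, members in strata.items()
}

print({name: len(chosen) for name, chosen in stratified_sample.items()})
```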
  27. What is cluster sampling?
    • In cluster sampling, target population is identified.
    • Target population is divided into naturally occurring subpopulations or clusters (e.g. regional population is divided into villages).
    • Next, a random sample of clusters is selected (5 villages out of 50 villages are selected for the purpose of study) by simple random sampling or systematic random sampling. Individuals within the selected clusters may be further selected in total or by random sampling.
    • The population is divided on the basis of naturally occurring clusters, not on the basis of one or more important characteristics as in stratification.
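
    The village example above can be sketched as follows. The village names, the 20 residents per village and the seed are hypothetical, and every individual in the chosen clusters is included, which is only one of the two options the card mentions.

```python
import random

rng = random.Random(3)  # fixed seed so the draw is repeatable

# HYPOTHETICAL population: 50 villages (clusters) with 20 residents each.
villages = {
    f"village_{v:02d}": [f"v{v:02d}_person_{p:02d}" for p in range(1, 21)]
    for v in range(1, 51)
}

# Stage 1: choose 5 of the 50 clusters by simple random sampling.
chosen_villages = rng.sample(sorted(villages), k=5)

# Stage 2 (here): include every individual in the chosen clusters.
cluster_sample = [person for name in chosen_villages for person in villages[name]]

print(chosen_villages, len(cluster_sample))
```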
  28. What is systematic random sampling?
    • A common way of selecting members for a sample using systematic sampling is to divide the total number of units in the general population by the desired number of units for the sample. The result of the division is the sampling interval n, and every nth unit [IOM 01] is selected from the general population.
    • For example, if you wanted to select a sample of 1,000 people from a population of 50,000 using systematic sampling, you would select every 50th person, since 50,000/1,000 = 50.
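
    The 50,000 / 1,000 example can be sketched as below. The random starting point within the first interval is an assumption added here; the card itself only describes taking every 50th person.

```python
import random

rng = random.Random(1)  # fixed seed so the draw is repeatable

population_size = 50_000
sample_size = 1_000

sampling_interval = population_size // sample_size  # 50,000 / 1,000 = 50

# Assumption: start at a random position within the first interval.
start = rng.randrange(sampling_interval)

# Positions (0-based) of the selected units: every 50th unit from the start.
selected_positions = list(range(start, population_size, sampling_interval))

print(sampling_interval, len(selected_positions))
```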
  29. What is nonprobability sampling?
    Nonprobability sampling is any sampling method where some elements of the population have no chance of selection (these are sometimes referred to as 'out of coverage'/'undercovered'), or where the probability of selection can't be accurately determined.

    Convenience, haphazard or accidental sampling - members of the population are chosen based on their relative ease of access. Sampling friends, co-workers, or shoppers at a single mall are all examples of convenience sampling. Such samples are biased because researchers may unconsciously approach some kinds of respondents and avoid others, and respondents who volunteer for a study may differ in unknown but important ways from others.

    • Snowball sampling - The first respondent refers an acquaintance. The friend also refers a friend, and so on. Such samples are biased because they give people with more social connections an unknown but higher chance of selection, but lead to higher response rates.

    Judgmental sampling or purposive sampling - The researcher chooses the sample based on who they think would be appropriate for the study. This is used primarily when there is a limited number of people that have expertise in the area being researched, or when the interest of the research is on a specific field or a small group. 

    Quota Sampling
  30. Short note on sample size?
    • Representative sample has two main characteristics 
    • - Precision 
    • - Unbiased character 

    • Precision depends on sample size 
    • - Sample size should be large enough to represent the universe and display its characteristics very well. 
    • - Sample size should not be less than 30. 

    • For qualitative data 
    • N = Z²P(1 − P)/d²
    • Z = standard normal value for the chosen confidence level [1.96 for 95% CI]
    • P = Prevalence (e.g. 10%)
    • d = Margin of error
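
    The sample size formula for qualitative data can be worked through as below. Z = 1.96 and a prevalence of 10% come from the card; the 5% margin of error is a hypothetical value chosen for illustration.

```python
import math

def sample_size_for_proportion(z: float, prevalence: float, margin_of_error: float) -> int:
    """N = Z^2 * P * (1 - P) / d^2, rounded up to the next whole person."""
    n = z**2 * prevalence * (1 - prevalence) / margin_of_error**2
    return math.ceil(n)

z_95 = 1.96             # from the card: 95% confidence level
prevalence = 0.10       # from the card: 10%
margin_of_error = 0.05  # HYPOTHETICAL: 5 percentage points

required_n = sample_size_for_proportion(z_95, prevalence, margin_of_error)
print(required_n)
```

    Halving the margin of error roughly quadruples the required sample size, because d is squared in the denominator.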
  31. What are the applications of sampling techniques in clinical trial.   [TU 2054,56,61/5,73/5]

    Mention which sampling technique is the best for a clinical trial and justify. [TU 2068/5]
    Sampling is the process of selecting units (e.g., people, organizations) from a population of interest so that by studying the sample we may fairly generalize our results back to the population from which they were chosen.
  32. Short note on Surgical Audit. [TU 2065/5,64/5,63/12, 68/2]
    Clinical audit is a process used by clinicians who seek to improve patient care. The process involves comparing aspects of care (structure, process and outcome) against explicit criteria and defined standards. Keeping track of personal outcome data and contributing to a clinical database ensures that a surgeon’s own performance is monitored continuously and can be compared with a national dataset to ensure compliance with agreed standards. Clinical audit includes the following - 

    • Elements of Audit - 
    • - Measure 
    • - Compare 
    • - Evaluate 

    • Designed and conducted to produce information to inform the delivery of best care.
    • Designed to answer the question: ‘Does this service reach a predetermined standard?’
    • Measures against a standard. 
    • Involves an intervention in use only. (The choice of treatment is that of the clinician and patient according to guidance, professional standards and/or patient preference)
    • Usually involves analysis of existing data, but may include administration of simple interviews or questionnaires.
    • No allocation to interventions: the health professional and patient have chosen intervention before clinical audit. 

    • Audit cycle - 
    • - Determine scope 
    • - Select standards - Guidelines 
    • - Collect data 
    • - Present and interpret results with peer groups 
    • - Make change and monitor progress
Card Set:
Basic Science - Biostatistics - pvalue sampling
2017-06-20 03:35:11
