What are the bins?
What is their purpose when constructing a histogram?
Bins are intervals into which the values of a continuous variable are placed. There are usually too many potential values of a continuous variable to plot each one individually: most values would have a frequency of 0, so the resulting graph would not be very meaningful.
Why do bar charts but not histograms have gaps between the bars?
The gaps between the bars on a bar chart reflect the lack of continuity among non-numeric categories. The bars on a histogram are joined to indicate the underlying continuity of the numeric variable.
What is the distinction between nominal and ordinal data?
- Nominal data are unordered categories, such as the numbers on football players' jerseys.
- Ordinal data are ordered categories. They are still not numeric, but “Severe Injury” indicates a greater amount of injury than “Minor Injury”.
What is a csv file?
What are its properties, its primary advantage and its primary disadvantage?
A csv file is a standard text file, that is, just text characters: the letters of the alphabet, digits, punctuation and a few control codes such as the line return. It is a specific type of text file in which adjacent values on the same line are separated by commas.
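As a minimal sketch of that structure, the following Python snippet reads a small piece of csv text with the standard `csv` module. The column names and values are hypothetical, made up only for illustration.

```python
import csv
import io

# Hypothetical csv content: a header line, then one line per row of data.
# Adjacent values on the same line are separated by commas.
csv_text = "Name,Injury,Games\nLee,Minor,12\nCruz,Severe,3\n"

# csv.reader splits each line of text into its comma-separated values.
rows = list(csv.reader(io.StringIO(csv_text)))
header, records = rows[0], rows[1:]

print(header)
print(records)
```

Every value comes back as text, including the digits, because a csv file stores only characters.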
Identify and describe two properties of histogram bins that lead to arbitrariness in the histogram display.
- Bin width: vary the width of the bins and the shape of the histogram changes.
- Bin shift: vary the starting point of the first bin and the shape of the histogram changes.
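A small sketch may make both properties concrete. The helper `bin_counts` and the data values are hypothetical. The same twelve values are counted with a wide bin, a narrow bin, and a shifted starting point.

```python
def bin_counts(values, start, width, n_bins):
    """Count how many values fall into each bin [lower, lower + width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

# Hypothetical data values, chosen only to illustrate the effect.
data = [1, 2, 2, 3, 5, 6, 6, 7, 9, 10, 10, 11]

wide = bin_counts(data, start=0, width=4, n_bins=3)      # bins start at 0, 4, 8
narrow = bin_counts(data, start=0, width=2, n_bins=6)    # bins start at 0, 2, 4, ...
shifted = bin_counts(data, start=-2, width=4, n_bins=4)  # bins start at -2, 2, 6, 10

print(wide)     # [4, 4, 4]
print(narrow)   # [1, 3, 1, 3, 1, 3]
print(shifted)  # [1, 4, 4, 3]
```

The data never change, yet the display is flat with one choice, jagged with another, and lopsided with a third.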
What is histogram undersmoothing and how does it relate to sample size?
Describe a histogram that is undersmoothed.
How can the problem be fixed?
Undersmoothing occurs when the bin width is too narrow for the given sample size. The result is a histogram that has lots of jaggedness with too many ups and downs that reflect sampling error more than the actual shape of the distribution. Fix the problem by increasing bin width.
What is histogram oversmoothing and how does it relate to sample size? Describe a histogram that is oversmoothed. How can the problem be fixed?
Oversmoothing occurs when the bin width is too wide for the given sample size. The result is a histogram that does not display as much detail as is available. Too much information is discarded. Fix the problem by decreasing bin width.
Provide three examples of how marketing research helps marketing personnel make sound managerial decisions.
Marketing research assesses what your customers want and need before, during, and after the launch of a product or service. Also, marketing research is objective, which derives from the instruments, such as questionnaires, used to collect data. And marketing research is systematic in that the marketing research process has a clearly defined set of steps leading to an agreed-upon outcome.
What improvements in marketing planning can be attributed to the results obtained from consumer satisfaction studies?
Customer satisfaction is at the core of brand loyalty and repeat purchase behavior. The results obtained from customer satisfaction studies can improve strategic planning since they tap into consumer attitudes, such as toward switching brands, the image of the company marketing the product/service, purchase intention and repeat buying (loyalty). In this way a company can consider dropping ineffective products, repositioning brands and the company’s image, exploring new segments, and/or developing new products.
Discuss the importance of target market analysis.
How does it affect the development of market planning for a particular company?
It is difficult for a company to produce a product or service that appeals to everybody. Defining exactly who one wishes to serve, and configuring a research strategy to connect with a well-defined group of buyers takes us to the doorstep of the saying: “You can please some of the people some of the time, but you can’t please all the people all of the time.” Target market analysis gets at answers to key questions like: “What new products can we introduce to our buyers?”, “What are our buyers like, what are their usage patterns for our products and services?”, “Is our current marketing program reaching them where they live and meeting their needs?”
What are the advantages and disadvantages for companies maintaining an internal marketing research department?
When marketing research is done internally, the upside is that there is normally a reduction in cost because expenses can be spread across functional areas within the corporate structure. The downside is that internal marketing research may lack the third-party objectivity of an outside vendor, and sometimes the host company lacks the needed expertise. The upside of external firms is that they are independent and typically involved with a portfolio of projects. When a company contracts with an external vendor, it’s important to set a schedule of deliverables and establish a price for work performed, since outsourcing data collection, tabulation, and analysis is more expensive.
Identify the three major groups of people involved in the marketing research process, and then give an example of an unethical behavior sometimes practiced by each group.
The three major groups of people involved in the marketing research process are the research information providers, the research information users, and the respondents. A research provider may abuse respondents by not providing the promised incentives for participating in the project/study. Within the client/research buyer group, it is unethical to request a project/research proposal when you know you have no intent to purchase. The respondents can be dishonest by answering without regard to the truth just to obtain some incentive, or by providing the answers they believe the researcher desires.
What is a deviation score?
How is the concept of a deviation score related to the mean as the middle value in a distribution?
A deviation score is the distance of a data value from the mean. The mean is the middle of a distribution in the sense that the deviation scores about the mean sum to zero.
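A quick check of that property, using a handful of hypothetical data values:

```python
from statistics import mean

# Hypothetical data values, for illustration only.
data = [2, 4, 6, 8, 10]

m = mean(data)
deviations = [y - m for y in data]  # distance of each value from the mean
total = sum(deviations)

print(m)           # 6
print(deviations)  # [-4, -2, 0, 2, 4]
print(total)       # 0
```

The negative deviations below the mean exactly cancel the positive deviations above it.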
Define “standard deviation” with words (not a formula).
Standard deviation: the square root of the average squared deviation score. For the sample standard deviation, which is calculated using the sample mean, the average is defined on the basis of the degrees of freedom (n − 1) rather than the sample size n.
What is the relation between the standard deviation and the variance?
Why do we typically use the standard deviation in applications of statistical analysis?
The variance is the square of the standard deviation. The standard deviation is usually preferred because it is in the metric of the variable, that is, the same measurement units, and it is directly related to normal curve probabilities.
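The verbal definition and the relation to the variance can be sketched step by step. The data values are hypothetical. The last lines compare the hand calculation with the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n − 1.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical data values, for illustration only.
data = [2, 4, 6, 8, 10]

m = mean(data)
squared_devs = [(y - m) ** 2 for y in data]  # squared deviation scores
df = len(data) - 1                           # degrees of freedom, n - 1

var = sum(squared_devs) / df  # "average" squared deviation, based on df
sd = math.sqrt(var)           # standard deviation: square root of the variance

print(var, sd)
print(variance(data), stdev(data))  # library versions, for comparison
```

The variance is in squared units, while the standard deviation is back in the units of the data.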
How do a cross-tabulation table and a scatterplot compare?
Under what situation is each one used?
Both forms of statistical output provide a comparison of the relation between two variables. A cross-tabulation table shows the joint frequencies of the values of two categorical variables. That is, the cross-tabulation table, or pivot table, shows the relation between two categorical variables. A scatterplot shows the relation between two continuous, that is, numeric, variables.
What are the marginal frequencies in a cross-tabulation or pivot table?
How do marginal frequencies compare to the joint frequencies?
The marginal frequencies are the row and column sums of the corresponding joint frequencies. They represent the frequencies for each of the two categorical variables considered separately, that is, as if the frequencies were counted for the values of each categorical variable separately.
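A minimal sketch of joint and marginal frequencies, assuming two hypothetical categorical variables (package color and whether a purchase was made) and made-up observations:

```python
from collections import Counter

# Hypothetical observations: one (color, purchased) pair per customer.
pairs = [
    ("Red", "Yes"), ("Red", "Yes"), ("Red", "Yes"), ("Red", "No"),
    ("Yellow", "Yes"), ("Yellow", "Yes"), ("Yellow", "No"), ("Yellow", "No"),
]

# Joint frequencies: counts for each combination of the two variables.
joint = Counter(pairs)

# Marginal frequencies: counts for each variable considered separately.
row_totals = Counter(color for color, _ in pairs)
col_totals = Counter(purchased for _, purchased in pairs)

print(joint[("Red", "Yes")], joint[("Red", "No")], row_totals["Red"])
print(col_totals["Yes"], col_totals["No"])
```

Each row total equals the sum of the joint frequencies in that row, and likewise for the columns.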
How does an ellipse drawn around most of the values of a scatterplot relate to the strength of the relationship?
The more scatter in the scatterplot, the less strongly the two variables are related. The more scatter, the more circular (or at least not tilted) is the shape of the points. The less scatter, the more closely the points cluster together, becoming closer and closer to a line. So an ellipse drawn around most of the points moves from a line to a thin ellipse to a wider ellipse to a circle as a relationship moves from perfect to nonexistent.
Which correlation represents a stronger relationship, and why: r = .45 or r = −.83?
The sign of the correlation coefficient, + or −, indicates the direction of the relationship. The size of the coefficient, irrespective of its sign, indicates the strength of the relationship. So a correlation of −.83 represents a stronger relationship than one of .45.
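The point about sign and size can be illustrated with a short sketch. The function `pearson_r` is a hand-written helper, not a library call, and the data are hypothetical. Flipping the sign of one variable reverses the direction of the relationship without changing its strength.

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed from deviation scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data values, for illustration only.
x = [1, 2, 3, 4, 5]
y_pos = [2, 1, 4, 3, 5]
y_neg = [-v for v in y_pos]  # same scatter, opposite direction

r_pos = pearson_r(x, y_pos)
r_neg = pearson_r(x, y_neg)
print(r_pos, r_neg)

# Strength is judged by absolute value, irrespective of sign.
stronger = max([0.45, -0.83], key=abs)
print(stronger)  # -0.83
```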
Compare internal and external secondary data.
The primary difference between internal and external sources relates to the purpose for which the data were collected. Internal data were collected within the company or organization, so the secondary data can be evaluated on the basis of organizational knowledge about the data. This is not the case for external data sources.
What is the distinction between brand awareness and brand attitudes?
Brand awareness refers to how well known the brand is, that is, to the percentage of respondents who have heard of the brand. An attitude is the amount of attraction or feeling toward an object, so a brand attitude refers to the degree of positive associations one has with a certain brand.
What is the distinction between analyzing data and interpreting data?
Analyzing data is generally conducted with mathematical operations, formulas and graphical analysis. In practice the data analysis is accomplished with the computer. The purpose of data analysis is to provide information that the researcher and marketer can interpret so as to ascribe meaning and guide management decisions.
Identify the significant changes taking place in today’s business environment that are forcing management decision makers to rethink their views
of marketing research. Also discuss the potential impact that these changes might have on marketing research activities.
- The most significant and “first” change occurring today stems from the Internet. More people are using the Internet to gather and disseminate information and purchase goods (e-commerce). Therefore, management decision makers view the Internet as both a place to display their business (create a website) and collect information from buyers (on-line marketing research).
- Other technological advances include company databases that store vast amounts of information, such as the scanned results of every purchase made in every retail outlet of a company. All of this information must be organized and summarized, requiring computing resources and the knowledge of how to use them and perform the needed analyses.
In the business world of the 21st Century, will it be
possible to make critical marketing decisions without marketing research?
Why or why not?
It is always possible to make marketing decisions without investing in marketing research, sometimes with good reason. If Company X lacks the funds to do marketing research, or the research endeavor is motivated by purely political reasons, one might argue that it is better to proceed with the decision problem without committing dollars to marketing research. Still, information is power, and marketing research provides much of this information.
How are management decision makers and information researchers alike? How are they different? How might the differences be reduced between these two types of professionals?
Management decision makers and information researchers are similar in a fundamental sense: they both share an interest in determining what needs to be done to “find out how satisfied our customers are overall and what can be done to improve our image.” Researchers and managers are different in the sense that managers expect (and enjoy) being able to make a decision on the basis of information as soon as possible. A researcher, on the other hand, may prefer to probe problems and/or opportunities in more detail, and to collect and analyze the data. Both parties value the importance and power of information and want the corporation to make the “right” choice. There is a growing realization that, more often than not, decisions based on information research fare better in an increasingly competitive marketplace than choices made on the basis of pure intuition.
Does the sample mean equal the population mean? Why or why not?
This is the fundamental issue for statistical inference. Due to the presence of sampling error, the sample mean randomly fluctuates around the population mean over repeated sampling from the population.
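Repeated sampling can be simulated to show this fluctuation. The sketch below assumes a hypothetical normal population with mean 50 and standard deviation 10, and draws 2000 samples of size 25. All of these numbers are arbitrary choices for illustration.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population and sampling plan.
mu, sigma, n = 50, 10, 25
n_samples = 2000

# One sample mean per simulated sample.
sample_means = [
    mean([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(n_samples)
]

print(min(sample_means), max(sample_means))  # individual sample means differ
print(mean(sample_means))                    # but they center close to mu
```

No single sample mean equals the population mean, yet the sample means scatter around it rather than drifting away from it.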
Which value would you like to know, the sample mean, Ȳ, or the population mean, μ? Why?
The population mean, μ, is the underlying value of interest. The sample mean, Ȳ, is based on μ, but also contains sampling error. So to understand the future, that is, to forecast the value of the variable, the forecast is based on the underlying stable μ, which describes the stable aspects of the underlying system.
Why is information regarding characteristics of the population, such as the population mean, μ, more useful for business decision making than the corresponding information that describes a sample from that population, such as the sample mean, Ȳ?
Management decisions are made regarding the future, an implicit or explicit forecast as to values of variables as they will exist in the future. What projects into the future are the stable population values that underlie the sampling process by which past data were collected. Sample statistics are influenced by the population values, but also by inherently unstable sampling error. Better forecasts are made from the underlying population values.
Why is the conceptualization of Ȳ as a variable an abstraction? What does it mean to say that the value of Ȳ varies?
We typically do not observe any variation in Ȳ; we typically see only one such value. Hence its variation is an abstraction of what would happen over repeated sampling, something that would be observed if the repeated sampling were actually done.
What is a standard error and why is the standard error of the mean so important to statistical inference of the mean?
A standard error is the standard deviation of a statistic, such as the sample mean, Ȳ. Usually the term “standard deviation” refers to the variability of the data, whereas the standard error describes how much a statistic varies from sample to (usually hypothetical) sample. A statistic varies from sample to sample and the extent of this variation is at the core of statistical inference.
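Because only one sample is usually available, the standard error of the mean is estimated from that sample as the sample standard deviation divided by the square root of the sample size. A minimal sketch with hypothetical data values:

```python
import math
from statistics import stdev

# Hypothetical sample, for illustration only.
sample = [2, 4, 6, 8, 10]

n = len(sample)
s = stdev(sample)                  # variability of the data
standard_error = s / math.sqrt(n)  # estimated variability of the sample mean

print(s, standard_error)
```

The standard error is smaller than the standard deviation because a mean of several values varies less than the individual values do.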
What is the role of a cutoff or critical value, such as
t.025, in the construction of a confidence interval?
The purpose of a cutoff value, or pair of values, such as t.025 and −t.025, is to determine the range of sampling variability of the sample mean, Ȳ. The t-distribution is a distribution of standardized values, which expresses the range of variability in terms of estimated standard errors. To express the range of variability of Ȳ directly, unstandardize, that is, multiply the t-value by the estimated standard error.
What is the primary information provided by the confidence interval of the mean?
The confidence interval of the mean specifies the plausible range of values that likely contain the population mean, at a specified level of confidence.
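A sketch of the calculation for a hypothetical sample of 10 values. The cutoff 2.262 is the t.025 value for 9 degrees of freedom taken from a standard t-table. It is hard-coded here as an assumption because the Python standard library has no t-distribution function.

```python
import math
from statistics import mean, stdev

# Hypothetical sample, for illustration only.
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]

n = len(sample)
ybar = mean(sample)
standard_error = stdev(sample) / math.sqrt(n)

t_cut = 2.262  # t.025 for df = n - 1 = 9, from a standard t-table

# Unstandardize: multiply the t-cutoff by the estimated standard error.
margin = t_cut * standard_error
lower, upper = ybar - margin, ybar + margin

print(f"mean = {ybar}, 95% CI = ({lower:.3f}, {upper:.3f})")
```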
What does the word “confidence” mean in the construction of a confidence interval? That is, on what logic is its meaning based?
The meaning of “confidence” is based on the concept of repeated sampling. For the 95% confidence interval, on average, 95% of the confidence intervals calculated from samples drawn from the same population would contain the true, underlying value of the population mean, μ.
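The repeated-sampling logic can be checked by simulation. The sketch below assumes a hypothetical normal population (mean 100, standard deviation 15), samples of size 10, and the table value t.025 = 2.262 for 9 degrees of freedom. It counts how often the interval captures the population mean.

```python
import math
import random
from statistics import mean, stdev

random.seed(2)  # fixed seed so the run is repeatable

# Hypothetical population and sampling plan.
mu, sigma, n = 100, 15, 10
t_cut = 2.262  # t.025 for df = 9, from a standard t-table
trials = 2000

hits = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    ybar = mean(sample)
    margin = t_cut * stdev(sample) / math.sqrt(n)
    if ybar - margin <= mu <= ybar + margin:
        hits += 1  # this interval contains the population mean

coverage = hits / trials
print(coverage)  # should be close to 0.95
```

Any single interval either contains μ or it does not. The 95% describes the procedure over many samples.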
How is sample size related to the width of a confidence interval?
The larger the sample size, with everything else such as the confidence level held the same, the smaller the standard error and the t-cutoffs, so the narrower the confidence interval.
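A rough sketch of both effects, holding a hypothetical sample standard deviation fixed at 10 and using t.025 values from a standard t-table for three sample sizes:

```python
import math

s = 10.0  # hypothetical sample standard deviation, held fixed

# Two-tailed 95% cutoffs t.025 from a standard t-table, keyed by sample size n
# (degrees of freedom n - 1, that is, 9, 29 and 99).
t_cutoffs = {10: 2.262, 30: 2.045, 100: 1.984}

widths = {}
for n, t_cut in t_cutoffs.items():
    standard_error = s / math.sqrt(n)
    widths[n] = 2 * t_cut * standard_error  # lower bound to upper bound

for n, width in widths.items():
    print(f"n = {n:3d}  width = {width:.2f}")
```

Most of the narrowing comes from the standard error. The t-cutoff shrinks only slightly as n grows.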
What is the relation between the null hypothesis and the alternative hypothesis?
The alternative hypothesis is the range of values not specified by the null hypothesis.
In hypothesis testing, why is the sampling distribution
of the mean centered on the hypothesized value?
The hypothesis test is based on the assumption of the null hypothesis being the true value of the population mean. The population mean is at the center of the corresponding distribution of the sample mean over repeated samples from the same process or population.
Why is the p-value considered more useful than the observed value of tȲ when evaluating a hypothesis test?
The p-value and the t-value both provide the identical result, but people more intuitively understand probabilities, and their range from 0 to 1, than they do t-values.
Why does a large p-value indicate greater consistency
with the null hypothesis than a small p-value?
If the null hypothesis is true, what is the probability of obtaining a Ȳ as far as or farther from the hypothesized mean than the one observed? This is the p-value. If this number is large, then the probability is high, so the result is consistent with the hypothesized value. If this number is small, then the observed Ȳ is a low-probability event assuming the null hypothesis, so the result is inconsistent with it.
Confidence intervals and hypothesis tests both involve the construction of intervals that extend approximately 2 estimated standard errors on either side of a central value. Define and compare these intervals for these two forms of statistical inference.
Statistical inference of the mean is based on a given range of sampling variation, such as the 95% range of variation given by t.025. Put this range about the sample mean, Ȳ, and the result is the confidence interval. Put this same range about the hypothesized mean, μ0, and the result is the “acceptance” region of the hypothesis test.
Why is the confidence interval a more general form of
statistical inference than a hypothesis test?
A confidence interval is a “million” hypothesis tests. Literally, every μ0 outside of the region would be rejected and every value inside would be “accepted”. Moreover, the confidence interval also provides the range of plausible values, whereas a hypothesis test only indicates if the true population mean is larger than or smaller than the hypothesized mean, or consistent with it.
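The correspondence can be sketched by testing several hypothesized means against one confidence interval. The summary values (sample mean 13.5, estimated standard error 0.5, t.025 = 2.262 for 9 degrees of freedom) and the candidate μ0 values are hypothetical.

```python
# Hypothetical summary statistics from a sample of size 10.
ybar, standard_error = 13.5, 0.5
t_cut = 2.262  # t.025 for df = 9, from a standard t-table

lower = ybar - t_cut * standard_error
upper = ybar + t_cut * standard_error

def rejected(mu0):
    """Two-tailed t-test of the null hypothesis that the mean equals mu0."""
    t_value = (ybar - mu0) / standard_error
    return abs(t_value) > t_cut

# Hypothetical candidate values for the hypothesized mean.
candidates = [12.0, 12.5, 13.0, 13.5, 14.0, 14.5, 15.0]
for mu0 in candidates:
    in_interval = lower <= mu0 <= upper
    print(mu0, "inside CI" if in_interval else "outside CI",
          "rejected" if rejected(mu0) else "not rejected")
```

Every candidate inside the interval is not rejected and every candidate outside it is rejected, so the one interval summarizes all of these tests at once.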
Suppose you wish to know whether Red or Yellow packaging leads to more sales. What would the data look like for a study designed to answer this question, and what statistical technique would you use?
The data set would contain at least the following two variables: Package.Color and Sales. Package.Color would be a grouping variable with two values, Red and Yellow. Sales would be a continuous, that is, numerical, response variable. The corresponding statistical technique would be a t-test for two groups and/or a confidence interval of the mean difference. Both analyses are provided by the R function t.test and the lessR function smd.t.test.
What value is estimated by the confidence interval of a
mean difference? How is this value interpreted according to its sign?
The confidence interval for the mean difference estimates the population mean difference, μA − μB, which indicates how much larger one mean is than the other, in the original units of measurement. A positive mean difference indicates how much larger μA is than μB. A negative mean difference indicates how much larger μB is than μA. A value of 0 indicates equality.
How is the confidence interval about a mean difference interpreted if the interval includes zero?
No difference between the two population means has been detected: the population mean of one group could plausibly be smaller than, equal to, or larger than the population mean of the other group.
How is the confidence interval about a mean difference interpreted if the interval does not include zero?
If zero is not in the confidence interval of the mean difference, then all the values in the interval are positive or all are negative. The result is that a difference between group means was detected, and the confidence interval shows how much larger one mean is likely to be than the other.
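A sketch of the interval for a mean difference, using hypothetical sales figures for two package colors. It assumes the classic pooled-variance approach (equal population variances) and the table value t.025 = 2.228 for 10 degrees of freedom.

```python
import math
from statistics import mean, variance

# Hypothetical weekly sales for two package colors (not data from a real study).
red = [20, 22, 19, 24, 25, 22]
yellow = [18, 17, 21, 19, 20, 19]

n1, n2 = len(red), len(yellow)
diff = mean(red) - mean(yellow)  # sample mean difference

# Pooled variance: assumes the two populations share the same variance.
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * variance(red) + (n2 - 1) * variance(yellow)) / df
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_cut = 2.228  # t.025 for df = 10, from a standard t-table
lower = diff - t_cut * se_diff
upper = diff + t_cut * se_diff
includes_zero = lower <= 0 <= upper

print(f"difference = {diff}, 95% CI = ({lower:.2f}, {upper:.2f})")
print("zero in interval:", includes_zero)
```

With these made-up numbers every value in the interval is positive, so a difference in favor of Red would be detected.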
Why is the following equation the reason for statistical inference in the analysis of the mean difference?
Ȳ1 − Ȳ2 ≈ μ1 − μ2
The sample mean is a statistic, so it is calculated from a data sample. The result is that the sample mean for each group is contaminated by sampling error, as is, then, the resulting sample mean difference. For this reason, the sample mean difference only approximates the desired population mean difference. Statistical inference is the technique used to estimate the population mean difference from the data.
What is the meaning of the p-value for the t-test of the mean difference?
The p-value for a two-sample hypothesis test: the probability of obtaining a difference between sample means as large as or larger than the observed difference, assuming the null hypothesis specifies the true value of the mean difference.
What is the standardized mean difference, d? How does it compare to the regular (i.e., unstandardized) mean difference? What information does d provide beyond the usual t-statistic?
The unstandardized mean difference is expressed in the original units of measurement. The standardized mean difference takes this difference and divides by the pooled standard deviation of the two samples. As such, the standardized index indicates effect size disentangled from the specific unit of measurement. This unit-less index also allows results to be more easily compared across studies, as it directly indicates the number of standard deviations that separate the two means. Values larger than .8 are considered to indicate a strong effect, and values less than .2 are considered to indicate a trivial effect. These values also translate directly into the percent of overlap of the two distributions of data values if the distributions are normal.
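A minimal sketch of d, using a hand-written helper `standardized_mean_difference` (a hypothetical name, not a library function) and hypothetical sales data. Rescaling the data, for example from dollars to cents, changes the raw mean difference but leaves d unchanged.

```python
import math
from statistics import mean, variance

def standardized_mean_difference(a, b):
    """Mean difference divided by the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

# Hypothetical weekly sales for two package colors.
red = [20, 22, 19, 24, 25, 22]
yellow = [18, 17, 21, 19, 20, 19]

raw_diff = mean(red) - mean(yellow)
d = standardized_mean_difference(red, yellow)

# Same data in different units: every value multiplied by 100.
red_cents = [100 * v for v in red]
yellow_cents = [100 * v for v in yellow]
raw_diff_cents = mean(red_cents) - mean(yellow_cents)
d_cents = standardized_mean_difference(red_cents, yellow_cents)

print(raw_diff, raw_diff_cents)  # the raw difference depends on the units
print(d, d_cents)                # d does not
```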