PBBSC SY INTRODUCTION TO NURSING RESEARCH AND STATISTICS UNIT 7
Descriptive Statistics
Descriptive statistics involves summarizing and organizing data to provide a clear understanding of its characteristics. It is the foundation of data analysis, offering insights into patterns, trends, and distributions within a dataset.
Purpose of Descriptive Statistics
Summarize Data:
Reduces large datasets into concise, interpretable summaries.
Highlight Patterns:
Identifies trends and distributions in the data.
Facilitate Comparison:
Compares different datasets or groups.
Prepare for Inferential Analysis:
Provides a foundation for further statistical testing.
Types of Descriptive Statistics
1. Measures of Central Tendency
Indicates the center or typical value of a dataset.
Mean (Average):
Sum of all data points divided by the total number.
Example: Average patient age = (25 + 30 + 35 + 40) ÷ 4 = 32.5 years.
Median:
The middle value when data is arranged in ascending order.
Example: Median of {25, 30, 35, 40} = 32.5 (average of middle two values).
Mode:
The most frequently occurring value in the dataset.
Example: Mode of {25, 25, 30, 35} = 25.
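The three worked examples above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the standard-library `statistics` module; the variable names are illustrative only.

```python
import statistics

ages = [25, 30, 35, 40]        # patient ages from the mean and median examples
mode_ages = [25, 25, 30, 35]   # values from the mode example

mean_age = statistics.mean(ages)        # (25 + 30 + 35 + 40) / 4 = 32.5
median_age = statistics.median(ages)    # average of the two middle values = 32.5
mode_age = statistics.mode(mode_ages)   # most frequent value = 25

print(mean_age, median_age, mode_age)
```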
2. Measures of Dispersion
Indicates the spread or variability of data.
Range:
Difference between the highest and lowest values.
Example: Range of {10, 20, 30, 40} = 40 – 10 = 30.
Variance:
Average squared deviation from the mean.
Formula: Variance = Σ (X – Mean)² / N (for a sample, divide by N – 1 instead).
Standard Deviation (SD):
Square root of the variance, representing the typical deviation from the mean.
Example: SD of {10, 20, 30, 40} ≈ 11.18 when dividing by N, or ≈ 12.91 when dividing by N – 1.
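The dispersion measures can be checked the same way. The sketch below assumes Python's `statistics` module and shows both common conventions for the standard deviation, since the result depends on whether the sum of squared deviations is divided by N (population) or N – 1 (sample).

```python
import statistics

values = [10, 20, 30, 40]

value_range = max(values) - min(values)       # 40 - 10 = 30
pop_variance = statistics.pvariance(values)   # sum of squared deviations / N = 125
pop_sd = statistics.pstdev(values)            # square root of pop_variance, about 11.18
sample_sd = statistics.stdev(values)          # uses N - 1 in the denominator, about 12.91

print(value_range, pop_variance, round(pop_sd, 2), round(sample_sd, 2))
```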
3. Measures of Distribution
Describes the shape and spread of data.
Frequency Distribution:
Counts the number of occurrences of each value or category.
Example:
Age Group | Frequency
20–30 | 10
31–40 | 15
41–50 | 5
Skewness:
Measures asymmetry in data distribution.
Positive skew: Tail is longer on the right.
Negative skew: Tail is longer on the left.
Kurtosis:
Describes how heavy the tails of a distribution are compared with a normal distribution; it is often described loosely as peakedness or flatness.
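The sign convention for skewness can be illustrated numerically. The sketch below uses one common definition, the moment-based coefficient (third central moment divided by the 1.5 power of the second), which is an assumption here since the notes do not name a specific formula. The datasets are hypothetical.

```python
def moment_skewness(data):
    """Moment-based skewness: m3 / m2 ** 1.5 (population form)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    return m3 / m2 ** 1.5

right_tailed = [1, 2, 2, 3, 10]   # hypothetical: one large value stretches the right tail
left_tailed = [1, 8, 9, 9, 10]    # hypothetical: one small value stretches the left tail
symmetric = [1, 2, 3, 4, 5]       # hypothetical: evenly spread around the mean

skew_right = moment_skewness(right_tailed)   # positive
skew_left = moment_skewness(left_tailed)     # negative
skew_sym = moment_skewness(symmetric)        # zero

print(skew_right > 0, skew_left < 0, skew_sym == 0)
```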
4. Graphical Representation
Visual tools to summarize and present data.
Bar Graphs:
Displays categorical data.
Histograms:
Shows frequency distribution for continuous data.
Pie Charts:
Represents proportions or percentages.
Box Plots:
Highlights data spread, median, and outliers.
Steps in Using Descriptive Statistics
Collect Data:
Gather raw data through surveys, experiments, or observations.
Organize Data:
Arrange data in a logical order, such as tables or spreadsheets.
Calculate Measures:
Compute central tendency, dispersion, and distribution metrics.
Visualize Data:
Use graphs or charts to present insights.
Interpret Results:
Analyze patterns and trends for meaningful conclusions.
Applications of Descriptive Statistics
In Nursing Research
Patient Demographics:
Summarizing age, gender, or medical history of patients.
Clinical Outcomes:
Analyzing treatment success rates or recovery times.
Survey Results:
Presenting feedback on healthcare services.
In Education Research
Student Performance:
Summarizing test scores or attendance records.
Teacher Feedback:
Analyzing responses to professional development programs.
Advantages of Descriptive Statistics
Simplifies Data:
Makes large datasets manageable.
Enhances Understanding:
Provides insights into patterns and relationships.
Supports Decision-Making:
Helps identify trends for actionable steps.
Limitations of Descriptive Statistics
No Cause-and-Effect Analysis:
Cannot establish causal relationships between variables.
Limited to Summary:
Provides no inference about the population beyond the dataset.
Frequencies in Data Analysis
A frequency is the count of occurrences of a particular value or category in a dataset. Frequency counts are among the simplest and most fundamental tools in descriptive statistics and are often used to summarize and analyze data.
Types of Frequencies
Absolute Frequency:
The actual count of occurrences for each value or category.
Example:
Number of patients in each age group:
Age Group | Frequency
18–30 | 10
31–50 | 15
51–70 | 5
Relative Frequency:
The proportion or percentage of the total occurrences for each value or category.
Formula: $\text{Relative Frequency} = \frac{\text{Frequency of Category}}{\text{Total Frequency}} \times 100$
Example:
For age group 18–30: $\frac{10}{30} \times 100 = 33.33\%$
Cumulative Frequency:
The running total of frequencies up to a certain value or category.
A frequency distribution table is a structured way to display frequencies. It organizes data into categories or intervals, showing the count (absolute frequency), percentage (relative frequency), or cumulative count.
Example of a Frequency Distribution Table
Dataset: Patient ages in a clinic.
Age Interval | Frequency | Relative Frequency (%) | Cumulative Frequency
18–30 | 10 | 33.33 | 10
31–50 | 15 | 50.00 | 25
51–70 | 5 | 16.67 | 30
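The relative and cumulative columns of this table follow mechanically from the absolute frequencies. A minimal sketch, assuming Python and its standard `itertools.accumulate` helper; the variable names are illustrative.

```python
from itertools import accumulate

age_intervals = ["18-30", "31-50", "51-70"]
frequencies = [10, 15, 5]   # absolute frequencies from the table

total = sum(frequencies)                                      # 30 patients
relative = [round(f / total * 100, 2) for f in frequencies]   # percentage of the total
cumulative = list(accumulate(frequencies))                    # running total

for row in zip(age_intervals, frequencies, relative, cumulative):
    print(*row)
```

The last cumulative value always equals the total number of observations, which is a quick check that no category was missed.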
Graphical Representation of Frequencies
Bar Graph:
Displays absolute or relative frequencies for categorical data.
Example: Number of patients in different age groups.
Histogram:
Represents frequencies for continuous data, grouped into intervals.
Example: Distribution of patient weights.
Pie Chart:
Represents relative frequencies as slices of a circle.
Example: Percentage of patients by gender.
Line Graph (Cumulative Frequency Curve):
Shows cumulative frequencies over categories or intervals.
Steps to Calculate Frequencies
Organize the Data:
Arrange data in ascending order (for continuous or ordinal data).
Define Categories or Intervals:
Create appropriate categories or intervals for grouping data.
Count Occurrences:
Count how often each value or interval occurs.
Calculate Relative Frequencies (if needed):
Use the formula provided above.
Calculate Cumulative Frequencies (if needed):
Add frequencies cumulatively.
Applications of Frequencies
In Nursing Research
Patient Demographics:
Summarizing age, gender, or diagnosis frequencies.
Clinical Outcomes:
Counting occurrences of specific recovery rates or side effects.
In Education Research
Test Scores:
Summarizing the number of students in each score range.
Attendance Records:
Counting the number of students with different attendance rates.
Advantages of Using Frequencies
Simplicity:
Easy to calculate and understand.
Highlights Trends:
Quickly identifies the most common values or categories.
Facilitates Further Analysis:
Prepares data for more advanced statistical tests.
Class Interval
A class interval is a range of values used to group continuous data into categories for easier analysis and representation. It is commonly used in frequency distribution tables to summarize large datasets.
Key Components of a Class Interval
Lower Limit:
The smallest value in the interval.
Example: In the interval 10–20, the lower limit is 10.
Upper Limit:
The largest value in the interval.
Example: In the interval 10–20, the upper limit is 20.
Class Width:
The difference between the upper and lower limits of a class.
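Example: For 10–20, the class width is 20 – 10 = 10.
Midpoint (Class Mark):
The average of the lower and upper limits of a class.
Example: For 10–20, the midpoint is $\frac{10 + 20}{2} = 15$.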
Steps to Create Class Intervals
Identify the Range of Data:
Calculate the range: $\text{Range} = \text{Maximum Value} - \text{Minimum Value}$
Decide the Number of Classes:
The number of intervals depends on the dataset size, typically between 5 and 20.
Determine Class Width:
Formula: $\text{Class Width} = \frac{\text{Range}}{\text{Number of Classes}}$
Create Intervals:
Start with the minimum value and add the class width to define each interval.
Assign Frequencies:
Count the number of data points falling into each interval.
Example: Creating Class Intervals
Dataset:
10, 12, 15, 18, 20, 22, 25, 27, 30, 32
Range:
$32 - 10 = 22$
Number of Classes:
Assume 5 classes.
Class Width:
$\frac{22}{5} = 4.4$ (round to 5)
Class Intervals:
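Start at 10 with a width of 5. Each class includes its lower limit and excludes its upper limit, so every class has the same width:
10–15, 15–20, 20–25, 25–30, 30–35
Frequency Table:
Class Interval | Frequency
10–15 | 2
15–20 | 2
20–25 | 2
25–30 | 2
30–35 | 2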
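The steps for building class intervals can be sketched in Python. This version assumes half-open classes of the form [lower, lower + width), so that all classes share the same width. Because a value equal to a class's upper boundary is counted in the next class, the counts can differ from a table drawn with other boundary conventions.

```python
import math

data = [10, 12, 15, 18, 20, 22, 25, 27, 30, 32]
num_classes = 5

data_range = max(data) - min(data)            # 32 - 10 = 22
width = math.ceil(data_range / num_classes)   # 4.4 rounded up to 5

start = min(data)
counts = [0] * num_classes
for x in data:
    # Index of the class containing x; the clamp keeps the maximum in the last class
    index = min((x - start) // width, num_classes - 1)
    counts[index] += 1

for i, count in enumerate(counts):
    lower = start + i * width
    print(f"[{lower}, {lower + width}): {count}")
```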
Applications of Class Intervals
In Nursing Research
Patient Data:
Grouping patient ages or lab test results into intervals.
Recovery Times:
Analyzing recovery durations by intervals.
In Education Research
Test Scores:
Summarizing students’ scores into ranges (e.g., 0–10, 11–20).
Attendance:
Categorizing students based on attendance rates.
Advantages of Using Class Intervals
Simplifies Large Datasets:
Reduces complexity for better visualization.
Highlights Patterns:
Makes trends and distributions easier to identify.
Facilitates Further Analysis:
Prepares data for histograms and statistical measures.
Graphical Methods of Describing Frequency
Graphical methods provide visual representations of frequency data, making it easier to identify trends, patterns, and distributions. Below are some commonly used graphical methods to describe frequency.
1. Bar Graph
Definition: A bar graph represents categorical frequency data using rectangular bars.
Application:
Used for discrete data, such as survey responses or patient categories.
Characteristics:
Bars are separated.
The height of the bar indicates the frequency.
Example:
Frequency of patients visiting different hospital departments:
Department | Frequency
Outpatient | 50
Emergency | 30
Surgery | 20
Bar Graph:
X-axis: Departments.
Y-axis: Frequency.
Each department is represented by a bar proportional to its frequency.
2. Histogram
Definition: A histogram represents the frequency distribution of continuous data using adjoining bars.
Application:
Used for data grouped into class intervals, such as patient ages or test scores.
Characteristics:
Bars touch each other to indicate continuous data.
X-axis represents class intervals; Y-axis represents frequency.
Example:
Patient ages grouped into intervals:
Age Group | Frequency
10–20 | 5
21–30 | 15
31–40 | 10
3. Frequency Polygon
Definition: A frequency polygon connects points plotted at the midpoints of class intervals, with the frequency on the Y-axis.
Application:
Used to show the shape of the distribution and compare multiple datasets.
Characteristics:
The polygon starts and ends at the baseline (X-axis).
Easier to compare datasets than a histogram.
Example:
Using the same data as the histogram example, plot the midpoints (e.g., 15, 25, 35) against the frequencies.
4. Pie Chart
Definition: A pie chart represents frequency data as slices of a circle, showing proportions.
Application:
Used for relative frequency or percentage data.
Characteristics:
Each slice represents a category, proportional to its frequency.
Example:
Distribution of disease cases in a hospital:
Disease | Frequency | Percentage
Diabetes | 40 | 40%
Hypertension | 30 | 30%
Others | 30 | 30%
The pie chart will have slices of 40%, 30%, and 30%.
5. Line Graph
Definition: A line graph uses points connected by a line to show trends or changes over time.
Application:
Used for time-series data, such as weekly patient admissions.
Characteristics:
X-axis: Time intervals.
Y-axis: Frequency.
Example:
Weekly patient admissions:
Week | Admissions
Week 1 | 20
Week 2 | 30
Week 3 | 25
6. Ogive (Cumulative Frequency Graph)
Definition: An ogive represents cumulative frequency data, either less than or greater than a given value.
Application:
Used to determine percentiles or medians.
Characteristics:
X-axis: Class intervals.
Y-axis: Cumulative frequency.
Example:
Cumulative frequency data for patient ages:
Age Group | Frequency | Cumulative Frequency
10–20 | 5 | 5
21–30 | 15 | 20
31–40 | 10 | 30
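To show how an ogive supports finding the median, the sketch below builds the cumulative frequencies for this example and locates the class in which the cumulative count first reaches half of the total. It assumes Python's `itertools.accumulate`. It only identifies the median class; it does not interpolate within it.

```python
from itertools import accumulate

age_groups = ["10-20", "21-30", "31-40"]
frequencies = [5, 15, 10]

cumulative = list(accumulate(frequencies))   # [5, 20, 30]
half = cumulative[-1] / 2                    # N / 2 = 15

# First class whose cumulative frequency reaches N / 2
median_class = next(g for g, c in zip(age_groups, cumulative) if c >= half)

print(cumulative, median_class)
```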
7. Scatter Plot
Definition: A scatter plot represents the relationship between two continuous variables.
Application:
Used to visualize correlations (positive, negative, or none).
Characteristics:
Each point represents an observation.
Example:
Relationship between BMI and blood pressure readings.
BMI | Blood Pressure
25 | 120
30 | 140
35 | 160
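A scatter plot is usually read alongside a numerical measure of association. The sketch below computes Pearson's correlation coefficient for the three example points. The coefficient is not defined in these notes, so treat it as an illustrative addition. Because the three points lie exactly on a straight line, the result is a perfect positive correlation.

```python
import math

bmi = [25, 30, 35]
blood_pressure = [120, 140, 160]

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(ss_x * ss_y)

r = pearson_r(bmi, blood_pressure)
print(round(r, 4))
```

Values of r near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little or none. Three points are far too few to draw conclusions in practice.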
8. Box Plot
Definition: A box plot shows the distribution of a dataset, including its median, quartiles, and outliers.
Application:
Used to identify variability and outliers.
Characteristics:
Box represents the interquartile range.
Whiskers extend to the smallest and largest values that lie within 1.5 times the IQR of the box edges (Q1 and Q3); values beyond that are plotted individually as outliers.
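The quantities a box plot draws can be computed directly. The sketch below uses hypothetical recovery times and assumes Python's `statistics.quantiles` with the "inclusive" method. Other quartile conventions exist and can give slightly different values.

```python
import statistics

recovery_days = [3, 4, 5, 5, 6, 7, 7, 8, 20]   # hypothetical recovery times in days

q1, median, q3 = statistics.quantiles(recovery_days, n=4, method="inclusive")
iqr = q3 - q1                    # interquartile range: the length of the box
lower_fence = q1 - 1.5 * iqr     # values below this count as outliers
upper_fence = q3 + 1.5 * iqr     # values above this count as outliers

inside = [x for x in recovery_days if lower_fence <= x <= upper_fence]
outliers = [x for x in recovery_days if x < lower_fence or x > upper_fence]
whisker_low, whisker_high = min(inside), max(inside)   # where the whiskers end

print(q1, median, q3, iqr, whisker_low, whisker_high, outliers)
```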
Comparison of Graphical Methods
Graph Type | Best For | Example
Bar Graph | Categorical data | Frequency of diseases in departments.
Histogram | Continuous data | Age distribution of patients.
Frequency Polygon | Comparing distributions | Test score comparisons.
Pie Chart | Proportions/percentages | Disease case distribution.
Line Graph | Time-series data | Weekly admissions in a hospital.
Ogive | Cumulative frequencies | Median income levels.
Scatter Plot | Relationships between variables | Correlation between BMI and blood pressure.
Box Plot | Distribution and outliers | Recovery time variability in treatments.
Tips for Effective Graphical Representation
Choose the Right Graph:
Match the graph type to the data and objectives.
Label Clearly:
Use meaningful titles, axis labels, and legends.
Simplify:
Avoid clutter; focus on key insights.
Use Colors Judiciously:
Use consistent colors to differentiate categories or variables.
Validate Data:
Ensure accuracy of data before plotting.
Measures of Central Tendency: Mode, Median, and Mean
Measures of central tendency describe the center or typical value of a dataset. These measures summarize data into a single value, which represents the “average” or “middle” of the distribution.
1. Mode
Definition
The mode is the value or category that appears most frequently in a dataset.
Key Characteristics
For Categorical Data:
Identifies the most common category.
Example: In a survey of favorite colors: Red (4), Blue (5), Green (3). Mode = Blue.
For Numerical Data:
Can have one mode (unimodal), two modes (bimodal), or more (multimodal).
Example: In {1, 2, 2, 3, 3, 4}, Modes = 2 and 3 (bimodal).
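Both mode examples can be reproduced in Python. This is a small sketch assuming the standard `statistics` module, where `mode` returns a single most common value and `multimode` returns every value tied for most frequent.

```python
import statistics

colors = ["Red"] * 4 + ["Blue"] * 5 + ["Green"] * 3   # survey responses from the example
numbers = [1, 2, 2, 3, 3, 4]

color_mode = statistics.mode(colors)            # 'Blue'
number_modes = statistics.multimode(numbers)    # [2, 3], a bimodal dataset

print(color_mode, number_modes)
```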
Advantages
Easy to identify.
Useful for categorical data.
Disadvantages
May not exist in some datasets.
Not helpful for datasets with uniform frequencies.
2. Median
Definition
The median is the middle value of a dataset when arranged in ascending or descending order.
Calculation Steps
Arrange the data in order.
Find the middle value:
For odd numbers: The middle value is the median.
For even numbers: The average of the two middle values.
Example:
Odd Dataset: {3, 7, 8, 12, 15}; Median = 8.
Even Dataset: {4, 6, 8, 10, 12, 14}; Median = $\frac{8 + 10}{2} = 9$.
Key Characteristics
Insensitive to extreme values (outliers).
Represents the 50th percentile.
Advantages
Robust to outliers.
Suitable for ordinal data.
Disadvantages
Does not consider all values in the dataset.
3. Mean
Definition
The mean (average) is the sum of all values divided by the total number of values.
Formula
$\text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}}$
Applications in Nursing Research
Mode:
Identifying the most common symptoms reported by patients.
Median:
Analyzing recovery times to find the “typical” patient experience.
Mean:
Calculating the average heart rate of patients during treatment.
When to Use
Measure | Best Used When
Mode | Categorical data or finding the most common occurrence.
Median | Skewed data or when outliers are present (e.g., income distribution).
Mean | Normally distributed numerical data for advanced statistical analysis.
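The reason the median is preferred for skewed data can be seen by adding one extreme value to a small dataset. The income figures below are hypothetical and chosen only for illustration. The sketch assumes Python's `statistics` module.

```python
import statistics

incomes = [20, 22, 25, 27, 30]           # hypothetical monthly incomes (in thousands)
incomes_with_outlier = incomes + [200]   # the same data plus one extreme value

mean_before = statistics.mean(incomes)                    # 24.8
median_before = statistics.median(incomes)                # 25
mean_after = statistics.mean(incomes_with_outlier)        # 54
median_after = statistics.median(incomes_with_outlier)    # 26.0

print(mean_before, median_before, mean_after, median_after)
```

A single outlier more than doubles the mean, while the median moves by only one unit.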
Measures of Variability: Range and Standard Deviation
Measures of variability describe the extent to which data values differ from each other or from the central tendency. They provide insights into the spread or dispersion of the data.
1. Range
Definition
The range is the difference between the largest and smallest values in a dataset.
Standard Deviation: $\sqrt{8} \approx 2.83$
Key Characteristics
Reflects data spread more accurately than the range.
Used for advanced statistical calculations.
Advantages
Considers all data points.
Suitable for comparing variability across datasets.
Disadvantages
Complex to calculate manually for large datasets.
Sensitive to outliers.
Comparison of Range and Standard Deviation
Aspect | Range | Standard Deviation
Definition | Difference between max and min. | Average deviation from the mean.
Sensitivity to Outliers | Highly sensitive. | Moderately sensitive.
Calculation Simplicity | Simple to compute. | Complex to calculate.
Information Provided | Limited (extreme values only). | Comprehensive (all data points).
Application | Quick dispersion estimate. | Detailed variability analysis.
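The difference in information provided can be made concrete with two hypothetical groups that share the same range but are spread differently. The sketch assumes Python's `statistics.pstdev`, which is the population standard deviation (dividing by N).

```python
import statistics

group_a = [10, 20, 20, 20, 30]   # hypothetical readings clustered near the mean
group_b = [10, 10, 20, 30, 30]   # hypothetical readings pushed toward the extremes

def spread(values):
    """Return (range, population SD rounded to 2 decimals)."""
    return max(values) - min(values), round(statistics.pstdev(values), 2)

spread_a = spread(group_a)   # (20, 6.32)
spread_b = spread(group_b)   # (20, 8.94)

print(spread_a, spread_b)
```

The range is 20 for both groups because it depends only on the two extreme values. The standard deviation is larger for group B because it reflects every data point.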
Applications in Nursing Research
Range:
Quick assessment of patient recovery times (e.g., shortest and longest durations).
Standard Deviation:
Analyzing blood pressure readings to evaluate variability within patient groups.
Range is best for quick and simple variability estimates.
Standard deviation provides a deeper understanding of data dispersion and is crucial for detailed analysis.
Introduction to Normal Probability
The normal probability concept is rooted in the normal distribution, which is a fundamental statistical tool used to model a wide range of natural phenomena. The normal distribution is also known as the Gaussian distribution or bell curve due to its characteristic shape.
Key Characteristics of Normal Distribution
Symmetry:
The curve is perfectly symmetrical around its mean.
Example: In a class test where most students score near the average and similar numbers fall equally far above and below it, the distribution of scores is approximately symmetric.
Mean, Median, and Mode:
All three measures of central tendency are equal and located at the center of the curve.
Shape:
The curve is bell-shaped, with a peak at the mean and tails extending infinitely in both directions but never touching the X-axis.
Standard Deviation:
Defines the spread or variability of the distribution.
A smaller standard deviation results in a narrower curve; a larger one results in a wider curve.
Area Under the Curve:
The total area under the curve is 1 (or 100%), representing the probability of all outcomes.
Probability in Normal Distribution
The probability of a specific range of values in a normal distribution is determined by the area under the curve for that range.
Empirical Rule (68-95-99.7 Rule):
About 68% of data falls within 1 standard deviation of the mean.
About 95% of data falls within 2 standard deviations of the mean.
About 99.7% of data falls within 3 standard deviations of the mean.
Example:
Mean height of adults = 170 cm, standard deviation = 10 cm:
68% of adults have heights between 160 cm and 180 cm.
95% have heights between 150 cm and 190 cm.
99.7% have heights between 140 cm and 200 cm.
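The empirical rule can be checked against the normal curve itself. The sketch below assumes Python's `statistics.NormalDist` and computes the area under the curve within 1, 2, and 3 standard deviations for the height example.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=10)   # mean 170 cm, standard deviation 10 cm

shares = []
for k in (1, 2, 3):
    low = heights.mean - k * heights.stdev
    high = heights.mean + k * heights.stdev
    share = heights.cdf(high) - heights.cdf(low)   # area under the curve between low and high
    shares.append(round(share, 4))
    print(f"{low:.0f}-{high:.0f} cm: {share:.1%}")
```

The computed areas are approximately 0.6827, 0.9545, and 0.9973. The 68-95-99.7 figures are rounded versions of these.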
Standard Normal Distribution
The standard normal distribution is a special case where:
Mean ($\mu$) = 0.
Standard deviation ($\sigma$) = 1.
Z-Score:
A z-score measures how many standard deviations a data point is from the mean.
Formula: $Z = \frac{X - \mu}{\sigma}$
$X$: Data point.
$\mu$: Mean.
$\sigma$: Standard deviation.
Example:
A student scored 85 in a test where the mean score is 75 and the standard deviation is 10: $Z = \frac{85 - 75}{10} = 1$
Interpretation: The student’s score is 1 standard deviation above the mean.
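The z-score calculation can be sketched in Python. The last step assumes the test scores are normally distributed, which the example does not state. Under that assumption the standard normal curve gives the share of students scoring below this student. `z_score` is an illustrative helper name.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

z = z_score(85, 75, 10)             # (85 - 75) / 10 = 1.0

# Assumption: scores follow a normal distribution
share_below = NormalDist().cdf(z)   # area to the left of z on the standard normal curve

print(z, round(share_below, 4))
```

Under the normality assumption, a z-score of 1 places the student above roughly 84% of the class.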