Welcome to Codes With Pankaj!
In this tutorial, we’ll dive deep into one of the most versatile and commonly used data structures in R: the data frame. By the end of this guide, you will understand what data frames are, how to create and manipulate them, and how to use the essential functions for working with them, with examples along the way.
What is a Data Frame in R?
A data frame in R is a two-dimensional, table-like structure that stores data in rows and columns. Internally, it is a list of equal-length vectors, one per column. It’s similar to an Excel spreadsheet or a SQL table, where:
Each column holds values of a single type (numeric, character, factor, etc.).
Each row represents a single observation or record.
Key Features of Data Frames
Heterogeneous Columns: Columns can hold different data types.
Row and Column Names: Both rows and columns can be named.
Indexed: You can access elements by row and column position or by name.
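As a quick illustration of these three features, here is a minimal sketch using a small made-up data frame. The names `df`, `id`, `name`, `score` and the row names `r1` to `r3` are hypothetical, and the sketch assumes R 4.0 or later, where strings stay as character columns by default.

```r
# Hypothetical data frame with three columns of different types
df <- data.frame(
  id    = 1:3,                 # integer column
  name  = c("x", "y", "z"),    # character column (R >= 4.0 default)
  score = c(7.5, 8.0, 9.25)    # double (numeric) column
)

# Heterogeneous columns: each column keeps its own class
col_classes <- sapply(df, class)
print(col_classes)

# Row and column names can both be set and read
rownames(df) <- c("r1", "r2", "r3")
print(colnames(df))

# Indexed: the same element can be reached by position or by name
by_position <- df[2, 3]
by_name     <- df["r2", "score"]
print(c(by_position, by_name))
```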
Creating a Data Frame
You can create a data frame using the data.frame() function.
Example: Creating a Simple Data Frame
# Create vectors
names <- c("Pankaj", "Amit", "Ravi", "Priya")
ages <- c(25, 30, 22, 28)
marks <- c(85.5, 90.2, 88.0, 92.3)
# Combine vectors into a data frame
student_data <- data.frame(Name = names, Age = ages, Marks = marks)
# Print the data frame
print(student_data)
Output :
    Name Age Marks
1 Pankaj  25  85.5
2   Amit  30  90.2
3   Ravi  22  88.0
4  Priya  28  92.3
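data.frame() expects the columns to have compatible lengths. Here is a minimal sketch of what that means, reusing the vectors from the example above. The Batch column and its value "2024" are hypothetical and exist only for illustration.

```r
names <- c("Pankaj", "Amit", "Ravi", "Priya")
ages  <- c(25, 30, 22, 28)

# A single value is recycled so that every row gets it
batch_data <- data.frame(Name = names, Age = ages, Batch = "2024")
print(batch_data)

# Vectors of lengths 4 and 3 cannot be lined up, so data.frame() stops with an error
result <- tryCatch(
  data.frame(Name = names, Age = c(25, 30, 22)),
  error = function(e) e
)
print(inherits(result, "error"))
```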
Accessing Data in a Data Frame
1. Accessing Specific Columns
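You can access a column using the $ operator, double square brackets [[ ]], or single square brackets with a comma [ , ]. Each of these returns the column as a vector.
# Using $
student_data$Name
# Using [[ ]]
student_data[["Name"]]
# Using [ , ]
student_data[, "Name"]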
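The bracket forms differ in what they return. This sketch recreates the first three columns of student_data and compares them. The variable names v1 to v3, df1, and df2 are only for illustration.

```r
student_data <- data.frame(Name  = c("Pankaj", "Amit", "Ravi", "Priya"),
                           Age   = c(25, 30, 22, 28),
                           Marks = c(85.5, 90.2, 88.0, 92.3))

# $, [[ ]] and [ , "col"] all return the column as a plain vector
v1 <- student_data$Name
v2 <- student_data[["Name"]]
v3 <- student_data[, "Name"]

# Single brackets without a comma return a one-column data frame instead
df1 <- student_data["Name"]

# drop = FALSE keeps the data frame shape when using [ , ]
df2 <- student_data[, "Name", drop = FALSE]

print(class(v1))   # "character"
print(class(df1))  # "data.frame"
```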
2. Accessing Specific Rows
Use square brackets with a row index followed by a comma. Leaving the column position empty selects every column.
# Access the second row
student_data[2, ]
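A row index can also be a vector of positions or a logical condition. Here is a short sketch on the same student data, where `first_and_third` and `older` are illustrative names:

```r
student_data <- data.frame(Name  = c("Pankaj", "Amit", "Ravi", "Priya"),
                           Age   = c(25, 30, 22, 28),
                           Marks = c(85.5, 90.2, 88.0, 92.3))

# Several rows at once, by position
first_and_third <- student_data[c(1, 3), ]
print(first_and_third)

# Rows chosen by a logical condition (one TRUE/FALSE per row)
older <- student_data[student_data$Age > 25, ]
print(older)
```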
3. Accessing Specific Elements
Specify both row and column indices.
# Access the element in the 3rd row and 2nd column (Ravi's Age, 22)
student_data[3, 2]
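Columns can also be referenced by name instead of position, which keeps code readable if the column order changes. In this minimal sketch, the assignment at the end also shows that the same indexing syntax modifies a single cell.

```r
student_data <- data.frame(Name  = c("Pankaj", "Amit", "Ravi", "Priya"),
                           Age   = c(25, 30, 22, 28),
                           Marks = c(85.5, 90.2, 88.0, 92.3))

# Three ways to reach Ravi's age (row 3, column "Age")
by_position <- student_data[3, 2]
by_name     <- student_data[3, "Age"]
via_column  <- student_data$Age[3]
print(c(by_position, by_name, via_column))

# The same indexing on the left of <- updates that one element
student_data[3, "Age"] <- 23
print(student_data[3, ])
```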
Modifying a Data Frame
1. Adding a New Column
student_data$Grade <- c("A", "A+", "A", "A+")
print(student_data)
Output :
    Name Age Marks Grade
1 Pankaj  25  85.5     A
2   Amit  30  90.2    A+
3   Ravi  22  88.0     A
4  Priya  28  92.3    A+
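A new column can also be computed from existing ones instead of typed in by hand. The rule in this sketch (A+ for Marks of 90 or more, otherwise A) is an assumption chosen because it reproduces the grades above. It is not a rule stated in the tutorial.

```r
student_data <- data.frame(Name  = c("Pankaj", "Amit", "Ravi", "Priya"),
                           Age   = c(25, 30, 22, 28),
                           Marks = c(85.5, 90.2, 88.0, 92.3))

# ifelse() works element-wise, giving one grade per row
# (the 90-mark threshold is a hypothetical rule)
student_data$Grade <- ifelse(student_data$Marks >= 90, "A+", "A")
print(student_data)
```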
2. Adding a New Row
Use the rbind() function. The new row must be a data frame with the same column names as the original.
# Add a new row
new_student <- data.frame(Name = "Ankit", Age = 26, Marks = 89.0, Grade = "A")
student_data <- rbind(student_data, new_student)
print(student_data)
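rbind() matches data frame columns by name rather than by position. A new row with the same names in a different order still lines up, while a misspelled name causes an error. Here is a sketch with a reduced two-column version of the data, where `ok`, `reordered`, and `bad` are illustrative names:

```r
student_data <- data.frame(Name = c("Pankaj", "Amit"), Age = c(25, 30))

# Same column names: the row is appended
ok <- rbind(student_data, data.frame(Name = "Ankit", Age = 26))

# Same names in a different order: still matched by name
reordered <- rbind(student_data, data.frame(Age = 26, Name = "Ankit"))

# A misspelled name ("age") does not match, so rbind() stops with an error
bad <- tryCatch(
  rbind(student_data, data.frame(Name = "Ankit", age = 26)),
  error = function(e) e
)
print(ok)
print(inherits(bad, "error"))
```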
Common Methods for Data Frames
1. View the Structure
The str() function provides an overview of the data frame.
str(student_data)
Output :
'data.frame': 5 obs. of  4 variables:
 $ Name : chr  "Pankaj" "Amit" "Ravi" "Priya" ...
 $ Age  : num  25 30 22 28 26
 $ Marks: num  85.5 90.2 88 92.3 89
 $ Grade: chr  "A" "A+" "A" "A+" ...
2. Summary Statistics
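The summary() function gives per-column statistics: the minimum, quartiles, mean, and maximum for numeric columns, and the length, class, and mode for character columns.
summary(student_data)
Output :
     Name               Age           Marks            Grade
 Length:5           Min.   :22.0   Min.   :85.5   Length:5
 Class :character   1st Qu.:25.0   1st Qu.:88.0   Class :character
 Mode  :character   Median :26.0   Median :89.0   Mode  :character
                    Mean   :26.2   Mean   :89.0
                    3rd Qu.:28.0   3rd Qu.:90.2
                    Max.   :30.0   Max.   :92.3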
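Counts per grade (A: 3, A+: 2) only appear once Grade is stored as a factor. Here is a minimal sketch of that conversion, using a reduced copy of the data:

```r
student_data <- data.frame(Marks = c(85.5, 90.2, 88.0, 92.3, 89.0),
                           Grade = c("A", "A+", "A", "A+", "A"))

# Character column: summary() reports only Length, Class and Mode
print(summary(student_data$Grade))

# Factor column: summary() counts the rows in each level
student_data$Grade <- factor(student_data$Grade)
grade_counts <- summary(student_data$Grade)
print(grade_counts)
```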
3. Head and Tail
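head() shows the first rows of a data frame (6 by default).
tail() shows the last rows (6 by default).
Because student_data has only 5 rows at this point, both calls below print the whole data frame.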
head(student_data)
tail(student_data)
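Both functions take an n argument to control how many rows they return. A negative n counts from the other end, so head(x, n = -1) returns every row except the last. A short sketch:

```r
student_data <- data.frame(Name  = c("Pankaj", "Amit", "Ravi", "Priya", "Ankit"),
                           Marks = c(85.5, 90.2, 88.0, 92.3, 89.0))

first_two    <- head(student_data, n = 2)   # rows 1 and 2
last_two     <- tail(student_data, n = 2)   # rows 4 and 5
all_but_last <- head(student_data, n = -1)  # rows 1 to 4
print(first_two)
print(last_two)
```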
4. Subset Data
Use the subset() function to filter rows.
# Get students with Marks greater than 90
top_students <- subset(student_data, Marks > 90)
print(top_students)
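subset() also accepts a select argument for choosing columns, and the same filter can be written with bracket indexing. One documented difference is that subset() treats rows where the condition is NA as FALSE and drops them. Here is a sketch on the five-student data, where `top_bracket` and `young_high` are illustrative names:

```r
student_data <- data.frame(Name  = c("Pankaj", "Amit", "Ravi", "Priya", "Ankit"),
                           Age   = c(25, 30, 22, 28, 26),
                           Marks = c(85.5, 90.2, 88.0, 92.3, 89.0))

# Filter rows and keep only two columns
top_students <- subset(student_data, Marks > 90, select = c(Name, Marks))

# The same rows and columns via bracket indexing
top_bracket <- student_data[student_data$Marks > 90, c("Name", "Marks")]

# Conditions combine with & (and) and | (or)
young_high <- subset(student_data, Marks > 88 & Age < 28)
print(young_high)
```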
Operations on Data Frames
1. Sorting
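order() returns the row positions that would sort a column. Use them as the row index to reorder the data frame. Negating a numeric column sorts it in descending order.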
# Sort by Marks in descending order
sorted_data <- student_data[order(-student_data$Marks), ]
print(sorted_data)
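order() also accepts decreasing = TRUE as an alternative to negation, and it can take several keys, with later keys breaking ties in earlier ones. Here is a sketch on the five-student data with grades, where `by_marks` and `by_grade` are illustrative names:

```r
student_data <- data.frame(Name  = c("Pankaj", "Amit", "Ravi", "Priya", "Ankit"),
                           Marks = c(85.5, 90.2, 88.0, 92.3, 89.0),
                           Grade = c("A", "A+", "A", "A+", "A"))

# Descending by Marks without negating the column
by_marks <- student_data[order(student_data$Marks, decreasing = TRUE), ]

# By Grade (ascending), then by Marks (descending) within each grade;
# negation works here because Marks is numeric
by_grade <- student_data[order(student_data$Grade, -student_data$Marks), ]
print(by_grade)
```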
2. Merging
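Use merge() to combine data frames that share a key column. By default, merge() performs an inner join: it keeps only rows whose key appears in both data frames, so Priya and Ankit, who have no entry in extra_data, are left out of merged_data.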
# Create another data frame
extra_data <- data.frame(Name = c("Pankaj", "Amit", "Ravi"),
                         Hobby = c("Reading", "Cycling", "Traveling"))
# Merge on Name
merged_data <- merge(student_data, extra_data, by = "Name")
print(merged_data)
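To keep every student, including those without a hobby, you can ask merge() for a left join with all.x = TRUE. Missing values are then filled with NA. Here is a sketch using a reduced version of student_data, where `inner` and `left` are illustrative names:

```r
student_data <- data.frame(Name  = c("Pankaj", "Amit", "Ravi", "Priya", "Ankit"),
                           Marks = c(85.5, 90.2, 88.0, 92.3, 89.0))
extra_data <- data.frame(Name  = c("Pankaj", "Amit", "Ravi"),
                         Hobby = c("Reading", "Cycling", "Traveling"))

# Default: inner join, only names present in both data frames
inner <- merge(student_data, extra_data, by = "Name")

# Left join: every row of student_data, Hobby is NA where there is no match
left <- merge(student_data, extra_data, by = "Name", all.x = TRUE)
print(left)
```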
Additional Functions
1. Check Dimensions
dim() returns the number of rows and columns together as a vector. At this point, that is 5 rows and 4 columns for student_data.
nrow() and ncol() give the number of rows and columns, respectively.
dim(student_data)
nrow(student_data)
ncol(student_data)
2. Renaming Columns
Assign a character vector to colnames() to rename the columns. The vector must contain one name for each column.
colnames(student_data) <- c("StudentName", "StudentAge", "StudentMarks", "StudentGrade")
print(student_data)
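Assigning to colnames() replaces every name at once. To rename a single column, a common idiom is to index into names(), which works the same way as colnames() for data frames. Here is a minimal sketch, where the new name Score is hypothetical:

```r
student_data <- data.frame(Name  = c("Pankaj", "Amit"),
                           Age   = c(25, 30),
                           Marks = c(85.5, 90.2))

# Replace only the name equal to "Marks"; the other names are untouched
names(student_data)[names(student_data) == "Marks"] <- "Score"
print(names(student_data))
```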
3. Removing Rows/Columns
Remove a column by assigning NULL to it.
Remove a row using negative indexing.
# Remove the StudentGrade column (the Grade column, renamed above)
student_data$StudentGrade <- NULL
# Remove the 2nd row
student_data <- student_data[-2, ]
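You can drop several columns at once by name and remove rows by a condition rather than by position. Removing rows keeps the original row names, and you can reset them by assigning NULL. Here is a sketch, where `trimmed` and `kept` are illustrative names:

```r
student_data <- data.frame(StudentName  = c("Pankaj", "Amit", "Ravi", "Priya"),
                           StudentAge   = c(25, 30, 22, 28),
                           StudentMarks = c(85.5, 90.2, 88.0, 92.3))

# Drop several columns at once by name; drop = FALSE keeps a data frame
trimmed <- student_data[, !(names(student_data) %in% c("StudentAge", "StudentMarks")),
                        drop = FALSE]

# Drop rows by condition instead of by position
kept <- student_data[student_data$StudentMarks >= 88, ]
print(rownames(kept))   # the original row names are kept

# Renumber the rows from 1
rownames(kept) <- NULL
print(kept)
```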
Real-World Applications
Analyzing Customer Data: Store customer details like age, gender, and purchase history.
Student Records: Keep track of marks, grades, and attendance.
Health Data: Analyze patient information such as age, symptoms, and diagnosis.
Practice Questions
Create a data frame containing details of 10 employees, including their names, salaries, and departments. Write R code to:
Display employees with a salary greater than 50,000.
Add a new column for performance ratings.
Merge two data frames containing product details and sales data based on product IDs.
Sort a data frame containing weather data by temperature in ascending order.
Conclusion
Data frames are the backbone of data analysis in R. By mastering these operations, you’ll have a strong foundation for exploring, analyzing, and visualizing data. Keep practicing with real-world datasets, and you’ll soon become an R data wizard!
Stay tuned for more tutorials at Codes With Pankaj.