
R Data Frame: A Comprehensive Guide

Welcome to Codes With Pankaj!

In this tutorial, we’ll dive deep into one of the most versatile and commonly used data structures in R: the data frame. By the end of this guide, you will understand what data frames are, know how to create and manipulate them, and have worked through the essential functions with examples.


 

What is a Data Frame in R?

A data frame in R is a two-dimensional table-like structure that stores data in rows and columns. It’s similar to an Excel spreadsheet or a SQL table, where:

  1. Each column holds values of a single type (numeric, character, factor, etc.).

  2. Each row represents a single observation or record.


 

Key Features of Data Frames

  1. Heterogeneous Columns: Columns can hold different data types.

  2. Row and Column Names: Both rows and columns can be named.

  3. Indexed: You can access elements using row and column indices.
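
You can check the three features above directly in the console. The sketch below uses a small hypothetical data frame called demo (its names and values are made up for illustration), and it assumes R 4.0 or later, where text columns stay as character by default.

```r
# Hypothetical data frame used only to illustrate the features above
demo <- data.frame(
  Name   = c("Asha", "Vikram"),
  Age    = c(31, 27),
  Member = c(TRUE, FALSE)
)

# Heterogeneous columns: each column has its own type
# (Name is "character" on R >= 4.0, where stringsAsFactors defaults to FALSE)
col_types <- sapply(demo, class)
print(col_types)

# Row and column names: both can be read and set
rownames(demo) <- c("r1", "r2")
print(colnames(demo))

# Indexed: pick a cell by position or by name
by_position <- demo[2, 1]
by_name     <- demo["r2", "Name"]
print(by_position)
print(by_name)
```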


Creating a Data Frame

You can create a data frame using the data.frame() function.


Example: Creating a Simple Data Frame



# Create vectors

names <- c("Pankaj", "Amit", "Ravi", "Priya")

ages <- c(25, 30, 22, 28)

marks <- c(85.5, 90.2, 88.0, 92.3)



# Combine vectors into a data frame

student_data <- data.frame(Name = names, Age = ages, Marks = marks)



# Print the data frame

print(student_data)

Output :


    Name Age Marks
1 Pankaj  25  85.5
2   Amit  30  90.2
3   Ravi  22  88.0
4  Priya  28  92.3

Accessing Data in a Data Frame


1. Accessing Specific Columns


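You can access a column using the $ operator, double square brackets [[ ]], or single square brackets with an empty row position [ , ]. All three forms shown here return the column as a plain vector.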



# Using $

student_data$Name



# Using [[ ]] or [ , ]

student_data[["Name"]]

student_data[, "Name"]
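
It helps to know when column access gives you a vector and when it gives you a data frame. Here is a minimal sketch, assuming a stand-alone copy of the student data named students (a hypothetical name, chosen so the block runs on its own):

```r
# Stand-alone copy of the tutorial's data (the name `students` is hypothetical)
students <- data.frame(
  Name  = c("Pankaj", "Amit", "Ravi", "Priya"),
  Age   = c(25, 30, 22, 28),
  Marks = c(85.5, 90.2, 88.0, 92.3)
)

as_vector  <- students$Marks                      # plain numeric vector
as_frame   <- students["Marks"]                   # one-column data frame
kept_frame <- students[, "Marks", drop = FALSE]   # drop = FALSE keeps the data frame

print(is.data.frame(as_vector))
print(is.data.frame(as_frame))
print(is.data.frame(kept_frame))

# Several columns at once: pass a character vector of names
name_and_marks <- students[, c("Name", "Marks")]
print(name_and_marks)
```

Use the data frame forms when later code expects columns with names, and the vector forms when you want to pass the values to functions like mean().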

2. Accessing Specific Rows


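Use square brackets with a row index and leave the column position empty. The trailing comma is required; without it, R treats the index as a column selection.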


# Access the second row
student_data[2, ]
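
The row position also accepts a vector of indices or a logical condition. The sketch below assumes a stand-alone copy of the data named students (a hypothetical name):

```r
# Stand-alone copy of the tutorial's data (the name `students` is hypothetical)
students <- data.frame(
  Name  = c("Pankaj", "Amit", "Ravi", "Priya"),
  Age   = c(25, 30, 22, 28),
  Marks = c(85.5, 90.2, 88.0, 92.3)
)

# Several rows by position
first_and_third <- students[c(1, 3), ]
print(first_and_third)

# Rows that satisfy a condition (a logical vector, one value per row)
older_than_24 <- students[students$Age > 24, ]
print(older_than_24)
```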

3. Accessing Specific Elements


Specify both row and column indices.



# Access the element in the 3rd row and 2nd column
student_data[3, 2]

Modifying a Data Frame


1. Adding a New Column



student_data$Grade <- c("A", "A+", "A", "A+")
print(student_data)

Output :


      
    Name Age Marks Grade
1 Pankaj  25  85.5     A
2   Amit  30  90.2    A+
3   Ravi  22  88.0     A
4  Priya  28  92.3    A+
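
You don't have to type a new column by hand; you can compute it from existing columns, because operations on a column work element by element. The sketch below is illustrative only: the grading threshold of 90 and the Percent column are assumptions made for the example, and students is a hypothetical stand-alone copy of the data.

```r
# Stand-alone copy of the tutorial's data (the name `students` is hypothetical)
students <- data.frame(
  Name  = c("Pankaj", "Amit", "Ravi", "Priya"),
  Marks = c(85.5, 90.2, 88.0, 92.3)
)

# Assumed rule for illustration: 90 or above earns "A+", otherwise "A"
students$Grade <- ifelse(students$Marks >= 90, "A+", "A")

# Arithmetic on a column is applied to every row
students$Percent <- students$Marks / 100

print(students)
```

The new column needs one value per row (or a single value, which R repeats for every row); a vector of any other length raises an error.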


2. Adding a New Row


Use the rbind() function. The new row must have the same column names as the existing data frame.



# Add a new row
new_student <- data.frame(Name = "Ankit", Age = 26, Marks = 89.0, Grade = "A")
student_data <- rbind(student_data, new_student)
print(student_data)
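
rbind() matches data frames by column name, so a row with a misspelled or missing column is rejected. A minimal sketch, using hypothetical names (students, good_row, bad_row) and tryCatch() so the failing call does not stop the script:

```r
# Hypothetical two-column data frame for illustration
students <- data.frame(Name = c("Pankaj", "Amit"), Age = c(25, 30))

good_row <- data.frame(Name = "Ankit", Age = 26)
bad_row  <- data.frame(Name = "Ankit", Years = 26)  # wrong column name

# Matching column names: the row is appended
combined <- rbind(students, good_row)
print(combined)

# Mismatched column names: rbind() raises an error, which we catch here
result <- tryCatch(
  rbind(students, bad_row),
  error = function(e) "rbind failed"
)
print(result)
```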

 




 

Common Methods for Data Frames


1. View the Structure


The str() function provides an overview of the data frame.


str(student_data)

Output :



'data.frame':  5 obs. of  4 variables:
 $ Name : chr  "Pankaj" "Amit" "Ravi" "Priya" ...
 $ Age  : num  25 30 22 28 26
 $ Marks: num  85.5 90.2 88 92.3 89
 $ Grade: chr  "A" "A+" "A" "A+" ...

2. Summary Statistics


The summary() function gives a statistical summary.



summary(student_data)
 
Output :


     Name                Age           Marks         Grade
 Length:5           Min.   :22.0   Min.   :85.5   Length:5
 Class :character   1st Qu.:25.0   1st Qu.:88.0   Class :character
 Mode  :character   Median :26.0   Median :89.0   Mode  :character
                    Mean   :26.2   Mean   :89.0
                    3rd Qu.:28.0   3rd Qu.:90.2
                    Max.   :30.0   Max.   :92.3
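
summary() reports a count per category only for factor columns; a character column is summarized by its length and class. If you want counts for a text column such as Grade, convert it to a factor first. A minimal sketch, using a hypothetical stand-alone data frame named grades:

```r
# Hypothetical stand-alone data frame for illustration
grades <- data.frame(
  Name  = c("Pankaj", "Amit", "Ravi", "Priya", "Ankit"),
  Grade = c("A", "A+", "A", "A+", "A"),
  stringsAsFactors = FALSE  # explicit, so older R versions behave the same way
)

# Character column: summary() shows Length / Class / Mode
print(summary(grades))

# Factor column: summary() shows how many rows fall in each level
grades$Grade <- factor(grades$Grade)
grade_counts <- summary(grades$Grade)
print(grade_counts)
```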

3. Head and Tail


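  • head() shows the first 6 rows by default.

  • tail() shows the last 6 rows by default.

Because student_data has only 5 rows, both calls below print the whole data frame.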



head(student_data)
tail(student_data)
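
Both functions take an n argument to control how many rows you get back. A short sketch on a hypothetical 10-row data frame named numbers, which is large enough for the difference to show:

```r
# Hypothetical 10-row data frame for illustration
numbers <- data.frame(id = 1:10, square = (1:10)^2)

first_three      <- head(numbers, n = 3)    # first 3 rows
last_two         <- tail(numbers, n = 2)    # last 2 rows
all_but_last_two <- head(numbers, n = -2)   # negative n drops rows from the end

print(first_three)
print(last_two)
print(nrow(all_but_last_two))
```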

4. Subset Data


Use the subset() function to filter rows.



# Get students with Marks greater than 90
top_students <- subset(student_data, Marks > 90)
print(top_students)
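
subset() can also pick columns through its select argument, and the same result can be written with plain bracket indexing. A minimal sketch, assuming a stand-alone copy of the data named students (a hypothetical name):

```r
# Stand-alone copy of the tutorial's data (the name `students` is hypothetical)
students <- data.frame(
  Name  = c("Pankaj", "Amit", "Ravi", "Priya"),
  Age   = c(25, 30, 22, 28),
  Marks = c(85.5, 90.2, 88.0, 92.3)
)

# Filter rows and keep only two columns
top_names <- subset(students, Marks > 90, select = c(Name, Marks))
print(top_names)

# Equivalent bracket form: logical row condition, character column names
same_rows <- students[students$Marks > 90, c("Name", "Marks")]
print(same_rows)
```

subset() is convenient at the console; the bracket form is usually the safer choice inside your own functions, because it does not depend on how column names are looked up.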


 

Operations on Data Frames


1. Sorting


Sort by a specific column using order().



# Sort by Marks in descending order
sorted_data <- student_data[order(-student_data$Marks), ]
print(sorted_data)
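
The minus sign trick works only for numeric columns. For a text column, pass decreasing = TRUE to order() instead. A short sketch, assuming a stand-alone copy of the data named students (a hypothetical name):

```r
# Stand-alone copy of the tutorial's data (the name `students` is hypothetical)
students <- data.frame(
  Name = c("Pankaj", "Amit", "Ravi", "Priya"),
  Age  = c(25, 30, 22, 28)
)

# Ascending order is the default
by_age <- students[order(students$Age), ]
print(by_age)

# Descending order on a character column
by_name_desc <- students[order(students$Name, decreasing = TRUE), ]
print(by_name_desc)
```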

2. Merging


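Use merge() to combine data frames on a shared key column. By default merge() keeps only the rows whose key appears in both data frames, so Priya and Ankit, who have no entry in extra_data, are dropped from the result.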



# Create another data frame

extra_data <- data.frame(Name = c("Pankaj", "Amit", "Ravi"),

                         Hobby = c("Reading", "Cycling", "Traveling"))



# Merge on Name

merged_data <- merge(student_data, extra_data, by = "Name")

print(merged_data)
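
If you want to keep every student even when there is no matching hobby, set all.x = TRUE; the missing values are filled with NA. A minimal sketch, using hypothetical stand-alone data frames named students and hobbies:

```r
# Hypothetical stand-alone data frames for illustration
students <- data.frame(
  Name  = c("Pankaj", "Amit", "Ravi", "Priya"),
  Marks = c(85.5, 90.2, 88.0, 92.3)
)
hobbies <- data.frame(
  Name  = c("Pankaj", "Amit", "Ravi"),
  Hobby = c("Reading", "Cycling", "Traveling")
)

# Default: only names present in both data frames
inner <- merge(students, hobbies, by = "Name")
print(inner)

# all.x = TRUE: keep every row of the first data frame
left <- merge(students, hobbies, by = "Name", all.x = TRUE)
print(left)
```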

Additional Functions


1. Check Dimensions


  • dim() gives the dimensions of the data frame.

  • nrow() and ncol() give the number of rows and columns, respectively.




dim(student_data)

nrow(student_data)

ncol(student_data)

2. Renaming Columns


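Use colnames() to rename columns. Assigning a full vector replaces every name in order, so it must contain exactly one name per column.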



colnames(student_data) <- c("StudentName", "StudentAge", "StudentMarks", "StudentGrade")
print(student_data)
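
To rename just one column, index into colnames() instead of retyping every name. A short sketch on a hypothetical stand-alone data frame named students:

```r
# Hypothetical stand-alone data frame for illustration
students <- data.frame(
  Name = c("Pankaj", "Amit"),
  Age  = c(25, 30)
)

# Replace only the name that currently equals "Age"
colnames(students)[colnames(students) == "Age"] <- "StudentAge"
print(colnames(students))
```

Looking the column up by its current name, rather than by position, keeps the code correct even if the column order changes later.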

3. Removing Rows/Columns


  • Remove a column by assigning NULL to it.

  • Remove a row using negative indexing.



# Remove the Grade column

student_data$StudentGrade <- NULL



# Remove the 2nd row

student_data <- student_data[-2, ]
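
The same ideas extend to removing several columns or rows in one step. A minimal sketch, assuming a stand-alone copy of the data named students (a hypothetical name):

```r
# Stand-alone copy of the tutorial's data (the name `students` is hypothetical)
students <- data.frame(
  Name  = c("Pankaj", "Amit", "Ravi", "Priya"),
  Age   = c(25, 30, 22, 28),
  Marks = c(85.5, 90.2, 88.0, 92.3)
)

# Drop several columns by name; drop = FALSE keeps the result a data frame
without_cols <- students[, !(colnames(students) %in% c("Age", "Marks")), drop = FALSE]
print(without_cols)

# Drop several rows by position
without_rows <- students[-c(1, 3), ]

# Removed rows leave gaps in the row names; this resets them to 1, 2, ...
rownames(without_rows) <- NULL
print(without_rows)
```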


 

Real-World Applications


  1. Analyzing Customer Data: Store customer details like age, gender, and purchase history.

  2. Student Records: Keep track of marks, grades, and attendance.

  3. Health Data: Analyze patient information such as age, symptoms, and diagnosis.


 



 


Practice Questions


  1. Create a data frame containing details of 10 employees, including their names, salaries, and departments. Write R code to:

    • Display employees with a salary greater than 50,000.

    • Add a new column for performance ratings.

  2. Merge two data frames containing product details and sales data based on product IDs.

  3. Sort a data frame containing weather data by temperature in ascending order.




Conclusion


Data frames are the backbone of data analysis in R. By mastering these operations, you’ll have a strong foundation for exploring, analyzing, and visualizing data. Keep practicing with real-world datasets, and you’ll soon become an R data wizard!

Stay tuned for more tutorials at Codes With Pankaj.
