top of page

Working with CSV files in R Programming

CSV files are basically the text files wherein the values of each row are separated by a delimiter, as in a comma or a tab. In this article, we will use the following sample CSV file: sample.csv

id, name, department, salary, projects
1,   A,     IT,        60754,   4
2,   B,     Tech,      59640,   2
3,   C,     Marketing, 69040,   8
4,   D,     Marketing, 65043,   5
5,   E,     Tech,      59943,   2
6,   F,     IT,        65000,   5
7,   G,     HR,        69000,   7

Reading a CSV file

The contents of a CSV file can be read as a data frame in R using the read.csv(…) function. The CSV file to be read should be either present in the current working directory or the directory should be set accordingly using the setwd(…) command in R. The CSV file can also be read from a URL using read.csv() function.


Examples:
csv_data <- read.csv(file = 'sample.csv')
print(csv_data)
  
# print number of columns
print (ncol(csv_data)) 
  
# print number of rows
print(nrow(csv_data))  

Output:


       id,   name,    department,    salary,      projects
 1      1      A         HR          60754         14
 2      2      B        Tech         59640         3
 3      3      C       Marketing     69040         8
 4      4      D         HR          65043         5
 5      5      E        Tech         59943         2  
 6      6      F         IT          65000         5
 7      7      G         HR          69000         7
 [1] 4
 [1] 7

The header is by default set to a TRUE value in the function. The head is not included in the count of rows, therefore this CSV has 7 rows and 4 columns.


Querying with CSV files

SQL queries can be performed on the CSV content, and the corresponding result can be retrieved using the subset(csv_data,) function in R. Multiple queries can be applied in the function at a time where each query is separated using a logical operator. The result is stored as a data frame in R.


# Examples:
csv_data <- read.csv(file ='sample.csv')
min_pro <- min(csv_data$projects)
print (min_pro)

Output:

2

Aggregator functions (min, max, count etc.) can be applied on the CSV data. Here the min() function is applied on projects column using $ symbol. The minimum number of projects which is 2 is returned.


csv_data <- read.csv(file ='sample.csv')
new_csv <- subset(csv_data, department == "HR" & projects <10)
print (new_csv)

Output:

      
      id,   name,    department,    salary,      projects
4      4      D         HR          65043         5
7      7      G         HR          69000         7

The subset of the data that is created is stored as a data frame satisfying the conditions specified as the arguments of the function. The employees D and G are HR and have the number of projects<10. The row numbers are retained in the resultant data frame.


Writing into a CSV file

The contents of the data frame can be written into a CSV file. The CSV file is stored in the current working directory with the name specified in the function write.csv(data frame, output CSV name) in R.


#Examples:
csv_data <- read.csv(file ='sample.csv')
new_csv <- subset(csv_data, department == "HR" & projects <10)
write.csv(new_csv, "new_sample.csv")
new_data <-read.csv(file ='new_sample.csv')
print(new_data)

Output:

     
      X    id,   name,    department,    salary,      projects
1     4      4      D         HR          65043         5
2     7      7      G         HR          69000         7

The column X contains the row numbers of the original CSV file. In order to remove it, we can specify an additional argument in the write.csv() function that set row names to FALSE.

csv_data <- read.csv(file ='sample.csv')
new_csv <- subset(csv_data, department == "HR" & projects <10)
write.csv(new_csv, "new_sample.csv", row.names = FALSE)
new_data <-read.csv(file ='new_sample.csv')
print(new_data)

Output:

    
      id,   name,    department,    salary,      projects
1     4      D         HR          65043         5
2     7      G         HR          69000         7

The original row numbers are removed from the new CSV.

Related Posts

See All

R - CSV Files

Getting and Setting the Working Directory You can check which directory the R workspace is pointing to using the getwd() function. You...

Commentaires


bottom of page