CSV files are basically the text files wherein the values of each row are separated by a delimiter, as in a comma or a tab. In this article, we will use the following sample CSV file: sample.csv
id, name, department, salary, projects
1, A, IT, 60754, 4
2, B, Tech, 59640, 2
3, C, Marketing, 69040, 8
4, D, Marketing, 65043, 5
5, E, Tech, 59943, 2
6, F, IT, 65000, 5
7, G, HR, 69000, 7
Reading a CSV file
The contents of a CSV file can be read as a data frame in R using the read.csv(…) function. The CSV file to be read should be either present in the current working directory or the directory should be set accordingly using the setwd(…) command in R. The CSV file can also be read from a URL using read.csv() function.
Examples:
csv_data <- read.csv(file = 'sample.csv')
print(csv_data)
# print number of columns
print (ncol(csv_data))
# print number of rows
print(nrow(csv_data))
Output:
id, name, department, salary, projects
1 1 A HR 60754 14
2 2 B Tech 59640 3
3 3 C Marketing 69040 8
4 4 D HR 65043 5
5 5 E Tech 59943 2
6 6 F IT 65000 5
7 7 G HR 69000 7
[1] 4
[1] 7
The header is by default set to a TRUE value in the function. The head is not included in the count of rows, therefore this CSV has 7 rows and 4 columns.
Querying with CSV files
SQL queries can be performed on the CSV content, and the corresponding result can be retrieved using the subset(csv_data,) function in R. Multiple queries can be applied in the function at a time where each query is separated using a logical operator. The result is stored as a data frame in R.
# Examples:
csv_data <- read.csv(file ='sample.csv')
min_pro <- min(csv_data$projects)
print (min_pro)
Output:
2
Aggregator functions (min, max, count etc.) can be applied on the CSV data. Here the min() function is applied on projects column using $ symbol. The minimum number of projects which is 2 is returned.
csv_data <- read.csv(file ='sample.csv')
new_csv <- subset(csv_data, department == "HR" & projects <10)
print (new_csv)
Output:
id, name, department, salary, projects
4 4 D HR 65043 5
7 7 G HR 69000 7
The subset of the data that is created is stored as a data frame satisfying the conditions specified as the arguments of the function. The employees D and G are HR and have the number of projects<10. The row numbers are retained in the resultant data frame.
Writing into a CSV file
The contents of the data frame can be written into a CSV file. The CSV file is stored in the current working directory with the name specified in the function write.csv(data frame, output CSV name) in R.
#Examples:
csv_data <- read.csv(file ='sample.csv')
new_csv <- subset(csv_data, department == "HR" & projects <10)
write.csv(new_csv, "new_sample.csv")
new_data <-read.csv(file ='new_sample.csv')
print(new_data)
Output:
X id, name, department, salary, projects
1 4 4 D HR 65043 5
2 7 7 G HR 69000 7
The column X contains the row numbers of the original CSV file. In order to remove it, we can specify an additional argument in the write.csv() function that set row names to FALSE.
csv_data <- read.csv(file ='sample.csv')
new_csv <- subset(csv_data, department == "HR" & projects <10)
write.csv(new_csv, "new_sample.csv", row.names = FALSE)
new_data <-read.csv(file ='new_sample.csv')
print(new_data)
Output:
id, name, department, salary, projects
1 4 D HR 65043 5
2 7 G HR 69000 7
The original row numbers are removed from the new CSV.
Commentaires