As the screenshot shows, a variable can be assigned with <-, = or ->.
The # character starts a comment.
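The three assignment forms can be compared side by side in a short sketch. The variable names are made up for illustration.

```r
# Leftward assignment, the form most R code uses
x <- 10
# The equals sign also assigns at the top level
y = 20
# Rightward assignment: value on the left, name on the right
30 -> z

total <- x + y + z  # text after # is ignored by R
print(total)
```

All three forms bind a name to a value, so total ends up as the sum of the three numbers.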
Data structures
Selecting a data structure to hold data is an important task. In R, data can come from text files, spreadsheets, statistical packages, databases and so on.
R offers a wide variety of structures for holding data, including scalars, vectors, arrays, data frames and lists. Unlike Java, R does not require a variable to be declared with a data type.
The class() function reports the data type of a value:
> flag <- TRUE
> print(class(flag))
[1] "logical"
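As a minimal sketch of the same idea, class() can be applied to a few other basic values. The variable names here are illustrative only.

```r
count <- 42L     # the L suffix makes an integer
price <- 19.99   # a number without a suffix is a double, reported as "numeric"
name  <- "R"
flag  <- TRUE

types <- c(class(count), class(price), class(name), class(flag))
print(types)

# Because no type is declared, the same name can later hold another type
flag <- "yes"
print(class(flag))
```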
Vectors
A vector is a one-dimensional array. The combine function c() creates a vector.
> a<- c(11,21,31,41,51)
> print(a)
[1] 11 21 31 41 51
> a[3]
[1] 31
> a[2:4]
[1] 21 31 41
Note: a scalar is simply a vector with one element.
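The note about scalars, and a few more ways to index a vector, can be checked with a short sketch. Indexing in R starts at 1, not 0 as in Java. The variable s is an illustrative name.

```r
a <- c(11, 21, 31, 41, 51)
s <- 7                      # a "scalar" is a vector of length one

print(length(s))            # number of elements in s
print(is.vector(s))         # TRUE: a single value is still a vector

first_last <- a[c(1, 5)]    # pick elements by position
without_first <- a[-1]      # a negative index drops that position
above_30 <- a[a > 30]       # a logical condition keeps matching elements

print(first_last)
print(without_first)
print(above_30)
```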
Matrices
A matrix is a two-dimensional array in which every element has the same type, such as numeric, character or logical.
> rownames<-c("Row1","Row2","Row3","Row4","Row5")
> colnames<-c("Column1","Column2","Column3","Column4")
> X<-matrix(1:20,nrow=5,ncol=4,byrow=TRUE,dimnames=list(rownames,colnames))
> x
Error: object 'x' not found
> print(x)
Error in print(x) : object 'x' not found
Note that variable names are case sensitive: the matrix was stored as X, so referring to it as x causes the errors above.
> X
     Column1 Column2 Column3 Column4
Row1       1       2       3       4
Row2       5       6       7       8
Row3       9      10      11      12
Row4      13      14      15      16
Row5      17      18      19      20
dimnames is an optional argument that supplies the row and column labels.
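Once a matrix has labels, its elements can be read by position or by name. The sketch below rebuilds the same matrix, using the hypothetical names row_labels and col_labels so that the built-in rownames() and colnames() functions are not shadowed. It also shows what changes when byrow is left out.

```r
row_labels <- c("Row1", "Row2", "Row3", "Row4", "Row5")
col_labels <- c("Column1", "Column2", "Column3", "Column4")
X <- matrix(1:20, nrow = 5, ncol = 4, byrow = TRUE,
            dimnames = list(row_labels, col_labels))

by_position <- X[2, 3]               # row 2, column 3
by_name <- X["Row2", "Column3"]      # the same cell, addressed by label
first_column <- X[, "Column1"]       # an empty index selects everything

print(by_position)
print(by_name)
print(first_column)
print(dim(X))                        # number of rows and columns

# Without byrow = TRUE, R fills the matrix column by column
Y <- matrix(1:20, nrow = 5, ncol = 4)
print(Y[2, 3])
```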
Arrays
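Arrays are similar to matrices but can have more than two dimensions. In the example below, 1:20 supplies only 20 values for the 24 cells of a 2 x 3 x 4 array, so R recycles the values from the start to fill the last slice.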
> X<-array(1:20,c(2,3,4))
> X
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18

, , 4

     [,1] [,2] [,3]
[1,]   19    1    3
[2,]   20    2    4
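Array elements are addressed with one index per dimension. This sketch uses 1:24 so that every cell gets its own value, then repeats the 1:20 case to show where the recycled values land. The names full and short are illustrative.

```r
full <- array(1:24, dim = c(2, 3, 4))   # 2 rows, 3 columns, 4 slices
print(dim(full))

cell_a <- full[2, 3, 1]   # row 2, column 3 of the first slice
cell_b <- full[1, 1, 4]   # row 1, column 1 of the fourth slice
print(cell_a)
print(cell_b)

# With only 20 values, the last four cells are refilled from the start of 1:20
short <- array(1:20, dim = c(2, 3, 4))
last_slice <- short[, , 4]
print(last_slice)
```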
Data frames
The data frame is the most commonly used data structure in R. It can hold different modes of data, such as numeric and character, but each column must contain only one mode.
> studentID<- c(101,102,103,104)
> age<-c(25,24,26,25)
> grade<-c("good","poor","improved","excellent")
> score<-c(70,45,60,90)
> studentDetails<-data.frame(studentID,age,grade,score)
> studentDetails
  studentID age     grade score
1       101  25      good    70
2       102  24      poor    45
3       103  26  improved    60
4       104  25 excellent    90
> studentDetails[1:3]
  studentID age     grade
1       101  25      good
2       102  24      poor
3       103  26  improved
4       104  25 excellent
> studentDetails$score
[1] 70 45 60 90
> studentDetails[c("studentID","score")]
  studentID score
1       101    70
2       102    45
3       103    60
4       104    90
> table(studentDetails$score,studentDetails$grade)
     excellent good improved poor
  45         0    0        0    1
  60         0    0        1    0
  70         0    1        0    0
  90         1    0        0    0
> max(studentDetails$score)
[1] 90
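The session above selects columns. Rows can be filtered in a similar way, and str() shows the mode of each column. This is a minimal sketch; the pass mark of 60 is an assumption made for the example, not something taken from the data.

```r
studentDetails <- data.frame(
  studentID = c(101, 102, 103, 104),
  age       = c(25, 24, 26, 25),
  grade     = c("good", "poor", "improved", "excellent"),
  score     = c(70, 45, 60, 90)
)

# One line per column, with its type and first values
str(studentDetails)

# Keep the rows where the condition holds; the empty index keeps all columns
passed <- studentDetails[studentDetails$score >= 60, ]
print(passed)

average <- mean(studentDetails$score)
print(average)
```

The comma inside the square brackets matters: the part before it selects rows and the part after it selects columns.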
Calling plot(studentDetails$studentID,studentDetails$score) draws a scatter plot of score against student ID.
Execute plot(studentDetails$studentID,studentDetails$score,type = "o") in R to see the same points joined by lines.
Lists
A list can hold any of the objects and structures we have seen so far, and its elements do not need to share a type.
listExample<- list(obj1,obj2,…)
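As a concrete sketch, the list below mixes a string, a vector and a matrix. The element names title, values and grid are hypothetical and chosen only for this example.

```r
scores <- c(70, 45, 60, 90)
m <- matrix(1:4, nrow = 2)

listExample <- list(title = "demo", values = scores, grid = m)
print(names(listExample))

second_score <- listExample$values[2]      # $ extracts an element by name
corner <- listExample[["grid"]][2, 1]      # [[ ]] also extracts the element itself
print(second_score)
print(corner)

# Single brackets return a smaller list, not the element inside it
print(class(listExample["title"]))
print(class(listExample[["title"]]))
```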
Importing Data into R
- The edit() function can be used to take input from the user
It is important to store the result in a variable, otherwise all the entered data will be lost. See the image above.
- Import data from text file
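One way to read a delimited text file is read.table(). The sketch below first writes a small comma-separated file to a temporary location so that it can run anywhere. The file contents and the name students are made up for the example.

```r
# Create a small sample file (hypothetical data)
path <- tempfile(fileext = ".txt")
writeLines(c("studentID,age,score",
             "101,25,70",
             "102,24,45"), path)

# header = TRUE treats the first line as column names; sep sets the delimiter
students <- read.table(path, header = TRUE, sep = ",")
print(students)
print(nrow(students))

unlink(path)  # remove the temporary file
```

As with edit(), the result is assigned to a variable so that the imported data stays available.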
If you have RStudio installed, you can take advantage of its autocompletion suggestions as you type.
> library("sqldf")
Loading required package: gsubfn
Loading required package: proto
Loading required package: RSQLite
Loading required package: DBI
> studentID<- c(101,102,103,104)
> age<-c(25,24,26,25)
> grade<-c("good","poor","improved","excellent")
> score<-c(70,45,60,90)
> studentDetails<-data.frame(studentID,age,grade,score)
> QueryData<-sqldf("select * from studentDetails where studentId=101",row.names=TRUE)
Loading required package: tcltk
> QueryData
  studentID age grade score
1       101  25  good    70
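For comparison, the same row can be selected in base R without the sqldf package. This is only a sketch of an equivalent filter, not a replacement for the SQL approach shown above. The names byIndex and bySubset are illustrative.

```r
studentDetails <- data.frame(
  studentID = c(101, 102, 103, 104),
  age       = c(25, 24, 26, 25),
  grade     = c("good", "poor", "improved", "excellent"),
  score     = c(70, 45, 60, 90)
)

# Logical indexing: keep the rows where studentID equals 101
byIndex <- studentDetails[studentDetails$studentID == 101, ]

# subset() expresses the same filter with column names used directly
bySubset <- subset(studentDetails, studentID == 101)

print(byIndex)
print(bySubset)
```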
R Code Sample
R syntax is different, but if you have a good grasp of a language like Java, it will not take long to pick up the basics such as conditions, loops and functions. Below is a sample that combines them.
> new.function <- function(a) {   # define a new function
+   if(a%in%8:12){                # run the loop only when a is between 8 and 12
+     for(i in 1:a) {             # iterate from 1 to the value of a
+       if(i==3){
+         next                    # skip this iteration, like continue in Java
+       }
+       else{
+         b <- i^2
+         print(b)
+       }
+     }
+   }
+ }
> new.function(9) # call the new function
[1] 1
[1] 4
[1] 16
[1] 25
[1] 36
[1] 49
[1] 64
[1] 81
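The same logic can also be written without an explicit loop, returning the squares as a vector instead of printing them one by one. This is an alternative sketch, not the version used above, and squares_except_third is a hypothetical name.

```r
squares_except_third <- function(a) {
  # Outside the range 8 to 12, return nothing, as the loop version does
  if (!(a %in% 8:12)) {
    return(invisible(NULL))
  }
  i <- seq_len(a)   # 1, 2, ..., a
  i <- i[i != 3]    # drop 3, the value that next skips in the loop
  i^2               # arithmetic works on the whole vector at once
}

result <- squares_except_third(9)
print(result)
```

Returning a vector makes the result easy to store and reuse, while the loop version only prints it.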
Almost every sector, including retail, healthcare and life sciences, and banking, can benefit from machine learning. But we need to identify exactly where it delivers the most value.
In the ECM space, these techniques can give end users and auditors better insight into audit trail data.
Please share your thoughts, and happy learning ;-).