Python and R are the two programming languages most widely used by data scientists. Both are open source and free to download.
They can be downloaded from the official Python and R websites.
Both languages are suitable for many data science tasks, from data manipulation to big data analysis.
What is R?
R is a programming language well suited to statistical analysis and data visualization. RStudio is the most popular IDE for R.
R has been widely used in academia and research by statisticians and scientists.
Advantages of R
- Statistical analysis can be done with a few lines of code.
- It can be used to make very informative graphs and visualizations.
- It has many packages and libraries for data manipulation and visualization.
Popular Libraries of R
dplyr, tidyr, data.table - to manipulate data
ggplot2 - to visualize data
caret - for machine learning tasks
What is Python?
Python is a general-purpose, object-oriented programming language. It is simple and easy to learn, and its syntax and keywords read much like English.
IDLE is the default editor that comes with Python. Many other IDEs are available for Python. The most widely used are PyCharm, Spyder, and Jupyter Notebook.
Advantages of Python
- It is an object-oriented programming language.
- Its simple syntax makes coding and debugging easier.
- It can be used for web development and other applications.
- Its execution is often faster than R's for general-purpose tasks, though speed depends on the workload.
- It has a vast collection of libraries.
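As a small illustration of the first two points, the sketch below defines a class and then uses it in a loop that reads close to plain English. The `Sample` class, its `average` and `above` methods, and the data values are all made up for this example. It relies only on Python's standard library.

```python
from statistics import mean


class Sample:
    """A hypothetical container for a named list of measurements."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def average(self):
        # Delegate to the standard library's statistics.mean
        return mean(self.values)

    def above(self, threshold):
        # Keep only the values strictly greater than the threshold
        return [v for v in self.values if v > threshold]


# Illustrative data, not taken from any real dataset
samples = [Sample("a", [1, 2, 3, 4]), Sample("b", [10, 20, 30])]

for sample in samples:
    if sample.average() > 5:
        print(sample.name, "has a high average")
    else:
        print(sample.name, "has a low average")
```

The `for` and `if` lines can be read aloud almost as written, which is what the claim about simple syntax usually refers to.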
Popular Libraries of Python
NumPy - for efficient storage and manipulation of homogeneous, array-based data
Pandas - for manipulating heterogeneous and labelled data
SciPy - for scientific computing tasks
Matplotlib, Seaborn - for quality data visualization
Scikit-learn - for machine learning tasks
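To make the difference between the first two libraries concrete, here is a minimal sketch that assumes NumPy and Pandas are installed. The column names and values are invented for illustration. A NumPy array holds values of a single type, while a Pandas DataFrame combines labelled columns of different types.

```python
import numpy as np
import pandas as pd

# NumPy: a homogeneous array, where every element shares one dtype
heights_cm = np.array([160.0, 170.0, 180.0, 175.0])
print(heights_cm.dtype)
print(heights_cm.mean())

# Pandas: labelled, heterogeneous columns (strings and floats together)
people = pd.DataFrame({
    "name": ["Ann", "Ben", "Cara", "Dev"],
    "group": ["x", "y", "x", "y"],
    "height_cm": heights_cm,
})

# Group rows by a label column and average the numeric column per group
mean_by_group = people.groupby("group")["height_cm"].mean()
print(mean_by_group)
```

The grouped result is a Pandas Series indexed by the group label, so `mean_by_group["x"]` returns the average height for group `x`.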
Both languages are capable of handling the majority of data science problems, and the choice between them depends on the requirements of the project.