
R vs Python for Data Analysis



What's going on, everybody? Welcome back to another blog. Today we're comparing Python and R to see which one is better. Before I start this presentation (yes, I made an entire presentation for this blog), I have to address the elephant in the room. About a month ago I made a somewhat controversial LinkedIn post. I don't think it's controversial, but some people apparently did. All it says is "Python is better than R." That's my opinion, but it stirred up a lot of emotions for a lot of people.

Here's what we'll cover in this Python versus R blog: descriptions of both, different libraries, the code syntax, the pros and cons of each, and my final answer. Before we get into it, I'll say that I'm not trying to go super in depth; I tried to make this as user-friendly as possible. If you want a more in-depth presentation on just one of these, I can absolutely do that, and I plan to at some point. This one is going to be fairly high level and more about my thoughts and feelings, because this is a very emotional topic.




Let's start with a description of both Python and R, keeping it high level. R is a programming language developed for statistical analysis. For a long time the people who mostly used it were statisticians, and only within the past five to ten years has it really been used for data science, data analysis, visualization, and all of those things. It was developed in 1993, primarily for statisticians, data miners, and analysts, and it's used by a ton of very large companies, including Uber, Facebook, and Google. Plenty of small companies use R too, so if your company does any type of statistical analysis, there's a good chance it has either used R in the past or is using it now.

Python is a general-purpose programming language used for almost anything you can imagine. It may not be the best tool for every single thing it can do, but it can do almost anything, so it's very general and very broad. It is quickly becoming the most popular programming language in the world, and it's used by companies like Google, Facebook, and Netflix. If you noticed that Facebook and Google are on both lists, that wasn't by accident. I did that on purpose to show that large companies use both languages for what each is good at, which we'll talk about later. Consider it foreshadowing.

Before we look at libraries and packages, I want to say that if I didn't highlight your favorite one here, I'm sorry. There are so many, especially with R, where there are hundreds and thousands of packages and libraries, and I can't possibly list them all. These are just some of the more popular ones and the ones I have personally used, so I hope you're not offended.

Let's start with R. For data collection you can use things like Rcrawler, readxl, read rl, and RCurl. For data wrangling and exploration there's dplyr, sqldf, data.table, readr, and tidyr. For data visualization there's ggplot2, ggvis, plotly, esquisse, and shiny.

Over to Python. For data collection there's pandas, requests, and Beautiful Soup. For data wrangling and exploration there's pandas, NumPy, and SciPy. For data visualization there's Matplotlib, seaborn, and Plotly. Again, this is just a high-level overview of some of the packages in each language. If you have never used R or Python, I think these packages are a really good place to start.

Now for the code and the syntax. I tried to stay neutral on this and mostly report what everyone else was saying, because I have my own very strong thoughts and opinions, and I wanted to stay somewhat unbiased, at least for this part.

R is fairly difficult to pick up and start working with from scratch, especially if you've never used it. Once you get a little more advanced, it can also be difficult to maintain your code, especially as you start to scale it, and that's a big problem a lot of people have raised about R. Python is easy-to-medium difficulty to pick up and learn. I think it can be about the same difficulty as R, and that's what a lot of people said too, so it's not just my opinion. But it's easier to write and maintain larger-scale code in Python, so as you build larger projects, join larger teams, or take on more data, it's just easier to scale up. Now on to some syntax examples. I 100% cherry-picked these, but I do feel they're pretty representative of what the code looks like as a whole.

A lot of people are probably going to get mad at me and say, "No, R is much easier than this," and you may be right in some respects, but for the most part I feel this is fairly accurate. We're just reading in a CSV file and then finding the mean of a column, and that's about it. In this comparison, R comes out a little more complicated, while Python is a little cleaner and easier to read and pick up. That's something a lot of people say about Python: it's very readable.
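For anyone who wants to try the Python side of that task, here is a minimal sketch, assuming pandas is installed. The data, the column names, and the `column_mean` helper are all made up for illustration, and the CSV is kept in a string so the example runs without a file on disk.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file on disk; the columns are invented.
CSV_TEXT = """name,department,salary
Ana,Sales,52000
Ben,Sales,48000
Cara,Engineering,75000
Dev,Engineering,81000
"""


def column_mean(csv_source, column):
    """Read a CSV (a path or a file-like object) and return one column's mean."""
    df = pd.read_csv(csv_source)
    return df[column].mean()


# io.StringIO lets pandas read the string as if it were an open file.
mean_salary = column_mean(io.StringIO(CSV_TEXT), "salary")
print(f"Mean salary: {mean_salary}")
```

With a real file you would pass its path instead of the `io.StringIO` object, for example `column_mean("employees.csv", "salary")`, where the file name is again just a placeholder.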

Pros and Cons:

Now let's look at some of the pros and cons of both, starting with R. The pros: it is open source, it is fantastic for statistical analysis, and it has hundreds of packages and libraries purely for analytics. That's what R is: it's purely for statistics and analyzing data. Lastly, it is easy to build visualizations with R.

Now for the cons. It can't be embedded in web applications, and from what I've read that's purely for security reasons, so that's a big downside of using R. You also need to know a large number of packages and libraries. In Python you can know pandas and do a lot of different things with it; R doesn't really have that, and you have to know several packages to get a single task done. Lastly, R can run slowly because of how it stores its data.

Now on to Python. The pros: it's open source, it's easy to read and learn, especially if you're picking it up for the first time, and it can be embedded into web applications, which can be very important for a lot of people. There's also a growing number of libraries for data analysis. The number of libraries and packages for R is growing as well, but R's are more well established, while Python's are still coming out and catching up to R fairly quickly.

For the cons, processing speed can be slow, depending on which library or package you're using, though I think that's a con for both R and Python on some level. It uses a large amount of memory, which is part of why it runs slowly. It's simple to learn and simple to use, and sometimes that's actually an issue: because it's so simple, really complicated things can be hard to do, whereas R is built for those complex calculations, and that's why its packages and libraries are built the way they are. Lastly, the libraries for all analytics needs are still being developed. Yes, it's a pro that their number is growing, but it's still a con that they're behind R, which has more already developed and more in development, while Python is still growing. Now on to my final answer.

 

Which is better?

Which is better, Python or R? It really depends. But going back to my LinkedIn post that we talked about at the very beginning, I will say that I still 100% believe it, because for my type of work Python is 100 times better and 100 times more useful. So to me, Python is better than R. But it really does depend on what you're using it for. If you're doing purely statistical work, R is going to be the better choice. If you're doing machine learning, Python is arguably much better. In my opinion, R is harder to learn but has more features, while Python is easier to learn but isn't as developed yet.

What I genuinely think you should do is try both. You really need to get some hands-on experience: take a course in each, see what you think, and determine for yourself which is better. Going back to that LinkedIn post for a second, I believe that for me personally Python is just better. I can use it for so many things, and it's much better suited to me and what I do for my job. But for other positions and other people, R may be the programming language of choice, and I'm totally okay with that.

There were a lot of people in the comments writing things like "it just depends," "why do you think one is better than the other?" and "why can't it be both?" I really wanted to respond and say, "I agree with you," but I didn't, because I thought it was more fun and I knew I was making this blog. So genuinely, from the bottom of my heart, to all those people: I agree with you. I want you to feel some vindication, some sense that you were right, and I hope this was a good outcome for you.

I have nothing against R. I have used it and I've taken a few courses on it. I have not used much R in my actual job, although the data scientists in my department use it quite a bit. I mostly stick with Python, and that's why I like it better. But I can honestly say I've given both a fair chance, and I think you should do the same and test out which one you personally think is better. Thank you so much for reading. I really appreciate it.

 
