My Journey Into The World Of Data Science
Hello, my Medium fellows!😊 This is my first post here, and I’ll be sharing updates on my journey into Data Science so far.
Getting into tech from a non-tech (or even a non-mathematical or non-statistical) background can be quite difficult, especially without proper exposure.
For me, the best way to dive in is an introductory course. I'm taking a Udemy course offered in conjunction with 365 Data Science (The Complete Data Science Bootcamp).
General Overview of Data Science
At its core, data science is about storytelling (visualizing data) and making sense of numbers.
Data is the foundation of data science. It's essentially the material on which all analyses are based; that is, data science is completely reliant on data availability.
Data can be said to be of two (2) types:
- Traditional data: This is structured data stored in databases, and it can be managed from a single computer. It is arranged in tables and contains numeric or text values.
- Big data: This is extremely large data distributed across a network of computers. It can be structured, semi-structured, or even unstructured. It is characterized not only by volume (data measured in terabytes, petabytes, exabytes, and zettabytes, hence the name "big" data), but also by variety (various formats like images, numbers, text, audio, video, mobile data, etc.) and velocity (data is extracted as quickly as possible).
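To get a feel for the volume units mentioned above, here is a small sketch that prints them in increasing order. It assumes the decimal (SI) definitions, where each unit is 1,000 times the previous one; the names `UNITS` and `bytes_in` are made up for this illustration.

```python
# Decimal (SI) storage units, each 1,000 times the previous one.
UNITS = ["terabyte", "petabyte", "exabyte", "zettabyte"]


def bytes_in(unit):
    """Return the number of bytes in one SI unit from UNITS."""
    # A terabyte is 10**12 bytes; every later unit multiplies by 1,000.
    return 10 ** (12 + 3 * UNITS.index(unit))


for unit in UNITS:
    print(f"1 {unit} = {bytes_in(unit):,} bytes")
```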
The field of data science has evolved over the years, as can be seen below:
Statistics => Data mining => Predictive analysis => Data science
...and it still keeps on evolving.
There are five (5) main steps in converting raw data into a more understandable format that is useful for further processing. Collectively, these steps are also known as "data wrangling" or "data munging".
- Class-labeling the observations: arranging data by category or format
- Data cleansing/data scrubbing: dealing with inconsistent data (e.g. misspelled words)
- Data balancing: extracting an equal number of observations for each category
- Data shuffling: rearranging data points to eliminate unwanted patterns and improve predictive performance
- Data masking (for big data): ensuring and maintaining confidentiality and data privacy
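To make the steps above more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the records, the `CORRECTIONS` lookup, and the helper names are invented, and the truncated hash is only a stand-in for real data masking, which needs much stronger techniques.

```python
import hashlib
import random
from collections import defaultdict

# Hypothetical raw records, invented for illustration: (customer name, fruit bought).
raw = [
    ("Ada", "apple"), ("Bola", "aple"), ("Chidi", "banana"),
    ("Dayo", "Apple "), ("Efe", "banana"), ("Femi", "apple"),
]

# Assumed lookup of known misspellings; a real project would derive this from the data.
CORRECTIONS = {"aple": "apple"}


def cleanse(label):
    """Data cleansing: trim whitespace, lowercase, and fix known misspellings."""
    label = label.strip().lower()
    return CORRECTIONS.get(label, label)


def mask(name):
    """Data masking (simplified): replace a name with a short one-way hash."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()[:8]


def preprocess(records, seed=0):
    # Class labeling: group the masked observations by their cleaned category.
    by_label = defaultdict(list)
    for name, label in records:
        by_label[cleanse(label)].append(mask(name))

    # Data balancing: keep the same number of observations for every category.
    rng = random.Random(seed)
    smallest = min(len(names) for names in by_label.values())
    balanced = [
        (name, label)
        for label, names in by_label.items()
        for name in rng.sample(names, smallest)
    ]

    # Data shuffling: randomize the order so the collection order leaves no pattern.
    rng.shuffle(balanced)
    return balanced


processed = preprocess(raw)
for masked_name, label in processed:
    print(masked_name, label)
```

With this made-up data, "apple" starts with four observations and "banana" with two, so balancing keeps two of each. Fixing the `seed` just makes the random sampling and shuffling repeatable.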
Analysis vs Analytics
Analysis deals with data from the past, while analytics takes that past data and uses it to explore potential future events. Analytics builds on the computations obtained from analysis. Data science incorporates both analysis and analytics.
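Here is a small sketch of the difference, using invented monthly sales figures. The `analyse` function only summarizes what already happened, while `forecast_next` fits a straight-line trend to the past and extends it one step forward. The straight-line fit is just one simple technique chosen for this example; it is not the only way to do analytics.

```python
# Hypothetical monthly sales figures, invented purely for illustration.
past_sales = [100.0, 110.0, 120.0, 130.0]


def analyse(values):
    """Analysis: summarize what already happened."""
    average = sum(values) / len(values)
    changes = [later - earlier for earlier, later in zip(values, values[1:])]
    return {"average": average, "changes": changes}


def fit_trend(values):
    """Fit y = intercept + slope * x by ordinary least squares, with x = 0, 1, 2, ..."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast_next(values):
    """Analytics: use the trend fitted on past data to estimate the next period."""
    intercept, slope = fit_trend(values)
    return intercept + slope * len(values)


summary = analyse(past_sales)
print("Average so far:", summary["average"])
print("Month-to-month changes:", summary["changes"])
print("Estimate for next month:", forecast_next(past_sales))
```

Notice that the forecast reuses the same past numbers the summary was computed from, which is the sense in which analytics builds on analysis.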
Data Science Tools
Data science is an interdisciplinary field that combines:
- statistical tools: these include SPSS, MS-Excel, Stata, and MATLAB for statistical methods (such as linear regression analysis, logistic regression analysis, cluster analysis, factor analysis, and time series)
- mathematical tools: mathematics is the bedrock of data science, and the relevant areas include calculus, algebra, and probability
- programming tools: mainly R and Python
- problem-solving tools
- data management tools
Data Science Roles
According to 365 Data Science, data science roles are grouped into five:
- Traditional data — data architect, data engineer, database administrator
- Big data — big data architect, big data engineer
- Business intelligence (BI) — BI analyst, BI consultant, BI developer
- Traditional methods (statistics) — data scientist, data analyst
- Machine learning — data scientist, machine learning engineer
There are several misconceptions about the data science field, its role divisions, and even the tools it uses. As I progress through further sections, I expect to gain clarity on how the roles differ and how the tools are properly applied, which should make becoming a Data Science Amazon much more attainable.