As a data scientist, I have worked for La Trobe University, Data61 (CSIRO) and Officeworks. My work mainly includes:
Data pre-processing: includes data cleaning, integration, filtering, etc. I have processed large volumes of data of many types, such as text, geographic data and time series.
Data analysis: focuses on feature selection, feature correlation analysis, trend analysis, outlier detection, etc.
Data modelling, comparison and design: uses and designs models for prediction, clustering and classification. The algorithms I use most often include deep learning, random forests, association rules, generalized linear regression, SVM, PCA, K-means and others. I regularly draw on Bayesian methods, clustering and classification theory, and a range of statistical distributions such as Gaussian, Poisson, Weibull and zero-inflated distributions.
Data visualization: mainly uses R, Python, Matlab and Excel to visualize data. Besides common chart types such as bar and line charts, I regularly produce violin plots, heat maps, time-series plots, clustering/classification visualizations, geographic maps and so forth.
For data analysis, I usually use the following tools: R, Python, Matlab, SQL and Tableau.