
WeRateDogs Analysis

WeRateDogs is a Twitter account that rates people's dogs, pairing each rating with a humorous comment about the dog. The goal of this project is to wrangle the account's data and showcase the results through analyses and visualizations.

Dog image ratings

The distribution of dog image ratings is left skewed. Most values fall in the 0.75-1.25 range (generally very good ratings!), and a few outliers reach an extreme value of 2.0.
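The skew can be checked numerically in pandas. The following is a minimal sketch, assuming each rating is computed as numerator divided by denominator and stored in hypothetical columns `rating_numerator` and `rating_denominator`. A negative sample skewness from `Series.skew()` indicates a left-skewed distribution. The toy values are made up for illustration and are not taken from the project's dataset.

```python
import pandas as pd

def normalized_ratings(archive: pd.DataFrame) -> pd.Series:
    """Return each tweet's rating as numerator / denominator (e.g. 12/10 -> 1.2).

    The column names `rating_numerator` and `rating_denominator` are assumptions.
    Rows with a zero or missing denominator are dropped to avoid division errors.
    """
    valid = archive[archive["rating_denominator"] > 0]
    return valid["rating_numerator"] / valid["rating_denominator"]

def describe_skew(ratings: pd.Series) -> str:
    """Classify the skew direction using pandas' sample skewness."""
    skew = ratings.skew()
    if skew < 0:
        return "left skewed"
    if skew > 0:
        return "right skewed"
    return "symmetric"

# Toy data for illustration only (not the project's dataset)
toy = pd.DataFrame({
    "rating_numerator":   [12, 13, 11, 12, 10, 3, 14, 12],
    "rating_denominator": [10, 10, 10, 10, 10, 10, 10, 10],
})
ratings = normalized_ratings(toy)
print(ratings.round(2).tolist())
print(describe_skew(ratings))
```

In this sketch, the single low rating (3/10) forms a long tail on the left. That tail is what pushes the skewness below zero.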

Number of retweets and favorite counts

The following scatter plot shows a strong linear relationship between retweet count and favorite count. The correlation coefficient between the two variables is 0.927, which supports the trend observed in the plot.
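A correlation like this can be computed with pandas. The sketch below assumes a DataFrame with hypothetical columns `retweet_count` and `favorite_count`. It uses the Pearson coefficient, which is the pandas default. The original analysis does not state which coefficient was used, so treat that choice as an assumption. The toy numbers are illustrative and do not come from the dataset.

```python
import pandas as pd

def retweet_favorite_correlation(tweets: pd.DataFrame) -> float:
    """Pearson correlation between retweet and favorite counts.

    The column names `retweet_count` and `favorite_count` are assumptions.
    """
    return float(tweets["retweet_count"].corr(tweets["favorite_count"], method="pearson"))

# Toy data for illustration only (not the project's dataset)
toy = pd.DataFrame({
    "retweet_count":  [100, 200, 300, 400],
    "favorite_count": [250, 480, 760, 990],
})
r = retweet_favorite_correlation(toy)
print(f"correlation: {r:.3f}")
```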

Dog breeds

This insight comes from the results of the image classifier and looks at the most common dog breeds identified in the dataset. Only the prediction with the highest probability value (prediction #1) is considered. The following table lists the 10 most common breeds that the neural network returned as prediction #1. The clear winner is the golden retriever, which appears in 150 images.
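The breed count can be sketched as follows. This assumes the image predictions table has hypothetical columns `p1` (the label with the highest probability) and `p1_dog` (whether that label is a dog breed). Dropping non-dog predictions through `p1_dog` is also an assumption about how those rows were handled. The toy rows are illustrative only.

```python
import pandas as pd

def top_breeds(predictions: pd.DataFrame, n: int = 10, dogs_only: bool = True) -> pd.Series:
    """Count the most frequent top-ranked (#1) breed predictions.

    Assumes hypothetical columns `p1` (highest-probability label) and
    `p1_dog` (True when that label is a dog breed).
    """
    top = predictions[predictions["p1_dog"]] if dogs_only else predictions
    # value_counts sorts by frequency, most common first
    return top["p1"].value_counts().head(n)

# Toy data for illustration only (not the project's dataset)
toy = pd.DataFrame({
    "p1": ["golden_retriever", "pug", "golden_retriever", "web_site", "pug", "golden_retriever"],
    "p1_dog": [True, True, True, False, True, True],
})
counts = top_breeds(toy, n=2)
print(counts)
```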

References

About the dataset

The data for this project comes from three datasets: the Twitter archive, which contains roughly 2,500 tweets and related information; the image predictions, which store about 2,000 entries; and the tweet details, which were obtained by querying the Twitter API.
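One way to combine the three sources is an inner join on a shared tweet identifier. The following is a minimal sketch. The key name `tweet_id`, the function `merge_sources`, and the toy columns are all hypothetical. Because an inner join keeps only tweets that appear in all three tables, the merged table can be smaller than the archive.

```python
import pandas as pd

def merge_sources(archive: pd.DataFrame, predictions: pd.DataFrame,
                  details: pd.DataFrame) -> pd.DataFrame:
    """Join the three sources into one table keyed by tweet ID.

    `tweet_id` as the shared key is an assumption. Inner joins keep only
    tweets present in every source (e.g. tweets with an image prediction).
    """
    merged = archive.merge(predictions, on="tweet_id", how="inner")
    return merged.merge(details, on="tweet_id", how="inner")

# Toy data for illustration only (not the project's dataset)
archive = pd.DataFrame({"tweet_id": [1, 2, 3], "rating_numerator": [12, 13, 11]})
predictions = pd.DataFrame({"tweet_id": [1, 3], "p1": ["pug", "golden_retriever"]})
details = pd.DataFrame({"tweet_id": [1, 2, 3], "retweet_count": [100, 250, 40]})
merged = merge_sources(archive, predictions, details)
print(merged)
```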

About the documents

This project consists of three documents. The first is a Jupyter notebook that covers data gathering, data assessment, data cleaning, and storage, analysis and visualization. The second is a report that briefly describes the wrangling efforts. The third is a report that communicates the insights and presents the visualizations produced from the wrangled data.