Shaking up Mental Models

R
Author

Kevin

Published

2021-09-14

Once you see it, you can’t unsee it. 🤔

One benefit of learning a new framework is that it gives you the opportunity to compare and contrast with an established mental model.

I have been taking the Pandas course from Kaggle, and one lesson involved storing row indices and column names in variables and then using them to extract information from the data frame.

indices = [0, 1, 10, 100]
var = ['country', 'province', 'region_1', 'region_2']
df = reviews.loc[indices, var]

This seemed like a painful way to select certain rows and columns, but it did give me a new perspective on a challenge I was having.

I have been working on reading data from a Qualtrics survey that has nearly 147 columns, of which only about 117 are needed. (Long story on templated survey tools.) To pare the data frame down, I had been selecting columns by index. Using an index is okay, but it is frustrating while you are testing because the selection breaks whenever the survey changes. It was also a pain to write out all those terrible column names. The Python script above made me think of creating a vector of column names to reference in a select statement.
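The fragility of selecting by index can be sketched with a toy example. The `survey` data frame and its column names are invented for illustration (they are not the real Qualtrics export), and the sketch assumes dplyr 1.0.0 or later for the `.after` argument of `mutate()`.

```r
library(dplyr)

# Hypothetical survey export; the column names are invented for this sketch
survey <- data.frame(
  StartDate  = c("2021-09-01", "2021-09-02"),
  Q1_program = c("A", "B"),
  Q2_rating  = c(4, 5)
)

# Select the two question columns by position
names(survey %>% select(2, 3))

# Simulate a survey change: a new column is added right after StartDate
survey_v2 <- survey %>%
  mutate(EndDate = StartDate, .after = StartDate)

# The same positions now point at different columns, with no error or warning
names(survey_v2 %>% select(2, 3))
```

The first call returns the two question columns. After the change, the same positions return `EndDate` and `Q1_program`, which is the kind of silent shift that makes index selection frustrating to test.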

Is this possible? Yes it is, and now I seem to see it everywhere.

Below is a minimal example with the ‘mtcars’ data set.

library(dplyr)  # provides select(), all_of(), and the %>% pipe

# Select the columns you eventually want to exclude
remove <- mtcars %>%
  dplyr::select(drat, wt, qsec)

# Create a vector with the names of those columns
remove <- names(remove)

# Within select(), use the helper all_of() with the - operator
# to deselect every column named in the vector
new_mtcars <- mtcars %>%
  dplyr::select(-all_of(remove))


new_mtcars
                     mpg cyl  disp  hp vs am gear carb
Mazda RX4           21.0   6 160.0 110  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110  0  1    4    4
Datsun 710          22.8   4 108.0  93  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175  0  0    3    2
Valiant             18.1   6 225.0 105  1  0    3    1
Duster 360          14.3   8 360.0 245  0  0    3    4
Merc 240D           24.4   4 146.7  62  1  0    4    2
Merc 230            22.8   4 140.8  95  1  0    4    2
Merc 280            19.2   6 167.6 123  1  0    4    4
Merc 280C           17.8   6 167.6 123  1  0    4    4
Merc 450SE          16.4   8 275.8 180  0  0    3    3
Merc 450SL          17.3   8 275.8 180  0  0    3    3
Merc 450SLC         15.2   8 275.8 180  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205  0  0    3    4
Lincoln Continental 10.4   8 460.0 215  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230  0  0    3    4
Fiat 128            32.4   4  78.7  66  1  1    4    1
Honda Civic         30.4   4  75.7  52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65  1  1    4    1
Toyota Corona       21.5   4 120.1  97  1  0    3    1
Dodge Challenger    15.5   8 318.0 150  0  0    3    2
AMC Javelin         15.2   8 304.0 150  0  0    3    2
Camaro Z28          13.3   8 350.0 245  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175  0  0    3    2
Fiat X1-9           27.3   4  79.0  66  1  1    4    1
Porsche 914-2       26.0   4 120.3  91  0  1    5    2
Lotus Europa        30.4   4  95.1 113  1  1    5    2
Ford Pantera L      15.8   8 351.0 264  0  1    5    4
Ferrari Dino        19.7   6 145.0 175  0  1    5    6
Maserati Bora       15.0   8 301.0 335  0  1    5    8
Volvo 142E          21.4   4 121.0 109  1  1    4    2
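Two variations on this example may be useful. They are sketched here on the assumption that dplyr 1.0.0 or later is installed, and they are not part of the original workflow. The vector can be written out directly instead of being built from a first `select()` call, and the `any_of()` helper can stand in for `all_of()` when some of the listed columns might be missing. The name `not_a_column` is made up to show the difference.

```r
library(dplyr)

# Write the vector of column names directly
remove <- c("drat", "wt", "qsec")
new_mtcars <- mtcars %>%
  select(-all_of(remove))
names(new_mtcars)

# all_of() is strict: a name that is missing from the data raises an error
remove_extra <- c(remove, "not_a_column")
strict <- try(mtcars %>% select(-all_of(remove_extra)), silent = TRUE)
inherits(strict, "try-error")

# any_of() is lenient: names that are missing from the data are ignored
lenient <- mtcars %>%
  select(-any_of(remove_extra))
identical(names(lenient), names(new_mtcars))
```

Which helper fits depends on the goal. `all_of()` fails loudly when a column disappears, which helps catch survey changes. `any_of()` keeps the script running when a column to drop is already gone.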

Conclusion

Learning Python helped me shake up my mental model and apply what I learned to my R workflow.


About

Kevin is a nonprofit data professional operating out of Lakeland, Florida.
His expertise is helping nonprofits collect, manage, and analyze their program data.