This is a short note about how I set up my "data scientist" environment. Different people have different tastes, but this is what I use and how I set it up:
- conda for environment and package management (roughly the equivalent of virtualenv plus pip)
- Latest Python (yes, Python 3)
- Jupyter (aka IPython notebook)
- Disable the automatic closing of quotes and brackets that Jupyter enables by default
- Set the inline backend for matplotlib in IPython
First, we download Anaconda from https://www.continuum.io/downloads (the 64-bit Linux installer for Python 3, in my case). We install it by running:
bash Anaconda3-2.4.1-Linux-x86_64.sh
We can either restart the terminal or run the next command, so that the current shell picks up conda:
. ~/.bashrc
We can update conda and all packages:
conda update conda && conda update --all
Then we create a new conda environment (this way we can change package versions without affecting the main conda packages). We name it myenv and specify the packages we want (numpy, pandas...).
conda create --name myenv jupyter numpy scipy pandas matplotlib scikit-learn bokeh
We activate the new environment:
source activate myenv
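With the environment active, a quick sanity check can confirm that the requested packages are visible to Python. This is only a minimal sketch using the standard library; the function name `missing_packages` is made up for illustration, and the list assumes the usual import names (scikit-learn is imported as `sklearn`, and the jupyter package is assumed to bring in the `notebook` module).

```python
import importlib.util

# Import names for the packages passed to `conda create`.
# Assumptions: scikit-learn is imported as `sklearn`, and installing
# jupyter provides the `notebook` module.
WANTED = ["notebook", "numpy", "scipy", "pandas", "matplotlib", "sklearn", "bokeh"]


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(WANTED)
    print("missing:", ", ".join(missing) if missing else "none")
```

If anything is reported as missing, the most likely cause is that the script ran with a Python from outside myenv.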
Now that everything we wanted is installed, let's change the configuration.
We start by creating a default IPython profile:
ipython profile create
Then we edit the file ~/.ipython/profile_default/ipython_kernel_config.py and add the following lines, so that matplotlib uses the inline backend and draws figures at a decent size:
c.InteractiveShellApp.matplotlib = 'inline'
c.InlineBackend.rc = {
    'font.size': 10,
    'figure.figsize': (18., 9.),
    'figure.facecolor': 'white',
    'savefig.dpi': 72,
    'figure.subplot.bottom': 0.125,
    'figure.edgecolor': 'white',
}
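If you would rather not edit the file by hand, the same lines can be appended with a small script. This is a rough sketch rather than anything IPython provides: the helper `append_missing_lines` is hypothetical, and it assumes the config file was already created by `ipython profile create`.

```python
from pathlib import Path

# The two settings from this note, each kept as a single line of Python.
CONFIG_LINES = [
    "c.InteractiveShellApp.matplotlib = 'inline'",
    "c.InlineBackend.rc = {'font.size': 10, 'figure.figsize': (18., 9.), "
    "'figure.facecolor': 'white', 'savefig.dpi': 72, "
    "'figure.subplot.bottom': 0.125, 'figure.edgecolor': 'white'}",
]


def append_missing_lines(config_path, lines):
    """Append each line to an existing config file unless already present.

    Returns the list of lines that were actually added.
    """
    path = Path(config_path)
    if not path.exists():
        # Assumption: the file is generated by `ipython profile create`.
        raise FileNotFoundError(f"{path} not found; create the profile first")
    existing = path.read_text().splitlines()
    to_add = [line for line in lines if line not in existing]
    if to_add:
        with path.open("a") as handle:
            # Start on a fresh line in case the file lacks a trailing newline.
            handle.write("\n" + "\n".join(to_add) + "\n")
    return to_add
```

Calling `append_missing_lines(Path.home() / ".ipython/profile_default/ipython_kernel_config.py", CONFIG_LINES)` a second time adds nothing, so the script is safe to rerun.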
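To disable the automatic closing of brackets and quotes, run the following once in a notebook cell (the setting is saved, so it does not need to be repeated):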
from notebook.services.config import ConfigManager
c = ConfigManager()
c.update('notebook', {"CodeCell": {"cm_config": {"autoCloseBrackets": False}}})
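To confirm that the setting was stored, the saved configuration can be read back. This is a minimal sketch under one assumption: that in a default setup the notebook frontend configuration is written to ~/.jupyter/nbconfig/notebook.json (the location may differ if a custom Jupyter config directory is in use). The function name `auto_close_brackets_setting` is hypothetical.

```python
import json
from pathlib import Path

# Assumed default location of the notebook frontend configuration.
DEFAULT_CONFIG = Path.home() / ".jupyter" / "nbconfig" / "notebook.json"


def auto_close_brackets_setting(config_file=DEFAULT_CONFIG):
    """Return the stored autoCloseBrackets value, or None if it is not set."""
    path = Path(config_file)
    if not path.exists():
        return None
    data = json.loads(path.read_text())
    return data.get("CodeCell", {}).get("cm_config", {}).get("autoCloseBrackets")


if __name__ == "__main__":
    print("autoCloseBrackets:", auto_close_brackets_setting())
```

A result of `False` means the change is in place; `None` means the file or the key was not found at the assumed path.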