Virtual Environments with Conda for Researchers

For better reproducibility and mental health

Why Should I Use Virtual Environments?

Researchers often use multiple languages and packages with ever-changing versions. Because the algorithms and functions in a package may change over time, code written today may yield slightly different results tomorrow, or not run at all. For this reason, it’s often no longer enough to simply provide your data and code for replication. Virtual environments offer an easy solution to this problem by creating computing environments that can be shared and reproduced. The best part, though, is that they are more than just an extra step to help researchers replicate your work. Virtual environments will make your life easier and save you time and headaches as you update packages and move from project to project. Old scripts don’t break, dependency conflicts become far less of a worry, and code is more portable between local machines and the cloud.

In this tutorial I'm going to cover:

  • What environments are
  • How to set up a virtual environment for Python or R
  • How to share your environments with others
  • How to maintain your environments

What are Virtual Environments?

Think of virtual environments as isolated research labs within your computer. Each lab maintains its own equipment, and when you want to work on a specific project, you go to the lab associated with that project. Each virtual environment is an isolated space that maintains its own copies of languages and packages. As the researcher, you can switch between these environments quickly and easily depending on what you’re working on. For example, let’s say you’re working on some original research and you create your visualizations with ggplot2 3.3. Simultaneously, you’re working on a replication project that uses ggplot2 2.2. Unfortunately, the visualizations written for ggplot2 2.2 don’t work properly when you run the code with ggplot2 3.3. Troubleshooting other people’s code can be frustrating and time-consuming. So rather than trying to update the 2.2 code so it runs with 3.3 packages, you can spin up a new virtual environment, install ggplot2 2.2, and run the code as is.

As a rule of thumb, everything you do in Python and R should be inside a virtual environment. This may sound complicated, but it’s incredibly easy if you use the right tools!

How to set up a virtual environment

There are several language-specific environment managers, such as virtualenv for Python and renv for R, but I highly recommend using Anaconda’s conda environment manager. Conda is more intuitive than the alternatives, makes maintenance a breeze, and is widely adopted, so that's what I'll use in this tutorial.

Step 1: Install or update Anaconda

Before you start, you’re going to need Anaconda installed. If you don’t already have it, head over to the Anaconda website and follow the directions for your operating system. If you already have Anaconda installed, update conda to version 4.6 or later (running conda update conda in the terminal usually does it). Some of the commands in this tutorial won't work otherwise.

Step 2: Create your environment

It’s handy to create an environment for miscellaneous tasks not attached to any project. I call it my sandbox. It has everything I regularly use, and I spend no effort worrying about compatibility. If anything breaks, I just delete it and create a fresh one. Let’s create a sandbox to get started. In your terminal type:

conda create --name sandbox

"sandbox" can be anything you want to name your environment. That’s it! If you want to use a specific version of python (see the section below for R versions), you can specify that when creating the environment: 

conda create --name sandbox python=3.6
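
Because the sandbox is meant to be disposable, deleting and recreating it can be wrapped in a small shell function. The following is a minimal sketch rather than anything built into conda: reset_env is a hypothetical helper name, and it assumes the conda env remove and conda create subcommands with their --yes flag, which skips the confirmation prompt.

```shell
# Hypothetical helper: delete an environment and create a fresh one with the same name.
# Usage: reset_env NAME [PACKAGE_SPEC ...]
reset_env() {
    env_name="$1"
    shift  # the remaining arguments are package specs, e.g. python=3.6
    # Remove the old environment. If it does not exist yet, conda may report
    # an error depending on its version, so a failure here is ignored.
    conda env remove --name "$env_name" --yes || true
    # Recreate the environment, installing any requested packages.
    conda create --name "$env_name" --yes "$@"
}
```

For example, reset_env sandbox python=3.6 would rebuild the sandbox with Python 3.6.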

Step 3: Activate/deactivate your environment

To work within your environment, you need to activate it. To do so, enter in the terminal:

conda activate sandbox

To exit the environment, enter: 

conda deactivate
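
When juggling several environments, it is easy to lose track of which one is active. Here is a small sketch for checking, assuming conda exports the CONDA_DEFAULT_ENV variable when an environment is activated (current_env is a hypothetical helper name):

```shell
# Hypothetical helper: print the name of the active conda environment.
# Assumes "conda activate" exports CONDA_DEFAULT_ENV; when the variable is
# unset or empty, a fallback message is printed instead.
current_env() {
    echo "${CONDA_DEFAULT_ENV:-no environment active}"
}
```

Running conda env list gives a fuller picture: it lists every environment on your machine and marks the active one with an asterisk.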

How to manage and use your environments

Installing packages

To install the latest version of a package or language to your environment, activate the environment and use the conda install command. For example, to install the latest version of the pandas package for Python, type:

conda install pandas

R packages use the format "r-packagename". For example, to install the R language to your environment enter:

conda install r-base

To install a specific version of a package or language, just specify the version when installing:

conda install r-ggplot2=2.2

If conda does not have a package you need, you can install it through the language's usual tools, such as pip install for Python or the install.packages() function in R. Always try to install from conda first, though, to avoid potential compatibility issues. Additionally, creating and deleting environments and installing new packages can leave artifacts on your drive that take up space but aren't doing anything. To clean these up, run the following two commands every once in a while (the first removes cached package tarballs, the second removes unused packages):

conda clean -t
conda clean -p
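
Short flags are easy to forget, so the same cleanup can be written with long flags and paired with a preview step. This is a sketch, assuming the --tarballs, --packages, --all, --dry-run, and --yes options of conda clean; the two function names are hypothetical.

```shell
# Hypothetical helper: show what a full cleanup would delete, without deleting anything.
preview_conda_clean() {
    conda clean --all --dry-run
}

# Hypothetical helper: the two cleanup commands from above, written with long flags.
clean_conda_caches() {
    conda clean --tarballs --yes   # same as -t: remove cached package tarballs
    conda clean --packages --yes   # same as -p: remove unused packages from the cache
}
```

Running preview_conda_clean first is a cheap way to see how much space is at stake before committing to clean_conda_caches.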

How to use an IDE in your environment

Most of us use an IDE such as Jupyter or RStudio when programming. You can use these in your environment as well! Applications you open from within your virtual environment will use the packages and versions installed in that environment. To do so, simply activate your environment and then launch your IDE from that terminal. For example, if you have RStudio installed on your computer, you can launch it in your environment by typing “rstudio” in the terminal. If you use JupyterLab, you can either launch it by entering “jupyter lab” within your virtual environment, or you can select your environment from the Jupyter launcher if you add the environment's kernel to Jupyter.
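
Adding an environment's kernel to Jupyter can be sketched as follows. This assumes the environment already contains Python, that the ipykernel package is available from your conda channels, and that your conda version provides conda run; register_kernel is a hypothetical helper name.

```shell
# Hypothetical helper: make an environment selectable from the Jupyter launcher.
# Usage: register_kernel ENV_NAME
register_kernel() {
    # ipykernel provides the Python kernel that Jupyter talks to.
    conda install --name "$1" --yes ipykernel
    # Register the environment's Python as a named kernel for the current user.
    conda run --name "$1" python -m ipykernel install --user \
        --name "$1" --display-name "Python ($1)"
}
```

After register_kernel sandbox, a kernel labelled "Python (sandbox)" should appear in the launcher, even when Jupyter itself is started from a different environment.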

How to share environments

Environments are shared by exporting a list of packages to a text file, which is then used to install those packages automatically. There are two formats for doing this. The first is the YML file, which is used when reinstalling an environment on your own system or sharing an environment with someone using the same operating system. To create a YML file, use the following command (the --name option picks the environment to export, so it works from any terminal):

conda env export --name sandbox > sandbox.yml

To install an environment from a yml file, navigate your terminal to the directory with the YML file and use the following command:

conda env create --file sandbox.yml
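
A full export records every package along with platform-specific build details, which is a big part of why the file works best on the same operating system. One option worth trying for a more portable file is the --from-history flag, which lists only the packages you explicitly asked conda to install. This is a sketch under the assumption that your conda version supports that flag (it arrived in releases newer than 4.6); export_portable is a hypothetical helper name.

```shell
# Hypothetical helper: write a slimmer, more portable environment file.
# Usage: export_portable ENV_NAME   (creates ENV_NAME-portable.yml in the current directory)
export_portable() {
    conda env export --name "$1" --from-history > "$1-portable.yml"
}
```

The trade-off is precision: versions are pinned only where you pinned them at install time, and packages installed with pip are not captured, so the recreated environment may not match the original exactly.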

YML files represent a best-case scenario because they can install both Python and R packages from a single file. When sharing environments with users on a different operating system, users can either open the YML file and manually install the specific package versions, or use a language-specific package manager. For Python, environments can be exported to a requirements.txt file with pip:

pip freeze > requirements.txt

To install an environment from a requirements.txt file use:

pip install -r requirements.txt

If you need an R-specific environment, I recommend either a manual install from the YML file or using a package called renv. renv handles environments a little bit differently, so I recommend reading the renv documentation if this is what you need.

Final thoughts

Keep in mind this is a rough guide. There’s more you can do with environments, and there are multiple platforms outside of conda for managing them. You’ll undoubtedly encounter some errors while you learn. Once you figure it out, though, environments are a breeze to work with and make your life easier!