TIL: Launching Jupyter with a Custom Modal Image and Volume
Yesterday I learned of the Modal docs example showing how to start a Jupyter server via a Modal tunnel. I was elated to see this because it solved my problem of not being able to specify a custom image when using `modal launch jupyter`.
I have a Dockerfile which installs `colbert-ai` from the `main` branch of the stanford-futuredata/ColBERT repo with specific PyTorch and Transformers versions:
```Dockerfile
FROM mambaorg/micromamba:latest
USER root
RUN apt-get update && apt-get install -y git nano curl wget build-essential && apt-get clean && rm -rf /var/lib/apt/lists/*
RUN git clone https://github.com/stanford-futuredata/ColBERT.git /ColBERT && \
    cd /ColBERT && \
    micromamba create -n colbert python=3.11 cuda -c nvidia/label/11.7.1 -c conda-forge && \
    micromamba install -n colbert faiss-gpu -c pytorch -c conda-forge && \
    micromamba run -n colbert pip install -e . && \
    micromamba run -n colbert pip install torch==1.13.1 transformers==4.38.2 pandas
ENV CONDA_DEFAULT_ENV=colbert
ENV PATH=/opt/conda/envs/colbert/bin:$PATH
WORKDIR /
RUN echo "eval \"\$(micromamba shell hook --shell bash)\"" >> ~/.bashrc && \
    echo "micromamba activate colbert" >> ~/.bashrc
CMD ["/bin/bash"]
```
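Since the image pins exact versions of `torch` and `transformers`, it can be worth confirming the pins actually took once a container is up. A minimal sketch you could run inside the container's Python environment (the `check_pins` helper is my own, not part of the Dockerfile or Modal):

```python
from importlib import metadata

def check_pins(pins: dict) -> dict:
    """Map each package name to (wanted_version, installed_version_or_None)."""
    out = {}
    for pkg, wanted in pins.items():
        try:
            out[pkg] = (wanted, metadata.version(pkg))
        except metadata.PackageNotFoundError:
            out[pkg] = (wanted, None)  # package not installed at all
    return out

# Inside the colbert environment, both installed versions should match the pins:
print(check_pins({"torch": "1.13.1", "transformers": "4.38.2"}))
```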
I then modified the Modal documentation example as follows (`jupyter_inside_modal.py`) to build an image from my Dockerfile and to mount an existing Modal Volume:
```python
import subprocess
import time
import modal
from modal import Image, App, Secret, Volume
import datetime
import os

SOURCE = os.environ.get("SOURCE", "")
VOLUME = Volume.from_name("colbert-maintenance", create_if_missing=True)
MOUNT = "/colbert-maintenance"
image = Image.from_dockerfile(f"Dockerfile.{SOURCE}", gpu="L4")
app = App("jupyter-tunnel", image=image.pip_install("jupyter"))
JUPYTER_TOKEN = ""  # some list of characters you'll enter when accessing the Modal tunnel


@app.function(max_containers=1, volumes={MOUNT: VOLUME}, timeout=10_000, gpu="L4")
def run_jupyter(timeout: int):
    jupyter_port = 8888
    with modal.forward(jupyter_port) as tunnel:
        jupyter_process = subprocess.Popen(
            [
                "jupyter",
                "notebook",
                "--no-browser",
                "--allow-root",
                "--ip=0.0.0.0",
                f"--port={jupyter_port}",
                "--NotebookApp.allow_origin='*'",
                "--NotebookApp.allow_remote_access=1",
            ],
            env={**os.environ, "JUPYTER_TOKEN": JUPYTER_TOKEN},
        )

        print(f"Jupyter available at => {tunnel.url}")

        try:
            end_time = time.time() + timeout
            while time.time() < end_time:
                time.sleep(5)
            print(f"Reached end of {timeout} second timeout period. Exiting...")
        except KeyboardInterrupt:
            print("Exiting...")
        finally:
            jupyter_process.kill()


@app.local_entrypoint()
def main(timeout: int = 10_000):
    run_jupyter.remote(timeout=timeout)
```
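When the function starts, it prints the tunnel URL, and Jupyter's standard token auth means you can either paste `JUPYTER_TOKEN` into the login prompt or append it as a query parameter. A small sketch of building the direct-access URL (the helper name is mine, not part of the script above):

```python
def notebook_url(tunnel_url: str, token: str) -> str:
    """Build a direct-access URL for a token-protected Jupyter server."""
    return f"{tunnel_url.rstrip('/')}/?token={token}"

print(notebook_url("https://example.modal.host", "my-token"))
# -> https://example.modal.host/?token=my-token
```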
I then run the following locally from my terminal:

```bash
SOURCE="0.2.22.main.torch.1.13.1" modal run jupyter_inside_modal.py
```
Where my Dockerfile is in the same folder as `jupyter_inside_modal.py` and titled `Dockerfile.0.2.22.main.torch.1.13.1`. I can then access the cloned repo as well as my mounted volume, and use a Jupyter notebook to explore data, iterate on function definitions, compare model weights, add hooks to ColBERT models, and so on. This unlocks a ton of productivity and iteration velocity that I had been scratching my head over how to obtain without notebooks.