Monday, March 16, 2015

pip - Repeated installation of a package inside docker image


I built a Python package called my-package. I have no intention of making it public, so it is installed mostly through our internal servers. Recently a senior developer built a Docker-based architecture to host the application, and my-package is one of its dependencies.


The problem is that, in order to test the package, I repeatedly need to COPY my code into the Docker image, uninstall the old version of the package, and re-install it from the local code. The options I have considered:



  1. Rebuild the entire image. This takes half an hour, so it is not an option.

  2. Create another Dockerfile FROM the existing image and run only the commands that COPY and install the pip package. This is my current solution, but it is not very efficient.


I am pretty sure other Docker users have come across this issue, so I would like an expert opinion on the most efficient way to handle it.


UPDATE: The Dockerfile


# VERSION 1.8.2
# AUTHOR: Matthieu "Puckel_" Roisil
# DESCRIPTION: Basic Airflow container
# BUILD: docker build --rm -t puckel/docker-airflow .
# SOURCE: https://github.com/puckel/docker-airflow
FROM ubuntu:17.10
MAINTAINER Puckel_
# Never prompts the user for choices on installation/configuration of packages
ENV DEBIAN_FRONTEND noninteractive
ENV TERM linux
# Airflow
ARG AIRFLOW_VERSION=1.8.2
ARG AIRFLOW_HOME=/usr/local/airflow
# Define en_US.
ENV LANGUAGE en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LC_ALL en_US.UTF-8
ENV LC_CTYPE en_US.UTF-8
ENV LC_MESSAGES en_US.UTF-8
ENV LC_ALL en_US.UTF-8
ENV MATPLOTLIBRC /etc
RUN set -ex \
&& buildDeps=' \
python3.6-dev \
libkrb5-dev \
libsasl2-dev \
libssl-dev \
libffi-dev \
build-essential \
libblas-dev \
liblapack-dev \
libpq-dev \
git \
wget \
' \
&& apt-get update -yqq \
&& apt-get dist-upgrade -yqq \
&& apt-get install -yqq --no-install-recommends \
$buildDeps \
python3.6 \
python3.6-tk \
apt-utils \
curl \
netcat \
locales \
ca-certificates \
sudo \
libmysqlclient-dev \
&& ln -s /usr/bin/python3.6 /usr/bin/python \
&& sed -i 's/^# en_US.UTF-8 UTF-8$/en_US.UTF-8 UTF-8/g' /etc/locale.gen \
&& locale-gen \
&& update-locale LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 \
&& useradd -ms /bin/bash -d ${AIRFLOW_HOME} -u 1500 airflow \
&& mkdir ${AIRFLOW_HOME}/logs \
&& wget https://bootstrap.pypa.io/get-pip.py \
&& python get-pip.py \
&& rm -rf get-pip.py \
&& python -m pip install Cython \
&& python -m pip install requests \
&& python -m pip install pytz \
&& python -m pip install pyOpenSSL \
&& python -m pip install ndg-httpsclient \
&& python -m pip install pyasn1 \
&& python -m pip install Flask-OAuthlib \
&& python -m pip install apache-airflow[crypto,celery,postgres,ldap,jdbc,mysql,s3,samba]==$AIRFLOW_VERSION \
&& python -m pip install celery[redis]==4.1.0 \
&& python -m pip install boto3 \
&& python -m pip install pymongo \
&& python -m pip install statsd \
&& apt-get remove --purge -yqq $buildDeps \
&& apt-get clean \
&& rm -rf \
/var/lib/apt/lists/* \
/tmp/* \
/var/tmp/* \
/usr/share/man \
/usr/share/doc \
/usr/share/doc-base \
&& apt-get autoremove -yqq

The important part is at the end:


ARG CACHEBUST=1
COPY config/matplotlibrc /etc/matplotlibrc
COPY script/entrypoint.sh /entrypoint.sh
COPY script/shell.sh /shell.sh
COPY config/airflow.cfg ${AIRFLOW_HOME}/airflow.cfg
RUN chown -R airflow: ${AIRFLOW_HOME}
RUN pip install matplotlib seaborn xlsxwriter pandas Jinja2
#Add custom PIP repo - THIS IS OF INTEREST
COPY config/pip.conf /etc/pip.conf
RUN python -m pip install my-package
COPY my-package2 /usr/local/my-package2
# RUN pip uninstall my-package2
RUN python -m pip install /usr/local/my-package2
EXPOSE 8080 5555 8793
USER airflow
WORKDIR ${AIRFLOW_HOME}
ENTRYPOINT ["/entrypoint.sh"]

As you can see, I copy my-package2 from my local machine to the image and run pip install.



  1. The image gets bigger every time I rebuild it.

  2. Volumes are definitely an option, but I haven't tried them yet. I already make use of script/shell.sh, which contains just $@. I set it as the entry point and can then run any command I wish inside the image without much hassle.

  3. I use docker-compose, so every time I rebuild with a new tag I need to update the docker-compose file as well. Over time it gets annoying to do this for a single-line change in the code.


Answer



You will need to share some of your Dockerfile so we can understand why it
takes so long to install the pip package.
If you wish to optimize it, these references might help:


An alternative is, instead of building an image for testing,
to mount the package from the host into the container with the docker run option
-v /host/directory:/container/directory.


This will let you immediately test your package in the context of the
container, so you will only create the production image when the testing
is complete.
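As a rough sketch of how the volume approach could be prepared, a small development image can put the mount point on Python's import path, so that whatever is mounted there is picked up without a reinstall. The file name Dockerfile.dev, the tag my-airflow:latest, and the package layout are assumptions for illustration; none of them come from the question.

```dockerfile
# Dockerfile.dev (hypothetical file name): a development-only image.
# Assumption: "my-airflow:latest" is the tag of the image built from the
# Dockerfile in the question; substitute the tag you actually use.
FROM my-airflow:latest

# Assumption: the importable package directory sits at the top level of
# my-package2. Directories on PYTHONPATH are searched before site-packages,
# so code mounted at this path shadows the copy installed in the image.
ENV PYTHONPATH=/usr/local/my-package2
```

This image is built once. Each test run then mounts the working copy over that path with the -v option described above, for example -v /path/on/host/my-package2:/usr/local/my-package2. Anything generated at install time, such as console scripts or compiled extensions, would still need a real pip install.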


Much more information is available, for example in
Understanding Volumes in Docker.




From the Dockerfile you posted, it seems that almost all of it is devoted to
installing dependencies.
For testing, you can create an image in which all these dependencies are
already installed,
then repeat only the last step, installing your application, each time
you test.
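One possible shape for that last step is sketched below. It assumes the dependency part of the question's Dockerfile (everything before the custom-package lines) has been built and tagged as my-airflow-deps; that tag and the file name Dockerfile.test are hypothetical.

```dockerfile
# Dockerfile.test (hypothetical file name)
# Assumption: "my-airflow-deps" is an image built from the question's
# Dockerfile up to, but not including, the custom-package steps.
FROM my-airflow-deps:latest

# pip.conf points pip at the internal package index, as in the question.
COPY config/pip.conf /etc/pip.conf

# Only the layers from here on are rebuilt when the package source changes.
COPY my-package2 /usr/local/my-package2
RUN python -m pip install --upgrade --force-reinstall --no-deps /usr/local/my-package2

# Repeat the runtime settings from the end of the question's Dockerfile.
# ARG values are not inherited from the base image, so AIRFLOW_HOME is
# written out literally here.
EXPOSE 8080 5555 8793
USER airflow
WORKDIR /usr/local/airflow
ENTRYPOINT ["/entrypoint.sh"]
```

Because every test build starts from the same dependency image, the result does not grow from one rebuild to the next. Rebuilding under the same tag each time also means the docker-compose file does not need to change.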


For readability, you could eventually write the Dockerfile as a
multi-stage build,
which separates building the dependencies from production and can also
be used to generate a minimal final production image.
The
ONBUILD instruction might be useful here.
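A minimal multi-stage sketch follows, under several assumptions: the hypothetical my-airflow-deps image holds the dependencies, my-package2 is pure Python so it builds without the compilers that the question's Dockerfile purges, and the wheel package is available to pip in that image. Multi-stage builds also require a Docker version that supports more than one FROM per Dockerfile.

```dockerfile
# Stage 1: build a wheel from the package source.
# Assumption: "my-airflow-deps" is a hypothetical tag for an image that
# already contains Python, pip and the application's dependencies.
FROM my-airflow-deps:latest AS builder
COPY my-package2 /src/my-package2
RUN python -m pip wheel --no-deps --wheel-dir /wheels /src/my-package2

# Stage 2: the image that is actually run. The source tree and build
# by-products stay behind in the builder stage; only the wheel is copied.
FROM my-airflow-deps:latest
COPY --from=builder /wheels /wheels
RUN python -m pip install --no-deps /wheels/*.whl

# Runtime settings repeated from the end of the question's Dockerfile.
EXPOSE 8080 5555 8793
USER airflow
WORKDIR /usr/local/airflow
ENTRYPOINT ["/entrypoint.sh"]
```

The builder stage is rebuilt only when the package source changes, so the slow dependency installation stays out of the edit-and-test loop.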


Only you know what you are trying to achieve and what your constraints are.
The above links can serve as a starting point, and there are many more
articles to be found on the subject.

