Create running user proxies / containers in advance

Hello,

To avoid losing time and have my app start almost immediately (loading R libraries can be quite time-consuming, up to 1 minute), I would like to find a way to preload and prerun (let’s say every night) a set of running containers for each user / app combination.

I read that this is feasible using the API, but unfortunately, as we rely on Active Directory and I (of course) do not know the users' passwords, I cannot create proxies linked to a user.

Would there be another way to (re-)create the containers (e.g. with my account) and then assign them to a given user? Which parameters does ShinyProxy look at when it checks whether it must instantiate an image or re-use a container?

Thank you for your help,

Sylvain

I think I found a solution to my issue. It is far from being elegant but at least it increases the loading speed of the app.

The main idea is to have a Docker image that contains a “frozen” instance of R (with the help of CRIU). All the time-consuming preliminary tasks (e.g. library loading) have to be performed before the image is created. Once those tasks are done, CRIU dumps the R process (criu dump) and saves the running R instance as a set of files. The Docker image containing these image files is then created and used by ShinyProxy.

When a container is created from this image, instead of launching a new instance of R (R -e "shiny::runApp('path/to/myApp')") and reloading the libraries, it uses CRIU to restore the dumped process with the criu restore command.
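The dump/restore cycle described above can be sketched as two small shell helpers. This is only a minimal sketch, assuming CRIU is installed, the commands run as root, and the target process was started detached from the terminal (the scripts later in this post use setsid and the criu-ns wrapper). The names CRIU_BIN, dump_process and restore_process are made up for illustration and are not part of CRIU or ShinyProxy.

```shell
#!/bin/bash
# Illustrative helpers; CRIU_BIN, dump_process and restore_process are
# hypothetical names. Set CRIU_BIN=criu-ns to use the wrapper from the post.
CRIU_BIN="${CRIU_BIN:-criu}"

# Checkpoint the process tree rooted at PID into DIR (done once, at image build time).
dump_process() {
    local pid="$1" dir="$2"
    "$CRIU_BIN" dump -t "$pid" -D "$dir" -o "$dir/dump.log" -vvvv
}

# Recreate the process from the image files in DIR (done at every container start).
restore_process() {
    local dir="$1"
    "$CRIU_BIN" restore -D "$dir" -o "$dir/restore.log" -vvvv
}
```

For example, dump_process "$(pidof R)" /criu_dumps would be called once the libraries are loaded, and restore_process /criu_dumps when the container starts.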

This is the main idea, but of course there were remaining tweaks that took me quite a few hours to sort out.

As it could be of interest to someone (who is not afraid of ugly-but-working code), I am attaching the example code I have been struggling with for a week.

  1. Code of a simple app (app.R)
    This simple app just displays histograms. I also included the Bioconductor library liftOver, which takes more than 10 seconds to load (even though it is not used by the app itself).
library(shiny)
library(liftOver)
# Define UI for app that draws a histogram ----
ui <- fluidPage(
  # App title ----
  titlePanel("Hello Shiny!"),
  # Sidebar layout with input and output definitions ----
  sidebarLayout(
    # Sidebar panel for inputs ----
    sidebarPanel(
      # Input: Slider for the number of bins ----
      sliderInput(inputId = "bins",
                  label = "Number of bins:",
                  min = 1,
                  max = 50,
                  value = 30)
    ),

    # Main panel for displaying outputs ----
    mainPanel(
      # Output: Histogram ----
      plotOutput(outputId = "distPlot")
    )
  )
)

# Define server logic required to draw a histogram ----
server <- function(input, output) {
  # Histogram of the Old Faithful Geyser Data ----
  # with requested number of bins
  # This expression that generates a histogram is wrapped in a call
  # to renderPlot to indicate that:
  #
  # 1. It is "reactive" and therefore should be automatically
  #    re-executed when inputs (input$bins) change
  # 2. Its output type is a plot
  output$distPlot <- renderPlot({
    x    <- faithful$waiting
    bins <- seq(min(x), max(x), length.out = input$bins + 1)
    hist(x, breaks = bins, col = "#75AADB", border = "white",
         xlab = "Waiting time to next eruption (in mins)",
         main = "Histogram of waiting times")
    })
}

# Create Shiny app ----
shinyApp(ui = ui, server = server)
  2. Code of the classical “shinyproxy” docker file (mycriu.dockerfile).

It must contain

  • R and a few (long time loading) libraries
  • criu
  • some libs (used by R or its libraries)
  • the running app

It is built in the classical way, provided you are in the directory containing the file (app.R). In our example, we call it criur:latest.

docker build -f mycriu.dockerfile -t criur .

FROM ubuntu:22.04

RUN apt update
ENV DEBIAN_FRONTEND=noninteractive
RUN ln -fs /usr/share/zoneinfo/Europe/Brussels /etc/localtime
RUN apt-get -y install tzdata
RUN dpkg-reconfigure -f noninteractive tzdata
RUN apt install -y r-base-core screen criu
RUN apt-get update && apt-get install -y \
    sudo \
    pandoc \
    pandoc-citeproc \
    libcurl4-gnutls-dev \
    libcairo2-dev \
    libxt-dev \
    libssl-dev \
    libssh2-1-dev \
    libxml2-dev
RUN Rscript -e 'install.packages(c("BiocManager"),repos="https://lib.ugent.be/CRAN/")'
RUN Rscript -e 'BiocManager::install("liftOver")'
RUN Rscript -e 'install.packages(c("shinybusy", "data.table", "BiocManager", "shiny", "Rmpfr"),repos="https://lib.ugent.be/CRAN/")'


RUN mkdir -p /root/myapp
RUN echo "local({options(shiny.port = 3838, shiny.host = '0.0.0.0')})" > /etc/R/Rprofile.site
COPY app.R /root/myapp/
  3. Code of the docker file that will be used to build the running instance of R (testspeed.dockerfile)

This docker file mainly creates three files:

  • /rstart.R: run at the start of R. It loads the R libraries and then waits, unless it is running in a container launched by ShinyProxy, in which case it starts the Shiny app.
    Of note, a few small R instructions capture the R log and the ShinyProxy environment variables that may be needed. I could not run R as an interactive tool from a terminal, so I redirected the log to a file. Moreover, the ShinyProxy environment variables have to be re-read: they do not exist when the R process is created (before the dump), so the restored process is not aware of them and they must be reloaded.
  • /launchShinyApp.sh: starts R using rstart.R, waits until the libraries are loaded, and dumps the session into the directory /criu_dumps.
  • /restore.sh: run when ShinyProxy starts a new container from the image. It starts by copying the ShinyProxy environment variables to files so that R can then read them.
FROM criur:latest

# CREATE DUMPDIR
RUN mkdir -p /criu_dumps

# CREATE R INIT FILE
RUN cat /root/myapp/*.R | grep '^library' >> rstart.R # naive way of obtaining all library loading instruction in my app

RUN echo "buildContainer  <- basename(readLines('/proc/1/cpuset')); execContainer  <- buildContainer ; while (execContainer == buildContainer) { print ('waiting....'); execContainer  <- basename(readLines('/proc/1/cpuset'));  print (execContainer); print('Library loaded'); Sys.sleep(0.5)}" >> rstart.R
RUN echo "system(paste('mkdir -p', tempdir()))" >> rstart.R
RUN echo "system('tail --follow=name /criu_dumps/rlogs.log > /criu_dumps/rlogs_container.log  &');"  >> rstart.R 
RUN echo "shiny_user  <- basename(readLines('/shiny_user.txt'));"  >> rstart.R
RUN echo "shiny_groups  <- basename(readLines('/shiny_usergroups.txt'));"  >> rstart.R 
RUN echo "print(paste('User', shiny_user))" >> rstart.R
RUN echo "Sys.setenv('SHINYPROXY_USERNAME' = shiny_user, 'SHINYPROXY_USERGROUPS' = shiny_groups)" >> rstart.R
RUN echo "print(Sys.getenv('SHINYPROXY_USERNAME'))"  >> rstart.R
RUN echo "print(Sys.getenv('SHINYPROXY_USERGROUPS'))"  >> rstart.R
RUN echo "runApp('/root/myapp')"  >> rstart.R

# CREATE RESTORE FILE
RUN echo "#!/bin/bash"  >> ./restore.sh
RUN echo "tail -f --retry  /criu_dumps/rlogs_container.log  & "  >> ./restore.sh
RUN echo "echo \$SHINYPROXY_USERNAME > shiny_user.txt" >> ./restore.sh
RUN echo "echo \$SHINYPROXY_USERGROUPS > shiny_usergroups.txt" >> ./restore.sh
RUN echo "criu-ns restore  -vvvv  -D /criu_dumps -o /criu_dumps/restore.log " >> ./restore.sh
RUN chmod a+x ./restore.sh

# CREATE THE INITIALIZATION SCRIPT (LAUNCH R + DUMP R)
RUN echo "#!/bin/bash"  >> launchShinyApp.sh
RUN echo "setsid R -f rstart.R &> /criu_dumps/rlogs.log & " >> launchShinyApp.sh
RUN echo "LIBRARYLOADED='0'" >> launchShinyApp.sh
RUN echo "while [[ \$LIBRARYLOADED == '0' ]]" >> launchShinyApp.sh
RUN echo "do" >> launchShinyApp.sh
RUN echo "  LIBRARYLOADED=\$(grep -c 'Library loaded' /criu_dumps/rlogs.log)" >> launchShinyApp.sh
RUN echo "  echo \$LIBRARYLOADED " >> launchShinyApp.sh
RUN echo "  echo 'Waiting library loading...' " >> launchShinyApp.sh
RUN echo "  sleep 5" >> launchShinyApp.sh
RUN echo "done" >> launchShinyApp.sh

RUN echo "rpid=\$(pidof R) " >> launchShinyApp.sh
RUN echo "criu-ns dump -t \$rpid -vvv -o /criu_dumps/dump.log -D /criu_dumps/ && echo OK" >> launchShinyApp.sh
RUN echo "touch /criu_dumps/dump.done" >> launchShinyApp.sh
RUN echo "read -p 'Waiting for input'" >> launchShinyApp.sh
RUN chmod a+x launchShinyApp.sh
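The chains of RUN echo lines above are hard to read and to escape correctly. As a sketch of an alternative, the same restore.sh could be written from a single quoted heredoc and then copied into the image. The helper name write_restore_script is hypothetical, and the generated content simply mirrors the echo lines of the docker file above (with the variables quoted).

```shell
#!/bin/bash
# write_restore_script TARGET: write the restore script to TARGET and make it
# executable. The function name is illustrative only.
write_restore_script() {
    local target="$1"
    # The quoted 'EOF' keeps $SHINYPROXY_* unexpanded until restore.sh itself runs.
    cat > "$target" <<'EOF'
#!/bin/bash
# Stream the R log to the container output
tail -f --retry /criu_dumps/rlogs_container.log &
# Hand the ShinyProxy variables to the restored R process via files
echo "$SHINYPROXY_USERNAME" > shiny_user.txt
echo "$SHINYPROXY_USERGROUPS" > shiny_usergroups.txt
criu-ns restore -vvvv -D /criu_dumps -o /criu_dumps/restore.log
EOF
    chmod a+x "$target"
}
```

Calling write_restore_script ./restore.sh next to the docker file and adding a COPY restore.sh / instruction would then replace the corresponding echo lines.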
  4. Code of a bash script that combines all these scripts together (criu_instructions.sh)
  • From the main image with R and the Shiny app, this script builds the image that contains the scripts described above (rstart.R, launchShinyApp.sh and restore.sh).
  • It runs a detached container that starts R with its libraries and dumps it.
  • Once the R process within the container has been dumped, the container is committed, i.e., an image is created from it (testspeedcriu:latest) that will be used by ShinyProxy (and referenced in the application.yml file).
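#!/bin/bash
# Build the image that contains rstart.R, launchShinyApp.sh and restore.sh
docker build --no-cache --progress=plain -f testspeed.dockerfile -t testspeed:latest .
# Start a detached, privileged container that loads R and dumps it with CRIU
CID=$(docker run -d --rm --privileged -it testspeed:latest /bin/bash launchShinyApp.sh)
# Poll until launchShinyApp.sh has created the marker file
DUMPDONE=""
while [[ "$DUMPDONE" != "/criu_dumps/dump.done" ]]
do
    DUMPDONE=$(docker exec "$CID" ls /criu_dumps/dump.done 2>/dev/null)
    echo "$DUMPDONE"
    echo "Waiting for dump..."
    sleep 5
done
docker exec "$CID" ls /criu_dumps/dump.done
# Commit the container (including the dump files) as the image used by ShinyProxy
docker commit "$CID" testspeedcriu:latest
docker container stop "$CID"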
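Both polling loops in this post (the one waiting for 'Library loaded' in the log and the one waiting for dump.done) run forever if R or the dump fails. A possible safeguard, shown here only as a sketch, is a generic helper that retries a command until it succeeds or a timeout expires. The name wait_until and its arguments are made up for this example.

```shell
#!/bin/bash
# wait_until TIMEOUT INTERVAL CMD...: run CMD every INTERVAL seconds until it
# succeeds; return 1 after roughly TIMEOUT seconds without success.
# TIMEOUT and INTERVAL must be whole numbers of seconds.
wait_until() {
    local timeout="$1" interval="$2"
    shift 2
    local waited=0
    until "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

The loop above could then become wait_until 600 5 docker exec "$CID" test -f /criu_dumps/dump.done || exit 1, so that a failed dump stops the script instead of hanging. The 600 second limit is an arbitrary choice.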
  5. Remark

ShinyProxy must run the image testspeedcriu:latest in privileged mode:

  - id: myapp_with_criu
    display-name: myapp_with_criu
    description: test criu with shinyproxy
    container-cmd: ["/bin/bash", "/restore.sh"] 
    container-cpu-limit: 1
    container-image: testspeedcriu:latest
    container-privileged: true
    access-groups: [bioinfo]      

Sylvain,

this is a very interesting approach. How many seconds did you save with this example in your environment? What was the startup time with the classic approach, and what is it with your approach? Do you have any other examples from apps with more libraries?

I have never measured the time needed for a container to start and compared it with the time needed to load libraries and data, so I do not have a clear idea of how much time library loading takes (our apps usually load tons of libraries).

Regards
Dusan

Hi Dusan,

From what I know, starting the container is quite fast (about one second) and what takes time is the loading of the libraries (at least for my apps).

The example above makes use of the Bioconductor liftOver library, which loads a lot of data into memory. In a normal (non-Shiny) R session, the command library(liftOver) takes about 13 seconds.

From what I measured, the approach with a preloaded R takes about 4 seconds, while the approach where everything is loaded when the container starts takes more than 15 seconds (at least on my old test server).
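For anyone who wants to take similar measurements, a small timing helper can be sketched as follows. It assumes GNU date (for the %N nanosecond format), and the name elapsed_ms is only illustrative.

```shell
#!/bin/bash
# elapsed_ms CMD...: run CMD (its output is discarded) and print the
# wall-clock time it took, in milliseconds. Requires GNU date for %N.
elapsed_ms() {
    local start end
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}
```

For instance, elapsed_ms Rscript -e 'library(liftOver)' gives the library loading time in a fresh R session, which can be compared across machines or images.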

The figure below shows the performance of the approach without (left) and with (right) the CRIU preloading.

Finally, keep in mind that this is only a proof of concept, I am pretty sure there are many ways to improve the use of criu (which I only discovered 2 weeks ago) or shinyproxy.

I am currently building a CRIU R session for one of my “real” apps.

Sylvain


That is an interesting approach. I am also dealing with extended load/launch times; however, I have found that the library load time isn’t actually the bottleneck.

I documented the evidence in this thread: Shiny Performance Bottle Necks

I would be curious

You seem to work with docker swarm which I am not familiar with.

At least in my case, I know that the bottleneck was the library loading time (especially the Bioconductor library liftOver).

This is interesting indeed, and I’m curious whether you have now ironed this out and are happy with the results.

As an aside, I’m curious why you can’t ask each user to use a new password and tell you it? Is the Shiny app accessing data you should not have access to?