Docker swarm: container unresponsive (at worker node)

This is similar to an earlier question, ‘Docker Toolbox - Container unresponsive’, but I get the error only when the app runs on a worker node; there is no error when the app runs on the manager node. It also happens only when the swarm is in the cloud. On my local dev machine, where I build a swarm out of virtual machines, both the manager and the workers serve the app correctly. I’ve checked ports 2376, 2377, 7946 and 4789, and they seem to be open.

This is the error from ‘docker service logs shinyproxy_shinyproxy’:

shinyproxy_shinyproxy.1.wnzoxs50rrk8@pg-manager1    | 2020-08-31 12:18:26.968  INFO 1 --- [  XNIO-2 task-3] e.o.containerproxy.service.UserService   : User logged in [user: Test User]
shinyproxy_shinyproxy.1.wnzoxs50rrk8@pg-manager1    | 2020-08-31 12:18:40.907  WARN 1 --- [  XNIO-2 task-6] e.o.shinyproxy.ShinyProxyTestStrategy    : Container unresponsive, trying again (2/150): http://5fe74a7f77e7:3838
shinyproxy_shinyproxy.1.wnzoxs50rrk8@pg-manager1    | 2020-08-31 12:18:42.908  WARN 1 --- [  XNIO-2 task-6] e.o.shinyproxy.ShinyProxyTestStrategy    : Container unresponsive, trying again (3/150): http://5fe74a7f77e7:3838
shinyproxy_shinyproxy.1.wnzoxs50rrk8@pg-manager1    | 2020-08-31 12:18:44.909  WARN 1 --- [  XNIO-2 task-6] e.o.shinyproxy.ShinyProxyTestStrategy    : Container unresponsive, trying again (4/150): http://5fe74a7f77e7:3838
shinyproxy_shinyproxy.1.wnzoxs50rrk8@pg-manager1    | 2020-08-31 12:18:46.911  WARN 1 --- [  XNIO-2 task-6] e.o.shinyproxy.ShinyProxyTestStrategy    : Container unresponsive, trying again (5/150): http://5fe74a7f77e7:3838

My docker swarm stack deploy file is:

version: '3.3'

services:
  shinyproxy:
    image: presstofan/shinyproxy-example
    ports:
      - 8080:8080
    networks:
      - traefik-public
      - sp-net
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.role==manager
      labels:
          - traefik.enable=true
          - traefik.docker.network=traefik-public
          - traefik.constraint-label=traefik-public
          - traefik.http.routers.shinyproxy.rule=Host(`${APP_DOMAIN?Variable not set}`)
          - traefik.http.routers.shinyproxy.entrypoints=http
          - traefik.http.middlewares.shinyproxy.redirectscheme.scheme=https
          - traefik.http.middlewares.shinyproxy.redirectscheme.permanent=true
          - traefik.http.routers.shinyproxy-secured.rule=Host(`${APP_DOMAIN?Variable not set}`)
          - traefik.http.routers.shinyproxy-secured.entrypoints=https
          - traefik.http.routers.shinyproxy-secured.tls.certresolver=le
          - traefik.http.services.shinyproxy-secured.loadbalancer.server.port=8080
    volumes:
      - ./application/application.yml:/opt/shinyproxy/application.yml
      - /var/run/docker.sock:/var/run/docker.sock:ro
  
networks:
  traefik-public:
    external: true
  sp-net:
    external: true

My application YAML for shinyproxy is:

proxy:
  title: Awesome OmicsPlayground Portal
  port: 8080

  authentication: keycloak
  keycloak:
    realm: myrealm
    auth-server-url: http://auth.example.com/auth
    resource: shinyproxy
    credentials-secret: XXXXXXXX

  container-backend: docker-swarm
  docker:
    internal-networking: true
  
  specs:
  - id: euler
    display-name: Euler's number
    container-cmd: ["R", "-e", "shiny::runApp('/root/euler')"]
    container-image: presstofan/shiny-euler-app
    container-network: sp-net

server:
  useForwardHeaders: true # this is very important to make the AWS Cognito auth work

Solved! It turned out I was using the wrong advertise address. I was passing the public IP (from docker-machine ip manager1) as the advertise address, like this:

docker swarm init --advertise-addr 50.120.64.xxx

The right thing to do was to run swarm init without --advertise-addr and rely on the default autodiscovery, which in fact picked a different IP address, seemingly one on the overlay(?) network. Like this:

docker swarm init

This only works if the machine has a single IP address (one ethernet adapter). I had always thought advertise-addr should be the machine's public IP, but that is apparently not true. Solved!
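If the machine has more than one address, autodiscovery is not available and an address has to be passed explicitly. Below is a minimal sketch of how one might pick the private address in that case. The helper names is_private_ipv4 and pick_advertise_addr are hypothetical (not part of Docker), and the sketch assumes that all swarm nodes can reach each other over an RFC 1918 private network.

```shell
# Hypothetical helper: succeeds if $1 is an IPv4 address in an
# RFC 1918 private range (10/8, 172.16/12, 192.168/16).
is_private_ipv4() {
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[0-1].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: prints the first private address among its
# arguments; returns non-zero if none of them is private.
pick_advertise_addr() {
  for addr in "$@"; do
    if is_private_ipv4 "$addr"; then
      printf '%s\n' "$addr"
      return 0
    fi
  done
  return 1
}
```

On a Linux host this could be used as docker swarm init --advertise-addr "$(pick_advertise_addr $(hostname -I))", assuming hostname -I is available there. Afterwards, docker info --format '{{.Swarm.NodeAddr}}' should show which address the node ended up advertising.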

I have the same problem. When I use docker swarm init --advertise-addr XXXX (public IP) in the cloud, apps cannot run on the worker node, but plain docker swarm init works.
However, docker swarm init uses the private IP, which means other ECS instances cannot join the swarm through that private IP. We need the public IP to initialize the swarm. How can we do that?