Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timeout to opm registry serve #370

Merged
merged 3 commits into from
Jul 31, 2020

Conversation

Jobava
Copy link

@Jobava Jobava commented Jun 29, 2020

This relates to our project to serve operator metadata via pyxis:
CLOUDWF-1739

Description of the change:
Add new flag --timeout-seconds/-s to opm registry serve
which allows specifying that the server should be shut down
after a number of seconds.

The server is gracefully shut down.

Motivation for the change:

We are extracting information from the iib container and normally need this container up for a very short time
(1 min or less). Having a daemon complicates our setup (i.e. we'd need to handle server start/stop manually).

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Docs updated or added to /docs
  • Commit messages sensible and descriptive

Some manual testing done:

[opm (master *%=)]$ ./bin/opm registry serve --database scratchpad/database/index.db -s 1
WARN[0000] unable to set termination log path            error="open /dev/termination-log: permission denied"
INFO[0000] Keeping server open for 1 seconds             database=scratchpad/database/index.db port=50051
INFO[0000] serving registry                              database=scratchpad/database/index.db port=50051
INFO[0001] Timeout expired. Gracefully stopping.         database=scratchpad/database/index.db port=50051

[opm (master *%=)]$ ./bin/opm registry serve --database scratchpad/database/index.db -s 2
WARN[0000] unable to set termination log path            error="open /dev/termination-log: permission denied"
INFO[0000] Keeping server open for 2 seconds             database=scratchpad/database/index.db port=50051
INFO[0000] serving registry                              database=scratchpad/database/index.db port=50051
INFO[0002] Timeout expired. Gracefully stopping.         database=scratchpad/database/index.db port=50051

[opm (master *%=)]$ ./bin/opm registry serve --database scratchpad/database/index.db
WARN[0000] unable to set termination log path            error="open /dev/termination-log: permission denied"
INFO[0000] Keeping server open for infinite seconds      database=scratchpad/database/index.db port=50051
INFO[0000] serving registry                              database=scratchpad/database/index.db port=50051
^C

[opm (master *%=)]$ ./bin/opm registry serve --database scratchpad/database/index.db --timeout-seconds infinite
WARN[0000] unable to set termination log path            error="open /dev/termination-log: permission denied"
INFO[0000] Keeping server open for infinite seconds      database=scratchpad/database/index.db port=50051
INFO[0000] serving registry                              database=scratchpad/database/index.db port=50051
^C

[opm (master *%=)]$ ./bin/opm registry serve --database scratchpad/database/index.db --timeout-seconds 1
WARN[0000] unable to set termination log path            error="open /dev/termination-log: permission denied"
INFO[0000] Keeping server open for 1 seconds             database=scratchpad/database/index.db port=50051
INFO[0000] serving registry                              database=scratchpad/database/index.db port=50051
INFO[0001] Timeout expired. Gracefully stopping.         database=scratchpad/database/index.db port=50051

[opm (master *%=)]$ ./bin/opm registry serve --database scratchpad/database/index.db
WARN[0000] unable to set termination log path            error="open /dev/termination-log: permission denied"
INFO[0000] Keeping server open for infinite seconds      database=scratchpad/database/index.db port=50051
INFO[0000] serving registry                              database=scratchpad/database/index.db port=50051

Sorry, something went wrong.

@openshift-ci-robot openshift-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 29, 2020
@openshift-ci-robot
Copy link

Hi @Jobava. Thanks for your PR.

I'm waiting for a operator-framework member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Jobava Jobava changed the title Add timeout to opm serve Add timeout to opm registry serve Jun 29, 2020
@Jobava
Copy link
Author

Jobava commented Jun 29, 2020

/assign @ecordell

Copy link
Member

@dinhxuanvu dinhxuanvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Just a small change request.

@dinhxuanvu
Copy link
Member

@kevinrizza @ecordell PTAL.

@dinhxuanvu
Copy link
Member

/ok-to-test

@openshift-ci-robot openshift-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 29, 2020
@ecordell
Copy link
Member

Is the timeout command not available to you on the target system? It's part of coreutils.

@Jobava
Copy link
Author

Jobava commented Jun 30, 2020

Is the timeout command not available to you on the target system? It's part of coreutils.

Not from the context of an index container, which has just scratch, the go binaries and a sqlite db. Our use case requires us pulling the index images and we have no fancy shell features.

@Jobava
Copy link
Author

Jobava commented Jun 30, 2020

Per @gallettilance's request, I also added a test for CLI parameters.

First I considered doing a unit test for the cobra invocations, but that got a bit complicated.

Instead I'm just doing an invocation of the opm binary with the parameters from the context of a container. I found a bundles.db file in the source and using that as a basis for opm registry serve.

@Jobava Jobava requested a review from dinhxuanvu June 30, 2020 12:01
@Jobava
Copy link
Author

Jobava commented Jun 30, 2020

/retest

@kevinrizza
Copy link
Member

@Jobava

I'm having a hard time understanding the use case for this switch.

  1. Downstream the index containers are built on UBI and would have access to coreutils IIUC.
  2. That is what I imagine IIB should be using for the binary images, therefore the images that IIB is creating.
  3. What data is being pulled out of the image? I wonder if there is a way to expose that on the image instead of needing to run it for 2s?

@kevinrizza
Copy link
Member

That being said, there's nothing wrong with this code, I just want to make sure we're solving the problem the right way.

@Jobava
Copy link
Author

Jobava commented Jun 30, 2020

@kensipe

@Jobava

I'm having a hard time understanding the use case for this switch.

1. Downstream the index containers are built on UBI and would have access to coreutils IIUC.

2. That is what I imagine IIB should be using for the binary images, therefore the images that IIB is creating.

3. What data is being pulled out of the image? I wonder if there is a way to expose that on the image instead of needing to run it for 2s?

Our project requires us to collect this data and serve it from Pyxis. The data will be ultimately used by Catalog or others for display purposes.

As to why we want to serve it from pyxis and not just let Catalog collect it... that's because index data uses container registries for storage and pulling a whole index image is pretty expensive [1]. So we have to load every single index container, extract the operator metadata and store it. Having a timeout here just makes our job a bit easier (i.e. we can load data from kubernetes Jobs).

We are pulling all the data and storing it (metadata, CSV, CRD etc), but are exposing just some of it to Catalog for now (like capabilities, ALM examples etc).

[1] - ~ 500 MiB of data (the docker image size) for 50KiB of actual index metadata (in uncompressed yaml form)

@kevinrizza
Copy link
Member

If opm had a command that could extract the contents of the index, wouldn't running the image at all be duplicate effort? opm already has opm index extract which fetches the manifests from all the bundles and puts them onto local disk. What if we just extended that to provide a serialized json/yaml blob that could be read by your tool directly, without the need for a kube cluster at all?

amarza-rh added 2 commits July 1, 2020 15:30

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Add new flag --timeout-seconds to `opm registry serve`
which allows specifying that the server should be shut down
after a number of seconds.

The server is gracefully shut down.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@Jobava
Copy link
Author

Jobava commented Jul 1, 2020

After talking to @kevinrizza , I removed the short version of the flag. We agreed that eventually opm will have a function to export data as json and after that our tooling will switch to using that. After such time we can also remove the --timeout-seconds flag.

/retest

@Jobava Jobava requested a review from Bowenislandsong July 1, 2020 13:38
@Jobava
Copy link
Author

Jobava commented Jul 10, 2020

Is this good for merging?

@kevinrizza
Copy link
Member

Aside from that one comment, I think this is fine to merge.

@kevinrizza
Copy link
Member

/approve

@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Jobava, kevinrizza

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 14, 2020
@Jobava
Copy link
Author

Jobava commented Jul 28, 2020

Hi again. Can we get an idea of when this can be merged?

@ecordell
Copy link
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 30, 2020
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

6 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@Jobava
Copy link
Author

Jobava commented Jul 31, 2020

@kevinrizza I see the test has failed, but can't tell why it is failing

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

2 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit e590015 into operator-framework:master Jul 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants