Checklist

Scott Sievert edited this page Apr 10, 2017 · 1 revision

Getting started

Everybody should work through this list in their first four days with NEXT. For the glossary, at the very least know where to look things up.

  • Learn the following vocab: environment variable, server, docker, container, requests (and look at requests library), 100ms rule, module, json, aws, ec2, instance-type(costs), api, client-server model, pep-8, editor, database, memory store, asynchronous vs synchronous, daemon, queue, git (Maybe take a short quiz?)
  • Learn how to launch NEXT locally, run an existing experiment
  • Learn how to launch NEXT on ec2 and run an experiment: docker-compose up, docker-compose stop, docker-compose rm, python next_ec2 rsync, and how to access the aws console to start and stop machines.
  • Test from the browser
  • Know how to find the dashboards and what the plots represent.
  • How to access logs, participant data, etc. from the Dashboard and api (through the browser)
  • What’s a yaml file, and where is the interface located? Run (and read) a test script on ec2.
  • (Locally) How to use docker for refreshing containers

Timeline

Estimated time: 3 weeks to 1 month

  • Vocab: app, myApp, memory, alg, yaml, algyaml, dataflow(initExp, getQuery, processAnswer, getModel), widget, dashboard(!), (optional exercise - read the NIPS paper), utils.debug_print, TargetManager
  • Identify the app you want to modify, copy and rename it. Get the renamed thing to work.
  • Understand the basic dataflow/api.
    • Read through an App, and an algorithm.
    • How the butler is used.
    • Where data is saved in the butler
    • When to use butler.memory vs butler.algorithms
    • Why is the butler used in a specific place? Why have a butler?
    • Dataflow between app and alg
  • Make a plan = make your yaml file, decide on inputs and outputs
    • Questions to ask: what do you need to make a query?
    • What do you need to receive an answer?
    • Where do you see slowdowns?
    • What do you want to see on the dashboard?
    • Is this a Big Data or Small Data problem?
    • How can you make getQuery as fast as possible and move the compute to processAnswer?
    • Can you daemonize? Is there a way to use butler.job to run a background batch process (e.g., Triplets)?
    • Emphasize, base.yaml vs yaml - again COPY AND PASTE LIKE A MOFO
    • Data management?
    • AGAIN ASK QUESTIONS.
    • Don’t confuse not knowing how to do something with
  • Get your plan reviewed by Rudi, Scott, Lalit or Daniel. This should happen by the end of week 1; notify us ahead of time.
  • Implement the most basic use case (always random).
  • Test early and test often. The tests should be the first thing you build after the random algorithm.
  • Build dashboards in parallel with development; do not wait until the end to do this (yes, this may be a bit of a cognitive overload, don’t worry too much)
  • Remember you are working with indices!!!!
  • If you have succeeded so far, start implementing your actual use case. Don’t hesitate to bring up “structural issues”.
    • Do you need a new TargetManager?
    • Use butler.memory for locking (asynchronous issues)
    • Use butler.memory for storing extremely large objects (BIG DATA vs small DATA)
    • Is your data too big to upload to S3?
  • Build your widgets. Test them in the browser to ensure your app isn’t too slow.
  • Schedule a code review and begin extensive stress (ec2 to ec2) testing. Plan to experiment at least one week after code review.
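
The dataflow named above (initExp, getQuery, processAnswer, getModel) can be sketched with an always-random algorithm. This is only a rough illustration, not NEXT's actual code: `FakeStore` and `FakeButler` are hypothetical in-memory stand-ins for the butler, `RandomAlg` is a made-up name, and the method signatures are simplified assumptions. Check a real app and algorithm in the repo for the true arguments.

```python
import random


class FakeStore:
    """Hypothetical in-memory stand-in for butler.algorithms (not NEXT's real class)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]

    def increment(self, key, value=1):
        self._data[key] = self._data.get(key, 0) + value
        return self._data[key]


class FakeButler:
    """Hypothetical stand-in that exposes only an `algorithms` store."""

    def __init__(self):
        self.algorithms = FakeStore()


class RandomAlg:
    """Always-random baseline: the simplest thing that exercises the dataflow."""

    def initExp(self, butler, n):
        # Store only small state: the number of targets and an answer counter.
        butler.algorithms.set(key="n", value=n)
        butler.algorithms.set(key="num_answers", value=0)
        return True

    def getQuery(self, butler):
        # Keep this fast, and return an index into the targets, not the target.
        n = butler.algorithms.get(key="n")
        return random.randrange(n)

    def processAnswer(self, butler, target_index, answer):
        # Heavier bookkeeping belongs here rather than in getQuery.
        butler.algorithms.increment(key="num_answers")
        return True

    def getModel(self, butler):
        # Expose stats here instead of writing custom logs.
        return {"num_answers": butler.algorithms.get(key="num_answers")}


butler = FakeButler()
alg = RandomAlg()
alg.initExp(butler, n=10)
index = alg.getQuery(butler)
alg.processAnswer(butler, target_index=index, answer=1)
model = alg.getModel(butler)
```

A stub like this also makes it easy to write the first tests before touching ec2: call the four functions in order and assert on what getModel returns.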

Do and Don't list

  • Don’t write to disk ever.
  • Don’t use custom stats and logs, use getModel
  • Do use Pep-8 (as much as reasonable)
  • Do use a real editor
  • Do ask lots of questions
  • Do write clean code
    • easy logic: ideally, code calls well-named functions that are shorter than 5 lines and has complete test coverage
    • easy to read (PEP 8: follows Python naming conventions, lines under 90 chars, etc.)
  • Don’t have your code littered with comments and old code, GIT loves you
  • Do commit a lot
  • Do be careful about imports, only use them where necessary. You can import in functions.
  • Don’t reinvent the wheel. You never need to write a regexp parser unless you are Daniel Ross.
  • Don’t blame the verifier. It has the worst error messages but I guarantee you it’s your fault.
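
As a small illustration of the import advice above, here is one way a function-local import might look. The function name `mean_score` is hypothetical, and the standard-library `statistics` module stands in for whatever heavier dependency only one function needs.

```python
def mean_score(scores):
    """Return the average of a list of numeric scores."""
    # Imported here because only this function needs it.
    import statistics
    return statistics.mean(scores)


result = mean_score([1, 2, 3])
```

The module is loaded the first time the function runs, so code paths that never call it do not pay for the import.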

Facing an issue?

It might be one of these

  • Large database objects (i.e., large features)? Try butler.memory.
  • Slow query fetch? Try the line profiler. Use the libraries that we provide to the fullest extent.
  • Using butler.memory.lock to ensure atomic operations? Try using atomic operations in the database instead, e.g., butler.algorithms.increment(key=something).
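
To see why an atomic increment can replace an explicit lock, here is a minimal sketch. `FakeAlgorithmsStore` is a hypothetical stand-in for butler.algorithms, with an internal `threading.Lock` that models the database's own atomicity; the key name `num_answers` and the worker counts are also made up. Only the `increment(key=...)` call shape comes from the list above.

```python
import threading


class FakeAlgorithmsStore:
    """Hypothetical stand-in for butler.algorithms; not NEXT's real implementation."""

    def __init__(self):
        self._data = {}
        self._guard = threading.Lock()  # models the database's own atomicity

    def get(self, key):
        return self._data.get(key, 0)

    def increment(self, key, value=1):
        # The whole read-modify-write happens as one step inside the store.
        with self._guard:
            self._data[key] = self._data.get(key, 0) + value
            return self._data[key]


store = FakeAlgorithmsStore()


def process_answers(count):
    # Each simulated worker reports `count` answers.
    for _ in range(count):
        store.increment(key="num_answers")


workers = [threading.Thread(target=process_answers, args=(1000,)) for _ in range(8)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

total = store.get(key="num_answers")
```

The calling code never takes a lock itself: it asks the store to do the update in one step, so there is no window between a separate get and set where another worker can slip in.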