Checklist

Getting started

Everybody should work through this section in their first four days with NEXT. For the glossary terms, at the very least know where to look them up.

Learn the following vocab: environment variable, server, docker, container, requests (and look at requests library), 100ms rule, module, json, aws, ec2, instance-type(costs), api, client-server model, pep-8, editor, database, memory store, asynchronous vs synchronous, daemon, queue, git (Maybe take a short quiz?)
Learn how to launch NEXT locally, run an existing experiment
Learn how to launch NEXT on ec2, run an experiment, docker-compose up, docker-compose stop, docker-compose rm, python next_ec2 rsync, how to access aws console to start and stop machines.
Test from the browser
Know how to find the dashboards and what the plots represent.
How to access logs, participant data, etc. from the Dashboard and api (through the browser)
What’s a yaml file - where is the interface located. Run (and read) a test script on ec2.
(Locally) How to use docker for refreshing containers

Timeline

Estimated time: 3 weeks to 1 month

Vocab: app, myApp, memory, alg, yaml, algyaml, dataflow(initExp, getQuery, processAnswer, getModel), widget, dashboard(!), (optional exercise - read the NIPS paper), utils.debug_print, TargetManager
Identify the app you want to modify, copy and rename it. Get the renamed thing to work.
Understand the basic dataflow/api.
- Read through an App, and an algorithm.
- How the butler is used.
- Where data is saved in the butler
- When to use butler.memory vs butler.algorithms
- Why is the butler used in a specific place? Why have a butler?
- Dataflow between app and alg
Make a plan = make your yaml file, decide on inputs and outputs
- Questions to ask: what do you need to make a query?
- What do you need to receive an answer?
- Where do you see slowdowns?
- What do you want to see on the dashboard?
- Is this a Big Data or Small Data problem?
- How can you make getQuery as fast as possible and move the compute to processAnswer?
- Can you daemonize? Is there a way to use butler.job to run a background batch process (e.g., Triplets)?
- Emphasize, base.yaml vs yaml - again COPY AND PASTE LIKE A MOFO
- Data management?
- AGAIN ASK QUESTIONS.
- Don’t confuse not knowing how to do something with
Get your plan reviewed by Rudi, Scott, Lalit or Daniel. - Should happen by the end of week 1, notify us ahead of time.
Implement the most basic use case (always random).
Test early and test often. The tests should be the first thing you build after the random algorithm.
Build dashboards in parallel with development; do not leave this until the end (yes, this may be a bit of a cognitive overload, don’t worry too much).
Remember you are working with indices!!!!
If you have succeeded so far, start implementing your actual use case. Don’t hesitate to bring up “structural issues”.
- Do you need a new targetmanager?
- Use butler.memory for locking (asynchronous issues).
- Use butler.memory for storing extremely large objects (BIG DATA vs small DATA).
- Is your data too big to upload to S3?
Build your widgets. Test them in browser to ensure your app isn’t too slow
Schedule a code review and begin extensive stress (ec2 to ec2) testing. Plan to experiment at least one week after code review.
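The "always random" baseline and the advice to keep getQuery cheap can be sketched roughly as below. This is only an illustration under stated assumptions: `FakeButlerStore` is a hypothetical in-memory stand-in for butler.algorithms, `RandomSampling` is a made-up class name, and the method signatures are guesses, so copy the real ones from an existing app. Note that the algorithm only ever handles target indices.

```python
import random


class FakeButlerStore:
    """Hypothetical in-memory stand-in for butler.algorithms (not the NEXT API)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def increment(self, key, value=1):
        self._data[key] = self._data.get(key, 0) + value
        return self._data[key]


class RandomSampling:
    """Always-random baseline; signatures are assumptions, not NEXT's."""

    def initExp(self, store, n):
        # Store only what later calls need: the number of targets and counters.
        store.set(key='n', value=n)
        store.set(key='counts', value=[0] * n)
        return True

    def getQuery(self, store):
        # Keep this cheap: one read and one random draw of a target index.
        n = store.get(key='n')
        return random.randrange(n)

    def processAnswer(self, store, target_index, answer):
        # Heavier bookkeeping belongs here rather than in getQuery.
        counts = store.get(key='counts')
        counts[target_index] += answer
        store.set(key='counts', value=counts)
        store.increment(key='num_answers')
        return True

    def getModel(self, store):
        # Everything the dashboard should show comes from here.
        return {'counts': store.get(key='counts'),
                'num_answers': store.get(key='num_answers')}


# Example round trip: one query, one answer, then read the model.
store = FakeButlerStore()
alg = RandomSampling()
alg.initExp(store, n=5)
index = alg.getQuery(store)
alg.processAnswer(store, index, 1)
model = alg.getModel(store)
```

Writing a test against a stand-in like this right after the random algorithm works is one way to follow the "test early and test often" item without needing a running NEXT instance.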

Do and Don't list

Don’t write to disk ever.
Don’t use custom stats and logs, use getModel
Do use Pep-8 (as much as reasonable)
Do use a real editor
Do ask lots of questions
Do write clean code
- easy logic. Ideally code calls well-named functions of fewer than 5 lines and has complete test coverage
- easy to read (PEP8 -- follows Python naming conventions, <90 chars, etc)
Don’t have your code littered with comments and old code, GIT loves you
Do commit a lot
Do be careful about imports, only use them where necessary. You can import in functions.
Don’t reinvent the wheel. You never need to write a regexp parser unless you are Daniel Ross.
Don’t blame the verifier. It has the worst error messages but I guarantee you it’s your fault.
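As a small illustration of the "import in functions" and "short, well-named functions" items, here is a sketch using only the standard library. The function name `summarize_answers` is hypothetical and not part of NEXT.

```python
def summarize_answers(answers):
    """Return the mean and count of a list of numeric answers."""
    # Imported here because only this function needs it.
    from statistics import mean
    return {'mean': mean(answers), 'count': len(answers)}


summary = summarize_answers([1, 0, 1, 1])
```

A function this small is easy to cover completely with one or two assertions.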

Facing an issue?

It might be one of these

Large database objects (i.e., large features)? Try butler.memory.
Slow query fetch? Try the line profiler. Use the libraries that we provide to the fullest extent.
Using butler.memory.lock to ensure atomic operations? Try using atomic operations in the database instead, e.g., butler.algorithms.increment(key=something).
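The difference between locking around a read-modify-write and using a single atomic increment can be sketched as follows. `FakeStore` is a hypothetical stand-in whose internal lock only simulates the database's atomicity; it is not the butler API, and the helper names are made up for illustration.

```python
import threading


class FakeStore:
    """Hypothetical stand-in for a butler collection (not the NEXT API)."""

    def __init__(self):
        self._data = {}
        self._guard = threading.Lock()  # simulates the database's own atomicity

    def get(self, key):
        with self._guard:
            return self._data.get(key, 0)

    def set(self, key, value):
        with self._guard:
            self._data[key] = value

    def increment(self, key, value=1):
        # One atomic read-modify-write, done "inside the database".
        with self._guard:
            self._data[key] = self._data.get(key, 0) + value
            return self._data[key]


def count_with_lock(store, lock, key):
    # Pattern to avoid: an application-level lock around get + set.
    with lock:
        store.set(key, store.get(key) + 1)


def count_atomically(store, key):
    # Preferred: a single atomic operation, no explicit lock needed.
    store.increment(key=key)


def run_workers(target, args, n_threads=8, n_calls=200):
    """Call target(*args) n_calls times from each of n_threads threads."""
    def work():
        for _ in range(n_calls):
            target(*args)
    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


store = FakeStore()
run_workers(count_with_lock, (store, threading.Lock(), 'locked'))
run_workers(count_atomically, (store, 'atomic'))
print(store.get('locked'), store.get('atomic'))
```

Both counters end at 1600 (8 threads times 200 calls), but the atomic version needs one call and no lock to manage, which is why it is the first thing to try.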
