Checklist
Scott Sievert edited this page Apr 10, 2017 · 1 revision
Everybody, in the first four days of working with NEXT. For the glossary terms, at the very least know where to look them up.
- Learn the following vocab: environment variable, server, docker, container, requests (and look at requests library), 100ms rule, module, json, aws, ec2, instance-type(costs), api, client-server model, pep-8, editor, database, memory store, asynchronous vs synchronous, daemon, queue, git (Maybe take a short quiz?)
- Learn how to launch NEXT locally, run an existing experiment
- Learn how to launch NEXT on ec2 and run an experiment: docker-compose up, docker-compose stop, docker-compose rm, python next_ec2 rsync, and how to access the aws console to start and stop machines.
- Test from the browser
- Know how to find the dashboards and what the plots represent.
- How to access logs, participant data, etc. from the Dashboard and api (through the browser)
- Know what a yaml file is and where the interface is located. Run (and read) a test script on ec2.
- (Locally) How to use docker for refreshing containers
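Three of the vocab terms above (asynchronous vs synchronous, daemon, queue) are easier to remember with a toy example. The sketch below uses only the Python standard library and is not NEXT code; the function names `square`, `run_synchronously`, and `run_with_queue` are made up for illustration.

```python
import queue
import threading


def square(x):
    """Stand-in for a slow piece of work."""
    return x * x


def run_synchronously(items):
    # Synchronous: the caller waits for each result before moving on.
    return [square(x) for x in items]


def run_with_queue(items):
    # Asynchronous: the caller drops work on a queue and a background
    # (daemon) thread picks it up; the caller only waits at the very end.
    jobs = queue.Queue()
    results = {}

    def worker():
        while True:
            x = jobs.get()
            results[x] = square(x)
            jobs.task_done()

    # daemon=True means this thread will not keep the process alive on exit.
    threading.Thread(target=worker, daemon=True).start()
    for x in items:
        jobs.put(x)  # returns immediately
    jobs.join()      # block until every queued job is marked done
    return [results[x] for x in items]
```

Both functions return the same values; the difference is when the caller has to wait.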
Estimated time: 3 weeks to 1 month
- Vocab: app, myApp, memory, alg, yaml, algyaml, dataflow(initExp, getQuery, processAnswer, getModel), widget, dashboard(!), (optional exercise - read the NIPS paper), utils.debug_print, TargetManager
- Identify the app you want to modify, copy and rename it. Get the renamed thing to work.
- Understand the basic dataflow/api.
- Read through an App, and an algorithm.
- How the butler is used.
- Where data is saved in the butler
- When to use butler.memory vs butler.algorithms
- Why is the butler used in a specific place? Why have a butler?
- Dataflow between app and alg
- Make a plan = make your yaml file, decide on inputs and outputs
- Questions to ask: what do you need to make a query?
- What do you need to receive an answer?
- Where do you see slowdowns?
- What do you want to see on the dashboard?
- Is this a Big Data or Small Data problem?
- How can you make getQuery as fast as possible and move the compute to processAnswer?
- Can you daemonize? Is there a way to use butler.job to run a background batch process (e.g., Triplets)?
- Emphasize, base.yaml vs yaml - again COPY AND PASTE LIKE A MOFO
- Data management?
- AGAIN ASK QUESTIONS.
- Don’t confuse not knowing how to do something with
- Get your plan reviewed by Rudi, Scott, Lalit or Daniel. This should happen by the end of week 1; notify us ahead of time.
- Implement the most basic use case (always random).
- Test early and test often. The tests should be the first thing you build after the random algorithm.
- Build dashboards in parallel with development; do not leave this until the end (yes, this may be a bit of a cognitive overload, don’t worry too much)
- Remember you are working with indices!!!!
- If you have succeeded so far, start implementing your actual use case. Don’t hesitate to bring up “structural issues”.
- Do you need a new targetmanager?
- Use butler.memory for locking (asynchronous issues)
- Use butler.memory for storing extremely large objects (BIG DATA vs small DATA)
- Is your data too big to upload to S3?
- Build your widgets. Test them in browser to ensure your app isn’t too slow
- Schedule a code review, then begin extensive stress (ec2 to ec2) testing. Plan to experiment at least one week after code review.
- Don’t write to disk ever.
- Don’t use custom stats and logs, use getModel
- Do use Pep-8 (as much as reasonable)
- Do use a real editor
- Do ask lots of questions
- Do write clean code
- easy logic. Ideally code calls well-named functions of fewer than 5 lines and has complete test coverage
- easy to read (PEP8 -- follows Python naming conventions, <90 chars, etc)
- Don’t have your code littered with comments and old code, GIT loves you
- Do commit a lot
- Do be careful about imports, only use them where necessary. You can import in functions.
- Don’t reinvent the wheel. You never need to write a regexp parser unless you are Daniel Ross.
- Don’t blame the verifier. It has the worst error messages but I guarantee you it’s your fault.
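To make the "always random" baseline and the initExp, getQuery, processAnswer, getModel dataflow concrete, here is a minimal sketch. It does not import NEXT: `FakeButler` and `FakeCollection` are hypothetical in-memory stand-ins, and the method signatures on `AlwaysRandom` are assumptions modeled on the calls named on this page, not the real interface. In a real app the signatures come from your yaml file.

```python
import random


class FakeCollection:
    """Hypothetical in-memory stand-in for butler.algorithms (not NEXT code)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]

    def increment(self, key, value=1):
        self._data[key] = self._data.get(key, 0) + value
        return self._data[key]


class FakeButler:
    """Hypothetical butler exposing only an `algorithms` store."""

    def __init__(self):
        self.algorithms = FakeCollection()


class AlwaysRandom:
    """Baseline algorithm: every query is a uniformly random target index."""

    def initExp(self, butler, n):
        butler.algorithms.set(key="n", value=n)
        butler.algorithms.set(key="num_answers", value=0)
        return True

    def getQuery(self, butler):
        # Keep this cheap: one read and one random draw.
        n = butler.algorithms.get(key="n")
        return random.randrange(n)  # an index, not the target itself

    def processAnswer(self, butler, target_index, answer):
        # Any heavier bookkeeping belongs here rather than in getQuery.
        butler.algorithms.increment(key="num_answers")
        return True

    def getModel(self, butler):
        # Report state through getModel instead of custom stats or logs.
        return {"num_answers": butler.algorithms.get(key="num_answers")}
```

A test for this baseline can be a short loop: initialize, ask for a query, check the index is in range, submit an answer, and compare getModel against the number of answers sent. Having that loop in place before the real algorithm exists makes it easier to tell later whether a failure is in the app or in the algorithm.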
It might be one of these
- Large database objects (i.e., large features)? Try butler.memory.
- Slow query fetch? Try the line profiler. Use the libraries that we provide to the fullest extent.
- Using butler.memory.lock to ensure atomic operations? Try using atomic operations in the database, e.g., butler.algorithms.increment(key=something).
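The last point can be illustrated without NEXT. In the sketch below, `FakeStore` is a hypothetical stand-in for a database-backed store whose `increment` is a single indivisible step; the internal `threading.Lock` only plays the role of the atomicity the database would provide. The name `count_answers` and the key `"num_answers"` are also made up for the example.

```python
import threading


class FakeStore:
    """Hypothetical stand-in for a database-backed store (not NEXT code)."""

    def __init__(self):
        self._data = {}
        # Plays the role of the database's own atomicity guarantee.
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key, 0)

    def increment(self, key, value=1):
        # Read, add, and write happen as one indivisible step.
        with self._lock:
            self._data[key] = self._data.get(key, 0) + value
            return self._data[key]


def count_answers(store, n_workers=8, n_answers=1000):
    """Have several workers bump the same counter concurrently."""

    def worker():
        for _ in range(n_answers):
            store.increment(key="num_answers")

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.get(key="num_answers")
```

Because each worker makes one `increment` call per answer, no caller-side lock is needed and no update is lost. The pattern to avoid is a separate get followed by a set from several workers, which is what forces an explicit lock in the first place.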