| 1 | +This is a coding challenge for students in Lambda School's Machine Learning |
| 2 | +course. It covers some of the basic mathematical functions that appear in |
| 3 | +reinforcement learning. |
| 4 | + |
| 5 | +# Instructions for students |
| 6 | + |
| 7 | +Visit this article: |
| 8 | +https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419 |
| 9 | + |
| 10 | +Read up to the line "To be simple, each reward will be discounted by `gamma` to |
| 11 | +the exponent of the time step." Today we'll be implementing the equation |
| 12 | +immediately above that. For simplicity we'll assume that each round has a fixed |
| 13 | +reward `R` and that the only thing changing the utility is the discount. |
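
For reference, here is a sketch of the sum this setup leads to. It assumes the article's equation is the usual discounted cumulative reward and that `gamma` lies in [0, 1), so the series converges. The symbol $G$ and the index $k$ are notation chosen here for illustration, not names taken from `reward.py`.

```latex
G = \sum_{k=0}^{\infty} \gamma^{k} R = R + \gamma R + \gamma^{2} R + \cdots
```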
| 14 | + |
| 15 | +Your task is to calculate the total lifetime reward for a given `R` and `gamma`. |
| 16 | +You should clone this repository, open it up locally, and open the file |
| 17 | +`reward.py`. You'll see an empty function called `reward(gamma, R)`. Right now |
| 18 | +it's set to return -1.0. Change it so that it returns the correct answer. |
| 19 | +THIS IS THE ONLY FILE YOU SHOULD EDIT. |
| 20 | + |
| 21 | +A resource to review summation notation: |
| 22 | +http://www.columbia.edu/itc/sipa/math/summation.html |
| 23 | + |
| 24 | +## Testing your solution |
| 25 | + |
| 26 | +The correctness of your code will be tested with `unittest`, a testing library |
| 27 | +built into Python. You can run tests in this directory as follows: |
| 28 | + |
| 29 | +`python test_reward.py --verbose` |
| 30 | + |
| 31 | +Depending on your computer, the command may be `python3` instead. The `--verbose` |
| 32 | +flag is optional, but it helpfully expands the output. |
| 33 | + |
| 34 | +This command will run the tests contained in `test_reward.py` - you do not |
| 35 | +need to edit this file, but please do check it out to see how it works. Unit |
| 36 | +tests usually test expected input/output of a short function (a "unit" - in this |
| 37 | +case the function `reward()` you are writing). |
| 38 | + |
| 39 | +This particular test file has one always-passing test and four tests that will |
| 40 | +pass once you successfully complete the `reward()` function. The always-passing |
| 41 | +test lets you confirm that you're running the tests properly: even before |
| 42 | +writing code, you should see 1 success and 4 failures when you run them. |
| 43 | + |
| 44 | +If you're curious about `unittest` you can read more here: |
| 45 | +https://docs.python.org/3.6/library/unittest.html |
| 46 | + |
| 47 | +Apart from a few special cases that yield to known mathematics, there is no good |
| 48 | +way for a program to calculate the exact value of an infinite sum. Infinite sums |
| 49 | +are inherently tricky, and floating-point arithmetic is unavoidably approximate. |
| 50 | +For this reason, the tests only require your result to be within 1/1000th of the |
| 51 | +correct answer. |
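
To illustrate why a tolerance is used, here is a minimal sketch of how a `unittest` check can compare floats within 1/1000. The class `ToleranceExample` and the helper `run_tolerance_example` are hypothetical names made up for this example. They are not taken from `test_reward.py`, which may be written differently.

```python
import unittest


class ToleranceExample(unittest.TestCase):
    """Hypothetical test case for illustration; not part of test_reward.py."""

    def test_exact_comparison_is_fragile(self):
        # 0.1 + 0.2 is not exactly 0.3 in binary floating point.
        self.assertNotEqual(0.1 + 0.2, 0.3)

    def test_comparison_within_tolerance(self):
        # Passes because the two values differ by far less than 0.001.
        self.assertAlmostEqual(0.1 + 0.2, 0.3, delta=0.001)


def run_tolerance_example():
    """Run the example tests and report whether they all passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToleranceExample)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Calling `run_tolerance_example()` runs both checks. The `delta` argument sets an absolute tolerance, which is one plausible reading of "within 1/1000th of the answer".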