docs/api.rst

Gridding
--------

.. automodule:: mpol.gridding

Datasets and Cross-Validation
-----------------------------

.. automodule:: mpol.datasets

Images
------

.. automodule:: mpol.images

Fourier
-------

.. automodule:: mpol.fourier

Precomposed Modules
-------------------

For convenience, we provide some "precomposed" `modules <https://pytorch.org/docs/stable/notes/modules.html>`_ which may be useful for simple imaging or modeling applications. In general, though, we encourage you to compose your own set of layers if your application requires it. The source code for a precomposed network can provide a useful starting point. We also recommend checking out the PyTorch documentation on `modules <https://pytorch.org/docs/stable/notes/modules.html>`__.

.. automodule:: mpol.precomposed

Losses
------

.. automodule:: mpol.losses

Connectors
----------

Connectors are PyTorch layers that help compute those residual visibilities (in gridded form).

docs/ci-tutorials/PyTorch.md

```{code-cell}
:tags: [hide-cell]
%run notebook_setup
```

# Introduction to PyTorch: Tensors and Gradient Descent

This tutorial provides a gentle introduction to PyTorch tensors, automatic differentiation, and optimization with gradient descent, outside of any specifics about radio interferometry or the MPoL package itself.

## Introduction to Tensors

Tensors are multi-dimensional arrays, similar to numpy arrays, with the added benefit that they can be used to calculate gradients (more on that later). MPoL is built on the [PyTorch](https://pytorch.org/) machine learning library, and uses a form of gradient descent optimization to find the "best" image given some dataset and loss function, which may include regularizers.

We'll start this tutorial by importing the torch and numpy packages. Make sure you have [PyTorch installed](https://pytorch.org/get-started/locally/) before proceeding.
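The import cell itself falls outside the diff hunks shown above; presumably it looks something like the following (importing matplotlib here as well is my assumption, since later cells call ``plt``):

```{code-cell}
import torch
import numpy as np
import matplotlib.pyplot as plt
```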

PyTorch allows us to calculate the gradients on tensors, which is a key functionality underlying MPoL. Let's start by creating a tensor with a single value. Here we are setting ``requires_grad = True``; we'll see why this is important in a moment.

```{code-cell}
x = torch.tensor(3.0, requires_grad=True)
```

Let's define some variable $y$ in terms of $x$:

```{code-cell}
y = x ** 2
```

We see that the value of $y$ is as we expect---nothing too strange here.

```{code-cell}
print(f"x: {x}")
print(f"y: {y}")
```

But what if we wanted to calculate the gradient of $y$ with respect to $x$? Using calculus, we find that the answer is $\frac{dy}{dx} = 2x$. The derivative evaluated at $x = 3$ is $6$.

We can use PyTorch to get the same answer---no analytic derivative needed!

```{code-cell}
y.backward()  # populates gradient (.grad) attributes of y with respect to all of its independent variables
x.grad  # returns the grad attribute (the gradient) of y with respect to x
```

PyTorch uses the concept of [automatic differentiation](https://arxiv.org/abs/1502.05767) to calculate the derivative. Instead of computing the derivative as we would by hand, the program uses a computational graph and the mechanistic application of the chain rule. For example, a computational graph with several operations on $x$ resulting in a final output $y$ will use the chain rule to compute the differential associated with each operation and multiply these differentials together to get the derivative of $y$ with respect to $x$.

+++

## Optimizing a Function with Gradient Descent

If we were on the side of a hill in the dark and we wanted to get down to the bottom of a valley, how might we do it?

We can't see all the way to the bottom of the valley, but we can feel which way is down based on the incline of where we are standing. We might take steps in the downward direction, and we'd know to stop when the ground finally felt flat. We would also need to consider how large our steps should be. If we take very small steps, it will take us longer than if we take larger steps. However, if we take large leaps, we might completely miss the flat part of the valley and jump straight across to the other side.

Now let's take a more quantitative look at gradient descent using the function $y = x^2$:

```{code-cell}
def y(x):
    return x ** 2
```

We will choose some arbitrary place to start on the left side of the hill and use PyTorch to calculate the tangent.

Note that the plotting library Matplotlib requires numpy arrays instead of PyTorch tensors, so in the following code you might see the occasional ``detach().numpy()`` or ``.item()`` calls, which are used to convert PyTorch tensors to numpy arrays and scalar values, respectively, for plotting. When it comes time to use MPoL for RML imaging, or any large production run, we'll try to keep the calculations native to PyTorch tensors as long as possible, to avoid the overhead of converting types.

```{code-cell}
x = torch.linspace(-5, 5, 100)
# ... (plotting code elided in this diff) ...
plt.ylim(ymin=0, ymax=25)
plt.show()
```

We see we need to go to the right to go down toward the minimum. For a multivariate function, the gradient will be a vector pointing in the direction of the steepest downward slope. When we take steps, we find the x coordinate of our new location by:
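The update rule itself sits between the diff hunks; given the variable definitions in the bullets that follow, it is presumably

$$
x_\mathrm{new} = x_\mathrm{current} - (\mathrm{step\,size}) \times \nabla y(x_\mathrm{current})
$$

where: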
- $\nabla y(x_\mathrm{current})$ is the gradient at our current point
- $(\mathrm{step\,size})$ is a value we choose that scales our steps
We will choose ``step_size = 0.1``:

```{code-cell}
# ... (plotting code elided in this diff) ...
plt.ylabel(r"$y$")
plt.show()
```

The gradient at our new point (shown in orange) is still not close to zero, meaning we haven't reached the minimum. We'll continue this process of checking if the gradient is nearly zero, and take a step in the direction of steepest descent until we reach the bottom of the valley. We'll say we've reached the bottom of the valley when the absolute value of the gradient is $<0.1$:
*Note the change in scale!* With only one step, we already see that we stepped *right over* the minimum to somewhere far up the other side of the valley (orange point)! This is not good. If we kept iterating with the same learning rate, we'd find that the optimization process diverges and the step sizes start blowing up. This is why it is important to pick the proper step size by setting the learning rate appropriately. Steps that are too small take a long time while steps that are too large render the optimization process invalid. In this case, a reasonable choice appears to be ``step size = 0.6``, which would have reached pretty close to the minimum after only 3 steps.
To sum up, optimizing a function with gradient descent consists of