In this lecture we're going to talk about rotations and translations. The focus of the lecture will be on transformations between coordinate systems. When we talked about the camera model, we talked about the camera coordinate system, which has its origin at the projection center of the lens. We also talked about the world coordinate system, which can be a reference frame fixed in the real world. In this picture we see the Bebop, one of the commercial quadrotors, and a camera coordinate frame centered at the lens in the front of the Bebop, with the optical axis going outwards, the X-axis to the right, and the Y-axis downwards. We're going to find out how to write coordinate transformations, built out of rotations and translations, between these two coordinate systems. We're going to use the convention that red is always the X-axis, green the Y-axis, and blue the Z-axis. You remember RGB as a standard color convention, so RGB will stand for XYZ. In computer vision, we project points onto image planes: we take a picture of the point with a camera. This point, however, might be known in a world coordinate system. For example, we might know the GPS coordinates of this point. So on one hand we have the coordinates of the point in GPS coordinates, and we will call these world coordinates. On the other hand, we have the coordinates of the point in the camera coordinate system, exactly along the ray that goes out from the projection center to the point. We are going to use w and c as prescripts on the vector of the point P: we write ${}^cP$ for the coordinates of the point with respect to the camera, and ${}^wP$ for the coordinates with respect to the world. We write the rotation from the camera to the world coordinate system as ${}^cR_w$ (prescript c, subscript w) and the translation as ${}^cT_w$. How can we find this out if we only see a snapshot of the scene?
How is the camera aligned with the world coordinate system, and how can we find out what this rotation and translation are? The trick is always to look at this equation: the point with respect to the camera equals the rotation times the point with respect to the world, plus the translation, ${}^cP = {}^cR_w\,{}^wP + {}^cT_w$. Let's look first at the translation. If we set the world point coordinates in this equation equal to 0, then we have the origin of the world. In this case, the vector ${}^cP$ starts from the origin of the camera coordinate system and goes to the origin of the world coordinate system. And because ${}^wP$ is equal to 0, this vector is equal to the translation. So when we write camera point equals rotation times world point plus translation, this translation is always the vector from the camera origin to the world origin. Let us now find out how we can write the rotation. The rotation is always an orthogonal matrix, which means that it has three orthogonal column vectors of unit length: r1, r2, r3. You can use the fingers of your right hand to denote r1, r2, and r3, and in our case they are red, green, and blue in the picture. Now we follow the same trick as with the translation: we momentarily set the translation equal to 0, so we take it out of the picture, and we replace the world point with (1, 0, 0). We see that if we multiply the matrix (r1, r2, r3) by (1, 0, 0), we get r1. In the same way, multiplying by (0, 1, 0) we get r2, and by (0, 0, 1) we get r3. But what actually is (1, 0, 0) in the world? It is the red vector, the X-axis of the world coordinate system. So the meaning of the rotation matrix is the following: the first column is the X-axis of the world with respect to the camera, the second column is the Y-axis of the world with respect to the camera, and the third column is the Z-axis of the world with respect to the camera. As we see in the picture, the red X-axis of the world is r1, the green is r2, and the blue is r3. The best way to understand this simple interpretation of the rotation and translation is with an example.
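Before the slide example, the two tricks can also be checked numerically. The following is a minimal sketch in plain Python, not code from the lecture: the pose (a 30 degree rotation about the camera Z-axis and the translation (1, 2, 3)) and the function names are made up purely for illustration.

```python
import math

def mat_vec(R, p):
    """Multiply a 3x3 matrix (stored as a list of rows) by a 3-vector."""
    return [sum(R[i][j] * p[j] for j in range(3)) for i in range(3)]

def world_to_camera(R, T, p_w):
    """Apply p_c = R * p_w + T."""
    Rp = mat_vec(R, p_w)
    return [Rp[i] + T[i] for i in range(3)]

# Hypothetical pose, chosen only for illustration.
a = math.radians(30.0)
R = [[math.cos(a), -math.sin(a), 0.0],
     [math.sin(a),  math.cos(a), 0.0],
     [0.0,          0.0,         1.0]]
T = [1.0, 2.0, 3.0]

# Trick 1: the world origin (0, 0, 0) maps to the translation itself.
origin_in_camera = world_to_camera(R, T, [0.0, 0.0, 0.0])

# Trick 2: with the translation set to zero, the world X-axis (1, 0, 0)
# maps to the first column of the rotation matrix.
x_axis_in_camera = world_to_camera(R, [0.0, 0.0, 0.0], [1.0, 0.0, 0.0])
first_column = [R[i][0] for i in range(3)]

print(origin_in_camera)
print(x_axis_in_camera, first_column)
```

Feeding in (0, 1, 0) and (0, 0, 1) in the same way picks out the second and third columns.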
Let's look at this example, where we see a camera coordinate system with the X-axis, the red vector, going into the slide, and the world coordinate system with its red vector coming out of the slide. So how do we write the rotation matrix? What is the X-axis of the world with respect to the camera? This is a vector which is coming out this way, while the X-axis of the camera is going into the slide. So they are parallel, but in opposite directions. This means that r1 is (-1, 0, 0). What is the second column, the Y-axis of the world? It is parallel to the Z-axis of the camera, but in the opposite direction, so it is (0, 0, -1). What is the third column, the blue of the world with respect to the camera? It is parallel to the green of the camera, the Y-axis, but again in the opposite direction, so it is (0, -1, 0). In this very easy way we have found the rotation part of the transformation, ${}^cR_w$. How can we find the translation? That's the easy part. It is just the vector from the origin of the camera to the origin of the world. Since this vector lies in the Y-Z plane of the camera, it has no X-component, so it starts with 0, and then it goes 5 down and 10 towards the world coordinate system. We want to make sure that the rotation matrix is always special orthogonal: not only are the column vectors orthogonal to each other, but the determinant is also equal to 1. And indeed, in this case, if we compute the determinant, it is -1 times the sub-determinant of the block with rows (0, -1) and (-1, 0), which is -1, so the determinant is 1. This is a final verification we always have to do, so that we guarantee that we are working with right-handed coordinate systems. Now let's assume that we have one more coordinate frame and see how we can relate three coordinate frames to each other.
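The checks from this example can be written down in a few lines. This is only a sketch in plain Python: the columns are the ones read off above, the translation (0, 5, 10) assumes that "5 down" is +5 along the camera Y-axis and "10 towards the world" is +10 along the camera Z-axis, and the test point `p_w` is hypothetical.

```python
def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Columns of the rotation: world X, Y, Z axes expressed in the camera frame.
r1, r2, r3 = [-1, 0, 0], [0, 0, -1], [0, -1, 0]
R = [[r1[k], r2[k], r3[k]] for k in range(3)]  # stored as a list of rows

# Assumed signs: Y points down and Z points towards the world origin.
T = [0, 5, 10]

# Special orthogonal check: orthonormal columns and determinant +1.
columns = (r1, r2, r3)
orthonormal = all(dot(u, v) == (1 if u is v else 0)
                  for u in columns for v in columns)
determinant = det3(R)

# Map a hypothetical world point into camera coordinates: p_c = R p_w + T.
p_w = [1, 2, 3]
p_c = [dot(R[k], p_w) + T[k] for k in range(3)]

print(orthonormal, determinant, p_c)
```

A determinant of -1 here would signal that one of the axes was read off with the wrong sign, that is, a left-handed frame.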
The coordinate frame we're going to add is very common in quadrotors: it is a body frame with the X-axis going forward, the Y-axis to the right, and the Z-axis going downwards. Again, always check that the axes are in the order X, Y, Z and that this is a right-handed coordinate system. We still have the camera coordinate system and the world coordinate system. The names we use for rotations about the body axes are roll for the X-axis, pitch for the Y-axis, and yaw for the Z-axis. The best way to concatenate transformations and to relate multiple transformation matrices to each other is to write them as 4x4 matrices, by placing the rotation and the translation side by side and adding a last row which is (0, 0, 0, 1). Then by simple 4x4 matrix multiplication we can find the transformation from the world to the body frame, by writing the transformation from the world to the camera times the transformation from the camera to the body: ${}^wT_b = {}^wT_c\,{}^cT_b$. This is an operation which is very common in computer graphics. Now, we have just used the transformation from world to camera, ${}^wT_c$. This is actually the inverse of the transformation from the camera to the world, ${}^cT_w$. What does this inverse transformation look like? We know that the inverse of a rotation matrix is the rotation matrix transposed. What about the inverse translation? It is easy to find by taking the inverse of the 4x4 matrix. Then we will see that in the upper-right block we have minus the transposed rotation times the translation, $-R^\top T$. This is really the inverse translation, which is the translation from the origin of the world to the origin of the camera, the orange vector we see there. This also answers the question: if we have this transformation, where exactly in world coordinates, let's say in GPS coordinates, is the camera? It is at the position minus R transposed times the translation. Now, is there any alternative interpretation of these transformations?
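The 4x4 bookkeeping can be sketched as follows in plain Python. The numbers for the camera-to-world transformation are the ones from the earlier example, with the translation taken as (0, 5, 10) under the sign assumptions stated there. The camera-to-body transformation `c_T_b` is hypothetical: it assumes a forward-looking camera whose origin coincides with the body origin. The helper names are illustrative only.

```python
def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def homogeneous(R, T):
    """Stack a 3x3 rotation and a 3-vector into a 4x4 matrix [R T; 0 0 0 1]."""
    return [R[i] + [T[i]] for i in range(3)] + [[0, 0, 0, 1]]

def invert_rigid(M):
    """Invert [R T; 0 1] as [R^T, -R^T T; 0 1] without a general inverse."""
    Rt = [[M[j][i] for j in range(3)] for i in range(3)]
    t = [M[i][3] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return homogeneous(Rt, t_inv)

# Camera-to-world transformation from the earlier example (assumed signs).
c_T_w = homogeneous([[-1, 0, 0], [0, 0, -1], [0, -1, 0]], [0, 5, 10])

# Its inverse; the last column is the camera position in world coordinates.
w_T_c = invert_rigid(c_T_w)
camera_in_world = [w_T_c[i][3] for i in range(3)]

# Hypothetical camera-to-body transformation: body X (forward) is camera Z,
# body Y (right) is camera X, body Z (down) is camera Y, same origin.
c_T_b = homogeneous([[0, 1, 0], [0, 0, 1], [1, 0, 0]], [0, 0, 0])

# Chain the frames: world-to-body = world-to-camera times camera-to-body.
w_T_b = matmul(w_T_c, c_T_b)

print(camera_in_world)
print(w_T_b)
```

Using the closed-form inverse keeps the arithmetic exact for integer inputs and makes the $-R^\top T$ block visible in the code.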
Is there any other way, besides reading off the columns of the rotation and the vector between the origins, to find these transformations? Indeed there is one, which is by rotating and translating the actual coordinate system. At the end of this slide we're going to show an actual animation of how this works, but let's now consider again these two coordinate frames, the camera and the world. Again, the convention is that red is always the X-axis, green is the Y-axis, and blue is the Z-axis. Let's see how we can find the transformation between the two. This involves three steps. First, we move the camera coordinate system on top of the world coordinate system. This is a 4x4 matrix where there is no rotation. That's why we write the identity, and in the last column we have the translation vector. To avoid mixing up the coordinate systems, we now drop this translation vector from the picture and show only the two coordinate systems. The second step is a rotation around the X-axis which will bring the two Z-axes into alignment. So the X-axis is this one. It is just a rotation by 90 degrees, which is not in the positive direction. It is in the negative direction. So this can be written as a matrix with a rotation only in the upper 3x3 block, and the last column is (0, 0, 0, 1). Now, why does it have the rows (1, 0, 0), (0, 0, 1), and (0, -1, 0)? It is because this angle is -90 degrees. In the diagonal elements (2,2) and (3,3) we have the cosine of -90 degrees, which is 0. Then we have minus the sine of -90 degrees, which makes 1, and the sine of -90 degrees, which makes -1. At this point we need one more rotation. You see the coordinate systems are not yet aligned. We have the Z-axes aligned, but we still need to rotate to align the red and the green axes. If you watch carefully, this is a rotation of 180 degrees around the Z-axis. And this can be written this way. Again, it is a pure rotation matrix inside the 4x4 transformation, with the cosine of 180 degrees, which is -1, in the first two diagonal elements and the remaining off-diagonal elements 0. We have applied three steps, a translation and two rotations.
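The three elementary 4x4 matrices named in these steps can be generated from their angles. This is a sketch in plain Python using the standard right-handed rotation matrices. The function names are illustrative, and the translation (0, 5, 10) is carried over from the earlier example under the sign assumptions stated there.

```python
import math

def clean(x):
    """Round away floating-point noise such as cos(90 deg) = 6e-17."""
    return round(x, 12) + 0.0  # "+ 0.0" turns -0.0 into 0.0

def translation(t):
    """Pure translation: identity rotation, t in the last column."""
    return [[1.0, 0.0, 0.0, t[0]],
            [0.0, 1.0, 0.0, t[1]],
            [0.0, 0.0, 1.0, t[2]],
            [0.0, 0.0, 0.0, 1.0]]

def rot_x(deg):
    """Rotation about the X-axis by deg degrees, as a 4x4 matrix."""
    c = clean(math.cos(math.radians(deg)))
    s = clean(math.sin(math.radians(deg)))
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, c, clean(-s), 0.0],
            [0.0, s, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def rot_z(deg):
    """Rotation about the Z-axis by deg degrees, as a 4x4 matrix."""
    c = clean(math.cos(math.radians(deg)))
    s = clean(math.sin(math.radians(deg)))
    return [[c, clean(-s), 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

step1 = translation([0.0, 5.0, 10.0])  # assumed translation from the example
step2 = rot_x(-90.0)                   # rows (1,0,0), (0,0,1), (0,-1,0)
step3 = rot_z(180.0)                   # diagonal (-1, -1, 1)

for name, M in (("translate", step1), ("rot_x", step2), ("rot_z", step3)):
    print(name, M)
```

Generating the matrices from the angle is a quick way to confirm which off-diagonal entry carries the minus sign for a negative rotation.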
When each of these motions is with respect to the coordinate frame of the last pose, we always post-multiply. This is the golden rule of moving coordinate frames. So when we first translate, then rotate around the X-axis, then rotate around the Z-axis, and this is always the latest X-axis or the latest Z-axis, we always post-multiply. And we will see that if we post-multiply these matrices, we get exactly the same matrix that we had from the interpretation of the orthogonal column vectors. So we have seen two ways to find the transformation between two coordinate systems. One was by the interpretation of the columns and the translation vector, and the other is by actual motion. I'm going to show you now, with actual motions, how this happens, so that you understand better this [INAUDIBLE] of the three motions. So I'm going to show you, in a real situation where I use pencils for the coordinate axes, what these coordinate system transformations look like. We have a camera here, and you see the lens of the camera. I'm going to put the camera coordinate system on top of the lens so that it is almost aligned with the actual coordinate system of the camera. So we have the X-axis, and I'm going to show it to you like this: the X-axis and the Y-axis, always with the convention red and green for X and Y, and blue for Z. And, as you may have seen when working with images in software, the X-axis of the image goes this way and the Y-axis this way, so it is perfectly aligned with what you do when programming. Now we also have a checkerboard, which is usually used in calibration scenarios, when we want to calibrate. We know every point on this checkerboard with respect to its origin, because it is something that we printed. Let's take the following situation, where the camera coordinate system is oriented so that the blue Z-axis is parallel to the calibration pattern, as is the X-axis, and the Y-axis is looking downwards. So you see roughly what it looks like.
If we write the world coordinate system with respect to the camera coordinate system, you will see that the X-axis of the world, which is this red axis here, is in the opposite direction of the X-axis of the camera. So it is (-1, 0, 0). The Y-axis of the world is in the opposite direction of the Z-axis of the camera, so it is (0, 0, -1). And the Z-axis of the world is in the opposite direction of the Y-axis of the camera, so it will be (0, -1, 0). So this is the interpretation we have shown, with the columns of the rotation as the axes of the world with respect to the camera. Let's see how we can show the second interpretation, as a concatenation of three motions. The first one was a translation, and this was pretty much moving the two coordinate systems one onto the other. So I can do it, for example, by moving the calibration pattern and aligning them almost like this. I don't have the space to do this, but imagine that I have done it as a first step. Then what I can do to align the Z-axes is to rotate around the X-axis of the camera by -90 degrees, which would be this way. And then the only thing I have to do is to rotate around the Z-axis by 180 degrees, which will be this way. So this way the two coordinate systems are perfectly aligned, and I have written the transformation as a concatenation of three transformations.
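The post-multiplication rule for moving frames can be tried out numerically. The sketch below, in plain Python, uses arbitrary illustrative values (a translation of (1, 2, 3) and angles of 30 and 45 degrees) rather than the values from the demonstration, and the helper names are made up. It chains a translation, a rotation about the new X-axis, and a rotation about the new Z-axis, and compares the result with the reversed multiplication order.

```python
import math

def matmul(A, B):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(t):
    """Pure translation as a 4x4 matrix."""
    return [[1.0, 0.0, 0.0, t[0]],
            [0.0, 1.0, 0.0, t[1]],
            [0.0, 0.0, 1.0, t[2]],
            [0.0, 0.0, 0.0, 1.0]]

def rot_x(deg):
    """Rotation about the X-axis as a 4x4 matrix."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, c, -s, 0.0],
            [0.0, s, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def rot_z(deg):
    """Rotation about the Z-axis as a 4x4 matrix."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

# Illustrative values only; they are not taken from the lecture.
t = [1.0, 2.0, 3.0]
step1, step2, step3 = translation(t), rot_x(30.0), rot_z(45.0)

# Moving-frame rule: each new motion is multiplied on the right.
moving = matmul(matmul(step1, step2), step3)

# Reversed order, which corresponds to motions about the fixed original axes.
fixed_axes = matmul(matmul(step3, step2), step1)

# The frame origin stays where the translation put it, because the later
# rotations are about axes through the frame's own origin.
origin = [moving[i][3] for i in range(3)]

# The final Z-axis equals the Z-axis after step 2, because step 3 rotates
# about that very axis.
z_axis = [moving[i][2] for i in range(3)]
z_after_step2 = [step2[i][2] for i in range(3)]

print(origin, z_axis, z_after_step2)
print(moving == fixed_axes)
```

Multiplying the same three matrices in the opposite order gives a different transformation, which is why the order of concatenation has to follow the frame in which each motion is described.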