Hi, in this video we're going to revisit confounding. There are two main objectives. First, we're going to learn about frontdoor paths and backdoor paths. Next, we're going to aim to understand why backdoor paths need to be blocked. As a reminder of what confounders are, think about an informal definition: a confounder is a variable that affects both the treatment and the outcome. A simple DAG that depicts confounding is here. In particular, it's depicting confounding between A and Y. So again, we'll think about a treatment A and an outcome Y, and in this case X represents the confounder. So here X is affecting both A and Y.
In this DAG, the situation is a little more complicated. Again, our interest is in the relationship between A and Y, an exposure and an outcome. We see that V affects A directly, so there's a direct path from V to A. But V affects Y indirectly: V affects W, which affects Y. So in this case it would be reasonable to argue that V is a confounder. It affects treatment directly, and it also affects the outcome, it just does so indirectly through its effect on W. However, it doesn't necessarily matter whether we call a variable a confounder or not. What matters is that we've sufficiently controlled for confounding. In order to do that, we're going to need to block backdoor paths from treatment to outcome, and I'll explain what that means.
But before we get there, let's first look at frontdoor paths. A frontdoor path from A to Y, from exposure to outcome, is one that begins with an arrow emanating out of A. So here are two examples. On the left we see that A directly affects Y, so there's an arrow directly from A to Y. On the right, we see that A does get to Y, but indirectly through its effect on Z. So here, on the left, A directly affects Y, and that's called a frontdoor path from A to Y.
The name frontdoor comes from the idea that information is flowing from A to Y, or you can think of it as swimming with the current: you're going in the direction that you're being pushed, in a sense. That's considered the frontdoor. So A to Y is a frontdoor path in the DAG on the left. On the right, we have another frontdoor path from A to Y, and this one goes through Z. So A, Z, Y is also a frontdoor path from A to Y. And again, the word frontdoor comes strictly from the direction of that first arrow. What makes it a frontdoor path is that the arrow is coming out of A. So you could also think of it as something that A is affecting. Here, A is the exposure, so the exposure is affecting something. If you think of it from a time perspective, you could think of Z as occurring after A, because A is affecting Z. So that's what we mean by a frontdoor path.
If we're interested in the causal relationship between A and Y, we are not typically going to worry about frontdoor paths. In fact, we would not want to block anything on a frontdoor path, because frontdoor paths capture exactly what we are interested in, which are effects of treatment. So in this first DAG, A directly affects Y, and that's fine. There's nothing else that needs to be said about that one. But let's look at the second case, DAG 2. A affects Y indirectly through its effect on Z. In this case, we would not want to block Z, we would not want to control for Z, because Z is part of the effect of treatment. We would not want to control for part of the effect of treatment. Our interest is in the causal relationship between A and Y, and if some of that is through the causal effect that A has on Z, that's not something we're concerned about at this point. We are interested in just how A affects Y, regardless of what path it takes to get there. So we don't want to control for anything on a frontdoor path.
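To make the point about not controlling for Z concrete, here is a minimal simulation sketch of DAG 2. Everything in it is an assumption made for illustration and not part of the lecture: the linear model, the coefficients 2.0 and 1.5, the random seed, and the helper name `ols`. Treatment is randomized, so there are no backdoor paths, and the total effect of A on Y is 2.0 × 1.5 = 3.0 by construction.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility
n = 100_000

# Hypothetical linear version of DAG 2: A -> Z -> Y, with no confounding.
A = rng.binomial(1, 0.5, size=n).astype(float)  # randomized treatment
Z = 2.0 * A + rng.normal(size=n)                # intermediate variable affected by A
Y = 1.5 * Z + rng.normal(size=n)                # outcome affected by A only through Z


def ols(y, *covariates):
    """Least-squares coefficients for y on the covariates, intercept first."""
    design = np.column_stack([np.ones(len(y)), *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


total_effect = ols(Y, A)[1]  # coefficient on A with nothing else controlled
a_given_z = ols(Y, A, Z)[1]  # coefficient on A after also controlling for Z

print("A coefficient, not controlling for Z:", round(total_effect, 2))
print("A coefficient, controlling for Z:   ", round(a_given_z, 2))
```

Under these assumptions, the first regression should land near the total effect of 3.0, while the second should land near zero: controlling for Z removes the very effect of treatment we set out to measure.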
So to reiterate, in this case we would not want to control for Z. Z is capturing some of the effect of treatment. In general, what we're interested in is this: if we were to manipulate A, for example by assigning treatment to people, what would the impact be on Y? Again, we don't want to control for an effect of treatment. However, it could be that you are interested in these pathways, and that would be known as a causal mediation analysis. So if you do care about how much of the effect of A on Y is through A's effect on Z, that would be a causal mediation analysis. In that case you would care about frontdoor paths and about quantifying them, but you still wouldn't be controlling for Z. That is a different type of analysis than we'll cover, but it's still subsumed under causal inference. Causal mediation analysis specifically focuses on quantifying how much of the effect of treatment is through intermediate variables. But our main goal is just to learn how much A affects Y in general, and we don't necessarily care about how it does so, that is, the mechanisms by which it does so.
Next, we'll move to backdoor paths, and backdoor paths are the ones I said we need to worry about when it comes to controlling for confounding. Backdoor paths from treatment A to outcome Y are paths that travel from A to Y through arrows going into A. So here's an arrow going into A. In this case, a backdoor path from A to Y is as follows: it's A to X to Y. So in other words, one way you can get from A to Y is through the treatment effect, A affecting Y. But there's this other path where you can get from A to Y, and this is a path that has nothing to do with A causing Y. This backdoor path does not involve any arrows coming out of A, so there's no treatment effect involved there.
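The frontdoor and backdoor definitions above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the toy DAG below combines the lecture's examples into one graph, and the names `EDGES`, `simple_paths`, and `classify` are hypothetical. The classifier looks only at the first arrow on a path, which is the informal definition used here.

```python
# Edges are (parent, child) pairs. This toy DAG is an assumption that merges
# the examples from the text: X confounds A and Y, and A affects Y both
# directly and through Z.
EDGES = [("X", "A"), ("X", "Y"), ("A", "Z"), ("Z", "Y"), ("A", "Y")]


def simple_paths(edges, start, end):
    """Enumerate all paths from start to end, ignoring arrow direction."""
    neighbors = {}
    for parent, child in edges:
        neighbors.setdefault(parent, set()).add(child)
        neighbors.setdefault(child, set()).add(parent)

    def walk(path):
        node = path[-1]
        if node == end:
            yield list(path)
            return
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in path:  # never revisit a node on the same path
                yield from walk(path + [nxt])

    yield from walk([start])


def classify(edges, path):
    """'frontdoor' if the first arrow points out of the start node, else 'backdoor'."""
    first_edge = (path[0], path[1])
    return "frontdoor" if first_edge in set(edges) else "backdoor"


paths = {tuple(p): classify(EDGES, p) for p in simple_paths(EDGES, "A", "Y")}
for path, kind in paths.items():
    print(" - ".join(path), "is a", kind, "path")
```

In this toy graph the paths A, Y and A, Z, Y start with an arrow coming out of A, so they are labeled frontdoor, while A, X, Y starts with an arrow going into A, so it is labeled backdoor.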
But A and Y are still associated with each other through that backdoor path. So this is something we have to worry about. If we look at just the marginal association between A and Y, some of that association will be due to a causal effect of A on Y, but some of it could also be because X causes both A and Y. So we want to be able to separate out the actual treatment effect from this kind of confounding effect, which is happening through the backdoor path. There's basically this sneaky way to get from A to Y, and we want to get rid of that. So, in general, backdoor paths confound the relationship between A and Y. We want to make sure those backdoor paths are blocked, so that there is no sneaky way to get from A to Y and there is only a frontdoor path, which is a causal path.
So we're going to focus on trying to block these paths. Our big picture goal, then, is to identify a set of variables that blocks all of the backdoor paths from treatment to outcome, from A to Y. To sufficiently control for confounding, that's what we're going to have to do: block all of these backdoor paths. So we could think of X as being a set of variables. If we did block all of the backdoor paths from A to Y, then we would have ignorability: A would be independent of the potential outcomes given X. This is really the key thing we are focused on here: if we eliminate all the backdoor paths by blocking them, then we have ignorability of the treatment mechanism. So ignorability would hold given X, where X would have to be this collection of variables that blocks all of the backdoor paths.
That sets up what we're going to do next, which is to discuss two different criteria for identifying sets of variables that are sufficient to control for confounding. The two we will discuss are the backdoor path criterion and the disjunctive cause criterion.
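As a rough illustration of why blocking the backdoor path matters, here is a small simulation sketch of the simple confounding DAG, where X affects both A and Y. The setup is entirely assumed for illustration: a linear outcome model, a logistic treatment model, a true treatment effect of 1.0, the random seed, and the helper name `ols`. Regression adjustment is used as one simple way to control for X; the lecture does not prescribe a particular method.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, chosen only for reproducibility
n = 100_000

# Hypothetical model: X -> A, X -> Y, and A -> Y with a true effect of 1.0.
X = rng.normal(size=n)
p_treat = 1.0 / (1.0 + np.exp(-1.5 * X))    # larger X makes treatment more likely
A = rng.binomial(1, p_treat).astype(float)
Y = 1.0 * A + 2.0 * X + rng.normal(size=n)  # X also raises the outcome


def ols(y, *covariates):
    """Least-squares coefficients for y on the covariates, intercept first."""
    design = np.column_stack([np.ones(len(y)), *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


unadjusted = ols(Y, A)[1]   # marginal association: backdoor path A, X, Y is open
adjusted = ols(Y, A, X)[1]  # controlling for X blocks the backdoor path

print("A coefficient without controlling for X:", round(unadjusted, 2))
print("A coefficient controlling for X:        ", round(adjusted, 2))
```

Under these assumptions the unadjusted coefficient mixes the treatment effect with the association flowing through X and so overstates the effect, while the adjusted coefficient should be close to the true value of 1.0.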