Hello, and welcome to this course on linear regression. In this lecture I'm going to introduce you to the uses of statistical models, and in this course I'll be showing you how to develop models and interpret their results to learn about disease. My name is Viktoria, and I'll be your guide through it.

This is the second course in a specialization on statistics for public health. In the previous course you met my colleague Alex, who showed you how to take your time to work out the right questions so that you can turn them into testable hypotheses. He also taught you practical skills such as assessing key features of a data set and summarizing your data. You're going to need all these skills to move on to regression, so if you're unfamiliar with them, I suggest you take that course first.

Let's start by looking at the ways in which models are commonly used for public health research.

One use of models is to help us evaluate interventions to see if they work. For example, in a clinical trial we might be interested in estimating the effect of a treatment, such as a statin for heart disease, or in an observational study we may want to determine the effect of a certain exposure, such as air pollution on asthma. In these models our focus is on obtaining an estimate for that treatment or exposure that has been adjusted for all other variables.

We can also use models to help us understand the cause of disease. For example, what are the predictors of high blood pressure, or of suffering anxiety or depression? In this case we're interested in all the regression estimates and their relationship with the outcome.

Finally, models can be very useful tools for prediction, as they can provide us with a chance to intervene in order to avoid future adverse outcomes. One risk prediction model you might be aware of is the Framingham Risk Score, which is used to predict a patient's 10-year risk of having a cardiovascular event. We can also use prediction models to help us diagnose patients.

In diagnosis, one or more measurements are taken and the model is used to categorize the patient as either having or not having the disease. One diagnostic test that researchers are currently working on here at Imperial College London is a breath test to detect stomach and esophageal cancer. They're currently looking at how the combination of several organic compounds in the breath can be used to diagnose patients early.

I've described three common ways we use models in practice, and the statistical theory behind these models is the same for all three approaches. I now have a question for you: do you think that the way in which a model is used in practice will alter the way we approach developing that model?

The correct answer is yes. The purpose of the model will inform important aspects of how we develop it. While the statistical theory is the same, our approach to selecting variables, and the accuracy with which we wish to model the relationships between variables, will depend on whether we want to use the model for evaluating an intervention, understanding the disease, or predicting a future outcome. That's why it's important to define the research question before you start developing a model.

As you can see, statistical models are incredibly powerful tools for public health research, but only if they're good ones. It's very easy to develop a bad model. If we ignore missing data or make unreasonably strong assumptions about the relationships between variables, the model will give us the wrong answer, and this can waste huge amounts of resources and lead us down the wrong research track. This is why it's important that you take your time to get to know your data before you start any analysis and that, after fitting a model, you check the assumptions you've made.

I'm looking forward to teaching you all about linear regression and model building. We're going to start with correlation, and then I'll introduce you to linear regression and show you how to fit models and interpret their results. The next step will be to unpick the relationships between all of our variables, which we will do using multiple linear regression.

So you can see there's lots for you to learn. Let's get started.
