Hi, in this module we're going to talk about the data section of the research proposal. The section on data is, again, of crucial importance to the research proposal. It has to address and provide information on a number of points related to the data. One is the geographic coverage of the data that you are using. So, for example, if you're using a national-level study, then you may indicate that. If you're using a non-national study, one that perhaps has data from a particular region or a particular set of communities, then you have to be clear about the geography, the geographic context of those communities. You have to make clear the time span covered by your data. That's perhaps less of an issue for contemporary data, where it may be obvious that the data refer to the last two years or the last two decades. But certainly in longer-term or historical studies, there has to be some discussion of the time period that is covered by the data. Very importantly, there has to be some discussion of the origin of the data. Now if you are making use of data that's already publicly available (we're going to talk about that in a later module), data that you've downloaded from the web or acquired from a government website, then the discussion of the origin of the data may be fairly brief. You may refer to the documentation of the data and talk about who conducted the survey and so forth. If you're working with historical data, as I do, then the origin discussion may be much more detailed. So, in the case of my own research, which makes use of historical data from 18th- and 19th-century China, we have to go back and talk about the specific government institutions that produced the raw data. And then where those data were archived, where those documents were held, and then how those documents were eventually transformed into the data that we analyzed in the study. This is also where we have to talk about the format and the organization of our data.
Now, especially for quantitative data, some formats are fairly straightforward. If you just have a single cross-sectional survey (we talked about cross-sectional surveys back in part one and we'll mention them again in later modules), you may have a very simple dataset that looks like something you might open up in an Excel spreadsheet, just rows and columns, perhaps one row per person, one column per variable. However, nowadays, it's very common for people to make use of extremely complex longitudinal datasets that come from longitudinal surveys in which people are interviewed at multiple points in time. Or people, as in my case, make use of complex historical data that are put together in very complex databases. We have to discuss that aspect of format and organization when we introduce our data, and then talk about how we turned it into a dataset that we actually conduct analysis on in order to address our research questions. Finally, the section on data may also be an opportunity to revisit the issue of representativeness. Now we may have talked about representativeness before in the literature or background section. Or we may have talked about it in methodology. That was quite often from the perspective of talking about the representativeness of the populations that we sought to study and how they relate to other populations. Here, when we talk about representativeness, we may be talking about whether the data that we have collected, which we hope describe a population, are actually representative of that population. So do the data omit particular kinds of people from the population that we're interested in studying? So this section on data has several questions that it typically needs to address. One, it has to establish the strength of the data with respect to the question at hand. That is, we have to show that the data that we plan to use are the right data for the task at hand.
And that may require comparing it to other possible datasets and then justifying our choice of data relative to those other possible choices. What are the limitations of the data, and how will they be addressed? Almost any data that we may make use of in a study have limitations. No data are perfect. And we can get ourselves into real trouble if we try to conceal or hide or omit important limitations of the data that may affect their usability for the task at hand. So we have to be honest about what problems we know exist in the data that are related to our research topic. So, for example, in my own research on historical China, we make use of population data. These historical population data in many cases omit daughters, and they omit children who died when they were very young, before their parents had an opportunity to register them. This is back in the 18th and 19th centuries. So, when we write our proposals, we have to be honest about the fact that these people are missing from our data and that this limits certain kinds of analyses. And then we have to demonstrate that these known limitations are not relevant for other analyses that we may be interested in conducting, for example, studying health at later ages. So overall, we're trying to establish why we're choosing these data over other possible sources of data. Now this may be a little easier in my case using historical data because, for the period that I study and for the topic that I study, there are essentially no other sources of data. Or I can identify the problems with the existing data very quickly. But if you're doing a contemporary topic, it's very likely that there are multiple surveys, multiple longitudinal surveys, multiple databases out there that may actually contain data relevant to the topic of interest. And, in that case, you have to explain why you chose the particular database, the particular survey dataset that you are using.
And you have to explain why you chose that over other datasets that are out there. In some cases, it's just because that's the dataset that was available. That's okay to say, but perhaps you would be better off explaining it in terms of differences in the limitations. Finally, you'll also want to talk about restrictions that are applied to the data before you include them in the analysis. So when we're conducting a quantitative analysis, typically we may start with a relatively large dataset, which includes information about a large number of people. But then, depending on the topic at hand, we may apply various sorts of restrictions so that the number of records in our analysis is much smaller than the number of records that we started out with. So, for example, suppose we're doing an analysis of the effects of childhood conditions on people's health in later life, and we have some sort of population database that follows people over time but doesn't follow everybody over their lifespan. We may have to restrict the dataset to look only at the records of people who were followed from the time they were born until they reached middle or old age, which is the subject of our study. So, again, this is where we would talk about those kinds of problems, those kinds of limitations that affect the records that we choose to include in our analysis. The data section also has to discuss the protocol, the process for the collection of the data. If the data are yet to be collected, if you're planning to conduct your own survey or do your own qualitative fieldwork, then you need to offer additional detail on the procedures you intend to follow in order to collect the data. So, for example, if you're proposing to conduct qualitative fieldwork in a number of villages, then you have to explain how you're going to gain access to those villages and address questions that people might have about whether you will be permitted to stay in those villages.
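To go back for a moment to the restriction example, here is a minimal sketch in Python of what applying such a restriction might look like. Everything in it is an assumption made up for illustration: the `PersonRecord` fields, the `MIDDLE_AGE` threshold of 45, and the four toy records do not come from any real database. The point is simply that the proposal should report both the starting number of records and the number that remain after each restriction.

```python
from dataclasses import dataclass


@dataclass
class PersonRecord:
    """Hypothetical record layout; real population databases will differ."""
    person_id: int
    first_observed_age: int  # age at which the person first appears in the data
    last_observed_age: int   # age at the person's last observation


# Assumed cutoff for "middle age"; a real study would have to justify its own.
MIDDLE_AGE = 45


def followed_from_birth_to_middle_age(record: PersonRecord) -> bool:
    """Keep only people observed from birth until at least MIDDLE_AGE."""
    return record.first_observed_age == 0 and record.last_observed_age >= MIDDLE_AGE


# Toy data, invented for this sketch.
records = [
    PersonRecord(person_id=1, first_observed_age=0, last_observed_age=62),
    PersonRecord(person_id=2, first_observed_age=0, last_observed_age=30),   # lost to follow-up
    PersonRecord(person_id=3, first_observed_age=12, last_observed_age=70),  # entered after birth
    PersonRecord(person_id=4, first_observed_age=0, last_observed_age=45),
]

analytic_sample = [r for r in records if followed_from_birth_to_middle_age(r)]

# Report the sample size before and after the restriction.
print(f"Started with {len(records)} records, kept {len(analytic_sample)}")
```

In a real proposal you would describe each restriction in words and give the record counts at each step, so that a reader can see how the analytic sample relates to the original data.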
If you're proposing to conduct a survey, then you're going to have to provide a lot of detail about how you're going to execute that. So you'll have to talk about your sampling strategy; we'll come back to that in a later module. You'll have to talk about your questionnaire design. What's going to be on the questionnaire? You're even going to have to talk about how you're going to recruit and train interviewers, and provide a lot of other details to convince people that you'll actually be able to execute the plan that you've offered in your proposal. Now if the data have already been collected, then this section may be a little bit easier. Especially if it's a publicly accessible dataset, you may simply have to explain how you downloaded the data. And then you may also talk a bit about the procedures that were originally used to conduct the survey by referring to the documentation of the survey. Now, another issue that you have to address is whether and how the data are accessible. So again, if you're making use of existing data that you downloaded from the web, you may have to provide a link to the relevant website and disclose the fact that you're making use of publicly available data. If the data were provided to you by somebody who had collected them but not released them publicly, normally you need to disclose that and give credit to the people who provided the data to you. And you may have to talk about the arrangements under which you gained access to the data. If you're making use of archival data or other data that you collected yourself by visiting archives or libraries, you'll have to talk about the locations, the specific libraries and archives, where you were able to collect the data. And you may have to talk about the procedures you followed to gain access. You'll also have to talk about the practical issues that you anticipate in your data collection and how you will deal with these.
So especially if we're conducting survey research, there are a number of well-known challenges to conducting surveys in the modern era. The willingness of people to participate in surveys around the world, especially in developed countries, has been declining. So you're going to have to talk a lot about the measures you will take to encourage people to respond to an invitation to participate in your research. People are very aware of these challenges, and you have to show that you're aware of them too and that you have a plan for them. You'll also have to talk about any ethical issues that may be associated with your data collection. This is especially the case if you're planning to conduct surveys or qualitative fieldwork among what we call vulnerable populations. So when we talk about studying children, the elderly, the incarcerated, or, in other cases, people who are not competent to make key decisions for themselves, special protocols and special procedures must apply. And you have to discuss how you are going to follow such protocols and procedures in order to avoid any ethical issues that might arise in the collection of your data. Now if you're using existing data, you may still have to discuss some ethical issues depending on the source of the data. But the burden may not be as demanding as if you propose to collect the data yourself. Finally, you'll want to talk about the procedures by which you plan to transcribe the data, check them for problems, and then clean them. So especially if you're conducting a survey or you're collecting data from an archive, there is a step involved where you have to go from the original data, which might be the forms or the publications in an archive, or the forms that are filled out by an interviewer for the survey, to an actual dataset in a computer. And then, once the dataset has been entered into a computer, there's a procedure of checking the data for problems and then resolving those issues.
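Just to make that checking step concrete, here is a minimal sketch in Python of the kind of checks you might run on transcribed data. The field names `person_id` and `age`, the plausible age range of 0 to 120, and the four toy rows are all assumptions made up for illustration; a real project would define its own checks based on its own forms and codebook.

```python
def find_problems(rows):
    """Return (row_index, description) pairs for rows that fail basic checks.

    The checks are illustrative only: duplicate identifiers, missing ages,
    and ages outside an assumed plausible range of 0 to 120.
    """
    problems = []
    seen_ids = set()
    for index, row in enumerate(rows):
        person_id = row.get("person_id")
        if person_id in seen_ids:
            problems.append((index, "duplicate person_id"))
        seen_ids.add(person_id)

        age = row.get("age")
        if age is None:
            problems.append((index, "missing age"))
        elif not 0 <= age <= 120:
            problems.append((index, "age out of range"))
    return problems


# Toy transcribed data, invented for this sketch.
rows = [
    {"person_id": 1, "age": 34},
    {"person_id": 2, "age": None},  # age left blank on the form
    {"person_id": 2, "age": 41},    # same identifier entered twice
    {"person_id": 3, "age": 340},   # likely a keying error
]

for index, description in find_problems(rows):
    print(f"row {index}: {description}")
```

A script like this only flags the problems. How you resolve them, for example by going back to the original forms, is what the proposal needs to describe.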
If you're going to be doing that yourself, then you need to explain what you plan to do. If you're making use of datasets that somebody has already constructed, you may need to summarize the protocols, the procedures that they followed in order to produce the clean dataset that you are using for your analysis. So overall, hopefully, I've introduced to you some of the key goals for the data section of a proposal for a research study. If you address these questions, you should be able to produce a successful research proposal, assuming that the remaining parts are also acceptable.