Data Train Starter Track: Asking the right research questions in data science
“An approximate answer to the right question is worth a great deal more than a precise answer to the wrong question,” said the renowned statistician John Tukey as early as 1969.
In my experience of statistical consulting, much confusion arises from a mismatch between the research question and the data or methods used to address it. More fundamentally still, the research question is often not clearly articulated at the outset, perhaps because researchers anticipate that the right question can only be answered approximately. But how can we discuss which data and methods are suitable if we are vague about the question to be answered? In the era of big data, characterised by an abundance of data and a similar abundance of methods for analysing them, asking the right question has taken on new urgency.
In this course we will discuss the different types of research questions one might face in a variety of applied fields within data science, such as psychology, epidemiology, genetics, or the political and social sciences. The key distinction is between questions that are (i) descriptive, (ii) predictive, or (iii) causal (i.e. about counterfactual prediction). We will consider how these types of research questions relate to the choice of data and methods of analysis, to the requirements placed on them, and to the need for more or less specific subject-matter background knowledge. We will see how starting with a clear and explicit research question helps with assessing, and perhaps avoiding, potential sources of (structural) bias in answering that question.
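To make the distinction between a descriptive and a causal question concrete, here is a minimal simulation sketch. It is not course material: the data-generating model, the variable names (z for a confounder, a for a treatment, y for an outcome) and all numerical values are assumptions chosen purely for illustration. The crude difference in mean outcomes answers a descriptive question (how do treated and untreated units differ?), while the standardised difference targets the causal question (what would change if everyone were treated?) under the assumption that z is the only confounder.

```python
import random
from statistics import mean

# Hypothetical data-generating model (an assumption for illustration only):
#   z: binary confounder, a: binary treatment, y: continuous outcome.
#   The true causal effect of a on y is set to 1.0 by construction.
def simulate(n=20000, seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        # The confounder makes treatment more likely ...
        a = 1 if rng.random() < (0.8 if z else 0.2) else 0
        # ... and also raises the outcome, independently of treatment.
        y = 1.0 * a + 2.0 * z + rng.gauss(0.0, 1.0)
        rows.append((z, a, y))
    return rows

def mean_outcome(rows, a, z=None):
    """Mean of y among units with treatment a (and confounder z, if given)."""
    return mean(y for (zi, ai, y) in rows if ai == a and (z is None or zi == z))

def crude_difference(rows):
    """Descriptive contrast: treated minus untreated mean outcome."""
    return mean_outcome(rows, 1) - mean_outcome(rows, 0)

def standardised_difference(rows):
    """Causal contrast under the no-other-confounding assumption:
    stratum-specific differences averaged over the distribution of z."""
    n = len(rows)
    total = 0.0
    for z in (0, 1):
        weight = sum(1 for (zi, _, _) in rows if zi == z) / n
        total += weight * (mean_outcome(rows, 1, z) - mean_outcome(rows, 0, z))
    return total

rows = simulate()
crude = crude_difference(rows)
adjusted = standardised_difference(rows)
print(f"crude (descriptive) difference: {crude:.2f}")
print(f"standardised (causal) difference: {adjusted:.2f}")
```

In this simulated setting the crude difference overstates the effect built into the model, because treated units are more likely to have z = 1. Both numbers are correct answers, but to different questions. For a purely predictive question, the crude group means would be perfectly adequate.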
Key topics that will be covered:
- Types of research questions (descriptive, predictive, causal/counterfactual)
- Issues of validity and structural bias (e.g. selection, confounding, ascertainment)
- The target trial principle
Upon completion, participants will be able to:
- categorise research questions as descriptive, predictive or causal
- elicit a research question by formulating a target trial
- determine implications for the required data and choice of appropriate methods
- identify possible threats to validity / sources of structural bias.
Speaker: Vanessa Didelez
Please register here.