The autumn 2024 two-day meeting will take place in Copenhagen on November 12–13.

The meeting is organized by the Statistics and Data Analysis group at the Danish Cancer Institute, the Danish Cancer Society.

## Venue

Room 5.2.A.B, Danish Cancer Society (Kræftens Bekæmpelse), Strandboulevarden 49, DK-2100 Copenhagen

## Registration

Conference fee:

- Regular DSTS members: DKK 850
- Honorary, retired and student (MSc, BSc) members: DKK 150
- All others: DKK 1400

The conference fee covers coffee, cake & fruit, drinks, dinner and sandwiches.

Please register before October 11th at https://www.conferencemanager.dk/dststwo-daymeetingfall2024

## Preliminary programme

**Tuesday, November 12**

- 13:00-13:15: Welcome
- 13:15-13:50: Susanne Rosthøj, Mia Klinten Grand & Anders Tolver (Statistics and Data Analysis, Danish Cancer Institute):
*The Role of Statisticians at the Danish Cancer Institute*
- 13:50-14:00: Break
- 14:00-14:40: Corine Baayen (Global Biometrics, Ferring Pharmaceuticals):
*Valid p-values, confidence intervals and point estimates for group sequential trials with pipeline data – how to order the outcome space*
- 14:40-14:50: Break
- 14:50-15:30: Christian Ritz (National Institute of Public Health, Copenhagen, SDU):
*Some statistical methods in nutrition and beyond*
- 15:30-16:00: Coffee break
- 16:00-16:40: Laust Hvas Mortensen (Statistics Denmark / Section of Epidemiology, KU / The Rockwool Foundation):
*Innovation in official statistics: How to count what counts?*
- 16:40-16:50: Break
- 16:50-17:20: Claus Ekstrøm (Section of Biostatistics, KU):
*According to protocol*
- 17:20-17:30: News from DSTS
- 17:30-18:30: Beer and soda at KB.
- 19:00-22:00: Dinner at Comwell Copenhagen Portside, Alexandriagade 1, 2150 København

**Wednesday, November 13**

- 09:00-09:40: Alessandra Meddis (Section of Biostatistics, KU):
*Capturing the effect of treatment breaks in users of hormonal contraception*
- 09:40-09:50: Break
- 09:50-10:30: Mikkel Meyer Andersen (Department of Mathematical Sciences, AAU):
*Research, courtroom, and media: a forensic statistician’s perspective*
- 10:30-11:00: Coffee break
- 11:00-11:40: Søren Wengel Mogensen (Department of Finance, CBS):
*Deceivingly simple causal effect estimation from time series data*
- 11:40-11:50: Break
- 11:50-12:30: Line Clemmensen (Department of Mathematical Sciences, KU):
*TBA*
- 12:30: Sandwiches

## Abstracts

In this presentation, we will introduce the statistics group at the Danish Cancer Institute and illustrate how we contribute to high-quality research within a collaborative environment. We are a team of eight statisticians working in a dynamic research environment, engaging with a broad spectrum of data types. Our primary focus is on register data, although we are also involved in several randomized clinical trials and manage several large in-house cohorts. Given the diversity of data and research questions we encounter, it is essential that our expertise spans a wide range of statistical methods.

We will outline how we operate as a team and how we strive to create a stimulating professional environment by focusing on the implementation of the most recent statistical methodologies. Staying up to date with the latest advances in statistical techniques is central to our approach. Additionally, we will provide examples of some of the statistical challenges we have encountered in our collaborations with research groups within the institute.

In group sequential trials, recruitment may be stopped at pre-defined interim analyses for efficacy or futility, without compromising the validity or integrity of the trial. Contrary to trials with a fixed sample size, the outcome space not only consists of the value of the test statistic, but also of the stage (analysis) at which the trial was stopped. An ordering on this outcome space is required to derive valid p-values, confidence intervals and point estimates. Multiple orderings are possible, but typically an ordering proposed by Armitage is used, which orders results first by whether or not the null hypothesis is rejected, then by the stage at which the trial stopped and finally by the value of the test statistic.

In this talk we will discuss how to do valid inference in the context of pipeline data. Pipeline data arise when at the time of the interim analysis some randomized patients have incomplete follow-up, due to the final outcome being collected with a delay (e.g. some months). In this case, at an interim analysis it is only decided whether to stop recruitment. A decision analysis is performed when all patients have completed the trial. We will show how to include these extra analyses in the Armitage ordering and illustrate how p-values, confidence intervals and point estimates can be obtained. Conservative approaches for handling non-binding futility rules and/or constraints on the critical value at the decision analysis will be discussed. A clinical trial example and simulation results will be provided to illustrate and substantiate the considerations in this talk.

Recently, there has been an increasing interest in large-scale analyses of burden of disease and deficiency. Typically, such analyses involve pooling data from multiple sources and meta-analytic approaches based on aggregate data have often been used. However, it is becoming more common to use individual-level participant data (IPD). Motivated by an ongoing EU project, which aims to quantify micronutrient deficiency across Europe, a number of statistical methods using IPD will be revisited, including different ways to estimate prevalence of deficiency of one or more micronutrients, while accommodating correlations and adjustments, and spatial modelling of prevalence using some recent developments in mixed-effects methodology.

Good and trustworthy official statistics are indispensable for society. In Denmark, Statistics Denmark has compiled those numbers for 175 years. Official statistics are overwhelmingly produced using simple, explainable methods. In Denmark, this often means aggregating numbers from the famed Danish administrative registries. However, the world is changing fast and the need for statistics on emerging phenomena cannot easily be met with the data that is in the registers and/or with the methods that traditionally have been used. Given that data now comes in many forms and shapes, national statistical institutes have started experimenting with new ways of getting from these data to statistics. Luckily, Statistics Denmark is not alone in facing this challenge as the same applies to researchers using the Danish registers. This talk will showcase various cases, failed and successful, from Statistics Denmark’s Data Science Lab over recent years.

Words matter. They matter a lot. They have the power to break hearts and start wars, to heal wounds and validate research that benefits society.

Statistics is the grammar of science, and traditional statistical methods, carefully outlined in research protocols, have long promoted transparency and accountability among scientists. But today, as “modern methods” like machine learning and AI take center stage, the rigor that once underpinned scientific inquiry is at risk. There is a growing belief that these new techniques do not require the same scrutiny or clarity.

In this talk, we will explore how this shift is affecting the integrity of research and discuss the importance of maintaining scientific rigor in an age of rapid technological advancement, and how we must uphold the credibility of science.

It is well established that combined oral hormonal contraceptives, also known as ‘the pill’, increase the rate of thrombosis. It is very common for women to stop and restart combined oral hormonal contraceptives; however, little is known about the effect of treatment breaks on the risk of thrombosis. Studies have shown a relatively higher rate of thrombosis in the first months of use and this observation has led doctors and guidelines to discourage women from frequently stopping and restarting treatment. The reasoning would be that re-starting use of combined oral hormonal contraceptives could increase the risk of thrombosis.

In this talk, we discuss the statistical challenges in our approach to obtain more evidence regarding what effect treatment breaks may have on the risk of thrombosis. We employ targeted learning methods for longitudinal data to define the effect of a treatment break in a target trial framework for causal inference. Specifically, we compare the two interventions: (1) sustained use versus (2) taking a break within two years, and we employ stochastic interventions to identify the effect of a treatment break without focusing on its specific duration and/or timing.

DNA evidence is pivotal in modern criminal investigations, and this relies on accurate statistical interpretation. This talk will begin by explaining the different types of DNA profiles and how they are applied as evidence and in investigations. Next, I will present some of the statistical models I have co-developed and discuss how they are used in court. I will also explain the dissemination of the results for the public and the media. Finally, I will offer a few observations from my experience as an interdisciplinary statistician, reflecting on the unique challenges and rewards of collaborating across legal, scientific, and media fields.

Will lowering the marginal tax rate increase labor supply? In social sciences, experimentation is often impossible and data-driven approaches to answering such causal questions must use non-experimental data. Statistical methods to learn about cause and effect can exploit a known temporal ordering of the random variables as cause must precede effect. In some ways, causal inference from time series data appears simpler than causal inference using data with no temporal ordering. However, this perceived simplicity may be deceiving, and in this talk we discuss some common issues that arise when we try to reason about cause and effect from time series data along with some partial solutions.

Granger causality and impulse response functions are examples of standard time series methods that are often, at least between the lines, given a causal interpretation. Causal effects in time series are closely related to impulse response functions that are commonly used in, e.g., econometrics, and the interpretation of impulse response functions is similar to that of causal effects. However, this conflation implicitly assumes that the multivariate time series is fully observed, i.e., that there are no unobserved confounding processes. Without this assumption, we need additional information to identify causal effects from time series data. One possibility is a known causal graph representing cause-effect relations, and we outline a recent result in this context which allows us to corroborate or invalidate a causal interpretation of parameter estimates obtained using standard time series techniques. The widespread notion of Granger ‘causality’ can also be given an actual causal interpretation, most easily under the restrictive assumption of no unobserved confounding processes. Without this assumption, Granger causality can still teach us about causal structure, and we describe how this can be done.

If the causal system develops faster than the sampling frequency, unobserved confounding may occur between the sampling points, and this will also complicate causal effect estimation and causal structure learning. A similar problem occurs when data is aggregated over time as is common for economic data. We give examples of this and some potential solutions.