Continuous Intelligence¶
This site provides documentation for this project. Use the navigation to explore module-specific materials.
How-To Guide¶
Many instructions are common to all our projects.
See ⭐ Workflow: Apply Example to get these projects running on your machine.
Project Documentation Pages (docs/)¶
- Home - this documentation landing page
- Project Instructions - instructions specific to this module
- Your Files - how to copy the example and create your version
- Glossary - project terms and concepts
Custom Project¶
Dataset¶
For my custom project I pulled data from the past 5 years of my life as tracked in my Data Journal, aggregated by month. The data include a number of behavioral metrics that I keep track of because... well... it's fun.
For more on the Data Journal, see https://datajournal.guide
Signals¶
The signals I added to the base data are a couple of data points I've always wanted to add, actually. I track most days, but not necessarily every day. This means my data are only ~95% complete. On days that I miss, I may have missed things I try to keep limited, like dining out.
So I created a meta-signal, tracking_consistency, which is the proportion of days in the month that I tracked. I then used it to linearly extrapolate a couple of other data points to compensate for missing days: extrapolated_media_signal and extrapolated_dining_out_signal.
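As a rough illustration of the signals described above, here is a minimal sketch. The record layout and the field names `days_tracked`, `days_in_month`, `media_count`, and `dining_out_count` are assumptions made for this example and may not match the actual base data; only the three signal names come from the project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonthRecord:
    # Hypothetical field names; the real base data may use different columns.
    days_tracked: int
    days_in_month: int
    media_count: float
    dining_out_count: float


def add_signals(rec: MonthRecord) -> dict:
    """Return the three derived signals for one monthly record."""
    if rec.days_in_month <= 0:
        raise ValueError("days_in_month must be positive")

    # Proportion of days in the month that were actually tracked.
    consistency = rec.days_tracked / rec.days_in_month

    def extrapolate(count: float) -> Optional[float]:
        # With no tracked days there is nothing to scale up from.
        if consistency == 0:
            return None
        # Divide by the proportion so the estimate grows as coverage shrinks.
        return count / consistency

    return {
        "tracking_consistency": consistency,
        "extrapolated_media_signal": extrapolate(rec.media_count),
        "extrapolated_dining_out_signal": extrapolate(rec.dining_out_count),
    }
```

For a fully tracked month the consistency is 1.0 and the extrapolated values equal the raw counts, which is a quick sanity check on the direction of the math.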
Experiments¶
The modification was slightly less straightforward than intended. It took a couple of tries to get the "number of days in the month" code working. I also very nearly extrapolated using the inverse of the math that makes sense (multiplying by the tracking proportion rather than dividing by it).
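One possible way to handle both sticking points, sketched with Python's standard `calendar` module. This is not necessarily how the project code does it, and the function names `days_in_month` and `extrapolate_count` are hypothetical.

```python
import calendar


def days_in_month(year: int, month: int) -> int:
    # monthrange returns (weekday of the 1st, number of days in the month).
    return calendar.monthrange(year, month)[1]


def extrapolate_count(observed: float, days_tracked: int, year: int, month: int) -> float:
    """Scale a count observed on tracked days up to the whole month."""
    total_days = days_in_month(year, month)
    if not 0 < days_tracked <= total_days:
        raise ValueError("days_tracked must be between 1 and the month length")

    consistency = days_tracked / total_days
    # Dividing is the direction that makes sense: multiplying by a proportion
    # below 1 would shrink the count instead of filling in the missed days.
    return observed / consistency
```

A useful check on the direction: the extrapolated value should never be smaller than the observed count, since missed days can only add events.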
Results¶
The new columns were successfully added! They show the range of what the "true" media consumption and restaurant dining counts might be when accommodating the rare days I miss tracking things.
Interpretation¶
This sort of extrapolation can be used (with some risk) to compensate for incomplete data. Linearly extrapolating the average per tracked day across the rest of the month is likely a flawed assumption, but it is also likely better than the existing implicit assumption that missed days contained no trackable events. Adding this signal in a pipeline is a good fit: it doesn't really make sense in the raw data, but it could be very useful for analysis and trending.