Behavioral Drift Detection and Conversational Recalibration in Personalized Attention Management Systems
Author: Chance Kelch
Affiliation: Precedent AI (getprecedent.ai)
Date: October 29, 2025
Contact: chance@getprecedent.ai
Abstract
We present a novel approach to personalized attention management that addresses a critical limitation in existing AI-driven email prioritization systems: the temporal misalignment between users' stated preferences and their actual behavior. Our system introduces three key innovations: (1) a behavioral drift detection mechanism that continuously monitors divergence between declared goals and observed interaction patterns, (2) a conversational recalibration protocol that proactively initiates goal reassessment when drift exceeds learned thresholds, and (3) a context-aware situation detection system that dynamically adjusts urgency scoring based on temporal life events. We provide detailed algorithmic descriptions, implementation architectures, and theoretical foundations sufficient for reproduction by practitioners in the field. This work establishes prior art for these techniques in production AI systems.
Keywords: attention management, behavioral drift detection, preference learning, human-AI alignment, email prioritization, temporal context awareness
1. Introduction
1.1 Problem Statement
Current email prioritization systems suffer from a fundamental temporal alignment problem. Users articulate preferences at time t₀ (e.g., "I want to focus less on recruiting"), but their behavior evolves continuously while their stated preferences remain static. This creates three pathologies:
- Preference Fossilization: Systems continue optimizing for outdated goals
- Silent Drift: Behavioral changes occur without system awareness
- Misalignment Accumulation: The gap between stated and revealed preferences keeps widening because nothing prompts a correction
Existing approaches fall into two categories:
- Passive learning: Systems learn solely from behavior, ignoring explicit preferences
- Static preferences: Systems use only declared rules, ignoring behavioral signals
Neither approach addresses the temporal misalignment problem.
1.2 Contributions
We present a system that:
- Detects behavioral drift by computing divergence metrics between stated goals and observed interactions across multiple behavioral dimensions
- Initiates conversational recalibration when drift exceeds dynamically learned thresholds, asking users if their priorities have changed
- Implements situation-aware scoring that automatically detects temporal contexts (e.g., "hiring sprint," "tax season") and adjusts prioritization accordingly
- Provides progressive trust mechanisms that grant increasing autonomy based on accuracy over category-specific decision types
This work establishes prior art for these techniques and provides sufficient technical detail for implementation by others in the field.
2. Related Work
2.1 Email Prioritization Systems
Traditional email prioritization relies on static rules (SaneBox, Hey.com) or generic ML models trained on aggregate user behavior (Gmail Priority Inbox, Outlook Focused Inbox). These approaches fail to capture individual preference evolution.
Key Limitation: To our knowledge, no existing system detects when user behavior diverges from stated preferences and proactively initiates goal reassessment.
2.2 Preference Learning
Reinforcement Learning from Human Feedback (RLHF) [Christiano et al., 2017] learns reward functions from pairwise human preference comparisons. Active Preference Learning [Sadigh et al., 2017] queries users strategically to reduce uncertainty.
Our Distinction: Rather than learning a static reward function, we maintain two parallel models—stated preferences (G) and revealed preferences (B)—and explicitly monitor their divergence over time.
2.3 Context-Aware Systems
Calendar-integrated systems (Google Calendar Smart Scheduling, x.ai) use temporal context for scheduling. Recommender systems use contextual bandits [Li et al., 2010] for content ranking.
Our Distinction: We introduce automatic situation detection from communication patterns combined with dynamic urgency recalibration, rather than manual context specification or single-factor temporal features.
2.4 Human-AI Trust
Progressive autonomy in robotics [Goodrich & Schultz, 2007] and adjustable autonomy in multi-agent systems [Scerri et al., 2002] gradually increase system independence.
Our Distinction: We apply category-specific trust with reversible autonomy and continuous monitoring rather than one-time approval or domain-general trust scores.
3. Technical Approach
3.1 Behavioral Drift Detection
3.1.1 State Representation
At time t, we maintain:
Stated Goals Vector (Gt): User's explicit preferences
G_t = {
    focus_areas: [(domain_i, priority_i, declared_at_t)],
    ignore_patterns: [(pattern_j, threshold_j, declared_at_t)],
    response_targets: [(sender_class_k, max_delay_k, declared_at_t)]
}
Behavioral Profile Matrix (Bt): Observed interaction patterns
B_t = [
    response_velocity(domain_i, t_window),
    attention_allocation(domain_i, t_window),
    completion_rate(domain_i, t_window),
    interruption_tolerance(domain_i, t_window)
]
Where t_window is a sliding window (default: 14 days) to capture recent behavior while maintaining temporal sensitivity. This window size balances:
- Short enough to detect drift quickly (vs 30-day windows that lag)
- Long enough to smooth out daily variance (vs 7-day windows that are too noisy)
- Configurable per user based on email volume (high-volume users can use 7-day windows)
Response Velocity Buckets:
We discretize continuous response times into interpretable categories:
VELOCITY_BUCKETS = {
    'instant': (0, 0.5),           # < 30 minutes
    'same_day': (0.5, 8),          # 30min - 8 hours
    'next_day': (8, 32),           # 8 - 32 hours
    'week': (32, 168),             # 32h - 1 week
    'never': (168, float('inf'))   # > 1 week or no response
}

def bucket_response_time(hours):
    # hours is None when the email never received a response
    if hours is None:
        return 'never'
    for bucket_name, (min_h, max_h) in VELOCITY_BUCKETS.items():
        if min_h <= hours < max_h:
            return bucket_name
    return 'never'
These buckets are used in drift computations and user-facing reports ("You respond to recruiting emails within 8 hours but stated goal was next-day").
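As a minimal sketch of how the buckets could feed a drift computation, the snippet below turns a list of response times into a per-bucket share. The helper name `velocity_distribution` and the sample values are illustrative assumptions, not part of the described system.

```python
from collections import Counter

VELOCITY_BUCKETS = {
    "instant": (0, 0.5),
    "same_day": (0.5, 8),
    "next_day": (8, 32),
    "week": (32, 168),
    "never": (168, float("inf")),
}

def bucket_response_time(hours):
    """Map a response time in hours to a bucket name; None means no response."""
    if hours is None:
        return "never"
    for name, (low, high) in VELOCITY_BUCKETS.items():
        if low <= hours < high:
            return name
    return "never"

def velocity_distribution(response_hours):
    """Share of responses per bucket (hypothetical helper for illustration)."""
    counts = Counter(bucket_response_time(h) for h in response_hours)
    total = sum(counts.values())
    return {name: (counts[name] / total if total else 0.0)
            for name in VELOCITY_BUCKETS}

# Made-up response times for one domain; None marks an unanswered email
sample_hours = [0.2, 3.0, 6.5, 20.0, None]
dist = velocity_distribution(sample_hours)
print(dist)
```

The resulting dictionary sums to 1 and can be compared against a stated-preference distribution over the same buckets.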
3.1.2 Drift Metric Computation
We compute drift using a weighted earth mover's distance between normalized behavioral distributions and stated preference distributions:
δ(G_t, B_t) = Σᵢ wᵢ · EMD(Pᵢᴳ, Pᵢᴮ)
Where:
- Pᵢᴳ: Probability distribution of priority i derived from stated goals
- Pᵢᴮ: Probability distribution of priority i derived from behavior
- wᵢ: Domain-specific weight learned from historical recalibration acceptance rates
- EMD: Earth Mover's Distance [Rubner et al., 2000]
Why EMD?: Unlike L2 distance, EMD respects the semantic similarity between email categories. "Product feedback" and "product bugs" should have lower transportation cost than "product bugs" and "personal finance."
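The contrast with L2 can be illustrated with a small sketch. It only covers the special case where probability mass moves between exactly two categories; when the ground cost satisfies the triangle inequality, EMD in that case equals the moved mass times the ground cost. The cost values and category names below are assumptions chosen for illustration, and a general EMD would need an optimal-transport solver.

```python
import math

# Hypothetical symmetric ground costs in [0, 1]; the values are assumptions
COST = {
    frozenset(["product_bugs", "product_feedback"]): 0.2,
    frozenset(["product_bugs", "personal_finance"]): 0.9,
    frozenset(["product_feedback", "personal_finance"]): 0.9,
}

def ground_cost(a, b):
    return 0.0 if a == b else COST[frozenset([a, b])]

def l2_distance(p, q):
    return math.sqrt(sum((p[k] - q[k]) ** 2 for k in p))

def emd_single_shift(p, q, tol=1e-12):
    """EMD for the special case where q is p with mass moved from exactly
    one category to exactly one other category."""
    diffs = {k: q[k] - p[k] for k in p if abs(q[k] - p[k]) > tol}
    if len(diffs) != 2:
        raise ValueError("expected mass to move between exactly two categories")
    source = min(diffs, key=diffs.get)  # category that lost mass
    target = max(diffs, key=diffs.get)  # category that gained mass
    return diffs[target] * ground_cost(source, target)

stated = {"product_bugs": 0.6, "product_feedback": 0.2, "personal_finance": 0.2}
near = {"product_bugs": 0.3, "product_feedback": 0.5, "personal_finance": 0.2}
far = {"product_bugs": 0.3, "product_feedback": 0.2, "personal_finance": 0.5}

print(l2_distance(stated, near), l2_distance(stated, far))
print(emd_single_shift(stated, near), emd_single_shift(stated, far))
```

Both shifts have the same L2 distance from the stated distribution, but the shift toward the semantically close category yields a much smaller EMD than the shift toward the unrelated one.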
Ground Distance for EMD: The cost of moving probability mass between categories i and j is computed using one of:
1. Embedding-based distance (default):
ground_cost(i, j) = 1 - cosine_similarity(embed(i), embed(j))
# where embed() uses text-embedding-3-small on category descriptions
2. Learned confusion matrix: After sufficient user data, estimate empirically:
ground_cost(i, j) = 1 - P(user treats category i like category j)
# measured by response velocity similarity, attention allocation correlation
# (the more alike two categories are treated, the cheaper it is to move mass between them)
3. Ontology-based distance: Use predefined category hierarchy:
ground_cost(i, j) = shortest_path_length(i, j) / max_path_length
# normalized tree distance
We default to (1) for cold-start and transition to (2) after 30+ days of user data.
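A minimal sketch of option (1) follows. The three-dimensional vectors are toy stand-ins for real embeddings and are purely assumed; in practice the vectors would come from an embedding model applied to category descriptions.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors standing in for category embeddings (assumed values)
TOY_EMBEDDINGS = {
    "product_feedback": [0.9, 0.1, 0.0],
    "product_bugs": [0.8, 0.2, 0.1],
    "personal_finance": [0.0, 0.1, 0.9],
}

def ground_cost(i, j):
    """Cost of moving probability mass from category i to category j."""
    return 1.0 - cosine_similarity(TOY_EMBEDDINGS[i], TOY_EMBEDDINGS[j])

close_cost = ground_cost("product_feedback", "product_bugs")
distant_cost = ground_cost("product_bugs", "personal_finance")
print(round(close_cost, 3), round(distant_cost, 3))
```

With these toy vectors the two product categories end up much cheaper to move mass between than a product category and personal finance, which is the property the ground distance is meant to encode.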
3.1.3 Specific Drift Signals
We track four behavioral dimensions with distinct drift computations:
1. Response Velocity Drift (δvelocity)
For each domain d and time window w:
from math import log
from statistics import median

# Default priority-to-response-time mapping in hours (personalized per user over time)
DEFAULT_PRIORITY_MAP = {
    10: 0.25,  # Urgent: 15 minutes
    9: 1,      # Very High: 1 hour
    8: 4,      # High: 4 hours
    7: 8,      # Medium-High: same day
    6: 24,     # Medium: next day
    5: 48,     # Medium-Low: 2 days
    4: 72,     # Low: 3 days
    3: 168,    # Very Low: 1 week
    2: 336,    # Minimal: 2 weeks
    1: None    # Ignore: no expected response
}

def compute_velocity_drift(stated_priority_d, observed_responses_d, user_history):
    """
    Stated priority implies expected response time.
    Compare to actual median response time.
    priority_to_response_map is personalized per user over time by fitting
    a regression model: response_time ~ f(priority, domain, sender_VIP_score)
    """
    # Get user's personalized mapping, starting from a copy of the default
    if not user_history.learned_priority_map:
        user_history.learned_priority_map = dict(DEFAULT_PRIORITY_MAP)
    priority_map = user_history.learned_priority_map
    expected_response_hours = priority_map[stated_priority_d]
    if expected_response_hours is None:  # Priority 1 = no response expected
        return 0.0  # No drift for "ignore" category
    if not observed_responses_d:
        return 0.0  # Assumption: no responses in the window means no evidence of drift
    actual_median_hours = median(r.response_time_hours for r in observed_responses_d)
    # Normalized drift: magnitude of the log ratio of actual to expected
    # log(1) = 0 (no drift), log(0.5) = -0.69 (responding faster)
    # log(2) = 0.69 (responding slower); abs() keeps only the magnitude
    velocity_drift = abs(log(actual_median_hours / expected_response_hours))
    # Personalization: Update user's priority map using exponential moving average
    alpha = 0.1  # learning rate
    user_history.learned_priority_map[stated_priority_d] = (
        (1 - alpha) * expected_response_hours +
        alpha * actual_median_hours
    )
    return velocity_drift
Example: User states "recruiting is medium-low priority (5)" (expected response: 48h), but actually responds in 6h median → drift = |ln(6/48)| ≈ 2.08
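The arithmetic behind this signal can be checked with a short sketch. It isolates the log-ratio step only; the function name is an illustrative assumption. Because the absolute value is taken, responding 8x faster and 8x slower than expected produce the same drift magnitude.

```python
import math

def velocity_drift(actual_median_hours, expected_hours):
    """Magnitude of the log ratio between actual and expected response time."""
    return abs(math.log(actual_median_hours / expected_hours))

drift_faster = velocity_drift(6, 48)     # responding 8x faster than expected
drift_slower = velocity_drift(384, 48)   # responding 8x slower than expected
drift_aligned = velocity_drift(48, 48)   # behavior matches the stated target

print(round(drift_faster, 2), round(drift_slower, 2), drift_aligned)
```

The direction of the deviation is lost in the magnitude, so a system that wants to phrase its prompt as "faster" or "slower" would need to keep the sign of the log ratio separately.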
2. Attention Allocation Drift (δattention)
Measures divergence between stated focus areas and actual time spent:
import numpy as np

def compute_attention_drift(stated_focus_weights, observed_time_allocation):
    """
    Use KL divergence between stated and observed distributions.
    Note: KL divergence is asymmetric; we also compute Jensen-Shannon
    as a symmetric alternative.
    Both inputs are sequences of non-negative weights in the same domain order.
    """
    # Normalize both to probability distributions with ε-smoothing
    epsilon = 1e-4  # ensures strictly positive support
    P_stated = np.asarray(stated_focus_weights, dtype=float)
    P_stated = P_stated / P_stated.sum() + epsilon
    P_stated = P_stated / P_stated.sum()  # re-normalize after smoothing
    P_observed = np.asarray(observed_time_allocation, dtype=float)
    P_observed = P_observed / P_observed.sum() + epsilon
    P_observed = P_observed / P_observed.sum()
    # KL divergence: D_KL(P_observed || P_stated)
    # Measures cost of encoding observed using stated distribution
    kl_div = float(np.sum(P_observed * np.log(P_observed / P_stated)))
    # Jensen-Shannon divergence (symmetric alternative)
    # JS(P,Q) = 0.5*KL(P||M) + 0.5*KL(Q||M) where M = 0.5*(P+Q)
    M = 0.5 * (P_observed + P_stated)
    js_div = float(0.5 * np.sum(P_observed * np.log(P_observed / M)) +
                   0.5 * np.sum(P_stated * np.log(P_stated / M)))
    # Return KL by default (penalizes observed deviating from stated)
    # But track JS for symmetric comparison
    return kl_div, js_div
Example: User states 70% product, 30% recruiting. Observed: 40% product, 60% recruiting → KL divergence ≈ 0.19 nats
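The 70/30 versus 40/60 example can be worked through in a few lines of plain Python. This is a sketch that omits the ε-smoothing step, which is safe here because every entry is strictly positive; the function names are illustrative.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in nats for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence, the symmetric alternative."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

stated = [0.7, 0.3]     # product, recruiting (stated focus)
observed = [0.4, 0.6]   # product, recruiting (observed time)

kl_obs_stated = kl_divergence(observed, stated)
kl_stated_obs = kl_divergence(stated, observed)
js = js_divergence(observed, stated)

print(round(kl_obs_stated, 3), round(kl_stated_obs, 3), round(js, 3))
```

D_KL(observed || stated) comes out at roughly 0.19 nats and the reverse direction at roughly 0.18 nats, which shows the asymmetry; the Jensen-Shannon value is the same whichever distribution is passed first.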
3. Completion Rate Drift (δcompletion)
Tracks which declared important items go unhandled:
from scipy.stats import spearmanr

def compute_completion_drift(domain_priorities, completion_rates):
    """
    High priority domains should have high completion rates.
    Compute rank correlation.
    """
    # Spearman's rank correlation (1 = perfect alignment, -1 = opposite)
    # spearmanr ranks both inputs internally, so no explicit ranking step is needed
    rho, _ = spearmanr(domain_priorities, completion_rates)
    # Convert to drift metric (0 = aligned, 1 = opposite)
    completion_drift = (1 - rho) / 2
    return completion_drift
4. Interruption Tolerance Drift (δinterrupt)
Measures when user accepts/dismisses immediate notifications:
def compute_interruption_drift(notification_settings, actual_responses, user_history):
    """
    Track which urgency levels get immediate attention vs dismissal.
    Thresholds are learned per-user via Bayesian updating.
    """
    # User-specific dismissal threshold (initialized at 0.3, updated via Beta posterior)
    dismissal_threshold = user_history.learned_dismissal_threshold
    stated_threshold = notification_settings.urgency_threshold
    urgent_levels = [8, 9, 10]
    drift_signals = []
    for urgency_level in urgent_levels:
        dismissal_rate = actual_responses[urgency_level].dismissal_rate
        if urgency_level >= stated_threshold and dismissal_rate > dismissal_threshold:
            # User dismisses above their learned threshold of "urgent" notifications
            # This suggests stated urgency threshold is too low
            drift_signals.append(('interrupt_threshold_too_low', dismissal_rate))
    # Bayesian update of the Beta parameters with this window's counts
    new_accepts = sum(actual_responses[u].accept_count for u in urgent_levels)
    new_dismissals = sum(actual_responses[u].dismiss_count for u in urgent_levels)
    user_history.dismissal_alpha += new_accepts      # alpha counts accepts
    user_history.dismissal_beta += new_dismissals    # beta counts dismissals
    # Posterior mean dismissal rate, computed from the updated parameters
    user_history.learned_dismissal_threshold = user_history.dismissal_beta / (
        user_history.dismissal_alpha + user_history.dismissal_beta
    )
    return aggregate(drift_signals)
Note on threshold learning: The 0.3 initial value is based on internal empirical observation, but the system adapts this per-user. Users with high interruption tolerance may converge to 0.5+ (tolerate more dismissals before flagging drift), while notification-sensitive users converge to 0.1-0.2.
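The threshold update amounts to a Beta-Binomial posterior mean. The sketch below uses assumed pseudo-counts (7 accepts, 3 dismissals) to encode the 0.3 starting value; those pseudo-counts and the function name are illustrative choices, not values from the system.

```python
def update_dismissal_threshold(alpha, beta, accepts, dismissals):
    """Add observed counts to the Beta parameters and return the
    posterior mean dismissal rate as the new threshold."""
    alpha += accepts       # alpha accumulates accepted notifications
    beta += dismissals     # beta accumulates dismissed notifications
    return alpha, beta, beta / (alpha + beta)

# Assumed prior pseudo-counts whose mean is 3 / (7 + 3) = 0.3
prior_alpha, prior_beta = 7.0, 3.0

# One window of made-up observations: 12 accepts, 8 dismissals
alpha, beta, threshold = update_dismissal_threshold(
    prior_alpha, prior_beta, accepts=12, dismissals=8
)
print(alpha, beta, round(threshold, 3))
```

After this window the threshold moves from 0.3 to 11/30, roughly 0.367. Larger prior pseudo-counts would make the threshold move more slowly, which is one knob for controlling how quickly it adapts.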
3.1.4 Composite Drift Score
The final drift score combines all dimensions with learned weights:
def compute_drift_score(G_t, B_t, user_history):
    """
    Compute weighted composite drift score.
    Weights are personalized based on which signals
    predicted successful recalibrations in the past.
    """
    # Get user-specific weights (initialized uniformly, learned over time)
    w = user_history.drift_weights  # [w_velocity, w_attention, w_completion, w_interrupt]
    # Compute component drifts
    δ_v = compute_velocity_drift(G_t.response_targets, B_t.response_times, user_history)
    δ_a, _ = compute_attention_drift(G_t.focus_areas, B_t.time_allocation)  # KL term; JS is tracked separately
    δ_c = compute_completion_drift(G_t.priorities, B_t.completion_rates)
    δ_i = compute_interruption_drift(G_t.notification_settings, B_t.interrupt_responses, user_history)
    # Weighted sum
    δ_total = w[0]*δ_v + w[1]*δ_a + w[2]*δ_c + w[3]*δ_i
    # Normalize to [0,1] using user's OWN historical distribution (not cohort)
    # This personalizes what counts as "high drift" for each user
    # A user with naturally volatile behavior needs higher drift to trigger
    # vs a user with stable patterns where small drifts are meaningful
    δ_normalized = percentile(δ_total, user_history.drift_distribution)
    # percentile() returns where δ_total falls in user's historical drift scores
    # e.g., if δ_total is at 80th percentile of user's past drifts → 0.80
    return δ_normalized, [δ_v, δ_a, δ_c, δ_i]
Note on normalization: We use per-user percentile normalization rather than cohort-based normalization because:
- Users have different baseline behavioral variance (some are naturally more variable)
- Email patterns differ by role (exec vs IC), industry, communication style
- Avoids penalizing high-variance users or under-detecting drift in low-variance users
- Each user's drift threshold adapts to their personal patterns
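A minimal sketch of per-user percentile normalization is shown below. The function name, the fallback value for an empty history, and both drift histories are assumptions made for illustration.

```python
from bisect import bisect_right

def percentile_rank(value, history):
    """Fraction of the user's past drift scores that are <= value."""
    if not history:
        return 0.5  # assumption: no history yet, treat as mid-range
    ordered = sorted(history)
    return bisect_right(ordered, value) / len(ordered)

# Made-up raw drift histories for two users
volatile_user = [0.2, 0.5, 0.9, 1.4, 1.8, 2.2, 2.6, 3.0, 3.5, 4.0]
stable_user = [0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.20, 0.22, 0.25, 0.30]

raw_drift = 1.0
volatile_score = percentile_rank(raw_drift, volatile_user)
stable_score = percentile_rank(raw_drift, stable_user)
print(volatile_score, stable_score)
```

The same raw drift of 1.0 is unremarkable for the volatile user (30th percentile) but exceeds everything the stable user has shown before (100th percentile), so only the latter would cross a threshold such as 0.65.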
3.1.5 Per-User Threshold Adaptation
Critical innovation: The recalibration threshold is not fixed but learned per-user.
COOLDOWN_DAYS = 7  # Minimum days between prompts (consistent across system)

def should_trigger_recalibration(δ_score, user_history):
    """
    Dynamic threshold based on user's historical response to recalibrations.
    Includes cool-down period to prevent prompt fatigue.
    """
    # Get user's learned threshold (initialized at 0.65)
    threshold = user_history.recalibration_threshold
    if δ_score > threshold:
        # Check recency: enforce 7-day minimum between recalibration prompts
        # This prevents overwhelming users with frequent prompts even if drift is high
        if user_history.last_recalibration is not None:  # None = never prompted before
            days_since_last = (now() - user_history.last_recalibration).days
            if days_since_last < COOLDOWN_DAYS:
                # Log that drift was detected but suppressed due to cooldown
                log_event('drift_detected_cooldown', δ_score, days_since_last)
                return False  # Too soon - respect cooldown
        return True
    return False

def update_threshold(user_accepted_recalibration, δ_score_at_trigger, user_history):
    """
    Update threshold based on user response to recalibration prompt.
    """
    threshold = user_history.recalibration_threshold
    if user_accepted_recalibration:
        # User confirmed priorities changed → threshold was good or too high
        # Lower it slightly to catch drifts earlier
        new_threshold = threshold * 0.95
    else:
        # User said "no, priorities haven't changed" → threshold too low
        # Raise it to reduce false positives
        new_threshold = threshold * 1.1
    # Clamp to reasonable bounds [0.4, 0.9]
    return min(max(new_threshold, 0.4), 0.9)
3.2 Conversational Recalibration Protocol
When drift is detected, we don't just log it—we initiate a conversation.
3.2.1 Recalibration Message Generation
# Order of entries in δ_components, as returned by compute_drift_score
DRIFT_NAMES = ['velocity', 'attention', 'completion', 'interrupt']

def generate_recalibration_prompt(δ_components, G_t, B_t):
    """
    Generate specific, evidence-based recalibration question.
    """
    # Identify which drift component is highest
    dominant_index = max(range(len(δ_components)), key=lambda k: δ_components[k])
    dominant_drift = DRIFT_NAMES[dominant_index]
    prompt = None
    if dominant_drift == 'velocity':
        # Find specific domain with largest velocity divergence
        domain = find_max_velocity_drift(G_t, B_t)
        stated = G_t.response_targets[domain]
        actual = median(B_t.response_times[domain])
        prompt = (
            f"You mentioned you wanted to focus less on {domain}, "
            f"but you're responding in about {actual}h, faster than your stated "
            f"target of {stated}h. Has your priority changed?\n"
            f"[Yes, {domain} is more important now] [No, help me stick to my goal]"
        )
    elif dominant_drift == 'attention':
        # Show attention reallocation
        top_stated = top_domains(G_t.focus_areas)
        top_actual = top_domains(B_t.time_allocation)
        prompt = (
            f"I notice you're spending more time on {top_actual[0]} and less "
            f"on {top_stated[0]} than you intended. Should I adjust to match "
            f"your current focus?\n"
            f"[Yes, update priorities] [No, help me rebalance]"
        )
    # Similar logic for completion and interruption drifts...
    return prompt
3.2.2 Response Handling
def handle_recalibration_response(user_response, G_t, B_t, δ_score):
    """
    Update system based on user's recalibration choice.
    δ_score is the drift score that triggered the prompt (used for logging).
    """
    if user_response.intent == 'update_priorities':
        # User confirms priorities changed → update G_t to match B_t
        G_t_new = align_goals_to_behavior(G_t, B_t)
        # Log this as a successful drift detection
        log_event('drift_confirmed', δ_score, user_response)
        # Reset drift accumulator
        reset_drift_tracking()
        return G_t_new
    elif user_response.intent == 'enforce_goals':
        # User wants to stick to stated goals → increase enforcement
        # Add nudges/reminders to help user honor stated goals
        enable_goal_enforcement(G_t, B_t)
        # Examples:
        # - "You have 3 recruiting emails. Reminder: you said this is low priority"
        # - Daily summary: "You spent 60% of time on recruiting vs 30% goal"
        # Keep G_t unchanged, but mark drift as "user wants correction"
        log_event('drift_rejected', δ_score, 'user_wants_enforcement')
    return G_t  # unchanged (also covers unrecognized intents)
3.2.3 Meta-Learning from Recalibrations
Each recalibration event provides training signal:
def update_drift_model(recalibration_event):
    """
    Learn which drift signals predict successful recalibrations.
    """
    # Extract features
    X = {
        'δ_velocity': recalibration_event.drift_components[0],
        'δ_attention': recalibration_event.drift_components[1],
        'δ_completion': recalibration_event.drift_components[2],
        'δ_interrupt': recalibration_event.drift_components[3],
        'time_since_goal_set': recalibration_event.goal_age_days,
        'domain_category': recalibration_event.primary_domain,
    }
    # Label: did user confirm priorities changed?
    y = 1 if recalibration_event.outcome == 'priorities_changed' else 0
    # Update logistic regression model (or gradient boosting tree)
    drift_model.partial_fit(X, y)
    # This improves future drift detection accuracy
3.3 Situation-Aware Dynamic Urgency Scoring
3.3.1 Automatic Situation Detection
We detect temporal contexts ("situations") using multi-signal analysis:
import numpy as np
from datetime import date

class SituationDetector:
    """
    Automatically detect when user enters/exits situation contexts.
    """
    def detect_situations(self, user_id, current_date):
        """
        Run nightly to detect active situations.
        """
        signals = self.gather_signals(user_id, lookback_days=14)
        situations = []
        # `baseline` below denotes the user's historical average for the signal
        # being compared; it is assumed to be computed elsewhere
        # Example: Hiring Sprint Detection
        hiring_signals = {
            'email_volume': count_emails(signals, domain='recruiting'),
            'calendar_interviews': count_calendar_events(signals, type='interview'),
            'keyword_frequency': count_keywords(['candidate', 'hire', 'interview'], signals),
            'domain_velocity': compute_response_speed('recruiting', signals)
        }
        if (hiring_signals['email_volume'] > baseline * 2.0 and
                hiring_signals['calendar_interviews'] > 3 and
                hiring_signals['keyword_frequency'] > baseline * 1.5):
            situations.append({
                'type': 'hiring_sprint',
                'confidence': 0.87,
                'evidence': hiring_signals,
                'started_at': self.estimate_start_date(signals),
                'expected_duration': 30  # days, learned from historical patterns
            })
        # Example: Tax Season Detection
        if (current_date.month in [3, 4] and
                count_emails(signals, sender_domain='cpa|accountant|irs') > baseline * 3):
            situations.append({
                'type': 'tax_season',
                'confidence': 0.95,
                'evidence': {...},
                'started_at': date(current_date.year, 3, 1),
                'expected_duration': 45
            })
        # Example: Board Preparation Detection
        board_signals = {
            'calendar_event': find_calendar_event(title_contains='board meeting'),
            'deck_mentions': count_keywords(['board deck', 'board materials', 'board prep'], signals),
            'leadership_emails': count_emails(signals, from_role='c-suite', about='board')
        }
        if board_signals['calendar_event'] and board_signals['deck_mentions'] > 2:
            days_until_meeting = (board_signals['calendar_event'].date - current_date).days
            situations.append({
                'type': 'board_prep',
                'confidence': 0.92,
                'deadline': board_signals['calendar_event'].date,
                'urgency_curve': 'exponential',  # urgency increases as deadline approaches
                'expected_duration': min(days_until_meeting, 21)
            })
        return situations

    def estimate_start_date(self, signals):
        """
        Use changepoint detection to find when signal pattern emerged.
        We use the PELT (Pruned Exact Linear Time) algorithm [Killick et al., 2012].
        """
        time_series = [signal.count for signal in signals]
        # PELT parameters:
        # - Cost function: Normal likelihood (assumes Gaussian noise)
        # - Penalty λ: Controls false positive rate
        #   Higher λ = fewer changepoints (more conservative)
        #   We use λ = log(n) * σ² where n = len(time_series), σ² = variance
        n = len(time_series)
        sigma_sq = np.var(time_series)
        penalty = np.log(n) * sigma_sq  # BIC-like penalty
        changepoints = detect_changepoints(
            time_series,
            cost_function='normal',  # Gaussian likelihood
            penalty=penalty,
            min_segment_length=3  # Require at least 3 days per segment
        )  # Returns indices where distribution changes
        # Alternative: Bayesian Online Changepoint Detection (BOCD)
        # Tradeoff: PELT is faster, BOCD provides uncertainty estimates
        # For situations needing probability of changepoint:
        # changepoint_probs = bayesian_online_changepoint(time_series, hazard=1/100)
        if changepoints:
            return signals[changepoints[-1]].date  # Most recent changepoint
        return signals[0].date  # No changepoint found, use start of window
3.3.2 Situation-Aware Urgency Adjustment
Once a situation is detected, we modify urgency scoring:
def compute_urgency_score(email, base_urgency, active_situations):
    """
    Adjust base urgency based on active situations.
    """
    adjusted_urgency = base_urgency
    for situation in active_situations:
        if is_relevant(email, situation):
            # Apply situation-specific urgency boost (capped at the top of the 1-10 scale)
            boost = compute_situation_boost(email, situation)
            adjusted_urgency = min(10, adjusted_urgency + boost)
    return adjusted_urgency

def compute_situation_boost(email, situation):
    """
    Calculate urgency boost for situation-relevant emails.
    """
    if situation['type'] == 'hiring_sprint':
        # Recruiting emails get +3 urgency during hiring sprints
        if 'recruiting' in email.domain:
            return 3.0
    elif situation['type'] == 'board_prep':
        # Board-related emails get stepwise more urgent as the deadline approaches
        days_until_deadline = (situation['deadline'] - now()).days
        if days_until_deadline <= 3:
            return 4.0  # Critical urgency
        elif days_until_deadline <= 7:
            return 3.0  # High urgency
        elif days_until_deadline <= 14:
            return 2.0  # Moderate urgency
    elif situation['type'] == 'tax_season':
        # Financial/tax emails get +2 urgency March-April
        if any(kw in email.sender for kw in ['cpa', 'accountant', 'tax']):
            return 2.0
    return 0.0

def is_relevant(email, situation):
    """
    Determine if email is relevant to situation.
    """
    relevance_keywords = {
        'hiring_sprint': ['candidate', 'interview', 'hire', 'recruiting', 'talent'],
        'tax_season': ['tax', 'cpa', 'irs', 'return', 'deduction', 'audit'],
        'board_prep': ['board', 'deck', 'materials', 'presentation', 'directors']
    }
    keywords = relevance_keywords[situation['type']]
    # Check subject, body, sender domain
    return any(kw in email.subject.lower() or
               kw in email.body.lower() or
               kw in email.sender.lower()
               for kw in keywords)
3.3.3 Automatic Situation Decay
Situations don't last forever. We detect when they end:
from datetime import timedelta

def check_situation_decay(situation, recent_signals):
    """
    Determine if situation has ended.
    """
    current_intensity = measure_signal_intensity(situation['type'], recent_signals)
    # If signal intensity drops below 50% of peak, situation is ending
    if current_intensity < situation['peak_intensity'] * 0.5:
        return True
    # If past expected duration + grace period
    days_active = (now() - situation['started_at']).days
    if days_active > situation['expected_duration'] * 1.2:
        return True
    # If deadline passed (for deadline-driven situations like board prep)
    if 'deadline' in situation and now() > situation['deadline']:
        return True
    return False

def deactivate_situation(situation):
    """
    Graceful situation deactivation.
    """
    # Gradually reduce urgency boosts over 3 days
    situation['decay_rate'] = 0.33  # per day
    situation['status'] = 'decaying'
    # After 3 days, fully deactivate
    schedule_job(delay=timedelta(days=3), job=lambda: situation.update(status='inactive'))
3.4 Progressive Trust Mode
3.4.1 Category-Specific Trust Accumulation
Trust is not monolithic—users may trust the system for scheduling but not for drafting emails:
from math import sqrt

class TrustTracker:
    """
    Track trust scores per decision category.
    """
    categories = [
        'email_urgency_scoring',
        'draft_generation',
        'meeting_scheduling',
        'email_archiving',
        'response_sending',
        'contact_prioritization'
    ]

    def __init__(self):
        # Initialize trust scores per category
        self.trust_scores = {cat: TrustScore(initial=0.0) for cat in self.categories}
        # Autonomy configs granted so far, keyed by category
        self.autonomy = {}

    def record_decision(self, category, ai_suggestion, user_action):
        """
        Track whether user accepted AI's suggestion.
        """
        agreement = (ai_suggestion == user_action)
        # Update category-specific trust score
        self.trust_scores[category].add_observation(agreement)
        # Check if category crossed autonomy threshold
        if self.trust_scores[category].count >= 20 and \
                self.trust_scores[category].accuracy >= 0.90:
            self.grant_autonomy(category)

    def grant_autonomy(self, category):
        """
        Enable autonomous action for trusted category.
        Uses Wilson confidence interval to ensure statistical significance.
        """
        trust_score = self.trust_scores[category]
        # Require both sufficient samples AND high accuracy with tight confidence
        sufficient_data = trust_score.count >= 20
        high_accuracy = trust_score.accuracy >= 0.90
        # Wilson 95% confidence interval for accuracy
        ci_lower, ci_upper = trust_score.confidence_interval
        # Require lower bound of CI to be above threshold (conservative)
        confident = ci_lower >= 0.85
        if not (sufficient_data and high_accuracy and confident):
            return  # Don't grant autonomy yet
        # Grant category-specific autonomy levels
        autonomy_levels = {
            'email_urgency_scoring': {
                'level': 'AUTONOMOUS',
                'actions': ['auto_label', 'auto_prioritize'],
                'require_approval': False,
                'confidence_threshold': 0.90  # High stakes need high confidence
            },
            'meeting_scheduling': {
                'level': 'SEMI_AUTONOMOUS',
                'actions': ['suggest_times', 'auto_send_invite'],
                'require_approval': True,  # still needs approval before sending
                'confidence_threshold': 0.85  # Medium stakes
            },
            'response_sending': {
                'level': 'SUGGEST_ONLY',  # High stakes, stay very cautious
                'actions': ['generate_draft'],
                'require_approval': True,
                'confidence_threshold': 0.95  # Highest threshold for sending on user's behalf
            },
            'email_archiving': {
                'level': 'AUTONOMOUS',
                'actions': ['auto_archive_handled', 'auto_delete_spam'],
                'require_approval': False,
                'confidence_threshold': 0.88  # Reversible action, medium confidence ok
            }
        }
        config = autonomy_levels.get(category)
        if config is None:
            return  # No autonomy level is defined for this category
        # Double-check accuracy meets category-specific threshold
        if trust_score.accuracy < config['confidence_threshold']:
            return  # Need higher accuracy for this category
        self.autonomy[category] = config
        # Notify user with calibration details
        send_notification(
            f"I've earned your trust in {category} "
            f"({trust_score.accuracy:.1%} accuracy over {trust_score.count} decisions, "
            f"95% CI: [{ci_lower:.1%}, {ci_upper:.1%}]). "
            f"I'll now {config['actions'][0]} automatically."
        )

class TrustScore:
    """
    Rolling accuracy tracker with confidence intervals.
    """
    def __init__(self, initial=0.0, window=50):
        self.initial = initial  # accuracy reported before any observation arrives
        self.observations = []
        self.window = window

    def add_observation(self, agreement: bool):
        self.observations.append(1 if agreement else 0)
        # Keep only recent observations
        if len(self.observations) > self.window:
            self.observations.pop(0)

    @property
    def accuracy(self):
        if not self.observations:
            return self.initial
        return sum(self.observations) / len(self.observations)

    @property
    def count(self):
        return len(self.observations)

    @property
    def confidence_interval(self):
        """
        Wilson score interval for binomial proportion.
        """
        if self.count < 5:
            return (0.0, 1.0)  # Too few samples
        z = 1.96  # 95% confidence
        p = self.accuracy
        n = self.count
        denominator = 1 + z**2 / n
        center = (p + z**2 / (2*n)) / denominator
        margin = z * sqrt(p * (1-p) / n + z**2 / (4*n**2)) / denominator
        return (center - margin, center + margin)
3.4.2 Reversible Autonomy
Trust can be lost. Continuous monitoring detects degradation:
def monitor_autonomous_actions(category, recent_actions):
    """
    Check if autonomous actions maintain quality.
    """
    if len(recent_actions) < 10:
        return  # Need enough data
    # Check recent accuracy over the last 20 actions (or fewer if not yet available)
    window = recent_actions[-20:]
    recent_accuracy = sum(a.user_approved for a in window) / len(window)
    # Check if accuracy dropped below threshold
    if recent_accuracy < 0.85:
        # Downgrade autonomy
        downgrade_trust(category, reason='accuracy_degraded')
        send_notification(
            f"I've made some mistakes with {category} recently. "
            f"I'll go back to suggesting rather than acting automatically "
            f"until I earn your trust again."
        )

def downgrade_trust(category, reason):
    """
    Reduce autonomy level for category.
    """
    current_level = autonomy[category]['level']
    if current_level == 'AUTONOMOUS':
        autonomy[category]['level'] = 'SEMI_AUTONOMOUS'
        autonomy[category]['require_approval'] = True
    elif current_level == 'SEMI_AUTONOMOUS':
        autonomy[category]['level'] = 'SUGGEST_ONLY'
    # Log for analysis
    log_event('trust_downgrade', category, reason, current_level)
3.5 Implementation Architecture
3.5.1 System Components
┌─────────────────────────────────────────────────────────────┐
│ Email Ingestion Layer                                       │
│ (Gmail API, Webhook, Pub/Sub) → Inngest Job Queue           │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│ Real-Time Processing                                        │
│ • Base urgency scoring (Claude Sonnet 4.5)                  │
│ • VIP detection                                             │
│ • Situation-aware adjustment                                │
│ • Trust-level action execution                              │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│ Behavioral Tracking                                         │
│ • PostgreSQL: email_interactions table                      │
│ • Metrics: response_time, completion, attention_duration    │
│ • Real-time: Redis counters for velocity                    │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│ Nightly Drift Analysis (Inngest Cron)                       │
│                                                             │
│ 1. Compute δ_velocity, δ_attention, δ_completion,           │
│    δ_interrupt                                              │
│ 2. Calculate composite drift score                          │
│ 3. If δ > threshold → generate recalibration prompt         │
│ 4. Update drift model weights                               │
│ 5. Detect/update active situations                          │
│ 6. Update trust scores per category                         │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│ User Interaction Layer                                      │
│ • SMS/Slack: Recalibration prompts                          │
│ • Dashboard: Drift visualization, situation timeline        │
│ • API: Goal updates, manual situation triggers              │
└─────────────────────────────────────────────────────────────┘

3.5.2 Data Schema (PostgreSQL)
-- User Goals (Stated Preferences - G_t)
CREATE TABLE user_goals (
    id UUID PRIMARY KEY,
    user_id UUID REFERENCES users(id),
    goal_type TEXT,  -- 'focus_area', 'ignore_pattern', 'response_target'
    domain TEXT,  -- 'recruiting', 'product', 'finance', etc.
    priority INTEGER,  -- 1-10
    target_response_hours DECIMAL,
    created_at TIMESTAMP,
    updated_at TIMESTAMP,
    -- For drift comparison
    stated_at TIMESTAMP,  -- when user declared this
    last_reaffirmed TIMESTAMP  -- when user last confirmed this
);

-- Behavioral Tracking (Revealed Preferences - B_t)
CREATE TABLE email_interactions (
    id UUID PRIMARY KEY,
    user_id UUID REFERENCES users(id),
    email_id UUID REFERENCES emails(id),
    -- Interaction signals
    opened_at TIMESTAMP,
    response_time_seconds INTEGER,  -- time to respond
    marked_handled_at TIMESTAMP,
    archived_at TIMESTAMP,
    attention_duration_seconds INTEGER,  -- time spent reading/acting
    -- Contextual features
    email_domain TEXT,
    email_urgency_score INTEGER,
    email_category TEXT,
    -- Derived features (computed nightly)
    response_velocity TEXT,  -- 'instant', 'same_day', 'next_day', 'week+', 'never'
    created_at TIMESTAMP
);

-- Drift Tracking
CREATE TABLE drift_events (
    id UUID PRIMARY KEY,
    user_id UUID REFERENCES users(id),
    computed_at TIMESTAMP,
    -- Component drift scores
    velocity_drift DECIMAL,
    attention_drift DECIMAL,
    completion_drift DECIMAL,
    interruption_drift DECIMAL,
    -- Composite
    total_drift_score DECIMAL,
    threshold_at_time DECIMAL,
    triggered_recalibration BOOLEAN,
    -- Metadata
    dominant_drift_component TEXT,
    affected_domains TEXT[],
    created_at TIMESTAMP
);

-- Recalibration Events
CREATE TABLE recalibration_events (
    id UUID PRIMARY KEY,
    drift_event_id UUID REFERENCES drift_events(id),
    user_id UUID REFERENCES users(id),
    -- Prompt sent to user
    recalibration_prompt TEXT,
    prompt_sent_at TIMESTAMP,
    -- User response
    user_response TEXT,  -- 'priorities_changed', 'enforce_goals', 'no_change', 'dismissed'
    responded_at TIMESTAMP,
    -- Outcome
    goals_updated BOOLEAN,
    enforcement_enabled BOOLEAN,
    -- For model learning
    drift_score_at_prompt DECIMAL,
    threshold_at_prompt DECIMAL,
    successful_prompt BOOLEAN,  -- did user engage meaningfully?
    created_at TIMESTAMP
);

-- Situations (Temporal Contexts)
CREATE TABLE situations (
    id UUID PRIMARY KEY,
    user_id UUID REFERENCES users(id),
    situation_type TEXT,  -- 'hiring_sprint', 'tax_season', 'board_prep', etc.
    -- Detection metadata
    detected_at TIMESTAMP,
    detection_confidence DECIMAL,
    detection_signals JSONB,  -- evidence used for detection
    -- Lifecycle
    status TEXT,  -- 'active', 'decaying', 'inactive'
    started_at TIMESTAMP,
    expected_end_at TIMESTAMP,
    actual_ended_at TIMESTAMP,
    -- For urgency adjustment
    urgency_boost DECIMAL,
    relevant_domains TEXT[],
    -- For deadline-driven situations
    deadline TIMESTAMP,
    urgency_curve TEXT,  -- 'constant', 'linear', 'exponential'
    created_at TIMESTAMP
);

-- Situation Preferences (user can customize)
CREATE TABLE situation_preferences (
    id UUID PRIMARY KEY,
    user_id UUID REFERENCES users(id),
    situation_type TEXT,
    enabled BOOLEAN DEFAULT TRUE,
    urgency_boost_override DECIMAL,  -- user can adjust default boost
    notify_on_detection BOOLEAN DEFAULT TRUE,
    created_at TIMESTAMP,
    UNIQUE(user_id, situation_type)
);

-- Trust Scores
CREATE TABLE trust_mode_events (
    id UUID PRIMARY KEY,
    user_id UUID REFERENCES users(id),
    category TEXT,  -- 'email_urgency_scoring', 'draft_generation', etc.
    ai_suggestion JSONB,
    user_action JSONB,
    agreement BOOLEAN,
    -- Trust progression
    trust_score_before DECIMAL,
    trust_score_after DECIMAL,
    autonomy_level TEXT,  -- 'SUGGEST_ONLY', 'SEMI_AUTONOMOUS', 'AUTONOMOUS'
    -- If autonomy granted/revoked
    autonomy_changed BOOLEAN,
    created_at TIMESTAMP
);

-- Indexes for performance
CREATE INDEX idx_interactions_user_time ON email_interactions(user_id, created_at DESC);
CREATE INDEX idx_drift_events_user ON drift_events(user_id, computed_at DESC);
CREATE INDEX idx_situations_active ON situations(user_id, status) WHERE status = 'active';
CREATE INDEX idx_trust_category ON trust_mode_events(user_id, category);

3.5.3 Nightly Drift Analysis Job (Inngest)
// Scheduled job that runs at 2am daily
export const nightlyDriftAnalysis = inngest.createFunction(
  { id: "nightly-drift-analysis" },
  { cron: "0 2 * * *" }, // 2am daily
  async ({ step }) => {
    // Process all active users
    const activeUsers = await step.run("fetch-active-users", async () => {
      return db.users.findMany({
        where: { status: 'active', goals_defined: true }
      });
    });

    for (const user of activeUsers) {
      await step.run(`analyze-drift-${user.id}`, async () => {
        // 1. Gather behavioral data from past 14 days
        const behaviorData = await gatherBehaviorData(user.id, 14); // lookback window in days

        // 2. Load stated goals
        const statedGoals = await db.user_goals.findMany({
          where: { user_id: user.id, active: true }
        });

        // 3. Compute drift metrics
        const driftMetrics = computeDriftScore(statedGoals, behaviorData);

        // 4. Store drift event
        const driftEvent = await db.drift_events.create({
          data: {
            user_id: user.id,
            velocity_drift: driftMetrics.velocity,
            attention_drift: driftMetrics.attention,
            completion_drift: driftMetrics.completion,
            interruption_drift: driftMetrics.interruption,
            total_drift_score: driftMetrics.total,
            threshold_at_time: user.recalibration_threshold,
            triggered_recalibration: false,
            dominant_drift_component: driftMetrics.dominant,
            affected_domains: driftMetrics.affectedDomains
          }
        });

        // 5. Check if recalibration needed
        if (shouldTriggerRecalibration(driftMetrics.total, user)) {
          const prompt = generateRecalibrationPrompt(driftMetrics, statedGoals, behaviorData);
          // Send via user's preferred channel (SMS/Slack)
          await sendRecalibrationPrompt(user, prompt);
          await db.drift_events.update({
            where: { id: driftEvent.id },
            data: { triggered_recalibration: true }
          });
        }

        // 6. Update situation detection
        const situations = await detectSituations(user.id);
        await updateActiveSituations(user.id, situations);

        // 7. Update trust scores
        await updateTrustScores(user.id);

        // 8. Meta-learning: Update drift model weights
        await updateDriftModelWeights(user.id);
      });
    }
  }
);

4. Theoretical Foundations
4.1 The Temporal Preference Alignment Problem
We formalize the problem as follows:
Definition 4.1 (Stated Preferences): At time t₀, user declares preference function G: X → ℝ mapping email features X to priority scores.
Definition 4.2 (Revealed Preferences): Behavioral function B_t: X → ℝ inferred from user interactions at time t.
Definition 4.3 (Temporal Alignment): Preferences are aligned at time t if ||G(x) - B_t(x)|| < ε for all x ∈ X and small ε.
Observation 4.1 (Drift Inevitability in Non-Stationary Contexts): Under non-stationary user contexts, without periodic recalibration, the expected divergence between stated preferences G and behavioral preferences B_t grows without bound: lim_{t→∞} E[||G - B_t||] = ∞.
Intuition: User contexts evolve continuously (job changes, life events, strategic shifts) while stated preferences G are recorded at discrete time points t₀, t₁, ... If behavioral preferences B_t adapt to context but G remains fixed between recalibrations, the gap accumulates. A formal proof would require specifying (1) a stochastic process for context evolution C_t, (2) a mapping from contexts to preferences B_t = f(C_t), (3) sampling intervals for G updates, and (4) appropriate distance metrics with bounded norms. We leave this formalization to future work.
Corollary 4.1: Periodic recalibration is necessary to maintain bounded alignment in non-stationary environments.
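Observation 4.1 and Corollary 4.1 can be illustrated with a toy simulation. The sketch below assumes a one-dimensional model in which the behavioral preference B_t follows a Gaussian random walk while the stated preference G stays fixed until a recalibration resets the gap to zero. The random-walk model, the function name mean_gap, and all parameter values are illustrative assumptions, not part of the system described above.

```python
import random

def mean_gap(days, recal_every=None, trials=1000, sigma=1.0, seed=0):
    """Average |G - B_t| over all days and trials in a toy random-walk model.

    Assumption (illustrative only): B_t performs a Gaussian random walk, and
    a recalibration resets G to the current B_t, so the gap returns to zero.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        gap = 0.0
        for day in range(1, days + 1):
            gap += rng.gauss(0.0, sigma)   # behavior drifts away from G
            total += abs(gap)
            if recal_every and day % recal_every == 0:
                gap = 0.0                  # recalibration realigns G with B_t
    return total / (trials * days)

no_recal_90 = mean_gap(90)                      # no recalibration, 90 days
no_recal_360 = mean_gap(360)                    # no recalibration, 360 days
biweekly_360 = mean_gap(360, recal_every=14)    # recalibrate every 14 days
print(no_recal_90, no_recal_360, biweekly_360)
```

In this toy model the average gap without recalibration keeps growing as the horizon lengthens (roughly with the square root of time), while recalibrating every 14 days keeps it bounded by what can accumulate within a single interval.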
4.2 Optimal Recalibration Frequency
Recalibration frequency involves a tradeoff: prompting too often annoys the user, while prompting too rarely lets misalignment accumulate.
We model recalibration cost as:
C(δ, λ) = α · δ² + β · λ
where:
- δ: drift magnitude (misalignment cost)
- λ: recalibration frequency (interruption cost)
- α, β: user-specific weights
The optimal recalibration frequency minimizes expected cost:
λ* = argmin_λ E[α · δ(λ)² + β · λ]
This explains why our threshold is learned per user: users differ in how heavily they weight interruptions (β) relative to misalignment (α).
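As a worked example of this minimization, the sketch below adds one assumption that the text does not make: squared drift grows linearly at rate σ² per day after each recalibration, so its time average over an interval of length 1/λ is σ²/(2λ). Under that assumption the expected cost is α·σ²/(2λ) + β·λ, which is minimized at λ* = sqrt(α·σ²/(2β)). The function names and the example weights are hypothetical.

```python
import math

def expected_cost(lam, alpha, beta, sigma2):
    """E[alpha * delta(lam)^2 + beta * lam] under the toy linear-growth model.

    Assumption (not from the paper): squared drift grows linearly at rate
    sigma2 per day after a recalibration, so its time average over an
    interval of length 1/lam is sigma2 / (2 * lam).
    """
    return alpha * sigma2 / (2.0 * lam) + beta * lam

def optimal_frequency(alpha, beta, sigma2):
    """Closed-form minimizer, from setting d(expected_cost)/d(lam) to zero."""
    return math.sqrt(alpha * sigma2 / (2.0 * beta))

def grid_search(alpha, beta, sigma2, steps=100_000, lam_max=2.0):
    """Numerical check of the closed form on a grid of frequencies (per day)."""
    grid = (lam_max * i / steps for i in range(1, steps + 1))
    return min(grid, key=lambda lam: expected_cost(lam, alpha, beta, sigma2))

# Two hypothetical users with the same drift rate but different weights.
interruption_averse = optimal_frequency(alpha=1.0, beta=50.0, sigma2=1.0)
misalignment_averse = optimal_frequency(alpha=4.0, beta=50.0, sigma2=1.0)
print(1 / interruption_averse, 1 / misalignment_averse)  # days between prompts
```

With these illustrative weights, the user who weights misalignment four times more heavily is prompted about twice as often (about every 5 days instead of every 10), which matches the square-root dependence of λ* on α/β.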
4.3 Multi-Armed Bandit Formulation
Situation detection can be viewed as a contextual bandit problem [Li et al., 2010]:
- Arms: Different urgency boost values [0, 1, 2, 3, 4, 5]
- Context: Email features + detected situation type
- Reward: User satisfaction signal, measured as a weighted combination of:
  - Response appropriateness (0.4 weight): Did user respond within expected timeframe for the assigned urgency? reward = 1 if response_time <= expected_time(urgency), else 0
  - Override rate (0.3 weight): Did user manually change the urgency score? reward = 1 if no_override, else 0
  - Completion rate (0.2 weight): Did user mark as handled within 48h? reward = 1 if completed, else 0
  - Explicit feedback (0.1 weight): Did user thumbs-up/down the urgency? reward = +1 / -1 if feedback given, else 0
Composite reward function:
import numpy as np

def compute_reward(email_interaction, urgency_assigned, situation):
    """
    Compute reward for assigned urgency in given situation context.
    """
    r_response = 1.0 if email_interaction.response_time <= expected_response_time(urgency_assigned) else 0.0
    r_override = 1.0 if not email_interaction.urgency_manually_changed else 0.0
    r_completion = 1.0 if email_interaction.completed_within_48h else 0.0
    r_explicit = email_interaction.explicit_feedback_score  # +1, 0, or -1
    reward = 0.4*r_response + 0.3*r_override + 0.2*r_completion + 0.1*r_explicit
    # Clip to [-1, 1]
    return np.clip(reward, -1.0, 1.0)

Guardrails:
- Cap maximum boost at +4 to prevent over-urgency
- Require minimum 10 observations before allowing boosts >2
- Force exploration: 10% epsilon-greedy for rare situation types
We use Thompson Sampling [Russo et al., 2018] to balance exploration (trying new boosts) and exploitation (using known-good boosts):
import random

def select_urgency_boost(email, situation):
    """
    Thompson sampling for situation-aware urgency.
    Each arm (boost level) has a Beta posterior from historical rewards.
    """
    # For each possible boost level
    sampled_reward = {}
    for boost in [0, 1, 2, 3, 4, 5]:
        # Get historical reward statistics for this boost in this situation type
        successes = situation.boost_successes[boost]  # rewards > 0.5
        failures = situation.boost_failures[boost]    # rewards <= 0.5
        # Sample from posterior distribution (Beta for binary rewards);
        # the +1 terms encode an uninformative Beta(1, 1) prior
        sampled_reward[boost] = random.betavariate(successes + 1, failures + 1)
    # Select boost with highest sampled reward
    selected_boost = max(sampled_reward, key=sampled_reward.get)
    # Guardrails
    if situation.total_observations < 10 and selected_boost > 2:
        selected_boost = 2  # Conservative during learning
    return min(selected_boost, 4)  # Cap at +4 (arm 5 is sampled but never returned)

5. Experimental Considerations
This work primarily establishes technical prior art. The following metrics and targets describe the experimental validation that would demonstrate efficacy:
5.1 Drift Detection Accuracy
Metric: Precision and recall of drift detection prompts.
- Precision: % of recalibration prompts where user confirms priorities changed
- Recall: % of actual priority changes that were detected
Target: Precision > 75%, Recall > 80%
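The two metrics above could be computed as in the following sketch. It assumes each evaluation period is labeled with whether a prompt was sent and whether the user's priorities had in fact changed; obtaining that ground truth for periods without a prompt (for example through periodic check-ins) is an assumption, and the DayOutcome type and example counts are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DayOutcome:
    prompted: bool            # a recalibration prompt was sent
    priorities_changed: bool  # ground truth: the user's priorities had changed

def precision_recall(outcomes):
    """Return (precision, recall) of recalibration prompts."""
    tp = sum(o.prompted and o.priorities_changed for o in outcomes)
    fp = sum(o.prompted and not o.priorities_changed for o in outcomes)
    fn = sum(not o.prompted and o.priorities_changed for o in outcomes)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical evaluation set: 8 confirmed prompts, 2 unconfirmed prompts,
# 1 missed priority change, 20 quiet periods.
outcomes = (
    [DayOutcome(True, True)] * 8
    + [DayOutcome(True, False)] * 2
    + [DayOutcome(False, True)] * 1
    + [DayOutcome(False, False)] * 20
)
precision, recall = precision_recall(outcomes)
meets_targets = precision > 0.75 and recall > 0.80
print(precision, recall, meets_targets)
```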
5.2 Time-to-Alignment
Metric: Days until drift score returns to baseline after recalibration.
Target: < 3 days (rapid re-alignment)
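A minimal sketch of this metric, assuming one composite drift score per day and a per-user baseline value; the function name, the tolerance parameter, and the example series are hypothetical.

```python
def days_to_alignment(daily_drift, recal_day, baseline, tolerance=0.0):
    """Days after recal_day until drift first falls to baseline + tolerance.

    daily_drift: one composite drift score per day (index = day number).
    Returns None if drift never returns to baseline within the series.
    """
    for offset, score in enumerate(daily_drift[recal_day + 1:], start=1):
        if score <= baseline + tolerance:
            return offset
    return None

# Hypothetical series: drift peaks on day 2, when the recalibration happens.
drift = [0.2, 0.5, 0.8, 0.6, 0.3, 0.2]
elapsed = days_to_alignment(drift, recal_day=2, baseline=0.3)
print(elapsed, elapsed is not None and elapsed < 3)
```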
5.3 Situation Detection Latency
Metric: Days between situation start and automatic detection.
Target: < 5 days for high-signal situations (hiring sprints)
5.4 User Engagement
Metric: % of recalibration prompts that receive meaningful responses (not dismissed).
Target: > 70% engagement
5.5 Trust Calibration
Metric: Category-specific accuracy at autonomy grant threshold.
Target: > 90% accuracy when autonomy granted, < 85% triggers downgrade
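The two thresholds form a hysteresis band, which the sketch below makes explicit. The downgrade below 85% follows the trust logic described earlier; the rule that accuracy above 90% moves a category up exactly one level is an assumption made for illustration, as are the function and parameter names.

```python
LEVELS = ["SUGGEST_ONLY", "SEMI_AUTONOMOUS", "AUTONOMOUS"]

def next_level(level, recent_accuracy, grant_at=0.90, downgrade_below=0.85):
    """Return the autonomy level after one evaluation of recent accuracy.

    Between the two thresholds the level is left unchanged, so a category
    does not oscillate when accuracy hovers near a single cutoff.
    """
    i = LEVELS.index(level)
    if recent_accuracy < downgrade_below:
        i = max(i - 1, 0)                  # step down one level
    elif recent_accuracy > grant_at:
        i = min(i + 1, len(LEVELS) - 1)    # step up one level (assumed rule)
    return LEVELS[i]

print(next_level("AUTONOMOUS", 0.80))
print(next_level("SEMI_AUTONOMOUS", 0.87))
print(next_level("SUGGEST_ONLY", 0.95))
```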
6. Discussion
6.1 Key Innovations
To our knowledge, this work presents a practical production design that combines several techniques in a novel configuration for attention management systems:
1. Behavioral Drift Detection: A system that continuously monitors divergence between stated and revealed preferences using multi-dimensional drift metrics (velocity, attention allocation, completion rates, interruption tolerance) with learned per-user thresholds. While preference learning and behavioral tracking exist independently, we are not aware of prior systems that explicitly compute and surface this divergence with conversational intervention. This relates to concept drift detection in machine learning [Gama et al., 2014], but applies it to the human preference domain rather than data distribution shifts.
2. Conversational Recalibration: Rather than silently learning from behavior or rigidly following stated rules, we explicitly surface drift and ask users to resolve ambiguity through conversation. This respects user agency while maintaining alignment.
3. Automatic Situation Detection: An implementation of unsupervised temporal context detection using multi-signal analysis (email patterns, calendar data, keywords) with dynamic urgency recalibration that automatically begins and ends without manual triggers.
Prior Art Welcome: If you know of prior work in this space that we've missed, please reach out at chance@getprecedent.ai. We want this document to accurately represent the state of the art—both for intellectual honesty and to establish exactly what's new in our specific approach.
6.2 Limitations
Cold Start: We can't detect drift for new users until we have 2-4 weeks of behavioral data. During this period, the system relies solely on stated preferences without drift monitoring. This is a fundamental tradeoff—accurate drift detection requires history, and we prioritize precision over speed. Users in their first month receive standard urgency scoring without personalized drift detection.
Domain Specificity: The current implementation targets email only. The techniques are generalizable, but the implementation details are email-specific.
Privacy: Requires access to email content and metadata. Implementation uses a two-tier encryption and retention model to balance functionality with data minimization.
Two-Tier Content Storage Architecture:
We store email content in two encrypted forms with different retention periods:
1. Full content (encrypted, 72-hour retention):
   - Purpose: High-fidelity reply drafting and immediate action items
   - Contains: Complete subject and body, encrypted per-tenant
   - Retention: 72 hours, then permanently deleted
   - Access: Decryption logged via auditDecryption() with reason and caller
2. Redacted content (encrypted, 21-day retention):
   - Purpose: Reminders, follow-ups, search, drift detection
   - Contains: Subject and body with sensitive content masked (see redaction rules below)
   - Retention: 21 days, then permanently deleted
   - Access: Decryption logged via auditDecryption() with reason and caller
Redaction Rules (applied before encryption for long-term storage):
- Dollar amounts / deal values → [AMOUNT]
- M&A language ("acquire", "term sheet", "funding round") → [DEAL]
- Email addresses → [CONTACT]
- Phone numbers → [PHONE]
- Health disclosures ("diagnosed with", medical terms) → [HEALTH]
- SSNs, bank account numbers → [PII]
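The rules above can be sketched as an ordered list of regular-expression substitutions. The patterns below are illustrative assumptions rather than the production rules: they cover US-style phone numbers and SSNs only, omit bank account numbers and a medical-term lexicon, and mask only the trigger phrase of a health disclosure.

```python
import re

# Illustrative patterns only. Order matters: email addresses are masked
# before the digit-based rules so digits inside addresses are not touched.
REDACTION_RULES = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[CONTACT]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[PII]"),            # US SSN format
    (re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b"), "[PHONE]"),  # US phone format
    (re.compile(r"\$\s?\d[\d,]*(?:\.\d+)?(?:\s?(?:[kKmMbB]|million|billion)\b)?"),
     "[AMOUNT]"),
    (re.compile(r"\b(?:acquire|term sheet|funding round)\b", re.IGNORECASE),
     "[DEAL]"),
    (re.compile(r"\bdiagnosed with\b", re.IGNORECASE), "[HEALTH]"),
]

def redact(text):
    """Apply each rule in order and return the masked text."""
    for pattern, token in REDACTION_RULES:
        text = pattern.sub(token, text)
    return text

sample = "Call 415-555-0123 or email jane@example.com about the $2.5M term sheet."
print(redact(sample))
```

A production redactor would likely need locale-aware phone and identifier formats and a curated vocabulary for deal and medical language; simple patterns like these trade recall for predictability.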
Encryption Model:
- Per-tenant encryption keys (not environment-wide)
- Keys managed separately from data (KMS-backed in production)
- All email content stored as encrypted blobs only—no plaintext in database columns
- Two encrypted fields per message: body_encrypted (full, 72h) + body_redacted_encrypted (redacted, 21d)
Fields Retained (with encryption and hashing):
Email metadata:
- sender_domain_hash: SHA-256 hash - non-reversible
- sender_email_hash: SHA-256 hash - non-reversible
- thread_id_hash: SHA-256 hash - non-reversible
- received_timestamp: Datetime
- urgency_score: Derived integer [1-10]
- category: Derived category string
- expires_full_at: 72-hour TTL for full content
- expires_redacted_at: 21-day TTL for redacted content
Interaction data retained indefinitely (for drift detection):
- interaction_timestamp: When user opened/responded/handled
- response_time_seconds: Time to respond (not content)
- attention_duration_seconds: Time spent (not content)
- action_taken: Enum of {opened, replied, archived, marked_handled, deleted}
- category: Email category (for drift computation)
Aggregated features:
- domain_response_velocity: Median response time per domain (daily rollup)
- category_attention_allocation: Time distribution across categories (daily)
- drift_scores: Weekly drift metric snapshots
- vip_sender_scores: Frequency-based VIP rankings
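The hashed metadata fields could be derived as in the sketch below. The field list specifies only SHA-256; keying the hash with a per-tenant secret (HMAC-SHA-256) is an assumption added here, because unkeyed hashes of low-entropy values such as email addresses can be matched against a dictionary of candidate inputs. The function names and the lowercasing of addresses are likewise illustrative.

```python
import hashlib
import hmac

def keyed_sha256(value: str, tenant_key: bytes) -> str:
    """Hex-encoded HMAC-SHA-256 of value under a per-tenant key (assumption)."""
    return hmac.new(tenant_key, value.encode("utf-8"), hashlib.sha256).hexdigest()

def metadata_hashes(sender_email: str, thread_id: str, tenant_key: bytes) -> dict:
    """Derive the three hashed metadata fields for one message."""
    email = sender_email.strip().lower()   # normalize so equal addresses match
    domain = email.rsplit("@", 1)[-1]
    return {
        "sender_email_hash": keyed_sha256(email, tenant_key),
        "sender_domain_hash": keyed_sha256(domain, tenant_key),
        "thread_id_hash": keyed_sha256(thread_id, tenant_key),
    }

hashes = metadata_hashes("Jane@Example.com", "thread-123", b"tenant-a-secret")
print(sorted(hashes))
```

Because the hash is deterministic per tenant, the same sender still maps to the same value across messages, which is what frequency-based features such as VIP rankings require.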
Data Retention Policy:
- Full encrypted content: 72 hours, then column NULLed
- Redacted encrypted content: 21 days, then row deleted
- Interaction metadata: Retained for drift analysis (no email content)
- Aggregated features: Retained indefinitely (cannot reconstruct emails)
- User deletion: All data purged within 24 hours of account closure
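The first two retention rules reduce to simple timestamp arithmetic on the expires_full_at and expires_redacted_at fields. The sketch below assumes a periodic cleanup job that evaluates each message; the function names and the returned action labels are hypothetical.

```python
from datetime import datetime, timedelta, timezone

FULL_TTL = timedelta(hours=72)      # full encrypted content
REDACTED_TTL = timedelta(days=21)   # redacted encrypted content

def expiry_times(received_at):
    """Compute both TTL deadlines for a message at ingestion time."""
    return {
        "expires_full_at": received_at + FULL_TTL,
        "expires_redacted_at": received_at + REDACTED_TTL,
    }

def purge_action(received_at, now):
    """Decide what a cleanup job should do with a message at time now."""
    expiry = expiry_times(received_at)
    if now >= expiry["expires_redacted_at"]:
        return "delete_row"           # redacted content expired: drop the row
    if now >= expiry["expires_full_at"]:
        return "null_full_content"    # full content expired: NULL the column
    return "keep"

received = datetime(2025, 10, 1, 9, 0, tzinfo=timezone.utc)
for days in (1, 4, 22):
    print(days, purge_action(received, received + timedelta(days=days)))
```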
Security Controls:
- Per-tenant encryption for all email content
- At-rest encryption (AES-256) for database
- Field-level hashing for PII (sender emails, thread IDs) - non-reversible
- Row-level security (RLS) ensures users only access own data
- Audit logging for every decryption operation (auditDecryption with timestamp, reason, caller)
- No raw email content in database; encrypted blobs only
- Human access forbidden except via explicit "break glass" flow (logged and ticketed)
Computational Cost: Nightly drift analysis at scale requires significant compute. We batch process and use cached embeddings to manage costs, but this creates a tradeoff: drift detection runs daily rather than real-time. We chose accuracy and cost-efficiency over instant detection—for most users, detecting drift within 24 hours is sufficient.
6.3 Future Directions
Multi-Modal Drift: Extend to calendar, Slack, documents—detect when stated project priorities diverge from actual time allocation.
Causal Drift Attribution: Use causal inference to determine why drift occurred (external factors vs internal preference change).
Federated Learning: Enable drift detection without centralized behavioral data through federated drift metrics.
Explainable Recalibration: Improve interpretability of drift signals to help users understand why the system detected drift.
7. Conclusion
We have presented a comprehensive system for behavioral drift detection and conversational recalibration in AI-powered attention management. By continuously monitoring the divergence between stated goals and revealed behavior, automatically detecting temporal contexts, and implementing category-specific progressive trust, our system maintains long-term alignment between user intent and system behavior.
The technical details provided—including algorithms, data schemas, implementation architecture, and theoretical foundations—establish clear prior art for these innovations. We believe these techniques represent meaningful advances in human-AI alignment for personal productivity systems and hope this publication spurs further research in temporal preference learning and proactive recalibration mechanisms.
References
Christiano, P. F., et al. (2017). Deep reinforcement learning from human preferences. NeurIPS.
Gama, J., et al. (2014). A Survey on Concept Drift Adaptation. ACM Computing Surveys.
Goodrich, M. A., & Schultz, A. C. (2007). Human-robot interaction: a survey. Foundations and Trends in Human–Computer Interaction.
Killick, R., et al. (2012). PELT: Optimal Detection of Changepoints with Linear Computational Cost. JASA.
Li, L., et al. (2010). A contextual-bandit approach to personalized news article recommendation. WWW.
Rubner, Y., et al. (2000). The earth mover's distance as a metric for image retrieval. IJCV.
Russo, D., et al. (2018). A Tutorial on Thompson Sampling. Foundations & Trends in ML.
Sadigh, D., et al. (2017). Active preference-based learning of reward functions. RSS.
Scerri, P., et al. (2002). Designing agents for systems with adjustable autonomy. IJCAI.
Appendix A: Pseudocode for Core Algorithms
A.1 Complete Drift Detection Algorithm
from typing import Optional

def nightly_drift_detection(user_id: str) -> Optional["RecalibrationPrompt"]:
    """
    Full drift detection pipeline.
    """
    # 1. Gather data
    stated_goals = load_user_goals(user_id)
    behavior_data = load_interactions(user_id, window_days=14)
    user_history = load_drift_history(user_id)
    if len(behavior_data) < 50:  # Insufficient data
        return None

    # 2. Compute component drifts
    drift_components = {
        'velocity': compute_velocity_drift(
            stated_goals.response_targets,
            behavior_data.response_times
        ),
        'attention': compute_attention_drift(
            stated_goals.focus_areas,
            behavior_data.time_allocation
        ),
        'completion': compute_completion_drift(
            stated_goals.priorities,
            behavior_data.completion_rates
        ),
        'interruption': compute_interruption_drift(
            stated_goals.notification_settings,
            behavior_data.interrupt_responses
        )
    }

    # 3. Compute weighted composite score
    weights = user_history.learned_weights  # personalized per user
    drift_score = sum(weights[k] * drift_components[k]
                      for k in drift_components)

    # 4. Normalize using historical distribution
    drift_percentile = compute_percentile(
        drift_score,
        user_history.drift_distribution
    )

    # 5. Store drift event
    store_drift_event(
        user_id=user_id,
        components=drift_components,
        total_score=drift_score,
        percentile=drift_percentile,
        threshold=user_history.recalibration_threshold
    )

    # 6. Check recalibration threshold
    if drift_percentile < user_history.recalibration_threshold:
        return None  # No recalibration needed

    # 7. Check recency (don't spam); skipped if no prompt has been sent yet
    last = user_history.last_recalibration
    if last is not None and (now() - last).days < 7:
        return None  # Too soon

    # 8. Generate recalibration prompt from the largest (unweighted) component
    dominant_component = max(drift_components.items(), key=lambda x: x[1])
    prompt = generate_contextual_prompt(
        drift_type=dominant_component[0],
        magnitude=dominant_component[1],
        stated_goals=stated_goals,
        behavior_data=behavior_data
    )

    # 9. Update history
    user_history.last_recalibration = now()
    user_history.save()
    return RecalibrationPrompt(
        user_id=user_id,
        prompt_text=prompt,
        drift_score=drift_score,
        dominant_component=dominant_component[0]
    )

A.2 Situation Detection Algorithm
from typing import List, Optional

def detect_situations(user_id: str, lookback_days: int = 14) -> List["Situation"]:
    """
    Detect active situations using multi-signal analysis.
    """
    signals = gather_signals(user_id, lookback_days)
    situations = []
    # Pattern matchers for different situation types
    matchers = [
        HiringSprintMatcher(),
        TaxSeasonMatcher(),
        BoardPrepMatcher(),
        ProductLaunchMatcher(),
        ContractNegotiationMatcher()
    ]
    for matcher in matchers:
        if match := matcher.detect(signals):
            situations.append(Situation(
                type=matcher.situation_type,
                confidence=match.confidence,
                evidence=match.evidence,
                started_at=estimate_start_date(signals, match),
                expected_duration=matcher.typical_duration,
                urgency_boost=matcher.default_boost,
                relevant_domains=matcher.relevant_domains
            ))
    return situations

class HiringSprintMatcher:
    situation_type = 'hiring_sprint'
    typical_duration = 30  # days
    default_boost = 3.0
    relevant_domains = ['recruiting', 'talent', 'hr']

    def detect(self, signals: "SignalBundle") -> Optional["Match"]:
        # Email volume spike
        recruiting_emails = signals.email_counts['recruiting']
        # Floor of 1 avoids division by zero for users with no baseline (assumption)
        baseline = max(signals.baseline_counts['recruiting'], 1)
        if recruiting_emails < baseline * 2:
            return None  # No significant spike

        # Calendar confirmation
        interview_count = len([e for e in signals.calendar_events
                               if 'interview' in e.title.lower()])
        if interview_count < 3:
            return None  # Not enough interviews scheduled

        # Keyword frequency
        keywords = ['candidate', 'hire', 'talent', 'recruiting', 'interview']
        keyword_mentions = sum(signals.keyword_counts[kw] for kw in keywords)
        # Same floor as above, for the same reason (assumption)
        keyword_baseline = max(sum(signals.baseline_keywords[kw] for kw in keywords), 1)
        if keyword_mentions < keyword_baseline * 1.5:
            return None  # Not enough keyword signal

        # Compute confidence based on signal strength
        confidence = min(1.0, (
            0.4 * (recruiting_emails / baseline / 2) +          # email volume weight
            0.3 * (interview_count / 5) +                       # calendar weight
            0.3 * (keyword_mentions / keyword_baseline / 1.5)   # keyword weight
        ))
        return Match(
            confidence=confidence,
            evidence={
                'email_volume_ratio': recruiting_emails / baseline,
                'interview_count': interview_count,
                'keyword_frequency_ratio': keyword_mentions / keyword_baseline
            }
        )

Appendix B: Deployment Considerations
B.1 Scalability
Single-User Performance:
- Drift analysis: ~2-5 seconds/user (PostgreSQL + Python)
- At that rate, a single worker processes 10,000 users in roughly 6-14 hours (~10 hours typical) as a nightly batch
Optimization Strategies:
- Parallel processing (10 workers → ~1 hour for 10k users)
- Incremental computation (cache intermediate drift metrics)
- Sampling for very active users (analyze subset of emails)
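The batch-time figures above follow from simple arithmetic, sketched below under the assumption that work divides evenly across workers with no coordination overhead; the function name is hypothetical.

```python
def batch_hours(users, seconds_per_user, workers=1):
    """Wall-clock hours for a nightly batch, assuming perfect parallel scaling."""
    return users * seconds_per_user / workers / 3600

for secs in (2, 5):
    single = batch_hours(10_000, secs)
    parallel = batch_hours(10_000, secs, workers=10)
    print(f"{secs}s/user: {single:.1f}h on 1 worker, {parallel:.1f}h on 10 workers")
```

At 2-5 seconds per user this gives roughly 5.6 to 13.9 hours on one worker and 0.6 to 1.4 hours on ten, consistent with the approximate figures quoted above.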
B.2 Privacy & Data Retention
Data Minimization:
- Two-tier encrypted storage: full content (72h) + redacted content (21d)
- Per-tenant encryption keys, never store plaintext in database
- Redaction of sensitive content (dollar amounts, PII, health info, M&A terms) before long-term storage
- Interaction metadata retained for drift detection (no email content)
- Aggregated drift metrics retained indefinitely
User Control:
- One-click export of all behavioral data
- Deletion of all data within 24 hours of account closure
- Granular controls over which data feeds drift detection
- Audit log of all decryption operations available to user
Document Version: 1.0
Publication Date: October 29, 2025
License: CC BY 4.0 (Creative Commons Attribution)
Permanent Identifier: https://getprecedent.ai/research/drift-detection-2025
This document establishes prior art for the described techniques and may be cited as:
Kelch, C. (2025). Behavioral Drift Detection and Conversational Recalibration in Personalized Attention Management Systems. Technical Report, Precedent AI.