Behavioral Drift Detection and Conversational Recalibration in Personalized Attention Management Systems

Author: Chance Kelch
Affiliation: Precedent AI (getprecedent.ai)
Date: October 29, 2025
Contact: chance@getprecedent.ai

Abstract

We present a novel approach to personalized attention management that addresses a critical limitation in existing AI-driven email prioritization systems: the temporal misalignment between users' stated preferences and their actual behavior. Our system introduces three key innovations: (1) a behavioral drift detection mechanism that continuously monitors divergence between declared goals and observed interaction patterns, (2) a conversational recalibration protocol that proactively initiates goal reassessment when drift exceeds learned thresholds, and (3) a context-aware situation detection system that dynamically adjusts urgency scoring based on temporal life events. We provide detailed algorithmic descriptions, implementation architectures, and theoretical foundations sufficient for reproduction by practitioners in the field. This work establishes prior art for these techniques in production AI systems.

Keywords: attention management, behavioral drift detection, preference learning, human-AI alignment, email prioritization, temporal context awareness

1. Introduction

1.1 Problem Statement

Current email prioritization systems suffer from a fundamental temporal alignment problem. Users articulate preferences at time t₀ (e.g., "I want to focus less on recruiting"), but their behavior evolves continuously while their stated preferences remain static. This creates three pathologies:

  1. Preference Fossilization: Systems continue optimizing for outdated goals
  2. Silent Drift: Behavioral changes occur without system awareness
  3. Misalignment Accumulation: The gap between stated and revealed preferences grows unbounded

Existing approaches fall into two categories:

  • Passive learning: Systems learn solely from behavior, ignoring explicit preferences
  • Static preferences: Systems use only declared rules, ignoring behavioral signals

Neither approach addresses the temporal misalignment problem.

1.2 Contributions

We present a system that:

  1. Detects behavioral drift by computing divergence metrics between stated goals and observed interactions across multiple behavioral dimensions
  2. Initiates conversational recalibration when drift exceeds dynamically learned thresholds, asking users if their priorities have changed
  3. Implements situation-aware scoring that automatically detects temporal contexts (e.g., "hiring sprint," "tax season") and adjusts prioritization accordingly
  4. Provides progressive trust mechanisms that grant increasing autonomy based on accuracy over category-specific decision types

This work establishes prior art for these techniques and provides sufficient technical detail for implementation by others in the field.

2. Related Work

2.1 Email Prioritization Systems

Traditional email prioritization relies on static rules (SaneBox, Hey.com) or generic ML models trained on aggregate user behavior (Gmail Priority Inbox, Outlook Focused Inbox). These approaches fail to capture individual preference evolution.

Key Limitation: No existing system detects when user behavior diverges from stated preferences and proactively initiates goal reassessment.

2.2 Preference Learning

Reinforcement Learning from Human Feedback (RLHF) [Christiano et al., 2017] learns reward functions from binary preferences. Active Preference Learning [Sadigh et al., 2017] queries users strategically to reduce uncertainty.

Our Distinction: Rather than learning a static reward function, we maintain two parallel models—stated preferences (G) and revealed preferences (B)—and explicitly monitor their divergence over time.

2.3 Context-Aware Systems

Calendar-integrated systems (Google Calendar Smart Scheduling, x.ai) use temporal context for scheduling. Recommender systems use contextual bandits [Li et al., 2010] for content ranking.

Our Distinction: We introduce automatic situation detection from communication patterns combined with dynamic urgency recalibration, rather than manual context specification or single-factor temporal features.

2.4 Human-AI Trust

Progressive autonomy in robotics [Goodrich & Schultz, 2007] and adjustable autonomy in multi-agent systems [Scerri et al., 2002] gradually increase system independence.

Our Distinction: We apply category-specific trust with reversible autonomy and continuous monitoring rather than one-time approval or domain-general trust scores.

3. Technical Approach

3.1 Behavioral Drift Detection

3.1.1 State Representation

At time t, we maintain:

Stated Goals Vector (Gt): User's explicit preferences

G_t = {
  focus_areas: [(domain_i, priority_i, declared_at_t)],
  ignore_patterns: [(pattern_j, threshold_j, declared_at_t)],
  response_targets: [(sender_class_k, max_delay_k, declared_at_t)]
}

Behavioral Profile Matrix (Bt): Observed interaction patterns

B_t = [
  response_velocity(domain_i, t_window),
  attention_allocation(domain_i, t_window),
  completion_rate(domain_i, t_window),
  interruption_tolerance(domain_i, t_window)
]

Where t_window is a sliding window (default: 14 days) to capture recent behavior while maintaining temporal sensitivity. This window size balances:

  • Short enough to detect drift quickly (vs 30-day windows that lag)
  • Long enough to smooth out daily variance (vs 7-day windows that are too noisy)
  • Configurable per user based on email volume (high-volume users can use 7-day windows)

Response Velocity Buckets:

We discretize continuous response times into interpretable categories:

VELOCITY_BUCKETS = {
    'instant': (0, 0.5),      # < 30 minutes
    'same_day': (0.5, 8),     # 30min - 8 hours
    'next_day': (8, 32),      # 8 - 32 hours  
    'week': (32, 168),        # 32h - 1 week
    'never': (168, float('inf'))  # > 1 week or no response
}

def bucket_response_time(hours):
    for bucket_name, (min_h, max_h) in VELOCITY_BUCKETS.items():
        if min_h <= hours < max_h:
            return bucket_name
    return 'never'

These buckets are used in drift computations and user-facing reports ("You respond to recruiting emails within 8 hours but stated goal was next-day").
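
Example usage of the bucketing helper:

bucket_response_time(0.2)    # → 'instant'
bucket_response_time(6.0)    # → 'same_day'
bucket_response_time(200.0)  # → 'never'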

3.1.2 Drift Metric Computation

We compute drift using a weighted earth mover's distance between normalized behavioral distributions and stated preference distributions:

δ(G_t, B_t) = Σᵢ wᵢ · EMD(Pᵢᴳ, Pᵢᴮ)

Where:

  • Pᵢᴳ: Probability distribution of priority i derived from stated goals
  • Pᵢᴮ: Probability distribution of priority i derived from behavior
  • wᵢ: Domain-specific weight learned from historical recalibration acceptance rates
  • EMD: Earth Mover's Distance [Rubner et al., 2000]

Why EMD?: Unlike L2 distance, EMD respects the semantic similarity between email categories. "Product feedback" and "product bugs" should have lower transportation cost than "product bugs" and "personal finance."

Ground Distance for EMD: The cost of moving probability mass between categories i and j is computed using one of:

1. Embedding-based distance (default):

ground_cost(i, j) = 1 - cosine_similarity(embed(i), embed(j))
# where embed() uses text-embedding-3-small on category descriptions

2. Learned confusion matrix: After sufficient user data, estimate empirically:

ground_cost(i, j) = P(user treats category i like category j)
# measured by response velocity similarity, attention allocation correlation

3. Ontology-based distance: Use predefined category hierarchy:

ground_cost(i, j) = shortest_path_length(i, j) / max_path_length
# normalized tree distance

We default to (1) for cold-start, transition to (2) after 30+ days of user data.
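
To make the drift term concrete, the sketch below computes a single-dimension EMD with the POT (Python Optimal Transport) library (an assumed dependency for illustration; any EMD solver works) against a hypothetical 3-category ground-cost matrix:

import numpy as np
import ot  # POT: Python Optimal Transport (assumed solver)

def emd_drift(p_stated, p_observed, ground_cost):
    """EMD between stated and observed category distributions."""
    a = np.asarray(p_stated, dtype=float)
    b = np.asarray(p_observed, dtype=float)
    a, b = a / a.sum(), b / b.sum()  # normalize to probability distributions
    return ot.emd2(a, b, ground_cost)  # optimal transport cost = EMD

# Hypothetical categories: product bugs, product feedback, personal finance
ground_cost = np.array([
    [0.0, 0.2, 0.9],   # bugs ↔ feedback are semantically close (cheap mass transport)
    [0.2, 0.0, 0.8],
    [0.9, 0.8, 0.0],   # personal finance is far from both
])
print(emd_drift([0.7, 0.2, 0.1], [0.3, 0.3, 0.4], ground_cost))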

3.1.3 Specific Drift Signals

We track four behavioral dimensions with distinct drift computations:

1. Response Velocity Drift (δvelocity)

For each domain d and time window w:

# Default priority-to-response-time mapping (personalized per user over time)
DEFAULT_PRIORITY_MAP = {
    10: 0.25,   # Urgent: 15 minutes
    9: 1,       # Very High: 1 hour
    8: 4,       # High: 4 hours
    7: 8,       # Medium-High: same day
    6: 24,      # Medium: next day
    5: 48,      # Medium-Low: 2 days
    4: 72,      # Low: 3 days
    3: 168,     # Very Low: 1 week
    2: 336,     # Minimal: 2 weeks
    1: None     # Ignore: no expected response
}

def compute_velocity_drift(stated_priority_d, observed_responses_d, user_history):
    """
    Stated priority implies expected response time.
    Compare to actual median response time.
    
    priority_to_response_map is personalized per user over time by fitting
    a regression model: response_time ~ f(priority, domain, sender_VIP_score)
    """
    # Get user's personalized mapping, initializing it from the default on
    # first use so the EMA update below has a mutable dict to write into
    if user_history.learned_priority_map is None:
        user_history.learned_priority_map = dict(DEFAULT_PRIORITY_MAP)
    priority_map = user_history.learned_priority_map
    
    expected_response_hours = priority_map[stated_priority_d]
    actual_median_hours = median([r.response_time_hours for r in observed_responses_d])
    
    if expected_response_hours is None:  # Priority 1 = no response expected
        return 0.0  # No drift for "ignore" category
    
    # Normalized drift: log ratio of actual to expected
    # log(1) = 0 (no drift), log(0.5) = -0.69 (responding faster)
    # log(2) = 0.69 (responding slower)
    velocity_drift = abs(log(actual_median_hours / expected_response_hours))
    
    # Personalization: Update user's priority map using exponential moving average
    alpha = 0.1  # learning rate
    user_history.learned_priority_map[stated_priority_d] = (
        (1 - alpha) * expected_response_hours + 
        alpha * actual_median_hours
    )
    
    return velocity_drift

Example: User states "recruiting is low priority (5)" (expected response: 48h), but actually responds in 6h median → drift = |log(6/48)| ≈ 2.08

2. Attention Allocation Drift (δattention)

Measures divergence between stated focus areas and actual time spent:

def compute_attention_drift(stated_focus_weights, observed_time_allocation):
    """
    Use KL divergence between stated and observed distributions.
    Note: KL divergence is asymmetric; we also compute Jensen-Shannon
    as a symmetric alternative.
    """
    # Normalize both to probability distributions with ε-smoothing
    epsilon = 1e-4  # ensures strictly positive support
    P_stated = normalize(stated_focus_weights) + epsilon
    P_stated = P_stated / sum(P_stated)  # re-normalize after smoothing
    
    P_observed = normalize(observed_time_allocation) + epsilon
    P_observed = P_observed / sum(P_observed)
    
    # KL divergence: D_KL(P_observed || P_stated)
    # Measures cost of encoding observed using stated distribution
    n = len(P_observed)
    kl_div = sum(P_observed[i] * log(P_observed[i] / P_stated[i])
                 for i in range(n))
    
    # Jensen-Shannon divergence (symmetric alternative)
    # JS(P,Q) = 0.5*KL(P||M) + 0.5*KL(Q||M) where M = 0.5*(P+Q)
    M = 0.5 * (P_observed + P_stated)
    js_div = 0.5 * sum(P_observed[i] * log(P_observed[i] / M[i]) for i in range(n)) + \
             0.5 * sum(P_stated[i] * log(P_stated[i] / M[i]) for i in range(n))
    
    # Return KL by default (penalizes observed deviating from stated);
    # JS is logged separately so callers receive a single scalar
    log_event('attention_js_divergence', js_div)
    return kl_div

Example: User states 70% product, 30% recruiting. Observed: 40% product, 60% recruiting → KL divergence ≈ 0.19 nats
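
This figure can be verified directly with scipy (a quick check; the production path uses the ε-smoothed version above):

from scipy.stats import entropy  # entropy(pk, qk) = KL(pk || qk) in nats

P_observed = [0.4, 0.6]  # actual: 40% product, 60% recruiting
P_stated   = [0.7, 0.3]  # stated: 70% product, 30% recruiting
print(entropy(P_observed, P_stated))  # ≈ 0.192 nats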

3. Completion Rate Drift (δcompletion)

Tracks which declared important items go unhandled:

def compute_completion_drift(domain_priorities, completion_rates):
    """
    High priority domains should have high completion rates.
    Compute rank correlation.
    """
    priority_ranks = rank(domain_priorities)
    completion_ranks = rank(completion_rates)
    
    # Spearman's rank correlation (1 = perfect alignment, -1 = opposite)
    rho = spearman_correlation(priority_ranks, completion_ranks)
    
    # Convert to drift metric (0 = aligned, 1 = opposite)
    completion_drift = (1 - rho) / 2
    
    return completion_drift
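
A quick check with scipy, assuming four domains whose stated priorities are perfectly inverted relative to observed completion:

from scipy.stats import spearmanr

domain_priorities = [9, 7, 4, 2]          # stated priority per domain
completion_rates  = [0.3, 0.5, 0.8, 0.9]  # observed completion per domain

rho, _ = spearmanr(domain_priorities, completion_rates)
print((1 - rho) / 2)  # rho = -1 here (perfectly inverted) → drift = 1.0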

4. Interruption Tolerance Drift (δinterrupt)

Measures when user accepts/dismisses immediate notifications:

def compute_interruption_drift(notification_settings, actual_responses, user_history):
    """
    Track which urgency levels get immediate attention vs dismissal.
    Thresholds are learned per-user via Bayesian updating.
    """
    # User-specific dismissal threshold (initialized at 0.3, updated via Beta prior)
    dismissal_threshold = user_history.learned_dismissal_threshold
    # Beta distribution parameters from historical data
    alpha_prior = user_history.dismissal_alpha  # successes (accepts)
    beta_prior = user_history.dismissal_beta    # failures (dismissals)
    
    drift_signals = []
    
    for urgency_level in [8, 9, 10]:
        stated_threshold = notification_settings.urgency_threshold
        dismissal_rate = actual_responses[urgency_level].dismissal_rate
        
        if urgency_level >= stated_threshold and dismissal_rate > dismissal_threshold:
            # User dismisses above their learned threshold of "urgent" notifications
            # This suggests stated urgency threshold is too low
            drift_signals.append(('interrupt_threshold_too_low', dismissal_rate))
    
    # Update dismissal threshold using Bayesian update
    new_accepts = sum(actual_responses[u].accept_count for u in [8,9,10])
    new_dismissals = sum(actual_responses[u].dismiss_count for u in [8,9,10])
    
    user_history.dismissal_alpha += new_accepts
    user_history.dismissal_beta += new_dismissals
    # Posterior mean dismissal rate from the *updated* counts
    user_history.learned_dismissal_threshold = (
        user_history.dismissal_beta /
        (user_history.dismissal_alpha + user_history.dismissal_beta)
    )
    
    return aggregate(drift_signals)

Note on threshold learning: The 0.3 initial value is based on internal empirical observation, but the system adapts this per-user. Users with high interruption tolerance may converge to 0.5+ (tolerate more dismissals before flagging drift), while notification-sensitive users converge to 0.1-0.2.

3.1.4 Composite Drift Score

The final drift score combines all dimensions with learned weights:

def compute_drift_score(G_t, B_t, user_history):
    """
    Compute weighted composite drift score.
    Weights are personalized based on which signals 
    predicted successful recalibrations in the past.
    """
    # Get user-specific weights (initialized uniformly, learned over time)
    w = user_history.drift_weights  # [w_velocity, w_attention, w_completion, w_interrupt]
    
    # Compute component drifts
    δ_v = compute_velocity_drift(G_t.response_targets, B_t.response_times, user_history)
    δ_a = compute_attention_drift(G_t.focus_areas, B_t.time_allocation)
    δ_c = compute_completion_drift(G_t.priorities, B_t.completion_rates)
    δ_i = compute_interruption_drift(G_t.notification_settings, B_t.interrupt_responses, user_history)
    
    # Weighted sum
    δ_total = w[0]*δ_v + w[1]*δ_a + w[2]*δ_c + w[3]*δ_i
    
    # Normalize to [0,1] using user's OWN historical distribution (not cohort)
    # This personalizes what counts as "high drift" for each user
    # A user with naturally volatile behavior needs higher drift to trigger
    # vs a user with stable patterns where small drifts are meaningful
    δ_normalized = percentile(δ_total, user_history.drift_distribution)
    # percentile() returns where δ_total falls in user's historical drift scores
    # e.g., if δ_total is at 80th percentile of user's past drifts → 0.80
    
    return δ_normalized, {'velocity': δ_v, 'attention': δ_a,
                          'completion': δ_c, 'interruption': δ_i}

Note on normalization: We use per-user percentile normalization rather than cohort-based normalization (a minimal sketch of the percentile() helper follows this list) because:

  1. Users have different baseline behavioral variance (some are naturally more variable)
  2. Email patterns differ by role (exec vs IC), industry, communication style
  3. Avoids penalizing high-variance users or under-detecting drift in low-variance users
  4. Each user's drift threshold adapts to their personal patterns
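
A minimal sketch of the percentile() helper, assuming each user's historical drift scores are kept as a flat array:

import numpy as np

def percentile(value, history):
    """Fraction of the user's past drift scores at or below `value`."""
    history = np.asarray(history)
    if history.size == 0:
        return 0.5  # no history yet: treat as median (cold-start assumption)
    return float(np.mean(history <= value))

# e.g., if δ_total exceeds 80% of this user's past drift scores → 0.80
print(percentile(0.42, [0.1, 0.2, 0.3, 0.35, 0.5]))  # → 0.8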

3.1.5 Per-User Threshold Adaptation

Critical innovation: The recalibration threshold is not fixed but learned per-user.

def should_trigger_recalibration(δ_score, user_history):
    """
    Dynamic threshold based on user's historical response to recalibrations.
    Includes cool-down period to prevent prompt fatigue.
    """
    # Get user's learned threshold (initialized at 0.65)
    threshold = user_history.recalibration_threshold
    
    if δ_score > threshold:
        # Check recency: enforce 7-day minimum between recalibration prompts
        # This prevents overwhelming users with frequent prompts even if drift is high
        days_since_last = (now() - user_history.last_recalibration).days
        
        COOLDOWN_DAYS = 7  # Minimum days between prompts (consistent across system)
        
        if days_since_last < COOLDOWN_DAYS:
            # Log that drift was detected but suppressed due to cooldown
            log_event('drift_detected_cooldown', δ_score, days_since_last)
            return False  # Too soon - respect cooldown
        
        return True
    
    return False

def update_threshold(threshold, user_accepted_recalibration, δ_score_at_trigger):
    """
    Update threshold based on user response to recalibration prompt.
    """
    if user_accepted_recalibration:
        # User confirmed priorities changed → threshold was good or too high
        # Lower it slightly to catch drifts earlier
        new_threshold = threshold * 0.95
    else:
        # User said "no, priorities haven't changed" → threshold too low
        # Raise it to reduce false positives
        new_threshold = threshold * 1.1
    
    # Clamp to reasonable bounds
    return clip(new_threshold, min=0.4, max=0.9)

3.2 Conversational Recalibration Protocol

When drift is detected, we don't just log it—we initiate a conversation.

3.2.1 Recalibration Message Generation

def generate_recalibration_prompt(δ_components, G_t, B_t):
    """
    Generate specific, evidence-based recalibration question.
    """
    # Identify which drift component is highest (δ_components: dict keyed by name)
    dominant_drift = max(δ_components, key=δ_components.get)
    
    if dominant_drift == 'velocity':
        # Find specific domain with largest velocity divergence
        domain = find_max_velocity_drift(G_t, B_t)
        
        stated = G_t.response_targets[domain]
        actual = median(B_t.response_times[domain])
        
        prompt = f"""
        You mentioned you wanted to focus less on {domain},
        but you're responding in a median of {actual}h against your
        stated target of {stated}h. Has your priority changed?
        
        [Yes, {domain} is more important now] [No, help me stick to my goal]
        """
    
    elif dominant_drift == 'attention':
        # Show attention reallocation
        top_stated = top_domains(G_t.focus_areas)
        top_actual = top_domains(B_t.time_allocation)
        
        prompt = f"""
        I notice you're spending more time on {top_actual[0]} and less 
        on {top_stated[0]} than you intended. Should I adjust to match 
        your current focus?
        
        [Yes, update priorities] [No, help me rebalance]
        """
    
    # Similar logic for completion and interruption drifts...
    
    return prompt

3.2.2 Response Handling

def handle_recalibration_response(user_response, G_t, B_t, δ_score):
    """
    Update system based on user's recalibration choice.
    """
    if user_response.intent == 'update_priorities':
        # User confirms priorities changed → update G_t to match B_t
        G_t_new = align_goals_to_behavior(G_t, B_t)
        
        # Log this as a successful drift detection
        log_event('drift_confirmed', δ_score, user_response)
        
        # Reset drift accumulator
        reset_drift_tracking()
        
        return G_t_new
    
    elif user_response.intent == 'enforce_goals':
        # User wants to stick to stated goals → increase enforcement
        
        # Add nudges/reminders to help user honor stated goals
        enable_goal_enforcement(G_t, B_t)
        
        # Examples:
        # - "You have 3 recruiting emails. Reminder: you said this is low priority"
        # - Daily summary: "You spent 60% of time on recruiting vs 30% goal"
        
        # Keep G_t unchanged, but mark drift as "user wants correction"
        log_event('drift_rejected', δ_score, 'user_wants_enforcement')
        
        return G_t  # unchanged

3.2.3 Meta-Learning from Recalibrations

Each recalibration event provides training signal:

def update_drift_model(recalibration_event):
    """
    Learn which drift signals predict successful recalibrations.
    """
    # Extract features
    X = {
        'δ_velocity': recalibration_event.drift_components['velocity'],
        'δ_attention': recalibration_event.drift_components['attention'],
        'δ_completion': recalibration_event.drift_components['completion'],
        'δ_interrupt': recalibration_event.drift_components['interruption'],
        'time_since_goal_set': recalibration_event.goal_age_days,
        'domain_category': recalibration_event.primary_domain,
    }
    
    # Label: did user confirm priorities changed?
    y = 1 if recalibration_event.outcome == 'priorities_changed' else 0
    
    # Update an online classifier (e.g., logistic regression trained incrementally)
    drift_model.partial_fit(X, y)
    
    # This improves future drift detection accuracy
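
Concretely, drift_model can be any online classifier. A minimal sketch with scikit-learn (our choice for illustration; the system does not mandate a library) pairs a DictVectorizer with an SGDClassifier using logistic loss:

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import SGDClassifier

# Fit the vectorizer once on the feature schema; in practice, fit on the
# full set of domain_category values so the one-hot columns are stable
vec = DictVectorizer()
vec.fit([{'δ_velocity': 0.0, 'δ_attention': 0.0, 'δ_completion': 0.0,
          'δ_interrupt': 0.0, 'time_since_goal_set': 0.0,
          'domain_category': 'product'}])

drift_model = SGDClassifier(loss="log_loss")  # online logistic regression

def update_drift_model_online(X_dict, y):
    """One incremental update per recalibration event."""
    X = vec.transform([X_dict])
    drift_model.partial_fit(X, [y], classes=[0, 1])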

3.3 Situation-Aware Dynamic Urgency Scoring

3.3.1 Automatic Situation Detection

We detect temporal contexts ("situations") using multi-signal analysis:

class SituationDetector:
    """
    Automatically detect when user enters/exits situation contexts.
    """
    
    def detect_situations(self, user_id, current_date):
        """
        Run nightly to detect active situations.
        """
        signals = self.gather_signals(user_id, lookback_days=14)
        
        situations = []
        
        # Example: Hiring Sprint Detection
        baseline = self.get_baseline(user_id)  # per-user rolling baseline volumes (helper assumed)
        hiring_signals = {
            'email_volume': count_emails(signals, domain='recruiting'),
            'calendar_interviews': count_calendar_events(signals, event_type='interview'),
            'keyword_frequency': count_keywords(signals, ['candidate', 'hire', 'interview']),
            'domain_velocity': compute_response_speed(signals, 'recruiting')
        }
        
        if (hiring_signals['email_volume'] > baseline * 2.0 and
            hiring_signals['calendar_interviews'] > 3 and
            hiring_signals['keyword_frequency'] > baseline * 1.5):
            
            situations.append({
                'type': 'hiring_sprint',
                'confidence': 0.87,
                'evidence': hiring_signals,
                'started_at': self.estimate_start_date(signals),
                'expected_duration': 30  # days, learned from historical patterns
            })
        
        # Example: Tax Season Detection
        if (current_date.month in [3, 4] and
            count_emails(signals, sender_domain='cpa|accountant|irs') > baseline * 3):
            
            situations.append({
                'type': 'tax_season',
                'confidence': 0.95,
                'evidence': {...},
                'started_at': date(current_date.year, 3, 1),
                'expected_duration': 45
            })
        
        # Example: Board Preparation Detection
        board_signals = {
            'calendar_event': find_calendar_event(title_contains='board meeting'),
            'deck_mentions': count_keywords(['board deck', 'board materials', 'board prep']),
            'leadership_emails': count_emails(from_role='c-suite', about='board')
        }
        
        if board_signals['calendar_event'] and board_signals['deck_mentions'] > 2:
            days_until_meeting = (board_signals['calendar_event'].date - current_date).days
            
            situations.append({
                'type': 'board_prep',
                'confidence': 0.92,
                'deadline': board_signals['calendar_event'].date,
                'urgency_curve': 'exponential',  # urgency increases as deadline approaches
                'expected_duration': min(days_until_meeting, 21)
            })
        
        return situations
    
    def estimate_start_date(self, signals):
        """
        Use changepoint detection to find when signal pattern emerged.
        We use the PELT (Pruned Exact Linear Time) algorithm [Killick et al., 2012].
        """
        time_series = [signal.count for signal in signals]
        
        # PELT parameters:
        # - Cost function: Normal likelihood (assumes Gaussian noise)
        # - Penalty λ: Controls false positive rate
        #   Higher λ = fewer changepoints (more conservative)
        #   We use λ = log(n) * σ² where n = len(time_series), σ² = variance
        n = len(time_series)
        sigma_sq = np.var(time_series)
        penalty = np.log(n) * sigma_sq  # BIC-like penalty
        
        changepoints = detect_changepoints(
            time_series, 
            cost_function='normal',  # Gaussian likelihood
            penalty=penalty,
            min_segment_length=3  # Require at least 3 days per segment
        )  # Returns indices where distribution changes
        
        # Alternative: Bayesian Online Changepoint Detection (BOCD)
        # Tradeoff: PELT is faster, BOCD provides uncertainty estimates
        # For situations needing probability of changepoint:
        # changepoint_probs = bayesian_online_changepoint(time_series, hazard=1/100)
        
        if changepoints:
            return signals[changepoints[-1]].date  # Most recent changepoint
        return signals[0].date  # No changepoint found, use start of window
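
For concreteness, a sketch of detect_changepoints using the ruptures library (an assumed dependency; any PELT implementation works), keeping the signature the call site above expects:

import numpy as np
import ruptures as rpt  # assumed PELT implementation

def detect_changepoints(time_series, cost_function='normal', penalty=None,
                        min_segment_length=3):
    """PELT changepoints; 'normal' is ruptures' Gaussian-likelihood cost."""
    series = np.asarray(time_series, dtype=float)
    if penalty is None:
        penalty = np.log(len(series)) * np.var(series)  # BIC-like default from the text
    algo = rpt.Pelt(model=cost_function, min_size=min_segment_length).fit(series)
    # predict() returns segment end indices; the last entry is always len(series)
    return algo.predict(pen=penalty)[:-1]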

3.3.2 Situation-Aware Urgency Adjustment

Once a situation is detected, we modify urgency scoring:

def compute_urgency_score(email, base_urgency, active_situations):
    """
    Adjust base urgency based on active situations.
    """
    adjusted_urgency = base_urgency
    
    for situation in active_situations:
        if is_relevant(email, situation):
            # Apply situation-specific urgency boost
            boost = compute_situation_boost(email, situation)
            adjusted_urgency = min(10, adjusted_urgency + boost)
    
    return adjusted_urgency

def compute_situation_boost(email, situation):
    """
    Calculate urgency boost for situation-relevant emails.
    """
    if situation['type'] == 'hiring_sprint':
        # Recruiting emails get +3 urgency during hiring sprints
        if 'recruiting' in email.domain:
            return 3.0
    
    elif situation['type'] == 'board_prep':
        # Board-related urgency steps up as the deadline approaches
        # (a piecewise approximation of the 'exponential' urgency curve)
        days_until_deadline = (situation['deadline'] - now()).days
        
        if days_until_deadline <= 3:
            return 4.0  # Critical urgency
        elif days_until_deadline <= 7:
            return 3.0  # High urgency
        elif days_until_deadline <= 14:
            return 2.0  # Moderate urgency
    
    elif situation['type'] == 'tax_season':
        # Financial/tax emails get +2 urgency March-April
        if any(kw in email.sender for kw in ['cpa', 'accountant', 'tax']):
            return 2.0
    
    return 0.0

def is_relevant(email, situation):
    """
    Determine if email is relevant to situation.
    """
    relevance_keywords = {
        'hiring_sprint': ['candidate', 'interview', 'hire', 'recruiting', 'talent'],
        'tax_season': ['tax', 'cpa', 'irs', 'return', 'deduction', 'audit'],
        'board_prep': ['board', 'deck', 'materials', 'presentation', 'directors']
    }
    
    keywords = relevance_keywords.get(situation['type'], [])
    
    # Check subject, body, sender domain
    return any(kw in email.subject.lower() or 
               kw in email.body.lower() or
               kw in email.sender.lower() 
               for kw in keywords)

3.3.3 Automatic Situation Decay

Situations don't last forever. We detect when they end:

def check_situation_decay(situation, recent_signals):
    """
    Determine if situation has ended.
    """
    current_intensity = measure_signal_intensity(situation['type'], recent_signals)
    
    # If signal intensity drops below 50% of peak, situation is ending
    if current_intensity < situation['peak_intensity'] * 0.5:
        return True
    
    # If past expected duration + grace period
    days_active = (now() - situation['started_at']).days
    if days_active > situation['expected_duration'] * 1.2:
        return True
    
    # If deadline passed (for deadline-driven situations like board prep)
    if 'deadline' in situation and now() > situation['deadline']:
        return True
    
    return False

def deactivate_situation(situation):
    """
    Graceful situation deactivation.
    """
    # Gradually reduce urgency boosts over 3 days
    situation['decay_rate'] = 0.33  # per day
    situation['status'] = 'decaying'
    
    # After 3 days, fully deactivate
    schedule_job(delay=timedelta(days=3), job=lambda: situation.update(status='inactive'))

3.4 Progressive Trust Mode

3.4.1 Category-Specific Trust Accumulation

Trust is not monolithic—users may trust the system for scheduling but not for drafting emails:

class TrustTracker:
    """
    Track trust scores per decision category.
    """
    
    categories = [
        'email_urgency_scoring',
        'draft_generation',
        'meeting_scheduling',
        'email_archiving',
        'response_sending',
        'contact_prioritization'
    ]
    
    def __init__(self):
        # Initialize trust scores and autonomy configs per category
        self.trust_scores = {cat: TrustScore(initial=0.0) for cat in self.categories}
        self.autonomy = {}  # populated when a category earns autonomy
    
    def record_decision(self, category, ai_suggestion, user_action):
        """
        Track whether user accepted AI's suggestion.
        """
        agreement = (ai_suggestion == user_action)
        
        # Update category-specific trust score
        self.trust_scores[category].add_observation(agreement)
        
        # Check if category crossed autonomy threshold
        if self.trust_scores[category].count >= 20 and \
           self.trust_scores[category].accuracy >= 0.90:
            self.grant_autonomy(category)
    
    def grant_autonomy(self, category):
        """
        Enable autonomous action for trusted category.
        Uses Wilson confidence interval to ensure statistical significance.
        """
        trust_score = self.trust_scores[category]
        
        # Require both sufficient samples AND high accuracy with tight confidence
        sufficient_data = trust_score.count >= 20
        high_accuracy = trust_score.accuracy >= 0.90
        
        # Wilson 95% confidence interval for accuracy
        ci_lower, ci_upper = trust_score.confidence_interval
        
        # Require lower bound of CI to be above threshold (conservative)
        confident = ci_lower >= 0.85
        
        if not (sufficient_data and high_accuracy and confident):
            return  # Don't grant autonomy yet
        
        # Grant category-specific autonomy levels
        autonomy_levels = {
            'email_urgency_scoring': {
                'level': 'AUTONOMOUS',
                'actions': ['auto_label', 'auto_prioritize'],
                'require_approval': False,
                'confidence_threshold': 0.90  # High stakes need high confidence
            },
            'meeting_scheduling': {
                'level': 'SEMI_AUTONOMOUS',
                'actions': ['suggest_times', 'auto_send_invite'],
                'require_approval': True,  # still needs approval before sending
                'confidence_threshold': 0.85  # Medium stakes
            },
            'response_sending': {
                'level': 'SUGGEST_ONLY',  # High stakes, stay very cautious
                'actions': ['generate_draft'],
                'require_approval': True,
                'confidence_threshold': 0.95  # Highest threshold for sending on user's behalf
            },
            'email_archiving': {
                'level': 'AUTONOMOUS',
                'actions': ['auto_archive_handled', 'auto_delete_spam'],
                'require_approval': False,
                'confidence_threshold': 0.88  # Reversible action, medium confidence ok
            }
        }
        
        config = autonomy_levels.get(category)
        if config is None:
            return  # no autonomy policy defined for this category yet
        
        # Double-check accuracy meets category-specific threshold
        if trust_score.accuracy < config['confidence_threshold']:
            return  # Need higher accuracy for this category
        
        self.autonomy[category] = config
        
        # Notify user with calibration details
        send_notification(
            f"I've earned your trust in {category} "
            f"({trust_score.accuracy:.1%} accuracy over {trust_score.count} decisions, "
            f"95% CI: [{ci_lower:.1%}, {ci_upper:.1%}]). "
            f"I'll now {config['actions'][0]} automatically."
        )

class TrustScore:
    """
    Rolling accuracy tracker with confidence intervals.
    """
    
    def __init__(self, initial=0.0, window=50):
        self.observations = []
        self.window = window
    
    def add_observation(self, agreement: bool):
        self.observations.append(1 if agreement else 0)
        
        # Keep only recent observations
        if len(self.observations) > self.window:
            self.observations.pop(0)
    
    @property
    def accuracy(self):
        if not self.observations:
            return 0.0
        return sum(self.observations) / len(self.observations)
    
    @property
    def count(self):
        return len(self.observations)
    
    @property
    def confidence_interval(self):
        """
        Wilson score interval for binomial proportion.
        """
        if self.count < 5:
            return (0.0, 1.0)  # Too few samples
        
        z = 1.96  # 95% confidence
        p = self.accuracy
        n = self.count
        
        denominator = 1 + z**2 / n
        center = (p + z**2 / (2*n)) / denominator
        margin = z * sqrt(p * (1-p) / n + z**2 / (4*n**2)) / denominator
        
        return (center - margin, center + margin)
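
Illustrative usage (values follow from the Wilson formula above): 18 agreements out of 20 gives 90% point accuracy, yet the interval's lower bound stays below the 0.85 gate, so autonomy is correctly withheld:

ts = TrustScore()
for agreed in [True] * 18 + [False] * 2:
    ts.add_observation(agreed)

print(ts.accuracy)             # 0.9
print(ts.confidence_interval)  # ≈ (0.699, 0.972) → ci_lower < 0.85, no autonomy yet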

3.4.2 Reversible Autonomy

Trust can be lost. Continuous monitoring detects degradation:

def monitor_autonomous_actions(category, recent_actions):
    """
    Check if autonomous actions maintain quality.
    """
    window = recent_actions[-20:]
    if len(window) < 10:
        return  # Need enough data
    
    # Check recent accuracy over the available window (up to 20 actions)
    recent_accuracy = sum(a.user_approved for a in window) / len(window)
    
    # Check if accuracy dropped below threshold
    if recent_accuracy < 0.85:
        # Downgrade autonomy
        downgrade_trust(category, reason='accuracy_degraded')
        
        send_notification(
            f"I've made some mistakes with {category} recently. "
            f"I'll go back to suggesting rather than acting automatically "
            f"until I earn your trust again."
        )

def downgrade_trust(category, reason):
    """
    Reduce autonomy level for category.
    """
    current_level = autonomy[category]['level']
    
    if current_level == 'AUTONOMOUS':
        autonomy[category]['level'] = 'SEMI_AUTONOMOUS'
        autonomy[category]['require_approval'] = True
    
    elif current_level == 'SEMI_AUTONOMOUS':
        autonomy[category]['level'] = 'SUGGEST_ONLY'
    
    # Log for analysis
    log_event('trust_downgrade', category, reason, current_level)

3.5 Implementation Architecture

3.5.1 System Components

┌─────────────────────────────────────────────────────────────┐
│                     Email Ingestion Layer                      │
│  (Gmail API, Webhook, Pub/Sub) → Inngest Job Queue           │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                   Real-Time Processing                        │
│  • Base urgency scoring (Claude Sonnet 4.5)                  │
│  • VIP detection                                              │
│  • Situation-aware adjustment                                 │
│  • Trust-level action execution                               │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                   Behavioral Tracking                         │
│  • PostgreSQL: email_interactions table                       │
│  • Metrics: response_time, completion, attention_duration     │
│  • Real-time: Redis counters for velocity                     │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│              Nightly Drift Analysis (Inngest Cron)            │
│                                                               │
│  1. Compute δ_velocity, δ_attention, δ_completion, δ_interrupt│
│  2. Calculate composite drift score                           │
│  3. If δ > threshold → generate recalibration prompt          │
│  4. Update drift model weights                                │
│  5. Detect/update active situations                           │
│  6. Update trust scores per category                          │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                  User Interaction Layer                       │
│  • SMS/Slack: Recalibration prompts                          │
│  • Dashboard: Drift visualization, situation timeline         │
│  • API: Goal updates, manual situation triggers               │
└─────────────────────────────────────────────────────────────┘

3.5.2 Data Schema (PostgreSQL)

-- User Goals (Stated Preferences - G_t)
CREATE TABLE user_goals (
  id UUID PRIMARY KEY,
  user_id UUID REFERENCES users(id),
  
  goal_type TEXT,  -- 'focus_area', 'ignore_pattern', 'response_target'
  domain TEXT,     -- 'recruiting', 'product', 'finance', etc.
  priority INTEGER,  -- 1-10
  target_response_hours DECIMAL,
  
  created_at TIMESTAMP,
  updated_at TIMESTAMP,
  
  -- For drift comparison
  stated_at TIMESTAMP,  -- when user declared this
  last_reaffirmed TIMESTAMP  -- when user last confirmed this
);

-- Behavioral Tracking (Revealed Preferences - B_t)
CREATE TABLE email_interactions (
  id UUID PRIMARY KEY,
  user_id UUID REFERENCES users(id),
  email_id UUID REFERENCES emails(id),
  
  -- Interaction signals
  opened_at TIMESTAMP,
  response_time_seconds INTEGER,  -- time to respond
  marked_handled_at TIMESTAMP,
  archived_at TIMESTAMP,
  attention_duration_seconds INTEGER,  -- time spent reading/acting
  
  -- Contextual features
  email_domain TEXT,
  email_urgency_score INTEGER,
  email_category TEXT,
  
  -- Derived features (computed nightly)
  response_velocity TEXT,  -- 'instant', 'same_day', 'next_day', 'week', 'never'
  
  created_at TIMESTAMP
);

-- Drift Tracking
CREATE TABLE drift_events (
  id UUID PRIMARY KEY,
  user_id UUID REFERENCES users(id),
  
  computed_at TIMESTAMP,
  
  -- Component drift scores
  velocity_drift DECIMAL,
  attention_drift DECIMAL,
  completion_drift DECIMAL,
  interruption_drift DECIMAL,
  
  -- Composite
  total_drift_score DECIMAL,
  threshold_at_time DECIMAL,
  triggered_recalibration BOOLEAN,
  
  -- Metadata
  dominant_drift_component TEXT,
  affected_domains TEXT[],
  
  created_at TIMESTAMP
);

-- Recalibration Events
CREATE TABLE recalibration_events (
  id UUID PRIMARY KEY,
  drift_event_id UUID REFERENCES drift_events(id),
  user_id UUID REFERENCES users(id),
  
  -- Prompt sent to user
  recalibration_prompt TEXT,
  prompt_sent_at TIMESTAMP,
  
  -- User response
  user_response TEXT,  -- 'priorities_changed', 'enforce_goals', 'no_change', 'dismissed'
  responded_at TIMESTAMP,
  
  -- Outcome
  goals_updated BOOLEAN,
  enforcement_enabled BOOLEAN,
  
  -- For model learning
  drift_score_at_prompt DECIMAL,
  threshold_at_prompt DECIMAL,
  successful_prompt BOOLEAN,  -- did user engage meaningfully?
  
  created_at TIMESTAMP
);

-- Situations (Temporal Contexts)
CREATE TABLE situations (
  id UUID PRIMARY KEY,
  user_id UUID REFERENCES users(id),
  
  situation_type TEXT,  -- 'hiring_sprint', 'tax_season', 'board_prep', etc.
  
  -- Detection metadata
  detected_at TIMESTAMP,
  detection_confidence DECIMAL,
  detection_signals JSONB,  -- evidence used for detection
  
  -- Lifecycle
  status TEXT,  -- 'active', 'decaying', 'inactive'
  started_at TIMESTAMP,
  expected_end_at TIMESTAMP,
  actual_ended_at TIMESTAMP,
  
  -- For urgency adjustment
  urgency_boost DECIMAL,
  relevant_domains TEXT[],
  
  -- For deadline-driven situations
  deadline TIMESTAMP,
  urgency_curve TEXT,  -- 'constant', 'linear', 'exponential'
  
  created_at TIMESTAMP
);

-- Situation Preferences (user can customize)
CREATE TABLE situation_preferences (
  id UUID PRIMARY KEY,
  user_id UUID REFERENCES users(id),
  situation_type TEXT,
  
  enabled BOOLEAN DEFAULT TRUE,
  urgency_boost_override DECIMAL,  -- user can adjust default boost
  notify_on_detection BOOLEAN DEFAULT TRUE,
  
  created_at TIMESTAMP,
  
  UNIQUE(user_id, situation_type)
);

-- Trust Scores
CREATE TABLE trust_mode_events (
  id UUID PRIMARY KEY,
  user_id UUID REFERENCES users(id),
  
  category TEXT,  -- 'email_urgency_scoring', 'draft_generation', etc.
  
  ai_suggestion JSONB,
  user_action JSONB,
  agreement BOOLEAN,
  
  -- Trust progression
  trust_score_before DECIMAL,
  trust_score_after DECIMAL,
  autonomy_level TEXT,  -- 'SUGGEST_ONLY', 'SEMI_AUTONOMOUS', 'AUTONOMOUS'
  
  -- If autonomy granted/revoked
  autonomy_changed BOOLEAN,
  
  created_at TIMESTAMP
);

-- Indexes for performance
CREATE INDEX idx_interactions_user_time ON email_interactions(user_id, created_at DESC);
CREATE INDEX idx_drift_events_user ON drift_events(user_id, computed_at DESC);
CREATE INDEX idx_situations_active ON situations(user_id, status) WHERE status = 'active';
CREATE INDEX idx_trust_category ON trust_mode_events(user_id, category);
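
As a usage example, the per-domain median response time consumed by the velocity-drift computation can be derived directly from email_interactions (a sketch; $1 is the user id parameter):

-- Median response time (hours) per domain over the trailing 14-day window
SELECT
  email_domain,
  percentile_cont(0.5) WITHIN GROUP (ORDER BY response_time_seconds) / 3600.0
    AS median_response_hours
FROM email_interactions
WHERE user_id = $1
  AND created_at >= now() - INTERVAL '14 days'
  AND response_time_seconds IS NOT NULL
GROUP BY email_domain;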

3.5.3 Nightly Drift Analysis Job (Inngest)

// Scheduled job that runs at 2am daily
export const nightlyDriftAnalysis = inngest.createFunction(
  { id: "nightly-drift-analysis" },
  { cron: "0 2 * * *" },  // 2am daily
  async ({ step }) => {
    
    // Process all active users
    const activeUsers = await step.run("fetch-active-users", async () => {
      return db.users.findMany({
        where: { status: 'active', goals_defined: true }
      });
    });
    
    for (const user of activeUsers) {
      await step.run(`analyze-drift-${user.id}`, async () => {
        
        // 1. Gather behavioral data from past 14 days
        const behaviorData = await gatherBehaviorData(user.id, { days: 14 });
        
        // 2. Load stated goals
        const statedGoals = await db.user_goals.findMany({
          where: { user_id: user.id, active: true }
        });
        
        // 3. Compute drift metrics
        const driftMetrics = computeDriftScore(statedGoals, behaviorData);
        
        // 4. Store drift event
        const driftEvent = await db.drift_events.create({
          data: {
            user_id: user.id,
            velocity_drift: driftMetrics.velocity,
            attention_drift: driftMetrics.attention,
            completion_drift: driftMetrics.completion,
            interruption_drift: driftMetrics.interruption,
            total_drift_score: driftMetrics.total,
            threshold_at_time: user.recalibration_threshold,
            triggered_recalibration: false,
            dominant_drift_component: driftMetrics.dominant,
            affected_domains: driftMetrics.affectedDomains
          }
        });
        
        // 5. Check if recalibration needed
        if (shouldTriggerRecalibration(driftMetrics.total, user)) {
          const prompt = generateRecalibrationPrompt(driftMetrics, statedGoals, behaviorData);
          
          // Send via user's preferred channel (SMS/Slack)
          await sendRecalibrationPrompt(user, prompt);
          
          await db.drift_events.update({
            where: { id: driftEvent.id },
            data: { triggered_recalibration: true }
          });
        }
        
        // 6. Update situation detection
        const situations = await detectSituations(user.id);
        await updateActiveSituations(user.id, situations);
        
        // 7. Update trust scores
        await updateTrustScores(user.id);
        
        // 8. Meta-learning: Update drift model weights
        await updateDriftModelWeights(user.id);
      });
    }
  }
);

4. Theoretical Foundations

4.1 The Temporal Preference Alignment Problem

We formalize the problem as follows:

Definition 4.1 (Stated Preferences): At time t₀, user declares preference function G: X → ℝ mapping email features X to priority scores.

Definition 4.2 (Revealed Preferences): Behavioral function B_t: X → ℝ inferred from user interactions at time t.

Definition 4.3 (Temporal Alignment): Preferences are aligned at time t if |G(x) - B_t(x)| < ε for all x ∈ X and small ε.

Observation 4.1 (Drift Inevitability in Non-Stationary Contexts): Under non-stationary user contexts, without periodic recalibration, the expected divergence between stated preferences G and behavioral preferences B_t grows without bound: lim_{t→∞} E[||G − B_t||] = ∞.

Intuition: User contexts evolve continuously (job changes, life events, strategic shifts) while stated preferences G are recorded at discrete time points t₀, t₁, ... If behavioral preferences B_t adapt to context but G remains fixed between recalibrations, the gap accumulates. A formal proof would require specifying (1) a stochastic process for context evolution C_t, (2) a mapping from contexts to preferences B_t = f(C_t), (3) sampling intervals for G updates, and (4) appropriate distance metrics with bounded norms. We leave this formalization to future work.

Corollary 4.1: Periodic recalibration is necessary to maintain bounded alignment in non-stationary environments.

4.2 Optimal Recalibration Frequency

There's a tradeoff: recalibrate too often (user annoyance), too rarely (accumulated misalignment).

We model recalibration cost as:

C(δ, λ) = α · δ² + β · λ

Where:

  • δ: drift magnitude (misalignment cost)
  • λ: recalibration frequency (interruption cost)
  • α, β: user-specific weights

The optimal recalibration frequency minimizes expected cost:

λ* = argmin_λ E[α · δ(λ)² + β · λ]

This explains why our threshold is learned per-user: different users have different tolerance for interruptions (β) vs tolerance for misalignment (α).
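
A toy numeric sketch of this optimization, under the modeling assumption (ours, not stated above) that drift shrinks inversely with recalibration frequency, δ(λ) = k/λ:

import numpy as np

alpha, beta, k = 1.0, 0.05, 0.2    # illustrative user-specific weights
lam = np.linspace(0.05, 2.0, 400)  # candidate recalibration frequencies (per week)
expected_cost = alpha * (k / lam)**2 + beta * lam

lam_star = lam[np.argmin(expected_cost)]
print(lam_star)  # ≈ 1.17 here; closed form: (2αk²/β)^(1/3)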

4.3 Multi-Armed Bandit Formulation

Situation detection can be viewed as a contextual bandit problem [Li et al., 2010]:

  • Arms: Different urgency boost values [0, 1, 2, 3, 4, 5]
  • Context: Email features + detected situation type
  • Reward: User satisfaction signal, measured as weighted combination of:
    • Response appropriateness (0.4 weight): Did user respond within expected timeframe for the assigned urgency? reward = 1 if response_time <= expected_time(urgency), else 0
    • Override rate (0.3 weight): Did user manually change the urgency score? reward = 1 if no_override, else 0
    • Completion rate (0.2 weight): Did user mark as handled within 48h? reward = 1 if completed, else 0
    • Explicit feedback (0.1 weight): Did user thumbs-up/down the urgency? reward = +1 / -1 if feedback given, else 0

Composite reward function:

def compute_reward(email_interaction, urgency_assigned, situation):
    """
    Compute reward for assigned urgency in given situation context.
    """
    r_response = 1.0 if email_interaction.response_time <= expected_response_time(urgency_assigned) else 0.0
    r_override = 1.0 if not email_interaction.urgency_manually_changed else 0.0
    r_completion = 1.0 if email_interaction.completed_within_48h else 0.0
    r_explicit = email_interaction.explicit_feedback_score  # +1, 0, or -1
    
    reward = 0.4*r_response + 0.3*r_override + 0.2*r_completion + 0.1*r_explicit
    
    # Clip to [-1, 1]
    return np.clip(reward, -1.0, 1.0)

Guardrails:

  • Cap maximum boost at +4 to prevent over-urgency
  • Require minimum 10 observations before allowing boosts >2
  • Force exploration: 10% epsilon-greedy for rare situation types

We use Thompson Sampling [Russo et al., 2018] to balance exploration (trying new boosts) and exploitation (using known-good boosts):

def select_urgency_boost(email, situation):
    """
    Thompson sampling for situation-aware urgency.
    Each arm (boost level) has a Beta posterior from historical rewards.
    """
    # For each possible boost level
    sampled_reward = {}
    for boost in [0, 1, 2, 3, 4, 5]:
        # Get historical reward statistics for this boost in this situation type
        successes = situation.boost_successes[boost]  # rewards > 0.5
        failures = situation.boost_failures[boost]    # rewards <= 0.5
        
        # Sample from posterior distribution (Beta for binary rewards)
        posterior = beta_distribution(
            alpha=successes + 1,  # +1 for uninformative prior
            beta=failures + 1
        )
        sampled_reward[boost] = posterior.sample()
    
    # Select boost with highest sampled reward (dict keyed by boost level)
    selected_boost = max(sampled_reward, key=sampled_reward.get)
    
    # Guardrails
    if situation.total_observations < 10 and selected_boost > 2:
        selected_boost = 2  # Conservative during learning
    
    return min(selected_boost, 4)  # Cap at +4

5. Experimental Considerations

While this work establishes technical prior art, we describe experimental validation that would demonstrate efficacy:

5.1 Drift Detection Accuracy

Metric: Precision and recall of drift detection prompts.

  • Precision: % of recalibration prompts where user confirms priorities changed
  • Recall: % of actual priority changes that were detected

Target: Precision > 75%, Recall > 80%

5.2 Time-to-Alignment

Metric: Days until drift score returns to baseline after recalibration.

Target: < 3 days (rapid re-alignment)

5.3 Situation Detection Latency

Metric: Days between situation start and automatic detection.

Target: < 5 days for high-signal situations (hiring sprints)

5.4 User Engagement

Metric: % of recalibration prompts that receive meaningful responses (not dismissed).

Target: > 70% engagement

5.5 Trust Calibration

Metric: Category-specific accuracy at autonomy grant threshold.

Target: > 90% accuracy when autonomy granted, < 85% triggers downgrade

6. Discussion

6.1 Key Innovations

To our knowledge, this work presents a practical production design that combines several techniques in a novel configuration for attention management systems:

1. Behavioral Drift Detection: A system that continuously monitors divergence between stated and revealed preferences using multi-dimensional drift metrics (velocity, attention allocation, completion rates, interruption tolerance) with learned per-user thresholds. While preference learning and behavioral tracking exist independently, we are not aware of prior systems that explicitly compute and surface this divergence with conversational intervention. This relates to concept drift detection in machine learning [Gama et al., 2014], but applies it to the human preference domain rather than data distribution shifts.

2. Conversational Recalibration: Rather than silently learning from behavior or rigidly following stated rules, we explicitly surface drift and ask users to resolve ambiguity through conversation. This respects user agency while maintaining alignment.

3. Automatic Situation Detection: An implementation of unsupervised temporal context detection using multi-signal analysis (email patterns, calendar data, keywords) with dynamic urgency recalibration that automatically begins and ends without manual triggers.

Prior Art Welcome: If you know of prior work in this space that we've missed, please reach out at chance@getprecedent.ai. We want this document to accurately represent the state of the art—both for intellectual honesty and to establish exactly what's new in our specific approach.

6.2 Limitations

Cold Start: We can't detect drift for new users until we have 2-4 weeks of behavioral data. During this period, the system relies solely on stated preferences without drift monitoring. This is a fundamental tradeoff—accurate drift detection requires history, and we prioritize precision over speed. Users in their first month receive standard urgency scoring without personalized drift detection.

Domain Specificity: The current implementation targets email. The techniques generalize to other attention channels, but the implementation details are email-specific.

Privacy: Requires access to email content and metadata. Implementation uses a two-tier encryption and retention model to balance functionality with data minimization.

Two-Tier Content Storage Architecture:

We store email content in two encrypted forms with different retention periods:

1. Full content (encrypted, 72-hour retention):

  • Purpose: High-fidelity reply drafting and immediate action items
  • Contains: Complete subject and body, encrypted per-tenant
  • Retention: 72 hours, then permanently deleted
  • Access: Decryption logged via auditDecryption() with reason and caller

2. Redacted content (encrypted, 21-day retention):

  • Purpose: Reminders, follow-ups, search, drift detection
  • Contains: Subject and body with sensitive content masked (see redaction rules below)
  • Retention: 21 days, then permanently deleted
  • Access: Decryption logged via auditDecryption() with reason and caller

Redaction Rules (applied before encryption for long-term storage; a sketch follows the list):

  • Dollar amounts / deal values → [AMOUNT]
  • M&A language ("acquire", "term sheet", "funding round") → [DEAL]
  • Email addresses → [CONTACT]
  • Phone numbers → [PHONE]
  • Health disclosures ("diagnosed with", medical terms) → [HEALTH]
  • SSNs, bank account numbers → [PII]
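
A minimal sketch of applying these rules before encryption (patterns are illustrative stand-ins, not the production rule set):

import re

REDACTION_RULES = [
    (re.compile(r"\$\s?\d[\d,]*(?:\.\d+)?[MBk]?", re.I), "[AMOUNT]"),
    (re.compile(r"\b(acquir\w*|term sheet|funding round)\b", re.I), "[DEAL]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[CONTACT]"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "[PHONE]"),
    (re.compile(r"\bdiagnosed with\b[^.]*", re.I), "[HEALTH]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[PII]"),  # SSN format
]

def redact(text: str) -> str:
    for pattern, token in REDACTION_RULES:
        text = pattern.sub(token, text)
    return text

print(redact("Wire $1.2M to jane@acme.com re: term sheet"))
# → "Wire [AMOUNT] to [CONTACT] re: [DEAL]"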

Encryption Model:

  • Per-tenant encryption keys (not environment-wide)
  • Keys managed separately from data (KMS-backed in production)
  • All email content stored as encrypted blobs only—no plaintext in database columns
  • Two encrypted fields per message: body_encrypted (full, 72h) + body_redacted_encrypted (redacted, 21d)

Fields Retained (with encryption and hashing):

Email metadata:

  • sender_domain_hash: SHA-256 hash - non-reversible
  • sender_email_hash: SHA-256 hash - non-reversible
  • thread_id_hash: SHA-256 hash - non-reversible
  • received_timestamp: Datetime
  • urgency_score: Derived integer [1-10]
  • category: Derived category string
  • expires_full_at: 72-hour TTL for full content
  • expires_redacted_at: 21-day TTL for redacted content

Interaction data retained indefinitely (for drift detection):

  • interaction_timestamp: When user opened/responded/handled
  • response_time_seconds: Time to respond (not content)
  • attention_duration_seconds: Time spent (not content)
  • action_taken: Enum of {opened, replied, archived, marked_handled, deleted}
  • category: Email category (for drift computation)

Aggregated features:

  • domain_response_velocity: Median response time per domain (daily rollup)
  • category_attention_allocation: Time distribution across categories (daily)
  • drift_scores: Weekly drift metric snapshots
  • vip_sender_scores: Frequency-based VIP rankings

Data Retention Policy:

  • Full encrypted content: 72 hours, then column NULLed
  • Redacted encrypted content: 21 days, then row deleted
  • Interaction metadata: Retained for drift analysis (no email content)
  • Aggregated features: Retained indefinitely (cannot reconstruct emails)
  • User deletion: All data purged within 24 hours of account closure

Security Controls:

  • Per-tenant encryption for all email content
  • At-rest encryption (AES-256) for database
  • Field-level hashing for PII (sender emails, thread IDs) - non-reversible
  • Row-level security (RLS) ensures users only access own data
  • Audit logging for every decryption operation (auditDecryption with timestamp, reason, caller)
  • No raw email content in database—encrypted blobs only
  • Human access forbidden except via explicit "break glass" flow (logged and ticketed)

Computational Cost: Nightly drift analysis at scale requires significant compute. We batch process and use cached embeddings to manage costs, but this creates a tradeoff: drift detection runs daily rather than real-time. We chose accuracy and cost-efficiency over instant detection—for most users, detecting drift within 24 hours is sufficient.

6.3 Future Directions

Multi-Modal Drift: Extend to calendar, Slack, documents—detect when stated project priorities diverge from actual time allocation.

Causal Drift Attribution: Use causal inference to determine why drift occurred (external factors vs internal preference change).

Federated Learning: Enable drift detection without centralized behavioral data through federated drift metrics.

Explainable Recalibration: Improve interpretability of drift signals to help users understand why the system detected drift.

7. Conclusion

We have presented a comprehensive system for behavioral drift detection and conversational recalibration in AI-powered attention management. By continuously monitoring the divergence between stated goals and revealed behavior, automatically detecting temporal contexts, and implementing category-specific progressive trust, our system maintains long-term alignment between user intent and system behavior.

The technical details provided—including algorithms, data schemas, implementation architecture, and theoretical foundations—establish clear prior art for these innovations. We believe these techniques represent meaningful advances in human-AI alignment for personal productivity systems and hope this publication spurs further research in temporal preference learning and proactive recalibration mechanisms.

References

Christiano, P. F., et al. (2017). Deep reinforcement learning from human preferences. NeurIPS.

Gama, J., et al. (2014). A Survey on Concept Drift Adaptation. ACM Computing Surveys.

Goodrich, M. A., & Schultz, A. C. (2007). Human-robot interaction: a survey. Foundations and Trends in Human–Computer Interaction.

Killick, R., et al. (2012). Optimal detection of changepoints with a linear computational cost. JASA.

Li, L., et al. (2010). A contextual-bandit approach to personalized news article recommendation. WWW.

Rubner, Y., et al. (2000). The earth mover's distance as a metric for image retrieval. IJCV.

Russo, D. J., et al. (2018). A Tutorial on Thompson Sampling. Foundations and Trends in Machine Learning.

Sadigh, D., et al. (2017). Active preference-based learning of reward functions. RSS.

Scerri, P., et al. (2002). Designing agents for systems with adjustable autonomy. IJCAI.

Appendix A: Pseudocode for Core Algorithms

A.1 Complete Drift Detection Algorithm

from datetime import datetime
from typing import Optional

def nightly_drift_detection(user_id: str) -> Optional[RecalibrationPrompt]:
    """
    Full drift detection pipeline (pseudocode; data-access and
    drift-metric helpers are assumed from the main text).
    """
    # 1. Gather data
    stated_goals = load_user_goals(user_id)
    behavior_data = load_interactions(user_id, window_days=14)
    user_history = load_drift_history(user_id)
    
    if len(behavior_data) < 50:  # Insufficient data
        return None
    
    # 2. Compute component drifts
    drift_components = {
        'velocity': compute_velocity_drift(
            stated_goals.response_targets,
            behavior_data.response_times
        ),
        'attention': compute_attention_drift(
            stated_goals.focus_areas,
            behavior_data.time_allocation
        ),
        'completion': compute_completion_drift(
            stated_goals.priorities,
            behavior_data.completion_rates
        ),
        'interruption': compute_interruption_drift(
            stated_goals.notification_settings,
            behavior_data.interrupt_responses
        )
    }
    
    # 3. Compute weighted composite score
    weights = user_history.learned_weights  # personalized per user
    drift_score = sum(weights[k] * component
                      for k, component in drift_components.items())
    
    # 4. Normalize using historical distribution
    drift_percentile = compute_percentile(
        drift_score, 
        user_history.drift_distribution
    )
    
    # 5. Store drift event
    store_drift_event(
        user_id=user_id,
        components=drift_components,
        total_score=drift_score,
        percentile=drift_percentile,
        threshold=user_history.recalibration_threshold
    )
    
    # 6. Check recalibration threshold
    if drift_percentile < user_history.recalibration_threshold:
        return None  # No recalibration needed
    
    # 7. Check recency (don't spam)
    days_since_last = (datetime.now() - user_history.last_recalibration).days
    if days_since_last < 7:
        return None  # Too soon
    
    # 8. Generate recalibration prompt
    dominant_component = max(drift_components.items(), key=lambda x: x[1])
    
    prompt = generate_contextual_prompt(
        drift_type=dominant_component[0],
        magnitude=dominant_component[1],
        stated_goals=stated_goals,
        behavior_data=behavior_data
    )
    
    # 9. Update history
    user_history.last_recalibration = datetime.now()
    user_history.save()
    
    return RecalibrationPrompt(
        user_id=user_id,
        prompt_text=prompt,
        drift_score=drift_score,
        dominant_component=dominant_component[0]
    )

A.2 Situation Detection Algorithm

from typing import List, Optional

def detect_situations(user_id: str, lookback_days: int = 14) -> List[Situation]:
    """
    Detect active situations using multi-signal analysis (pseudocode;
    the Situation, Match, and SignalBundle types and gather_signals
    are assumed from the main text).
    """
    signals = gather_signals(user_id, lookback_days)
    situations = []
    
    # Pattern matchers for different situation types
    matchers = [
        HiringSprintMatcher(),
        TaxSeasonMatcher(),
        BoardPrepMatcher(),
        ProductLaunchMatcher(),
        ContractNegotiationMatcher()
    ]
    
    for matcher in matchers:
        if match := matcher.detect(signals):
            situations.append(Situation(
                type=matcher.situation_type,
                confidence=match.confidence,
                evidence=match.evidence,
                started_at=estimate_start_date(signals, match),
                expected_duration=matcher.typical_duration,
                urgency_boost=matcher.default_boost,
                relevant_domains=matcher.relevant_domains
            ))
    
    return situations

class HiringSprintMatcher:
    situation_type = 'hiring_sprint'
    typical_duration = 30  # days
    default_boost = 3.0
    relevant_domains = ['recruiting', 'talent', 'hr']
    
    def detect(self, signals: SignalBundle) -> Optional[Match]:
        # Email volume spike (floor the baseline at 1 to avoid
        # divide-by-zero for users with no recruiting history)
        recruiting_emails = signals.email_counts['recruiting']
        baseline = max(signals.baseline_counts['recruiting'], 1)
        
        if recruiting_emails < baseline * 2:
            return None  # No significant spike
        
        # Calendar confirmation
        interview_count = len([e for e in signals.calendar_events 
                               if 'interview' in e.title.lower()])
        
        if interview_count < 3:
            return None  # Not enough interviews scheduled
        
        # Keyword frequency
        keywords = ['candidate', 'hire', 'talent', 'recruiting', 'interview']
        keyword_mentions = sum(signals.keyword_counts[kw] for kw in keywords)
        keyword_baseline = max(sum(signals.baseline_keywords[kw] for kw in keywords), 1)
        
        if keyword_mentions < keyword_baseline * 1.5:
            return None  # Not enough keyword signal
        
        # Compute confidence based on signal strength
        confidence = min(1.0, (
            0.4 * (recruiting_emails / baseline / 2) +  # email volume weight
            0.3 * (interview_count / 5) +               # calendar weight
            0.3 * (keyword_mentions / keyword_baseline / 1.5)  # keyword weight
        ))
        
        return Match(
            confidence=confidence,
            evidence={
                'email_volume_ratio': recruiting_emails / baseline,
                'interview_count': interview_count,
                'keyword_frequency_ratio': keyword_mentions / keyword_baseline
            }
        )

Appendix B: Deployment Considerations

B.1 Scalability

Single-User Performance:

  • Drift analysis: ~2-5 seconds/user (PostgreSQL + Python)
  • Can process 10,000 users in ~10 hours (nightly batch)

Optimization Strategies:

  1. Parallel processing (10 workers → 1 hour for 10k users; sketch below)
  2. Incremental computation (cache intermediate drift metrics)
  3. Sampling for very active users (analyze subset of emails)
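
A sketch of strategy 1, sharding the nightly batch across worker processes with the standard library (nightly_drift_detection is the Appendix A.1 entry point):

from concurrent.futures import ProcessPoolExecutor

def run_nightly_batch(user_ids, workers: int = 10) -> None:
    """At ~3.5 s/user, 10 workers bring 10,000 users from ~10 hours
    down to roughly 1 hour."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        list(pool.map(nightly_drift_detection, user_ids, chunksize=50))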

B.2 Privacy & Data Retention

Data Minimization:

  • Two-tier encrypted storage: full content (72h) + redacted content (21d)
  • Per-tenant encryption keys, never store plaintext in database
  • Redaction of sensitive content (dollar amounts, PII, health info, M&A terms) before long-term storage
  • Interaction metadata retained for drift detection (no email content)
  • Aggregated drift metrics retained indefinitely

User Control:

  • One-click export of all behavioral data
  • Immediate deletion of all data on account closure
  • Granular controls over which data feeds drift detection
  • Audit log of all decryption operations available to user

Document Version: 1.0
Publication Date: October 29, 2025
License: CC BY 4.0 (Creative Commons Attribution)
Permanent Identifier: https://getprecedent.ai/research/drift-detection-2025

This document establishes prior art for the described techniques and may be cited as:

Kelch, C. (2025). Behavioral Drift Detection and Conversational Recalibration in Personalized Attention Management Systems. Technical Report, Precedent AI.