# ElasticSearch Scoring Function Recommendations

**Date:** 2026-01-18

**Focus:** Improving listing ranking based on engagement data analysis

## Executive Summary
Analysis of the current ElasticSearch scoring function reveals significant opportunities for improvement. The current system uses static signals only (verified, video, location) and misses critical engagement signals that strongly predict listing quality:
- Favorite rate has a 0.509 correlation with CVR, a moderately strong positive relationship
- From Owner listings have roughly 3x higher CVR (13.95% vs 4.8%)
- Freshness boost should be stepped, not continuous
## Current Scoring Function Analysis

### Architecture
The current system rotates through 4 scoring models every hour:
```
Hour 1, 5, 9, 13, 17, 21  → VERIFIED model
Hour 2, 6, 10, 14, 18, 22 → REFRESH model
Hour 3, 7, 11, 15, 19, 23 → OWNERS model
Hour 4, 8, 12, 16, 20     → VERIFIED_OFFICES model
```

### Current Scoring Formula

```
score = (verified * 5-25)
      + (pr / factor)
      + (has_video * 5)
      + (user_type * 0-25)
      + (user_has_sub * 0 or -20)
      + (user_fee * 0-20)
      + (project_id > 0 ? 1000 : 0)      // Mobile only
      + (accurate_location ? 0 : -100)
      + (create_time_hour / factor)
      + (boosted ? 50-100 : 0)
      + (ldu / factor)
      + (cdt / factor)
      + (is_precise_location ? 100 : 0)
```

### Current Model Weights
| Factor | VERIFIED | REFRESH | OWNERS | VERIFIED_OFFICES |
|---|---|---|---|---|
| verified | 25 | 5 | 15 | 5 |
| pr | 10 | 25 | 5 | 25 |
| video | 5 | 5 | 5 | 5 |
| user_type | 15 | 15 | 0 | 25 |
| cdt | 75 | 125 | 75 | 50 |
| user_has_sub | 0 | 0 | -20 | 0 |
| boosted | 100 | 100 | 50 | 100 |
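
The rotation schedule and the weights table can be combined into a small lookup, which makes it easier to see which weights are live at a given hour. This is a minimal sketch, not the production code: `modelForHour` and `weightsForHour` are hypothetical names, and the handling of hour 0 is an assumption because the schedule above does not list it.

```javascript
// Weights copied from the "Current Model Weights" table.
const SCORES_MODELS = {
  VERIFIED:         { verified: 25, pr: 10, video: 5, user_type: 15, cdt: 75,  user_has_sub: 0,   boosted: 100 },
  REFRESH:          { verified: 5,  pr: 25, video: 5, user_type: 15, cdt: 125, user_has_sub: 0,   boosted: 100 },
  OWNERS:           { verified: 15, pr: 5,  video: 5, user_type: 0,  cdt: 75,  user_has_sub: -20, boosted: 50 },
  VERIFIED_OFFICES: { verified: 5,  pr: 25, video: 5, user_type: 25, cdt: 50,  user_has_sub: 0,   boosted: 100 },
};

// The schedule follows hour % 4: 1 → VERIFIED, 2 → REFRESH, 3 → OWNERS, 0 → VERIFIED_OFFICES.
// ASSUMPTION: hour 0 (midnight) is not listed in the schedule; this sketch
// maps it to VERIFIED_OFFICES by continuing the modulo pattern.
const ROTATION = ["VERIFIED_OFFICES", "VERIFIED", "REFRESH", "OWNERS"];

// Hypothetical helper: name of the model active at a given hour (0..23).
function modelForHour(hour) {
  if (!Number.isInteger(hour) || hour < 0 || hour > 23) {
    throw new RangeError(`hour must be an integer in 0..23, got ${hour}`);
  }
  return ROTATION[hour % 4];
}

// Hypothetical helper: weight set active at a given hour.
function weightsForHour(hour) {
  return SCORES_MODELS[modelForHour(hour)];
}
```

For example, `weightsForHour(3)` returns the OWNERS weights, the only set where `user_has_sub` is non-zero.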
## Gaps Identified by Data Analysis

### 1. No Engagement Signals
| Signal | Current Usage | Data Finding | Recommendation |
|---|---|---|---|
| CTR (impression→view) | ❌ Not used | Decays 4.6x with age | Add as ranking signal |
| CVR (view→contact) | ❌ Not used | Stable ~5-6%, varies by quality | Primary quality signal |
| Favorite rate | ❌ Not used | 0.509 correlation with CVR | High-weight signal |
| Session rate | ❌ Not used | 2.5x higher for high-CVR listings | Medium-weight signal |
| Share rate | ❌ Not used | 3.7x higher for high-CVR listings | Low-weight signal |
### 2. Indirect Owner Boost

**Current approach:** The OWNERS model penalizes subscribers (`user_has_sub: -20`).

**Problem:** This is indirect and only active 25% of the time.

**Data finding:** Owner listings have roughly 3x higher CVR (13.95% vs 4.8%).

**Recommendation:** Add an explicit `from_owner` boost to all models.
### 3. Continuous vs. Stepped Freshness

**Current approach:** `cdt / (todayNumber / weight)`, a continuous decay

**Data finding:** New listings show distinct performance tiers:

| Days | Avg VPD | Boost Factor |
|---|---|---|
| 0-1 | 33 | 3.5x baseline |
| 2-3 | 23.6 | 2.5x baseline |
| 4-7 | 12.5 | 1.3x baseline |
| 8+ | 9.4 | 1.0x baseline |

**Recommendation:** Stepped freshness boost for first 7 days.
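
The boost factors in the table above appear to be each tier's average VPD divided by the 8+ day baseline. The sketch below reproduces that arithmetic; treating the 8+ tier as the baseline is an assumption inferred from its "1.0x baseline" label, and the variable names are illustrative.

```javascript
// Avg VPD per age tier, copied from the table above.
const TIERS = [
  { days: "0-1", avgVpd: 33 },
  { days: "2-3", avgVpd: 23.6 },
  { days: "4-7", avgVpd: 12.5 },
  { days: "8+", avgVpd: 9.4 }, // ASSUMPTION: this tier is the baseline
];

const baselineVpd = TIERS[TIERS.length - 1].avgVpd;

// Boost factor = tier VPD / baseline VPD, rounded to one decimal place.
const boostFactors = TIERS.map((tier) => ({
  days: tier.days,
  factor: Math.round((tier.avgVpd / baselineVpd) * 10) / 10,
}));

console.log(boostFactors);
```

The rounded factors match the table (3.5, 2.5, 1.3, 1.0), which suggests the tiers are sized against observed view decay rather than chosen arbitrarily.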
### 4. Video Weight Too Low

**Current:** 5 points

**Data finding:** Video listings get 49% more views.

**Recommendation:** Increase to 15-20 points.
## Recommended Scoring Formula V2

### New Fields Required in ElasticSearch

```javascript
// Add to listing document (computed daily from stats)
{
  "from_owner": true,          // Boolean - is advertiser_type = 'owner'
  "ctr_7d": 0.25,              // Float 0-1 - CTR over last 7 days
  "cvr_7d": 0.06,              // Float 0-1 - CVR over last 7 days
  "favorite_rate_7d": 0.08,    // Float 0-1 - Favorites/views last 7 days
  "session_rate_7d": 0.22,     // Float 0-1 - 30-sec sessions/views
  "days_since_created": 15     // Integer - for stepped freshness
}
```

### New Weights Model
```javascript
const SCORES_MODELS_V2 = {
  ENGAGEMENT_BASED: {
    // === Static Signals (existing) ===
    verified: 20,
    pr: 10,
    video: 15, // Increased from 5
    is_precise_location: 100,
    not_accurate_location: -100,
    boosted: 100,

    // === Freshness (new stepped approach) ===
    freshness_0_3_days: 150, // NEW
    freshness_4_7_days: 75, // NEW
    cdt_continuous: 25, // Reduced from 50-125

    // === User Signals ===
    from_owner: 50, // NEW - explicit owner boost
    user_type: 10,
    user_has_sub: 0,
    user_has_fees: 10,

    // === Engagement Signals (NEW) ===
    ctr_7d: 30, // Normalized 0-1
    cvr_7d: 50, // Most important!
    favorite_rate_7d: 40, // Strong quality signal
    session_rate_7d: 20, // Engagement depth
  },
};
```

### New Script Score
```javascript
{
  query: oldQuery,
  functions: [
    {
      script_score: {
        script: {
          source: `
            double score = 0;

            // === STATIC SIGNALS ===
            score += doc['verified'].value * ${weights.verified};
            score += doc['pr'].value / ${10 / weights.pr};
            score += doc['has_video'].value * ${weights.video};
            score += doc['is_precise_location'].value == true ? 100 : 0;
            score += doc['accurate_location'].value > 0 ? 0 : ${weights.not_accurate_location};
            score += doc['boosted'].value == 1 ? ${weights.boosted} : 0;

            // === STEPPED FRESHNESS BOOST ===
            long daysSinceCreated = doc['days_since_created'].value;
            if (daysSinceCreated <= 3) {
              score += ${weights.freshness_0_3_days};
            } else if (daysSinceCreated <= 7) {
              score += ${weights.freshness_4_7_days};
            }
            // Continuous decay for older listings
            score += doc['cdt'].value / ${todayNumber / weights.cdt_continuous};

            // === USER SIGNALS ===
            score += doc['from_owner'].value == true ? ${weights.from_owner} : 0;
            score += doc['user.type'].value * ${weights.user_type};
            score += doc['user.paid'].value > 0 ? ${weights.user_has_sub} : 0;
            score += doc['user.fee'].value * ${weights.user_has_fees};

            // === ENGAGEMENT SIGNALS ===
            // Only apply if listing has sufficient data (>7 days old, >100 impressions)
            if (daysSinceCreated > 7 && doc['impressions_total'].value > 100) {
              score += doc['ctr_7d'].value * ${weights.ctr_7d};
              score += doc['cvr_7d'].value * ${weights.cvr_7d};
              score += doc['favorite_rate_7d'].value * ${weights.favorite_rate_7d};
              score += doc['session_rate_7d'].value * ${weights.session_rate_7d};
            }

            // === PROJECT BOOST (mobile only) ===
            score += doc['project_id'].value > 0 ? ${from_web ? 0 : 1000} : 0;

            return Math.round(score);
          `
        }
      },
      weight: 1
    }
  ],
  score_mode: 'sum'
}
```

## Implementation Plan
### Phase 1: Quick Wins (No New Fields)

**Timeline:** Can deploy immediately

| Change | Current | New | Impact |
|---|---|---|---|
| Video weight | 5 | 15 | Video listings get 49% more views |
| Add stepped freshness | Continuous only | Add tier bonuses | Better new listing visibility |

```
// Add to existing script (no new fields needed)
long daysOld = (${todayNumber} - doc['cdt'].value);
double freshnessBoost = daysOld <= 3 ? 150 : daysOld <= 7 ? 75 : 0;
score += freshnessBoost;
```

### Phase 2: Add Owner Signal
**Timeline:** 1-2 days

- Add `from_owner` boolean to ES mapping
- Populate from `advertiser_type = 'owner'` in listings
- Add to scoring: `score += doc['from_owner'].value ? 50 : 0`
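
The three steps above could look roughly like the following on the indexing side. This is a sketch under assumptions: `ownerFields` is a hypothetical helper, and the source listing row is assumed to expose `advertiser_type` as a string. The mapping fragment uses the standard Elasticsearch `boolean` field type.

```javascript
// Mapping fragment for the new field (standard Elasticsearch `boolean` type).
const fromOwnerMapping = {
  properties: {
    from_owner: { type: "boolean" },
  },
};

// Hypothetical helper: derive the new field from a source listing row.
// ASSUMPTION: `advertiser_type` is a string and 'owner' marks owner listings.
function ownerFields(listing) {
  return { from_owner: listing.advertiser_type === "owner" };
}

// Usage: merge into the document sent to the indexing pipeline.
const esDoc = { id: 42, ...ownerFields({ id: 42, advertiser_type: "owner" }) };
console.log(esDoc);
```

One design point worth checking before enabling the boost: in recent Elasticsearch versions, reading `doc['from_owner'].value` in a script fails for documents that have no value for the field, so existing listings likely need a backfill (or the script needs a `doc['from_owner'].size() > 0` guard) first.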
### Phase 3: Add Engagement Signals

**Timeline:** 1-2 weeks

1. Create daily aggregation job:

   ```sql
   SELECT
     id,
     views_7d / NULLIF(impressions_7d, 0) AS ctr_7d,
     contacts_7d / NULLIF(views_7d, 0) AS cvr_7d,
     favorites_7d / NULLIF(views_7d, 0) AS favorite_rate_7d,
     sessions_7d / NULLIF(views_7d, 0) AS session_rate_7d
   FROM listing_stats_7d
   ```

2. Normalize values to 0-1 range (use percentiles within category/district)
3. Update ES mapping and indexing pipeline
4. Add engagement signals to scoring formula
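
Step 2 (percentile normalization within category/district) can be sketched as below. The function names and the row shape (`category`, `district`, plus the rate fields) are assumptions for illustration, and the sketch assumes rates are non-null numbers; rows where the aggregation returned NULL would need to be filtered or defaulted first.

```javascript
// Index of the first element in `sorted` that is >= x (binary search).
function lowerBound(sorted, x) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid] < x) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Percentile rank in [0, 1]: share of the other values that are strictly lower.
// A group with a single value gets 0.5 by convention (an arbitrary choice).
function percentileRanks(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  return values.map((v) => (n <= 1 ? 0.5 : lowerBound(sorted, v) / (n - 1)));
}

// Replace `field` with its percentile rank inside each group.
// `groupKey` is a caller-supplied function, e.g. row => `${row.category}:${row.district}`.
function normalizeWithinGroups(rows, field, groupKey) {
  const groups = new Map();
  for (const row of rows) {
    const key = groupKey(row);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  const out = [];
  for (const members of groups.values()) {
    const ranks = percentileRanks(members.map((r) => r[field]));
    members.forEach((r, i) => out.push({ ...r, [field]: ranks[i] }));
  }
  return out;
}
```

Ranking within a group keeps a 6% CVR in a low-converting district from being treated the same as 6% in a high-converting one, which seems to be the intent of normalizing per category/district.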
### Phase 4: Remove Hourly Model Rotation
**Timeline:** After Phase 3 validated
The hourly rotation (VERIFIED → REFRESH → OWNERS → VERIFIED_OFFICES) was likely designed to give different listing types fair exposure. With engagement-based scoring, this becomes unnecessary:

- Good owner listings will rank well due to `from_owner` boost + high CVR
- Fresh listings will rank well due to stepped freshness boost
- Verified/quality listings will rank well due to engagement signals

Consider A/B testing removal of rotation in favor of single engagement-based model.
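
To sanity-check that a single model can serve both owner and fresh listings, the relevant V2 components can be ported to plain JavaScript and compared on example listings. This is a partial sketch only: it covers the owner boost, stepped freshness, and gated engagement terms, omits the static, `cdt`, and project terms, and the two listings are hypothetical.

```javascript
// Weights copied from SCORES_MODELS_V2.ENGAGEMENT_BASED in this document.
const weights = {
  from_owner: 50,
  freshness_0_3_days: 150,
  freshness_4_7_days: 75,
  ctr_7d: 30,
  cvr_7d: 50,
  favorite_rate_7d: 40,
  session_rate_7d: 20,
};

// Partial port: owner boost + stepped freshness + gated engagement only.
function partialScoreV2(listing) {
  let score = 0;
  if (listing.from_owner) score += weights.from_owner;

  if (listing.days_since_created <= 3) score += weights.freshness_0_3_days;
  else if (listing.days_since_created <= 7) score += weights.freshness_4_7_days;

  // Same gate as the script: older than 7 days and more than 100 impressions.
  if (listing.days_since_created > 7 && listing.impressions_total > 100) {
    for (const key of ["ctr_7d", "cvr_7d", "favorite_rate_7d", "session_rate_7d"]) {
      score += listing[key] * weights[key];
    }
  }
  return Math.round(score);
}

// HYPOTHETICAL listings; engagement values are normalized to 0-1.
const establishedOwner = {
  from_owner: true, days_since_created: 30, impressions_total: 500,
  ctr_7d: 0.9, cvr_7d: 0.9, favorite_rate_7d: 0.9, session_rate_7d: 0.9,
};
const freshAgent = {
  from_owner: false, days_since_created: 2, impressions_total: 40,
  ctr_7d: 0, cvr_7d: 0, favorite_rate_7d: 0, session_rate_7d: 0,
};

console.log(partialScoreV2(establishedOwner), partialScoreV2(freshAgent));
```

On these components the established owner listing scores 176 and the fresh agent listing 150, so both get meaningful lift from one model. Note that the four engagement weights sum to 140, slightly less than the 150-point bonus for the first three days, which is worth keeping in mind when tuning.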
---
## Expected Impact
| Metric | Current | Expected After | Improvement |
| --- | --- | --- | --- |
| Avg CTR (search results) | ~25% | ~35% | +40% |
| Avg CVR (overall) | 5.5% | 6.5%+ | +18% |
| Owner listing visibility | Varies by hour | Consistent boost | More predictable |
| New listing success rate | Unknown | Measurable | Data-driven |
---
## A/B Test Design
### Test Groups
- **Control:** Current scoring with hourly rotation
- **Treatment:** New scoring with engagement signals
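
One way to split traffic between the two groups is deterministic hashing of a stable identifier, so a given user always sees the same scoring. The sketch below is an assumption about how assignment could work, not a description of existing infrastructure: `assignGroup`, the `scoring-v2` salt, and the use of a user ID are all hypothetical.

```javascript
// FNV-1a style 32-bit hash, applied to UTF-16 code units for simplicity.
function fnv1a32(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Hypothetical assignment: hash (salt + user id) into 100 buckets, 50/50 split.
// The salt keeps this experiment's split independent of other experiments.
function assignGroup(userId, salt = "scoring-v2") {
  const bucket = fnv1a32(`${salt}:${userId}`) % 100;
  return bucket < 50 ? "control" : "treatment";
}

console.log(assignGroup("user-123"));
```

Splitting by user rather than by time matters here because the control arm itself changes every hour; a time-based split would confound the comparison with the rotation schedule.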
### Success Metrics
1. **Primary:** Overall CVR (contacts / views)
2. **Secondary:**
   - CTR by position (are we showing more relevant listings?)
   - Time to first contact for new listings
   - Owner vs Agent performance gap
### Minimum Sample Size
- ~10,000 searches per group
- ~2 weeks of data for statistical significance
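
As a cross-check on the sample size, the standard two-proportion approximation can be applied to the expected CVR change (5.5% to 6.5%). The significance level (two-sided 5%) and power (80%) below are assumptions, not values taken from this analysis, and `sampleSizePerGroup` is an illustrative name.

```javascript
// Approximate sample size per group for comparing two proportions.
// Defaults: zAlpha = 1.96 (two-sided 5% significance), zBeta = 0.8416 (80% power).
function sampleSizePerGroup(p1, p2, zAlpha = 1.96, zBeta = 0.8416) {
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  const effect = p2 - p1;
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / effect ** 2);
}

// Baseline and expected CVR from the "Expected Impact" table.
const viewsPerGroup = sampleSizePerGroup(0.055, 0.065);
console.log(viewsPerGroup);
```

Under these assumptions the result is on the order of 9,000 per group. The unit is listing views (the denominator of CVR), not searches, so the ~10,000 searches figure should be converted to expected views before judging whether the test is adequately powered.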
---
## Monitoring Queries
### Track Engagement Signal Distribution
```sql
SELECT
  quantile(0.1)(ctr_7d) AS ctr_p10,
  quantile(0.5)(ctr_7d) AS ctr_p50,
  quantile(0.9)(ctr_7d) AS ctr_p90,
  quantile(0.1)(cvr_7d) AS cvr_p10,
  quantile(0.5)(cvr_7d) AS cvr_p50,
  quantile(0.9)(cvr_7d) AS cvr_p90
FROM listing_engagement_metrics
WHERE category = 1
```

### Validate Freshness Boost Effect

```sql
SELECT
  days_bucket,
  count() AS listings,
  avg(impressions_today) AS avg_impressions,
  avg(views_today) AS avg_views
FROM listings_with_new_scoring
GROUP BY days_bucket
ORDER BY days_bucket
```

---

## Summary of Recommendations

| Priority | Change | Effort | Impact |
|---|---|---|---|
| 1 | Add stepped freshness boost | Low | Medium |
| 2 | Increase video weight (5→15) | Low | Low |
| 3 | Add from_owner field and boost | Low | High |
| 4 | Add engagement signals (ctr, cvr, favorites) | Medium | High |
| 5 | Remove hourly model rotation | Low | Medium |
| 6 | A/B test new scoring | Medium | Validation |

The biggest opportunity is adding engagement signals, particularly `favorite_rate_7d`, which has a 0.509 correlation with CVR: listings that users favorite tend to convert to contacts at a noticeably higher rate.