# ElasticSearch Scoring Function Recommendations

**Date:** 2026-01-18

**Focus:** Improving listing ranking based on engagement data analysis

## Executive Summary

Analysis of the current ElasticSearch scoring function reveals significant opportunities for improvement. The current system uses only static signals (e.g. verified, video, location) and misses signals that the engagement data shows strongly predict listing quality. Key findings:

- Favorite rate has a 0.509 correlation with CVR (strong!)
- Listings posted by owners ("From Owner") have roughly 3x higher CVR
- The freshness boost should be stepped, not continuous

---

## Current Scoring Function Analysis

### Architecture

The current system switches between 4 scoring models on an hourly cycle:

```text
Hour 1, 5, 9, 13, 17, 21  → VERIFIED model
Hour 2, 6, 10, 14, 18, 22 → REFRESH model
Hour 3, 7, 11, 15, 19, 23 → OWNERS model
Hour 4, 8, 12, 16, 20     → VERIFIED_OFFICES model
```
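The rotation can be sketched as a modulo lookup. This is a minimal sketch, not the production scheduler. `modelForHour`, `hoursPerModel`, and `ROTATION` are hypothetical names. The sketch also assumes that hour 0, which the list above does not mention, behaves like hours 4, 8, and so on (`hour % 4 === 0`).

```javascript
// Hypothetical sketch of the hourly model rotation described above.
// Assumption: hours run 0-23 and hour 0 maps like 4, 8, ... (hour % 4 === 0).
const ROTATION = ["VERIFIED_OFFICES", "VERIFIED", "REFRESH", "OWNERS"];

function modelForHour(hour) {
  if (!Number.isInteger(hour) || hour < 0 || hour > 23) {
    throw new RangeError(`hour must be an integer in 0-23, got ${hour}`);
  }
  return ROTATION[hour % 4];
}

// Count how many hours per day each model is active.
function hoursPerModel() {
  const counts = {};
  for (let h = 0; h < 24; h++) {
    const model = modelForHour(h);
    counts[model] = (counts[model] || 0) + 1;
  }
  return counts;
}

console.log(modelForHour(13)); // VERIFIED
console.log(hoursPerModel());
```

Under this assumption every model, including OWNERS, is active 6 of 24 hours. That matches the 25% figure cited for the owner penalty below.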

### Current Scoring Formula

```text
score =
  (verified * 5-25) +
  (pr / factor) +
  (has_video * 5) +
  (user_type * 0-25) +
  (user_has_sub * 0 or -20) +
  (user_fee * 0-20) +
  (project_id > 0 ? 1000 : 0) +        // Mobile only
  (accurate_location ? 0 : -100) +
  (create_time_hour / factor) +
  (boosted ? 50-100 : 0) +
  (ldu / factor) +
  (cdt / factor) +
  (is_precise_location ? 100 : 0)
```

### Current Model Weights

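| Signal       | VERIFIED | REFRESH | OWNERS | VERIFIED_OFFICES |
| ------------ | -------- | ------- | ------ | ---------------- |
| verified     | 25       | 5       | 15     | 5                |
| pr           | 10       | 25      | 5      | 25               |
| video        | 5        | 5       | 5      | 5                |
| user_type    | 15       | 15      | 0      | 25               |
| cdt          | 75       | 125     | 75     | 50               |
| user_has_sub | 0        | 0       | -20    | 0                |
| boosted      | 100      | 100     | 50     | 100              |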

---

## Gaps Identified by Data Analysis

### 1. No Engagement Signals

| Signal                | Current Usage | Data Finding                       | Recommendation         |
| --------------------- | ------------- | ---------------------------------- | ---------------------- |
| CTR (impression→view) | ❌ Not used   | Decays 4.6x with age               | Add as ranking signal  |
| CVR (view→contact)    | ❌ Not used   | Stable ~5-6%, varies by quality    | Primary quality signal |
| Favorite rate         | ❌ Not used   | 0.509 correlation with CVR         | High-weight signal     |
| Session rate          | ❌ Not used   | 2.5x higher for high-CVR listings  | Medium-weight signal   |
| Share rate            | ❌ Not used   | 3.7x higher for high-CVR listings  | Low-weight signal      |

### 2. Indirect Owner Boost

**Current approach:** The OWNERS model penalizes subscribers (`user_has_sub: -20`), which favors owners only indirectly.

**Problem:** The boost is indirect and active only 25% of the time (6 of 24 hours).

**Data finding:** Owner listings have roughly 3x higher CVR (13.95% vs. 4.8%).

**Recommendation:** Add an explicit `from_owner` boost to all models.

### 3. Continuous vs. Stepped Freshness

**Current approach:** `cdt / (todayNumber / weight)`, a continuous decay that is linear in the creation date.

**Data finding:** New listings show distinct performance tiers by age, measured in average views per day (VPD):

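| Listing age (days) | Avg VPD | Boost factor  |
| ------------------ | ------- | ------------- |
| 0-1                | 33      | 3.5x baseline |
| 2-3                | 23.6    | 2.5x baseline |
| 4-7                | 12.5    | 1.3x baseline |
| 8+                 | 9.4     | 1.0x baseline |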
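Each boost factor in the table is that tier's average VPD divided by the 8+ day baseline (9.4). The quick check below encodes the table as an illustrative `TIERS` array and recomputes the ratios with a hypothetical `boostFactors` helper.

```javascript
// Average views per day (VPD) by listing age, copied from the table above.
const TIERS = [
  { days: "0-1", avgVpd: 33 },
  { days: "2-3", avgVpd: 23.6 },
  { days: "4-7", avgVpd: 12.5 },
  { days: "8+", avgVpd: 9.4 },
];

// Ratio of each tier's VPD to the last (8+ day) tier, rounded to one decimal.
function boostFactors(tiers) {
  const baseline = tiers[tiers.length - 1].avgVpd;
  return tiers.map((t) => ({
    days: t.days,
    factor: Math.round((t.avgVpd / baseline) * 10) / 10,
  }));
}

console.log(boostFactors(TIERS));
```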

**Recommendation:** Stepped freshness boost for the first 7 days.

### 4. Video Weight Too Low

**Current:** 5 points

**Data finding:** Listings with video get 49% more views.

**Recommendation:** Increase to 15-20 points.

---

## Recommended Scoring Formula V2

### New Fields Required in ElasticSearch

```jsonc
// Add to listing document (computed daily from stats)
{
  "from_owner": true,           // Boolean - advertiser_type = 'owner'
  "ctr_7d": 0.25,               // Float 0-1 - CTR over last 7 days
  "cvr_7d": 0.06,               // Float 0-1 - CVR over last 7 days
  "favorite_rate_7d": 0.08,     // Float 0-1 - favorites / views, last 7 days
  "session_rate_7d": 0.22,      // Float 0-1 - 30+ second sessions / views, last 7 days
  "impressions_total": 1250,    // Integer - lifetime impressions; read by the script's data-sufficiency check (add if not already indexed)
  "days_since_created": 15      // Integer - for stepped freshness
}
```
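The daily job that fills these fields might look like the following sketch, which shows raw values before the percentile normalization described in Phase 3. `buildEngagementFields` and `safeRate` are hypothetical helpers. The raw counter names (`views_7d`, `impressions_7d`, and so on) mirror the Phase 3 aggregation query, and `created_at` is an assumed input. As with `NULLIF` in SQL, a rate with a zero denominator becomes `null`.

```javascript
// Hypothetical sketch: derive the new ES fields from raw 7-day counters.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Divide, returning null when the denominator is 0 (mirrors SQL NULLIF).
function safeRate(numerator, denominator) {
  return denominator > 0 ? numerator / denominator : null;
}

function buildEngagementFields(stats, now = new Date()) {
  return {
    from_owner: stats.advertiser_type === "owner",
    ctr_7d: safeRate(stats.views_7d, stats.impressions_7d),
    cvr_7d: safeRate(stats.contacts_7d, stats.views_7d),
    favorite_rate_7d: safeRate(stats.favorites_7d, stats.views_7d),
    session_rate_7d: safeRate(stats.sessions_7d, stats.views_7d),
    impressions_total: stats.impressions_total,
    // Whole days elapsed since creation (created_at is an assumed Date field).
    days_since_created: Math.floor((now - stats.created_at) / MS_PER_DAY),
  };
}

// Illustrative input that reproduces the sample document above.
const doc = buildEngagementFields(
  {
    advertiser_type: "owner",
    impressions_7d: 400, views_7d: 100, contacts_7d: 6,
    favorites_7d: 8, sessions_7d: 22, impressions_total: 1250,
    created_at: new Date("2026-01-03T00:00:00Z"),
  },
  new Date("2026-01-18T00:00:00Z")
);
console.log(doc);
```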

### New Weights Model

```js
const SCORES_MODELS_V2 = {
  ENGAGEMENT_BASED: {
    // === Static Signals (existing) ===
    verified: 20,
    pr: 10,
    video: 15, // Increased from 5
    is_precise_location: 100,
    not_accurate_location: -100,
    boosted: 100,

    // === Freshness (new stepped approach) ===
    freshness_0_3_days: 150, // NEW
    freshness_4_7_days: 75, // NEW
    cdt_continuous: 25, // Reduced from 50-125

    // === User Signals ===
    from_owner: 50, // NEW - explicit owner boost
    user_type: 10,
    user_has_sub: 0,
    user_has_fees: 10,

    // === Engagement Signals (NEW; inputs normalized to 0-1, see Phase 3) ===
    ctr_7d: 30,
    cvr_7d: 50, // Highest engagement weight
    favorite_rate_7d: 40, // Strong quality signal
    session_rate_7d: 20, // Engagement depth
  },
};
```
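Each engagement rate is multiplied by its weight, so the most any signal group can add is easy to bound. The sketch below copies a subset of the weights above and sums them per group. `maxContributions` and the `GROUPS` split are illustrative. It assumes engagement inputs are normalized to 0-1 and that the static flags are 0/1. The result shows that even the full engagement block cannot outweigh the 0-3 day freshness bonus.

```javascript
// Subset of SCORES_MODELS_V2.ENGAGEMENT_BASED, copied from above.
const WEIGHTS = {
  verified: 20, video: 15, is_precise_location: 100, boosted: 100,
  freshness_0_3_days: 150, freshness_4_7_days: 75,
  from_owner: 50,
  ctr_7d: 30, cvr_7d: 50, favorite_rate_7d: 40, session_rate_7d: 20,
};

// Illustrative grouping of weights into signal families.
const GROUPS = {
  engagement: ["ctr_7d", "cvr_7d", "favorite_rate_7d", "session_rate_7d"],
  // The two freshness tiers are mutually exclusive, so only the larger one counts.
  freshness: ["freshness_0_3_days"],
  owner: ["from_owner"],
  static_flags: ["verified", "video", "is_precise_location", "boosted"],
};

// Upper bound on the points each group can add, assuming 0-1 (or 0/1 flag) inputs.
function maxContributions(weights, groups) {
  const out = {};
  for (const [name, keys] of Object.entries(groups)) {
    out[name] = keys.reduce((sum, k) => sum + weights[k], 0);
  }
  return out;
}

console.log(maxContributions(WEIGHTS, GROUPS));
```

With raw, un-normalized rates the gap is larger still: a raw CVR of 0.06 times a weight of 50 adds only 3 points.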

### New Script Score

```js
// Body of the function_score query; weights = SCORES_MODELS_V2.ENGAGEMENT_BASED.
// Assumes every document has the referenced fields; if some may be missing,
// guard each read with doc['field'].size() > 0 (.value throws on an empty field).
{
  query: oldQuery,
  functions: [
    {
      script_score: {
        script: {
          source: `
            double score = 0;

            // === STATIC SIGNALS ===
            score += doc['verified'].value * ${weights.verified};
            // Cast to double so a whole-number divisor cannot cause integer division
            score += (double) doc['pr'].value / ${10 / weights.pr};
            score += doc['has_video'].value * ${weights.video};
            score += doc['is_precise_location'].value == true ? ${weights.is_precise_location} : 0;
            score += doc['accurate_location'].value > 0 ? 0 : ${weights.not_accurate_location};
            score += doc['boosted'].value == 1 ? ${weights.boosted} : 0;

            // === STEPPED FRESHNESS BOOST ===
            long daysSinceCreated = doc['days_since_created'].value;
            if (daysSinceCreated <= 3) {
              score += ${weights.freshness_0_3_days};
            } else if (daysSinceCreated <= 7) {
              score += ${weights.freshness_4_7_days};
            }
            // Reduced continuous term, applied to all listings
            score += (double) doc['cdt'].value / ${todayNumber / weights.cdt_continuous};

            // === USER SIGNALS ===
            score += doc['from_owner'].value == true ? ${weights.from_owner} : 0;
            score += doc['user.type'].value * ${weights.user_type};
            score += doc['user.paid'].value > 0 ? ${weights.user_has_sub} : 0;
            score += doc['user.fee'].value * ${weights.user_has_fees};

            // === ENGAGEMENT SIGNALS ===
            // Only apply if listing has sufficient data (>7 days old, >100 impressions)
            if (daysSinceCreated > 7 && doc['impressions_total'].value > 100) {
              score += doc['ctr_7d'].value * ${weights.ctr_7d};
              score += doc['cvr_7d'].value * ${weights.cvr_7d};
              score += doc['favorite_rate_7d'].value * ${weights.favorite_rate_7d};
              score += doc['session_rate_7d'].value * ${weights.session_rate_7d};
            }

            // === PROJECT BOOST (mobile only) ===
            score += doc['project_id'].value > 0 ? ${from_web ? 0 : 1000} : 0;

            // Elasticsearch 7+ rejects negative function scores, so clamp at 0
            return Math.max(0, Math.round(score));
          `
        }
      },
      weight: 1
    }
  ],
  score_mode: 'sum'
}
```
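For offline checks such as unit tests or replaying logged listings, it can help to mirror the script in application code. The sketch below defines a hypothetical `scoreListingV2`. It assumes a flat listing object whose keys match the ES field names, and the sample listing and `todayNumber: 20000` are illustrative values. It is not the production script and would have to be kept in sync with it by hand.

```javascript
// Hypothetical JS mirror of the V2 script, for offline testing only.
const WEIGHTS_V2 = {
  verified: 20, pr: 10, video: 15, is_precise_location: 100,
  not_accurate_location: -100, boosted: 100,
  freshness_0_3_days: 150, freshness_4_7_days: 75, cdt_continuous: 25,
  from_owner: 50, user_type: 10, user_has_sub: 0, user_has_fees: 10,
  ctr_7d: 30, cvr_7d: 50, favorite_rate_7d: 40, session_rate_7d: 20,
};

function scoreListingV2(listing, weights, { todayNumber, fromWeb }) {
  let score = 0;

  // Static signals
  score += listing.verified * weights.verified;
  score += listing.pr / (10 / weights.pr);
  score += listing.has_video * weights.video;
  score += listing.is_precise_location ? weights.is_precise_location : 0;
  score += listing.accurate_location > 0 ? 0 : weights.not_accurate_location;
  score += listing.boosted === 1 ? weights.boosted : 0;

  // Stepped freshness plus the reduced continuous term
  const days = listing.days_since_created;
  if (days <= 3) score += weights.freshness_0_3_days;
  else if (days <= 7) score += weights.freshness_4_7_days;
  score += listing.cdt / (todayNumber / weights.cdt_continuous);

  // User signals
  score += listing.from_owner ? weights.from_owner : 0;
  score += listing["user.type"] * weights.user_type;
  score += listing["user.paid"] > 0 ? weights.user_has_sub : 0;
  score += listing["user.fee"] * weights.user_has_fees;

  // Engagement signals, only with enough data
  if (days > 7 && listing.impressions_total > 100) {
    score += listing.ctr_7d * weights.ctr_7d;
    score += listing.cvr_7d * weights.cvr_7d;
    score += listing.favorite_rate_7d * weights.favorite_rate_7d;
    score += listing.session_rate_7d * weights.session_rate_7d;
  }

  // Project boost (mobile only)
  if (listing.project_id > 0 && !fromWeb) score += 1000;

  // Round, and clamp at 0 because Elasticsearch 7+ rejects negative function scores
  return Math.max(0, Math.round(score));
}

// Illustrative listing: 2 days old, verified, video, precise location, from an owner.
const freshOwnerListing = {
  verified: 1, pr: 0, has_video: 1, is_precise_location: true,
  accurate_location: 1, boosted: 0, days_since_created: 2, cdt: 20000,
  from_owner: true, "user.type": 0, "user.paid": 0, "user.fee": 0,
  impressions_total: 50, ctr_7d: 0, cvr_7d: 0, favorite_rate_7d: 0,
  session_rate_7d: 0, project_id: 0,
};

console.log(scoreListingV2(freshOwnerListing, WEIGHTS_V2, { todayNumber: 20000, fromWeb: true })); // 360
```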

---

## Implementation Plan

### Phase 1: Quick Wins (No New Fields)

**Timeline:** Can deploy immediately

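| Change                | Current         | New              | Impact                        |
| --------------------- | --------------- | ---------------- | ----------------------------- |
| Video weight          | 5               | 15               | Video listings get +49% views |
| Add stepped freshness | Continuous only | Add tier bonuses | Better new listing visibility |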

```js
// Add to existing script (no new fields needed).
// Assumes cdt and todayNumber are day numbers on the same scale.
long daysOld = ${todayNumber} - doc['cdt'].value;
double freshnessBoost = daysOld <= 3 ? 150 : daysOld <= 7 ? 75 : 0;
score += freshnessBoost;
```
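The same tiering, pulled out as a plain function, is easy to unit-test before it goes into the script. `freshnessBoost` is a hypothetical helper. Like the Phase 1 snippet, it assumes `cdt` and `todayNumber` are day numbers counted from the same epoch.

```javascript
// Hypothetical helper mirroring the Phase 1 script snippet.
// Assumes cdt and todayNumber are both day numbers from the same epoch.
function freshnessBoost(todayNumber, cdt) {
  const daysOld = todayNumber - cdt;
  if (daysOld <= 3) return 150;
  if (daysOld <= 7) return 75;
  return 0;
}

// Boost at and around each tier boundary.
for (const age of [0, 3, 4, 7, 8, 30]) {
  console.log(age, freshnessBoost(20000, 20000 - age));
}
```

Both boundaries are inclusive: a listing exactly 3 days old still gets 150, and one exactly 7 days old gets 75.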

### Phase 2: Add Owner Signal

**Timeline:** 1-2 days

1. Add a `from_owner` boolean to the ES mapping.
2. Populate it from `advertiser_type = 'owner'` in listings.
3. Add it to scoring: `score += doc['from_owner'].value ? 50 : 0;`
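The three Phase 2 steps can be sketched as follows. The mapping object uses the standard Elasticsearch `properties` / `type: "boolean"` shape. `toListingDoc`, `ownerBoost`, the sample rows, and the `"agency"` value are hypothetical. A missing `advertiser_type` is treated as "not an owner".

```javascript
// Mapping addition for the new field (standard ES mapping body shape).
const FROM_OWNER_MAPPING = {
  properties: {
    from_owner: { type: "boolean" },
  },
};

// Hypothetical indexing transform: derive from_owner from advertiser_type.
function toListingDoc(row) {
  return { ...row, from_owner: row.advertiser_type === "owner" };
}

// Owner boost as added to the script (weight 50 per the V2 model).
function ownerBoost(doc, weight = 50) {
  return doc.from_owner ? weight : 0;
}

const rows = [
  { id: 1, advertiser_type: "owner" },
  { id: 2, advertiser_type: "agency" },
  { id: 3, advertiser_type: null },
];
console.log(rows.map(toListingDoc).map((d) => [d.id, ownerBoost(d)]));
```

The mapping body would be sent with the put-mapping API (`PUT /<index>/_mapping`). Existing documents only get a value once they are reindexed or updated. Until then, the script may need a `doc['from_owner'].size() > 0` guard.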

### Phase 3: Add Engagement Signals

**Timeline:** 1-2 weeks

1. Create a daily aggregation job:

   ```sql
   SELECT
     id,
     views_7d / NULLIF(impressions_7d, 0) AS ctr_7d,
     contacts_7d / NULLIF(views_7d, 0) AS cvr_7d,
     favorites_7d / NULLIF(views_7d, 0) AS favorite_rate_7d,
     sessions_7d / NULLIF(views_7d, 0) AS session_rate_7d
   FROM listing_stats_7d
   ```

2. Normalize values to the 0-1 range (use percentiles within category/district).

3. Update the ES mapping and indexing pipeline.

4. Add the engagement signals to the scoring formula.
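Step 2 can be sketched as a per-group percentile rank. `percentileNormalize` is a hypothetical helper, and `category` and `district` are assumed column names. It ranks one metric within each group and scales the rank so the lowest value maps to 0 and the highest to 1. Tied values share the average of their positions, a group with a single value gets 0.5, and `null` rates (no denominator) stay `null`.

```javascript
// Hypothetical percentile normalization of one metric within category/district groups.
function percentileNormalize(rows, metric, groupKeys = ["category", "district"]) {
  // Bucket rows by their group key, e.g. "1|A".
  const groups = new Map();
  for (const row of rows) {
    const key = groupKeys.map((k) => row[k]).join("|");
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }

  const normalized = new Map(); // row object -> value in 0-1
  for (const members of groups.values()) {
    // Rank only rows that have a value; filter() copies, so sort() leaves members intact.
    const sorted = members
      .filter((r) => r[metric] != null)
      .sort((a, b) => a[metric] - b[metric]);
    const n = sorted.length;
    for (let i = 0; i < n; ) {
      // Extend j over the run of values tied with sorted[i].
      let j = i;
      while (j + 1 < n && sorted[j + 1][metric] === sorted[i][metric]) j++;
      const pct = n > 1 ? (i + j) / 2 / (n - 1) : 0.5;
      for (let k = i; k <= j; k++) normalized.set(sorted[k], pct);
      i = j + 1;
    }
  }
  // Rows without a value were never ranked and come back as null.
  return rows.map((r) => (normalized.has(r) ? normalized.get(r) : null));
}

const sample = [
  { id: 1, category: 1, district: "A", cvr_7d: 0.02 },
  { id: 2, category: 1, district: "A", cvr_7d: 0.06 },
  { id: 3, category: 1, district: "A", cvr_7d: 0.06 },
  { id: 4, category: 1, district: "A", cvr_7d: 0.1 },
  { id: 5, category: 1, district: "B", cvr_7d: 0.03 },
  { id: 6, category: 1, district: "B", cvr_7d: null },
];
console.log(percentileNormalize(sample, "cvr_7d"));
```

Percentile ranks let a weight such as `cvr_7d: 50` mean the same thing in categories with very different baseline rates. The trade-off is that small groups produce coarse ranks.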

### Phase 4: Remove Hourly Model Rotation

**Timeline:** After Phase 3 validated

The hourly rotation (VERIFIED → REFRESH → OWNERS → VERIFIED_OFFICES) was likely designed to give different listing types fair exposure. With engagement-based scoring, it should become unnecessary:

- Good owner listings should rank well due to the `from_owner` boost plus high CVR
- Fresh listings should rank well due to the stepped freshness boost
- Verified and high-quality listings should rank well due to the verified weight and engagement signals

Consider A/B testing the removal of the rotation in favor of a single engagement-based model.

---

## Expected Impact

| Metric                   | Current        | Expected After   | Improvement      |
| ------------------------ | -------------- | ---------------- | ---------------- |
| Avg CTR (search results) | ~25%           | ~35%             | +40%             |
| Avg CVR (overall)        | 5.5%           | 6.5%+            | +18%             |
| Owner listing visibility | Varies by hour | Consistent boost | More predictable |
| New listing success rate | Unknown        | Measurable       | Data-driven      |
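The Improvement column is the relative change from the current to the expected value. A one-line check, using a hypothetical `relativeChangePct` helper:

```javascript
// Relative change from current to expected, in percent.
function relativeChangePct(current, expected) {
  return ((expected - current) / current) * 100;
}

console.log(relativeChangePct(25, 35).toFixed(1)); // 40.0 (CTR)
console.log(relativeChangePct(5.5, 6.5).toFixed(1)); // 18.2 (CVR)
```

In absolute terms these are gains of 10 and 1 percentage points respectively.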

---

## A/B Test Design

### Test Groups

- **Control:** Current scoring with hourly rotation
- **Treatment:** New scoring with engagement signals

### Success Metrics

1. **Primary:** Overall CVR (contacts / views)
2. **Secondary:**
   - CTR by position (are we showing more relevant listings?)
   - Time to first contact for new listings
   - Owner vs Agent performance gap

### Minimum Sample Size

- ~10,000 searches per group
- ~2 weeks of data for statistical significance
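One way to sanity-check these numbers is the standard normal-approximation sample size for comparing two proportions:
n = (z₁₋α/₂ + z₁₋β)² · (p₁(1 − p₁) + p₂(1 − p₂)) / (p₂ − p₁)².
The sketch below implements it as a hypothetical `samplesPerGroup`. It assumes a two-sided test at α = 0.05 with 80% power and plugs in the CVR figures from the Expected Impact table (5.5% → 6.5%). Because CVR is contacts / views, the result counts listing views per group, not searches.

```javascript
// Normal-approximation sample size per group for a two-proportion test.
// z-values are hard-coded for two-sided alpha = 0.05 and 80% power.
const Z_ALPHA_2 = 1.959964; // z for 1 - 0.05 / 2
const Z_BETA = 0.841621; // z for 0.80 power

function samplesPerGroup(p1, p2, zAlpha = Z_ALPHA_2, zBeta = Z_BETA) {
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  const effect = p2 - p1;
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / effect ** 2);
}

// Baseline CVR 5.5% vs. hoped-for 6.5%: views needed per group.
console.log(samplesPerGroup(0.055, 0.065)); // 8850
```

Converting this to searches depends on views per search, which is not given here. The calculation also treats views as independent. Views from the same users or searches are correlated, so the real requirement is likely higher.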

---

## Monitoring Queries

### Track Engagement Signal Distribution

```sql
SELECT
  quantile(0.1)(ctr_7d) AS ctr_p10,
  quantile(0.5)(ctr_7d) AS ctr_p50,
  quantile(0.9)(ctr_7d) AS ctr_p90,
  quantile(0.1)(cvr_7d) AS cvr_p10,
  quantile(0.5)(cvr_7d) AS cvr_p50,
  quantile(0.9)(cvr_7d) AS cvr_p90
FROM
  listing_engagement_metrics
WHERE
  category = 1
```

### Validate Freshness Boost Effect

```sql
SELECT
  days_bucket,
  count() AS listings,
  avg(impressions_today) AS avg_impressions,
  avg(views_today) AS avg_views
FROM
  listings_with_new_scoring
GROUP BY
  days_bucket
ORDER BY
  days_bucket
```
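The query assumes a `days_bucket` column. One hypothetical way to define it reuses the age tiers from the freshness analysis (0-1, 2-3, 4-7, 8+ days), so monitored values line up with the VPD table. The sketch below shows that bucketing plus an in-memory version of the aggregation for spot checks. `daysBucket` and `summarizeByBucket` are hypothetical helpers, and the row fields `days_since_created`, `impressions_today`, and `views_today` are assumptions.

```javascript
// Hypothetical bucketing that reuses the age tiers from the freshness analysis.
function daysBucket(daysSinceCreated) {
  if (daysSinceCreated <= 1) return "0-1";
  if (daysSinceCreated <= 3) return "2-3";
  if (daysSinceCreated <= 7) return "4-7";
  return "8+";
}

// In-memory equivalent of the GROUP BY days_bucket query.
function summarizeByBucket(rows) {
  const acc = new Map();
  for (const r of rows) {
    const bucket = daysBucket(r.days_since_created);
    const s = acc.get(bucket) || { listings: 0, impressions: 0, views: 0 };
    s.listings += 1;
    s.impressions += r.impressions_today;
    s.views += r.views_today;
    acc.set(bucket, s);
  }
  return [...acc.entries()]
    .map(([bucket, s]) => ({
      days_bucket: bucket,
      listings: s.listings,
      avg_impressions: s.impressions / s.listings,
      avg_views: s.views / s.listings,
    }))
    // String order happens to match age order: "0-1" < "2-3" < "4-7" < "8+".
    .sort((a, b) => (a.days_bucket < b.days_bucket ? -1 : 1));
}

const sampleRows = [
  { days_since_created: 0, impressions_today: 100, views_today: 30 },
  { days_since_created: 1, impressions_today: 80, views_today: 20 },
  { days_since_created: 5, impressions_today: 40, views_today: 10 },
  { days_since_created: 30, impressions_today: 20, views_today: 4 },
];
console.log(summarizeByBucket(sampleRows));
```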

---

## Summary of Recommendations

| Priority | Change                                       | Effort | Impact     |
| -------- | -------------------------------------------- | ------ | ---------- |
| 1        | Add stepped freshness boost                  | Low    | Medium     |
| 2        | Increase video weight (5→15)                 | Low    | Low        |
| 3        | Add `from_owner` field and boost             | Low    | High       |
| 4        | Add engagement signals (ctr, cvr, favorites) | Medium | High       |
| 5        | Remove hourly model rotation                 | Low    | Medium     |
| 6        | A/B test new scoring                         | Medium | Validation |

The biggest opportunity is adding engagement signals, particularly `favorite_rate`, which has a 0.509 correlation with CVR: listings that users favorite more often also tend to convert more of their views into contacts.