Form 4 (Insider Transactions, SEC Filing)
Data Overview
Form 4 is the “Statement of Changes in Beneficial Ownership” that every director, officer, or holder of more than 10 % of a registered class of equity (“Section 16 insider”) must file with the SEC before the end of the second business day after trading the company’s equity or related derivatives. The two-day deadline, tightened by Sarbanes-Oxley, gives public investors near-real-time visibility into whether informed insiders are accumulating or disposing of shares, which helps deter illicit trading and provides a timely transparency feed.
Relevance to predictive modeling
Form 4 is a direct signal of management conviction. Net open-market buys by senior officers predict positive 1- to 3-month excess returns, while option-exercise-and-sell combos can flag short setups.
Document anatomy and formats
Each filing is submitted as a structured XML ownership document, which EDGAR also renders as an HTML view.
Key blocks are:
header metadata (issuer CIK, insider CIK, officer title)
a primary table for non-derivative transactions
a second table for derivative trades
footnotes with Rule 10b5-1 plan flags or explanations.
Every transaction line carries fields such as transactionDate, transactionCode (P = open-market purchase, S = sale, A = grant, M = option exercise), transactionShares, transactionPricePerShare, and ownershipNature (direct vs indirect).
The SEC also publishes a quarterly Insider Transactions Data Set extracted directly from the XML tags.
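To make the anatomy above concrete, the following is a minimal sketch of pulling the listed fields out of the non-derivative table. The sample XML is hand-written and heavily simplified for illustration, and the output key names and helper functions are assumptions of this sketch. It uses the standard-library parser so that it runs without extra dependencies.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample for illustration; not a real filing.
SAMPLE_XML = """<ownershipDocument>
  <issuer>
    <issuerTradingSymbol>NVDA</issuerTradingSymbol>
  </issuer>
  <nonDerivativeTable>
    <nonDerivativeTransaction>
      <transactionDate><value>2024-09-24</value></transactionDate>
      <transactionCoding>
        <transactionCode>S</transactionCode>
      </transactionCoding>
      <transactionAmounts>
        <transactionShares><value>165100</value></transactionShares>
        <transactionPricePerShare><value>121.2685</value></transactionPricePerShare>
      </transactionAmounts>
      <ownershipNature>
        <directOrIndirectOwnership><value>I</value></directOrIndirectOwnership>
      </ownershipNature>
    </nonDerivativeTransaction>
  </nonDerivativeTable>
</ownershipDocument>"""


def _text(node, path):
    """Return stripped text at `path` under `node`, or None if absent or empty."""
    found = node.find(path)
    if found is None or found.text is None:
        return None
    return found.text.strip() or None


def _number(node, path):
    """Return the value at `path` as a float, or None if it is missing."""
    raw = _text(node, path)
    return float(raw) if raw is not None else None


def parse_non_derivative(xml_text):
    """Explode each non-derivative transaction row into a flat dict."""
    root = ET.fromstring(xml_text)
    ticker = _text(root, "issuer/issuerTradingSymbol")
    rows = []
    for txn in root.iterfind("nonDerivativeTable/nonDerivativeTransaction"):
        rows.append({
            "ticker": ticker,
            "transaction_date": _text(txn, "transactionDate/value"),
            "transaction_code": _text(txn, "transactionCoding/transactionCode"),
            "shares": _number(txn, "transactionAmounts/transactionShares/value"),
            "price": _number(txn, "transactionAmounts/transactionPricePerShare/value"),
            "ownership": _text(txn, "ownershipNature/directOrIndirectOwnership/value"),
        })
    return rows
```

Missing prices (common for grants) come back as None rather than raising, so downstream steps can decide how to treat them.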
Latency profile
Because insiders have up to two business days to file, alpha depends on speed of capture once the filing lands. EDGAR posts the XML seconds after acceptance (empirically < 15 s). Vendor feeds (e.g., Refinitiv, Sentieo) add NLP tags and arrive within 1–4 minutes. Academic work shows abnormal returns from insider purchases decay sharply after the first trading day, so sub-minute ingestion is material.
Data Processing Pipeline
This is an overview of what the pipeline could look like as part of a first-draft requirements sheet. Teams should refine based on tech stack and custom needs.
Ingest – Subscribe to the EDGAR “ownership.xml” RSS, fall back to vendor push; partition filings by ticker and push them into Kafka.
Parse – Stream the XML with lxml, explode each <nonDerivativeTable> and <derivativeTable> row into atomic events.
Dedup & normalise – Hash (insiderCIK, transactionDate, transactionCode, price, shares) to drop amended 4/As; split mixed option-exercise-and-sell combos into separate events.
Entity enrichment – Map insider CIKs to roles (CEO, CFO, director), tenure length, and Rule 10b5-1 plan presence using the SEC’s new checkbox tags.
Feature engineering – Aggregate rolling 5-, 20-, and 60-day nets; compute relative size vs float and dollar-value percentile vs insider’s own history.
Manual QA – Daily spot-check filings with > 5 footnotes or unusually low/high reported prices to catch reporting errors.
Storage – Write point-in-time feature objects to the Feature Store (offline Iceberg, online Redis).
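The dedup step can be sketched as follows. This is one possible reading of the hashing rule, assuming events arrive as dicts keyed by the five hashed field names; the normalisation choices (stripping leading zeros from the CIK, four decimal places for numbers) and the function names are assumptions of this sketch, not requirements.

```python
import hashlib


def _num(value):
    """Format a numeric field consistently; a missing value becomes an empty string."""
    return "" if value is None else f"{float(value):.4f}"


def event_key(insider_cik, transaction_date, transaction_code, price, shares):
    """Build a stable hash over the five identifying fields of a transaction row."""
    parts = [
        str(insider_cik).lstrip("0"),   # "0001234567" and "1234567" should match
        str(transaction_date),
        str(transaction_code).upper(),
        _num(price),
        _num(shares),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def dedup(events):
    """Keep the first occurrence of each key, preserving input order."""
    seen = set()
    unique = []
    for ev in events:
        key = event_key(
            ev["insiderCIK"],
            ev["transactionDate"],
            ev["transactionCode"],
            ev.get("price"),
            ev.get("shares"),
        )
        if key in seen:
            continue
        seen.add(key)
        unique.append(ev)
    return unique
```

Because the first occurrence wins, a row restated unchanged in a later 4/A is dropped, while a row whose price or share count was corrected hashes differently and survives as a separate event.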
Features for Predictive Modeling
{
  "ticker": "NVDA",
  "event_time": "2024-09-24T00:00:00Z",
  "insider_info": {
    "insider_name": "STEVENS MARK A",
    "insider_id": "mark_a_stevens",
    "insider_role": ["Director"],
    "is_officer": false,
    "is_director": true,
    "is_10pct_owner": false,
    "relationship_note": "By Trust",
    "reporting_entity": "The Envy Trust u/a/d December 7, 2021"
  },
  "transaction_details": [
    {
      "security_type": "Common Stock",
      "transaction_type": "open_market_sell",
      "transaction_code": "S",
      "transaction_date": "2024-09-24",
      "trade_amount": 165100,
      "trade_type": "Disposal",
      "transaction_price_usd": 121.2685,
      "weighted_price_flag": true,
      "transaction_dollar_value": 20021429.35,
      "ownership_type": "Indirect",
      "ownership_note": "By Trust",
      "remaining_shares_post_transaction": 8420117,
      "indirect_beneficial_owner_entity": "Envy Trust"
    }
    // Further entries if multiple transactions on this Form 4
  ],
  "post_transaction_position": {
    "total_shares_beneficially_owned": 8420117,
    "ownership_form": "Indirect",
    "total_pct_of_outstanding": 0.33,
    "derivatives_held": 0
  },
  "historical_context": {
    "insider_history_percentile": 94, // percentile vs prior insider sales (size, freq)
    "five_day_net_shares": -354200, // net shares bought/sold last 5 days
    "twenty_day_aggregate_buy_sell_ratio": 0.7,
    "prior_year_transaction_count": 11,
    "largest_transaction_past_year": 200000
  },
  "regulatory_flags": {
    "rule_10b5_1_flag": false,
    "late_submission_flag": false,
    "multi_transaction_flag": false,
    "option_exercise_flag": false,
    "option_exercise_followed_by_sale_flag": false
  }
}
Field Explanations & Grouped Overview
Insider Info (insider_info)
insider_name / insider_id: Reporter’s name, unique key for join.
insider_role: Role(s) at company (Director, Officer, 10% Owner).
relationship_note: If acting as trustee, executor, or LLC manager.
reporting_entity / indirect_beneficial_owner_entity: Trust or entity name if indirect.
Transaction Details (transaction_details – list, one per row)
security_type: E.g., Common Stock, Option.
transaction_type / transaction_code: Nature (buy/sell), and SEC code (“P” = purchase, “S” = sale).
transaction_date: Trade execution date.
trade_amount: Number of shares traded.
trade_type: “Acquisition” or “Disposal.”
transaction_price_usd: Reported price per share, or the weighted average when the trade filled at multiple prices.
weighted_price_flag: True if the price is a weighted average (typically disclosed in a footnote).
transaction_dollar_value: Value of this trade (trade_amount × price).
ownership_type / ownership_note: Direct/Indirect, trust notes.
remaining_shares_post_transaction: Shares held after this event.
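As a small illustration of how the derived fields in this group could be populated, the sketch below maps the SEC transaction code and the acquired/disposed code to labels and computes the dollar value with exact decimal arithmetic. Only "open_market_sell" appears in the sample record; the other label strings, the "other" fallback, and the function name are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Label strings other than "open_market_sell" are assumptions of this sketch.
CODE_LABELS = {
    "P": "open_market_buy",
    "S": "open_market_sell",
    "A": "grant",
    "M": "option_exercise",
}
DIRECTION_LABELS = {"A": "Acquisition", "D": "Disposal"}


def describe_transaction(code, acquired_disposed, trade_amount, price):
    """Derive transaction_type, trade_type and transaction_dollar_value for one row."""
    # Decimal avoids binary floating-point drift on large share counts.
    value = Decimal(str(trade_amount)) * Decimal(str(price))
    return {
        "transaction_type": CODE_LABELS.get(code, "other"),
        "trade_type": DIRECTION_LABELS.get(acquired_disposed),
        "transaction_dollar_value": value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP),
    }
```

For example, `describe_transaction("S", "D", 1000, "12.345")` yields a disposal labelled "open_market_sell" with a dollar value of 12345.00.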
Post-Transaction Position (post_transaction_position)
total_shares_beneficially_owned: Total ownership after event.
ownership_form: Direct/indirect/combination.
total_pct_of_outstanding: Percentage of total shares outstanding (for materiality).
derivatives_held: Number of derivative securities (e.g., options) still held.
Historical Context (historical_context)
insider_history_percentile: Event size vs insider’s history at this ticker.
five_day_net_shares: Rolling net insider buy/sell last 5 days.
twenty_day_aggregate_buy_sell_ratio: Insider buy/sell ratio over 20 days.
prior_year_transaction_count: How active this insider has been.
largest_transaction_past_year: For benchmarking this trade’s significance.
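The rolling features in this group could be computed along the following lines. This is a sketch under stated assumptions: each event is a dict with a `date` and a `signed_shares` value (positive for buys, negative for sells), windows are measured in calendar days rather than trading days, and the percentile is the share of prior trades at or below the current one. All of these names and conventions are illustrative.

```python
from datetime import date, timedelta


def _in_window(events, as_of, window_days):
    """Events dated within the `window_days` calendar days ending at `as_of`."""
    start = as_of - timedelta(days=window_days)
    return [e for e in events if start < e["date"] <= as_of]


def net_shares(events, as_of, window_days):
    """Net signed shares over the window (negative means net selling)."""
    return sum(e["signed_shares"] for e in _in_window(events, as_of, window_days))


def buy_sell_ratio(events, as_of, window_days):
    """Shares bought divided by shares sold; None when nothing was sold."""
    window = _in_window(events, as_of, window_days)
    bought = sum(e["signed_shares"] for e in window if e["signed_shares"] > 0)
    sold = -sum(e["signed_shares"] for e in window if e["signed_shares"] < 0)
    return bought / sold if sold else None


def history_percentile(prior_values, value):
    """Percent of the insider's prior trade sizes that are at or below `value`."""
    if not prior_values:
        return None
    at_or_below = sum(1 for v in prior_values if v <= value)
    return 100.0 * at_or_below / len(prior_values)
```

To keep the features point-in-time, `events` should contain only filings that were already public at `as_of`.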
Regulatory Flags & Anomalies (regulatory_flags)
rule_10b5_1_flag: Sale under pre-set plan (less predictive).
late_submission_flag: Whether the filing missed the two-business-day SEC deadline.
multi_transaction_flag: Whether the filing reports more than one transaction.
option_exercise_flag / option_exercise_followed_by_sale_flag: Structure of trade.
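Two of these flags can be derived mechanically, as sketched below. The deadline helper counts weekdays only and ignores market and federal holidays, which a production version would need to handle. Treating "an M code and an S code dated the same day" as an exercise-followed-by-sale is a simplifying assumption of this sketch, as are the function names and the `code`/`date` event keys.

```python
from datetime import date, timedelta


def add_business_days(start, n):
    """Return the date `n` weekdays after `start` (holidays are NOT handled)."""
    current = start
    while n > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 .. Friday=4
            n -= 1
    return current


def late_submission_flag(transaction_date, filing_date):
    """True if the filing landed after the second business day following the trade."""
    return filing_date > add_business_days(transaction_date, 2)


def option_exercise_followed_by_sale_flag(transactions):
    """True if an exercise (M) and a sale (S) share a transaction date in one filing."""
    exercise_dates = {t["date"] for t in transactions if t["code"] == "M"}
    sale_dates = {t["date"] for t in transactions if t["code"] == "S"}
    return bool(exercise_dates & sale_dates)
```

For a trade on Thursday 2024-09-26, the weekday-only deadline is Monday 2024-09-30, so a filing dated 2024-10-01 would be flagged late.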
Alpha Hypotheses
• Net open-market buying by senior officers outperforms matched peers by ~40 bp over the following month; effect is strongest when the buy value ranks in the top decile of that insider’s history (NBER).
• Clustering signal: multiple insiders buying within five trading days (insider-cluster score ≥ 3) predicts both higher short-term returns and lower idiosyncratic draw-down risk.
• Rule 10b5-1 exclusion: Trades not made under a pre-planned 10b5-1 program carry roughly double the predictive power versus automated plan trades.
• CEO vs director split: CEO purchases carry more signal than director purchases, while director sales carry negligible negative alpha.
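The clustering hypothesis needs an insider-cluster score. One possible definition, sketched below, counts distinct insiders with an open-market purchase (code P) in the five most recent weekdays up to and including the evaluation date. Weekdays stand in for trading days here (holidays are ignored), and the event keys and function names are assumptions of this sketch.

```python
from datetime import date, timedelta


def weekdays_back(end, n):
    """Return the date `n` weekdays before `end` (holidays are NOT handled)."""
    current = end
    while n > 0:
        current -= timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 .. Friday=4
            n -= 1
    return current


def cluster_score(events, as_of, window=5):
    """Count distinct insiders with a code-P purchase in the trailing window."""
    start = weekdays_back(as_of, window - 1)
    buyers = {
        e["insider_id"]
        for e in events
        if e["code"] == "P" and start <= e["date"] <= as_of
    }
    return len(buyers)
```

Counting distinct insiders rather than transactions keeps one executive splitting a purchase across several days from inflating the score.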
Risks and Mitigation
• 10b5-1 program dilution – Many sales are automatic; flagging the new XML checkbox prevents misclassifying them as discretionary sentiment.
• Option-exercise noise – Exercise-and-hold increases insider ownership but is often tax-driven; treat transactionCode=M buys separately from open-market P purchases.
• Micro-cap illusion – A small-dollar buy can be a large percent of float in thin names, inflating signals; cap pct_of_float and apply liquidity filters.
• Back-dated amendments – 4/As can overwrite prior values; always use the latest timestamp but retain originals for audit trails.
• Weekend filing drift – Friday evening filings give minimal reaction time; models should bucket overnight and weekend effects separately.
• Crowding risk – Popular quant shops now trade simple net-buy ratios; orthogonalise with role-rank, buy intensity percentile, and 10b5-1 flags to maintain edge.
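The micro-cap mitigation could look like the sketch below, which caps pct_of_float and gates on average daily dollar volume. The 5 % cap and the USD 1 million volume floor are placeholder thresholds chosen purely for illustration, and the function names are hypothetical; teams would calibrate both on their own universe.

```python
def capped_pct_of_float(shares, float_shares, cap=0.05):
    """Trade size as a fraction of float, winsorised at `cap` (placeholder value)."""
    if float_shares <= 0:
        return None  # float unknown or invalid; leave the feature missing
    return min(abs(shares) / float_shares, cap)


def passes_liquidity_filter(avg_daily_dollar_volume, min_adv_usd=1_000_000.0):
    """True if the name trades enough dollar volume to act on (placeholder floor)."""
    return avg_daily_dollar_volume >= min_adv_usd
```

Capping rather than dropping keeps the event in the sample while limiting how much a single thin-float trade can move the feature.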