ðŸ"" How the Database Gets Updated
The analytics pipeline runs daily at 7:00 AM PST via run_pipeline.py. It can also be triggered manually for any date or as a full rebuild.
Commands:
python run_pipeline.py â€" Transform today's datapython run_pipeline.py --ingest â€" Ingest from APIs + transformpython run_pipeline.py --date 2026-04-01 â€" Process a specific datepython run_pipeline.py --full â€" Full rebuild of all calculated tables
ðŸ"¥ Raw Tables â€" Ingested Data
Source data pulled directly from APIs and CSVs. These are the ground truth â€" never modified by transforms.
ingest_shopify.py. Includes DTC, wholesale (Walmart/Target/Faire), and subscription orders. Excludes cancelled/voided. One row per order.
Source: Shopify GraphQL
orders query â†' customer journey + financial data
| Column | Type | Description |
|---|---|---|
| shopify_id PK | TEXT | Shopify GID (e.g. gid://shopify/Order/5048552259832) |
| order_number | TEXT | Human-readable order number (e.g. #59925) |
| created_at | TEXT | ISO 8601 timestamp when order was placed (UTC) |
| date | TEXT | Order date in PST (YYYY-MM-DD). Used as the canonical date for all reporting. |
| customer_id | TEXT | Shopify customer GID |
| customer_email | TEXT | Customer email address |
| customer_order_index | INTEGER | Nth order for this customer (1 = new customer, 2+ = returning) |
| subtotal | REAL | Order subtotal before shipping/tax (after line-item discounts) |
| total_discounts | REAL | Total discount amount applied to the order |
| total_shipping | REAL | Shipping charged to customer |
| total_price | REAL | Final amount collected (net revenue) |
| total_tax | REAL | Tax amount |
| tags | TEXT | Comma-separated Shopify tags (used for wholesale detection: Walmart, Target, Faire) |
| source_name | TEXT | Checkout source (checkout_next, shopify_draft_order, etc.) |
| financial_status | TEXT | Payment status (PAID, PARTIALLY_REFUNDED, REFUNDED) |
| is_subscription | INTEGER | 1 if any line item has a selling plan (subscription) |
| is_first_subscription | INTEGER | 1 if this is the customer's first subscription order |
| is_wholesale | INTEGER | 1 if tagged Walmart, Target, or Faire |
| wholesale_channel | TEXT | Which wholesale channel (Walmart, Target, Faire, or null) |
| fetched_at | TEXT | When this row was last fetched/updated from Shopify |
shopify_id (unique PK), date
transform_sku.py to calculate True Gross Sales (quantity Ã- price_list_price).
Source: Shopify GraphQL
lineItems within each order
| Column | Type | Description |
|---|---|---|
| id PK | INTEGER | Auto-increment ID |
| order_id | TEXT | FK â†' raw_orders.shopify_id |
| date | TEXT | Order date (denormalized for fast SKU queries) |
| sku | TEXT | Product SKU (e.g. 43111, 80060L) |
| product_name | TEXT | Full product name |
| variant_title | TEXT | Variant (e.g. "25 tea bags", "11 ounce grind") |
| quantity | INTEGER | Units purchased in this line item |
| original_unit_price | REAL | Price per unit before discounts |
| total_price | REAL | Total line item price after discounts |
| selling_plan_id | TEXT | Shopify selling plan GID (non-null = subscription) |
| is_subscription | INTEGER | 1 if purchased via subscription selling plan |
| is_wholesale | INTEGER | 1 if parent order is wholesale |
order_id, date, sku
Source: Shopify GraphQL
discountAllocations on each line item
| Column | Type | Description |
|---|---|---|
| id PK | INTEGER | Auto-increment ID |
| order_id | TEXT | FK â†' raw_orders.shopify_id |
| line_item_index | INTEGER | Position of line item within the order (0-based) |
| sku | TEXT | SKU of the discounted item |
| amount | REAL | Dollar amount of this discount allocation |
| discount_type | TEXT | Type: discount_code, automatic, manual |
| discount_title | TEXT | Name of the discount (e.g. "NEWYEAR", "Free Gift with Purchase") |
| application_typename | TEXT | Shopify type: DiscountCodeApplication, AutomaticDiscountApplication, ManualDiscountApplication |
order_id
Source:
ingest_meta.py â†' reads Daily Meta Ads.csv (generated by daily_meta_report.py from Meta Marketing API, account 559530661221525)
| Column | Type | Description |
|---|---|---|
| date PK | TEXT | Date (YYYY-MM-DD) |
| spend | REAL | Total ad spend ($) |
| impressions | INTEGER | Total impressions |
| clicks | INTEGER | Link clicks |
| reach | INTEGER | Unique users reached |
| purchases | INTEGER | Attributed purchases (1-day click / 7-day click) |
| purchase_value | REAL | Attributed purchase revenue ($) |
| fetched_at | TEXT | When this row was last ingested |
date (unique PK)
Source:
ingest_google.py â†' reads Daily Google Ads.csv (generated by daily_google_report.py from Google Ads API, customer ID 806-386-6750)
| Column | Type | Description |
|---|---|---|
| date PK | TEXT | Date (YYYY-MM-DD) |
| spend | REAL | Total ad spend ($) |
| impressions | INTEGER | Total impressions |
| clicks | INTEGER | Total clicks |
| conversions | REAL | Attributed conversions (fractional â€" Google uses data-driven attribution) |
| conversion_value | REAL | Attributed conversion revenue ($) |
| fetched_at | TEXT | When this row was last ingested |
date (unique PK)
Source:
ingest_shipstation.py → ShipStation REST API /shipments endpoint
| Column | Type | Description |
|---|---|---|
| shipment_id PK | TEXT | ShipStation shipment ID |
| order_id | TEXT | ShipStation order ID |
| order_number | TEXT | Human-readable order number (matches Shopify order number) |
| ship_date | TEXT | Date shipped (YYYY-MM-DD) |
| carrier_code | TEXT | Carrier (dhl_global_mail, stamps_com, fedex) |
| service_code | TEXT | Service (e.g. dhl_parcel_ground, usps_priority_mail) |
| shipment_cost | REAL | Actual carrier cost paid ($) — what we pay to ship |
| insurance_cost | REAL | Insurance cost ($) |
| weight_oz | REAL | Package weight in ounces |
| tracking_number | TEXT | Carrier tracking number |
| to_state | TEXT | Destination state |
| to_country | TEXT | Destination country code |
| is_voided | INTEGER | 1 if voided/cancelled |
| is_return | INTEGER | 1 if return label |
| fetched_at | TEXT | When last fetched from API |
shipment_id (PK), ship_date, order_number
🧠Journey Tables â€" Customer Attribution
Marketing touchpoint data from Shopify's customer journey tracking. Shows how customers discovered and returned to the store before purchasing.
customerJourneySummary GraphQL field. One row per order.
Source:
ingest_shopify.py â†' Shopify GraphQL customerJourneySummary
| Column | Type | Description |
|---|---|---|
| order_id PK | TEXT | FK â†' raw_orders.shopify_id |
| customer_order_index | INTEGER | Customer's nth order |
| days_to_conversion | INTEGER | Days from first visit to purchase |
| first_visit_source | TEXT | First-touch channel (email, direct, google, facebook, etc.) |
| first_visit_source_type | TEXT | Source type classification |
| first_visit_source_description | TEXT | Human-readable description from Shopify |
| first_visit_landing_page | TEXT | First landing page URL |
| first_visit_referrer_url | TEXT | Referrer URL for first visit |
| first_visit_referral_code | TEXT | Referral code if applicable |
| first_visit_at | TEXT | ISO timestamp of first visit |
| first_visit_utm_source | TEXT | UTM source parameter |
| first_visit_utm_medium | TEXT | UTM medium parameter |
| first_visit_utm_campaign | TEXT | UTM campaign parameter |
| first_visit_utm_term | TEXT | UTM term parameter |
| first_visit_utm_content | TEXT | UTM content parameter |
| last_visit_source | TEXT | Last-touch channel before conversion |
| last_visit_source_type | TEXT | Last-touch source type |
| last_visit_source_description | TEXT | Description of last visit |
| last_visit_landing_page | TEXT | Last landing page before conversion |
| last_visit_referrer_url | TEXT | Last referrer URL |
| last_visit_referral_code | TEXT | Last referral code |
| last_visit_at | TEXT | ISO timestamp of last visit before purchase |
| last_visit_utm_* | TEXT | UTM parameters for last visit (source, medium, campaign, term, content) |
| moments_count | INTEGER | Total touchpoints in the customer journey |
| journey_ready | INTEGER | 1 if Shopify has finalized journey data for this order |
order_id (unique PK)
Source:
ingest_shopify.py â†' Shopify GraphQL customerJourneySummary.moments
| Column | Type | Description |
|---|---|---|
| id PK | INTEGER | Auto-increment ID |
| order_id | TEXT | FK â†' raw_orders.shopify_id |
| moment_index | INTEGER | Chronological index of this moment (0-based) |
| occurred_at | TEXT | ISO timestamp when the touchpoint occurred |
| source | TEXT | Channel source (email, direct, google, facebook, etc.) |
| source_type | TEXT | Source type classification |
| source_description | TEXT | Human-readable description |
| landing_page | TEXT | URL the customer landed on |
| referrer_url | TEXT | Referring URL |
| referral_code | TEXT | Referral/affiliate code if present |
| utm_source | TEXT | UTM source parameter |
| utm_medium | TEXT | UTM medium parameter |
| utm_campaign | TEXT | UTM campaign parameter |
| utm_term | TEXT | UTM term parameter |
| utm_content | TEXT | UTM content parameter |
order_id
âšTMï ̧ Calculated Tables â€" Transformed Data
Derived from raw tables by transform scripts. These power the MTD dashboard and reports. Rebuilt daily (or on-demand).
Source:
transform_daily.py â†' aggregates raw_orders + raw_line_items + raw_discount_allocations
Gross formula:
True Gross = Σ(quantity Ã- price_list_price) per line item, using monthly price list. Free items excluded — items with unit_price = $0 (BOGO/gifts) are tracked separately in dtc_free_item_value.
| Column | Type | Description |
|---|---|---|
| date PK | TEXT | Date (YYYY-MM-DD) |
| â€" All-Channel Totals â€" | ||
| total_orders | INTEGER | Total orders (DTC + wholesale) |
| new_orders | INTEGER | New customer orders (customer_order_index = 1) |
| ret_orders | INTEGER | Returning customer orders |
| total_revenue | REAL | Total net revenue (all channels) |
| new_revenue | REAL | Revenue from new customers |
| ret_revenue | REAL | Revenue from returning customers |
| â€" DTC (Direct-to-Consumer) â€" | ||
| dtc_orders | INTEGER | DTC orders only (excludes Walmart/Target/Faire) |
| dtc_new_orders | INTEGER | DTC new customer orders |
| dtc_ret_orders | INTEGER | DTC returning customer orders |
| dtc_gross | REAL | True Gross Sales = Σ(units Ã- monthly price list price) for DTC line items |
| dtc_revenue | REAL | DTC net revenue (total_price from orders) |
| dtc_new_revenue | REAL | DTC revenue from new customers |
| dtc_ret_revenue | REAL | DTC revenue from returning customers |
| dtc_subtotal | REAL | Sum of order subtotals (after line discounts, before shipping) |
| â€" Discount Breakdown â€" | ||
| dtc_discounts | REAL | Total discount amount (all types combined) |
| dtc_subscription_discounts | REAL | Subscription discount allocations |
| dtc_automatic_discounts | REAL | Automatic discount allocations |
| dtc_code_discounts | REAL | Discount code allocations |
| dtc_manual_discounts | REAL | Manual discount allocations |
| dtc_shipping_discounts | REAL | Shipping discount allocations |
| â€" Free Items â€" | ||
| dtc_free_item_value | REAL | Retail value of free items given away |
| dtc_free_item_count | INTEGER | Number of free items |
| dtc_free_item_cogs | REAL | Cost of goods for free items |
| â€" Subscriptions â€" | ||
| sub_orders | INTEGER | Total subscription orders |
| sub_revenue | REAL | Subscription revenue |
| first_sub_orders | INTEGER | First-time subscription orders (new subscribers) |
| recurring_sub_orders | INTEGER | Recurring subscription orders |
| sub_gross | REAL | True Gross for subscription orders only (units × price list price, excl. free items) |
| sub_new_gross | REAL | True Gross for first-time subscription orders |
| sub_new_revenue | REAL | Net revenue from first-time subscription orders |
| sub_recurring_gross | REAL | True Gross for recurring subscription orders |
| sub_recurring_revenue | REAL | Net revenue from recurring subscription orders |
| sub_units | INTEGER | Total units sold via subscription |
| â€" Wholesale â€" | ||
| ws_orders | INTEGER | Wholesale orders (Walmart + Target + Faire) |
| ws_gross | REAL | Wholesale gross sales |
| ws_revenue | REAL | Wholesale net revenue |
| â€" Totals â€" | ||
| subtotal | REAL | Sum of all order subtotals |
| total_discounts | REAL | Sum of all discounts |
| total_shipping | REAL | Sum of customer-paid shipping |
| total_revenue_all | REAL | Total revenue across all channels |
| total_units | INTEGER | Total units sold |
| calculated_at | TEXT | Timestamp when this row was last calculated |
date (unique PK)
Source:
transform_sku.py â†' aggregates raw_line_items + joins price_list + cogs
| Column | Type | Description |
|---|---|---|
| date PK | TEXT | Date |
| sku PK | TEXT | Product SKU |
| units_sold | INTEGER | Total units sold (all channels) |
| gross_sales | REAL | Gross sales = units Ã- price list price |
| net_sales | REAL | Net sales after discounts |
| discounts | REAL | Total discounts applied to this SKU |
| dtc_units | INTEGER | DTC units sold |
| dtc_gross | REAL | DTC gross sales |
| dtc_net | REAL | DTC net sales |
| dtc_discounts | REAL | DTC discounts |
| ws_units | INTEGER | Wholesale units sold |
| ws_gross | REAL | Wholesale gross sales |
| sub_units | INTEGER | Subscription units |
| cogs | REAL | Cost of goods sold (units Ã- cogs_per_unit from cogs table) |
| calculated_at | TEXT | When this row was calculated |
date + sku (composite PK), date, sku
attributed_channel (Meta, Google, Email, Direct, Organic, Other).
Source:
transform_attribution.py â†' joins raw_orders + order_journey
| Column | Type | Description |
|---|---|---|
| order_id PK | TEXT | FK â†' raw_orders.shopify_id |
| date | TEXT | Order date |
| first_touch_source | TEXT | First-touch source (parsed from UTM/referrer) |
| first_touch_medium | TEXT | First-touch medium |
| first_touch_campaign | TEXT | First-touch campaign name |
| last_touch_source | TEXT | Last-touch source before conversion |
| last_touch_medium | TEXT | Last-touch medium |
| last_touch_campaign | TEXT | Last-touch campaign name |
| attributed_channel | TEXT | Simplified channel: Meta, Google, Email, Direct, Organic, Other |
| revenue | REAL | Order revenue |
| is_new_customer | INTEGER | 1 if first order |
| is_subscription | INTEGER | 1 if subscription order |
| days_to_conversion | INTEGER | Days from first visit to purchase |
| touchpoint_count | INTEGER | Number of touchpoints in the journey |
order_id (unique PK), date
Source:
transform_cohorts.py â†' aggregates raw_orders by customer. Only runs on full rebuild (too expensive for daily).
| Column | Type | Description |
|---|---|---|
| customer_id PK | TEXT | Shopify customer GID |
| cohort_month | TEXT | Month of first order (YYYY-MM) |
| first_order_date | TEXT | Date of customer's first order |
| total_orders | INTEGER | Total lifetime orders |
| total_revenue | REAL | Total lifetime revenue |
| ltv_30d | REAL | Revenue within 30 days of first order |
| ltv_90d | REAL | Revenue within 90 days of first order |
| ltv_365d | REAL | Revenue within 365 days of first order |
| is_subscriber | INTEGER | 1 if customer has ever placed a subscription order |
| last_order_date | TEXT | Date of most recent order |
customer_id (unique PK)
ðŸ"‹ Reference Tables â€" Lookups & Constants
Static or semi-static reference data used by transform scripts. Updated when prices change or new COGS data arrives.
Source:
PriceList/monthly/*.json files on OneDrive, generated by generate_monthly_price_reference.py from Shopify GraphQL. 40 months covered (Jan 2023 → Apr 2026)
| Column | Type | Description |
|---|---|---|
| year_month PK | TEXT | Month (YYYY-MM) |
| sku PK | TEXT | Product SKU |
| retail_price | REAL | Retail price for this SKU in this month ($) |
| source_file | TEXT | Source JSON filename |
year_month + sku (composite PK)
Source:
COGS/Cogs.xlsx on OneDrive, imported by create_db.py
| Column | Type | Description |
|---|---|---|
| sku PK | TEXT | Product SKU |
| cogs_per_unit | REAL | Cost per unit ($). 0.0 for non-physical items. |
| effective_date | TEXT | When this COGS rate became effective (may be null) |
| source_sheet | TEXT | Sheet in Cogs.xlsx this came from (e.g. "Jan 2026") |
sku (unique PK)