The complete Pump.fun history: every create, every trade, every graduation
Pump.fun launched the bonding-curve memecoin era on Solana and is still the highest-traffic application on the chain most weeks. The entire ledger is on chain, but reconstructing it from raw blocks is the kind of project that eats a quarter. We did it. Every Pump.fun create, every trade, and every bonding-curve graduation since the program first deployed is parsed against the program IDL, decimal-normalized, USD-stamped against the SOL price oracle nearest each slot, and packaged month by month as Parquet. Drop it into DuckDB and ask anything: who minted the ten thousand shortest-lived tokens, which creator wallet has the highest median time-to-graduation, what the survivor curve looks like across the bonding-curve threshold, where the MEV traffic actually clusters during launches. The real-time stream covers tomorrow's launches; this dataset covers everything before that.
- Pump.fun (bonding curve program): 6EF8rrecthR5Dkzon8Nwu78hRvfCKubJ14M5uBEwF6P. The original Pump.fun program. Mints tokens and runs the on-chain bonding curve to graduation.
- PumpSwap (post-graduation AMM): pAMMBay6oceH9fJKBRHGP5D4bD4sWpmSwMn52FMfXEA. The AMM tokens settle into after graduation. Joined to the archive so the lifecycle is continuous.
- Raydium AMM v4 (alt graduation path): 675kPX9MHTjS2zt1qfr1NYHuzeLXfQM9H24wFSUt1Mp8. Earlier graduations routed here via the migration helper; tracked for backwards compatibility.
Tables in the archive
Seven Parquet tables covering the full Pump.fun lifecycle, from create to post-graduation.
| Table | Type | Description | Frequency | Latency |
|---|---|---|---|---|
| pumpfun_creates | event | One row per token creation: mint, creator wallet, virtual SOL/token reserves, metadata URI, timestamp, slot. | High | — |
| pumpfun_trades | event | One row per buy or sell on the bonding curve. Side, SOL amount, token amount, post-trade reserves, fee, signer. | Very high | — |
| pumpfun_graduations | event | Bonding-curve completion events. Includes graduation slot, final reserves, the AMM the token migrated to, and the LP setup. | Medium | — |
| pumpfun_creator_aggregates | event | Pre-computed per-creator statistics: tokens launched, graduation rate, total volume routed, time-to-graduation distribution. | Low | — |
| pumpfun_token_lifecycle | event | One row per mint with the full lifecycle joined: create slot, peak market cap, graduation flag, post-graduation AMM, time-to-rug if applicable. | Low | — |
| pumpfun_priority_fees | event | Compute-unit price, priority fee, and Jito tip per Pump.fun transaction. The basis for any MEV or block-builder analysis on memecoin flow. | Very high | — |
| pumpfun_post_graduation_trades | event | Trades on PumpSwap and Raydium for graduated tokens, joined to the original mint. Useful when the question crosses the bonding-curve threshold. | High | — |
Archive scale
last reviewed 2026-04-29
Why anyone needs a Pump.fun archive
The Pump.fun program runs the bonding curves for most memecoin launches on Solana. It's also the most common source of historical questions in Solana research right now. A few that come up every week: which creator wallets graduate tokens at above-average rates, what's the median time from create to first graduation-bound trade, how does Jito tip distribution shift around launch slots, and where do snipers actually cluster relative to the curve mid-point.
You can answer every one of those from on-chain data. You can't answer them from on-chain data fast. The Pump.fun program has produced something like 14 million mints and 1.8 billion trades since launch. Decoding those from raw getBlock JSON is a multi-month project for a small team and a non-trivial RPC bill before you start.
The archive collapses that. Pre-decoded Parquet, monthly bundles, schema documented, USD-stamped, ready for DuckDB or pandas. The first day of work is your model query, not your decoder.
What the tables actually look like
Four core tables, two pre-computed aggregates, one continuity table that follows tokens past graduation. Schemas live in every bundle's manifest.json so you don't need to guess.
| Table | Grain | Key columns |
|---|---|---|
| pumpfun_creates | 1 row per mint | mint, creator_wallet, slot, virtual_sol, virtual_tokens, metadata_uri |
| pumpfun_trades | 1 row per buy/sell | mint, signer, side, sol_amount, token_amount, sol_amount_usd, post_reserves |
| pumpfun_graduations | 1 row per graduation | mint, graduation_slot, target_amm, lp_pubkey, final_reserves |
| pumpfun_priority_fees | 1 row per tx | signature, slot, cu_price, prio_fee, jito_tip, mint |
| pumpfun_creator_aggregates | 1 row per creator wallet | creator_wallet, tokens_launched, graduations, grad_rate, ttg_p50, total_volume_usd |
| pumpfun_token_lifecycle | 1 row per mint | mint, create_slot, peak_mcap_usd, graduated, ttg_minutes, last_trade_slot |
| pumpfun_post_graduation_trades | 1 row per AMM trade | mint, amm, swap_id, signer, amount_in_usd, slot |
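To make the grains above concrete, here is a minimal sketch of a per-creator graduation-rate query over pumpfun_creates joined to pumpfun_token_lifecycle. It is an illustration under assumptions, not the archive's own tooling: Python's built-in sqlite3 stands in for DuckDB so the example runs without a bundle, the rows are invented toy data, and graduated is assumed to be stored as 0/1. In DuckDB the same SELECT would read from read_parquet(...) over the bundle files instead of in-memory tables.

```python
import sqlite3

# In-memory stand-in for two archive tables (toy rows, not real data).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE pumpfun_creates (mint TEXT, creator_wallet TEXT, slot INTEGER);
CREATE TABLE pumpfun_token_lifecycle (
    mint TEXT, create_slot INTEGER, graduated INTEGER, ttg_minutes REAL
);
""")
con.executemany("INSERT INTO pumpfun_creates VALUES (?, ?, ?)", [
    ("mintA", "walletX", 100),
    ("mintB", "walletX", 150),
    ("mintC", "walletY", 200),
    ("mintD", "walletX", 250),
])
con.executemany("INSERT INTO pumpfun_token_lifecycle VALUES (?, ?, ?, ?)", [
    ("mintA", 100, 1, 42.0),
    ("mintB", 150, 0, None),
    ("mintC", 200, 0, None),
    ("mintD", 250, 1, 18.0),
])

# One row per creator: launches, graduations, and the ratio of the two.
QUERY = """
SELECT c.creator_wallet,
       COUNT(*)                                    AS tokens_launched,
       SUM(l.graduated)                            AS graduations,
       ROUND(1.0 * SUM(l.graduated) / COUNT(*), 3) AS grad_rate
FROM pumpfun_creates AS c
JOIN pumpfun_token_lifecycle AS l USING (mint)
GROUP BY c.creator_wallet
ORDER BY grad_rate DESC, c.creator_wallet
"""
rows = con.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Both tables share the mint key, so the join stays one row per mint before aggregation and the counts are not inflated.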
What teams build with this
ML graduation predictors
Train on pumpfun_creates joined to pumpfun_token_lifecycle. Features include creator history, time-of-day, initial metadata patterns, first-N-buys behavior. Label is graduated (yes/no). Standard binary classifier. The hard part is the data, which is the part we sell.
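As a rough sketch of the feature step, the snippet below derives first-N-buys features per mint and attaches the graduated label. The function names are hypothetical, the rows are toy data, and two things are assumptions: that pumpfun_trades carries a slot column (the key-column list above does not show one) and that side takes the values "buy" and "sell".

```python
from collections import defaultdict

def first_n_buy_features(trades, n=5):
    """Per-mint features computed from the first n buys, ordered by slot."""
    buys_by_mint = defaultdict(list)
    for t in trades:
        if t["side"] == "buy":  # assumed encoding of the side column
            buys_by_mint[t["mint"]].append(t)
    features = {}
    for mint, buys in buys_by_mint.items():
        first = sorted(buys, key=lambda t: t["slot"])[:n]
        features[mint] = {
            "n_buys": len(first),
            "sol_in": sum(t["sol_amount"] for t in first),
            "unique_signers": len({t["signer"] for t in first}),
        }
    return features

def build_training_rows(trades, lifecycle, n=5):
    """Join features to the lifecycle label; mints with no buys get zeros."""
    feats = first_n_buy_features(trades, n)
    empty = {"n_buys": 0, "sol_in": 0.0, "unique_signers": 0}
    return [
        {"mint": lc["mint"], **feats.get(lc["mint"], empty),
         "label": int(bool(lc["graduated"]))}
        for lc in lifecycle
    ]

# Toy rows standing in for pumpfun_trades and pumpfun_token_lifecycle.
trades = [
    {"mint": "mintA", "signer": "s1", "side": "buy", "sol_amount": 1.0, "slot": 10},
    {"mint": "mintA", "signer": "s2", "side": "buy", "sol_amount": 0.5, "slot": 11},
    {"mint": "mintA", "signer": "s1", "side": "sell", "sol_amount": 0.25, "slot": 12},
    {"mint": "mintA", "signer": "s1", "side": "buy", "sol_amount": 2.0, "slot": 13},
    {"mint": "mintB", "signer": "s3", "side": "buy", "sol_amount": 0.25, "slot": 20},
]
lifecycle = [
    {"mint": "mintA", "graduated": True},
    {"mint": "mintB", "graduated": False},
    {"mint": "mintC", "graduated": False},
]
training_rows = build_training_rows(trades, lifecycle, n=2)
```

Restricting features to the first few buys keeps information from later in the token's life, including the graduation itself, out of the inputs.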
Sniper-bot backtests
Replay pumpfun_trades on the bonding curve and simulate entry/exit on a candidate strategy. The post-graduation trades table extends the simulation past the AMM transition so the P&L numbers are honest.
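A replay loop can be sketched in a few lines. The example below is a deliberately simplified illustration, not a production backtester: it enters after the k-th buy at the price implied by post-trade reserves and exits at a take-profit multiple or at the last observed trade. The field names post_sol_reserves, post_token_reserves, and slot are assumptions about how post_reserves and ordering are exposed, and the sketch ignores fees, slippage, and the strategy's own price impact.

```python
def backtest_mint(trades, entry_after_buys=3, take_profit=2.0, stake_sol=1.0):
    """Replay one mint's trades in slot order.

    Returns P&L in SOL, or None if the entry condition never triggers.
    """
    buys_seen = 0
    entry_price = None
    last_price = None
    for t in sorted(trades, key=lambda t: t["slot"]):
        # Price implied by reserves after this trade (assumed field names).
        last_price = t["post_sol_reserves"] / t["post_token_reserves"]
        if entry_price is None:
            if t["side"] == "buy":
                buys_seen += 1
                if buys_seen == entry_after_buys:
                    entry_price = last_price  # enter at the post-trade price
            continue
        if last_price >= take_profit * entry_price:
            return stake_sol * (last_price / entry_price - 1.0)
    if entry_price is None:
        return None
    # Never hit take-profit: mark out at the last observed price.
    return stake_sol * (last_price / entry_price - 1.0)

# Toy trades for a single mint (invented numbers).
toy_trades = [
    {"slot": 1, "side": "buy", "post_sol_reserves": 10.0, "post_token_reserves": 10.0},
    {"slot": 2, "side": "buy", "post_sol_reserves": 20.0, "post_token_reserves": 10.0},
    {"slot": 3, "side": "sell", "post_sol_reserves": 15.0, "post_token_reserves": 10.0},
    {"slot": 4, "side": "buy", "post_sol_reserves": 40.0, "post_token_reserves": 10.0},
]
pnl = backtest_mint(toy_trades, entry_after_buys=2)
```

To carry the simulation past graduation, the same loop would continue over the mint's rows from pumpfun_post_graduation_trades, with the price derived from whatever that table exposes.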
Creator leaderboards
Pre-computed in the aggregates table. Plug into a public page or feed it into your due-diligence tool. The same math underlies most “serial creator” signals sold by other vendors at five times the price.
MEV and priority-fee research
pumpfun_priority_fees has cu_price, priority fee, and Jito tip on every transaction. Distribution charts, builder-preference analysis, and pre-launch tip-spike detection all run off this single table.
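One way to approach tip-spike detection is to compare each slot's median Jito tip against a trailing baseline. The sketch below is one possible heuristic, not a method the archive prescribes: the function name, the window, and the factor are hypothetical choices, and the tip values are toy numbers in an assumed unit (lamports).

```python
from collections import defaultdict
from statistics import median

def tip_spike_slots(fee_rows, window=3, factor=3.0):
    """Return slots whose median tip is >= factor x the trailing median.

    The trailing window runs over the previous `window` slots that contain
    at least one row, not over consecutive chain slots.
    """
    tips_by_slot = defaultdict(list)
    for r in fee_rows:
        tips_by_slot[r["slot"]].append(r["jito_tip"])
    slots = sorted(tips_by_slot)
    per_slot = [median(tips_by_slot[s]) for s in slots]
    spikes = []
    for i in range(window, len(slots)):
        baseline = median(per_slot[i - window:i])
        if baseline > 0 and per_slot[i] >= factor * baseline:
            spikes.append(slots[i])
    return spikes

# Toy rows standing in for pumpfun_priority_fees.
fee_rows = [
    {"slot": 1, "jito_tip": 100}, {"slot": 1, "jito_tip": 100},
    {"slot": 2, "jito_tip": 100},
    {"slot": 3, "jito_tip": 200}, {"slot": 3, "jito_tip": 100},
    {"slot": 4, "jito_tip": 1000}, {"slot": 4, "jito_tip": 900},
    {"slot": 5, "jito_tip": 100},
]
spikes = tip_spike_slots(fee_rows)
```

Medians are used on both sides so that a single outsized tip does not flag a slot or distort the baseline.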
Rug detection labels
Lifecycle table flags “last trade ever” per mint. Define rug as “peak market cap above X then no trades for Y hours,” label, train. We can also ship pre-labeled bundles on contract with our internal classifier output.
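The rule "peak market cap above X then no trades for Y hours" translates into a short labeling function. This is a sketch under assumptions: the function name and thresholds are hypothetical, quiet time is measured from last_trade_slot to a caller-supplied as-of slot, and slots are converted to hours at Solana's roughly 400 ms target slot time, which real slot times only approximate.

```python
# Assumption: ~400 ms per slot, so 9000 slots per hour. Actual slot times drift.
SLOTS_PER_HOUR = 9000

def is_rug(row, as_of_slot, min_peak_mcap_usd, quiet_hours):
    """True if the mint peaked above the threshold and has since gone quiet."""
    quiet_slots = as_of_slot - row["last_trade_slot"]
    peaked = row["peak_mcap_usd"] > min_peak_mcap_usd
    gone_quiet = quiet_slots >= quiet_hours * SLOTS_PER_HOUR
    return peaked and gone_quiet

# Toy rows standing in for pumpfun_token_lifecycle.
lifecycle_rows = [
    {"mint": "mintA", "peak_mcap_usd": 80_000.0, "last_trade_slot": 1_000_000},
    {"mint": "mintB", "peak_mcap_usd": 80_000.0, "last_trade_slot": 1_250_000},
    {"mint": "mintC", "peak_mcap_usd": 10_000.0, "last_trade_slot": 1_000_000},
]
labels = {
    r["mint"]: is_rug(r, as_of_slot=1_300_000,
                      min_peak_mcap_usd=50_000, quiet_hours=24)
    for r in lifecycle_rows
}
```

For tighter timing, block timestamps would be a better basis than a fixed slot duration, if the bundle in use exposes them.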
Memecoin journalism and research
“What happened during the X token saga” pieces are usually two queries away in DuckDB. Trades for the mint, joined to creator history, joined to priority-fee spikes. The shape tells the story.
Archive plus live stream is the production pattern
Most production teams run both. The archive is where you backtest, train, and prototype. The real-time Pump.fun stream is where you run inference. Same schema, same field names, same decoding rules, so the model you trained on Parquet works on the gRPC payload without translation.
When a strategy moves from research to production, the only thing that should change is the source of the rows. From read_parquet(...) to a Yellowstone subscribe call. Everything else, the feature extraction, the model, the post-processing, runs unchanged. That's the actual reason we keep historical and live on identical schemas.
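The pattern can be sketched as one feature function written against an iterable of rows, with the source swapped underneath it. Everything named here is hypothetical: archive_rows stands in for rows read from Parquet, live_rows stands in for a decoded Yellowstone subscription, and no real client call is shown.

```python
from typing import Iterable, Iterator

def archive_rows() -> Iterator[dict]:
    """Stand-in for rows iterated out of read_parquet(...) (toy data)."""
    yield {"mint": "mintA", "side": "buy", "sol_amount": 1.0}
    yield {"mint": "mintB", "side": "buy", "sol_amount": 0.5}
    yield {"mint": "mintA", "side": "sell", "sol_amount": 0.25}

def live_rows() -> Iterator[dict]:
    """Stand-in for a live subscription decoded to the same schema."""
    yield {"mint": "mintA", "side": "buy", "sol_amount": 1.0}
    yield {"mint": "mintB", "side": "buy", "sol_amount": 0.5}
    yield {"mint": "mintA", "side": "sell", "sol_amount": 0.25}

def net_sol_flow(rows: Iterable[dict]) -> dict:
    """Running net SOL flow per mint; knows nothing about where rows come from."""
    flow: dict = {}
    for r in rows:
        signed = r["sol_amount"] if r["side"] == "buy" else -r["sol_amount"]
        flow[r["mint"]] = flow.get(r["mint"], 0.0) + signed
    return flow

backtest_features = net_sol_flow(archive_rows())
live_features = net_sol_flow(live_rows())
```

Because the function consumes rows one at a time, it works the same over a finite Parquet scan and an open-ended stream.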
The post-graduation table is the bridge for strategies that span the bonding-curve threshold. Teams that close the position at graduation want one table; teams that hold through migration want both. Both are supported in the same bundle.
Pricing and how the archive ships
$200 a month for the rolling archive. That covers every table, refreshed daily, with monthly immutable snapshots published on the first of each month. 30% discount at six months, 50% at twelve. The full historical depth is included from day one of your subscription.
One-off historical pulls (full archive as a one-time download, no recurring updates) are quoted separately. They run higher because the cost is in the parsing, not the delivery, and a one-time customer doesn't amortize the ingest pipeline. Most teams want the recurring tier so the bundle stays current with mainnet.
Custom labels (rug classifications, sniper-detection scores, creator-network clustering) are an engineering project on top of the standard archive. Send us the brief and we'll quote it. The same engineers who ship the live parsed stream do the labeling, so the data quality stays consistent.
Related products
Subscribe to creates, trades, and graduations as they happen. Same schema as the historical archive.
Post-graduation AMM. Useful when the question crosses the bonding-curve threshold.
The full historical-data catalog. Pump.fun is one program; the catalog covers 40+.
Raw getBlock JSON when you want to do your own decoding instead of buying ours.
For the alt graduation path. Pair with the Pump.fun archive to follow tokens through migration.
Subscribe to the archive
$200/mo for the rolling archive plus monthly snapshots. 30% off at 6 months, 50% off at 12. Custom slot ranges and labeled bundles quoted separately.