Historical Dataset

The complete Pump.fun history: every create, every trade, every graduation

Pump.fun launched the bonding-curve memecoin era on Solana and is still the highest-traffic application on the chain most weeks. The entire ledger is on chain, but reconstructing it from raw blocks is the kind of project that eats a quarter. We did it: every Pump.fun create, every trade, and every bonding-curve graduation since the program was first deployed, parsed against the program IDL, decimal-normalized, USD-stamped against the SOL price oracle reading nearest each slot, and packaged month by month as Parquet. Drop it into DuckDB and ask anything: who minted the ten thousand shortest-lived tokens, which creator wallet has the highest median time-to-graduation, what the survivor curve looks like across the bonding-curve threshold, where the MEV traffic actually clusters during launches. The real-time stream covers tomorrow's launches; this dataset covers everything before that.

Genesis-to-now coverage · Parquet + CSV · Per-month bundles · Decimal + USD normalized · Pairs with live stream · Dune/DuckDB-friendly
On-chain programs

Tables in the archive

Seven Parquet tables covering the full Pump.fun lifecycle, from create to post-graduation.

Event | Type | Description | Frequency
pumpfun_creates | event | One row per token creation: mint, creator wallet, virtual SOL/token reserves, metadata URI, timestamp, slot. | High
pumpfun_trades | event | One row per buy or sell on the bonding curve. Side, SOL amount, token amount, post-trade reserves, fee, signer. | Very high
pumpfun_graduations | event | Bonding-curve completion events. Includes graduation slot, final reserves, the AMM the token migrated to, and the LP setup. | Medium
pumpfun_creator_aggregates | event | Pre-computed per-creator statistics: tokens launched, median graduation rate, total volume routed, time-to-graduation distribution. | Low
pumpfun_token_lifecycle | event | One row per mint with the full lifecycle joined: create slot, peak market cap, graduation flag, post-graduation AMM, time-to-rug if applicable. | Low
pumpfun_priority_fees | event | Compute-unit price, priority fee, and Jito tip per Pump.fun transaction. The basis for any MEV or block-builder analysis on memecoin flow. | Very high
pumpfun_post_graduation_trades | event | Trades on PumpSwap and Raydium for graduated tokens, joined to the original mint. Useful when the question crosses the bonding-curve threshold. | High

Archive scale

Last reviewed 2026-04-29

Coverage window: Genesis to now. From the first slot Pump.fun deployed through the last UTC midnight, with every event in between. Verified 2026-04-29.

Tokens minted (cumulative): 14M+. Approximate cumulative count across the full archive window. Verified 2026-04-29.

Trades indexed: ~1.8B. Decoded buy and sell instructions across the full bonding-curve history. Verified 2026-04-29.

Graduations: ~38,000. Bonding-curve completions across the full archive. Verified 2026-04-29.

Compressed bundle size: 6-15 GB / month. tar.zst per month; varies with mainnet activity.

Why anyone needs a Pump.fun archive

The Pump.fun program runs the bonding curves for most memecoin launches on Solana. It's also the most common source of historical questions in Solana research right now. A few that come up every week: which creator wallets graduate tokens at above-average rates, what the median time is from create to first graduation-bound trade, how the Jito tip distribution shifts around launch slots, and where snipers actually cluster relative to the curve midpoint.

You can answer every one of those from on-chain data. You can't answer them from on-chain data fast. The Pump.fun program has produced something like 14 million mints and 1.8 billion trades since launch. Decoding those from raw getBlock JSON is a multi-month project for a small team and a non-trivial RPC bill before you start.

The archive collapses that. Pre-decoded Parquet, monthly bundles, schema documented, USD-stamped, ready for DuckDB or pandas. The first day of work is your model query, not your decoder.

What the tables actually look like

Four core event tables, two pre-computed aggregates, and one continuity table that follows tokens past graduation: seven in total. Schemas live in every bundle's manifest.json so you don't need to guess.

Table | Grain | Key columns
pumpfun_creates | 1 row per mint | mint, creator_wallet, slot, virtual_sol, virtual_tokens, metadata_uri
pumpfun_trades | 1 row per buy/sell | mint, signer, side, sol_amount, token_amount, sol_amount_usd, post_reserves
pumpfun_graduations | 1 row per graduation | mint, graduation_slot, target_amm, lp_pubkey, final_reserves
pumpfun_priority_fees | 1 row per tx | signature, slot, cu_price, prio_fee, jito_tip, mint
pumpfun_creator_aggregates | 1 row per creator wallet | creator_wallet, tokens_launched, graduations, grad_rate, ttg_p50, total_volume_usd
pumpfun_token_lifecycle | 1 row per mint | mint, create_slot, peak_mcap_usd, graduated, ttg_minutes, last_trade_slot
pumpfun_post_graduation_trades | 1 row per AMM trade | mint, amm, swap_id, signer, amount_in_usd, slot

What teams build with this

ML graduation predictors

Train on pumpfun_creates joined to pumpfun_token_lifecycle. Features include creator history, time-of-day, initial metadata patterns, first-N-buys behavior. Label is graduated (yes/no). Standard binary classifier. The hard part is the data, which is the part we sell.
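As a rough illustration of that join, here is a minimal sketch in plain Python. The rows are synthetic and stand in for what you would load from pumpfun_creates and pumpfun_token_lifecycle; the feature names creator_prior_launches and creator_prior_grad_rate are made up for this example, and the function is an assumption about how a team might assemble a training set, not the vendor's pipeline.

```python
from collections import defaultdict

# Synthetic rows for illustration only; real rows would come from the Parquet tables.
creates = [
    {"mint": "A", "creator_wallet": "w1", "slot": 100},
    {"mint": "B", "creator_wallet": "w1", "slot": 200},
    {"mint": "C", "creator_wallet": "w2", "slot": 300},
    {"mint": "D", "creator_wallet": "w1", "slot": 400},
]
lifecycle = [
    {"mint": "A", "graduated": True},
    {"mint": "B", "graduated": False},
    {"mint": "C", "graduated": False},
    {"mint": "D", "graduated": True},
]


def build_training_rows(creates, lifecycle):
    """Join creates to lifecycle on mint and add simple creator-history features.

    History counts only launches with an earlier create slot, so a token's own
    outcome never enters its features. Simplification: an earlier token is
    counted as graduated even if it graduated after the later token's create.
    """
    graduated_by_mint = {row["mint"]: row["graduated"] for row in lifecycle}
    prior_launches = defaultdict(int)
    prior_graduations = defaultdict(int)
    rows = []
    for c in sorted(creates, key=lambda r: r["slot"]):
        if c["mint"] not in graduated_by_mint:
            continue  # no lifecycle row, so no label
        wallet = c["creator_wallet"]
        launches = prior_launches[wallet]
        grads = prior_graduations[wallet]
        label = int(graduated_by_mint[c["mint"]])
        rows.append({
            "mint": c["mint"],
            "creator_prior_launches": launches,
            "creator_prior_grad_rate": grads / launches if launches else 0.0,
            "label": label,
        })
        prior_launches[wallet] += 1
        prior_graduations[wallet] += label
    return rows


training_rows = build_training_rows(creates, lifecycle)
```

The resulting rows can be handed to whichever classifier you prefer; time-of-day, metadata, and first-N-buys features would be added the same way.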

Sniper-bot backtests

Replay pumpfun_trades on the bonding curve and simulate entry/exit on a candidate strategy. The post-graduation trades table extends the simulation past the AMM transition so the P&L numbers are honest.
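A deliberately simple replay might look like the sketch below. It assumes sol_amount and token_amount are already decimal-normalized so their ratio is a per-token price, it ignores fees and the strategy's own price impact, and the take-profit rule is a placeholder strategy chosen for illustration. The trade rows are synthetic.

```python
def backtest_entry_exit(trades, stake_sol=1.0, take_profit=2.0):
    """Enter at the first observed price; exit at the first later trade whose
    price reaches take_profit x entry, otherwise at the last trade."""
    ordered = sorted(trades, key=lambda t: t["slot"])
    if not ordered:
        return None
    # Implied price of each trade in SOL per token.
    prices = [t["sol_amount"] / t["token_amount"] for t in ordered]
    entry_price = prices[0]
    exit_price, exit_slot = prices[-1], ordered[-1]["slot"]
    for trade, price in zip(ordered[1:], prices[1:]):
        if price >= entry_price * take_profit:
            exit_price, exit_slot = price, trade["slot"]
            break
    tokens_bought = stake_sol / entry_price
    return {
        "entry_price": entry_price,
        "exit_price": exit_price,
        "exit_slot": exit_slot,
        "pnl_sol": tokens_bought * exit_price - stake_sol,
    }


# Synthetic trades for one mint, for illustration only.
sample_trades = [
    {"slot": 10, "sol_amount": 1.0, "token_amount": 1000.0},
    {"slot": 11, "sol_amount": 1.5, "token_amount": 1000.0},
    {"slot": 12, "sol_amount": 2.5, "token_amount": 1000.0},
    {"slot": 13, "sol_amount": 1.0, "token_amount": 1000.0},
]
result = backtest_entry_exit(sample_trades)
```

Extending the replay past graduation would mean appending that mint's rows from pumpfun_post_graduation_trades to the same ordered list before simulating.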

Creator leaderboards

Pre-computed in the aggregates table. Plug into a public page or feed it into your due-diligence tool. The same math underlies most “serial creator” signals sold by other vendors at five times the price.
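Since the statistics are already computed, a leaderboard is mostly a filter and a sort. The sketch below uses the creator_wallet, tokens_launched, and grad_rate columns from the aggregates table; the minimum-launch cutoff and the tie-break rule are assumptions added for illustration, and the rows are synthetic.

```python
def leaderboard(aggregates, min_launches=5, top_n=10):
    """Rank creators by graduation rate, ignoring wallets with too few launches."""
    eligible = [r for r in aggregates if r["tokens_launched"] >= min_launches]
    # Highest grad_rate first; ties broken by the larger launch count.
    ranked = sorted(eligible, key=lambda r: (-r["grad_rate"], -r["tokens_launched"]))
    return ranked[:top_n]


# Synthetic aggregate rows for illustration only.
sample_aggregates = [
    {"creator_wallet": "w1", "tokens_launched": 10, "grad_rate": 0.2},
    {"creator_wallet": "w2", "tokens_launched": 1, "grad_rate": 1.0},
    {"creator_wallet": "w3", "tokens_launched": 50, "grad_rate": 0.1},
    {"creator_wallet": "w4", "tokens_launched": 20, "grad_rate": 0.2},
]
board = leaderboard(sample_aggregates)
```

The cutoff matters: without it, a wallet with one launch and one graduation tops the list.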

MEV and priority-fee research

pumpfun_priority_fees has cu_price, priority fee, and Jito tip on every transaction. Distribution charts, builder-preference analysis, and pre-launch tip-spike detection all run off this single table.
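One simple way to approach tip-spike detection is to compare each slot's median tip against a baseline. The sketch below takes that approach using the slot and jito_tip columns; the 5x multiplier and the choice of the median as baseline are illustrative assumptions rather than a recommended detector, and the rows are synthetic with tips in whatever unit the table delivers.

```python
from collections import defaultdict
from statistics import median


def tip_spike_slots(fee_rows, multiplier=5.0):
    """Return slots whose median Jito tip exceeds multiplier x the median of
    all per-slot medians."""
    tips_by_slot = defaultdict(list)
    for row in fee_rows:
        tips_by_slot[row["slot"]].append(row["jito_tip"])
    if not tips_by_slot:
        return []
    slot_medians = {slot: median(tips) for slot, tips in tips_by_slot.items()}
    baseline = median(slot_medians.values())
    if baseline <= 0:
        return []  # no meaningful baseline to compare against
    return sorted(s for s, m in slot_medians.items() if m > multiplier * baseline)


# Synthetic fee rows for illustration only.
sample_fees = [
    {"slot": 1, "jito_tip": 1000}, {"slot": 1, "jito_tip": 2000},
    {"slot": 2, "jito_tip": 1500},
    {"slot": 3, "jito_tip": 50000}, {"slot": 3, "jito_tip": 70000},
    {"slot": 4, "jito_tip": 1200}, {"slot": 4, "jito_tip": 1800},
]
spikes = tip_spike_slots(sample_fees)
```

On real data you would likely want a rolling baseline per time window rather than one global median.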

Rug detection labels

Lifecycle table flags “last trade ever” per mint. Define rug as “peak market cap above X then no trades for Y hours,” label, train. We can also ship pre-labeled bundles on contract with our internal classifier output.
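The rule in quotes translates almost directly into code. This sketch uses peak_mcap_usd and last_trade_slot from the lifecycle table; the thresholds are placeholders, and the slots-per-hour constant is an approximation based on Solana's roughly 400 ms target slot time, so treat the hour conversion as an assumption.

```python
# Assumption: ~400 ms target slot time, i.e. about 9,000 slots per hour.
# Real slot times vary, so this is only an approximate conversion.
APPROX_SLOTS_PER_HOUR = 9000


def label_rug(row, as_of_slot, min_peak_usd=50_000.0, quiet_hours=24):
    """True if the token peaked above min_peak_usd and has had no trades
    for at least quiet_hours (approximated in slots) as of as_of_slot."""
    quiet_slots = quiet_hours * APPROX_SLOTS_PER_HOUR
    silent_for = as_of_slot - row["last_trade_slot"]
    return row["peak_mcap_usd"] > min_peak_usd and silent_for >= quiet_slots


# Synthetic lifecycle rows for illustration only.
sample_lifecycle = [
    {"mint": "A", "peak_mcap_usd": 120_000.0, "last_trade_slot": 1_000_000},
    {"mint": "B", "peak_mcap_usd": 10_000.0, "last_trade_slot": 1_000_000},
    {"mint": "C", "peak_mcap_usd": 120_000.0, "last_trade_slot": 1_200_000},
]
labels = {r["mint"]: label_rug(r, as_of_slot=1_300_000) for r in sample_lifecycle}
```

Here only mint A is labeled: B never reached the peak threshold and C traded too recently.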

Memecoin journalism and research

“What happened during the X token saga” pieces are usually two queries away in DuckDB. Trades for the mint, joined to creator history, joined to priority-fee spikes. The shape tells the story.

Archive plus live stream is the production pattern

Most production teams run both. The archive is where you backtest, train, and prototype. The real-time Pump.fun stream is where you run inference. Same schema, same field names, same decoding rules, so the model you trained on Parquet works on the gRPC payload without translation.

When a strategy moves from research to production, the only thing that should change is the source of the rows: from read_parquet(...) to a Yellowstone subscribe call. Everything else (the feature extraction, the model, the post-processing) runs unchanged. That's the actual reason we keep historical and live on identical schemas.
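In code, that pattern amounts to writing the feature logic against a plain iterable of rows and swapping only the iterator. The sketch below is a minimal illustration: archive_rows and live_rows are hypothetical stand-ins (neither a real Parquet reader nor a real stream client), the "buy"/"sell" values for side are an assumption about the encoding, and the rows are synthetic.

```python
from typing import Iterable, Iterator, Tuple


def net_sol_flow(rows: Iterable[dict]) -> Iterator[Tuple[str, float]]:
    """Running net SOL flow per mint. Depends only on the row schema,
    not on where the rows come from."""
    net = {}
    for row in rows:
        signed = row["sol_amount"] if row["side"] == "buy" else -row["sol_amount"]
        net[row["mint"]] = net.get(row["mint"], 0.0) + signed
        yield row["mint"], net[row["mint"]]


# Synthetic trade rows for illustration only.
SAMPLE_ROWS = [
    {"mint": "A", "side": "buy", "sol_amount": 1.0},
    {"mint": "A", "side": "sell", "sol_amount": 0.5},
    {"mint": "B", "side": "buy", "sol_amount": 2.0},
]


def archive_rows() -> Iterator[dict]:
    """Hypothetical stand-in for rows read from a monthly Parquet bundle."""
    yield from SAMPLE_ROWS


def live_rows() -> Iterator[dict]:
    """Hypothetical stand-in for rows decoded from the live stream."""
    yield from SAMPLE_ROWS


archive_features = list(net_sol_flow(archive_rows()))
live_features = list(net_sol_flow(live_rows()))
```

Because net_sol_flow never touches the source, moving to production means replacing one generator and nothing downstream.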

The post-graduation table is the bridge for strategies that span the bonding-curve threshold. Teams that close the position at graduation want one table; teams that hold through migration want both. Both are supported in the same bundle.

Pricing and how the archive ships

$200 a month for the rolling archive. That covers every table, refreshed daily, with monthly immutable snapshots published on the first of each month. 30% discount at six months, 50% at twelve. The full historical depth is included from day one of your subscription.

One-off historical pulls (full archive as a one-time download, no recurring updates) are quoted separately. They run higher because the cost is in the parsing, not the delivery, and a one-time customer doesn't amortize the ingest pipeline. Most teams want the recurring tier so the bundle stays current with mainnet.

Custom labels (rug classifications, sniper-detection scores, creator-network clustering) are an engineering project on top of the standard archive. Send us the brief and we'll quote it. The same engineers who ship the live parsed stream do the labeling, so the data quality stays consistent.

Frequently asked questions

What's in the archive?

Every decoded instruction the Pump.fun bonding-curve program has ever emitted, plus the post-graduation trades on PumpSwap and Raydium for graduated tokens. Tables are split by event family: pumpfun_creates, pumpfun_trades, pumpfun_graduations, and pumpfun_priority_fees, plus pre-computed aggregate tables for creators and token lifecycle. All Parquet, all decimal-normalized, all USD-stamped against the SOL/USDC price oracle reading nearest each slot.

Subscribe to the archive

$200/mo for the rolling archive plus monthly snapshots. 30% off at 6 months, 50% off at 12. Custom slot ranges and labeled bundles quoted separately.
