How is this different from the live Pump.fun Enhanced Stream?

The live Enhanced Stream covers events the moment they land. The archive covers everything before that. Most teams subscribe to both: train your model on the archive and deploy it on the live stream. The schema is identical. The archive is for backtesting, ML training, leaderboards, and research; the live stream is for production execution.

How fresh is the archive?

The last full UTC day is appended to the rolling bundle by 06:00 UTC the next day. The monthly bundle for the previous calendar month publishes on the first of every month. If you want a single up-to-date download URL, use the rolling bundle; if you want immutable snapshots for reproducibility, use the monthly bundles.
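
If a pipeline needs to know which day it can safely expect in the rolling bundle, the schedule above reduces to a few lines of date arithmetic. This is a minimal sketch, not part of any shipped client: the function names are hypothetical, the 06:00 UTC cutoff comes from the answer above, and the time of day at which the monthly bundle lands on the first is not stated, so the second function only assumes "sometime on the 1st".

```python
from datetime import date, datetime, time, timedelta, timezone

# From the text above: the previous full UTC day is appended by 06:00 UTC.
PUBLISH_CUTOFF = time(6, 0)


def latest_guaranteed_day(now: datetime) -> date:
    """Most recent UTC day that should already be in the rolling bundle.

    `now` must be timezone-aware.
    """
    now_utc = now.astimezone(timezone.utc)
    yesterday = now_utc.date() - timedelta(days=1)
    if now_utc.time() >= PUBLISH_CUTOFF:
        return yesterday
    # Before the cutoff, yesterday may still be processing; fall back one day.
    return yesterday - timedelta(days=1)


def latest_monthly_snapshot(now: datetime) -> tuple[int, int]:
    """(year, month) covered by the newest monthly bundle."""
    first_of_month = now.astimezone(timezone.utc).date().replace(day=1)
    last_of_previous = first_of_month - timedelta(days=1)
    return last_of_previous.year, last_of_previous.month
```

The conservative fallback before 06:00 UTC is a design choice: a backtest that silently reads a half-written day is worse than one that lags by 24 hours.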

Can I get the archive starting from a custom slot or date range?

Yes. Custom ranges are quoted by slot count. The standard SKU is the full archive plus monthly increments; custom slot ranges and from-date bundles are common requests from teams that only need the last few months for a specific paper or product launch.

How does this compare to Bitquery or Dune for Pump.fun?

Bitquery has Pump.fun parsed and exposes it via GraphQL. It's best when you want flexible JOINs and can pay per query. Dune's Pump.fun coverage is community-curated SQL on Spellbook, great for one-off charts but capped on CSV export. We sell Parquet you can drop on disk and query locally, at a flat per-month price. Different jobs, different tools. Most ML teams end up on Parquet because the training pipeline likes files better than APIs.

How are USD values computed?

Each trade has a SOL amount in lamports. We resolve the SOL/USDC price from the Pyth or Switchboard oracle nearest the trade's slot and write the resulting USD value to a paired column. The price source and the oracle slot are both retained for audit. If you don't trust our USD column, you have everything you need to compute your own.
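
To recompute the USD column yourself, the arithmetic is lamports to SOL, times the oracle price at the nearest slot. The sketch below assumes you have already loaded oracle observations as a slot-sorted list of `(slot, price)` pairs; that list shape and both function names are illustrative, not the archive's API. Ties between two equally distant oracle slots resolve to the earlier one here, which is a choice of this sketch rather than a documented rule.

```python
from bisect import bisect_left
from decimal import Decimal

LAMPORTS_PER_SOL = Decimal(1_000_000_000)


def nearest_oracle_tick(oracle_ticks, trade_slot):
    """Return the (slot, price) pair closest to trade_slot.

    oracle_ticks must be non-empty and sorted by slot.
    """
    slots = [slot for slot, _ in oracle_ticks]
    i = bisect_left(slots, trade_slot)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = oracle_ticks[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda tick: abs(tick[0] - trade_slot))


def trade_usd(sol_amount_lamports, oracle_ticks, trade_slot):
    """Return (usd_value, oracle_slot_used) for one trade."""
    oracle_slot, price = nearest_oracle_tick(oracle_ticks, trade_slot)
    usd = Decimal(sol_amount_lamports) / LAMPORTS_PER_SOL * price
    return usd, oracle_slot


ticks = [(100, Decimal("150")), (110, Decimal("152"))]
usd, used_slot = trade_usd(2_500_000_000, ticks, trade_slot=108)
```

Returning the oracle slot alongside the value mirrors the audit trail described above: you can always see which observation priced the trade.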

What's included in the creator-aggregates table?

Pre-computed per-creator statistics so you don't have to GROUP BY 14 million mints to ask “how many tokens has this wallet launched”. Tokens launched, graduations, graduation rate, total SOL routed, total USD volume, time-to-graduation distribution (p50, p90, p99), median time-to-rug. Refreshes monthly with the rest of the archive.
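
For a sense of what the pre-computed table saves you, here is roughly the same rollup done by hand over rows from pumpfun_creates and pumpfun_token_lifecycle. It is a simplified sketch: rows are plain dicts keyed by the column names listed in the schema table, only four of the statistics are computed, and the function name is made up for illustration. How the shipped table actually computes its percentiles is not documented here.

```python
from collections import defaultdict
from statistics import median


def creator_aggregates(creates, lifecycle):
    """Roll lifecycle rows up to one summary dict per creator wallet.

    creates:   rows with "mint" and "creator_wallet"
    lifecycle: rows with "mint", "graduated" and "ttg_minutes"
    """
    creator_of = {row["mint"]: row["creator_wallet"] for row in creates}

    rows_by_creator = defaultdict(list)
    for row in lifecycle:
        rows_by_creator[creator_of[row["mint"]]].append(row)

    summary = {}
    for wallet, rows in rows_by_creator.items():
        # Time-to-graduation only exists for tokens that graduated.
        ttg = [r["ttg_minutes"] for r in rows if r["graduated"]]
        summary[wallet] = {
            "tokens_launched": len(rows),
            "graduations": len(ttg),
            "grad_rate": len(ttg) / len(rows),
            "ttg_p50": median(ttg) if ttg else None,
        }
    return summary


creates = [
    {"mint": "A", "creator_wallet": "w1"},
    {"mint": "B", "creator_wallet": "w1"},
    {"mint": "C", "creator_wallet": "w2"},
]
lifecycle = [
    {"mint": "A", "graduated": True, "ttg_minutes": 30},
    {"mint": "B", "graduated": False, "ttg_minutes": None},
    {"mint": "C", "graduated": True, "ttg_minutes": 10},
]
stats = creator_aggregates(creates, lifecycle)
```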

Can your team build a custom Pump.fun-specific dataset?

Yes. Common requests: per-bonding-curve liquidity-time-series, per-token MEV-flow tables, sniper-detection labels with confidence scores, cross-creator-wallet clustering. Quoted as a one-time engineering project on top of the standard archive subscription.

Historical Dataset

The complete Pump.fun history: every create, every trade, every graduation

Pump.fun launched the bonding-curve memecoin era on Solana and is still the highest-traffic application on the chain most weeks. The entire ledger is on chain, but reconstructing it from raw blocks is the kind of project that eats a quarter. We did it. Every Pump.fun create, every trade, every bonding-curve graduation since the program first deployed, parsed against the program IDL, decimal-normalized, USD-stamped against the SOL price oracle nearest each slot, and packaged month by month as Parquet. Drop it into DuckDB and ask anything: who minted the ten thousand shortest-lived tokens, which creator wallet has the highest median-time-to-graduation, what does the survivor curve look like across the bonding-curve threshold, where the MEV traffic actually clusters during launches. The real-time stream covers tomorrow's launches; this dataset covers everything before that.

Genesis-to-now coverage · Parquet + CSV · Per-month bundles · Decimal + USD normalized · Pairs with live stream · Dune/DuckDB-friendly

On-chain programs

Program	Program ID	Description
Pump.fun (bonding curve program)	6EF8rrecthR5Dkzon8Nwu78hRvfCKubJ14M5uBEwF6P	The original Pump.fun program. Mints tokens and runs the on-chain bonding curve to graduation.
PumpSwap (post-graduation AMM)	pAMMBay6oceH9fJKBRHGP5D4bD4sWpmSwMn52FMfXEA	The AMM tokens settle into after graduation. Joined to the archive so the lifecycle is continuous.
Raydium AMM v4 (alt graduation path)	675kPX9MHTjS2zt1qfr1NYHuzeLXfQM9H24wFSUt1Mp8	Earlier graduations route here via the migration helper; tracked for backwards compatibility.

Tables in the archive

Seven Parquet tables covering the full Pump.fun lifecycle from create to post-graduation.

Table	Type	Description	Frequency
pumpfun_creates	event	One row per token creation: mint, creator wallet, virtual SOL/token reserves, metadata URI, timestamp, slot.	High
pumpfun_trades	event	One row per buy or sell on the bonding curve. Side, SOL amount, token amount, post-trade reserves, fee, signer.	Very high
pumpfun_graduations	event	Bonding-curve completion events. Includes graduation slot, final reserves, the AMM the token migrated to, and the LP setup.	Medium
pumpfun_creator_aggregates	aggregate	Pre-computed per-creator statistics: tokens launched, graduation rate, total volume routed, time-to-graduation distribution.	Low
pumpfun_token_lifecycle	aggregate	One row per mint with the full lifecycle joined: create slot, peak market cap, graduation flag, post-graduation AMM, time-to-rug if applicable.	Low
pumpfun_priority_fees	event	Compute-unit price, priority fee, and Jito tip per Pump.fun transaction. The basis for any MEV or block-builder analysis on memecoin flow.	Very high
pumpfun_post_graduation_trades	event	Trades on PumpSwap and Raydium for graduated tokens, joined to the original mint. Useful when the question crosses the bonding-curve threshold.	High

Archive scale

Last reviewed 2026-04-29.

Metric	Value	Notes
Coverage window	Genesis to now	From the first slot Pump.fun deployed through last UTC midnight, with every event in between. Verified 2026-04-29.
Tokens minted (cumulative)	14M+	Approximate cumulative count across the full archive window. Verified 2026-04-29.
Trades indexed	~1.8B	Decoded buy and sell instructions across the full bonding-curve history. Verified 2026-04-29.
Graduations	~38,000	Bonding-curve completions across the full archive. Verified 2026-04-29.
Compressed bundle size	6-15 GB / month	tar.zst per month; varies with mainnet activity.

Why anyone needs a Pump.fun archive

The Pump.fun program runs the bonding curves for most memecoin launches on Solana. It's also the source of most of the historical questions we see in Solana research right now. A few that come up every week: which creator wallets graduate tokens at above-average rates, what's the median time from create to first graduation-bound trade, how does Jito tip distribution shift around launch slots, where do snipers actually cluster relative to the curve mid-point.

You can answer every one of those from on-chain data. You can't answer them from on-chain data fast. The Pump.fun program has produced something like 14 million mints and 1.8 billion trades since launch. Decoding those from raw getBlock JSON is a multi-month project for a small team and a non-trivial RPC bill before you start.

The archive collapses that. Pre-decoded Parquet, monthly bundles, schema documented, USD-stamped, ready for DuckDB or pandas. The first day of work is your model query, not your decoder.

What the tables actually look like

Four core event tables, two pre-computed aggregates, one continuity table that follows tokens past graduation. Schemas live in every bundle's manifest.json so you don't need to guess.

Table	Grain	Key columns
pumpfun_creates	1 row per mint	mint, creator_wallet, slot, virtual_sol, virtual_tokens, metadata_uri
pumpfun_trades	1 row per buy/sell	mint, signer, side, sol_amount, token_amount, sol_amount_usd, post_reserves
pumpfun_graduations	1 row per graduation	mint, graduation_slot, target_amm, lp_pubkey, final_reserves
pumpfun_priority_fees	1 row per tx	signature, slot, cu_price, prio_fee, jito_tip, mint
pumpfun_creator_aggregates	1 row per creator wallet	creator_wallet, tokens_launched, graduations, grad_rate, ttg_p50, total_volume_usd
pumpfun_token_lifecycle	1 row per mint	mint, create_slot, peak_mcap_usd, graduated, ttg_minutes, last_trade_slot
pumpfun_post_graduation_trades	1 row per AMM trade	mint, amm, swap_id, signer, amount_in_usd, slot

What teams build with this

ML graduation predictors

Train on pumpfun_creates joined to pumpfun_token_lifecycle. Features include creator history, time-of-day, initial metadata patterns, first-N-buys behavior. Label is graduated (yes/no). Standard binary classifier. The hard part is the data, which is the part we sell.
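
A minimal sketch of the join-and-label step, assuming rows are already loaded as dicts. Two things here are assumptions rather than documented schema: the create `timestamp` is treated as Unix seconds, and the feature names are invented for illustration. The creator-history feature is counted point-in-time (launches strictly before this one) so the label doesn't leak into the features.

```python
from collections import Counter
from datetime import datetime, timezone


def build_training_rows(creates, lifecycle):
    """Join creates to lifecycle on mint and emit one feature row per token."""
    graduated = {row["mint"]: bool(row["graduated"]) for row in lifecycle}

    launches_so_far = Counter()
    rows = []
    # Walk creates in chain order so creator history is point-in-time.
    for c in sorted(creates, key=lambda r: r["slot"]):
        created_at = datetime.fromtimestamp(c["timestamp"], tz=timezone.utc)
        rows.append({
            "mint": c["mint"],
            "creator_prior_launches": launches_so_far[c["creator_wallet"]],
            "hour_utc": created_at.hour,
            # Mints missing from the lifecycle table are treated as not graduated.
            "label": graduated.get(c["mint"], False),
        })
        launches_so_far[c["creator_wallet"]] += 1
    return rows


creates = [
    {"mint": "B", "creator_wallet": "w1", "slot": 20, "timestamp": 18_000},
    {"mint": "A", "creator_wallet": "w1", "slot": 10, "timestamp": 0},
]
lifecycle = [{"mint": "A", "graduated": False}, {"mint": "B", "graduated": True}]
training = build_training_rows(creates, lifecycle)
```

From here the rows go into whatever classifier you prefer; first-N-buys features would come from a similar pass over pumpfun_trades.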

Sniper-bot backtests

Replay pumpfun_trades on the bonding curve and simulate entry/exit on a candidate strategy. The post-graduation trades table extends the simulation past the AMM transition so the P&L numbers are honest.
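
The simplest possible replay looks something like the sketch below. It makes several loud assumptions: each trade row exposes post-trade virtual reserves as `post_virtual_sol` and `post_virtual_tokens` (hypothetical names; check manifest.json for how post_reserves is actually laid out), the spot price is taken as the ratio of those reserves, and fees and the strategy's own price impact are ignored. Treat it as a skeleton for a real simulator, not a P&L you'd trade on.

```python
def spot_price(trade):
    """SOL per token implied by the post-trade virtual reserves (assumed fields)."""
    return trade["post_virtual_sol"] / trade["post_virtual_tokens"]


def backtest_fixed_hold(trades, entry_after=3, hold_trades=10):
    """Enter at the price left by the Nth trade, exit hold_trades later.

    Returns fractional P&L, or None when the curve has too few trades.
    entry_after must be at least 1.
    """
    # sorted() is stable, so trades within one slot keep their input order.
    ordered = sorted(trades, key=lambda t: t["slot"])
    exit_after = entry_after + hold_trades
    if len(ordered) < exit_after:
        return None
    entry_price = spot_price(ordered[entry_after - 1])
    exit_price = spot_price(ordered[exit_after - 1])
    return exit_price / entry_price - 1.0


trades = [
    {"slot": s, "post_virtual_sol": 30 + s, "post_virtual_tokens": 1000}
    for s in range(1, 6)
]
pnl = backtest_fixed_hold(trades, entry_after=2, hold_trades=2)
```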

Creator leaderboards

Pre-computed in the aggregates table. Plug into a public page or feed it into your due-diligence tool. The same math underlies most “serial creator” signals sold by other vendors at five times the price.

MEV and priority-fee research

pumpfun_priority_fees has cu_price, priority fee, and Jito tip on every transaction. Distribution charts, builder-preference analysis, and pre-launch tip-spike detection all run off this single table.
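
As an illustration of the kind of distribution work this table supports, the sketch below buckets Jito tips by how many slots after the token's create they landed. It assumes fee rows are dicts with `mint`, `slot`, and `jito_tip`, and that you've built a `mint -> create slot` mapping from pumpfun_creates; the function name and the window size are arbitrary.

```python
from collections import defaultdict
from statistics import median


def median_tip_by_offset(fee_rows, create_slot_of, max_offset=5):
    """Median jito_tip grouped by (tx slot - create slot), offsets 0..max_offset."""
    tips = defaultdict(list)
    for row in fee_rows:
        create_slot = create_slot_of.get(row["mint"])
        if create_slot is None:
            continue  # no matching create row for this mint
        offset = row["slot"] - create_slot
        if 0 <= offset <= max_offset:
            tips[offset].append(row["jito_tip"])
    return {offset: median(values) for offset, values in sorted(tips.items())}


fee_rows = [
    {"mint": "A", "slot": 100, "jito_tip": 10},
    {"mint": "A", "slot": 100, "jito_tip": 30},
    {"mint": "A", "slot": 101, "jito_tip": 5},
    {"mint": "A", "slot": 120, "jito_tip": 99},  # outside the window
    {"mint": "Z", "slot": 100, "jito_tip": 1},   # unknown mint
]
profile = median_tip_by_offset(fee_rows, {"A": 100})
```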

Rug detection labels

The lifecycle table records the slot of the last trade ever seen for each mint. Define rug as “peak market cap above X then no trades for Y hours,” label, train. We can also ship pre-labeled bundles on contract with our internal classifier output.
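
The “peak above X, quiet for Y hours” rule is a one-liner once you pick the thresholds. In this sketch the thresholds are placeholders, not recommendations, and quiet time is measured in slots using roughly 400 ms per slot, which is only an approximation since real slot times drift. If your rows carry wall-clock timestamps, use those instead.

```python
# Assumption: ~400 ms per slot, so about 9,000 slots per hour.
SLOTS_PER_HOUR = 9_000


def label_rug(row, as_of_slot, min_peak_mcap_usd=50_000, quiet_hours=24):
    """True when the token peaked above the threshold and has since gone quiet.

    row needs "peak_mcap_usd" and "last_trade_slot" from pumpfun_token_lifecycle.
    """
    quiet_slots = as_of_slot - row["last_trade_slot"]
    return (
        row["peak_mcap_usd"] > min_peak_mcap_usd
        and quiet_slots >= quiet_hours * SLOTS_PER_HOUR
    )


token = {"mint": "A", "peak_mcap_usd": 80_000, "last_trade_slot": 1_000_000}
is_rug = label_rug(token, as_of_slot=1_000_000 + 24 * SLOTS_PER_HOUR)
```

Passing `as_of_slot` explicitly keeps the label reproducible: the same monthly snapshot and the same reference slot always give the same answer.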

Memecoin journalism and research

“What happened during the X token saga” pieces are usually two queries away in DuckDB. Trades for the mint, joined to creator history, joined to priority-fee spikes. The shape tells the story.

Archive plus live stream is the production pattern

Most production teams run both. The archive is where you backtest, train, and prototype. The real-time Pump.fun stream is where you run inference. Same schema, same field names, same decoding rules, so the model you trained on Parquet works on the gRPC payload without translation.

When a strategy moves from research to production, the only thing that should change is the source of the rows: from read_parquet(...) to a Yellowstone subscribe call. Everything else (the feature extraction, the model, the post-processing) runs unchanged. That's the actual reason we keep historical and live on identical schemas.
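
In code, that pattern is just a pipeline written against an iterator of rows, with the source injected. The sketch below uses stand-in generators for both sources so it runs anywhere: `archive_rows` and `live_rows` are hypothetical placeholders for a Parquet reader and a gRPC subscription, and the `side == "buy"` encoding is an assumption about the schema.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

Row = Dict[str, object]


def extract_features(row: Row) -> Row:
    # Touches only field names shared by the archive and the live stream.
    return {
        "mint": row["mint"],
        "is_buy": row["side"] == "buy",
        "sol_amount": row["sol_amount"],
    }


def run_pipeline(
    rows: Iterable[Row], score: Callable[[Row], float]
) -> Iterator[Tuple[object, float]]:
    """Feature extraction and scoring; knows nothing about where rows come from."""
    for row in rows:
        features = extract_features(row)
        yield features["mint"], score(features)


def archive_rows(records: Iterable[Row]) -> Iterator[Row]:
    # Placeholder: in research this would iterate rows read from Parquet.
    yield from records


def live_rows(messages: Iterable[Row]) -> Iterator[Row]:
    # Placeholder: in production this would wrap the stream subscription.
    for message in messages:
        yield message


def score(features: Row) -> float:
    return float(features["sol_amount"]) if features["is_buy"] else 0.0


sample = [
    {"mint": "A", "side": "buy", "sol_amount": 1.5},
    {"mint": "B", "side": "sell", "sol_amount": 0.2},
]
backtest_out = list(run_pipeline(archive_rows(sample), score))
production_out = list(run_pipeline(live_rows(sample), score))
```

Both calls go through the same `run_pipeline`; swapping the source is the only difference between them.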

The post-graduation table is the bridge for the strategies that span the bonding-curve threshold. Teams that close the position at graduation want one table; teams that hold through migration want both. Both supported in the same bundle.

Pricing and how the archive ships

$200 a month for the rolling archive. That covers every table, refreshed daily, with monthly immutable snapshots published on the first of each month. 30% discount at six months, 50% at twelve. The full historical depth is included from day one of your subscription.

One-off historical pulls (full archive as a one-time download, no recurring updates) are quoted separately. They run higher because the cost is in the parsing, not the delivery, and a one-time customer doesn't amortize the ingest pipeline. Most teams want the recurring tier so the bundle stays current with mainnet.

Custom labels (rug classifications, sniper-detection scores, creator-network clustering) are an engineering project on top of the standard archive. Send us the brief and we'll quote it. The same engineers who ship the live parsed stream do the labeling, so the data quality stays consistent.

Frequently asked questions

The archive contains every decoded instruction the Pump.fun bonding-curve program has executed since deployment, plus the post-graduation trades on PumpSwap and Raydium for graduated tokens. Tables are split by event family (pumpfun_creates, pumpfun_trades, pumpfun_graduations, pumpfun_priority_fees), alongside pre-computed tables for creator aggregates and token lifecycle. All Parquet, all decimal-normalized, all USD-stamped against the SOL/USDC price oracle nearest each slot.

Related products

Pump.fun real-time stream

Subscribe to creates, trades, and graduations as they happen. Same schema as the historical archive.

PumpSwap Enhanced Stream

Post-graduation AMM. Useful when the question crosses the bonding-curve threshold.

Solana trading datasets (catalog)

The full historical-data catalog. Pump.fun is one program; the catalog covers 40+.

Historical blocks archive

Raw getBlock JSON when you want to do your own decoding instead of buying ours.

Raydium real-time stream

For the alt graduation path. Pair with the Pump.fun archive to follow tokens through migration.

Subscribe to the archive

$200/mo for the rolling archive plus monthly snapshots. 30% off at 6 months, 50% off at 12. Custom slot ranges and labeled bundles quoted separately.


