Why feeding raw OHLCV candles to an LLM is expensive

Q: Is this investment advice?

No. patternfetch provides impersonal market data and algorithmic signals for informational purposes only. It is not investment, financial, legal or tax advice, not personalized, and not a recommendation to buy, sell or hold any asset.

TL;DR.

A raw OHLCV array is hundreds of numbers per ticker: a 200-candle 4h window is about 1,200 numbers (200 candles × 6 fields), which serializes to several thousand tokens. That array is re-sent on every turn the agent reasons, and multiplied by every ticker it watches. Worse, the model still has to re-derive the structure (patterns, levels, indicators) itself by doing arithmetic on a long flat list of numbers, which is precisely where LLMs hallucinate (miscounted bars, misread closes, invented RSI values). The fix: precompute the structure server-side and send a token-compact brief, meaning a few hundred tokens of interpreted state plus a one-line summary the agent can act on, with the raw candles still available compactly when you actually need them.

Where the tokens go

One OHLCV candle is six numbers: a timestamp and the open, high, low, close, and volume. That's the irreducible unit. Now scale it:

N candles × 6 numbers. A 200-candle window is ~1,200 numbers before any formatting.
JSON overhead. Wrap each candle as {"t":...,"o":...,"h":...,"l":...,"c":...,"v":...} and you add field names, braces, colons and commas to every single row. Tokenizers split long decimals like 60480.17 into multiple tokens, so a "number" is rarely one token.
Every turn. In an agent loop the context is re-sent on each step. If the candle array sits in context, you pay for it again on turn 2, turn 3, turn 4 — it doesn't amortize.
Every ticker. Watching 10 symbols means 10 arrays in context at once.

Back-of-envelope: a 200-candle window serialized as conventional JSON lands in the low thousands of tokens for a single ticker on a single turn. A digested brief covering that same window — patterns, levels, regime, interpreted indicators, and a summary line — is a few hundred tokens. Multiply both by turns and tickers and the gap is the difference between a context window you can afford to keep open and one that blows your budget.

Exact token counts depend on the tokenizer, decimal precision, and whether you use array-of-arrays vs array-of-objects — but the order of magnitude (thousands vs hundreds) holds across models.
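
To make the back-of-envelope above concrete, here is a rough Python sketch that serializes 200 synthetic candles as array-of-arrays and as array-of-objects, then estimates tokens. The candle values are placeholders rather than market data, and the 4-characters-per-token ratio is an assumed rule of thumb, not a measurement from any particular tokenizer. Swap in a real tokenizer if you need exact counts.

```python
import json

def synthetic_candles(n, start_ms=1718000000000, step_ms=14_400_000):
    """Return n made-up 4h candles as [t, o, h, l, c, v] rows (placeholder values)."""
    rows, price = [], 60000.0
    for i in range(n):
        o = round(price, 1)
        c = round(o + 10.3, 1)
        rows.append([start_ms + i * step_ms, o, round(c + 25.2, 1),
                     round(o - 30.4, 1), c, 1000.5])
        price = c
    return rows

def rough_tokens(text, chars_per_token=4.0):
    """Crude estimate. ASSUMPTION: ~4 chars per token; digit-heavy text may cost more."""
    return int(len(text) / chars_per_token)

def loop_tokens(tokens_per_ticker, turns, tickers):
    """Context re-sent on each turn does not amortize: cost scales with turns and tickers."""
    return tokens_per_ticker * turns * tickers

rows = synthetic_candles(200)
keys = ("t", "o", "h", "l", "c", "v")
as_arrays = json.dumps({"candles": rows}, separators=(",", ":"))
as_objects = json.dumps({"candles": [dict(zip(keys, r)) for r in rows]},
                        separators=(",", ":"))

number_count = sum(len(r) for r in rows)  # 200 candles x 6 fields
per_ticker = rough_tokens(as_arrays)
print("numbers:", number_count)
print("array-of-arrays tokens (rough):", per_ticker)
print("array-of-objects tokens (rough):", rough_tokens(as_objects))
print("10 turns x 10 tickers (rough):", loop_tokens(per_ticker, turns=10, tickers=10))
```

Even with minified separators, the object form costs more than the array form because the field names repeat on every row, and both figures are multiplied by turns and tickers.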

Why it's also error-prone

Cost is only half the problem. Long flat arrays of numbers are the worst-case input for a language model. LLMs don't iterate an index and count; they pattern-match and approximate. On a raw candle array that means:

Miscounted bars. Ask "how many candles since the swing low?" and the model estimates rather than counts — off by a handful is common on 200-row arrays.
Misread latest close. The most decision-relevant value (the last close) is buried at the end of a long list; models routinely grab a nearby row instead.
Invented indicator values. Ask for "the RSI" of a raw array and the model will often produce a number that looks plausible but was never computed from the data — a confident hallucination.

Interpreted state sidesteps all of this. If the structure is computed deterministically server-side and handed to the model as labels and a few key numbers — "rsi":{"v":58.3,"state":"neutral"}, "trend":"up", "double_bottom" (conf 0.86) — there is no long-array arithmetic left for the model to get wrong. It reasons over a clean, small, already-correct summary.
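
As an illustration of what "computed deterministically" can look like, the Python sketch below answers the three error-prone questions with plain code. It is not patternfetch's implementation: the function names are hypothetical, "swing low" is simplified to the lowest low in the window, the RSI uses the standard Wilder smoothing with an assumed period of 14, and the 30/70 state thresholds are the conventional ones rather than anything confirmed by the source.

```python
def latest_close(rows):
    """Rows are [t, o, h, l, c, v]; the latest close is field 4 of the last row."""
    return rows[-1][4]

def bars_since_lowest_low(rows):
    """Count bars after the lowest low (a simplification of 'swing low')."""
    lows = [r[3] for r in rows]
    idx = min(range(len(lows)), key=lows.__getitem__)  # first index of the minimum
    return len(rows) - 1 - idx

def rsi_wilder(closes, period=14):
    """Standard Wilder RSI over a list of closes (period is an assumed default)."""
    if len(closes) <= period:
        raise ValueError("need more than `period` closes")
    gains, losses = [], []
    for prev, cur in zip(closes, closes[1:]):
        delta = cur - prev
        gains.append(max(delta, 0.0))
        losses.append(max(-delta, 0.0))
    # Seed with simple averages, then apply Wilder smoothing to the rest.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for g, l in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + l) / period
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

def rsi_state(value, low=30.0, high=70.0):
    """Label an RSI value. ASSUMPTION: conventional 30/70 thresholds."""
    if value >= high:
        return "overbought"
    if value <= low:
        return "oversold"
    return "neutral"
```

The point is that each answer comes from an index lookup or a fixed formula, so the same input always produces the same output, and the model only ever sees the result.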

The token-compact alternative

One /v1/brief call to patternfetch returns the digested market state for a ticker and timeframe: detected chart & candlestick patterns, support/resistance levels, a regime label, interpreted RSI/EMA/ATR (value + state, not a raw series), and a one-line nl summary the agent can act on directly. It's a few hundred tokens — and the raw candles are still there, compactly, in codec.rows, for when you genuinely need to compute something yourself.

Before — raw OHLCV (truncated)

{ "candles": [
  [1718000000000,60125.4,60480.0,59890.1,60310.7,1284.3],
  [1718014400000,60310.7,60790.2,60180.0,60655.9,1102.8],
  [1718028800000,60655.9,60980.4,60410.3,60540.1, 998.1],
  [1718043200000,60540.1,60600.0,59980.7,60120.4,1411.6],
  … 196 more rows …
] }
# ~1,200 numbers · re-sent every turn · the model
# still has to find the close, count bars, derive RSI

After — token-compact brief

{
  "header": { "sym":"BTC/USDT","tf":"4h","n":200 },
  "codec": { "rows":"60125.4,60480,59890.1,60310.7,1284;…",
             "sax":"dcefdcbe", "precision":1 },
  "analysis": {
    "patterns":[{"name":"double_bottom","confidence":0.86}],
    "levels":{ "support":[{"price":59820.4}],
               "resistance":[{"price":63450.8}] },
    "regime":{ "trend":"up","strength":0.42,"volPct":2.13 },
    "indicators":{ "rsi":{"v":58.3,"state":"neutral"},
                   "ema":{"v":61240.8,"state":"above_20_50"} },
    "nl":"BTC/USDT: uptrend (moderate), +1.94% last 4h,
          RSI 58.3 (neutral), double_bottom (conf 0.86)."
  }
}
# few hundred tokens · raw rows still in codec.rows

The nl line is a ready-to-reason summary, so the model never has to do arithmetic on raw numbers. The codec.rows field keeps the actual candles available in a compact, comma-packed form if you do need them.
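
If you do need the raw rows, a small decoder is enough. The Python sketch below is inferred from the sample brief above, not from a published spec: it assumes ";" separates candles, "," separates fields, and each row is open, high, low, close, volume with no timestamp. The helper names are hypothetical.

```python
def decode_rows(packed):
    """Split a packed codec.rows string into [o, h, l, c, v] float rows.

    ASSUMPTION: ';' separates candles and ',' separates fields, as the sample suggests.
    """
    rows = []
    for chunk in packed.split(";"):
        chunk = chunk.strip()
        if not chunk or chunk == "…":  # skip empties and the elision marker in the sample
            continue
        fields = [float(x) for x in chunk.split(",")]
        if len(fields) != 5:
            raise ValueError(f"expected 5 fields per candle, got {len(fields)}")
        rows.append(fields)
    return rows

def digest(brief):
    """Pull out the fields an agent is likely to need from a brief-shaped dict."""
    analysis = brief["analysis"]
    rsi = analysis["indicators"]["rsi"]
    return {
        "summary": analysis["nl"],
        "rsi": rsi["v"],
        "rsi_state": rsi["state"],
        "last_close": decode_rows(brief["codec"]["rows"])[-1][3],
    }

# A trimmed copy of the sample brief, with the elided rows left elided.
sample = {
    "codec": {"rows": "60125.4,60480,59890.1,60310.7,1284;…"},
    "analysis": {
        "indicators": {"rsi": {"v": 58.3, "state": "neutral"}},
        "nl": "BTC/USDT: uptrend (moderate), +1.94% last 4h, "
              "RSI 58.3 (neutral), double_bottom (conf 0.86).",
    },
}
print(digest(sample))
```

In an agent loop, the summary and indicator states would go into the prompt, while the decoded rows stay in your own code for any arithmetic.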

How to use it

POST a ticker and timeframe to /v1/brief:

curl -X POST https://patternfetch.com/v1/brief \
  -H "Authorization: Bearer pf_…" \
  -H "Content-Type: application/json" \
  -d '{ "ticker": "BTC/USDT", "timeframe": "4h" }'
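
The same call can be sketched in Python using only the standard library. The URL, headers, and body mirror the curl example; the function names are hypothetical, and the response is assumed to be a JSON object shaped like the sample brief shown earlier.

```python
import json
import urllib.request

BRIEF_URL = "https://patternfetch.com/v1/brief"  # endpoint from the curl example

def build_brief_request(api_key, ticker, timeframe):
    """Build (but do not send) the POST request for one brief."""
    body = json.dumps({"ticker": ticker, "timeframe": timeframe}).encode("utf-8")
    return urllib.request.Request(
        BRIEF_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

def fetch_brief(api_key, ticker, timeframe, timeout=10.0):
    """Send the request and parse the JSON body.

    ASSUMPTION: a successful response is a JSON object like the sample brief.
    urlopen raises urllib.error.HTTPError on non-2xx responses.
    """
    req = build_brief_request(api_key, ticker, timeframe)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

A call would look like fetch_brief(key, "BTC/USDT", "4h"), where key is your own pf_… key, read from an environment variable or secret store rather than hard-coded.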

Free to start — no card:

① No-signup demo: POST /v1/demo returns a real brief with no key.
② Free MCP discovery: agents can call tools/list with no key.
③ $0.01 per brief with a free key ($0.05 starter credit), one-click OAuth for MCP (Smithery, Claude.ai, Cursor, Claude Desktop), and pay-per-call via x402 (USDC, no account) or Stripe.

FAQ

How many tokens does a candle array use?

One candle is 6 numbers (timestamp, open, high, low, close, volume). A 200-candle window is ~1,200 numbers; serialized as JSON with field names, brackets and commas — and with long decimals split across multiple tokens — that lands in the low thousands of tokens per ticker, re-sent every turn and multiplied by every ticker. A digested brief over the same window is a few hundred tokens.

Why do LLMs miscount candles?

Because long flat number arrays are the input they're worst at. Models approximate rather than iterate an index, so they miscount bars, misread which row is the latest close, and sometimes invent indicator values (like an RSI) that were never in the data. Sending interpreted state instead of raw numbers removes the arithmetic step where those errors happen.

Can I still get the raw numbers?

Yes. A brief still includes the candles compactly in codec.rows, and POST /v1/candles returns the compact candle codec on its own. You get digested state for reasoning and the raw rows when you need to compute something yourself — without paying full-array token cost on every turn.

Is this investment advice?

No. It's impersonal market data and algorithmic signals, for informational purposes only — not advice, not personalized, non-executing. See the disclaimer.
