Vibe Coding a Claude Code Token Cost Tracker
Intro
Recently, I decided to give Claude Code (CC) a go. I’ve used the chat version of Claude and the IDE versions in Windsurf and Cursor, but I hadn’t yet used CC. A former co-worker used CC for nearly all his software development work and attested to its superiority over other tools. I’m pretty sure he didn’t even use an IDE but coded entirely with CC and the terminal, claiming it was a big step up from the previous models and other AI tools he’d used. With that in mind, I figured I’d give it a go.
While using other AI tools I was curious about how many tokens it took to implement a feature and what those tokens cost. The subscription-based AI tools didn’t give any clarity about the actual costs. So I wondered if there was a way to build a token cost tracker for CC using CC. I decided this would be my first CC project, as I’d be able to use it with future CC projects to get an idea of token usage and costs. With hard data like that available, it becomes possible to find ways to optimise usage and costs. I’m not 100% sure it’ll be relevant or useful right now, but for almost anything that costs money there comes a point where those costs need to be optimised.
Discovery
To start, I got CC to do some discovery work for me. I asked it whether it was even possible to track token usage and costs. After a bit of investigation it found that it was, thanks to the .claude/projects folder. That folder stores the main conversation of each session and, in a folder named with the session ID, data on the sub-agents. The data is stored in the JSON Lines (.jsonl) format, which is handy for reading JSON one line at a time because, in a valid JSONL file, each line is a valid JSON value.
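To make the line-at-a-time idea concrete, here’s a minimal TypeScript sketch of reading JSONL text. It isn’t the tracker’s actual code; `parseJsonl` and `ParseResult` are names I’ve made up for illustration, and collecting bad lines instead of throwing is my own assumption about how you’d want to handle a file that’s still being written to.

```typescript
// Hypothetical helper: parse JSON Lines text, one JSON value per non-empty line.
interface ParseResult {
  records: unknown[];
  badLines: number[]; // 1-based line numbers that failed to parse
}

function parseJsonl(text: string): ParseResult {
  const records: unknown[] = [];
  const badLines: number[] = [];
  const lines = text.split(/\r?\n/);

  lines.forEach((line, index) => {
    const trimmed = line.trim();
    if (trimmed === "") return; // skip blank lines, e.g. a trailing newline
    try {
      records.push(JSON.parse(trimmed));
    } catch {
      // Assumption: a malformed line is recorded and skipped, not fatal.
      badLines.push(index + 1);
    }
  });

  return { records, badLines };
}

// Made-up sample content, not real session data.
const sample = '{"type":"user"}\n{"type":"assistant","n":2}\n{broken\n';
const result = parseJsonl(sample);
console.log(result.records.length, result.badLines);
```

Because each line parses independently, one corrupt line doesn’t stop the rest of the file from being read.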
There’s a bunch of different data in the session files: various IDs, session info, the messages sent, and so on. I had no idea that CC stored all this information and more locally. The important part for the tracker, though, is the usage information, which contains the following:
- input_tokens: new tokens sent in this request (not cached)
- cache_creation_input_tokens: tokens written to the prompt cache for future reuse
- cache_read_input_tokens: tokens read from the prompt cache (significantly cheaper than regular input tokens)
- output_tokens: tokens generated by Claude in the response
- service_tier: API tier used (“standard” vs “batch”)
- cache_creation: breakdown of cache writes by TTL tier:
  - ephemeral_5m_input_tokens: tokens cached for 5 minutes
  - ephemeral_1h_input_tokens: tokens cached for 1 hour
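As a rough sketch, the four token counters above could be modelled and totalled like this in TypeScript. The `Usage` interface only mirrors the field names listed; `emptyUsage` and `addUsage` are hypothetical helpers, and treating a missing counter as zero is an assumption on my part.

```typescript
// Mirrors the token counters listed above; everything else is illustrative.
interface Usage {
  input_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
  output_tokens: number;
}

function emptyUsage(): Usage {
  return {
    input_tokens: 0,
    cache_creation_input_tokens: 0,
    cache_read_input_tokens: 0,
    output_tokens: 0,
  };
}

// Assumption: a counter missing from an entry counts as zero.
function addUsage(totals: Usage, next: Partial<Usage>): Usage {
  return {
    input_tokens: totals.input_tokens + (next.input_tokens ?? 0),
    cache_creation_input_tokens:
      totals.cache_creation_input_tokens + (next.cache_creation_input_tokens ?? 0),
    cache_read_input_tokens:
      totals.cache_read_input_tokens + (next.cache_read_input_tokens ?? 0),
    output_tokens: totals.output_tokens + (next.output_tokens ?? 0),
  };
}

// Made-up numbers for two requests in one session.
const entries: Partial<Usage>[] = [
  { input_tokens: 10, cache_creation_input_tokens: 100, cache_read_input_tokens: 0, output_tokens: 50 },
  { input_tokens: 5, cache_read_input_tokens: 100, output_tokens: 20 },
];

const sessionTotals = entries.reduce(addUsage, emptyUsage());
console.log(sessionTotals);
```

Keeping the cached and non-cached counters separate matters later, because they’re priced differently.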
Calculating Costs
This is exactly what I wanted. However, there was no pricing associated with the usage data. Although it wouldn’t accurately represent what CC costs on a subscription, I decided to use API pricing. That at least gives me a rough figure for what my usage would cost on the API, which I can compare against the flat monthly subscription fee to see how much the subscription saves me.
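The estimate boils down to multiplying each token count by a per-million-token rate and adding it all up. Here’s a minimal sketch of that arithmetic, not the tracker’s real code. The rates and the monthly fee below are placeholder numbers I picked so the maths is easy to follow, not actual prices, and `costUsd` and `subscriptionSaving` are made-up names. It also uses a single cache-write rate, which is a simplification if the 5-minute and 1-hour tiers are priced differently.

```typescript
interface Usage {
  input_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
  output_tokens: number;
}

// USD per one million tokens. Hypothetical structure for illustration.
interface PricePerMillion {
  input: number;
  cacheWrite: number;
  cacheRead: number;
  output: number;
}

function costUsd(usage: Usage, price: PricePerMillion): number {
  const totalMicroDollars =
    usage.input_tokens * price.input +
    usage.cache_creation_input_tokens * price.cacheWrite +
    usage.cache_read_input_tokens * price.cacheRead +
    usage.output_tokens * price.output;
  return totalMicroDollars / 1e6;
}

// Positive result: the subscription is cheaper than the API for this usage.
function subscriptionSaving(apiCostUsd: number, monthlyFeeUsd: number): number {
  return apiCostUsd - monthlyFeeUsd;
}

// PLACEHOLDER rates, not real pricing. Swap in the published rates for your model.
const placeholderRates: PricePerMillion = { input: 2, cacheWrite: 2.5, cacheRead: 0.25, output: 10 };

// Made-up monthly usage.
const monthlyUsage: Usage = {
  input_tokens: 1000000,
  cache_creation_input_tokens: 2000000,
  cache_read_input_tokens: 10000000,
  output_tokens: 500000,
};

const apiCost = costUsd(monthlyUsage, placeholderRates);
// 2 + 5 + 2.5 + 5 = 14.5 with the placeholder rates above
console.log(apiCost, subscriptionSaving(apiCost, 20)); // 20 is a placeholder fee
```

With these placeholder numbers the saving comes out negative, which would mean the API is the cheaper option at that level of usage.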
Functionality
The tracker covers the basic functionality:
- Sessions table: lists all the sessions, with each session’s sub-agents in an expandable section underneath it.
- Daily usage: summary of the daily usage.
- Aggregate statistics: a set of cards that show relevant usage totals, such as input tokens broken down into new and cached, costs broken down into cached and non-cached, output tokens, and session count.
- Subscription vs API costs: the API costs vs the three different subscriptions.
- Exporting: daily usage and sessions exportable as CSV and JSON.
- Filters: project, session title, and period filters. The session title filter is handy as I can rename a Claude Code session and then find it in the sessions table. If the session is limited to what the title describes, I can get an idea of how much a feature costs to build.
- Sync all sessions: a button that syncs all sessions. Click it the first time you use the tracker, or whenever you need to resync after a db error.
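To give a feel for the daily usage summary and the CSV export, here’s a small TypeScript sketch under my own assumptions. The `SessionRow` and `DailyRow` shapes and their field names are hypothetical, not the tracker’s schema, and the day is taken straight from the first ten characters of an ISO 8601 timestamp.

```typescript
// Hypothetical row shapes for illustration only.
interface SessionRow {
  sessionId: string;
  timestamp: string; // ISO 8601, e.g. "2025-01-02T09:00:00Z"
  totalTokens: number;
  costUsd: number;
}

interface DailyRow {
  day: string; // YYYY-MM-DD
  sessions: number;
  totalTokens: number;
  costUsd: number;
}

function toDaily(rows: SessionRow[]): DailyRow[] {
  const byDay = new Map<string, DailyRow>();
  for (const row of rows) {
    const day = row.timestamp.slice(0, 10); // date part of the ISO string
    const current = byDay.get(day) ?? { day, sessions: 0, totalTokens: 0, costUsd: 0 };
    current.sessions += 1;
    current.totalTokens += row.totalTokens;
    current.costUsd += row.costUsd;
    byDay.set(day, current);
  }
  // YYYY-MM-DD strings sort chronologically with a plain string comparison.
  return Array.from(byDay.values()).sort((a, b) => (a.day < b.day ? -1 : a.day > b.day ? 1 : 0));
}

// No quoting is needed here because every field is a date or a number.
// Free-text fields such as session titles would need proper CSV escaping.
function toCsv(rows: DailyRow[]): string {
  const header = "day,sessions,total_tokens,cost_usd";
  const lines = rows.map((r) => [r.day, r.sessions, r.totalTokens, r.costUsd.toFixed(2)].join(","));
  return [header].concat(lines).join("\n");
}

// Made-up sessions.
const sessionRows: SessionRow[] = [
  { sessionId: "s1", timestamp: "2025-01-02T09:00:00Z", totalTokens: 1000, costUsd: 0.5 },
  { sessionId: "s2", timestamp: "2025-01-02T18:30:00Z", totalTokens: 500, costUsd: 0.25 },
  { sessionId: "s3", timestamp: "2025-01-01T12:00:00Z", totalTokens: 200, costUsd: 1.5 },
];

const dailyCsv = toCsv(toDaily(sessionRows));
console.log(dailyCsv);
```

Grouping on the raw timestamp string means days follow whatever timezone the timestamps are written in, which may or may not match your local day.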
The UI leaves much to be desired but it displays the relevant information in a clear format. I didn’t think too much about it except to fine-tune the aggregated stats and the expandable sub-agents section.
Gone Full Vibe Code
The only technical part of this project that I explicitly requested was the use of better-sqlite3 as I wanted an SQL db to store data because I’m familiar with SQL and it’s a straightforward way to store data.
Everything else was CC’s suggestion. CC decided to use Vite, Vitest, TypeScript, React, etc. I’d heard of Vite and Vitest but had never used either of them, yet CC decided they fit this use case exactly. With all the talk about the potential for advertising in ChatGPT, it made me wonder whether there’s some way to influence which packages get used. 🤔
Also, I didn’t review the code in depth. I glanced over it and could see a lot of room for improvement, though I wasn’t that interested in doing it at the time. I was primarily interested in seeing how well CC can code and, I must say, pretty well!
I still gave it a lot of direction, though I didn’t want or need to think about the code. That was mainly because it was a greenfield project and I didn’t have to worry about sticking to pre-existing coding conventions.
It was fun! I enjoyed the back-and-forth interaction with CC and seeing the functionality being implemented. However, if this were an application that paying customers relied on, or one that provided some critical functionality, I wouldn’t feel comfortable shipping to production without reviewing and understanding the code. Nor would I feel comfortable allowing CC to make all the architectural decisions. I would’ve created a more thorough plan, examined it in more detail, and made sure I understood the pros and cons, the risks and benefits, etc.
But I still would’ve used CC. In my previous role the majority of the dev team was using Windsurf while a couple used CC. I would’ve used CC too but the company hadn’t switched over to paying for it yet. The principal dev had set up an automated workflow with a few CC instances in the cloud. He’d send it a plan during the night and it’d code away. I’d get emails at ungodly hours of the morning with commits and PRs from him to review.
This is a great idea, but the output was still reviewed by him and the team! So I wouldn’t go full vibe code in that context. But here, sure, why not?
Minor Issues
A minor thing I noticed was that as the project got more complicated, CC was more likely to make a mistake. The most recent feature I added was deleting sessions. It implemented the backend functionality fine but didn’t realise that the new column wasn’t showing on the first page of the table, even though it did on the second page. It thought everything was correct. It took a few iterations of prompting for it to figure out what was wrong, and in the meantime it removed the display of values in another row. I told it to find a fix that restored both pieces of functionality.
In other words, it was willing to remove accepted functionality in order to fix a regression that its own new functionality had introduced. This is something I noticed when working with Windsurf as well. If I hadn’t been familiar with the functionality I was working on, I would’ve missed it.
One way to catch this is with a library like Cypress or Testing Library, though these types of tests wouldn’t usually cover basic functionality like displaying values in a table column. Snapshot testing might be more appropriate, as it captures changes to what is rendered on the frontend, though it has gone out of fashion because it’s a brittle and noisy way to test for regressions in rendered output. Another option is a visual regression testing platform like Percy or Chromatic. And, of course, it would be remiss of me not to suggest getting AI to test its own output in the browser with CC’s Chrome plugin! I’ve never used it, so I’m not sure how easily it could be used in this way, but it could be a way to close the loop and give LLMs the ability to test end-to-end.
Another thing I thought was odd is that it used packages that were about two major versions behind the current release, sometimes more. It could be that it was trained on old data, but it can do web searches, so why didn’t it pick up the latest packages? There were no dependency conflicts between the most recent packages, so that doesn’t explain it either. I’m not sure why it did that, but I upgraded them as soon as I realised.
Alternatives
An alternative to this tracker is a repo I found that runs Langfuse locally: https://github.com/doneyli/claude-code-langfuse-template. I’ve not worked through it thoroughly, but from what I’ve tested it seems to work and is pretty straightforward to set up. Though it’s a bit different to what I’m doing and waaaayyyyy more sophisticated, I think it’s worth giving it a go. I’ve connected it to my CC projects and sessions but still haven’t gotten it to calculate the cost of a session.
Final Thoughts
I enjoyed having an idea come up and being able to quickly and easily see it come to life through CC. I’ve already started to plan out my next project with CC, a bit more thoroughly than this time. I feel like ideas are coming up more easily as well because the friction to implement them is so much lower. I am slightly concerned about my lack of technical understanding of this project but it was a conscious choice made on my part and in future projects I can choose otherwise.
Overall, it’s easy and enjoyable to embrace CC though I’m still concerned about my future as a software developer! But only time will tell whether it’s a valid concern or not.