
Hey folks! I quit my job at Oracle almost a year ago now to build DataStation. It's an app I've wanted as an engineering manager for years. It's entirely open-source and while I've had a few awesome contributors I'm mostly the only person on it. It has been funded out of contract development and savings.

DataStation helps you query a variety of data sources (conventional SQL like PostgreSQL and MySQL, non-SQL like Prometheus or Elasticsearch), files and HTTP APIs. It is not a SQL layer on top of these various APIs like FDW in Postgres or Apache Calcite.

DataStation just tries to abstract away glue code. So in DataStation for Prometheus you query with PromQL. For Elasticsearch you query with Lucene. And for SQL databases you query with their SQL dialect. But you don't need to remember how to use the appropriate library for your language. You just need your own credentials.

DataStation is made of panels (other apps might call them cells) that each produce a result. Panels can refer to other panels. These allow you to build workflows that cross the boundary of a particular datasource. For example you might have some data in a CSV a product manager gave you and the bulk of your data is in PostgreSQL. In DataStation you could pull in the CSV with a File panel and pull in the Postgres data with a Database panel. Then you can join both panel results in a Code panel using your favorite language like Python, Ruby, R, Node, Julia, etc. You can even script Code panels in a SQLite dialect with a bunch of rich addons (url parsing, best-effort date parsing, statistics aggregation, etc.): https://github.com/multiprocessio/go-sqlite3-stdlib.
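To make the CSV-plus-Postgres example concrete, here is a rough sketch of the kind of join you might write in a Python Code panel. It is plain Python, not DataStation's panel API: `join_rows`, the sample rows, and the `customer_id` column are all made up for illustration, and how panel results are actually fetched is not shown.

```python
import csv
import io


def join_rows(left, right, key):
    """Inner-join two lists of dicts on a shared key (hypothetical helper)."""
    # Index the right-hand rows by key so each lookup is O(1).
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for row in left:
        for match in index.get(row[key], []):
            # Merge the two rows; left-hand values win on column name clashes.
            joined.append({**match, **row})
    return joined


# Stand-in for a File panel result: rows parsed from a CSV (all values are strings).
csv_text = "customer_id,segment\n1,enterprise\n2,startup\n"
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# Stand-in for a Database panel result. The ids are strings here so they
# compare equal to the CSV values; real data may need a type conversion first.
db_rows = [
    {"customer_id": "1", "requests": 120},
    {"customer_id": "2", "requests": 45},
    {"customer_id": "3", "requests": 7},
]

result = join_rows(csv_rows, db_rows, "customer_id")
```

Customer 3 drops out because it has no row in the CSV, which is the usual inner-join behavior.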

You can watch a simple introductory video: https://www.youtube.com/watch?v=q_jRBvbwIzU. Or if you want to see that cross-datasource interaction taken to an extreme, check out this video using Postgres metadata to filter log data in Elasticsearch to do historic request analysis on a subset of customers: https://www.youtube.com/watch?v=tIh99YVHoRE.

DataStation is mainly a desktop app today where the end result is that you export graph SVGs or HTML tables or markdown tables or just a CSV file. All this data stays on your laptop so it's as easy to use in a corporate environment as any existing SQL IDE or Jupyter Notebook.

In the last year it has reached 1.5k stars on GitHub and over 1000 unique users, and it currently averages about 40 fairly active users per month (defined as having opened the app more than a few times).

Since it's only just now 12 months old it's been going through a lot of maturing during this time. If you've tried it before and it was buggy or too slow it's probably worth another try now if you're still interested.

DataStation is primarily an Electron app but the code that evaluates panels is written in Go. The Go evaluation code forms the backbone of another app you may have seen around HN, dsq: https://github.com/multiprocessio/dsq, which is a limited version of DataStation as a CLI for querying files with SQL.
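If "querying files with SQL" sounds abstract, the general idea can be sketched in a few lines of Python with the standard library. This is only an illustration of the concept, not how dsq or DataStation are implemented (their evaluation code is in Go); `query_csv` and the table name `t` are hypothetical.

```python
import csv
import io
import sqlite3


def query_csv(csv_text, sql, table="t"):
    """Load CSV text into an in-memory SQLite table and run a SQL query on it."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return []
    columns = list(rows[0].keys())

    conn = sqlite3.connect(":memory:")
    try:
        # Quote column names so headers with spaces or keywords still work.
        quoted = ", ".join('"{}"'.format(c.replace('"', '""')) for c in columns)
        conn.execute('CREATE TABLE "{}" ({})'.format(table, quoted))
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table, placeholders),
            [[row[c] for c in columns] for row in rows],
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


data = "name,age\nada,36\nlin,29\n"
# CSV values are loaded as text, so cast before comparing numerically.
adults = query_csv(data, "SELECT name FROM t WHERE CAST(age AS INTEGER) > 30")
```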

In the future I'd like to see more people using it as a server app where my goal is to support read-only dashboards and recurring exports. That part is still work-in-progress.

You can find a ton of tutorials on how to interact with supported databases on the DataStation website: https://datastation.multiprocess.io/docs/.

Looking forward to your feedback!



This is really cool. Maybe in the future you can make a paid version with a bunch of BI features.

In your opinion, how does it compare to PyCharm (Enterprise version) when it's all blinged out with big data tools and integrations? I recently realized that PyCharm is my Data IDE and not just my Python editor. I only use limited features though, so hard for me to compare the extent of functionalities between the two.

Edit: Well, PyCharm won't let you join two different data sources, so that's one big difference!


> Edit: Well, PyCharm won't let you join two different data sources, so that's one big difference!

Right!

On the other hand, any real code IDE will have high-quality autocomplete, jump-to-definition, all that code IDE stuff. In the future DataStation may be able to hook into tree-sitter or LSP but for now it's more like a textarea with syntax highlighting (although the SQL code panel autocomplete is relatively complete).

Similarly, SQL IDEs have better exploration of your database. DataStation can't tell you about which tables or schemas exist yet (although I want it to in the future).

DataStation competes more directly with Python scripts than with SQL IDEs and code IDEs (although there is of course overlap).


It does look a bit like parts of Tableau's desktop product.


I haven't used Tableau but I have had some people show up in Discord to ask about using DataStation as an alternative. So maybe it is similar, but I don't know.


Overall, this looks great. My only concern is the project file being a SQLite db. I'd really like to have something to (usefully) put in version control.


I did the original version in a JSON-backed file but I don't really want to go back to that.

It is not unreasonable to store the sqlite database in a git repo: https://stackoverflow.com/a/5435079/1507139.

I'm not yet sure what the right long-term solution is.
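One common way to make a SQLite file reviewable in version control is to diff a plain-text SQL dump of it instead of the binary. Here is a minimal sketch using Python's standard `sqlite3` module; `dump_sqlite` is a hypothetical helper, and the `panels` table is an invented schema for the demo, not DataStation's real project format.

```python
import os
import sqlite3
import tempfile


def dump_sqlite(path):
    """Return the contents of a SQLite database as SQL text."""
    conn = sqlite3.connect(path)
    try:
        # iterdump() yields the SQL statements needed to recreate the database.
        return "\n".join(conn.iterdump()) + "\n"
    finally:
        conn.close()


# Demo with a throwaway database; the schema below is invented for illustration.
tmp_dir = tempfile.mkdtemp()
db_path = os.path.join(tmp_dir, "project.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE panels (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO panels (name) VALUES ('Load CSV')")
conn.commit()
conn.close()

dump_text = dump_sqlite(db_path)
```

The resulting text can be committed alongside the database or used only for diffs, so changes to a project show up line by line.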


hi ya,

Interesting idea. I like the ability to pull multiple datasets together. The one thing I am curious about is visualisation ... What graphing abilities does this app have?


The visualization is not advanced. It supports basic bar charts, line charts, pie charts and tables. I'd like to make this better over time. If you need more advanced visualization you can export any panel as CSV or JSON and bring it into whatever better visualization tool you have.

The biggest reason to use DataStation right now is that it makes it easy to query data and script the results.


Would be nice to have geographical charting abilities or maybe integrate some python charting libraries so the output of the library would just be the chart maybe? Just an idea.


Any reason for not having a web client?


You can run it as a web server! It's just not as commonly done right now since I haven't put much time into integration with cloud providers (stuff like CloudFormation templates I mean) and I don't yet have a public Docker image that is up to date.

https://datastation.multiprocess.io/docs/0.11.0/DataStation_...



