Most times a julialang.org blog post is posted to HN, I wonder whether the choice of content is well chosen to spread awareness of and interest in Julia.
I write this as a huge Julia fan; I use Julia daily, and it is both my favorite language and the language I know best. So I already think Julia is great. But reading many Julia blogs, especially those from julialang.org, would make me think Julia is only useful for very narrow scientific applications if I didn't already know better.
I like Julia because it's extremely fast and extremely expressive - and I don't just mean "expressive" as in "can be written like a dynamic/scripting language," though it can. I mostly mean that the combination of its type system and multiple dispatch allows some really elegant abstractions.
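A minimal sketch of the kind of abstraction I mean, with hypothetical `Shape` types of my own (nothing here comes from a library):

```julia
abstract type Shape end

struct Circle <: Shape
    r::Float64
end

struct Square <: Shape
    s::Float64
end

# One generic function, one method per concrete type:
area(c::Circle) = pi * c.r^2
area(sq::Square) = sq.s^2

# Dispatch considers every argument, not just the first,
# so symmetric operations don't need a privileged receiver:
combined_area(a::Shape, b::Shape) = area(a) + area(b)
```

The compiler specializes each call site on the concrete argument types, so the abstraction costs nothing at runtime.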
> Most times a julialang.org blog post is posted to HN, I wonder whether the choice of content is well chosen to spread awareness of and interest in Julia.
That is the creators' area of expertise, but it puts the onus on the rest of us Julia fans to promote Julia as a general-purpose language and talk about how we use it. I am not a scientist, but I find Julia very useful for all sorts of practical stuff. E.g. people seldom talk about how great Julia is for writing shell scripts.

...compared to Python, under some circumstances, disregarding startup time.
Don't oversell it. Don't confuse what Julia aspires to be with what it is, or you'll just turn people off when they feel they've been misled. There are extremely fast ahead-of-time-compiled languages that would have finished their computation while Julia is still JIT-ing its kernel.
Julia startup time is 0.5 seconds on a very weak laptop. That's comparable to the JVM, which is nevertheless fairly popular. You may not want to write command line tools in Julia, but there's not much else that's a serious problem since process startup time is rarely that big of a performance issue.
Disregarding JIT time, Julia is as fast as compiled languages – we benchmark against C and Fortran and don't grade on a curve. This is not an aspiration, it's a well-established performance characteristic in both benchmarks [1] and real-world applications [2].
Regarding JIT time, it's important to understand that this is a fixed cost – for any computation that takes long enough to matter, the time taken by JIT will be a negligible fraction of the time. The reason to exclude JIT time when benchmarking is because most benchmarks don't take much time – tens of milliseconds, compared to which JIT is significant. But we don't actually care about the speed of things that only take tens of milliseconds – we're using benchmarks to extrapolate to things that take much longer, which is why you exclude JIT.
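You can watch the fixed cost disappear in the REPL; exact timings vary by machine, but the shape is always the same:

```julia
f(xs) = sum(abs2, xs)    # sum of squares

xs = rand(10^6)
@time f(xs)   # first call: includes JIT-compiling f for Vector{Float64}
@time f(xs)   # second call: runs the already-compiled code
```

The second `@time` is what the benchmarks are measuring, because it is the cost every subsequent call pays.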
When I last used it, the time to import even two modules was many seconds. I know that's JIT time, but that breaks the flow of developing with Julia like it's a dynamic language. And the JVM is popular, yes, but nobody would call it "extremely fast", particularly for interactive use, because of its startup time.
The time to parse the data I needed was many minutes, slower than Python. Restarting a Jupyter kernel would break my flow every time.
The benchmarks are all for numerical computing, where the hard work is typically offloaded to a BLAS library anyway, and don't involve strings or dictionaries. Let's see a JSON parser in there and not just a Mandelbrot set.
> The benchmarks are all for numerical computing, where the hard work is typically offloaded to a BLAS library anyway, and don't involve strings or dictionaries.
I built an in-memory relational query compiler in Julia which keeps up with Postgres on the Join Order Benchmark.
Last I checked there were still some limitations around stack-allocation of types containing pointers, which is occasionally painful, but other than that it gave me all the tools I could possibly want.
The startup time is annoying, but I typically use Julia from Juno and I restart at most a couple of times per week. Ctrl-shift-enter recompiles the current module, which is usually all I want.
I also stopped using it after 0.3 and recently tried it again to speed up some NLP code in Python 3, and it wasn't faster than Python 3. Back then I didn't like the poor support for pyjulia, the slow string processing, people using Unicode symbols as variables, the module import system, the lack of libraries for more obscure NLP algorithms, and the startup time. I know these aren't major issues for the core users of Julia, but these were my concerns.
The way I wrote the current code in Python was abusing sets and dicts a lot to take advantage of those fast data structs. Rewriting in Julia was fun because it was different and because multiple dispatch is fun.
However, it was roughly the same speed as Python. I ended up sprinkling some Cython on top of the Python and got a 10x speedup. It didn't take much time to add types and pass pointers instead of strings to functions, and I am not at all familiar with C++/Cython.
I think even if they speed up strings/dicts by a lot, there seem to be lots of breaking changes between releases, so I wouldn't try it for something big.
I think if Julia is to succeed in the near future in the same way Python is successful for data science, it needs to be more usable for general tasks. Things like web servers, fast JSON parsing, maybe static binaries or easy parallelism. Basically, some more selling points. So far, for me personally Python is faster and easier to read for most of the things I write. At least given comparable amount of work.
A LOT has happened since then. For such a young language, one which hasn't even reached version 1.0 yet, you can still expect a lot of changes.
You should seriously try out Julia again. I think it really works well now. I used to have a lot of issues with it: stuff would break, slow startup times, etc. Today I don't really have any complaints.
OK, I have a few small ones. The limits on the ability to redefine stuff as you develop in the REPL can be a bit cumbersome.
0.3 was quite some time ago – string performance was certainly a problem back then (an acknowledged one). String performance on 0.6-alpha, released last week, is very good. It should now be at least as good as scripting languages that have traditionally focused on strings (Python, Ruby, Perl). There are a few more tricks we could do in the future that could make this even faster, but they involve a fair amount of GC trickery, so we'll have to see.
I really like the language, and I've recently used Julia in a scientific computing project, but the experience leaves much to be desired (and I'm certain it will be improved).
Startup time isn't the biggest problem, although it is nowhere near "comparable to the JVM". The JVM will start cold, load a Hello World application from a JAR, run it and shut down in 60-80ms. A Julia Hello World app would take at least a second, even when precompiled into a native executable.
The biggest problem, IMO, is packaging and deployment. I used BuildExecutable, which works OK most of the time (although startup times are still slow), but it results in so many shared libraries, with no easy way to cull them. Without BuildExecutable, I couldn't even find documentation on how to deploy a Julia app as a self-contained program (even not precompiled, but in a way that doesn't require Julia to be installed on the user's machine).
My setup is dead simple. I'm on Windows, so I use Notepad++ with syntax highlighting as my editor and do a lot via the REPL. I just use regular Julia, not JuliaPro. Installing packages hasn't been a problem for me, just Pkg.add("PackageYouNeed"). I've had very few problems with package setup in the past and none recently.
On a 4-core CPU I have the environment variable JULIA_NUM_THREADS set to 3 so that when Julia is doing threaded work there's still a core free for web browsing.
One potential gotcha: the Julia docs will mention that you can access shell commands from the REPL by starting your line with a semicolon, e.g. ;ls. For me this only works if you start Julia from something like Git Bash, not cmd.
Lots of people seem to like Juno (junolab.org) as an IDE. The team behind it has made incredible progress recently, and sometimes I use it for its GUI around the debugger, but for the most part I tend to stick to Notepad++.
So, sorry to change the subject away from pi, but how does Julia do on the code-gen end? Are we talking SIMD type optimizations? Are there hooks in the language for tuning hot-code?
I'm curious because I'm trying to find an example of a "non-scalar" programming language, a la this attempt here[1].
Julia uses LLVM to do code-gen so whatever optimizations are available in LLVM are available to Julia. If the auto-vectorizer in LLVM is not kicking in, it is possible to write explicit SIMD with the https://github.com/eschnett/SIMD.jl package.
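SIMD.jl aside, Base itself gives you hooks for tuning hot loops; a common pattern (this is my own toy example, not from any package) is `@inbounds` plus `@simd` to let the auto-vectorizer do its work:

```julia
function simd_sum(xs::Vector{Float64})
    s = 0.0
    @simd for i in eachindex(xs)
        @inbounds s += xs[i]   # bounds checks off so the loop can vectorize
    end
    return s
end
```

You can then inspect the generated code directly with `@code_llvm simd_sum(rand(100))` and look for vector instructions.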
I have always been skeptical about 'scientific' languages. Why do you need a special language when any general language + some libraries will do? This is a good example of a feature that only really makes sense in a scientific language.
Numerical programming takes more expressive power than one might imagine. In most languages, numerical primitives like integers and floating-point numbers, and numerical operators like `+` and `[]` (array indexing), are very special and are endowed with enough magic to be usable. E.g. in C, `+` is too polymorphic to be defined as a function; in Python, `+` has special `__radd__` methods to (hackily) emulate multiple dispatch; in Java `int`, `float` and `double` are entirely different kinds of values (non-objects) from normal user-definable objects. In many ways, the fundamental premise of Julia is to design a language with sufficient power and performance that numbers are not special: "primitive" types like `Int`, `Float64` are just defined in normal Julia code, and operators like `+` and `[]` are normal Julia functions like any other.
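As a toy illustration of numbers not being special, a user-defined type can plug into the very same generic `+` that `Int` uses (the `Meters` type here is mine, purely hypothetical):

```julia
import Base: +

# A trivial unit-carrying wrapper around Float64:
struct Meters
    val::Float64
end

# Adding a method to the same generic `+` that Int and Float64 use:
+(a::Meters, b::Meters) = Meters(a.val + b.val)

Meters(1.5) + Meters(2.0)   # Meters(3.5)
```

Base's own integer addition is defined essentially this way too, as ordinary methods on the `+` function rather than compiler magic.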
Julia is my language of choice for writing compilers.
Quasiquoting means that codegen is as easy as string interpolation is in other languages.
quote
    let
        $(index_inits...)
        $(results_inits...)
        if $(reduce((a,b) -> :($a && $b), true, index_checks))
            let
                $(var_inits...) # declare vars local in here so they can't shadow relation names
                $body
            end
        end
        tuple($(results...))
    end
end
Great introspection into the inference and compilation pipeline, directly from the repl.
It catches type errors early, thanks to the typed multiple dispatch.
julia> xs = []
0-element Array{Any,1}
julia> push!(xs, 42)
1-element Array{Any,1}:
42
julia> push!(xs, "foo")
2-element Array{Any,1}:
42
"foo"
julia> ys = Int64[]
0-element Array{Int64,1}
julia> push!(ys, 42)
1-element Array{Int64,1}:
42
julia> push!(ys, "foo")
ERROR: MethodError: `convert` has no method matching convert(::Type{Int64}, ::ASCIIString)
This may have arisen from a call to the constructor Int64(...),
since type constructors fall back to convert methods.
Closest candidates are:
call{T}(::Type{T}, ::Any)
convert(::Type{Int64}, ::Int8)
convert(::Type{Int64}, ::UInt8)
...
in push! at ./array.jl:432
Plus, I only have to think in one language, but I can write sloppy dynamic heap-allocating-everywhere code in the compiler and with just a bit of thinking emit zero-allocation statically-dispatched code in the output.
I would say there's something magical about the way Julia is structured that makes it more than just a scientific language. I wrote a DSL for Julia for writing clean combinational Verilog (haven't tackled sequential yet) in 3 days, using lispish macros. It took another day to hook it up to 'verilator', which transpiles the Verilog to C and lets you cross-check it by loading it back up in Julia.
This is very important if you're planning to build hardware to do specific math - because Julia is incredibly good at mathematical modeling, and you can very rapidly set up comprehensive tests for your hardware designs with confidence.
The closest alternative is Chisel, which is written in Scala. Although it's more professionally maintained and more fully featured, it's hard to call the Verilog Chisel emits "human-readable", and it's harder to set up comprehensive tests: Berkeley HardFloat, which is a very impressive project in Chisel, had several critical bugs in its implementation (which I found using Julia).
That's really nice. Now we need VHDL.jl as well - maybe different HDL backends could be bolted on? Julia is much better suited for this than Scala due to Julia's macro system.
I'm not sure if "special language" is a particularly apt description of what Julia is. SQL is a "special" language.
At any rate - a lisp-like language with llvm backend, package manager, and a (small, but not trivial) group of users writing real software - what's not to like?
I (still) recommend the talk: "Julia - to lisp or not to lisp?", that gives a quick overview of some of the design choices:
For me, I like that it has a real story trying to balance "actual integer math" (not this silly machine-constrained two's-complement hack) and "you could conceivably write a ray tracer that wasn't unusably slow" :)
As a scientist, it's nice to have an option where I can easily express my problem and get a speedy solution at the end. Julia hits this sweet spot.
I translated some of my code from one of those general purpose languages to Julia 0.2, years ago. It was both more readable, and 20x faster. Never looked back.
In R, I can import a CSV, plot a histogram of each column, and fit a linear regression of one column against the others in about 5 minutes and 15 lines of code.
In Python, I can do the same thing if I install the Pandas and Statsmodels libraries first.
Try that in Ruby, Perl, C++, C, Java, Rust, Haskell, Common Lisp, or just about any language you can think of. Good luck.
It's easy to forget that R inhabits a gray area between a full-fledged programming language and a "statistics package" like SAS or SPSS or Stata or GRETL.
Languages like Julia, R and Matlab offer a whole lot in the way of built-in syntax, data structures, functions and constants for scientific computing that you would have to get from libraries in general purpose languages. But unless those languages have the kind of libraries that Python offers, such as Numpy and Pandas, you're not going to have that kind of support.
In addition, scientific languages will have libraries mostly in that domain. You can't beat R's library when it comes to statistics. They will also have good plotting libraries.
What you also get with Julia and Fortran is code designed to be optimized for numerical computing. Python attempts to offer this kind of performance via Numpy, which is a Python wrapper around a C or Fortran BLAS library, or by JIT compiling with Numba, which is something Julia does automatically the first time you call a function.
One of the big wins is being REPL/workbook-focused, rather than REPL/workbook as an afterthought.
When I'm building software, I tend to like things to have very tight interfaces -- box everything up into components, understand how they talk to each other, etc. Define interfaces, implementations, types, etc -- In general, optimizing for long-term maintainability.
When I'm exploring data, my thought process is much more "scatter everything about on the desk" and "let me run these 5 lines of code again within the current context". "What does this thing look like", etc. Having "a table of data" as a first-class citizen in the language, with all the libraries assuming that as input and everything optimized to work around / display / visualize such is incredibly useful.
I've done mathematical programming in both C++ and MATLAB. As much as I hate MATLAB as a language, it's way faster and easier to prototype things. There's a huge library of vetted functions for scientific things I'm interested in, and having matrices built into the language is great. MATLAB has tons of warts though, so I'm looking forward to switching to Julia this summer.
One of the biggest warts is the casual elision of vectors and nx1 matrices... (and scalars and 1-arrays and 1x1-matrices). Julia wrestled with this in its early days, but I think the way it does things now is quite nice.
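Concretely, Julia keeps the things MATLAB elides as distinct types:

```julia
v = [1.0, 2.0, 3.0]     # Vector{Float64}: a true 1-D array
m = reshape(v, 3, 1)    # 3×1 Matrix{Float64}: a different type

size(v)       # (3,)
size(m)       # (3, 1)
v == m        # false: different shapes never compare equal
v == vec(m)   # true: vec recovers the 1-D view
```

Functions can then dispatch differently on `Vector` and `Matrix` instead of guessing which one an nx1 argument was meant to be.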
Python's Pandas defaults to null, and before that it used something else to represent missing data. A null value is generally a last resort when you don't know how to represent a certain type; if you have a typed language with sum types and pattern matching, you don't even need null. And yes, R has its own NULL type, so NA is separate from it.
Erlang has PIDs as a primitive.
I think domain-specific languages are smaller in rules and syntax, which makes them very, very easy to learn for experts and people in those domains.
This arbitrary-accuracy treatment of pi is cool, but I'd be curious to see an application where greater precision of pi than a float64 is actually useful...
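For what it's worth, the mechanism behind the post is that `pi` is an irrational constant that gets rounded on demand, so each numeric type receives its own best approximation:

```julia
Float32(pi)    # pi correctly rounded to Float32
Float64(pi)    # pi correctly rounded to Float64
setprecision(BigFloat, 256) do
    big(pi)    # pi correctly rounded to 256 bits
end
```

One genuine application of more-than-Float64 pi is argument reduction for trig functions on huge inputs, where libm-style implementations rely on many extra bits of pi internally.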
Genuine question: as a professional developer, why would you restrict yourself to such technology?
What we should be asking here is: why? Give me one scenario where this language is more appropriate than the plethora of other domain-specific languages. Would you bet your company on such a language when there are more mature languages already?
Are you asking "why do we use scientific languages"? If that's your question, the answer is the same reason that you don't write a webserver in assembly.
If you're asking "why Julia versus other languages" it's that, well, Julia is fighting to answer that question for itself. As far as I can tell:
- Versus R and Octave: performance, coherent syntax, and more features for writing "programs" instead of just "scripts"
- Versus Python + the Scipy stack: its scientific features are built into the language (instead of being an awkward layer on top of it)
- Versus any proprietary platform (SAS, Matlab, etc): it's open source and free-as-in-beer, and therefore not confined to legacy/enterprise applications
I'm a data scientist and I currently use R and Python. I've been wanting to give Julia a try for months, and now that the ecosystem is starting to mature (plotting and data frames are must-haves for me) it's making more sense to spend some time with the language.
edit: looked up "call R from Julia"; it looks like there's an "RJulia" library for this. Assuming there's a Python equivalent? How does this compare to, say, using RPy2 in Python (which is nice but kind of annoying)?
"Why do we use scientific languages?" is very true.
Applause to the Julia contributors for their work on this innovative language with great out-of-the-box support for modern chip architectures. However, they have fibbed to build up momentum, particularly with their performance benchmarks. The tests pit compiled Julia with OpenBLAS against out-of-the-box versions of the other languages. Also, the benchmark code in the other languages is written in a style that is among the slowest implementations for each language. No seasoned programmer in any of those languages would write code that way.
It does seem that there is co-ordination to get Julia posts the most attention: timing and upvoting.
Nonetheless, credit where it's due. It should become an awesome language, marketing hacks notwithstanding.
We use whatever BLAS is linked to in a commonly available official distribution. Julia was one of the first to take this seriously and bundle a high performance BLAS as the default - and I think more projects are following our lead and doing the same. Also, only one benchmark actually uses BLAS.
As for co-ordination on Julia posts - there is none. We submit all our blog posts to HN, and while some do reach the front page, many others do not.
It would be interesting to see what happens if each language were compiled for the benchmark server, linked to the same BLAS, and an expert implementation of the tests in each language were allowed; that would be a scientifically valuable experiment. Just to remove all doubt about the relative performance :-)
You're right, using a doubly-recursive algorithm [1] for `fib` is a terribly naive and uncharacteristic way to write it in any language, including Julia. But it's a wonderful proxy for the cost of a function call. It's also quite scientific — there's an absolute truth for the correctness of an implementation. All languages must use a doubly-recursive scheme.
It all depends on what you want to measure. The whole point of the micro-benchmark suite is to test very specific language primitives. I'd argue that the current set of benchmarks are more valuable for that than an "expert" implementation would be — that may end up simply testing the cleverness or resourcefulness of the expert.
The issue of primitive performance seems to be in the background of how algorithms are implemented, which BLAS is running, compilation to server architecture, etc. One might measure the performance of 'very specific language primitives' directly for those language primitives. Stripping out confounding factors feels fundamental.
Of course, such a benchmark might not have the same marketing hue as claiming that Julia is 553 times faster than Matlab at parsing an integer, for example.
Genuine question, why do you assume it is a restriction? I am a professional developer and not a researcher or scientist which is the original target for Julia. Yet I find Julia to be my favorite language, and I've tried a lot of them. I can't think of any language which I find as expressive as Julia apart from Haskell, but that is exceedingly cumbersome to deal with given the strict type system. Haskell is beautiful but it is overly academic and requires a lot of investment in time and brainpower to work in your favor. Julia on the other hand is quite quick to learn and gets you productive quickly.
I'll mention multiple areas where I think Julia excels. If you want to quickly write high-performance numerical software, I don't see what the alternatives are.
It is the best language I've encountered for writing shell scripts. Bash is terrible due to the difficulty of factoring the code into functions and the frequency with which you forget to quote variables properly. Many thus end up using Python or Ruby. Ruby seems very popular, but Julia really works better: it has tighter integration with the shell and handles chaining and reading from and writing to shell commands much more nicely.
It also has all sorts of useful stuff ready to use out of the box, sane function names, etc. I find it much more cumbersome to write shell scripts with Python: it's awkward to call processes, and you always have to remember which little module you have to import. Because of multiple dispatch, Julia allows much better naming of functions.
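To make the shell-scripting point concrete (the file names here are made up):

```julia
# Backtick literals build Cmd objects; nothing runs until you ask:
cmd = `ls -l`
run(cmd)    # throws on non-zero exit, unlike bash's silent default

# Pipelines compose without invoking a shell, so no quoting pitfalls:
run(pipeline(`sort data.txt`, `uniq -c`))

# Interpolated variables are passed as single arguments, spaces and all:
name = "file with spaces.txt"
run(`touch $name`)
```

That interpolation behavior alone eliminates the quoting bugs that plague bash scripts.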
And as a computer language geek I love powerful LISP macros, but I can't get used to LISP syntax. Julia is the only language I know of which gives access to powerful macros and code generation similar to LISP.
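A small example of the kind of macro I mean; `@timeit` here is my own toy, not a library macro:

```julia
# Rewrites the expression at parse time to wrap it in timing code:
macro timeit(label, ex)
    quote
        t0 = time()
        result = $(esc(ex))
        println($(esc(label)), " took ", time() - t0, " seconds")
        result
    end
end

@timeit "summing" sum(rand(10^6))
```

Because macros receive the unevaluated expression, the rewriting happens before compilation, and hygiene gensyms `t0` and `result` so they can't clash with the caller's variables.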
I'm not sure about their implementation of pi in particular. To me it seems elegant, and as someone who has needed arbitrary precision arithmetic, I can see the appeal for how they are handling pi, and appreciate this attention to detail.
As for your broader question, Julia definitely fills a gap that I have been pained by for years. I've used R since it was in beta, and it is slow, which is a pain when you are discussing numerical needs. Yes, you can program in something like C/C++, but that is painful because of its overhead and dependency complexity (although it's surprisingly become less painful over time). Python could be used too, and probably is better at this point in that regard, but it has many of the same problems as R.
Julia is open-source, fast, and well-thought out with regard to modern numerical programming problems. I can write something in Julia and it performs essentially as well as something in C, which is a huge time saver in multiple respects.
I do wish Julia were more general-purpose in its orientation, or that the solutions it offers were coming from a more general-purpose language, but at the moment that doesn't seem to be in the cards. Maybe as it grows it will find use as a more general-purpose language, which is possible; maybe as languages like Rust or Go grow they will occupy this niche as well. Rust is interesting to me in this way, but currently it has little to offer in terms of simplification over C++ for numerics, and Go is not friendly to numerics. I personally like Stanza, but it's in its infancy, and no one probably even knows what I'm talking about.
For whatever reason, my experience has been that numerical programming has been a kind of isolate in programming. Numerical computing has always seemed slightly neglected in programming languages, and languages that have targeted numerical computing have often never been able to shake the "domain specific" label. I've just sort of come to see it as part of the territory.
There's nothing wrong with Python, C, or R. Also, languages change rapidly, so who knows what will happen. At the moment, though, Julia offers the best of all three, and the only big downside is the lack of libraries, which is becoming less of an issue every day (I wouldn't say there's a lack of libraries, more that there are fewer of them). So I think it's deserving of its current attention.
I guess the question is, why would a systems programmer use C, or a web programmer use Javascript, or a network infrastructure programmer use Erlang, etc. etc. etc.?
I think one of the strengths of this approach is easily implementing numerical methods/approaches from papers and having it just work. If you work in academia or in an R&D field, this is valuable. I don't think Julia is presently positioning itself to be the core language a company's product is based on.
having it "just Work" is relative even in academia and R&D, again, there are domain specific languages being used in these environments that "just work" and have been designed to do so in the most efficient manner.
It's much faster than Python, has optional typing (good thing), and will soon support real multi-threading (support is currently experimental).
It can also import anything in Python in a breeze, if it needs to, but this is secondary imho.
But still, as you see, there is a lot of upside.