Indeed, it's becoming quite promising (I'm already using Crystal for a few pet scripts that were too slow even for Rubinius). Where it's most lacking at the moment is the GC: it uses the off-the-shelf, stop-the-world Boehm GC, which is OK but not exactly great for memory-heavy tasks.
IMHO, starting a new language with Boehm GC is a very bad design choice. It means that one just allocates memory and doesn't care about managing it. Even worse, it prevents linking any library that manages its own memory (e.g. a second thread running Lua + C), because Boehm GC scans the whole process memory, not only the memory the language has to manage.
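To make the scanning point concrete, here is a toy model of conservative marking, the technique Boehm GC is built on: any word in a scanned region whose value happens to fall inside the GC heap is treated as a live pointer. This is a simplified sketch, not Boehm's actual code (the real collector is far more involved); `toy_heap`, `slot_for` and `scan_conservatively` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_SLOTS 8
#define SLOT_WORDS 4

/* Hypothetical toy heap: fixed-size slots with one mark bit each. */
typedef struct {
    uintptr_t words[HEAP_SLOTS * SLOT_WORDS];
    bool marked[HEAP_SLOTS];
} toy_heap;

/* Index of the slot a word points into, or -1 if the value does not
   fall inside the heap. */
static int slot_for(const toy_heap *h, uintptr_t value) {
    uintptr_t lo = (uintptr_t)&h->words[0];
    uintptr_t hi = lo + sizeof h->words;
    if (value < lo || value >= hi)
        return -1;
    return (int)((value - lo) / (SLOT_WORDS * sizeof(uintptr_t)));
}

/* Conservative marking: any word in the region that looks like a heap
   pointer keeps its slot alive, and that slot's own words are scanned
   in turn. The scanner cannot tell a real pointer from an integer that
   merely has the same bit pattern. */
static void scan_conservatively(toy_heap *h, const uintptr_t *region, size_t n) {
    assert(region != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int slot = slot_for(h, region[i]);
        if (slot >= 0 && !h->marked[slot]) {
            h->marked[slot] = true;
            scan_conservatively(h, &h->words[(size_t)slot * SLOT_WORDS], SLOT_WORDS);
        }
    }
}
```

Because liveness is decided purely by bit patterns in whatever regions get scanned, the choice of those regions matters a lot once another runtime with its own allocator lives in the same process.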
You basically need three types of memory: first, for the objects of your language; second, for foreign lightweight objects, where you only know a pointer; and third, for foreign heavyweight objects that get their memory from your GC.
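One way to read these three categories is as a tag the collector checks on every reference: may it trace into the memory, and does it own (and eventually free) that memory? The sketch below is a hypothetical illustration of that split, with made-up names (`mem_kind`, `gc_ref`, `gc_release`) and malloc/free standing in for a real GC heap.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tag for the three kinds of memory described above. */
typedef enum {
    MEM_MANAGED,       /* language object: traced and freed by the GC */
    MEM_FOREIGN_LIGHT, /* opaque foreign pointer: neither traced nor freed */
    MEM_FOREIGN_HEAVY  /* foreign object whose storage comes from the GC */
} mem_kind;

typedef struct {
    mem_kind kind;
    void *ptr;
    void (*finalize)(void *); /* required for MEM_FOREIGN_HEAVY */
} gc_ref;

/* Only the language's own objects have a layout the GC understands. */
static int gc_should_trace(const gc_ref *r) {
    return r->kind == MEM_MANAGED;
}

/* The GC owns the storage of everything except lightweight foreign objects. */
static int gc_owns_storage(const gc_ref *r) {
    return r->kind != MEM_FOREIGN_LIGHT;
}

/* Called once the collector has found the reference unreachable;
   malloc/free stand in for the GC's own heap. */
static void gc_release(gc_ref *r) {
    switch (r->kind) {
    case MEM_MANAGED:
        free(r->ptr);
        break;
    case MEM_FOREIGN_HEAVY:
        assert(r->finalize != NULL);
        r->finalize(r->ptr); /* let the library tear down its state first */
        free(r->ptr);
        break;
    case MEM_FOREIGN_LIGHT:
        break; /* the foreign library frees it, not the GC */
    }
    r->ptr = NULL;
}
```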
> it uses stop-world
A fully concurrent GC is impossible if variables are mutable. No matter how cleverly your GC postpones the problem, there comes a point where it has to stop all threads to handle the edge cases. This creates the GC dilemma: currently only cores and memory are getting cheaper, while single-core performance has stayed the same for nearly 15 years.
In the beginning, Crystal didn't free memory. We needed a GC, and Boehm was a super easy way to get one. It worked out of the box with very little effort.
Eventually we can write our own GC or use a different one; it's only a matter of time. But right now we think there are more important things: finishing the language rules, stabilizing things, fixing bugs, completing the standard library and writing documentation.
Nothing is set in stone in a language; things can always evolve and improve.
That's a common misconception: of course the GC barriers needed for mutation can be made atomic. But that comes at a significant synchronization cost, and a fully shared world like in C doesn't seem like a good design decision in a high-level language like Crystal anyway.
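For illustration, here is what an atomic write barrier can look like: a Dijkstra-style insertion barrier over tri-colour marking, written with C11 atomics. It is one common barrier design picked for this sketch, not taken from Crystal or Boehm GC, and `obj`, `marking_active` and `shaded_count` are hypothetical. Every pointer store pays at least an acquire load, plus a compare-and-swap while marking is active, which is where the synchronization cost comes from.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Tri-colour marking states. */
enum { WHITE = 0, GREY = 1, BLACK = 2 };

typedef struct obj {
    _Atomic int color;
    _Atomic(struct obj *) field; /* a single pointer field, for brevity */
} obj;

/* Set by the (hypothetical) collector while a concurrent mark phase runs. */
static atomic_bool marking_active = false;

/* Stand-in for pushing onto a concurrent grey work list: we only count. */
static atomic_int shaded_count = 0;

/* Insertion barrier: before publishing a pointer to `target`, shade it
   grey so the concurrent marker cannot miss it. */
static void write_barrier_store(obj *holder, obj *target) {
    assert(holder != NULL);
    if (target != NULL &&
        atomic_load_explicit(&marking_active, memory_order_acquire)) {
        int expected = WHITE;
        /* CAS: exactly one thread wins the white -> grey transition. */
        if (atomic_compare_exchange_strong(&target->color, &expected, GREY))
            atomic_fetch_add(&shaded_count, 1);
    }
    atomic_store_explicit(&holder->field, target, memory_order_release);
}
```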
Which is why I'd be more in favour of refcounting, letting the user make the choice: a simple stop-the-world mark & sweep GC is fine for tasks that can afford higher memory usage and pauses (in exchange for good throughput), while RC is good for low latency and low memory usage (at the cost of lower throughput and more cache pollution).
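A minimal sketch of the refcounting side of that trade-off, using C11 atomics (`rc_obj`, `rc_new`, `rc_retain` and `rc_release` are hypothetical names). Memory is returned the moment the last reference is dropped, which is where the low latency and low memory usage come from; the price is an atomic write to the object on every retain and release, which costs throughput and moves cache lines between cores. Plain refcounting like this also never frees reference cycles.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct rc_obj {
    atomic_size_t refcount;
    struct rc_obj *child; /* owned reference, released with the parent */
} rc_obj;

/* Creates an object with one reference; takes over the caller's
   reference to `child`. */
static rc_obj *rc_new(rc_obj *child) {
    rc_obj *o = malloc(sizeof *o);
    if (o == NULL)
        abort();
    atomic_init(&o->refcount, 1);
    o->child = child;
    return o;
}

static rc_obj *rc_retain(rc_obj *o) {
    atomic_fetch_add_explicit(&o->refcount, 1, memory_order_relaxed);
    return o;
}

/* Drops one reference; frees the object (and releases its child) when
   the count reaches zero. Returns 1 if the object was freed. */
static int rc_release(rc_obj *o) {
    size_t old = atomic_fetch_sub_explicit(&o->refcount, 1, memory_order_acq_rel);
    assert(old > 0);
    if (old != 1)
        return 0;
    if (o->child != NULL)
        rc_release(o->child);
    free(o);
    return 1;
}
```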
As for multicore threading, Crystal has next to none. All modern GC design decisions depend on exactly how multicore threading will eventually be implemented. That's why fixing the GC is not a priority, but RC could be useful right away.