Hi everyone!
I have been working on a new library for writing JVM bytecode with Haskell in a nice, high level way and I’d love some feedback on it! The motivation here is for compilers to the JVM so they can focus on the actual code generation, meanwhile H2JVM takes care of all the messy details like StackMapTable analysis, label/offset resolution, etc.

Here is a quick example taken from the readme. It generates a simple class file with a single method static int add(int, int) which adds 2 numbers:

main :: IO ()
main = do
  -- Define the class name, method descriptor, and access flags
  let className = "Calculator"
      methodDesc = MethodDescriptor -- int (int, int)
        [PrimitiveFieldType JInt, PrimitiveFieldType JInt] 
        (TypeReturn (PrimitiveFieldType JInt))

  -- Construct the class using ClassBuilder
  result <- runPureEff $ runErrorNoCallStack @StackMapError $ runClassBuilder className java8 $ do
    addAccessFlag CPublic
    
    -- add the method, which automatically handles stack map analysis, max stack/locals, etc
    addMethodWithCode "add" [MPublic, MStatic] methodDesc $ do
      emit $ ILoad 0
      emit $ ILoad 1
      emit IAdd
      emit IReturn

  case result of
    Left err -> putStrLn $ "Error building class: " <> show err
    Right (classFile, _) -> do
      -- Serialise the class to a file
      let path = classFilePath classFile -- returns "Calculator.class"
      case classFileBytes classFile of
        Left err -> putStrLn $ "Error generating bytecode: " <> show err
        Right bytes -> LBS.writeFile path bytes

This is a less contrived example from real usage of the library in my compiler, which shows how label resolution “just works”. This implements the > operator in my language in the way you’d probably expect

IR.BinaryOp op lhs rhs -> do
    emitExpr lhs
    emitExpr rhs
    case op of
        IR.GreaterThan -> do
            trueLabel <- newLabel
            endLabel <- newLabel
            emit $ JVM.IfICmp (IfGt trueLabel) -- if_icmpgt jump to trueLabel
            -- false case
            emit JVM.IConst0 -- push 0 onto stack
            emit $ Goto endLabel -- jump to end
            -- true case
            emit $ JVM.Label trueLabel
            emit JVM.IConst1 -- push 1 onto stack
            emit $ JVM.Label endLabel -- jump to end

The generated code here looks something like this:

29: blah (pushing lhs and rhs onto stack)
32: if_icmpgt     39 -- trueLabel resolved as offset 39
35: iconst_0
36: goto          40 -- endLabel resolved as offset 40
39: iconst_1
40: blah blah (code after the binaryop, whatever that may be)

The library is still in very very very early stages (only a very small subset of the instructions and attributes are supported), but I would love some preliminary feedback on things like the design!

Happy to answer any questions anyone might have too :slight_smile:

If you’re interested, here’s the GitHub repo: GitHub - ElaraLang/h2jvm: Haskell library for writing JVM bytecode in a high level format · GitHub

Thanks in advance!

15 Likes

Nice! I like the idea.

What happens if you call addAccessFlag twice (perhaps with different flags)?

1 Like

Looks good and plainly written, I’ll bookmark it for next time I do something with the JVM

Browsed around and reviewed a little, have a few random suggestions


You can avoid the possibility of an UnmarkedLabel error using a MonadFix instance and rec or mdo

emitNewLabel = do
  label <- newLabel
  emit $ JVM.Label label
  pure label
mdo
  emit $ JVM.IfICmp (IfGt trueLabel)
  emit JVM.IConst0
  emit $ Goto endLabel
  emit JVM.IConst1
  trueLabel <- emitNewLabel
  emit JVM.IConst1
  endLabel <- emitNewLabel
  pure ()

I’d rather use rec to spell out the scope, but this case is better with mdo because the labels are used throughout


In CodeState instead of storing a list in reverse order you can use a DList Instruction

Code that uses IfCond is repetitive because it’s denormalised

data IfCond label = IfEq label | IfNe label | IfLt label | IfGe label | IfGt label | IfLe label

Instead you can factor:

  1x+1x+1x+1x+1x+1x
= (1+1+1+1+1+1)x
= 6x

data If label = If Cond label
data Cond = Eq | Ne | Lt | Ge | Gt | Le

Can also keep the current patterns with synonyms like pattern IfEq label = If Eq label and a {-# COMPLETE #-} pragma


In clauses of Pretty instances like this

pretty (InvokeSpecial c n d) =
  "invokespecial" <+> pretty c <> "." <> pretty n <> pretty d

I like to use ViewPatterns (or a synonym) so the shape of the output is less cluttered by calls to pretty, and it’s only called once if used more than once

pretty (InvokeSpecial (pretty -> c) (pretty -> n) (pretty -> d)) =
  "invokespecial" <+> c <> "." <> n <> d

I dunno how “pretty” you actually want for the pretty-printing to be, but I find that Pretty instances in general tend to overuse the explicit non-breaking horizontal (hcat, hsep, <>, <+>) and vertical (vcat, vsep) layouts without enough grouping, and should rather use the grouped combinators that allow breaks between all (cat, sep, surround softline', surround softline) or between any (fillCat, fillSep)

3 Likes

I’m always suspicious of DList in that physically it is essentially storing a list in reverse order, just less clearly: each closure is closing over one new element (actually, over another closure capturing that element) and over the closure recursively referencing the rest of the elements, which is the same structure as a cons cell (in either case, a data structure with two pointers, one pointing to the head and one to the tail), but less explicitly and with extra allocation/indirection (because each element is stored via an additional closure).

Something along the lines of reverse-list feels like a better approach—just a newtype wrapper around a regular list, to make clear that it’s in reverse order. I’ve never used that package, and I’ve always just been temped to make my own wrapper local to a project, allowing just two operations: snoc and toList.

Anyway, side issue but just wanted to share my critique of DList.

But it can also store the list in forward order. You can both cons and snoc

it’s a tree of function compositions

Sure; DList can do more, but for the common case of just needing to accumulate a list in the opposite order of the desired order of the result, it seems like overkill (both in terms of intelligibility and allocation).

I feel like people get the impression that using function composition is somehow leading to some aspect having zero overhead, but it isn’t. And in particular, the final toList is O(n), just like reversing a list. Either way you do it, it will have at least double the execution cost of just accumulating a list in the “wrong” order.

Generally with Chuch-encoding-like constructions, I think it’s cool that you can do it but that doesn’t mean you should; it’s slick but maybe not the best implementation.

2 Likes

Thank you for the feedback!

All addAccessFlag does is prepends it to a list in the underlying state monad, so nothing super interesting. You’re able to use it with multiple different flags in the way you’d expect. I’ll consider adding an overload to add multiple at once.

There’s no check for duplicates, but since the flags compile down to a simple bitmask, nothing weird would happen if you added the same flag twice.

Wow, this is all amazing, thank you so much for such thorough feedback!

The MonadFix suggestion is fantastic, I’m not too familiar with that part of the language but this does seem like a perfect usage!

I’ll try and find a more elegant way of expressing the reversed list, as I see there’s no single consensus about the best option. At the moment I’m probably leaning towards a simple newtype over the list, or perhaps a Sequence?

Great catch for the IfCond :slight_smile:

The Pretty instance isn’t really following any specifically defined rules, though the main principle I’ve been going for is “roughly looking like the output of javap for the purpose of debugging the data you’ve produced”. It’s probably not the most robust implementation in the world, but as I say, mostly for debugging purposes. I’ll definitely do a sweep tidying it up though!

Once again, thanks so much for the detailed feedback and genuinely useful suggestions, I really appreciate it! :slight_smile:

2 Likes

@evincar wondering your thoughts (and anyone else’s of course!) on making the MonadFix approach the only way to emit labels. I think this would make the UnmarkedLabel error mostly impossible (apart from weird cases like sharing labels across CodeBuilders, which could be prevented with a state variable like ST does). However it also forces a certain style upon users, which might be a bit off putting and make the library less approachable. What are your thoughts?

Thanks for the info. I was mostly wondering about a case such as marking something as both public and private—ideally that wouldn’t be possible, but maybe it’s last-one-wins?

Alternatively, if it were an argument to runClassBuilder then it would be syntactically impossible to supply multiple, but then it would also be a required parameter (no defaulting).

Not necessarily a big problem, it’s just the sort of thing that catches my eye with any builder pattern—how to handle restrictions based on other things done in the same “build”.

Indeed—it’s a tree of function compositions, but that’s the implementation, not the exposed interface. So my thoughts are around whether there’s a better way to implement that interface.

Ah, I understand what you mean now! Yes, the last-one ends up winning. I think I’ll move to a cleaner high level interface (probably something more like setPublic etc) that would make this behaviour more obvious and harder to accidentally trip up on.

That’s fair, for appending single instructions it’s probably worse, although I haven’t benchmarked. I think it should be better when you can append chunks of several instructions, so that it works like a builder.

The more important thing imo is to avoid the possibility of a “sign error” by putting <> in a natural order, and reverse-list also helps with that.

1 Like

I wouldn’t mind that, but I am the one who suggested it. If anyone has a compelling reason, you could offer the less-safe API as an escape hatch. But some form of recursive do notation has been around for a long time. mdo was added in GHC 6.0 (May 2003), deprecated in 6.12.1 (December 2009) when rec was added, and then resurrected in 7.6.1 (September 2012). So I think it’s safe to say they’ve settled in.

1 Like

The intention is that you can still manually write the desugared instructions if you need to (I can’t think of many situations where you’d need/want to right now though), so there will always be that escape hatch available. Though that’s a very all-or-nothing approach. I think you’re right though, and it probably should be the only option. Thanks again for the feedback.

Is there a minecraft mod example using this? :laughing:

funny you should mention it, this library is mainly being used for my own language which compiles to the JVM… the original motivation for which was “I wish I could write minecraft plugins in a more functional language” :smiley:

1 Like

Great. Finally we can write a full Haskell OpenComputers OS!

1 Like