Restarting generation phase

Created September 15, 2014 20:55

Hi MPS team! Great work on MPS, I really love it.
We are now applying it in a project where we need to process (typecheck) big data sets from the command line (Mihail Muhin knows more about this), so we cannot load all of the data into an MPS model at once (the memory footprint gets more than 20MiB and then MPS just dies).
Of course it is possible to chop up the data set and start MPS anew for each chunk, but this incurs the MPS startup time for every chunk, which makes the typechecks extremely slow.
So we would like to be able to go through a generation phase and then just restart it again (without needing to restart MPS itself). Is this possible, and if so, how can it be done?

9 comments

Permanently deleted user

Created September 16, 2014 17:47

Could you please elaborate on your build/make process a bit more? Some background regarding your typecheck activity would not hurt either. I was told you use an Ant build and have your own make facet to perform the typecheck task, and that the input for the task is a generator outcome. Is this correct? Do you process all (or a few) models with a single generate+typecheck task, or is it rather per-model? Do you use MPS's build language, or is it a custom solution? Are you bound to the API, or are you ready to use internal code?

Eugen Schindler

Created November 05, 2014 13:03

Hi Artem,
Thanks a lot for your reaction! Sorry it took a while, but I was drawn into another area of the project, which left little time for this aspect.

To elaborate more:

Indeed, we are using a build facet (see below) to run the checks. We tried to do it in the generation phase, but the checks you can run there are different. We also tried the code below inside the generation phase, which leads to an abort in MPS.
Indeed, we are doing the checks from the command line. To this end, we have created a separate build.xml for a solution that imports datasets from textfiles into MPS and then runs checks on them. The logic for importing and checking is not inside the solution, but rather inside languages that are used by the solution. The importing of the datasets from text is done in the generation phase, through a number of transformation and resolve steps; the checking is then done on the last transient model, which is passed to the custom build facet below.
We process all models with a single generate task. So we just call "ant generate" from the command line.
To create the build solution, we use MPS's build language.
For this work, we are not necessarily bound to API. It's no problem if we have to use internal code.
Because of the inherent size of the data that has to be processed, it would be helpful to restart the generation phase multiple times internally without having to restart MPS. In this way, we could import a piece of the data, do checking/processing, then unload/delete the models that have already been checked and import the next piece of data and check it, and so on.
Maybe we are stuck in just one solution direction (as often happens when one asks for somebody's help while already thinking in solutions). If you have other suggestions, those would also be very helpful :-) The real problem is importing and checking sets of data that are inherently too large for MPS to hold in memory (and so lead to out-of-memory problems), which means we need to be able to repeat the cycle of delete, import, check over and over again.
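
The delete, import, check cycle described above can be sketched in plain Java, independent of MPS. This is only an illustration under assumptions: `ChunkedChecker`, `importChunk` and `check` are hypothetical names, and the "model" here is just a list of text lines rather than a real MPS model. The point it shows is that only one chunk is held at a time and only the error messages outlive it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not MPS API: process chunks one at a time inside a
// single JVM, keeping only the error messages and never the imported data.
class ChunkedChecker {

    // Stand-in for "import a text file into a model": a model is a list of lines.
    static List<String> importChunk(String text) {
        return new ArrayList<>(List.of(text.split("\n")));
    }

    // Stand-in for the type check: flags every line that contains "bad".
    static List<String> check(List<String> model) {
        List<String> errors = new ArrayList<>();
        for (String line : model) {
            if (line.contains("bad")) {
                errors.add("Error: " + line);
            }
        }
        return errors;
    }

    static List<String> processAll(List<String> chunks) {
        List<String> allErrors = new ArrayList<>();
        for (String chunk : chunks) {
            List<String> model = importChunk(chunk); // import
            allErrors.addAll(check(model));          // check
            model.clear();                           // delete before the next chunk
        }
        return allErrors;
    }

    public static void main(String[] args) {
        // prints [Error: bad type, Error: bad ref]
        System.out.println(processAll(List.of("ok\nbad type", "fine", "bad ref\nok")));
    }
}
```

Whether real MPS models can be released this cleanly between iterations is exactly the open question of this thread; the sketch only fixes the shape of the loop.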

Any hints/suggestions are really appreciated!

<buildfacet>
facet PerformTypeCheck extends <none> {
  Required: Generate, Make, TextGen

  <no optional facets>
  Targets:
  target VerifyTypes overrides <none> weight default {
    resources policy: consume (module, model, retainedModels, status) GResource
  }

  Dependencies:
  after generate
  before textGen
  before reconcile
  before make

  <no properties>
  <no queries>
  <no config>
  (progressMonitor, input)->void {
    begin work checkTypes covering ALL units of total work left, expecting 1 units;
    System.out.println("================================================================");
    list<string> errors = CheckingHelper.performCheck(input.select({~it => it.status.getOutputModel(); }).toArray);
    if (errors.isNotEmpty) {
      report error "typecheck failed";
      // failure;
    }
    finish checkTypes;
  }
}
</buildfacet>
<checkinghelper>
public static list<string> performCheck(final SModel[] models) {
  final list<string> errors = new arraylist<string>;

  System.out.println("Verifying typesystem rules:");

  final TypesystemChecker checker = new TypesystemChecker();

  ModelAccess.instance().runReadAction(new Runnable() {
    public void run() {
      for (SModel sm : models) {
        for (SNode root : SModelOperations.getRoots(sm, null)) {
          Set<IErrorReporter> errorReporters = null;
          try {
            System.out.println("- Verifying " + root.getPresentation());
            errorReporters = checker.getErrors(root, sm.getRepository());
          } catch (IllegalStateException e) {
            System.out.println("Error:");
            System.out.println(e.getMessage());
            // No reporters are available for this root; skip it to avoid a NullPointerException below.
            continue;
          }
          for (IErrorReporter reporter : errorReporters) {
            if (reporter.getMessageStatus().equals(MessageStatus.ERROR)) {
              // Ignore one known message and any issue that CheckingTestsUtil.filterIssue rejects.
              if (reporter.reportError().startsWith("a class should have")) { continue; }
              SNode node = reporter.getSNode();
              if (!((CheckingTestsUtil.filterIssue(node)))) { continue; }
              System.out.println("Error: " + reporter.reportError());
              errors.add(reporter.reportError());
            }
          }
        }
      }
    }
  });
  return errors;
}
</checkinghelper>

Permanently deleted user

Created November 05, 2014 13:12

If you already have a solution that works, and the problem is the performance of starting the entire MPS system, I would suggest trying Nailgun. It works well when you need to start the JVM many times, because it removes the time it takes to load classes and initialize the JVM: http://www.martiansoftware.com/nailgun/

Eugen Schindler

Created November 07, 2014 10:12

Thanks for your quick reply, Fabien!
Nailgun is a very good suggestion, but I am afraid that the organization does not allow Nailgun in the infrastructure in which we have to run this application.

We were also thinking about possibly optimizing MPS using something like JET (http://www.excelsiorjet.com/), but it's very questionable whether this is going to work at all.

Otherwise some way of restarting generation phases within MPS would be a great help to this project.

Permanently deleted user

Created November 18, 2014 10:56

Let's see if I understand the process right:
1) You've got models, and a generation process (GP1) gives you datasets that you serialize (with textgen at the end of GP1) into a text file
2) There's another generation process (run on a custom solution) that picks up the text files from step 1 and translates them into a model (? - is that what you retrieve with it.status.getOutputModel() in your facet?)
3) Check the model from step 2.

It looks a bit too complicated to be true ;)

Unless the process could be simplified, would you consider writing your own ant task, using <generate> as an example? I guess that if you control the make process for your models, you could slice the data in a way that doesn't ruin MPS.

Eugen Schindler

Created November 18, 2014 12:49

Hi Artem,

Thanks for your quick reply!
Textgen is not used at all.
We just have a generator that imports data from files. It is a multistep
process (including resolving) with a few steps in the generator. The last
transient model is the one that is processed in the facet.

We tried running in a separate ant task, but the problem with that is that
we cannot get the last transient model, which is the one we need to check.
That's why we introduced the build facet (which gets the last transient
model as input). We also tried to run the check within the generation
phase, but the call to checker.getErrors() yields a different result there
than in the build facet with the final transient model (maybe the
models are not completely transformed in that phase?). So it seems there
really is no way around doing this in a build facet.

So the design is as follows: ant generate --> MPS (subsystem) startup from
build --> import a text file into MPS model --> compute some dependencies
to other textfiles --> recursively import the depended-upon textfiles into
MPS models --> construct references (resolve dependencies from the first
imported model to the depended-upon imported models) --> constraint-check
--> type-check

We can now run this whole cycle for one textfile. But if we want to run
this cycle multiple times (to process multiple text-files), we always have
to quit MPS and load it again. If we could find a way to reboot the cycle
(so to wipe all the imported models and just load a new text-file into
models with all its dependencies) and rerun the check, we would lose the
startup-overhead (loading all classes) of MPS.
Right now the startup is more than 2/3 of the whole runtime, so if we can
factor that out (so it's only incurred on the first text-file that is
imported and for all the others the MPS infra classes are already preloaded
in memory), we have a major performance gain, which will be necessary to
successfully land the tool in the environment where it will be used.
But this has to be done inside MPS and cannot be done in ant-environment,
as the ant tasks always quit the whole of the MPS infra and reboot it
(unless there is a way to prevent this).
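
To make the expected gain concrete: if startup takes a fraction f of one full run, then n separate runs cost n run-times, while one startup followed by n checks costs f + n(1 - f). A back-of-the-envelope sketch, using only the "more than 2/3" estimate above (the class and method names are made up for illustration, and these are estimates, not measurements):

```java
// Hypothetical estimate of the gain from paying the startup cost only once.
class StartupAmortization {

    // Speedup of "start once, process n files" over "restart for each file",
    // where startupFraction is the share of one full run spent on startup.
    static double speedup(int n, double startupFraction) {
        double restartEachTime = n; // n full runs, in units of one run
        double startOnce = startupFraction + n * (1.0 - startupFraction);
        return restartEachTime / startOnce;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 10, 100}) {
            System.out.printf("%d files: %.2fx%n", n, speedup(n, 2.0 / 3.0));
        }
    }
}
```

With startup at exactly 2/3 of a run, ten files come out 2.5 times faster, and the gain approaches a factor of 3 as the number of files grows.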

I looked at the "Nailgun"-solution offered by Fabien Campagne, but this
brings with it security problems, as well as problems with distributing the
run. In this application, MPS has to be run as a kind of build-tool on a
parallelized build server park, so Nailgun will be difficult to
use in such a situation.

Does this clarify things a bit or are you now more confused? I'm afraid
it's difficult to explain :-) At least I hope that you can offer some
advice...

Thanks a lot in advance!

Best regards,

Eugen

Permanently deleted user

Created November 18, 2014 14:03

Let's elaborate a bit more. As I see it now, there's an inner, MPS-generator phase:
1) There's a solution with model M1, using language L1.
2) The generator for L1 loads some text files and uses them to augment the model at hand
3) Other generators take a chance to play around with the model

which is wrapped into an outer, MPS-make phase:

1) The ant generate task does 'make' for the project (which includes the aforementioned solution with M1)
2) The make subsystem detects the desired make facets and their order, and launches GenerateFacet followed by PerformTypeCheck
3) GenerateFacet runs: here the inner phase jumps in as outlined above
4) Make passes control to PerformTypeCheck

Assumptions (correct me if I'm wrong):
1) You can control chunks for generation/type checks by solutions/models (i.e. there might be M2 next to M1 or in a separate solution to do another part of the job)
2) Final transient model is huge, thus you can't feed M2, M3, etc along with M1 to a single generation phase.
3) You run this outside of MPS, from Ant. I'm confused by this statement:

"But this has to be done inside MPS and cannot be done in ant-environment..."

4) There's strong reason to use generator to load text files and to populate models during generation step

It seems that if we can inject your code (a loop over models with generation chunks) between (1) and (2) of the MPS-make phase, to repeat (2)-(4) for each chunk, that would do the trick. Does this sound like the solution you're looking for (MPS process startup costs are associated with the ant task preparations to run make in step (1))?

Eugen Schindler

Created November 18, 2014 16:49

That's quite a correct and nice description of the essence of the procedure.

"4) There's strong reason to use generator to load text files and to
populate models during generation step"

This assumption is not really necessary for us, but we don't know a useful
way to augment models with imports from textfiles, other than reduction
rules, which are part of generation.

The ideal solution would be to lose the startup overhead (as I
understand now from your last analysis, it's actually not MPS startup time
but rather ant task preparations to run make).
Right now, most of the time is spent here:
generate:
[echo] generating
----> here is where it takes some time before the actual generation starts <----
[generate] Picked up _JAVA_OPTIONS: -Dawt.useSystemAAFontSettings=on -Dswing.aatext=true -Dswing.defaultlaf=com.sun.java.swing.plaf.gtk.GTKLookAndFeel

The other way would be to factor out the startup time for the chunks by the
injection you described. Here comes the confusion with assumption number 3
(running outside of ant): I meant "ant generate" versus your own "ant
modelcheck" task which you can also make without using a facet (but this is
what we couldn't get to work).

If you know a way to eliminate the extra ant actions (e.g. by factoring the
import of the textfiles out into another part of MPS), that would be really
cool. If that doesn't work, it would also be pretty good to know how to do
the earlier mentioned injection :-)

Thanks again for so much useful info, and I hope you can help with a bit of
the details on the "how".

Permanently deleted user

Created November 18, 2014 17:55

Eugen, I'd suggest a "field experiment" to explore a possible solution. The <generate> task supports multiple <chunk> elements; these chunks are executed in the order specified, with a distinct make/generate for the set of modules referenced from each chunk. There's only a single ant task/MPS project initialization procedure in this case.

Provided there are solutions S1 and S2 which specify fractions of the job to do, alter your build script so that the <generate> task has two chunks, with S1 and S2, respectively. Run the script to check whether this change makes any difference. If it helps, we can think about generating these chunks into the build script automatically.
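
As a rough idea of what generating the chunks automatically could look like, here is a minimal Java sketch that emits one <chunk> per solution. Only the <generate> and <chunk> element names come from the discussion above. The <module file="..."/> child, the missing attributes on <generate>, the `ChunkXml` class and the file names are assumptions for illustration and should be checked against the <generate> element in your existing build.xml.

```java
import java.util.List;

// Hypothetical helper: writes a <generate> task with one <chunk> per module file.
// The <module file="..."/> child element is an assumption, not confirmed MPS syntax.
class ChunkXml {

    // Paths are assumed to contain no characters that need XML escaping.
    static String generateTask(List<String> moduleFiles) {
        StringBuilder sb = new StringBuilder("<generate>\n");
        for (String file : moduleFiles) {
            sb.append("  <chunk>\n")
              .append("    <module file=\"").append(file).append("\"/>\n")
              .append("  </chunk>\n");
        }
        return sb.append("</generate>\n").toString();
    }

    public static void main(String[] args) {
        // Placeholder file names for the two solutions S1 and S2.
        System.out.print(generateTask(List.of("S1.msd", "S2.msd")));
    }
}
```

The chunks are written in list order, which matters because they are executed in the order specified.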
