Restarting generation phase
Hi MPS team! Great work on MPS, I really love it.
We are now applying it in a project where we need to process (typecheck) big data sets from the command line (Mihail Muhin knows more about this), so we cannot load all of the data into an MPS model at once (the memory footprint gets more than 20MiB and then MPS just dies).
Of course it is possible to chop up the data set and start MPS anew for each chunk, but this incurs the MPS startup time for every chunk, which makes the typechecks extremely slow.
So we would like to be able to go through a generation phase and then just restart it again (without needing to restart MPS itself). Is this possible, and if so, how can it be done?
Thanks a lot for your reaction! Sorry it took a while; I was drawn into another area of the project, which left little time for this aspect.
To elaborate more:
Any hints/suggestions are really appreciated!
<buildfacet>
facet PerformTypeCheck extends <none> {
  Required: Generate, Make, TextGen
  <no optional facets>
  Targets:
  target VerifyTypes overrides <none> weight default {
    resources policy: consume (module, model, retainedModels, status) GResource
  }
  Dependencies:
    after generate
    before textGen
    before reconcile
    before make
  <no properties>
  <no queries>
  <no config>
  (progressMonitor, input)->void {
    begin work checkTypes covering ALL units of total work left, expecting 1 units;
    System.out.println("================================================================");
    list<string> errors = CheckingHelper.performCheck(input.select({~it => it.status.getOutputModel(); }).toArray);
    if (errors.isNotEmpty) {
      report error "typecheck failed";
      // failure;
    }
    finish checkTypes;
  }
}
</buildfacet>
<checkinghelper>
public static list<string> performCheck(final SModel[] models) {
  final list<string> errors = new arraylist<string>;
  System.out.println("Verifying typesystem rules:");
  final TypesystemChecker checker = new TypesystemChecker();
  // Models may only be read inside a read action.
  ModelAccess.instance().runReadAction(new Runnable() {
    public void run() {
      for (SModel sm : models) {
        for (SNode root : SModelOperations.getRoots(sm, null)) {
          Set<IErrorReporter> errorReporters;
          try {
            System.out.println("- Verifying " + root.getPresentation());
            errorReporters = checker.getErrors(root, sm.getRepository());
          } catch (IllegalStateException e) {
            System.out.println("Error:");
            System.out.println(e.getMessage());
            // Skip this root; otherwise the loop below would iterate over null.
            continue;
          }
          for (IErrorReporter reporter : errorReporters) {
            // Only errors count; warnings and infos are ignored.
            if (reporter.getMessageStatus().equals(MessageStatus.ERROR)) {
              // Known message that is deliberately ignored.
              if (reporter.reportError().startsWith("a class should have")) { continue; }
              SNode node = reporter.getSNode();
              if (!CheckingTestsUtil.filterIssue(node)) { continue; }
              System.out.println("Error: " + reporter.reportError());
              errors.add(reporter.reportError());
            }
          }
        }
      }
    }
  });
  return errors;
}
</checkinghelper>
Nailgun is a very good suggestion, but I am afraid that in the infrastructure in which we have to run this application, Nailgun is not allowed by the organization.
We were also thinking about possibly optimizing MPS using something like JET (http://www.excelsiorjet.com/), but it's very questionable whether this is going to work at all.
Otherwise some way of restarting generation phases within MPS would be a great help to this project.
1) You've got models, and a generation process (GP1) gives you datasets that you serialize (with textgen at the end of GP1) into a text file
2) There's another generation process (run on a custom solution) that picks up the text files from step 1 and translates them into a model (? - is that what you retrieve with it.status.getOutputModel() in your facet?)
3) Check the model from step 2.
It looks a bit too complicated to be true ;)
Unless the process could be simplified, would you consider writing your own ant task, using <generate> as an example? I guess that if you control the make process for your models, you could slice the data in a way that doesn't ruin MPS.
Thanks for your quick reply!
Textgen is not used at all.
We just have a generator that imports data from files. It is a multistep
process (including resolving) with a few steps in the generator. The last
transient model is the one that is processed in the facet.
We tried running in a separate ant task, but the problem with that is that
we cannot get the last transient model, which is the one we need to check.
That's why we introduced the build facet (which gets the last transient
model as input). We also tried to run the check within the generation
phase, but there the call to checker.getErrors() yields a different result
than in the build facet with the final transient model (maybe the models
are not completely transformed in that phase?). So it seems there
really is no way around doing this in a build facet.
So the design is as follows: ant generate --> MPS (subsystem) startup from
build --> import a text file into MPS model --> compute some dependencies
to other textfiles --> recursively import the depended-upon textfiles into
MPS models --> construct references (resolve dependencies from the first
imported model to the depended-upon imported models) --> constraint-check
--> type-check
We can now run this whole cycle for one textfile. But if we want to run
this cycle multiple times (to process multiple text-files), we always have
to quit MPS and load it again. If we could find a way to reboot the cycle
(so to wipe all the imported models and just load a new text-file into
models with all its dependencies) and rerun the check, we would lose the
startup-overhead (loading all classes) of MPS.
Right now the startup is more than 2/3 of the whole runtime, so if we can
factor that out (so it's only incurred on the first text-file that is
imported and for all the others the MPS infra classes are already preloaded
in memory), we have a major performance gain, which will be necessary to
successfully land the tool in the environment where it will be used.
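To put rough numbers on the paragraph above, here is a back-of-the-envelope sketch in Java of how the total runtime changes when startup is paid once instead of once per text file. The class `StartupAmortization`, its method names, and the timings used in the example afterwards are illustrative assumptions, not measurements from this project.

```java
/** Hypothetical helper: compares total runtime with and without a restart per file. */
final class StartupAmortization {
    /** MPS is restarted for every file, so each file pays startup plus work. */
    static double restartPerFile(int files, double startupSec, double workSec) {
        return files * (startupSec + workSec);
    }

    /** Startup is paid once; only the per-file work repeats. */
    static double startOnce(int files, double startupSec, double workSec) {
        return startupSec + files * workSec;
    }

    /** Fraction of the restart-per-file runtime that is saved by starting once. */
    static double savedFraction(int files, double startupSec, double workSec) {
        double before = restartPerFile(files, startupSec, workSec);
        return (before - startOnce(files, startupSec, workSec)) / before;
    }
}
```

For example, assuming 20 s of startup and 10 s of actual work per file (so startup is 2/3 of a single run), 100 files take 3000 s with a restart per file and 1020 s with a single startup, a saving of 66%.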
But this has to be done inside MPS and cannot be done in ant-environment,
as the ant tasks always quit the whole of the MPS infra and reboot it
(unless there is a way to prevent this).
I looked at the "Nailgun" solution offered by Fabien Campagne, but this
brings with it security problems, as well as problems with distributing the
run. In this application, MPS has to be run as a kind of build tool on a
parallelized build server park, so Nailgun will be difficult to
use in such a situation.
Does this clarify things a bit or are you now more confused? I'm afraid
it's difficult to explain :-) At least I hope that you can offer some
advice...
Thanks a lot in advance!
Best regards,
Eugen
1) There's a solution with model M1, using language L1.
2) The generator for L1 loads some text files and uses them to augment the model at hand
3) Other generators take a chance to play around with the model
which is wrapped into an outer MPS-make phase:
1) The ant generate task does 'make' for the project (which includes the aforementioned (1) solution with M1)
2) The make subsystem detects the desired make facets and their order, and launches GenerateFacet followed by PerformTypeCheck
3) GenerateFacet runs: here the inner phase jumps in as outlined above
4) Make passes control to PerformTypeCheck
Assumptions (correct me if I'm wrong):
1) You can control chunks for generation/type checks by solutions/models (i.e. there might be M2 next to M1 or in a separate solution to do another part of the job)
2) Final transient model is huge, thus you can't feed M2, M3, etc along with M1 to a single generation phase.
3) You run this outside of MPS, from Ant. I'm confused with
4) There's strong reason to use generator to load text files and to populate models during generation step
It seems that if we can inject your code (a loop over models with generation chunks) between (1) and (2) of the MPS-make phase, to repeat (2)-(4) for each chunk, that would do the trick. Does this sound like the solution you're looking for (the MPS process startup costs are associated with the ant task's preparations to run make in step (1))?
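The idea of repeating steps (2)-(4) per chunk inside one running process could be sketched roughly as follows. This is plain Java with hypothetical names (`MakeSession`, `ChunkLoop`, `runAll`); none of them are MPS APIs, and how to obtain such a reusable session inside MPS is exactly the open question here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an already-started MPS environment; not a real MPS API. */
interface MakeSession {
    /** Generates one chunk, type-checks its last transient model, returns error messages. */
    List<String> generateAndCheck(String chunk);

    /** Drops the models created for the chunk so memory does not grow across chunks. */
    void discard(String chunk);
}

final class ChunkLoop {
    /** Runs every chunk through the same session; the caller pays the startup cost once. */
    static Map<String, List<String>> runAll(MakeSession session, List<String> chunks) {
        Map<String, List<String>> errorsByChunk = new LinkedHashMap<>();
        for (String chunk : chunks) {
            try {
                errorsByChunk.put(chunk, new ArrayList<>(session.generateAndCheck(chunk)));
            } finally {
                // Always release the chunk's models, even if generation or checking threw.
                session.discard(chunk);
            }
        }
        return errorsByChunk;
    }
}
```

The point of the design is that the session is created outside the loop, so only generation, checking and cleanup repeat per chunk, and at most one chunk's models are held in memory at a time.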
4) There's strong reason to use generator to load text files and to
This assumption is not really necessary for us, but we don't know a useful
way to augment models with imports from textfiles, other than reduction
rules, which are part of generation.
What would be the super solution is to lose the startup-overhead (as I
understand now from your last analysis, it's actually not MPS startup time
but rather ant task preparations to run make).
Right now, the most time is located here:
generate:
[echo] generating
----> here is where it takes some time before the actual generation starts <----
[generate] Picked up _JAVA_OPTIONS: -Dawt.useSystemAAFontSettings=on
The other way would be to factor out the startup time for the chunks by the
injection you described. Here comes the confusion with assumption number 3
(running outside of ant): I meant "ant generate" versus your own "ant
modelcheck" task which you can also make without using a facet (but this is
what we couldn't get to work).
If you know a way to eliminate the extra ant actions (e.g. by factoring the
import of the textfiles out into another part of MPS), that would be really
cool. If that doesn't work, it would also be pretty good to know how to do
the earlier mentioned injection :-)
Thanks again for so much useful info, and I hope you can help with a bit of
the details on the "how".
Provided there are solutions S1 and S2 which each specify a fraction of the job to do, alter your build script so that the <generate> task has two chunks, with S1 and S2, respectively. Run the script to check whether this change makes any difference. If it helps, we can think about generating these chunks into the build script automatically.