Experience migrating legacy Cobol/PL1 to Java

Question

ORIGINAL Q: I'm wondering if anyone has had experience of migrating a large Cobol/PL1 codebase to Java?

How automated was the process and how maintainable was the output?

How did the move from transactional to OO work out?

Any lessons learned along the way or resources/white papers that may be of benefit would be appreciated.

EDIT 7/7: The NACA approach is certainly interesting: the ability to keep making your BAU (business-as-usual) changes to the COBOL code right up to the point of releasing the Java version has merit for any organization.

The case for procedural Java in the same layout as the COBOL, to give the coders a sense of comfort while they familiarize themselves with the Java language, is a valid one for a large organisation with a large code base. As @Didier points out, the annual saving of 3 million euros gives scope for generous padding on any BAU changes going forward, so that the code can be refactored on an ongoing basis. As he puts it, if you care about your people you find a way to keep them happy while gradually challenging them.

The problem as I see it with the suggestion from @duffymo to

Best to try and really understand the problem at its roots and re-express it as an object-oriented system

is that if you have any BAU changes ongoing, then during the LONG project lifetime of coding your new OO system you end up coding and testing every change twice, once in each code base. Avoiding that is a major benefit of the NACA approach. I've had some experience of migrating client-server applications to a web implementation, and this was one of the major issues we encountered: requirements shifted constantly because of BAU changes, which made project management and scheduling a real challenge.

Thanks to @hhafez, whose experience is nicely put as "similar but slightly different" and who has had a reasonably satisfactory experience of an automatic code migration from Ada to Java.

Thanks @Didier for contributing, I'm still studying your approach and if I have any Q's I'll drop you a line.

Answer 1:

Update 6/25: A friend just ran across the NACA Cobol to Java converter. Looks quite interesting, it was used to translate 4m lines of Cobol with 100% accuracy. Here's the NACA open source project page. The other converters I've seen were proprietary, and the materials were conspicuously lacking success stories and detailed example code. NACA is worth a long look.

Update 7/4: @Ira Baxter reports that the Java output looks very Cobol-esque, which it absolutely does. To me, this is the natural result of automatic translation. I doubt we'll ever find a much better translator. This perhaps argues for a gradual re-write approach.

Update 2/7/11: @spgennard points out that there are some Cobol compilers on the JVM, for example Veryant's isCobol Evolve. These could be used to help gradually transition the code base, though I think the OP was more interested in automated source conversion.

I'd be very cautious about this. (I used to work for a company that automatically corrected Cobol and PL/I programs for Y2K, and did the front end compiler that converted many dialects of Cobol into our intermediate analytic form, and also a code generator.) My sense is that you'd wind up with a Java code base that still would be inelegant and unsatisfying to work with. You may wind up with performance problems, dependencies on vendor-supplied libraries, generated code that's buggy, and so on. You'll certainly incur a huge testing bill.

Starting from scratch with a new object-oriented design can be the right approach, but you also have to carefully consider the decades of stored knowledge represented by the code base. Often there are many subtleties that your new code may miss. On the other hand, if you're having a hard time finding staff to maintain the legacy system, you may not have a choice.

One gradual approach would be to first upgrade to an object-oriented Cobol (the standard long drafted as "Cobol 97" and eventually published as COBOL 2002). This adds object-orientation, so you can rewrite and refactor subsystems individually when you add new functionality. Or you could replace individual subsystems with freshly-written Java.

Sometimes you'll be able to replace components with off-the-shelf software: we helped one very large insurance company that still had 2m lines of code in a legacy language it created in the 1950s. We converted half of it to Y2K compliant legacy language, and they replaced the other half with a modern payroll system they bought from an outside vendor.

Answer 2:

It was clearly our intent to obtain initial Java code that was very close to the original COBOL in order to ease the migration of people: they find the good old application they wrote in COBOL in exactly the same structure.

One of our most important goals was to keep the initial developers on board, and this is the way we found to achieve it. Once an application has been migrated to Java, those people can start making it more OO as they further develop and refactor it.

If you don't care about migrating people, you can use another strategy.

This 1-to-1 conversion also made 100% automated conversion simpler and faster. The good consequence is that we reached our recurring savings (3 million euros / year) much faster: we estimate 12-18 months. Those early savings can clearly be reinvested in OO refactoring.

feel free to contact me: didier.durand@publicitas.com or mediaandtech@gmail.com

didier

Answer 3:

My experience is similar but slightly different. We have a large, old code base in Ada (0.5 MLOC, developed over 15+ years) that was recently converted to Java. The work was outsourced to a company that provided a combination of automated and manual conversion. They also did testing to verify that the Ada and Java systems behaved the same.
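I don't know how the vendor actually ran that verification, but the general idea of checking that two systems behave the same can be sketched as below. Everything here is hypothetical: `EquivalenceCheck`, the string-in/string-out signatures, and the report format are illustrative assumptions, not anything the vendor supplied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical harness: the names and the String-based signatures are
// assumptions made for illustration, not part of any real migration tool.
final class EquivalenceCheck {

    /**
     * Feeds every input to both implementations and returns one description
     * per input whose outputs differ. An empty list means no difference was
     * observed for the given inputs (it does not prove equivalence).
     */
    static List<String> compare(List<String> inputs,
                                Function<String, String> legacy,
                                Function<String, String> migrated) {
        List<String> mismatches = new ArrayList<>();
        for (String input : inputs) {
            String expected = legacy.apply(input);
            String actual = migrated.apply(input);
            if (!Objects.equals(expected, actual)) {
                mismatches.add(input + ": legacy=" + expected + ", migrated=" + actual);
            }
        }
        return mismatches;
    }
}
```

In practice the inputs would be recorded production transactions or batch files, and the two functions would wrap calls into the old and the new system.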

Some parts of it were written in Ada 95 (i.e. had the possibility of OOP), but most of it wasn't.

Now, yes, the code is not up to the same standard as code written in Java in the first place, but we have been using it successfully since then (18 months now) with no major issues. The major advantage we got is that we can now find more developers to maintain our code base with the skills to produce maintainable code. (Anyone can develop in Ada, but as with any other language, if you don't have experience in it you can end up with unmaintainable code.)

Answer 4:

From a risk avoidance point of view, the NACA approach absolutely makes sense. Reusing their tools might not. They used the development of the tools to get their people up to speed in Java and Linux.

The result of the NACA conversion is not going to be good enough, or even OO, and makes it difficult to hire new people. But it is testable, can be refactored, and you can plug in better translators.

[edit] Ira, you don't seem to be very risk-aware.

Sending the COBOL programmers to a Java course is not going to make them write usable object-oriented code. That takes a few years. During that time, their productivity will be very low, and you can basically throw away all the code they write the first year. In addition, you'll lose 10-20% of your programmers, who are not willing or able to make the transition. Lots of people do not like going back to beginner status, and it is going to influence the pecking order, as some programmers pick up the new language a lot faster than others.

The NACA approach allows the business to continue working, and puts no unneeded pressure on the organisation. The time schedule for the conversion is independent. Having a separate translator, in Java, written by OO experts, allows a gradual exposure to Java for the old team. Writing the test cases increases domain knowledge in the new Java team.

The real OO system is the translator, and that is the place to plug in better translators. Make it easy to do that, and you do not have to touch the generated code. If the generated code is ugly enough, that is what will happen automatically: :)

the old programmers will change the cobol input;
the new java ones will change the translator.

[running the translator once] is a bad strategy. Don't do that. And if you need to edit the generated code, maintain a mapping back to the source. That can be automated. And should be. It is a lot easier to do this kind of thing in a Smalltalk image, but you can do it with files. There are people with a lot of experience maintaining different views on the same artifact: chip designers come to mind.
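A minimal sketch of such a mapping, assuming the translator can report which COBOL line each block of generated Java came from. The class and method names are hypothetical; a real tool would likely track files and columns as well.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: maps blocks of generated Java lines back to the
// COBOL line that produced them. Single-file only, for illustration.
final class GeneratedSourceMap {
    // Key: first generated Java line of a block; value: originating COBOL line.
    private final TreeMap<Integer, Integer> javaStartToCobolLine = new TreeMap<>();

    /** Called by the translator when it starts emitting code for a COBOL statement. */
    void addMapping(int firstJavaLine, int cobolLine) {
        javaStartToCobolLine.put(firstJavaLine, cobolLine);
    }

    /**
     * Returns the COBOL line for a generated Java line, or -1 if the Java
     * line comes before every recorded block (e.g. the file header).
     */
    int cobolLineFor(int javaLine) {
        Map.Entry<Integer, Integer> entry = javaStartToCobolLine.floorEntry(javaLine);
        return entry == null ? -1 : entry.getValue();
    }
}
```

With something like this, an edit made to line 12 of the generated file can be traced to the COBOL statement it belongs to, so the change can be replayed or flagged the next time the translator runs.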

The translator should be instrumented, so you can create the daily counts of e.g.

cobol input components;
OO java input components;
cobol style output components;
OO style output components.
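As a rough illustration of that instrumentation, the sketch below keeps one counter per category from the list above and prints them as a report. The names `TranslatorStats` and `Kind` are made up for this example, and how a translator decides which category a component falls into is left open.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical instrumentation; the four categories mirror the list above.
final class TranslatorStats {
    enum Kind { COBOL_INPUT, OO_JAVA_INPUT, COBOL_STYLE_OUTPUT, OO_STYLE_OUTPUT }

    private final Map<Kind, Integer> counts = new EnumMap<>(Kind.class);

    /** Increments the counter for one translated component. */
    void count(Kind kind) {
        counts.merge(kind, 1, Integer::sum);
    }

    /** Current count for a category; zero if nothing was counted yet. */
    int get(Kind kind) {
        return counts.getOrDefault(kind, 0);
    }

    /** One "NAME=count" line per category, suitable for a daily log. */
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Kind kind : Kind.values()) {
            sb.append(kind).append('=').append(get(kind)).append('\n');
        }
        return sb.toString();
    }
}
```

Plotted over time, the ratio of OO-style to COBOL-style output gives a crude measure of how far the gradual migration has progressed.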

You might want to read: Peter van den Hamer & Kees Lepoeter (1996) Managing Design Data: The Five Dimensions of CAD Frameworks, Configuration Management, and Data Management, Proceedings of the IEEE, Vol. 84, No. 1, January 1996

[moving Cobol platforms] Moving from Cobol on the mainframe to Cobol on Windows/Linux could have been a viable strategy for the NACA team, but the question was about moving to Java. If the long-term goal is to have a modern OO system, and to get there with as little operational risk as possible, the NACA approach is sound. It is only step one, though. A lot of refactoring is going to follow.

Answer 5:

I'm surprised nobody has mentioned Semantic Designs' DMS Software Reengineering Toolkit. I looked into COBOL conversion in the past. I was working on "automatic programming" back then. Before writing a translator, I looked up a bunch of previous efforts and products in that area. Semantic Designs' GLR-based tool was the best of the bunch.

That was many years ago. At the time, the tool translated COBOL to a modern language, refactored it, pretty printed it, etc. Here's the link to it now.

http://www.semdesigns.com/Products/DMS/DMSToolkit.html

They're still around. They've expanded the tool. It's more general. It might help people doing automated conversions or customizing a conversion tool. It's designed to be expandable and tweakable similarly to what Stephan pointed out. Thanks to Cyrus also for mentioning SoftwareMining. I'll look into them too if I run into a COBOL migration in the future.

Answer 6:

You are speaking of reengineering. The good thing is that a lot of people worldwide are trying to do this. The bad thing is that reengineering legacy applications comes with a lot of problems, ranging from missing sources to complex algorithms from the fields of compiler construction and graph theory.

The idea of automatic translation is very popular, until you try to convert something. Usually the result is awful and unmaintainable, even more so than the original complicated application. From my point of view, every tool that offers automatic translation from a legacy language to a modern one is very marketing oriented: it says exactly what people want to hear, "translate your application from ... to Java once, and forget!" Then you buy a contract, and then you realize that you depend very tightly on the tool (because you can't make any change to your application without it!).

The alternative approach is "understanding": a tool that gives you a very detailed understanding of your legacy application. You can use it for maintenance, for documentation, or for reinventing the application on a new platform.

I know a little about the history of Modernization Workbench before Micro Focus bought it last year and moved development to another country. It had a great number of complex analysis tools and a number of supported target languages (including Java). But no client really used automatic code generation, so development of the generation part was frozen. As far as I know, PL/I support was mostly implemented, but it was never finished. Still, you can try it; maybe this is what you are looking for.

Answer 7:

I just looked at the NACA page and docs. From their documentation:

"The generated java uses a Cobol-like syntax. It's as close as possible from original Cobol syntax, within of course the limits of the Java language. Generated code doesn't look like classical native java and is not object oriented from the application point of view. This is a by design strong choice, to enable a smooth migration of Cobol developers to the Java environment. The goal is to keep business knowledge in the hand of people who wrote the original Cobol programs."

I didn't see an example, but the quote gives a strong flavor of the result. It's COBOL coded in Java.

You can always build a "Translator" from one language to another by simply coding an interpreter in the target language. That's IMHO an absolutely terrible way to translate a language, as you end up with the worst of both worlds: you don't get the value of the new language, and you still have to have knowledge of the old one to keep the result alive. (No wonder this thing is called a "Transcoder"; I'd never heard this term before).

The argument for this stunt is to dump the costs of the mainframe. Where's the evidence that the costs of working on the converted program don't swamp the savings? I suspect the truth is that the operations people lowered their cost by dumping the mainframe, and they couldn't care less that the maintenance tasks got more expensive. While that may be rational for the operations guys, it's a stupid choice for the organization as a whole.

Heaven help people that are a victim of this tool.

EDIT May 2010: I found an example of NACA's output; one of their testcases. This is absolutely magnificent JOBOL. It's a good thing they are keeping their COBOL programmers and don't want to hire any Java programmers. As you read this, be sure you remember this is Java code.

/*
 * NacaRTTests - Naca Tests for NacaRT support.
 *
 * Copyright (c) 2005, 2006, 2007, 2008 Publicitas SA.
 * Licensed under GPL (GPL-LICENSE.txt) license.
 */

import idea.onlinePrgEnv.OnlineProgram;
import nacaLib.varEx.*;

public class TestLong extends OnlineProgram
{
  DataSection WorkingStorage = declare.workingStorageSection();

  Var W3 = declare.level(1).occurs(10).var();
  Var V9Comp010 = declare.level(5).pic9(10).var();
  Var V9Comp014V4 = declare.level(5).pic9(14, 4).var();
  Var VX10 = declare.level(5).picX(10).var();

  public void procedureDivision()
  {
    setAssertActive(true);

    move("9876543210", VX10);
    assertIfDifferent("9876543210", VX10);

    move(VX10, V9Comp010);
    long l = V9Comp010.getLong();
    assertIfFalse(l == 9876543210L);

    multiply(1000, V9Comp010).to(V9Comp014V4);
    assertIfFalse(9876543210000L == V9Comp014V4.getLong());

    String cs = V9Comp010.toString();
    cs = V9Comp014V4.toString();
    assertIfDifferent("9876543210000.0000", V9Comp014V4);

    inc(V9Comp010);
    assertIfFalse(9876543211L == V9Comp010.getLong());

    CESM.returnTrans();
  }
}

Kids: This is only done by professionals. Do not attempt this at home.
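For contrast, here is a rough sketch of the same arithmetic in plain Java. It assumes that `pic9(14, 4)` means 14 integer digits and 4 decimal places, which is only inferred from the expected string "9876543210000.0000" in the test case; the class and method names are invented for this comparison and are not NACA output.

```java
import java.math.BigDecimal;

// Hypothetical plain-Java rendering of the arithmetic in the test case above.
// The mapping of pic9(14, 4) to a BigDecimal with scale 4 is an assumption.
final class PlainJavaLong {

    /** Multiplies by 1000 and keeps four decimal places. */
    static BigDecimal timesThousand(long value) {
        return BigDecimal.valueOf(value)
                .multiply(BigDecimal.valueOf(1000))
                .setScale(4); // widening the scale never needs rounding
    }

    public static void main(String[] args) {
        long value = Long.parseLong("9876543210");       // move + getLong
        BigDecimal product = timesThousand(value);       // multiply ... to
        System.out.println(product.toPlainString());     // 9876543210000.0000
        System.out.println(value + 1);                   // inc: 9876543211
    }
}
```

No runtime library, no `Var` declarations, and nothing a Java programmer would need COBOL knowledge to read.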

Source: https://stackoverflow.com/questions/1029974/experience-migrating-legacy-cobol-pl1-to-java

Tags

java

migration

cobol

code-migration