Data in Motion

Oberhuber, Martin to E4

Boris wrote:

> I have written a blog post about the bloat topic:

http://borisoneclipse.blogspot.com/2008/10/avoiding-bloat.html

Great posting!

It inspired me to think about another potential reason for bloat: duplication of graphics, images and NLS Strings.

We do have ISharedImages, but more often than not a plugin is forced to duplicate some image or resource String because the original owner of that image or String is (rightfully) hiding that resource and doesn't want to expose it as API.

But isn't that a development-time / installation-time restriction only? Why do we need to bother the plugins at runtime with duplicates of the same Strings and images getting loaded into memory, in multiple ImageRegistries, etc.? I'd not be surprised if we could get rid of a good deal of the “Running out of PermGen Space” issue with a different approach to loading Images and Strings at runtime.

I'm thinking about a SINGLE ImageRegistry / StringRegistry (could also be a database), from which plugins can acquire an image/String by means of a Key (that could be an int number, for instance). At the time a bundle is installed, all its images / Strings are placed into the common Database. Clients get a Key in return, which they can use at runtime to retrieve their data. Identical data is collapsed.

Sounds interesting? Or am I missing something? Uninstalling a bundle would certainly not be easy, and I guess that the String and Image database would likely be an add-only thing to avoid lifecycle issues.
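A minimal sketch of the add-only registry described above, assuming values are collapsed by content equality (equals/hashCode) and keys are plain ints handed out in registration order. SharedResourceRegistry and its methods are hypothetical names, not an existing Eclipse API. Strings work as-is; image data stored as a byte[] would need a wrapper with content-based equals/hashCode, because Java arrays compare by identity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical add-only registry: each distinct value is stored once and
// identified by an int key. Not an existing Eclipse API.
final class SharedResourceRegistry<T> {
    // Value -> key it was first registered under; equal values collapse here.
    private final Map<T, Integer> keysByValue = new HashMap<>();
    // Key -> value; the list index is the key. Entries are never removed.
    private final List<T> valuesByKey = new ArrayList<>();

    // Install time: returns the existing key if an equal value is already stored.
    synchronized int register(T value) {
        Integer existing = keysByValue.get(value);
        if (existing != null) {
            return existing;
        }
        int key = valuesByKey.size();
        valuesByKey.add(value);
        keysByValue.put(value, key);
        return key;
    }

    // Runtime: clients resolve the key they received at install time.
    synchronized T lookup(int key) {
        return valuesByKey.get(key);
    }

    // Number of distinct values actually held in memory.
    synchronized int size() {
        return valuesByKey.size();
    }
}
```

Two bundles that both register "Cancel" get the same key back and share one stored copy. Because nothing is ever removed, a key handed out at install time stays valid for the lifetime of the registry, which is the add-only trade-off mentioned above for uninstalling.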

I think we should work on the Wiki pages, adding additional ideas for reducing Bloat as we find them. I'm just struggling to find time for that…

Cheers, – Martin Oberhuber, Senior Member of Technical Staff, Wind River Target Management Project Lead, DSDP PMC Member http://www.eclipse.org/dsdp/tm


Boris Bokowski to E4

> I'm thinking about a SINGLE ImageRegistry / StringRegistry
> (could also be a database), from which plugins can acquire
> an image/String by means of a Key (that could be an int
> number, for instance). At the time a bundle is installed, all
> its images / Strings are placed into the common Database.
> Clients get a Key in return, which they can use at runtime
> to retrieve their data. Identical data is collapsed.
>
> Sounds interesting? Or am I missing something? Uninstalling
> a bundle would certainly not be easy, and I guess that the
> String and Image database would likely be an add-only thing
> to avoid lifecycle issues.

We have had great success in the past with JFace's LocalResourceManager. As long as the keys are not taking up more space than what you save through collapsing the real data, it sounds like a good idea.

LocalResourceManager is an object whose lifecycle is managed by its owner (usually some plugin), so lifecycle issues are not that problematic. Each resource manager has a parent resource manager, but all resources (in the JFace case: Images, Colors, Fonts) are managed at the root resource manager, and reference-counted there. By disposing your LocalResourceManager, the reference counts for all resources you contributed get decremented automatically. This also avoids the Singleton pattern… your all-caps SINGLE made me extremely nervous.
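The reference-counting scheme described here can be modeled in a few lines of plain Java. This is a simplified sketch, not JFace code: RootManager and LocalManager are hypothetical classes, the chain of parent managers is flattened to a single root, and a generic factory stands in for creating an SWT Image, Color or Font.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical root manager: one shared instance per descriptor, reference-counted.
final class RootManager<D, R> {
    private final Function<D, R> factory;
    private final Map<D, R> resources = new HashMap<>();
    private final Map<D, Integer> refCounts = new HashMap<>();

    RootManager(Function<D, R> factory) {
        this.factory = factory;
    }

    // Creates the resource on first request, then only bumps the count.
    synchronized R acquire(D descriptor) {
        R resource = resources.computeIfAbsent(descriptor, factory);
        refCounts.merge(descriptor, 1, Integer::sum);
        return resource;
    }

    // Drops one reference; the resource is freed when the count reaches zero.
    synchronized void release(D descriptor) {
        int remaining = refCounts.merge(descriptor, -1, Integer::sum);
        if (remaining == 0) {
            refCounts.remove(descriptor);
            resources.remove(descriptor); // a real manager would also dispose the OS resource here
        }
    }

    synchronized int liveCount() {
        return resources.size();
    }
}

// Hypothetical per-plugin manager; dispose() releases everything it acquired.
final class LocalManager<D, R> {
    private final RootManager<D, R> root;
    private final List<D> acquired = new ArrayList<>();

    LocalManager(RootManager<D, R> root) {
        this.root = root;
    }

    R create(D descriptor) {
        R resource = root.acquire(descriptor);
        acquired.add(descriptor);
        return resource;
    }

    void dispose() {
        for (D descriptor : acquired) {
            root.release(descriptor);
        }
        acquired.clear();
    }
}
```

The owner of each local manager only has to call dispose(); it never needs to know which other plugins share the same resource, which is why the lifecycle problem largely disappears.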

Not sure how you would use something like this for strings, though. Given a list of potentially non-unique string objects, you can use a hash to uniquify them, but that is computationally expensive if done synchronously and eagerly. (The resources plugin uniquifies all strings used for markers, but it does so in a background job: https://bugs.eclipse.org/bugs/show_bug.cgi?id=244631#c9)

Boris

Eric Moffatt to E4

+1 for using the existing ResourceManager for SWT resources; it's mature code that has worked well for us.

As far as Strings go, Wikipedia indicates that Java already does some form of this through a technique called 'interning', so why should we try to compete? A quick Google search led to some surprising (to me) optimizations, such as using substrings of existing interned strings to represent others…
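For reference, this is what interning means in Java: String.intern() returns a canonical instance for each distinct string value, and string literals are interned automatically. The class name InternDemo is only illustrative.

```java
// Demonstrates String.intern(): equal strings map to one canonical instance.
final class InternDemo {
    static boolean sameInstanceWithoutIntern() {
        String literal = "true";            // literals are interned automatically
        String built = new String("true");  // a distinct object with equal contents
        return literal == built;
    }

    static boolean sameInstanceWithIntern() {
        String literal = "true";
        String built = new String("true").intern(); // returns the canonical instance
        return literal == built;
    }

    public static void main(String[] args) {
        System.out.println(sameInstanceWithoutIntern()); // false
        System.out.println(sameInstanceWithIntern());    // true
    }
}
```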

Onwards, Eric

Boris Bokowski to E4

Eric Moffatt wrote on 10/15/2008 09:36:56 AM:

> As far as Strings go Wikipedia indicates that java already does some
> form of this through a technique called 'interning', why should we
> try to compete?

Because there is never an easy answer … interned strings use that especially precious “perm space” memory on some VMs. See for example: http://forums.sun.com/thread.jspa?threadID=741223

Boris

Tom Schindl to E4

Hi,

This [1] blog post is also interesting.

1988 dups of String “id”
504 dups of String “true”
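A rough illustration of what figures like these measure: String objects with equal contents but distinct identities. This sketch counts redundant copies beyond the first instance over an in-memory list; a heap analyzer such as the Memory Analyzer works on a heap dump instead, and its exact definition of a duplicate may differ. DuplicateStringCounter is a hypothetical name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DuplicateStringCounter {
    // For each string value that occurs as more than one distinct object,
    // returns how many redundant copies exist beyond the first instance.
    static Map<String, Integer> redundantCopies(List<String> strings) {
        // Group by content (equals), but collect the instances by identity (==).
        Map<String, Set<String>> instancesByContent = new HashMap<>();
        for (String s : strings) {
            instancesByContent
                .computeIfAbsent(s, k -> Collections.newSetFromMap(new IdentityHashMap<String, Boolean>()))
                .add(s);
        }
        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : instancesByContent.entrySet()) {
            int instances = e.getValue().size();
            if (instances > 1) {
                result.put(e.getKey(), instances - 1);
            }
        }
        return result;
    }
}
```

Two references to the same interned literal count as one instance, so they do not show up as duplicates; only separately allocated copies do.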

Tom

[1] http://kohlerm.blogspot.com/2008/05/analyzing-memory-consumption-of-eclipse.html

Mark_Melvin@amis.com to E4

I think something similar already exists in the platform, but it is internal. I'm not sure how stable/useful it is, but try Ctrl+Shift+T for “StringPool”.

Mark.

Kevin McGuire to E4

This is a good discussion. Stepping back, I'm seeing three broad topics:

1: Strings (well discussed already)

2: Images. Sharing an image makes it API. Thus there is always a “do we want to keep this around forever” discussion. Also, we'd like to remain free to change the contents of the image, but there's concern about images being re-used in a different/unknown context and whether the change could be bad for the consumer. That said, I think the tendency has been to open up more and more images, the idea being that they're likely to stick around anyway, so we might as well share them. It doesn't address the content change issue, though.

3: Code. Reducing code bloat has a positive impact on the size of the SDK (as do the above), but, more importantly to me, also on the brain size required to program Eclipse, and on the maintenance cost of both the SDK and downstream plugins. I've always believed that code breeds more code. I was wondering if folks on this thread had specific ideas on simplifying our code weight.

That said, I suppose strings and images are an easier place to start.

Regards, Kevin

Markus Kohler to eclipse-incuba.

Hi all,

http://kohlerm.blogspot.com/

that's me :)

Actually I have a lot of experience with avoiding duplicated Strings and so far it almost always turned out to be a good thing (for performance and scalability) to avoid those duplicates.

The implementation of String.intern on modern JVMs is not that bad anymore, though probably not as good as the optimizations we (SAP) built into our own JVM.

Interned Strings should be long-lived anyway, so it is not a big problem that they live in perm space.

It's pretty easy to build something that can work as a replacement for String.intern, but it's not that easy to do it without a big memory overhead.

The problem is that we would need a specialized implementation of a ConcurrentWeakHashSet for Strings. The biggest problem is that WeakReferences themselves need quite a bit of memory.
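A minimal sketch of the kind of String.intern replacement described here, assuming a single lock is acceptable: the JDK offers no concurrent weak hash set, so this wraps java.util.WeakHashMap in synchronized methods. WeakStringPool is a hypothetical name. Each entry carries a WeakHashMap entry (itself a weak reference) plus a second WeakReference for the value, which is exactly the per-string overhead that makes this approach costly.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Weak canonicalizing pool: canonical strings can still be garbage collected
// once nothing outside the pool references them.
final class WeakStringPool {
    // The key is held weakly by WeakHashMap. The value must be weak too,
    // otherwise it would keep its own key reachable and nothing would be collected.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    synchronized String intern(String s) {
        WeakReference<String> ref = pool.get(s);
        String canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            // First occurrence (or previous canonical already collected): s becomes canonical.
            pool.put(s, new WeakReference<>(s));
            canonical = s;
        }
        return canonical;
    }
}
```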

If you need help with this topic just let me know.

I have a new command ready for the Memory Analyzer that will make finding those duplicates much easier.

Regards, Markus

Oberhuber, Martin to E4

Thanks Kevin.

I fully agree that reducing the code bloat is likely the most interesting area. Looking at Winstone [1], “servlet functionality without the bloat that full J2EE compliance introduces”, I got the idea that reducing code bloat is likely related to getting rid of less-often-needed functionality.

Which is interesting in the context of a component platform like Eclipse. Where does the bloat start? Is Equinox bloated? Likely not, though there exist commercial embedded OSGi implementations like which I think are smaller. Is the core Platform bloated? The Eclipse SDK? The Modeling Package? Ganymede? Galileo?

Given that a component platform is made to mix and match components, it's getting harder and harder to avoid bloat / duplication the more separate the components are from each other. There needs to be constant communication and interaction between the core framework and all adopters in order to reduce bloat. One way of reducing bloat in the large application is pushing functionality down into the core framework (which might, incidentally, be perceived as making the framework bloated).

On the other hand, reducing bloat in the Framework is possible by factoring out functionality into optional modules (which are not always necessarily loaded).

But I'm getting too philosophical. To me, the key for avoiding code bloat is providing small but powerful abstractions and programming models in the core framework. And the constant reaching out to the community to actually use the new, modern, flexible frameworks instead of the older style ones… thinking of modern flexible Eclipse core.expressions here, along with Commands and ui.menus, versus the older action framework for instance.

Without doubt, e4 with all the e3 compatibility layers will be bloated. What are we going to do to make the e4 core slick, such that people WANT to use the new mechanisms?

Strings and Images

You probably misread me. In my model, images and Strings are not shared at development / install time and are thus not API. But when a bundle gets installed, all those resources are thrown into a common database such that they can be shared at runtime, and they don't need to remain in memory all the time.

I'm assuming that of all the Strings in the NLS bundles of a running Eclipse instance today, typically not more than 20% are ever needed at runtime (the rest being error messages that are displayed very rarely). Yet their NLS classes are loaded into memory, so the Strings are also loaded, without ever being referenced. Using a database could allow keeping these Strings and images on disk until they are actually needed.

I'm not exactly sure what magic technology could be used at install time in order to turn the NLS references we have at compile-time today into database-references at runtime. But it's an interesting thought. And, as a matter of fact there exist classfile rewriters such as RetroWeaver [2] and runtime code weavers such as AspectJ [3], so perhaps it's somehow possible?

At least it should be possible if we came up with a different mechanism for referencing externalized resources…
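One way to keep rarely used Strings on disk until they are needed is to look them up by key and load the backing file on first access. This is a minimal sketch under that assumption: LazyMessages is hypothetical, the Supplier<Reader> stands in for opening a bundle's .properties file, and the !key! fallback for missing keys is an arbitrary choice. It sidesteps, rather than solves, the harder problem of rewriting today's compile-time NLS field references.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.function.Supplier;

// Hypothetical lazily loaded message bundle: nothing is read into memory
// until the first message is requested.
final class LazyMessages {
    private final Supplier<Reader> source; // e.g. opens a messages.properties file
    private volatile Properties messages;  // null until first use

    LazyMessages(Supplier<Reader> source) {
        this.source = source;
    }

    String get(String key) {
        Properties loaded = messages;
        if (loaded == null) {
            synchronized (this) { // double-checked locking on a volatile field
                loaded = messages;
                if (loaded == null) {
                    loaded = load();
                    messages = loaded;
                }
            }
        }
        return loaded.getProperty(key, "!" + key + "!");
    }

    boolean isLoaded() {
        return messages != null;
    }

    private Properties load() {
        Properties p = new Properties();
        try (Reader r = source.get()) {
            p.load(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }
}
```

An error-message bundle wrapped this way costs only a small object until the first error is actually reported.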

[1] http://winstone.sourceforge.net/#whatIs
[2] http://retroweaver.sourceforge.net/
[3] http://www.eclipse.org/aspectj/

Cheers, – Martin

Kevin McGuire to E4

Hi Martin,

Some great points.

The kind of bloat I'd like to trim is the overly complex frameworks. This is probably the hardest to cut down, though, because of API considerations. I suspect that the most egregious cases have grown out of a simple initial model, with new requirements added over time, so that they've grown lopsided and out of proportion. Add a kitchen here, a sunroom there, a new garage… and now you have to go through the basement to have breakfast. Our big advantage today is that presumably we understand how Eclipse is being used (vs. 10 years ago, when we'd hope for cleverness and luck). So I'd love to see some simplifying frameworks that still capture broad usage. Getting rid of things like singletons is a good step.

But this is all speaking very generally. The hope is that there are some common simplifying patterns we can apply, but I fear that in reality each case will need to be examined individually.

I don't mind the platform being big if it's because it includes common pieces that save downstream plugins/products from having to duplicate them. That's its purpose, provided of course that the likelihood of reuse is high enough to warrant everyone getting it. Optional packages/plugins are one option, but I think the packagings have gotten sufficiently large and complex that it's hard to know what you can exclude, as anyone who has tried to build an RCP app which uses IResources can attest.

Wrt the string and image discussion, you're right, I misunderstood :)

Regards, Kevin

John Arthorne to E4

I agree code bloat is a big problem, and it gets worse every year (Eclipse SDK size is 37MB/55MB/87MB/155MB in 1.0/2.0/3.0/3.4 respectively). More importantly for this stage in e4, it's not something that's easily optimized later, unlike strings and images. Here are some thoughts on ways to reduce code bloat:

- Be more cautious about introducing API. We have lots of bloat from releasing multiple attempts at a given API, which we then need to support forever and can never remove. The common pattern is that we introduce API in release N. Clients only start heavily adopting the API after release N, and this feedback often results in API changes being needed in release N+1 (rinse, repeat). In more recent releases we have become more aggressive at pulling back API that wasn't ready or didn't have sufficient client feedback, to avoid having to indefinitely support API mistakes and limitations. I'd like to see this happen in e4 too. Go wild early on in the development cycle and add what you want, but as the release approaches, only “graduate” API that is proven. I would even advocate some kind of review by project leads for any significant new API, where the API designer has to demonstrate that the API is platform quality, is thoroughly documented, and has multiple clients using it before it gets approved for release as real API.

- Do regular code weeding. As code is developed, dead code, images, and other resources accumulate that were used for a while and then no longer used. This often happens when classes are copied and the new user only uses a subset of the copied functionality. We should use coverage analysis and other tools to root out and remove unused code/images/strings on a regular basis.

- Cap and trade. This might be a bit over the top, but I have wondered about having a quota on plug-in or component sizes such that they can only grow by so much in a given release. Extra growth can be compensated by making reductions elsewhere. I don't expect this idea to be too popular, but what I'm looking for here is some incentive for developers to avoid bloat. Often there is very little incentive for a developer to avoid bloat, and in a crunch it's too easy to copy wads of code from other plug-ins to get the job done. Or, a large third-party library will be added because using some small portion of it will save a bit of work. Code hygiene work such as the weeding mentioned above is also among the first things to get dropped when people become busy. It seems that without a concrete motivation to avoid bloat, it will inevitably occur.

John

James Blackburn to E4

On Wed, Oct 15, 2008 at 3:08 PM, Mark_Melvin@amis.com wrote:

> I think something similar already exists in the platform but is internal.
> I'm not sure how stable/useful it is but Ctrl+Shift+T for “StringPool”.

Your ctrl-shift-T is a good illustration of 'code bloat' ;)

For me that brings up:

org.eclipse.core.internal.jobs.StringPool
org.eclipse.core.internal.preferences.StringPool
org.eclipse.core.internal.utils.StringPool

All of which are identical.

Eclipse also has an org.eclipse.core.internal.utils.StringPoolJob, which calls 'shareStrings(StringPool)' on participating IStringPoolParticipants. Given that StringPool.add() uses a HashMap to reimplement String.intern(), I wonder what the performance difference is between StringPool and String.intern()…

James

Eike Stepper to E4

Hi e4 list,

I hope it's ok to jump in here…

From past experience building tools on top of the Platform, I know of another important source of bloat (if it has not been mentioned before).

A good example is the CVS support which implements very cool functionality. Unfortunately, IIRC, it has two major disadvantages for programmatic users:

[1] It's mostly internal, i.e. it has no API for important core functionality like checkout, create tag, …
[2] The separation of reusable core functionality from UI stuff is not very nice.

For me that always led to duplicating a lot of code and introducing unnecessary UI dependencies. I'd really wish that in e4 more attention would be paid to providing useful functionality through convenient APIs that are UI independent as much as the functionality allows for.

Cheers /Eike

Gunnar Wagenknecht to eclipse-incuba.

John Arthorne wrote:

> […] Go wild early on in the development cycle and add what you
> want, but as the release approaches, only “graduate” API that is
> proven. I would even advocate some kind of review by project leads
> for any significant new API, where the API designer has to
> demonstrate that the API is platform quality, is thoroughly
> documented, and has multiple clients using it before it gets approved
> for release as real API.

I'm afraid that this can turn into a boomerang. Having too few and too restrictive APIs introduces a different kind of bloat. For example, just look at CVS and JDT. There are quite a few plug-ins out there which simply duplicated code from those components to mimic functionality offered by them. Unfortunately, not all of them keep up with the optimizations that happen continuously in subsequent releases of CVS and JDT.

Making a decision here is certainly not easy.

-Gunnar

Kevin McGuire to E4

Wow, this thread has reminded me of how incredibly complex this problem is. Two completely different and valid perspectives:

Perspective A: the guys who maintain a framework

Since APIs are forever, grow them carefully, since to do otherwise introduces code we can never get rid of, which all applications must pay for forever.

Perspective B: the guys who consume the framework

The more you expose as API for me, the more I can reuse. This avoids code duplication, resulting in smaller applications.

The problem, of course, is that it's difficult to have people reuse something if the contracts aren't clear, and these often only become clear through usage, which can take several releases.

This suggests a model whereby APIs must evolve. We've already talked about avoiding handing out interfaces for others to implement, lesson learned there (ugh). I think a good thing we've done, though, is mark APIs as provisional for a period. However, often the APIs are only provisional within a release. Maybe we need to think of ways of expanding this kind of model over a longer period. For example, with CVS, it's probably *now* OK to go and open up more API, because the reality is, the stuff's not going to change a lot: for one, we understand the problem well now, and for another, nobody wants to mess with code that works. On the other hand, by now the copies have been made, so who'd go back and refactor to use the new API? (Assuming we can actually either remember or detect the copies.)

Kevin

John Arthorne to E4

James Blackburn wrote on 10/16/2008 05:42:40 AM:

> Eclipse then has a org.eclipse.core.internal.utils.StringPoolJob which
> calls 'shareStrings(StringPool)' on participating
> IStringPoolParticipants. Given that StringPool.add() uses a HashMap
> to reimplement String.intern() I wonder what the performance
> difference is between StringPool and String.intern()…

As Markus mentions, String.intern() performance was known to degrade as the interned pool got very large. This may be improved in newer VMs. The other problem is that interned strings were traditionally not garbage collected. There is some debate in the VM community over whether garbage collection of interned strings is legal according to the language spec. The Sun VM now garbage collects interned strings, but this isn't true for other VMs. This lack of garbage collection makes String#intern() inappropriate for any strings that may not be around forever (such as resource names, marker attributes, etc.). The StringPool concept works a bit differently from interning: it is used in a background task that periodically walks over all strings and uniquifies them in the pool. After the pass is completed, the pool object is discarded. This avoids the extra memory overhead you would get with weak references.
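A sketch of the background sharing pass described above, using the names from James's message (StringPool.add and shareStrings); it is modeled on that description, not copied from the Eclipse source. StringPoolParticipant, ExampleMarker, StringSharingPass and getSavedStringCount are illustrative names, and ExampleMarker is not the resources plugin's marker class. Because the pool is a local variable of a single pass, it becomes garbage as soon as the pass ends, so no weak references are needed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Short-lived pool used for one sharing pass.
final class StringPool {
    private final Map<String, String> canonical = new HashMap<>();
    private int savedStringCount;

    // Returns the canonical instance equal to s, registering s if it is new.
    String add(String s) {
        if (s == null) {
            return null;
        }
        String existing = canonical.putIfAbsent(s, s);
        if (existing == null) {
            return s;
        }
        if (existing != s) {
            savedStringCount++; // s can be collected once nothing else references it
        }
        return existing;
    }

    int getSavedStringCount() {
        return savedStringCount;
    }
}

// Hypothetical participant: an object that can swap its strings for pooled ones.
interface StringPoolParticipant {
    void shareStrings(StringPool pool);
}

// Hypothetical model object holding frequently repeated strings.
final class ExampleMarker implements StringPoolParticipant {
    String type;
    String severity;

    ExampleMarker(String type, String severity) {
        this.type = type;
        this.severity = severity;
    }

    @Override
    public void shareStrings(StringPool pool) {
        type = pool.add(type);
        severity = pool.add(severity);
    }
}

final class StringSharingPass {
    // One pass over all participants; the pool is discarded when this returns.
    static int run(List<? extends StringPoolParticipant> participants) {
        StringPool pool = new StringPool();
        for (StringPoolParticipant p : participants) {
            p.shareStrings(pool);
        }
        return pool.getSavedStringCount();
    }
}
```

In Eclipse the pass is driven by a background job (the StringPoolJob mentioned above) rather than called synchronously as shown here, which keeps the hashing cost off the UI thread.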

As for the multiple copies, the class is so trivial it didn't deserve promoting to API. Some better technique may appear down the road, or String#intern may become a better option, at which point we can get rid of them.

John

Oberhuber, Martin to E4

Wow,

this is growing into a really interesting discussion indeed.

We've identified several very different kinds of bloat, and I think that the best we can do for now is try to list them and think about ways to deal with them. I've tried to capture what we've discussed so far on the Wiki [1]; feel free to edit! For some kinds of bloat, there is clearly a design decision to make between forces pulling in different directions (like promoting to API vs. keeping internal; or push-into-framework vs. pull-into-optional-extension).

One interesting thought here is that if we make it possible to mix and match an Eclipse-based product at a finer granularity than today (e.g. take the Faceted Project Framework from WTP, but not all the rest of WTP), it might also help to reduce some unnecessary duplication. P2 might help make the Installable Units more fine-grained.

I tend to agree with Kevin that likely the most important place for reducing bloat is in unnecessary duplications of API. Coming up with a single recommended way of doing things will make it easier to code against Eclipse, and free our minds from remembering duplicate ways of doing things. We may need to keep backward compatibility layers around for a while, but if “the recommended way” of doing things is attractive, I'd hope that people will jump on that way soon.

One might call this a “psychological problem”; I'd rather call it an “understanding, adoption, correctness and maintenance problem”, as opposed to the performance issues.

The other (performance) aspect of things is also interesting, but can probably be addressed later in the game.

[1] http://wiki.eclipse.org/E4/Pervasive_Themes#Reducing_Bloat

Cheers, – Martin

Krzysztof Daniel to eclipse-incuba.

As was said before, there is a huge conflict of interests between developers & adopters.

It is a common guideline, almost a requirement, to create new API only where there is at least one consumer. And this is a big problem. A consumer does not have to be an expert in a particular area. His requirement may be just part of a bigger piece of functionality, or reflect a not-necessarily-adequate point of view. Moreover, he is probably trying to solve his own problem and does not care about the quality of the Eclipse solution (because committers do). So, committers analyze, code, test, analyze again, discuss, create some more code, and…

…API is finished when there is no more time (this is a lesson learned from the API workshop at the last EclipseCon).

What happens next? New adopters arrive. Adopters of stable releases, which are believed to be well designed and stable (and they are indeed in most cases). The real, big feedback appears, and API evolves, but due to strict rules it is necessary to maintain binary & contract compatibility.

I believe this is the problem: the true feedback & adoption occurs after the API is frozen.

Yes, I agree with some previous posts: we certainly need an API evolution approach that spans more than one release cycle, and more feedback about provisional API.

I think it would be good to allow provisional API in Eclipse releases and make it stable if the changes during the next cycle are small enough. Of course this solution has certain disadvantages: some code will be unstable even though it is public. At this point we could encourage (or force) adopters to give us feedback. It is for their own good: the more feedback they give, the better the chances that some functionality will graduate to API.

We could also think about automatic refactoring scripts or some refactoring tools that would support upgrading to the next release, or that just point to critical places in the code and indicate what should be done.

– Christopher Daniel Technical Support Engineer Eclipse Support Center IBM Software Group

Oberhuber, Martin to E4

Hi Krzysztof,

I'm afraid that I don't understand what you want to say.

The concept of provisional API in Eclipse exists today. We have provisional API in releases: for Debugging, for instance, some provisional API to support better customization of views was committed in Eclipse 3.2 and is still provisional as of today. See slide 4 of the EclipseCon 2008 DD presentation [1] for reference. There are even clients of the provisional API, but it'll be promoted to official finalized API only once everybody agrees that it's the right way to go.

The point to be clear about is that provisional API will always live in “internal” packages [2]. Once the API is promoted to public, existing clients of the provisional API need to be refactored to the new (non-internal) namespace. This need for refactoring might be one of the reasons why some clients are reluctant to adopt provisional API. Perhaps we'll need to better educate clients that this kind of simple rename refactoring doesn't really hurt that much. The only thing it truly requires is that clients which adopt provisional API are developed along with upcoming Eclipse releases: when a release makes the API public, the client needs to be updated at the same time as Eclipse in order to support it.

Note that as per the Eclipse Guidelines, all packages need to be exported [3], internal or not, so everybody can adopt them. This seems to be exactly what you are requesting?

I agree that we need more feedback on API while it is still provisional; do you have any ideas for better soliciting such feedback?

[1] EclipseCon 2008 Device Debugging (DD) Update

   http://www.eclipsecon.org/2008/index.php?page=sub/&id=45
   http://www.eclipsecon.org/2008/sub/attachments/Device_Debugging_Project_Update.pdf

[2] http://wiki.eclipse.org/Provisional_API_Guidelines

[3] http://wiki.eclipse.org/Export-Package

Cheers, – Martin

Krzysztof Daniel to E4

Hi Martin,

The sources you have mentioned also say: [2] After the API freeze, there is really no such thing as “provisional API”. Either it is complete and committed platform API, or it is internal code. Moreover it reads: [2] Technically, a provisional API can change arbitrarily or be removed at any time without notice. I believe that this document indicates that there is strong “API”-“Internal” division after the release (and maybe this is only my interpretation).

In [4] I have found an interesting table:

Platform API	yes	yes	yes	yes	yes	public
Provisional	yes	yes	yes	yes	not quite	public

The table is bigger, of course, but only those two rows are important for us. Note the public package in the provisional API. It contradicts [2] unless provisional API becomes API after every release.

I am happy to see that someone is already using provisional API independently of the release cycle, and even happier that “The TPTP project uses Provisional APIs for all new API introductions: the new API is released as provisional in release X and then hardened into platform APIs in release X+1.” [4]

Maybe it would be good to ask the TPTP team how this approach works in a real environment?

What I'd expect from Eclipse is to adopt a slightly modified TPTP strategy: harden provisional API when it has not changed for one development cycle. Provisional API should be in public packages and have clear javadoc that would:

* warn the user that the API *may* change.
* encourage the user to register to a particular mailing list/watch some wiki pages.

On the other hand, we could think about support from API tools that would throw warnings/errors when a client accesses provisional API, and about a completely new tool that would ask the user to register on a mailing list, display messages from developers, etc.

[4] http://wiki.eclipse.org/Eclipse_Quality

Example: Committer A creates:

    package org.eclipse.component;

    /**
     * This is provisional API. It is believed to be stable, but still may change.
     * For your own good, subscribe to the list component-dev-provisional
     * to be notified about changes.
     *
     * @provisional
     * @since 3.5
     */
    public interface IInterface {

        public void method1();

        public void method2();
    }

Now some client implements that interface. It is warned (by API tools) and prompted to register to the newsgroup. During the development cycle we may ask clients if they got what they expected and if we can do anything better.

Eclipse 3.5 is released. A lot of people use that interface, and it turns out that a third method is necessary. So the committer adds method3(), posts to the dedicated mailing list, and all customers are informed about the compatibility-breaking change. The steps necessary to adopt the new method are also described (and in the future maybe some more advanced refactoring scripts).

Eclipse 3.6 is released with that change, then 3.7 without changes, so the @provisional tag should be deleted and… no refactoring is needed at that point.

I hope that my vision is easier to understand now.

– Christopher Daniel

Oberhuber, Martin to E4

Hello Krzysztof,

> [2] After the API freeze, there is really no such thing as “provisional API”. Either it is complete and committed platform API, or it is internal code.

Hm… I know this passage and agree that it is not in line with my current thinking. Perhaps Boris, John or McQ could explain? For me, it seems OK to have provisional API live in “internal” packages even across multiple releases, for instance when some provisional API was proposed but just didn't get ready in time. That can't be a reason never to work on promoting it in the future, can it? The passage cited does not seem logical, and I'm wondering if it could be rewritten for clarification.

> [2] Technically, a provisional API can change arbitrarily or be removed at any time without notice.

Yes, I agree that this doesn't sound friendly for provisional API adopters at all, and does not help encourage early adoption. I think that we should consider more helpful policies, such as

* Provisional API can be removed at any milestone build, provided that it has been marked @deprecated in the previous milestone build.

Discussing such policies for provisional API (making a recommendation for all Eclipse projects to adopt them) would be a good topic for the Architecture Council. The goal of the discussion should be increasing early adoption and feedback for provisional API. I'd just like to get some initial feedback on this idea before I file a bug on the Architecture Council component (well in fact anybody can file such a bug).

> Note the public package in the provisional API. It is contradictory to the [2]

I assume that in the cited table, “public” means that the packages are exported and publicly visible (which is recommended for EVERY package, including those that have an “internal” segment in their package name). In this light, this isn't a contradiction.

> new API is released as provisional in release X and then hardened into platform APIs in release X+1.

Well, that also seems like following a principle too blindly. The API is ready when it is ready and shouldn't be promoted just because it's old. A better description would be

 "We have a constant project plan item to work on hardening API that has
   been introduced as provisional in the previous release".

which says that it's an item to work on, regardless of to what extent the goal is reached.

> Provisional API should be in public packages and has clear javadoc that would:

But that's exactly what's happening. The packages are public today. They just have an “internal” segment in their name. Perhaps what you're missing is a clear indication in the package name that separates provisional API from the whole wad of other internal stuff. That's something worth discussing and specifying. As far as I know, some projects have adopted a naming scheme like

 org.eclipse.platform.internal.core.provisional.api

which is lengthy but does provide the requested separation. I'm against placing provisional API into non-internal packages, since it would far too easily be mistaken for hardened API, although some API Tooling tags could perhaps help here.

Coming up with a generally recommended naming scheme for provisional API might be worth another bug on the Architecture Council component for discussion.

Thanks for bringing up the issue and being persistent. I think that we're touching on some important issues here, and I'm looking forward to the followup discussions.

Cheers, – Martin Oberhuber, Senior Member of Technical Staff, Wind River Target Management Project Lead, DSDP PMC Member http://www.eclipse.org/dsdp/tm

John Arthorne to E4

The difference in policy between [2] and [4] is because those documents are written by different people for different audiences. [2] was written by me at the request of the Eclipse Project PMC to codify our guidelines (which had been around much longer but never written down). Those guidelines were only intended to be for the Eclipse top-level project. I don't know the history of document [4], but I suspect it was written by the Foundation with the intent to provide guidelines that all top-level projects could live with. I don't know what projects follow them, but I know some projects put provisional API in non-internal packages. There was never broad agreement across projects on this point which likely led to that compromise table with several different flavours of pseudo-API.

[2] After the API freeze, there is really no such thing as “provisional API”. Either it is complete and committed platform API, or it is internal code.

Hm… I know this passage and agree that it is not in line with my current thinking. Perhaps Boris, John or McQ could explain?

API is primarily a contract that implies long term support and a promise of compatibility across releases. If something is provisional and subject to change, then there is no implied support or compatibility, so it is not API. This doesn't mean it can't become API at some point in the future. We certainly do have packages with “*.internal.provisional.*” in one release, which is then validated and polished into real API in the next release. But, at the time the package is “internal.provisional”, it is internal code and subject to change. I'm happy to clarify it, but I don't understand what is unclear or illogical about it.

John

Oberhuber, Martin to E4

I'm happy to clarify it, but I don't understand what is unclear or illogical about it.

I think it's a matter of defining what “provisional API” means. In my opinion, it could also be called “API under construction”. The fact that the thing under construction was not promoted to API in some release does not mean that all workers are pulled off the construction site and it's left in its unfinished state forever. It also doesn't mean that the process of soliciting community input stops at that point and the area turns into something internal that only committers should care about.

I'm really trying to come up with a process for this API construction work that helps solicit input by supporting early adopters at all stages, and by clarifying (and unifying) the policies based on what we think are best practices.

From that point of view, I'd think that when the team (committers + community) cannot agree on freezing some API at one point, they should continue working on it until they are happy. In other words, provisional API may remain, in whatever state a project uses for provisional API (“internal.provisional” packages for the Eclipse project), with the promise (or rather non-promise) that it may change at any time without notice.

As said, I'd like to file two AC bugs to further discuss it (one for the process, the other for the naming policy), if we get some consensus that it's a topic worth discussing. What do you think, John?

Cheers, – Martin Oberhuber, Senior Member of Technical Staff, Wind River Target Management Project Lead, DSDP PMC Member http://www.eclipse.org/dsdp/tm

John Arthorne to E4

As said, I'd like to file two AC bugs to further discuss it (one for the process, the other for the naming policy), if we get some consensus that it's a topic worth discussing. What do you think, John?

Definitely worth discussing. I'm less convinced that the discussion will lead to consensus, but it's still worth bringing up. One valid issue brought up in the past is that projects in their early stages face very different demands and community pressures than mature projects, so there isn't necessarily a one-size-fits-all policy. But this makes it an ideal discussion on the arch council where there is representation from all projects.

John

Boris Bokowski to E4

What we (the Eclipse Platform team) call API is defined in this article: http://www.eclipse.org/articles/article.php?file=Article-API-Use/index.html

In particular, packages with “internal”, “examples”, or “tests” in their names are not considered API.
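As a rough illustration, this rule could be combined with the “internal.provisional” naming scheme discussed in this thread into a simple package-name classifier. This is a hypothetical helper, not part of API Tooling or any Eclipse component. It treats a package as provisional when a `provisional` segment appears somewhere after an `internal` segment, so both `x.internal.provisional.y` and the longer `x.internal.core.provisional.api` form qualify.

```java
import java.util.Arrays;
import java.util.List;

// Result of classifying a package name (hypothetical categories).
enum PackageKind { API, PROVISIONAL, NOT_API }

class PackageClassifier {
    // Segments that, per the Platform's definition, mark a package as non-API.
    private static final List<String> NON_API_SEGMENTS =
            Arrays.asList("internal", "examples", "tests");

    static PackageKind classify(String packageName) {
        List<String> segments = Arrays.asList(packageName.split("\\."));
        int internal = segments.indexOf("internal");
        // Assumed convention: "provisional" somewhere after "internal".
        if (internal >= 0
                && segments.subList(internal + 1, segments.size()).contains("provisional")) {
            return PackageKind.PROVISIONAL;
        }
        for (String segment : segments) {
            if (NON_API_SEGMENTS.contains(segment)) {
                return PackageKind.NOT_API;
            }
        }
        return PackageKind.API;
    }

    public static void main(String[] args) {
        System.out.println(classify("org.example.ui.views"));                      // API
        System.out.println(classify("org.example.ui.internal.views"));             // NOT_API
        System.out.println(classify("org.example.ui.internal.provisional.views")); // PROVISIONAL
        System.out.println(classify("org.example.tests.ui"));                      // NOT_API
    }
}
```

Under this rule a package such as `org.example.provisional.ui` (no `internal` segment) comes out as API, which is exactly the risk of provisional code being mistaken for hardened API.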

APIs are interfaces with _specified_ and _supported_ behaviour. Specified in the sense that they come with Javadoc that explains how the APIs are supposed to work, and supported in the sense that we will fix bugs, and honour the API specification.

Sometimes, the term “API contract” is used. But when you think about it, you will realize that you can only call something a contract if you know the parties involved in that contract. In a sense, the Platform APIs are more accurately described as a promise, because the Platform team, unilaterally, makes promises about the interfaces of their software components, namely that those interfaces will be supported, and only be evolved in a way that is binary compatible.
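One way interfaces can be evolved in a binary-compatible way, used for example in JFace Text with its `ITextViewerExtension*` interfaces, is to put new methods into a separate extension interface instead of adding them to the original one. The sketch below uses hypothetical names to show the pattern: implementors compiled against the older release keep working unchanged, and callers probe for the extension with `instanceof`.

```java
// Original interface as shipped in release N (hypothetical names).
interface ILabelSource {
    String getLabel();
}

// Added in release N+1: instead of adding a method to ILabelSource, which
// would break existing third-party implementors, the new capability goes
// into a separate extension interface.
interface ILabelSourceExtension {
    String getTooltip();
}

// An implementor written against release N keeps working unchanged.
class OldSource implements ILabelSource {
    public String getLabel() { return "old"; }
}

// A newer implementor opts in to the extension.
class NewSource implements ILabelSource, ILabelSourceExtension {
    public String getLabel() { return "new"; }
    public String getTooltip() { return "new tooltip"; }
}

class TooltipClient {
    // Callers check for the extension and fall back to the original API.
    static String tooltipFor(ILabelSource source) {
        if (source instanceof ILabelSourceExtension) {
            return ((ILabelSourceExtension) source).getTooltip();
        }
        return source.getLabel();
    }

    public static void main(String[] args) {
        System.out.println(tooltipFor(new OldSource())); // old
        System.out.println(tooltipFor(new NewSource())); // new tooltip
    }
}
```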

The situation becomes a little more complex when you start to think about software development over time, as opposed to just looking at a particular point in time. The promises as described above only apply to releases of our software. During the development cycle, all bets are off (i.e. interfaces may change at any time, week-to-week binary compatibility is not an issue). Of course, we try to do what makes sense from an engineering standpoint and don't change interfaces willy-nilly - so in practice, interfaces don't change that much during development. In fact, towards the end of a development cycle, we have what we call the “API freeze”, usually M6, after which we exert tight control over which aspects of our interfaces are still allowed to change.

We have tried to avoid calling interfaces “API” when they do not come with specified and supported behaviour, in an attempt to avoid confusion. If you say “provisional API”, you probably mean “specified, but not supported in the same way as real API”, which is somewhat nebulous in the absence of a more concrete definition.

One potential definition is the following: use the name segment “internal.provisional” in packages with specified interfaces, when the component owners are not making support promises, or at best very few promises, such as that they will not remove the interfaces in maintenance releases. Of course, these interfaces are not considered API in the strong sense.

Another potential definition would be if “provisional API” was not a unilateral promise, but a bilateral contract between known parties. In this case, it would become possible to negotiate the terms, for example, for how long will this interface be supported? Will it be evolved in a binary compatible way? Etc. (Note that the terms would only apply to those parties who explicitly entered into that contract - no free rides.)

It is important to realize that both potential definitions are only practical if the provider of the interfaces, and the consumer of the interfaces, have coordinated release cycles. The reason for this is that once you allow contracts like this, or less strong promises, provider components and consumer components cannot work together unless what they think of as their interface is the same. In other words, consumers would have to specify tighter version bounds for the bundles they depend on.
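In OSGi terms, a tighter version bound is a narrower range in the consumer's manifest, for example `Require-Bundle: org.example.provider;bundle-version="[1.2.0,1.3.0)"` instead of a range up to the next major version such as `[1.2.0,2.0.0)`. The following simplified range check is a hypothetical sketch (not the OSGi framework's own implementation; qualifiers are ignored) showing that a provider release 1.4.0 would still satisfy a consumer of real API but not one pinned to a provisional interface.

```java
// Simplified OSGi-style version range check. Assumes versions of the form
// major.minor.micro[.qualifier] (qualifier ignored) and ranges written as
// "[low,high)", "[low,high]", "(low,high)" or "(low,high]".
class VersionRangeCheck {
    static int[] parse(String v) {
        String[] parts = v.trim().split("\\.");
        int[] out = new int[3]; // missing parts default to 0
        for (int i = 0; i < 3 && i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) return Integer.compare(a[i], b[i]);
        }
        return 0;
    }

    static boolean inRange(String version, String range) {
        boolean lowInclusive = range.charAt(0) == '[';
        boolean highInclusive = range.charAt(range.length() - 1) == ']';
        String[] bounds = range.substring(1, range.length() - 1).split(",");
        int[] v = parse(version);
        int lowCmp = compare(v, parse(bounds[0]));
        int highCmp = compare(v, parse(bounds[1]));
        boolean aboveLow = lowInclusive ? lowCmp >= 0 : lowCmp > 0;
        boolean belowHigh = highInclusive ? highCmp <= 0 : highCmp < 0;
        return aboveLow && belowHigh;
    }

    public static void main(String[] args) {
        String apiRange = "[1.2.0,2.0.0)";         // consumer of hardened API
        String provisionalRange = "[1.2.0,1.3.0)"; // consumer of provisional interfaces
        System.out.println(inRange("1.4.0", apiRange));         // true
        System.out.println(inRange("1.4.0", provisionalRange)); // false
    }
}
```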

To sum up, you can “play these games” within a product, or perhaps between components that participate in the annual Eclipse release train. But for outside consumers, or commercial products built using Eclipse components, I don't see how interfaces that do not fall under the strong definition of “API” would be useful, or even usable.

Boris

Konstantin Komissarchik to E4

I have been enjoying reading this exchange. I was going to stay out, but when the subject turned to provisional API I decided to take the time to share some lessons learned the hard way at WTP. I am putting my thoughts in a blog post so that they can be more widely read. Take a look if you are interested. You might even learn what not to do. ;)

http://lt-rider.blogspot.com/2008/10/creating-api-lessons-learned.html

(From an earlier thread, but relevant)

Hi,

I think having IConfigurationElements mapped to actual Java objects is a very good idea. The Riena project has been using that for roughly 4 months now, with an implementation in its codebase that allows you to:

* create Java objects from Extensions
* define interfaces for the ExtensionPoint schema
* inject the Java objects into any object that is interested in their information (making the using code independent of extensions and dependent only on the interface object)
* automatically re-inject the objects if extensions are added or removed (by installing or uninstalling bundles)
* create Java instances for those attributes whose type is java
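To illustrate the injection and re-injection idea from the list above, here is a small, self-contained Java sketch. It is not Riena's actual API; the `InjectingRegistry` class, the `IMenuContribution` interface, and the use of lambdas as stand-ins for contributed extensions are all hypothetical. The consuming object depends only on the typed interface and receives a fresh list whenever an extension is added or removed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical typed view of an extension, as an extension point schema might define it.
interface IMenuContribution {
    String getLabel();
}

// Hypothetical registry: injects the current typed extensions into interested
// targets, and re-injects whenever extensions are added or removed.
class InjectingRegistry<T> {
    private final List<T> extensions = new ArrayList<>();
    private final List<Consumer<List<T>>> targets = new ArrayList<>();

    void inject(Consumer<List<T>> target) {
        targets.add(target);
        target.accept(new ArrayList<>(extensions)); // initial injection
    }

    void add(T extension) { extensions.add(extension); reinject(); }

    void remove(T extension) { extensions.remove(extension); reinject(); }

    private void reinject() {
        for (Consumer<List<T>> target : targets) {
            target.accept(new ArrayList<>(extensions));
        }
    }
}

class MenuBuilder {
    List<String> labels = new ArrayList<>();

    // Depends only on IMenuContribution, not on the extension registry.
    void update(List<IMenuContribution> contributions) {
        labels = new ArrayList<>();
        for (IMenuContribution c : contributions) labels.add(c.getLabel());
    }

    public static void main(String[] args) {
        InjectingRegistry<IMenuContribution> registry = new InjectingRegistry<>();
        MenuBuilder menu = new MenuBuilder();
        registry.inject(menu::update);
        IMenuContribution open = () -> "Open";
        registry.add(open);              // e.g. a bundle is installed
        registry.add(() -> "Save");
        System.out.println(menu.labels); // [Open, Save]
        registry.remove(open);           // e.g. a bundle is uninstalled
        System.out.println(menu.labels); // [Save]
    }
}
```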

Riena has defined an API that uses Extensions and OSGi Services in a very similar way. You can inject Services or Extensions using one API. We have a short, not yet complete, description in the wiki http://wiki.eclipse.org/Riena_Getting_Started_with_injecting_services_and_extensions and of course the code is in the latest M4 build of Riena (http://wiki.eclipse.org/Riena_Getting_started). We also have quite a number of test cases showing how injecting Extensions and Services works using the API.

cheers

christian campo

Some time ago (in April) I described what I call MagicInterfaces:

http://dev.eclipse.org/mhonarc/lists/eclipse-incubator-e4-dev/msg00283.html http://dev.eclipse.org/mhonarc/lists/eclipse-incubator-e4-dev/msg00285.html

The basic idea is very similar to the Riena service injection: Magic interfaces create adapters (using java.lang.reflect.Proxy) from interfaces to a given implementation of a kind of DOM-like “datastore” (like IConfigurationElements, IDialogSettings, IMemento, ILaunchConfiguration, IPreferenceStore, … or even EMF).
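A minimal sketch of the MagicInterfaces idea, assuming a plain `Map<String, String>` as a stand-in for the DOM-like stores listed above and a simple `getFoo()` to `"foo"` naming convention. Only the JDK's `java.lang.reflect.Proxy` is real here; `MagicInterfaces.adapt` and `IServerSettings` are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Hypothetical typed interface a client writes instead of a schema.
interface IServerSettings {
    String getHost();
    int getPort();
}

class MagicInterfaces {
    // Creates a proxy implementing `type` whose getters read from `store`.
    @SuppressWarnings("unchecked")
    static <T> T adapt(Class<T> type, Map<String, String> store) {
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            switch (name) { // java.lang.Object methods also arrive here
                case "toString": return type.getSimpleName() + store;
                case "hashCode": return System.identityHashCode(proxy);
                case "equals":   return proxy == args[0];
                default: break;
            }
            if (name.startsWith("get") && name.length() > 3) {
                // getHost -> "host"
                String key = Character.toLowerCase(name.charAt(3)) + name.substring(4);
                String raw = store.get(key);
                Class<?> returnType = method.getReturnType();
                if (returnType == int.class) return raw == null ? 0 : Integer.parseInt(raw);
                if (returnType == boolean.class) return Boolean.parseBoolean(raw);
                return raw;
            }
            throw new UnsupportedOperationException(name);
        };
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("host", "localhost");
        store.put("port", "8080");
        IServerSettings settings = adapt(IServerSettings.class, store);
        System.out.println(settings.getHost() + ":" + settings.getPort()); // localhost:8080
    }
}
```

Because the proxy reads from the store on every call, it behaves as a live view: later changes to the map are visible through the interface.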

It changes the focus: you define the interface, and at runtime a factory generates an implementation of it that adapts the interface to the underlying data structure. Instead of writing schemas you write interfaces. It should be possible to generate the schema from the interface. The advantage is that you don't have to worry about the underlying implementation; it is hidden. And you act in the world of Java, so refactoring etc. works.
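Generating the schema from the interface could, in principle, be done with plain reflection over the getter methods. The following hypothetical sketch emits attribute declarations loosely modelled on an extension point schema (`.exsd`) file; the getter naming convention and the simple string/boolean type mapping are assumptions for illustration, not the actual schema rules.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical interface written by a client in place of a schema.
interface IConnectionSpec {
    String getHost();
    int getPort();
    boolean getSecure();
}

class SchemaFromInterface {
    // Turns each no-argument getter into an attribute declaration.
    // Mapping everything except boolean to "string" is a simplification.
    static List<String> attributes(Class<?> type) {
        Method[] methods = type.getMethods();
        // getMethods() returns methods in no particular order; sort for stable output.
        Arrays.sort(methods, Comparator.comparing(Method::getName));
        List<String> out = new ArrayList<>();
        for (Method m : methods) {
            String name = m.getName();
            if (name.startsWith("get") && name.length() > 3 && m.getParameterCount() == 0) {
                String attr = Character.toLowerCase(name.charAt(3)) + name.substring(4);
                String kind = m.getReturnType() == boolean.class ? "boolean" : "string";
                out.add("<attribute name=\"" + attr + "\" type=\"" + kind + "\"/>");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        attributes(IConnectionSpec.class).forEach(System.out::println);
    }
}
```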

Once the old DOM-like interfaces are only needed for backward compatibility, one could probably invert the adaptation and generate (or perhaps hand-write) the old generic interfaces on top of EMF (or whatever is used as the underlying implementation).

What I really want to see is:

* I want ONE way to access all the different generic DOM-like data structures
* I don't want to generate code all the time. I want to use it “ad hoc” by going to a factory that creates the interface implementations at runtime
* I want to generate code if I feel like it (e.g. when performance matters)
* I want to access the data using interfaces when I feel like it
* I want to access the same data reflectively when that is appropriate
* I want to be able to subclass existing interfaces
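Two of these wishes, subclassing interfaces and accessing the same data either through an interface or generically, fit together naturally with a proxy-based factory. In this hypothetical sketch, `ILaunchSettings` extends `INamed`, the proxy also answers the inherited method, and a name-based lookup on the same map plays the role that a reflective call such as EMF's `eGet` would play. All names here are illustrative assumptions.

```java
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

interface INamed {
    String getName();
}

// Sub-interface: a proxy for it also handles methods inherited from INamed.
interface ILaunchSettings extends INamed {
    String getMode();
}

class TypedAndReflective {
    // Hypothetical factory: getters read "getFoo" -> store.get("foo").
    @SuppressWarnings("unchecked")
    static <T> T adapt(Class<T> type, Map<String, String> store) {
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type},
                (proxy, method, args) -> {
                    String n = method.getName();
                    switch (n) { // java.lang.Object methods
                        case "toString": return type.getSimpleName() + store;
                        case "hashCode": return System.identityHashCode(proxy);
                        case "equals":   return proxy == args[0];
                        default: break;
                    }
                    if (n.startsWith("get") && n.length() > 3 && method.getParameterCount() == 0) {
                        return store.get(Character.toLowerCase(n.charAt(3)) + n.substring(4));
                    }
                    throw new UnsupportedOperationException(n);
                });
    }

    // Generic, name-based access to the very same data.
    static String get(Map<String, String> store, String attribute) {
        return store.get(attribute);
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("name", "My App");
        ILaunchSettings settings = adapt(ILaunchSettings.class, store);
        store.put("mode", "debug");             // written through the generic view
        System.out.println(settings.getName()); // My App (inherited from INamed)
        System.out.println(settings.getMode()); // debug
        System.out.println(get(store, "mode")); // debug
    }
}
```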

EMF provides some of this, and something like the Riena injection service or MagicInterfaces would provide the missing pieces.

Is this bloat? Maybe in the beginning. As the old interfaces retire, the bloat might go away. For the user, it is not bloat: all you do is define some interfaces, and some magic service binds them to the implementation… Minimal dependencies on any implementation. The small gate is the factory that injects the implementation.

Michael
