Repository Structure

Post by **JWPlatt** » Thu Apr 28, 2011 11:02 pm

There has been some back and forth on repository structure and review process. I think we have a decent plan on structure, if not process, so let's discuss technical or user questions about structure in this thread. I was debating with myself whether to split the "Put the Python in the repo" thread as a start to this one, but how to split it now is beyond me. Please just read it as a prerequisite and primer about needed improvements.

viewtopic.php?f=91&t=539

We'll get to writing a wiki page on this as soon as we can. This thread should actually help with creating content for it. First, I want to share my understanding of rarified's current proposal for repo structure. I don't want to leave anyone wondering what's up. What follows is as concise an outline as I can make it, so I'm probably skipping a lot of detail, especially in the tutorial sense where a lot can be written. This is just to get the idea across.

One detail you should keep in mind is about the CWE default branch. Obviously, we're not just going to leave the default branch inoperable with missing SDKs. The default branch (sometimes referred to as the "openuru" branch) will be the functionally equivalent or compatible version of the Uru client Cyan distributes with MOULa and which also works with MOSS. The intent is to give you a functional image of MOULa.

The basic outline:

There will be only one repo per MOULa project - no server-side clones.
The MOULa projects are, currently, CWE (client, plugin) and MOULSCRIPT (Uru game scripts).
The "cyan" branch will have exactly what has been provided from Cyan Worlds upon release of open source.
The "moula" branch of any MOULa repo will exactly (MOULSCRIPT) or functionally (CWE) mirror what is running on MOULa, per Mark DeForest.
The "dev" branch will be the low-barrier branch where any developer can push their changesets for review and testing.
The "default" branch will contain anything that has been submitted by developers, reviewed and tested - the latest, stable code.

This strategy removes the separate repo server-side clone such as MOULSCRIPT-dev because what's there now will be in MOULSCRIPT's dev branch instead.

The OpenUru.org team will have commit access to "cyan" (or any branch). I hope we will acquire more trusted leaders over time. I would expect the "default" branch committers to be those who have earned the merit. I suspect "dev" branch commit access would be available to any developer. As you can tell with 'expect' and 'suspect,' we're closing in pretty well on structure and process, but not so much yet on how to approach merit within the community. Some more reading in that book I've been pushing might reveal something reasonable for all of us. All branches would have guest/anonymous read access.
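As a rough illustration, the access model just described (the OpenUru.org team on every branch, merit-based committers on "default", any developer on "dev", anonymous reads everywhere) can be written down as a small permission table. This is a minimal Python sketch; the group names `openuru_team`, `trusted` and `developers` and the helpers `can_read`/`can_push` are hypothetical and do not correspond to actual Foundry ACL settings.

```python
# Hypothetical sketch of the proposed branch access model.
# Group names are illustrative assumptions, not real Foundry ACL groups.
BRANCH_WRITERS: dict[str, set[str]] = {
    "cyan": {"openuru_team"},
    "moula": {"openuru_team"},
    "default": {"openuru_team", "trusted"},
    "dev": {"openuru_team", "trusted", "developers"},
}

def can_read(branch: str) -> bool:
    """Every known branch allows guest/anonymous read access."""
    return branch in BRANCH_WRITERS

def can_push(groups: set[str], branch: str) -> bool:
    """A user may push to a branch if any of their groups may write to it."""
    return bool(groups & BRANCH_WRITERS.get(branch, set()))

print(can_push({"developers"}, "dev"))      # True
print(can_push({"developers"}, "default"))  # False
```

Keeping the table separate from the check means adding a newly trusted committer is a change to group membership only, not to the per-branch rules.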

Mark does not seem averse to doing the pushes to the "cyan" or "moula" branches himself. Changes in "default" would presumably be pulled by Mark, tested, approved, installed on Cyan's MOULa shard, and then pushed to our "moula" branch by him.

So what that means to you and me is that the team would merge changes from the "dev" branch to the "default" branch after a review/test. But we would not usually commit those changes to "moula" unless it's on the MOULa shard and Mark is too busy or something.
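The promotion path in the last two paragraphs boils down to two gates between branches. A minimal sketch, assuming a hypothetical `Changeset` record with `reviewed`, `tested` and `on_moula_shard` flags; "cyan" has no incoming promotion at all, because it only ever holds what Cyan released.

```python
from dataclasses import dataclass

# Hypothetical record of a changeset's status; field names are illustrative only.
@dataclass
class Changeset:
    node: str
    reviewed: bool = False
    tested: bool = False
    on_moula_shard: bool = False

def may_promote(cs: Changeset, source: str, target: str) -> bool:
    """Return True if cs may move from source to target under the proposed flow."""
    if (source, target) == ("dev", "default"):
        # The team merges dev -> default only after review and testing.
        return cs.reviewed and cs.tested
    if (source, target) == ("default", "moula"):
        # default -> moula only once the change is running on Cyan's MOULa shard.
        return cs.on_moula_shard
    # Any other direction (including anything into "cyan") is not part of the flow.
    return False

cs = Changeset("47e836da7543", reviewed=True, tested=True)
print(may_promote(cs, "dev", "default"))    # True
print(may_promote(cs, "default", "moula"))  # False
```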

Concerning Crucible performance, any review process the reviewer(s) prefer should be fine. If we trust the person to have done the review and make a single post to that effect somewhere, though preferably once in Crucible for the record, that's good so long as it works for the purpose.

Post by **Christian Walther** » Fri Apr 29, 2011 10:49 am

JWPlatt wrote:
The "dev" branch will be the low-barrier branch where any developer can push their changesets for review and testing.

Shouldn’t every contribution be done in its own branch, so that it can be reviewed and merged individually, rather than all on top of each other on the same branch? OK, there can be multiple heads on the same branch in Mercurial, but wouldn’t it still be easier to be able to refer to the heads by name than by revision ID (“dev-cwalther-cursors” instead of “47e836da7543”)?

Is this “dev” branch supposed to get regular merges from “default” (or whatever) so that contributions can start from something that’s close to what the contribution will eventually be merged into? Wouldn’t it be easier if contributions would branch off “default” directly?

Perhaps I just don’t understand how you imagine this to work. Can you show an example revision graph? Here’s what I would imagine (and also corresponds more or less to what people are doing over on GitHub). Notice that there is no “dev” branch:


              o---o---o <- dev-cwalther-cursors
             /
o---o---o---o---o---o <- default
                 \
                  o---o <- dev-someone-something

After "something" is reviewed and merged, someone else has started another contribution, and cwalther has merged from default to keep his branch up to date:

                      o---o---o <- dev-someoneelse-somethingelse
                     /
              o---o-+--o------o <- dev-cwalther-cursors
             /      |        /
o---o---o---o---o---o---o---o---o <- default
                 \     /
                  o---o <- dev-someone-something

I don’t know if that works from an access control point of view when you want to give everyone push access to the same repository as “default” lives in.
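The question these graphs raise (which changesets a contribution branch has that "default" lacks, and how far it has fallen behind) is a reachability question on the revision DAG. In Mercurial's revset language that is roughly `ancestors(X) - ancestors(Y)`. The Python sketch below models the first graph above with made-up node names (`d1`..`d6` on default, `c1`..`c3` for dev-cwalther-cursors, `s1`, `s2` for dev-someone-something). It only illustrates the idea and does not read a real repository.

```python
# Revision graph: each changeset maps to its list of parents (hypothetical node names).
GRAPH: dict[str, list[str]] = {
    "d1": [], "d2": ["d1"], "d3": ["d2"], "d4": ["d3"], "d5": ["d4"], "d6": ["d5"],
    "c1": ["d4"], "c2": ["c1"], "c3": ["c2"],  # dev-cwalther-cursors
    "s1": ["d5"], "s2": ["s1"],                # dev-someone-something
}

def ancestors(graph: dict[str, list[str]], head: str) -> set[str]:
    """Return head plus every changeset reachable through parent links."""
    seen: set[str] = set()
    stack = [head]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen

def exclusive_to(graph: dict[str, list[str]], head: str, base: str) -> set[str]:
    """Changesets reachable from head but not from base."""
    return ancestors(graph, head) - ancestors(graph, base)

print(sorted(exclusive_to(GRAPH, "c3", "d6")))  # ['c1', 'c2', 'c3']: work not yet in default
print(sorted(exclusive_to(GRAPH, "d6", "c3")))  # ['d5', 'd6']: default changes the branch lacks
```

Merging default into the cursors branch, as in the second graph, adds a merge changeset whose parents are `c3` and `d6`; after that, the second query returns an empty set, so the branch is up to date.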

Post by **JWPlatt** » Fri Apr 29, 2011 12:07 pm

I'm okay with developer branches as you depict them. It is something we discussed. I think rarified is concerned about creating a confusing mess in the graph. Organized is good. I think ACL works for your suggestions, but rarified is the Foundry expert. He'll be along...

Post by **rarified** » Fri Apr 29, 2011 1:54 pm

As JW pointed out, I did think about using a branch-per-project paradigm, and had concerns about the eventual complexity of the branch name space. That's the sole reason for not specifying that in the original proposal. But that's why it's a recommendation, and if you and others think it adds more value than it takes in complexity, then I'd be happy to try it.

However, consider a slightly different perspective for a minute. The notion of creating (or using) branches in the "root" repository for each contributor's project belongs somewhat to the "central repository" paradigm, à la Subversion. And it makes sense if everyone always uses the root repository for day-to-day development.

But in the case of Mercurial and Git, the repository really is distributed. Once I've cloned from the OpenUru repository my copy is able to support creation of new branches, handle commits, and other development activity in isolation. I was thinking that a developer would do most of his day-to-day repository activities locally, with a merge/push to the "dev" branch only when a particular project was ready for consideration of merging to the default branch. That way traffic (and collisions between projects) would not be very high in "dev", and other contributors would only have to pay attention to merge issues when a particular contribution was close to being completed.

You might point out that there might be multiple contributors to a project, and they would like to have access to a common repository. Again, completely correct, and if the OpenUru repository is the most convenient place to do that then a per-project branch is appropriate. But I also hoped that if a project had multiple contributors that they might coordinate their activities in a project-local repository that would be insulated from the OU repository activity (unless explicit resynching took place), and that the project-local repository sharing would be handled by the project participants. I felt that if such groups existed they would have the skills to share a repository. And in such a local repository the group could create their own branch (even calling it dev-<contributor>-<name>) for their development, and only attempting a merge to "dev" internally when ready to push up to OU.

One concern I have been wrestling with is the accumulation of insignificant delta information at the top level repository. I've stated on many occasions my desire to preserve the history of development and changes related to components we're managing. But there is a threshold of what's useful; is it helpful to keep deltas of the form "Oops, forgot to add closing brace" in the long-term history? So I've been thinking about asking for contributions to the "dev" branch to be distilled down to Hg patches that can be imported rather than full repository synchronizations. Or as I suggested above, have contributors develop on private "dev-X-Y" branches, and only merge and commit to the "dev" branch at milestones where they expect things to be ready. I'm open to preferences between these two, or to alternatives.

I know this post isn't a firm decision with "do it this way", but I'm not omniscient and don't assume I can think of everything from the outset. Let's give this topic a couple more days to have people contribute and we'll select a refinement of what's here at that point.

_R

Post by **JWPlatt** » Fri Apr 29, 2011 2:03 pm

At the same time, we have also asked for incremental changesets for ease of breaking up reviews rather than reviewing entire projects. There would seem to be a compromise where dev branches should be worked on locally, but pushed as the smaller pieces are ready for consideration. That's what "distributed" is all about anyway, as you point out. Intimate development groups can manage things however they want, then push here.

Post by **Nye_Sigismund** » Fri Apr 29, 2011 2:20 pm

One thing that might not have been said yet is communication. An advantage of something like Git is pages like this:

https://github.com/H-uru/Plasma/network

Any developer can simply look at the network and go "Ah, so x is doing this, y is doing that". I'm not sure how that could be done here, other than judicious use of this forum, which to be fair isn't that bad a way of doing things.

I think I can back lots of developer branches based on that idea alone. If everyone works locally or on independent clones, it's far harder to find out what's going on and what you can do, and that kills a desire to participate. Besides, programmers are an individualistic bunch.

Post by **JWPlatt** » Fri Apr 29, 2011 2:51 pm

If programmers are individualistic, the larger the development community becomes, the more distributed it will be and the less likely it is that everyone is going to agree that Bitbucket or GitHub are da bomb. The tracking and community tools you mention are something I want - to be able to see the big picture at a glance. But the picture is only as big as the particular service and there's an inevitability of fanout. So while the repo services are very helpful to those of like mind, people will break off into their own groups and use their own services. That's okay and expected; we can't tell people what to use, nor do we want to. We can only recommend. Personally, I've been recommending Bitbucket/Mercurial. We do have an account there and will be testing things.

Post by **rarified** » Fri Apr 29, 2011 5:23 pm

Nye_Sigismund wrote:
One thing that might not have been said yet is communication. An advantage of something like Git is pages like this: ...

Well, to be precise, that is not a feature of Git per se, but rather the Github environment that utilizes Git. (Although to the casual user the distinction is probably irrelevant.) It's the same thing that Bitbucket provides for Mercurial repository projects. And we have encouraged people to propose uses for Bitbucket in the overall scheme of things.

But I'm not convinced that Fisheye and JIRA cannot provide the same capabilities. We currently have organized JIRA along project lines, but with tagging we could couple projects with similar characteristics such as "networking" so they're easily searched. And Fisheye supports general discussions coupled to code, not just reviews. No one has explored that.

Your point about categorization is a good one; I'll put the task of exploring topical discussions linked to code on the ToDo list.

_R

Post by **Christian Walther** » Fri Apr 29, 2011 7:19 pm

rarified wrote:
However, consider a slightly different perspective for a minute. The notion of creating (or using) at the "root" repository branches for each contributor's project is somewhat in the "central repository" paradigm, ala Subversion. And it makes sense if everyone always uses the root repository for day-to-day development.
…

Oh, I wasn’t proposing that contributors push upstream all the time while they’re at work. Of course they will use their own repositories and may have additional branches and different histories there than what they finally push (or request to be pulled) upstream.

rarified wrote:
One concern I have been wrestling with is the accumulation of insignificant delta information at the top level repository. I've stated on many occasions my desire to preserve the history of development and changes related to components we're managing. But there is a threshold of what's useful; is it helpful to keep deltas of the form "Oops, forgot to add closing brace" in the long-term history?

Probably not, but it’s easy to request that such commits be squashed before going upstream.
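In a simplified view, squashing replaces a run of local commits with one commit that carries only the final state of each touched file, so the "Oops" step never reaches the shared history. The Python sketch below uses a hypothetical `Commit` record that stores whole file contents per path and ignores deletions and renames. In practice a contributor would do this in their own clone with a Mercurial extension such as mq (`qfold`) or histedit (`fold`) before pushing.

```python
from dataclasses import dataclass, field

@dataclass
class Commit:
    """Hypothetical commit record: a message plus full new contents per changed path."""
    message: str
    changes: dict[str, str] = field(default_factory=dict)

def squash(commits: list[Commit], message: str) -> Commit:
    """Fold a run of local commits into one commit holding each file's final contents."""
    combined: dict[str, str] = {}
    for commit in commits:
        # A later commit to the same path replaces the earlier contents.
        combined.update(commit.changes)
    return Commit(message, combined)

local = [
    Commit("Add cursor loader", {"cursor.cpp": "v1"}),
    Commit("Oops, forgot to add closing brace", {"cursor.cpp": "v2"}),
    Commit("Add cursor header", {"cursor.h": "h1"}),
]
upstream = squash(local, "Add cursor loader and header")
print(upstream.changes)  # {'cursor.cpp': 'v2', 'cursor.h': 'h1'}
```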

rarified wrote:
So I've been thinking about asking for contributions to the "dev" branch to be distilled down to Hg patches that can be imported rather than full repository synchronizations. Or as I suggested above, have contributors develop on private "dev-X-Y" branches, and only merge and commit to the "dev" branch at milestones where they expect things to be ready.

I’m OK with patches as long as they’re “augmented” patches that preserve commit metadata, e.g. from “hg export”, not plain “diff” output. But then there’s really no difference from push/pull (other than that the originator doesn’t have to have their repository publicly accessible, but with the existence of Bitbucket or maybe a similar service here that’s probably not an issue).
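To make the difference concrete: `hg export` output starts with a `# HG changeset patch` header carrying the user, date, node ID and parent(s) ahead of the commit message and diff, and `hg import` uses that header to recreate the commit with its original author and date. Plain `diff` output has none of it. Below is a rough Python sketch of pulling those fields back out of such a patch, assuming the "# Key value" header layout just described. The sample author, e-mail address and hashes are made up, and real node IDs are full 40-character hashes.

```python
# Header keys written by `hg export` that this sketch recognizes; other "#" lines are skipped.
KNOWN_KEYS = ("User", "Date", "Branch", "Node ID", "Parent")

def parse_hg_export(patch: str) -> dict:
    """Split an `hg export` patch into header fields, commit message and diff."""
    lines = patch.splitlines()
    if not lines or lines[0] != "# HG changeset patch":
        raise ValueError("no '# HG changeset patch' header; plain diff carries no commit metadata")
    meta: dict = {"parents": []}
    i = 1
    # Header: "# Key value" lines directly after the marker line.
    while i < len(lines) and lines[i].startswith("#"):
        body = lines[i][1:].strip()
        for key in KNOWN_KEYS:
            if body.startswith(key + " "):
                value = body[len(key):].strip()
                if key == "Parent":
                    meta["parents"].append(value)  # a merge has two Parent lines
                else:
                    meta[key.lower().replace(" ", "_")] = value
                break
        i += 1
    # Commit message: everything up to the first "diff " line.
    message = []
    while i < len(lines) and not lines[i].startswith("diff "):
        message.append(lines[i])
        i += 1
    meta["message"] = "\n".join(message).strip()
    meta["diff"] = "\n".join(lines[i:])
    return meta

# Made-up sample patch for illustration.
SAMPLE = """\
# HG changeset patch
# User cwalther <cwalther@example.org>
# Date 1304067540 -7200
# Node ID 47e836da7543
# Parent  0123456789ab
Fix cursor hotspot offsets

diff -r 0123456789ab -r 47e836da7543 cursor.cpp
--- a/cursor.cpp
+++ b/cursor.cpp
"""
print(parse_hg_export(SAMPLE)["user"])  # cwalther <cwalther@example.org>
```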

Well, we can try the “dev” branch idea (I think I understand it now – it’s basically deliberate serialization of contributions). If activity here remains as low as it has been for the last three weeks it will probably work fine. If activity should rise to the level sustained in the H-uru fork, I expect it to be a bottleneck. But maybe that’s what we want.

Nye_Sigismund wrote:
One thing that might not have been said yet is communication. An advantage of something like Git is pages like this:

https://github.com/H-uru/Plasma/network

Any developer can simply look at the network and go "Ah, so x is doing this, y is doing that". I'm not sure how that could be done here, other than judicious use of this forum, which to be fair isn't that bad a way of doing things.

I agree that the GitHub network graph is useful, but I think we’re not too far away from having that here. If you pull from all repositories you’re interested in, any Mercurial GUI can give you such a graph. The only difference is that you need to do the pulling manually, while GitHub automatically takes into account all repositories (those it knows because they’re also on GitHub and marked as forks). Is it that part that you’re missing?
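The reason this works is that Mercurial identifies changesets by their node IDs, so pulling several related repositories into one clone simply unions their graphs (for example by running `hg pull` against each contributor's repository in turn). A GUI can then draw every contributor's heads together. Here is a toy Python model of that, with hypothetical repositories and node names:

```python
def pull_all(*repos: dict[str, list[str]]) -> dict[str, list[str]]:
    """Union several repositories' graphs; an identical node ID is the same changeset."""
    combined: dict[str, list[str]] = {}
    for repo in repos:
        combined.update(repo)
    return combined

def heads(graph: dict[str, list[str]]) -> list[str]:
    """Changesets that are not a parent of any other changeset."""
    parents = {p for ps in graph.values() for p in ps}
    return sorted(node for node in graph if node not in parents)

openuru = {"a": [], "b": ["a"]}
contributor_a = {"a": [], "b": ["a"], "c1": ["b"]}  # hypothetical contributor clone
contributor_b = {"a": [], "n1": ["a"]}              # another hypothetical clone
print(heads(pull_all(openuru, contributor_a, contributor_b)))  # ['c1', 'n1']
```

The manual step Christian mentions is exactly the list of repositories passed to `pull_all`: GitHub fills it in automatically from its fork records, while here each developer decides which clones to pull.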
