Migrating Source Code Git-to-Git

Migrating source code is a pain in the butt, I know. There are about 9 million variations; the one of interest to me is git to github.com.

There are a number of tools to clean up your Git history and prepare it for the move:

  • Git and Scripting
  • BFG Repo Cleaner
  • Git-Python
  • JGit

I found Git-Python a bit cumbersome, BFG Repo Cleaner more than I needed, and plain Git scripting too much work. After some prototyping, I opted for JGit from Eclipse plus some Git know-how.

First, I checked out the branch of the source Git repo that I wanted to migrate and exported its commit list:

git rev-list HEAD > commits.txt

which produces one SHA per line, newest first:

7452e8eb1f287e2ad2d8c2d005455197ba4183f2
baac5e4d0ce999d983c016d67175a898f50444b3
2a8e2ec7507e05555e277f214bf79119cda4f025

Hold on to commits.txt; it comes in handy down the line.
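As a small illustration of reading commits.txt back in, here is a JDK-only sketch; `CommitList` and its methods are hypothetical names, not part of the original project. Because `git rev-list` prints newest first, it reverses the list so the commits can be replayed from the root commit forward.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for reading the output of `git rev-list HEAD > commits.txt`.
final class CommitList {
    // A full SHA-1 commit ID: 40 lowercase hex characters.
    private static final Pattern SHA1 = Pattern.compile("[0-9a-f]{40}");

    private CommitList() {}

    // Reads the file and returns the SHAs oldest first.
    static List<String> loadOldestFirst(Path commitsFile) throws IOException {
        return parseOldestFirst(Files.readAllLines(commitsFile, StandardCharsets.UTF_8));
    }

    // rev-list prints newest first, so reverse to replay history from the root commit.
    static List<String> parseOldestFirst(List<String> lines) {
        List<String> shas = new ArrayList<>();
        for (String line : lines) {
            String sha = line.trim();
            if (sha.isEmpty()) {
                continue; // tolerate blank lines
            }
            if (!SHA1.matcher(sha).matches()) {
                throw new IllegalArgumentException("Not a commit SHA: " + sha);
            }
            shas.add(sha);
        }
        Collections.reverse(shas);
        return shas;
    }
}
```

Alternatively, `git rev-list --reverse HEAD > commits.txt` writes the list oldest first, in which case the reversal step would be dropped.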

I am a Maven disciple, so I created a Maven Java project targeting Java 1.8 with the following dependencies (jgit.version is a property defined in the POM's <properties> section):

        <dependency>
            <groupId>org.eclipse.jgit</groupId>
            <artifactId>org.eclipse.jgit</artifactId>
            <version>${jgit.version}</version>
        </dependency>
        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>20.0</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-nop</artifactId>
            <version>1.7.25</version>
        </dependency>

I used JGit to read the list of commits (note that SOURCE_GIT_REPO here points at the repository's .git directory):

try (Git git = Git.open(new File(SOURCE_GIT_REPO))) {
    printHeaderLine();
    System.out.println("Starting Branch is " + git.getRepository().getBranch());
    // git log walks back from HEAD, so commits arrive newest first
    Iterator<RevCommit> iter = git.log().call().iterator();
    while (iter.hasNext()) {
        RevCommit commit = iter.next();
        String binSha = commit.name();   // the 40-character hex SHA-1
        commits.add(binSha);
    }
}

I then flipped the list around so I could process the commits from oldest to newest:

        Collections.reverse(commits);

I used git log (LogCommand in JGit) to find every commit that modified a given file, and then did my custom processing:

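try (Git git = Git.open(new File(REPO))) {
    // addPath() expects a path relative to the working tree, using '/' as the separator
    String repoRelativePath = fileName.replace(REPO, "");
    LogCommand logCommand = git.log()
            .add(git.getRepository().resolve(Constants.HEAD))
            .addPath(repoRelativePath);
    Set<String> years = new HashSet<>();
    // Every RevCommit returned here touched the given file
    for (RevCommit revCommit : logCommand.call()) {
        // getCommitTime() is in seconds since the epoch
        Instant instant = Instant.ofEpochSecond(revCommit.getCommitTime());
        // YOUR PROCESSING
    }
}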
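The `fileName.replace(REPO, "")` call only works when the file name starts with exactly that prefix, and it can leave a leading separator or platform-specific backslashes behind, while JGit's `addPath` wants a working-tree-relative path separated by `/`. A JDK-only sketch of a sturdier conversion (`RepoPaths.toRepoRelative` is a hypothetical helper, not a JGit API):

```java
import java.nio.file.Path;

// Hypothetical helper: converts an absolute file path into the form that
// LogCommand.addPath(...) expects: relative to the working tree and '/'-separated.
final class RepoPaths {
    private RepoPaths() {}

    static String toRepoRelative(Path workTree, Path file) {
        Path relative = workTree.toAbsolutePath().normalize()
                .relativize(file.toAbsolutePath().normalize());
        if (relative.startsWith("..")) {
            throw new IllegalArgumentException(file + " is outside " + workTree);
        }
        // Join the name elements with '/' regardless of the platform separator.
        StringBuilder sb = new StringBuilder();
        for (Path part : relative) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(part.toString());
        }
        return sb.toString();
    }
}
```

Using `Path.relativize` rather than string replacement also avoids accidentally matching the prefix somewhere else in the path.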

To find the files in the HEAD of the repo, I walked the tree of the newest commit and collected every file path into a List:

try (Git git = Git.open(new File("test/.git"))) {
    // git log starts at HEAD, so the first commit returned is the HEAD commit
    Iterator<RevCommit> iter = git.log().call().iterator();
    if (iter.hasNext()) {
        RevCommit commit = iter.next();
        try (RevWalk walk = new RevWalk(git.getRepository())) {
            RevTree tree = walk.parseTree(commit.getId());
            try (TreeWalk treeWalk = new TreeWalk(git.getRepository())) {
                treeWalk.addTree(tree);
                treeWalk.setRecursive(true);   // descend into subdirectories
                while (treeWalk.next()) {
                    headFiles.add(treeWalk.getPathString());
                }
            }
        }
    }
}

Next, I built a history of changes: one entry per commit, recording the files that commit touched.

try (Git git = Git.open(new File("test/.git"))) {
    Iterator<RevCommit> iter = git.log().call().iterator();
    while (iter.hasNext()) {
        RevCommit commit = iter.next();
        try (DiffFormatter df = new DiffFormatter(DisabledOutputStream.INSTANCE)) {
            df.setRepository(git.getRepository());
            df.setDiffComparator(RawTextComparator.DEFAULT);
            df.setDetectRenames(true);

            CommitHistoryEntry.Builder builder = CommitHistoryEntry.builder()
                    .binsha(commit.name())
                    .commitTime(commit.getCommitTime())
                    .authorEmail(commit.getAuthorIdent().getEmailAddress())
                    .shortMessage(commit.getShortMessage())
                    .fullMessage(commit.getFullMessage());

            RevCommit[] parents = commit.getParents();
            if (parents != null && parents.length > 0) {
                // scan(old, new): compare the first parent against this commit
                List<DiffEntry> diffs = df.scan(parents[0], commit.getTree());
                builder.add(diffs);
            } else {
                // Root commit: nothing to diff against, so record every file
                builder.root(true);
                try (TreeWalk treeWalk = new TreeWalk(git.getRepository())) {
                    treeWalk.addTree(commit.getTree());
                    treeWalk.setRecursive(true);
                    while (treeWalk.next()) {
                        builder.file(treeWalk.getPathString());
                    }
                }
            }
            // CommitHistoryEntry is my own class; this assumes its builder exposes build()
            entries.add(builder.build());
        }
    }
}
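The post doesn't show `CommitHistoryEntry` itself. To make the builder calls concrete, here is a minimal, hypothetical JDK-only version; unlike the author's class it stores changed files as plain path strings instead of JGit `DiffEntry` objects, so it has no `add(List<DiffEntry>)` method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, JDK-only stand-in for the post's CommitHistoryEntry (its source isn't shown).
// Changed files are kept as path strings instead of JGit DiffEntry objects.
final class CommitHistoryEntry {
    private final String binsha;
    private final int commitTime; // seconds since the epoch, like RevCommit.getCommitTime()
    private final String authorEmail;
    private final String shortMessage;
    private final String fullMessage;
    private final boolean root;
    private final List<String> files;

    private CommitHistoryEntry(Builder b) {
        this.binsha = b.binsha;
        this.commitTime = b.commitTime;
        this.authorEmail = b.authorEmail;
        this.shortMessage = b.shortMessage;
        this.fullMessage = b.fullMessage;
        this.root = b.root;
        this.files = Collections.unmodifiableList(new ArrayList<>(b.files));
    }

    static Builder builder() {
        return new Builder();
    }

    String getBinsha() { return binsha; }
    int getCommitTime() { return commitTime; }
    String getAuthorEmail() { return authorEmail; }
    String getShortMessage() { return shortMessage; }
    String getFullMessage() { return fullMessage; }
    boolean isRoot() { return root; }
    List<String> getFiles() { return files; }

    static final class Builder {
        private String binsha;
        private int commitTime;
        private String authorEmail;
        private String shortMessage;
        private String fullMessage;
        private boolean root;
        private final List<String> files = new ArrayList<>();

        Builder binsha(String value) { binsha = value; return this; }
        Builder commitTime(int value) { commitTime = value; return this; }
        Builder authorEmail(String value) { authorEmail = value; return this; }
        Builder shortMessage(String value) { shortMessage = value; return this; }
        Builder fullMessage(String value) { fullMessage = value; return this; }
        Builder root(boolean value) { root = value; return this; }
        Builder file(String path) { files.add(path); return this; }

        // The commit SHA is the only required field in this sketch.
        CommitHistoryEntry build() {
            if (binsha == null) {
                throw new IllegalStateException("binsha is required");
            }
            return new CommitHistoryEntry(this);
        }
    }
}
```

An immutable entry plus a builder keeps the walk loop readable and lets later steps (such as the identity cleanup) work from a fixed snapshot of each commit.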

I implemented the Visitor pattern to organize the modifications to the commit details: it cleaned up bad identity mappings (people had committed under many names and email addresses, which I unified) and normalized the email addresses.
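The visitor code isn't shown in the post. A minimal sketch of the idea, assuming hypothetical `CommitDetails`, `CommitVisitor`, and `IdentityVisitor` types and an alias table filled in by hand: each visitor rewrites one aspect of a commit's details before the commit is replayed, so further cleanups can be added as new visitors without touching the replay loop.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical visitor interface: each implementation rewrites one aspect of a commit.
interface CommitVisitor {
    void visit(CommitDetails details);
}

// Hypothetical mutable commit details, rewritten by visitors before the commit is replayed.
final class CommitDetails {
    String authorName;
    String authorEmail;
    String message;

    CommitDetails(String authorName, String authorEmail, String message) {
        this.authorName = authorName;
        this.authorEmail = authorEmail;
        this.message = message;
    }

    void accept(CommitVisitor visitor) {
        visitor.visit(this);
    }
}

// Maps every known alias address onto one canonical name and email.
final class IdentityVisitor implements CommitVisitor {
    private static final class Identity {
        final String name;
        final String email;

        Identity(String name, String email) {
            this.name = name;
            this.email = email;
        }
    }

    private final Map<String, Identity> canonicalByAlias = new HashMap<>();

    IdentityVisitor alias(String aliasEmail, String canonicalName, String canonicalEmail) {
        canonicalByAlias.put(normalize(aliasEmail), new Identity(canonicalName, normalize(canonicalEmail)));
        return this;
    }

    // Assumption: addresses are compared case-insensitively, ignoring surrounding whitespace.
    private static String normalize(String email) {
        return email == null ? "" : email.trim().toLowerCase(Locale.ROOT);
    }

    @Override
    public void visit(CommitDetails details) {
        String key = normalize(details.authorEmail);
        Identity canonical = canonicalByAlias.get(key);
        if (canonical != null) {
            details.authorName = canonical.name;
            details.authorEmail = canonical.email;
        } else {
            details.authorEmail = key; // unknown author: only clean up the address
        }
    }
}
```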

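Next, I created a fresh destination Git repository:

// Keep the Git instance open for the commits that follow and close it when done;
// wrapping init() in try-with-resources would close it as soon as the block ends.
git = Git.init().setDirectory(new File(REPO_DIR)).call();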

Make sure the directory exists; it doesn't matter whether it already contains files.

Then commit the files in the working directory. You can even commit without any files!

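CommitCommand commitCommand = git.commit();
// Set up the identity and date. getCommitTime() is in seconds, so widen to long
// before converting to milliseconds to avoid int overflow.
Date aWhen = new Date(entry.getCommitTime() * 1000L);
// Note: TimeZone.getDefault() stamps the local time zone, not the original commit's
PersonIdent authorIdent =
        new PersonIdent(entry.getAuthorName(), entry.getAuthorEmail(), aWhen, TimeZone.getDefault());
commitCommand.setAuthor(authorIdent);
commitCommand.setCommitter(authorIdent);
commitCommand.setAllowEmpty(true);   // allow commits that change no files
commitCommand.setMessage(entry.getShortMessage());
commitCommand.setNoVerify(true);     // skip the pre-commit and commit-msg hooks
commitCommand.setSign(false);
commitCommand.call();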

Note that you can set the date to almost any point in time; as long as you don't sign the commit, Git accepts it. I don't recommend this as a general practice.
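When converting a commit time to a `Date`, the multiplier matters: if `entry.getCommitTime()` returns an `int` (as JGit's `RevCommit.getCommitTime()` does), then `getCommitTime() * 1000` is computed in `int` arithmetic and overflows for any timestamp past late January 1970, so multiply by `1000L` instead. A small JDK-only demonstration (the `CommitTimes` class is hypothetical):

```java
import java.util.Date;

// Hypothetical helper showing the seconds-to-milliseconds conversion for commit times.
final class CommitTimes {
    private CommitTimes() {}

    // Widen to long before multiplying: int * int overflows once the result passes
    // Integer.MAX_VALUE milliseconds (about 24.8 days after the epoch).
    static Date toDate(int commitTimeSeconds) {
        return new Date(commitTimeSeconds * 1000L);
    }

    // The overflowing variant, kept only for comparison.
    static long overflowingMillis(int commitTimeSeconds) {
        return commitTimeSeconds * 1000;
    }
}
```

If you also want to keep each commit's original time zone, you would need to record it (for example from `commit.getAuthorIdent().getTimeZone()`) in your history entries and pass it to `PersonIdent` instead of `TimeZone.getDefault()`.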

To read each file in a commit, walk the commit's tree, resolve each entry to its object ID, and open it with an ObjectLoader:

try (TreeWalk treeWalk = new TreeWalk(git.getRepository())) {
    treeWalk.addTree(tree);
    treeWalk.setRecursive(true);
    while (treeWalk.next()) {
        String fileNameWithRelativePath = treeWalk.getPathString();
        // Resolve the path to its blob ID and open the object
        ObjectId objectId = treeWalk.getObjectId(0);
        ObjectLoader loader = git.getRepository().open(objectId);
        byte[] bytes = loader.getBytes();
        // hasBeenModified() is a custom check (not part of JGit). The output file is
        // opened only after it passes, so unmodified files don't leave empty files behind.
        if (hasBeenModified(bytes, fileNameWithRelativePath)) {
            File outFile = new File(GIT_OUTPUT + "/" + binSha + "/" + fileNameWithRelativePath);
            outFile.getParentFile().mkdirs();   // create the directory structure first
            try (FileOutputStream out = new FileOutputStream(outFile)) {
                out.write(bytes);
            }
            count++;
            result = true;
        }
    }
}

Note that I also checked whether a file was a duplicate, which saved a couple of steps.
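One cheap way to spot duplicates: Git names a file's contents (a "blob") by a hash, so identical contents always get the same object ID. Inside JGit you can simply compare the `ObjectId` from `treeWalk.getObjectId(0)` with one you've already seen, without loading the bytes. The JDK-only sketch below shows how that ID is computed for the default SHA-1 object format; `BlobIds` is a hypothetical name, and the author's actual duplicate check isn't shown in the post.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: computes the ID Git assigns to file contents, i.e. the SHA-1 of
// the header "blob <size>\0" followed by the raw bytes (SHA-1 object format only).
final class BlobIds {
    private BlobIds() {}

    static String blobId(byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update(("blob " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
            sha1.update(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // Duplicate check: true when both contents would be stored as the same blob.
    static boolean sameBlob(byte[] a, byte[] b) {
        return blobId(a).equals(blobId(b));
    }
}
```

The result matches `git hash-object` for the same bytes, which makes it easy to cross-check against the source repo.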

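If you want the commit to include files, stage them with an add first. Note that setAll(true) on the CommitCommand only stages files Git already tracks that were modified or deleted (like git commit -a); new files still need git.add():

git.add().addFilepattern(file).call();

commitCommand.setAll(addFiles);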

You don't need to build diffs yourself: Git stores each version as a snapshot and computes diffs on demand for any file that is not marked as binary in the .gitattributes file.

For each commit, I loaded each file, checked it for stop-words, checked the copyright header, checked the file type, and compared it against the HEAD version.
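The post doesn't list its stop-words or header rules. As a hypothetical sketch, assuming a plain list of forbidden words (for example internal host names) and a rule that the word "copyright" must appear within the first few lines, the first two checks might look like this (`ContentChecks` and both methods are illustrative names):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical content checks; the stop-word list and header rule are assumptions,
// not the author's actual rules.
final class ContentChecks {
    private ContentChecks() {}

    // Returns every stop word that appears as a whole word (case-insensitive).
    static List<String> findStopWords(String content, Collection<String> stopWords) {
        List<String> hits = new ArrayList<>();
        for (String word : stopWords) {
            Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
            if (p.matcher(content).find()) {
                hits.add(word);
            }
        }
        return hits;
    }

    // Assumed rule: a line containing "copyright" must appear within the first maxLines lines.
    static boolean hasCopyrightHeader(String content, int maxLines) {
        String[] lines = content.split("\\R", maxLines + 1);
        int limit = Math.min(lines.length, maxLines);
        for (int i = 0; i < limit; i++) {
            if (lines[i].toLowerCase(Locale.ROOT).contains("copyright")) {
                return true;
            }
        }
        return false;
    }
}
```

Matching whole words keeps a stop-word such as "internal" from flagging unrelated identifiers like "internalize".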

Tip: if you need to reset after a bad test run:

git reset --hard origin # reset the branch

rm -rf .git # reset the repo (also be sure to remove the files)

To move the branch, run:

cd <GIT_REPO>

git checkout <BRANCH_TO_MIGRATE>

git reset --hard origin

git pull

git gc --aggressive --prune=now

git push git@github.com:<MyOrg>/<DEST_REPO>.git <BRANCH_TO_MIGRATE>:master

Note: I had renamed the master branch in the repo beforehand. Voilà: a 550M+ repo moved and cleaned up.

The repo is now migrated and up to date. I hope this helps you.

References

Rename Branch in Git

https://multiplestates.wordpress.com/2015/02/05/rename-a-local-and-remote-branch-in-git/

 

Rewrite History

https://help.github.com/en/articles/removing-sensitive-data-from-a-repository

https://stackoverflow.com/questions/tagged/git-rewrite-history

 

BFG Repo Cleaner

https://github.com/rtyley/bfg-repo-cleaner

https://rtyley.github.io/bfg-repo-cleaner/

 

JGit

https://www.vogella.com/tutorials/JGit/article.html

https://github.com/eclipse/jgit

http://wiki.eclipse.org/JGit/User_Guide#Repository

https://www.programcreek.com/java-api-examples/?class=org.eclipse.jgit.revwalk.RevWalk&method=parseCommit

https://www.eclipse.org/forums/index.php/t/213979/

https://stackoverflow.com/questions/46727610/how-to-get-the-list-of-files-as-a-part-of-commit-in-jgit

https://github.com/centic9/jgit-cookbook/blob/master/src/main/java/org/dstadler/jgit/porcelain/ListNotes.java

https://stackoverflow.com/questions/9683279/make-the-current-commit-the-only-initial-commit-in-a-git-repository

https://stackoverflow.com/questions/40590039/how-to-get-the-file-list-for-a-commit-with-jgit

https://doc.nuxeo.com/blog/jgit-example/

https://github.com/centic9/jgit-cookbook/blob/master/src/main/java/org/dstadler/jgit/api/ReadFileFromCommit.java

https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit.test/tst/org/eclipse/jgit/api/AddCommandTest.java

https://stackoverflow.com/questions/12734760/jgit-how-to-add-all-files-to-staging-area

https://github.com/centic9/jgit-cookbook/blob/master/src/main/java/org/dstadler/jgit/porcelain/DiffFilesInCommit.java

 

