Migrating source code is a pain in the butt, I know. There are about 9 million variations, and the one of interest to me is Git to github.com.
There are a number of tools to clean up your Git history and prepare for the move:
- Git and Scripting
- BFG Repo Cleaner
- Git-Python
- JGit
I found Git-Python a bit cumbersome, BFG Repo Cleaner more than I needed or wanted, and Git and Scripting too much work. After some prototyping, I opted for JGit from Eclipse and some Git know-how.
First, I switched to the source Git Repo branch I wanted to migrate and exported the commit list.
git rev-list HEAD > commits.txt
which results in
7452e8eb1f287e2ad2d8c2d005455197ba4183f2
baac5e4d0ce999d983c016d67175a898f50444b3
2a8e2ec7507e05555e277f214bf79119cda4f025
This commits.txt is useful down the line.
I am a Maven disciple, so I created a Maven Java project with Java 1.8 and the following dependencies:
<dependency>
    <groupId>org.eclipse.jgit</groupId>
    <artifactId>org.eclipse.jgit</artifactId>
    <version>${jgit.version}</version>
</dependency>
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>20.0</version>
</dependency>
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-nop</artifactId>
    <version>1.7.25</version>
</dependency>
I used JGit to list the commits (note that the REPO path here must end in .git).
try (Git git = Git.open(new File(SOURCE_GIT_REPO))) {
    printHeaderLine();
    System.out.println("Starting Branch is " + git.getRepository().getBranch());
    // git.log() walks from HEAD, newest commit first
    Iterator<RevCommit> iter = git.log().call().iterator();
    while (iter.hasNext()) {
        RevCommit commit = iter.next();
        String binSha = commit.name();
        commits.add(binSha);
    }
}
I flip the list around so I can process from OLDEST to NEWEST:
Collections.reverse(commits);
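The same oldest-first ordering can also be built from the commits.txt exported earlier. Here is a minimal sketch, assuming the file holds one full SHA-1 per line as shown above; the class name CommitListReader and its methods are my own illustration, not part of JGit.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: turns the output of `git rev-list HEAD` into an oldest-first list.
final class CommitListReader {

    /** Keeps only full 40-character hex SHA-1 lines and returns them oldest first. */
    static List<String> oldestFirst(List<String> revListLines) {
        List<String> commits = new ArrayList<>();
        for (String line : revListLines) {
            String sha = line.trim();
            if (sha.matches("[0-9a-f]{40}")) {
                commits.add(sha);
            }
        }
        // rev-list prints the newest commit first, so flip the list
        Collections.reverse(commits);
        return commits;
    }

    /** Convenience overload that reads the lines from a file such as commits.txt. */
    static List<String> oldestFirst(Path commitsFile) throws IOException {
        return oldestFirst(Files.readAllLines(commitsFile));
    }
}
```

Blank lines and anything that is not a full SHA are skipped, so a trailing newline in the file does no harm.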
I used git log (LogCommand in JGit) to find every time a FILE was modified, and did custom processing:
try (Git git = Git.open(new File(REPO))) {
    // Restrict the log to commits that touched this path (relative to the repo root)
    LogCommand logCommand = git.log()
            .add(git.getRepository().resolve(Constants.HEAD))
            .addPath(fileName.replace(REPO, ""));
    Set<String> years = new HashSet<>();
    for (RevCommit revCommit : logCommand.call()) {
        // getCommitTime() is in seconds since the epoch
        Instant instant = Instant.ofEpochSecond(revCommit.getCommitTime());
        // YOUR PROCESSING
    }
}
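As one example of the custom processing, here is a small sketch that collects the distinct years in which a file was touched (handy for a copyright range). This is an assumption about what one might do with the years set, not the original processing; the class CommitYears is hypothetical, and the years are computed in UTC.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: maps commit times (seconds since the epoch) to distinct years.
final class CommitYears {

    static Set<String> years(int... commitTimesInSeconds) {
        // TreeSet keeps the years sorted and drops duplicates
        Set<String> years = new TreeSet<>();
        for (int seconds : commitTimesInSeconds) {
            Instant instant = Instant.ofEpochSecond(seconds);
            years.add(String.valueOf(instant.atZone(ZoneOffset.UTC).getYear()));
        }
        return years;
    }
}
```

Inside the loop over the log, each revCommit.getCommitTime() value would be fed into a set like this one.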
To find out which files are in the HEAD of the repo, the following walks the tree of the latest commit and puts each file path in a List:
try (Git git = Git.open(new File("test/.git"))) {
    Iterator<RevCommit> iter = git.log().call().iterator();
    // The first commit returned by the log is HEAD
    if (iter.hasNext()) {
        RevCommit commit = iter.next();
        try (RevWalk walk = new RevWalk(git.getRepository())) {
            RevTree tree = walk.parseTree(commit.getId());
            try (TreeWalk treeWalk = new TreeWalk(git.getRepository())) {
                treeWalk.addTree(tree);
                // Recursive, so only files (not directories) are returned
                treeWalk.setRecursive(true);
                while (treeWalk.next()) {
                    headFiles.add(treeWalk.getPathString());
                }
            }
        }
    }
}
I built a history of changes.
try (Git git = Git.open(new File("test/.git"))) {
    Iterator<RevCommit> iter = git.log().call().iterator();
    while (iter.hasNext()) {
        RevCommit commit = iter.next();
        try (DiffFormatter df = new DiffFormatter(DisabledOutputStream.INSTANCE)) {
            df.setRepository(git.getRepository());
            df.setDiffComparator(RawTextComparator.DEFAULT);
            df.setDetectRenames(true);
            CommitHistoryEntry.Builder builder = CommitHistoryEntry.builder()
                    .binsha(commit.name())
                    .commitTime(commit.getCommitTime())
                    .authorEmail(commit.getAuthorIdent().getEmailAddress())
                    .shortMessage(commit.getShortMessage())
                    .fullMessage(commit.getFullMessage());
            RevCommit[] parents = commit.getParents();
            if (parents != null && parents.length > 0) {
                // scan(old, new): the parent is the old side, this commit is the new side
                List<DiffEntry> diffs = df.scan(parents[0], commit);
                builder.add(diffs);
            } else {
                // Root commit: there is no parent to diff against, so list every file
                builder.root(true);
                try (RevWalk walk = new RevWalk(git.getRepository())) {
                    RevTree tree = walk.parseTree(commit.getId());
                    try (TreeWalk treeWalk = new TreeWalk(git.getRepository())) {
                        treeWalk.addTree(tree);
                        treeWalk.setRecursive(true);
                        while (treeWalk.next()) {
                            builder.file(treeWalk.getPathString());
                        }
                    }
                }
            }
            // Assumes the builder of my CommitHistoryEntry class exposes build()
            entries.add(builder.build());
        }
    }
}
I implemented the Visitor pattern to streamline the modifications to the commit details, clean up any bad identity mappings (folks had many emails and names, which I unified), and clean up the email addresses.
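To give a feel for the identity cleanup, here is a minimal sketch of a lookup that maps alias emails to one canonical name and email. It is not my actual Visitor implementation; the IdentityMapper class, its methods, and the cleanup rules (trim, lowercase, strip angle brackets) are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: unifies the many names and emails one person may have used.
final class IdentityMapper {

    static final class Identity {
        final String name;
        final String email;

        Identity(String name, String email) {
            this.name = name;
            this.email = email;
        }
    }

    private final Map<String, Identity> byAlias = new HashMap<>();

    /** Registers an alias email that should resolve to the canonical identity. */
    void alias(String aliasEmail, String canonicalName, String canonicalEmail) {
        byAlias.put(clean(aliasEmail), new Identity(canonicalName, clean(canonicalEmail)));
    }

    /** Trims, lowercases, and strips surrounding angle brackets from an email. */
    static String clean(String email) {
        String e = email == null ? "" : email.trim().toLowerCase(Locale.ROOT);
        if (e.length() >= 2 && e.startsWith("<") && e.endsWith(">")) {
            e = e.substring(1, e.length() - 1).trim();
        }
        return e;
    }

    /** Returns the canonical identity if the email is a known alias, else a cleaned copy. */
    Identity resolve(String name, String email) {
        String cleaned = clean(email);
        Identity known = byAlias.get(cleaned);
        return known != null ? known : new Identity(name, cleaned);
    }
}
```

The resolved name and email can then be used when building the PersonIdent for each replayed commit.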
Next, I created a destination git (all fresh and ready to go):
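// Not opened in try-with-resources: the repository must stay open for the commits that follow.
// Call git.close() once the migration is done.
git = Git.init().setDirectory(new File(REPO_DIR)).call();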
One should make sure the path exists, and it doesn’t matter if you have files in it.
Then commit the files in the Git directory. With setAllowEmpty(true), you can even commit without FILES!
CommitCommand commitCommand = git.commit();
// Setup the Identity and date
// The commit time is in seconds; 1000L forces long arithmetic and avoids int overflow
Date aWhen = new Date(entry.getCommitTime() * 1000L);
PersonIdent authorIdent =
new PersonIdent(entry.getAuthorName(), entry.getAuthorEmail(), aWhen, TimeZone.getDefault());
commitCommand.setCommitter(authorIdent);
commitCommand.setAllowEmpty(true);
commitCommand.setAuthor(authorIdent);
commitCommand.setMessage(entry.getShortMessage());
commitCommand.setNoVerify(true);
commitCommand.setSign(false);
commitCommand.call();
Note, you can set the commit date to almost any point in time. As long as you don’t sign the commit, it’ll be OK. I don’t recommend this as a general practice.
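One detail worth a sketch when replaying dates: JGit's RevCommit.getCommitTime() returns seconds since the epoch as an int, while java.util.Date expects milliseconds as a long. The helper below is hypothetical and only shows the conversion, assuming the stored commit time is that int value.

```java
import java.util.Date;

// Hypothetical helper: converts a commit time in seconds to a java.util.Date.
final class CommitDates {

    static Date toDate(int commitTimeSeconds) {
        // Multiply as long; an int multiplication would overflow for any recent date
        return new Date(commitTimeSeconds * 1000L);
    }
}
```

The resulting Date is what gets passed to the PersonIdent constructor.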
To grab the file, you can do a tree walk, and resolve to the object ID.
try (TreeWalk treeWalk = new TreeWalk(git.getRepository())) {
    treeWalk.addTree(tree);
    treeWalk.setRecursive(true);
    int localCount = 0;
    while (treeWalk.next()) {
        // getPathString() is the file path relative to the repository root
        String fileName = treeWalk.getPathString();
        ObjectId objectId = treeWalk.getObjectId(0);
        ObjectLoader loader = git.getRepository().open(objectId);
        // Each commit gets its own output directory, named after its SHA
        String fileOutput = GIT_OUTPUT + "/" + binSha + "/" + fileName;
        int last = fileOutput.lastIndexOf('/');
        String fileOutputDir = fileOutput.substring(0, last);
        File dir = new File(fileOutputDir);
        dir.mkdirs();
        // and then one can use the loader to read the file
        try (FileOutputStream out = new FileOutputStream(fileOutput)) {
            byte[] bytes = loader.getBytes();
            // count and result are declared outside this snippet
            if (hasBeenModified(bytes, fileName)) {
                loader.copyTo(out);
                count++;
                result = true;
            }
        }
    }
}
Note, I checked whether the file was a duplicate of what I had already written, which saved a couple of steps.
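The duplicate check itself is not shown here, so the following is only a sketch of one way to do it: remember a SHA-256 digest of the last content seen per path and report a change when the digest differs. The ModificationTracker class is hypothetical; only the method name hasBeenModified comes from the snippet above.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: detects whether a file's content changed since it was last seen.
final class ModificationTracker {

    private final Map<String, byte[]> lastDigestByPath = new HashMap<>();

    /** Returns true the first time a path is seen or when its content differs from last time. */
    boolean hasBeenModified(byte[] bytes, String path) {
        byte[] digest = sha256(bytes);
        // put() returns the previous digest for this path, or null if there was none
        byte[] previous = lastDigestByPath.put(path, digest);
        return previous == null || !Arrays.equals(previous, digest);
    }

    private static byte[] sha256(byte[] bytes) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(bytes);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform
            throw new IllegalStateException(e);
        }
    }
}
```

Storing digests instead of the raw bytes keeps the memory footprint small on a large repo.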
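If you want to add files, stage them before calling commitCommand.call():
// Stage a new or changed file explicitly
git.add().addFilepattern(file).call();
// When true, also stages modified and deleted tracked files (like git commit -a); new files still need add()
commitCommand.setAll(addFiles);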
Git in the background builds the DIFFs for any file that is not specially treated as binary in the .gitattributes file.
For each commit, I loaded each file, checked it for stop-words, checked the copyright header, checked the file type, and compared it against the head.
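The checks themselves were specific to my repo, but here is a rough sketch of the stop-word and copyright-header checks. The ContentChecks class, the plain substring matching, and the idea of looking for the word "copyright" in the first few lines are all assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: simple content checks to run on each file of each commit.
final class ContentChecks {

    /** Returns the stop-words found in the content (case-insensitive substring match). */
    static List<String> findStopWords(byte[] content, List<String> stopWords) {
        String text = new String(content, StandardCharsets.UTF_8).toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String word : stopWords) {
            if (text.contains(word.toLowerCase(Locale.ROOT))) {
                hits.add(word);
            }
        }
        return hits;
    }

    /** Returns true if one of the first headerLines lines mentions "copyright". */
    static boolean hasCopyrightHeader(byte[] content, int headerLines) {
        String text = new String(content, StandardCharsets.UTF_8);
        // Limit the split so only the header lines are cut out; the rest stays in one chunk
        String[] lines = text.split("\\R", headerLines + 1);
        int limit = Math.min(headerLines, lines.length);
        for (int i = 0; i < limit; i++) {
            if (lines[i].toLowerCase(Locale.ROOT).contains("copyright")) {
                return true;
            }
        }
        return false;
    }
}
```

Both methods take the bytes returned by the ObjectLoader, so they can run before anything is written to disk.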
Tip: if you need to reset after a bad test run, use one of these:
git reset --hard origin # reset the branch
rm -rf .git # reset the repo (also be sure to remove the files)
To move the branch, you can execute:
cd <GIT_REPO>
git checkout <BRANCH_TO_MIGRATE>
git reset --hard origin
git pull
git gc --aggressive --prune=now
git push git@github.com:<MyOrg>/<DEST_REPO>.git <BRANCH_TO_MIGRATE>:master
Note, I renamed the master branch in the repo beforehand. Voila. A 550M+ repo moved and cleaned up.
The repo is now migrated, and up-to-date. I hope this helps you.
References
Rename Branch in Git
https://multiplestates.wordpress.com/2015/02/05/rename-a-local-and-remote-branch-in-git/
Rewrite History
https://help.github.com/en/articles/removing-sensitive-data-from-a-repository
https://stackoverflow.com/questions/tagged/git-rewrite-history
BFG Repo Cleaner
https://github.com/rtyley/bfg-repo-cleaner
https://rtyley.github.io/bfg-repo-cleaner/
JGit
https://www.vogella.com/tutorials/JGit/article.html
https://github.com/eclipse/jgit
http://wiki.eclipse.org/JGit/User_Guide#Repository
https://www.programcreek.com/java-api-examples/?class=org.eclipse.jgit.revwalk.RevWalk&method=parseCommit
https://www.eclipse.org/forums/index.php/t/213979/
https://stackoverflow.com/questions/46727610/how-to-get-the-list-of-files-as-a-part-of-commit-in-jgit
https://github.com/centic9/jgit-cookbook/blob/master/src/main/java/org/dstadler/jgit/porcelain/ListNotes.java
https://stackoverflow.com/questions/9683279/make-the-current-commit-the-only-initial-commit-in-a-git-repository
https://stackoverflow.com/questions/40590039/how-to-get-the-file-list-for-a-commit-with-jgit
https://doc.nuxeo.com/blog/jgit-example/
https://github.com/centic9/jgit-cookbook/blob/master/src/main/java/org/dstadler/jgit/api/ReadFileFromCommit.java
https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit.test/tst/org/eclipse/jgit/api/AddCommandTest.java
https://stackoverflow.com/questions/12734760/jgit-how-to-add-all-files-to-staging-area
https://github.com/centic9/jgit-cookbook/blob/master/src/main/java/org/dstadler/jgit/porcelain/DiffFilesInCommit.java