Guide to Removing Unwanted Files and Folders from a GitHub Repository

Introduction

Developers today commonly use GitHub to host their code, track changes, and manage deployments. Over time, commits can pick up files or even entire folders that aren't relevant to the project, and some of them contain code that may introduce vulnerabilities. They may also include confidential company information, and leaking this online can be disastrous. Deleting such files in a new commit is not enough, because they remain in every earlier commit; they have to be removed from the repository's history itself.

To help prevent such incidents, our DEV IT specialist has prepared this blog post to help you remove files and folders from your GitHub repository.

Steps to follow

Note: Both methods rewrite history and finish with a force push, so you need write access to the repository, and protected branches on GitHub reject force pushes unless their protection settings allow them.

There are two common ways to remove files or folders from a repository's history:

  1. Standard Git commands (git filter-branch)
  2. The BFG Repo-Cleaner tool

Let's see how the first method works.

1. Using the standard Git command line:

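  • Clone the repository to your machine: git clone repo_URL
  • Go to your repository folder: cd repo_folder
  • Create a local branch for every remote branch, so that the rewrite and the final push cover all of them: for remote in $(git branch -r | grep -v /HEAD); do git checkout --track "$remote"; done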

Note: Repeat the next two steps (the filter-branch rewrite and deleting its backup refs) for each folder, one folder at a time.

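  • Rewrite every branch and tag so that it no longer contains the folder:

     git filter-branch --index-filter 'git rm -rf --cached --ignore-unmatch DIRECTORY_NAME/' --prune-empty --tag-name-filter cat -- --all

  • Delete the backup refs that filter-branch saves under refs/original/:

     git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d

  • Remove the reflogs and any remaining backups: rm -Rf .git/logs .git/refs/original
  • Garbage-collect the objects that are no longer referenced: git gc --prune=all --aggressive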

Note: The next two steps force-push all rewritten branches and tags to the remote repository.

  • git push origin --all --force
  • git push origin --tags --force
  • Check the repository size: git count-objects -vH
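Because these commands rewrite history and force-push, it can help to rehearse them on a disposable repository first. The script below is a minimal sketch of the steps above, assuming a POSIX shell and Git 2.x. It builds a throwaway local "origin" (origin.git, seed and repo are hypothetical temporary paths, and a folder named secrets stands in for DIRECTORY_NAME) with two branches and a tag, then clones it, creates the tracking branches, rewrites, cleans up, force-pushes, and checks that no branch or tag on the remote still reaches the folder. FILTER_BRANCH_SQUELCH_WARNING=1 only skips the pause that newer Git versions add before running filter-branch.

```shell
#!/bin/sh
# Rehearse the filter-branch method against a throwaway local "remote".
# Assumptions: POSIX sh and Git 2.x on PATH; "secrets" stands in for
# DIRECTORY_NAME; origin.git, seed and repo are temporary, hypothetical paths.
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
# Newer Git versions pause on a deprecation warning unless this is set.
export FILTER_BRANCH_SQUELCH_WARNING=1

cd "$(mktemp -d)"

# Set-up: an "origin" with two branches and a tag, all reaching secrets/.
git init -q --bare origin.git
git clone -q origin.git seed 2>/dev/null
cd seed
mkdir secrets
echo "password=hunter2" > secrets/config.txt
echo "hello" > README.txt
git add .
git commit -q -m "Initial commit with an unwanted folder"
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Feature work"
git tag v1
git push -q origin --all
git push -q origin --tags
cd ..

# Clone, then create a local branch for every remote branch.
git clone -q origin.git repo
cd repo
for remote in $(git branch -r | grep -v /HEAD); do
  # Fails harmlessly for the default branch, which already exists locally.
  git checkout -q --track "$remote" 2>/dev/null || true
done

# Rewrite every branch and tag without the folder (repeat per folder).
git filter-branch --index-filter \
  'git rm -rf --cached --ignore-unmatch secrets/' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

# Drop the backup refs and reflogs, then discard unreachable objects.
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
rm -Rf .git/logs .git/refs/original
git gc --quiet --prune=now --aggressive

# Force-push the rewritten branches and tags.
git push -q origin --all --force
git push -q origin --tags --force
cd ..

# Empty output means no branch or tag on origin still reaches secrets/.
left=$(git --git-dir=origin.git log --all --oneline -- secrets/)
echo "commits on origin still touching secrets/: ${left:-none}"
```

The tracking-branch loop matters because git push origin --all only pushes local branches; without it, the feature branch would keep its old history on the remote.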

2. Using the BFG Repo-Cleaner tool:

BFG Repo-Cleaner is a simpler, faster alternative to git filter-branch for cleansing bad data out of your Git repository history, such as:

  1. Removing crazy big files
  2. Removing passwords, credentials and other private data
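Before deciding what to delete, it helps to see which large files are still stored in the history, including ones that later commits removed. A minimal sketch, assuming a POSIX shell and Git 2.x; largest_blobs is a hypothetical helper name, and the demo repository with its big.bin file is a throwaway example:

```shell
#!/bin/sh
# Sketch: list the largest files ever committed on any branch or tag.
# Assumptions: POSIX sh and Git 2.x on PATH; largest_blobs is a hypothetical
# helper name; the demo repository and big.bin are throwaway examples.
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Print "<size in bytes> <path>" for the N largest blobs (default 10).
largest_blobs() {
  # Every object reachable from any ref, with a path where it was committed.
  git rev-list --objects --all |
    # Annotate each object with its type and size; %(rest) keeps the path.
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    # Keep only file contents (blobs), then sort largest first.
    sed -n 's/^blob //p' |
    sort -n -r -k1,1 |
    head -n "${1:-10}"
}

# Demo: a 200 kB file that was committed and later deleted.
cd "$(mktemp -d)"
git init -q demo
cd demo
dd if=/dev/zero of=big.bin bs=1000 count=200 2>/dev/null
echo "hello" > README.txt
git add .
git commit -q -m "Add a big file by mistake"
git rm -q big.bin
git commit -q -m "Delete the big file"

largest_blobs 5
```

Each line shows a size in bytes followed by a path at which that content was committed. The deleted big.bin still appears, which is exactly the kind of data a history rewrite has to remove.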

git filter-branch is enormously powerful and can do things that BFG can't, but BFG is much simpler and faster for the tasks above. To use it:

  • First, make a bare clone of your repository:

     git clone --bare https://project/repository project-repository

  • Go to your repository folder

     cd project-repository

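  • A. To remove a folder, use the following command: java -jar bfg.jar --delete-folders DIRECTORY_NAME

       B. To remove files, use: java -jar bfg.jar --delete-files '*.extension'

       By default, BFG does not change the contents of your latest commit (HEAD), so delete the unwanted files or folders in a normal commit and push it before cloning; BFG then removes them from all earlier commits.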

  • The BFG will update your commits and all branches and tags, so they are clean, but it doesn’t physically delete the unwanted stuff. Examine the repo to make sure your history has been updated, and then use the standard git gc command to strip out the unwanted dirty data, which Git will now recognize as surplus to requirements:

     git reflog expire --expire=now --all && git gc --prune=now --aggressive

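  • Push the cleaned history to your remote. The example below pushes to a new, empty repository; to overwrite the original repository instead, use its URL (https://project/repository):

     git push --mirror https://project/new-repository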
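The cleanup step above matters because a rewrite only stops referencing the unwanted data; the objects stay in the repository for as long as something, such as a reflog entry, still points at the old commits. BFG itself needs Java and the bfg.jar download, so this sketch does not run it. Instead it uses git commit --amend as a stand-in rewrite in a throwaway repository and then applies the same git reflog expire and git gc command. It assumes a POSIX shell and Git on PATH; credentials.txt and the demo repository are hypothetical.

```shell
#!/bin/sh
# Sketch: rewritten-away data stays in the object store until the reflogs
# expire and garbage collection runs.
# Assumptions: POSIX sh and Git on PATH; "git commit --amend" stands in for
# the BFG rewrite; the repository and credentials.txt are throwaway examples.
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q demo
cd demo
echo "api_key=abc123" > credentials.txt
echo "hello" > README.txt
git add .
git commit -q -m "Initial commit that includes credentials by mistake"

# Object ID of the unwanted file's contents.
blob=$(git rev-parse HEAD:credentials.txt)

# Stand-in rewrite: replace the commit with one that lacks the file.
git rm -q --cached credentials.txt
git commit -q --amend -m "Initial commit"

# No branch references the blob now, but the reflog still keeps it alive.
git cat-file -e "$blob" && echo "before cleanup: blob still present"

# The cleanup command from the BFG steps above.
git reflog expire --expire=now --all && git gc --quiet --prune=now --aggressive

if git cat-file -e "$blob" 2>/dev/null; then
  echo "after cleanup: blob still present"
else
  echo "after cleanup: blob removed"
fi
```

Before the cleanup the blob should still be found through the reflog; afterwards git cat-file -e should no longer find it, which is the "physical" deletion the BFG step relies on.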

Wrapping Up

After following the steps above, you should be able to remove any unneeded files and folders from your GitHub repository. Note that rewriting history gives the affected commits new hashes, so anyone else working on the repository should make a fresh clone afterwards. If you get stuck somewhere, or have any questions regarding GitHub, please feel free to drop a comment or contact us through our website.