Development
Setting Up Development Environment
Prerequisites
Python 3.10 or higher
Git
pip and virtualenv
Clone the Repository
git clone https://github.com/yourusername/ffiec-data-collector.git
cd ffiec-data-collector
Create Virtual Environment
# Create virtual environment
python -m venv venv
# Activate on macOS/Linux
source venv/bin/activate
# Activate on Windows
venv\Scripts\activate
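Before installing dependencies, it can help to confirm that the interpreter meets the 3.10 minimum and that the virtual environment is actually active. The following is a minimal sketch using only the standard library; the helper names are illustrative and not part of the package.

```python
import sys

MIN_VERSION = (3, 10)  # minimum taken from the prerequisites list


def python_is_supported(version=None):
    """Return True if the given (or running) interpreter meets MIN_VERSION."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_VERSION


def in_virtualenv():
    """Return True when running inside a venv/virtualenv environment."""
    # Inside a virtual environment, sys.prefix points at the environment
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print(f"Python version supported: {python_is_supported()}")
print(f"Virtual environment active: {in_virtualenv()}")
```

If the second line reports `False` after activation, the shell is probably still resolving `python` to the system interpreter.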
Install Development Dependencies
# Install package in editable mode with dev dependencies
pip install -e ".[dev,docs]"
Project Structure
ffiec-data-collector/
├── ffiec_data_collector/ # Main package
│ ├── __init__.py
│ ├── downloader.py # Core downloader implementation
│ └── thumbprint.py # Website validation system
├── docs/ # Documentation
│ ├── conf.py # Sphinx configuration
│ ├── index.rst # Documentation index
│ └── *.md # Documentation pages
├── examples/ # Example notebooks and scripts
│ └── ffiec_data_collection_demo.ipynb
├── tests/ # Test suite
│ ├── test_downloader.py
│ └── test_thumbprint.py
├── setup.py # Package configuration
├── requirements.txt # Core dependencies
├── .readthedocs.yaml # Read the Docs config
└── README.md # Project README
Running Tests
Unit Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=ffiec_data_collector --cov-report=html
# Run specific test file
pytest tests/test_downloader.py
# Run with verbose output
pytest -v
Integration Tests
# Test actual downloads (requires internet)
pytest tests/test_integration.py -m integration
# Skip integration tests
pytest -m "not integration"
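The `-m integration` selection relies on tests carrying an `integration` marker. If that marker is not already registered in the project's pytest configuration, a `conftest.py` hook along these lines would register it so that pytest does not warn about an unknown marker. This is a sketch: the file location and the marker description are assumptions, and the repository may already register the marker elsewhere.

```python
# tests/conftest.py (hypothetical location)


def pytest_configure(config):
    """Register the custom 'integration' marker with pytest."""
    config.addinivalue_line(
        "markers",
        "integration: tests that download from the live FFIEC site (requires internet)",
    )
```

A test opts in by placing `@pytest.mark.integration` above the test function.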
Test Coverage
# Generate coverage report
pytest --cov=ffiec_data_collector --cov-report=term-missing
# Generate HTML coverage report
pytest --cov=ffiec_data_collector --cov-report=html
# Open htmlcov/index.html in browser
Code Quality
Formatting with Black
# Format all code
black ffiec_data_collector/
# Check formatting without changes
black --check ffiec_data_collector/
# Format specific file
black ffiec_data_collector/downloader.py
Linting with Flake8
# Run linter
flake8 ffiec_data_collector/
# With specific configuration
flake8 --max-line-length=100 ffiec_data_collector/
Type Checking with MyPy
# Run type checker
mypy ffiec_data_collector/
# With stricter settings
mypy --strict ffiec_data_collector/
Building Documentation
Local Documentation Build
cd docs
# Build HTML documentation
make html
# Clean and rebuild
make clean html
# Open in browser (macOS)
open _build/html/index.html
# Open in browser (Linux)
xdg-open _build/html/index.html
Documentation Formats
# Build different formats
make latexpdf # PDF via LaTeX
make epub # ePub format
make json # JSON format
Building and Publishing
Build Distribution Packages
# Install build tools
pip install build twine
# Build source and wheel distributions
python -m build
# Check distribution files
ls dist/
Test with TestPyPI
# Upload to TestPyPI
python -m twine upload --repository testpypi dist/*
# Test installation from TestPyPI (dependencies not hosted there are pulled from PyPI)
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ ffiec-data-collector
Publish to PyPI
# Upload to PyPI
python -m twine upload dist/*
# Verify installation
pip install ffiec-data-collector
Contributing
Development Workflow
Fork the repository on GitHub
Clone your fork locally
Create a feature branch
git checkout -b feature/your-feature-name
Make your changes
Run tests to ensure nothing broke
Commit your changes
git add .
git commit -m "Add your feature description"
Push to your fork
git push origin feature/your-feature-name
Create a Pull Request on GitHub
Code Style Guidelines
Follow PEP 8 style guide
Use type hints for all functions
Add docstrings to all public functions and classes
Keep line length under 100 characters
Use descriptive variable names
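As an illustration of these guidelines, here is a small function with type hints, a docstring, descriptive names, and lines under 100 characters. The function `quarter_end_dates` is a hypothetical helper written for this example, not part of the package.

```python
from datetime import date


def quarter_end_dates(year: int) -> list[date]:
    """Return the four calendar quarter-end dates for ``year``.

    Args:
        year: Four-digit calendar year, e.g. 2023.

    Returns:
        Dates for March 31, June 30, September 30, and December 31.
    """
    quarter_ends = [(3, 31), (6, 30), (9, 30), (12, 31)]
    return [date(year, month, day) for month, day in quarter_ends]
```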
Commit Message Format
<type>: <subject>
<body>
<footer>
Types:
feat: New feature
fix: Bug fix
docs: Documentation changes
style: Code style changes
refactor: Code refactoring
test: Test additions or changes
chore: Build process or auxiliary tool changes
Example:
feat: add support for multi-period UBPR downloads
- Implement UBPR_RATIO_FOUR product type
- Add period validation for multi-period products
- Update documentation with new examples
Closes #123
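The subject-line convention can be checked mechanically, for example from a local commit hook. The sketch below is one possible approach and not a tool shipped with the project; it validates only the `<type>: <subject>` first line against the types listed above, and the names `ALLOWED_TYPES` and `is_valid_subject` are made up for this example.

```python
import re

ALLOWED_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# "<type>: <subject>", where <subject> starts with a non-space character.
SUBJECT_PATTERN = re.compile(r"^(?P<type>[a-z]+): (?P<subject>\S.*)$")


def is_valid_subject(first_line: str) -> bool:
    """Return True if the first line of a commit message follows the format."""
    match = SUBJECT_PATTERN.match(first_line)
    return bool(match) and match.group("type") in ALLOWED_TYPES
```

For instance, `is_valid_subject("feat: add support for multi-period UBPR downloads")` passes, while a subject that starts with an unlisted type such as `feature:` does not.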
Debugging
Enable Debug Logging
import logging
# Enable debug logging
logging.basicConfig(level=logging.DEBUG)
# Or for specific module
logger = logging.getLogger('ffiec_data_collector.downloader')
logger.setLevel(logging.DEBUG)
Inspect HTTP Traffic
import requests
import logging
from http.client import HTTPConnection
# Enable HTTP debugging
HTTPConnection.debuglevel = 1
# Configure logging
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
# requests no longer vendors urllib3, so the logger is named "urllib3"
requests_log = logging.getLogger("urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True
Debug ViewState Issues
from ffiec_data_collector import FFIECDownloader
downloader = FFIECDownloader()
downloader.initialize()
# Check ViewState values
print(f"ViewState: {downloader._viewstate[:50]}...")
print(f"ViewStateGenerator: {downloader._viewstate_generator}")
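To cross-check these values against a saved copy of the page, the hidden form fields can be pulled out of the raw HTML directly. The sketch below uses only the standard library and assumes the page exposes the usual ASP.NET hidden inputs `__VIEWSTATE` and `__VIEWSTATEGENERATOR`. The class and function names are illustrative, and this is not necessarily how `downloader.py` extracts them.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def extract_hidden_fields(html: str) -> dict:
    """Return all hidden form fields found in an HTML document."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


# Made-up sample markup; a real page would be read from disk or a response body.
sample = '<form><input type="hidden" name="__VIEWSTATE" value="abc123" /></form>'
print(extract_hidden_fields(sample))
```

An empty or missing `__VIEWSTATE` in the result usually means the page layout changed or the request returned an error page instead of the form.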
Website Structure Updates
When the FFIEC changes the structure of its website:
1. Capture New Thumbprint
python -m ffiec_data_collector.thumbprint capture
2. Compare Changes
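from ffiec_data_collector import ThumbprintValidator
from ffiec_data_collector.thumbprint import PageThumbprint  # assumed to be defined in thumbprint.py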
from pathlib import Path
validator = ThumbprintValidator()
# Load old and new thumbprints
old = PageThumbprint.load(Path("old_thumbprint.json"))
new = validator.capture_thumbprint(
"https://cdr.ffiec.gov/public/pws/downloadbulkdata.aspx"
)
# Compare
print(f"Old hash: {old.structural_hash}")
print(f"New hash: {new.structural_hash}")
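When the hashes differ, it helps to see which parts of the thumbprint changed. Assuming the saved thumbprints are plain JSON objects (the `.json` extension suggests so, but the exact schema is not documented here), a generic field-level diff could look like the sketch below. The function names are illustrative and not part of the package.

```python
import json
from pathlib import Path


def diff_fields(old: dict, new: dict) -> dict:
    """Map each top-level key whose value differs to an (old, new) pair.

    A key missing on one side is reported with None for that side, so a key
    explicitly set to null is treated the same as a missing key.
    """
    changed = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            changed[key] = (old.get(key), new.get(key))
    return changed


def diff_files(old_path: Path, new_path: Path) -> dict:
    """Load two JSON files and diff their top-level fields."""
    old = json.loads(old_path.read_text(encoding="utf-8"))
    new = json.loads(new_path.read_text(encoding="utf-8"))
    return diff_fields(old, new)
```

Calling `diff_files(Path("old_thumbprint.json"), Path("new_thumbprint.json"))` would then list only the fields that need attention; the second file name is a placeholder.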
3. Update Code
Update the extraction logic in downloader.py to handle the new structure.
4. Test Changes
from ffiec_data_collector import FFIECDownloader, Product
# Test with new structure
downloader = FFIECDownloader()
result = downloader.download_latest(Product.CALL_SINGLE)
assert result.success
5. Submit Pull Request
Include:
Updated thumbprint files
Code changes to handle new structure
Test results showing successful downloads
Performance Optimization
Connection Pooling
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
# Configure connection pooling and retries
session = requests.Session()
retry = Retry(
total=3,
read=3,
connect=3,
backoff_factor=0.3
)
adapter = HTTPAdapter(
pool_connections=10,
pool_maxsize=10,
max_retries=retry
)
session.mount('http://', adapter)
session.mount('https://', adapter)
Timeout Configuration
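import functools
# requests ignores a timeout attribute set on a Session, so bind a default (connect, read) in seconds
downloader.session.request = functools.partial(downloader.session.request, timeout=(10, 300))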
Release Process
1. Update Version
Update version in:
setup.py
ffiec_data_collector/__init__.py
docs/conf.py
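Because the version lives in three places, a quick consistency check before tagging can catch a missed file. The sketch below assumes each file assigns the version with a line such as `version="2.0.0"`, `__version__ = "2.0.0"`, or `release = "2.0.0"`. Those patterns are guesses about the files' contents and may need adjusting, and the function names are illustrative.

```python
import re
from pathlib import Path

# Files named in the release checklist.
VERSION_FILES = ["setup.py", "ffiec_data_collector/__init__.py", "docs/conf.py"]

# Matches e.g. version="2.0.0", __version__ = '2.0.0', release = "2.0.0"
VERSION_PATTERN = re.compile(
    r"""^\s*(?:__version__|version|release)\s*=\s*["']([^"']+)["']""",
    re.MULTILINE,
)


def find_version(text: str):
    """Return the first version string assigned in ``text``, or None."""
    match = VERSION_PATTERN.search(text)
    return match.group(1) if match else None


def collect_versions(root: Path) -> dict:
    """Map each version file under ``root`` to the version it declares."""
    return {
        name: find_version((root / name).read_text(encoding="utf-8"))
        for name in VERSION_FILES
    }


def versions_agree(versions: dict) -> bool:
    """Return True when every file declares the same, non-missing version."""
    unique = set(versions.values())
    return len(unique) == 1 and None not in unique
```

Running `versions_agree(collect_versions(Path(".")))` from the repository root should return `True` before the release is tagged.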
2. Update Changelog
Add entry to CHANGELOG.md with:
Version number
Release date
Changes summary
3. Create Release
# Tag the release
git tag -a v2.0.0 -m "Release version 2.0.0"
# Push tags
git push origin v2.0.0
4. Build and Upload
# Clean old builds
rm -rf dist/ build/
# Build distributions
python -m build
# Upload to PyPI
python -m twine upload dist/*
5. Update Documentation
Documentation on Read the Docs updates automatically from the main branch.
Support
For development questions:
Open an issue on GitHub
Check existing issues and pull requests
Review the documentation
License
This project is licensed under the Mozilla Public License 2.0 (MPL 2.0). See the LICENSE file for details.