I want to work with PLINK binary data files (.bed, .bim, .fam) in Python. A module called pyplink makes this possible.
Install
It can be installed with pip:
pip install pyplink
How to use
Assume the current directory contains a file set such as foo.bed, foo.fam, and foo.bim, all sharing the prefix foo.
from pyplink import PyPlink
pyp = PyPlink("foo")
Now we have an object called pyp that combines the .bed, .fam, and .bim files. Its methods give access to each kind of information.
pyp.get_fam()
pyp.get_nb_samples()
pyp.get_bim()
pyp.get_nb_markers()
# Marker names are the index of the .bim DataFrame, not one of its columns
markerNames = pyp.get_bim().index
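To make the table layouts easier to picture, here is a minimal sketch that parses one line of a .fam file and one line of a .bim file using only the standard library. The column order follows the PLINK text formats. The column names are my assumption, modeled on the ones used in this article (chrom, pos, father, mother), and the sample rows and the marker ID rs0000001 are made up for illustration.

```python
# Column order of the PLINK .fam and .bim text files.
# The names themselves are an assumption chosen to match this article.
FAM_COLUMNS = ["fid", "iid", "father", "mother", "gender", "status"]
BIM_COLUMNS = ["chrom", "snp", "cm", "pos", "a1", "a2"]


def parse_line(line, columns):
    """Split one whitespace-delimited line into a dict keyed by column name."""
    fields = line.split()
    if len(fields) != len(columns):
        raise ValueError("expected {} fields, got {}".format(len(columns), len(fields)))
    return dict(zip(columns, fields))


# Hypothetical rows: a male founder (both parents "0") and one marker.
fam_row = parse_line("FAM1 IND1 0 0 1 -9", FAM_COLUMNS)
bim_row = parse_line("1 rs0000001 0 12345 A G", BIM_COLUMNS)

print(fam_row["iid"], fam_row["gender"])
print(bim_row["snp"], bim_row["chrom"], bim_row["pos"])
```

All values stay strings in this sketch. A real loader would convert fields such as pos and gender to integers.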
Pass a marker name to obtain the genotypes for that marker.
The acgt variant of the method returns the genotypes as allele letters (bases) instead of numeric codes.
pyp.get_geno_marker(markerNames[0])
pyp.get_acgt_geno_marker(markerNames[0])
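The relationship between the two calls can be sketched as a small lookup. This assumes the numeric genotype counts copies of the first allele (a1) and that -1 marks a missing call, which is how I understand the additive encoding. The function name additive_to_acgt is hypothetical, and the exact strings pyplink returns (for example the allele order in heterozygotes) may differ.

```python
def additive_to_acgt(code, a1, a2):
    """Translate one additive genotype code into a two-letter allele string.

    Assumption: code is the number of a1 alleles (0, 1 or 2), -1 is missing.
    """
    table = {
        0: a2 + a2,   # no copies of a1: homozygous a2
        1: a1 + a2,   # one copy of a1: heterozygous
        2: a1 + a1,   # two copies of a1: homozygous a1
        -1: "00",     # missing genotype, written as "00" in this sketch
    }
    return table[code]


# Made-up genotype codes for a marker with alleles a1="A" and a2="G".
codes = [0, 1, 2, -1]
letters = [additive_to_acgt(c, "A", "G") for c in codes]
print(letters)
```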
It is also possible to obtain the marker ID and genotype as an iterator.
markers = ["rs7092431", "rs9943770", "rs1578483"]
for marker_id, genotypes in pyp.iter_geno_marker(markers):
  print(marker_id)
  print(genotypes, end="\n\n")
Sample script
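For every marker on the Y chromosome, get the genotypes of the male samples. PLINK codes the Y chromosome as 24 (23 is the X chromosome), and males have gender 1 in the .fam file.
all_samples = pyp.get_fam()
all_markers = pyp.get_bim()
# Assumption: the helper variables below are defined this way; the column
# names "chrom" and "gender" match the DataFrames returned by pyplink
y_markers = all_markers[all_markers.chrom == 24].index.values
males = (all_samples.gender == 1).values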
for marker_ID, genotypes in pyp.iter_geno_marker(y_markers):
    male_genotypes = genotypes[males]
    print("{:d} total genotypes".format(len(genotypes)))
    print("{:d} genotypes for {:,d} males ({} on chr{} and position {:,d})".format(
        len(male_genotypes),
        males.sum(),
        marker_ID,
        all_markers.loc[marker_ID, "chrom"],
        all_markers.loc[marker_ID, "pos"],
    ))
    break
Get the minor allele frequency (MAF) and the genotypes of the specified markers, using founders only. Here all_samples is the DataFrame returned by pyp.get_fam().
founders = (all_samples.father == "0") & (all_samples.mother == "0")
markers = ["rs7092431", "rs9943770", "rs1587483"]
for marker_ID, genotypes in pyp.iter_geno_marker(markers):
    valid_genotypes = genotypes[founders.values & (genotypes != -1)]
    maf = valid_genotypes.sum()/(len(valid_genotypes)*2)
    print(marker_ID, round(maf, 6), sep="\t")
    print(genotypes)
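The frequency calculation can be checked by hand on a small made-up array, without any PLINK files. The sketch below assumes the same encoding as above (genotype = number of a1 alleles, -1 = missing). The genotype values and the founders mask are invented for illustration. Note that the sum divided by twice the sample count is the frequency of a1, which equals the MAF only when a1 is the minor allele, so the last step folds it to at most 0.5.

```python
import numpy as np

# Made-up genotypes for eight samples (number of a1 alleles, -1 = missing).
genotypes = np.array([0, 1, 0, -1, 2, 0, 1, 0], dtype=np.int8)

# Made-up mask: True where the sample is a founder (both parents unknown).
founders = np.array([True, True, True, True, False, True, True, False])

# Keep founders with a non-missing genotype.
valid_genotypes = genotypes[founders & (genotypes != -1)]

# Each sample carries two alleles, so the denominator is 2 * n.
a1_freq = valid_genotypes.sum() / (len(valid_genotypes) * 2)

# Fold to the minor allele in case a1 happens to be the major allele.
maf = min(a1_freq, 1 - a1_freq)

print(len(valid_genotypes), round(a1_freq, 6), round(maf, 6))
```

Five samples survive the filter here, carrying two a1 alleles out of ten, so the a1 frequency is 0.2.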
