
So I had to write a Perl script that:

         finds all the k-mers of length 15 in the 2L chromosome of Drosophila melanogaster,

         loads them into a hash, and

         counts the number of occurrences of each k-mer.
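Before any counting can happen, the chromosome has to be in memory as one string. Here is a minimal sketch of how that could look, assuming the 2L sequence is stored as a single-record FASTA file. The sub name read_fasta_sequence and the file name chr2L.fa are made up for illustration, and uppercasing the bases is my own choice so that lowercase (soft-masked) letters count as the same k-mer.

```perl
use strict;
use warnings;

# Read a single-record FASTA file and return its sequence as one
# uppercase string, with the header line and line breaks removed.
# (Hypothetical helper; the name is not from the original script.)
sub read_fasta_sequence {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    my $sequence = '';
    while (my $line = <$in>) {
        chomp $line;
        $line =~ s/\r$//;        # tolerate Windows line endings
        next if $line =~ /^>/;   # skip the FASTA header line
        $sequence .= uc $line;   # lowercase bases become uppercase
    }
    close($in);
    return $sequence;
}

# Usage (the file name is an assumption): perl read_fasta.pl chr2L.fa
if (@ARGV) {
    my $sequence = read_fasta_sequence($ARGV[0]);
    print length($sequence), " bases read\n";
}
```

Keeping the whole chromosome in one scalar is simple, and it makes the sliding window in the next step a plain substr call.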

 

A k-mer is a contiguous subsequence of length k taken from a longer sequence. For example, the 3-mers of ATGCA are ATG, TGC, and GCA.
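The definition above maps directly onto a sliding window: take the substring of length k at every start position and use it as a hash key. This is only a sketch of that idea, not the script that was handed in. The sub name count_kmers and the toy sequence are made up for illustration.

```perl
use strict;
use warnings;

# Count every k-mer of length $k in $sequence with a sliding window.
# Returns a reference to a hash mapping k-mer => number of occurrences.
sub count_kmers {
    my ($sequence, $k) = @_;
    my %count;
    my $last_start = length($sequence) - $k;   # last valid window start
    for my $start (0 .. $last_start) {         # empty range if sequence is shorter than $k
        my $kmer = substr($sequence, $start, $k);
        $count{$kmer}++;
    }
    return \%count;
}

# Tiny demonstration on a made-up sequence (not real fly data).
my $counts = count_kmers('ATGATGA', 3);
print "$_\t$counts->{$_}\n" for sort keys %$counts;
```

For the real task the call would use a window of 15. A whole chromosome arm produces a very large hash, so expect the script to need a lot of memory.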

 

Next, I needed to loop through the hash and print each k-mer on its own line, followed by a tab and then the number of occurrences of that k-mer.
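The printing step could be sketched like this. The hash %kmer_count holds made-up values standing in for the real counts, and print_kmer_table is a hypothetical name. Sorting the keys is my own choice so that the output order is the same on every run, since Perl does not guarantee any particular hash order.

```perl
use strict;
use warnings;

# Made-up counts standing in for the real k-mer hash.
my %kmer_count = (ATG => 2, GAT => 1, TGA => 2);

# Print one "kmer<TAB>count" line per hash entry to the given filehandle.
sub print_kmer_table {
    my ($count_ref, $out) = @_;
    for my $kmer (sort keys %$count_ref) {
        print {$out} "$kmer\t$count_ref->{$kmer}\n";
    }
}

print_kmer_table(\%kmer_count, \*STDOUT);
```

Passing the filehandle as an argument means the same sub can write to the screen or to a file.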

 

Then I had to open a file handle for writing output to uniqueKmersEndingGG.fasta, change the window length from 15 to 23, go through the hash of k-mers, print only the first 1000 that end with GG, and put a FASTA header before each k-mer.
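A rough sketch of this last step follows, under a few assumptions that are mine and not part of the assignment text: "unique" is taken to mean a k-mer that occurs exactly once, "first 1000" is taken in sorted key order (Perl hash order is not stable between runs), and the header format >kmer_N is invented. The sub name write_kmers_ending_gg and the three 23-mers are also made up.

```perl
use strict;
use warnings;

# Write up to $limit k-mers that end in "GG" to $path in FASTA format.
# Assumption: "unique" means the k-mer occurs exactly once.
# Returns the number of records written.
sub write_kmers_ending_gg {
    my ($count_ref, $path, $limit) = @_;
    open(my $out, '>', $path) or die "Cannot write $path: $!";
    my $written = 0;
    for my $kmer (sort keys %$count_ref) {
        last if $written >= $limit;
        next unless $kmer =~ /GG$/;              # must end with GG
        next unless $count_ref->{$kmer} == 1;    # assumed meaning of "unique"
        $written++;
        print {$out} ">kmer_$written\n$kmer\n";  # hypothetical header format
    }
    close($out) or die "Cannot close $path: $!";
    return $written;
}

# Made-up 23-mers standing in for the real hash of counts.
my %kmer_count = (
    'ACGTACGTACGTACGTACGTAGG' => 1,
    'TTTTACGTACGTACGTACGTAGG' => 3,
    'CCCCACGTACGTACGTACGTACA' => 1,
);
my $n = write_kmers_ending_gg(\%kmer_count, 'uniqueKmersEndingGG.fasta', 1000);
print "$n record(s) written\n";
```

If the assignment meant every distinct k-mer rather than k-mers seen once, dropping the count check is the only change needed.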

 

Vocab:

         Drosophila melanogaster : fruit fly

         2L chromosome of Drosophila melanogaster

         k-mer of length 15

         FASTA

         K-mer length 15 end with GG 2L chromosome of Drosophila melanogaster

          

Perl References

         http://stackoverflow.com/questions/5948360/perl-read-a-file-into-an-array

         http://www.perlmonks.org/?node_id=73439

 

 

Biology References

         http://flybase.org/reports/FBsp00000001.html

        

         https://www.biostars.org/p/16396/

        mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'select chrom,size from chromInfo limit 5'

        The sequences are long strings : http://genome.ucsc.edu/cgi-bin/hgTracks?db=dm3&chromInfoPage=

        http://blast.ncbi.nlm.nih.gov/Blast.cgi

        FASTA Format http://prodata.swmed.edu/promals/info/fasta_format_file_example.htm

        http://en.wikipedia.org/wiki/FASTA_format

        http://code.izzid.com/2011/10/13/How-to-write-a-fasta-file-in-perl.html


 

 

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'select chrom,size from chromInfo limit 5'

 

--defaults-file=#       Only read default options from the given file #.
--defaults-extra-file=# Read this file after the global files are read.
--defaults-group-suffix=#
                        Also read groups with concat(group, suffix)
--login-path=#          Read this path from the login file.

 

Variables (--variable-name=value)
and boolean options {FALSE|TRUE}  Value (after reading options)
--------------------------------- ----------------------------------------
auto-rehash                       TRUE
auto-vertical-output              FALSE
bind-address                      (No default value)
character-sets-dir                (No default value)
column-type-info                  FALSE
comments                          FALSE
compress                          FALSE
debug-check                       FALSE
debug-info                        FALSE
database                          hg19
default-character-set             auto
delimiter                         ;
enable-cleartext-plugin           FALSE
vertical                          FALSE
force                             FALSE
named-commands                    FALSE
ignore-spaces                     FALSE
init-command                      (No default value)
local-infile                      FALSE
no-beep                           FALSE
host                              genome-mysql.cse.ucsc.edu
html                              FALSE
xml                               FALSE
line-numbers                      TRUE
unbuffered                        FALSE
column-names                      TRUE
sigint-ignore                     FALSE
port                              0
prompt                            mysql>
quick                             FALSE
raw                               FALSE
reconnect                         FALSE
shared-memory-base-name           (No default value)
socket                            (No default value)
ssl                               FALSE
ssl-ca                            (No default value)
ssl-capath                        (No default value)
ssl-cert                          (No default value)
ssl-cipher                        (No default value)
ssl-key                           (No default value)
ssl-crl                           (No default value)
ssl-crlpath                       (No default value)
ssl-verify-server-cert            FALSE
table                             FALSE
user                              genome
safe-updates                      FALSE
i-am-a-dummy                      FALSE
connect-timeout                   0
max-allowed-packet                16777216
net-buffer-length                 16384
select-limit                      1000
max-join-size                     1000000
secure-auth                       TRUE
show-warnings                     FALSE
plugin-dir                        (No default value)
default-auth                      (No default value)
histignore                        (No default value)
binary-mode                       FALSE
connect-expired-password          FALSE

 

C:\Program Files\MySQL\MySQL Workbench CE 6.1.6>mysql --user=genome --host=genome-mysql.cse.ucsc.edu -D hg19

Welcome to the MySQL monitor. Commands end with ; or \g.

Your MySQL connection id is 14577255

Server version: 5.6.10-log MySQL Community Server (GPL)

 

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

 

Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective

owners.

 

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

 

mysql> describe chromeInfo
    -> ;
ERROR 1146 (42S02): Table 'hg19.chromeInfo' doesn't exist
mysql> describe chromInfo
    -> ;
+----------+------------------+------+-----+---------+-------+
| Field    | Type             | Null | Key | Default | Extra |
+----------+------------------+------+-----+---------+-------+
| chrom    | varchar(255)     | NO   | PRI |         |       |
| size     | int(10) unsigned | NO   |     | 0       |       |
| fileName | varchar(255)     | YES  |     | NULL    |       |
+----------+------------------+------+-----+---------+-------+
3 rows in set (0.10 sec)

 

 

mysql> select * from chromInfo;

+-----------------------+-----------+----------------------+
| chrom                 | size      | fileName             |
+-----------------------+-----------+----------------------+
| chr1                  | 249250621 | /gbdb/hg19/hg19.2bit |
| chr2                  | 243199373 | /gbdb/hg19/hg19.2bit |
| chr3                  | 198022430 | /gbdb/hg19/hg19.2bit |
| chr4                  | 191154276 | /gbdb/hg19/hg19.2bit |
| chr5                  | 180915260 | /gbdb/hg19/hg19.2bit |
| chr6                  | 171115067 | /gbdb/hg19/hg19.2bit |
| chr7                  | 159138663 | /gbdb/hg19/hg19.2bit |
| chrX                  | 155270560 | /gbdb/hg19/hg19.2bit |
| chr8                  | 146364022 | /gbdb/hg19/hg19.2bit |
| chr9                  | 141213431 | /gbdb/hg19/hg19.2bit |
| chr10                 | 135534747 | /gbdb/hg19/hg19.2bit |
| chr11                 | 135006516 | /gbdb/hg19/hg19.2bit |
| chr12                 | 133851895 | /gbdb/hg19/hg19.2bit |
| chr13                 | 115169878 | /gbdb/hg19/hg19.2bit |
| chr14                 | 107349540 | /gbdb/hg19/hg19.2bit |
| chr15                 | 102531392 | /gbdb/hg19/hg19.2bit |
| chr16                 |  90354753 | /gbdb/hg19/hg19.2bit |
| chr17                 |  81195210 | /gbdb/hg19/hg19.2bit |
| chr18                 |  78077248 | /gbdb/hg19/hg19.2bit |
| chr20                 |  63025520 | /gbdb/hg19/hg19.2bit |
| chrY                  |  59373566 | /gbdb/hg19/hg19.2bit |
| chr19                 |  59128983 | /gbdb/hg19/hg19.2bit |
| chr22                 |  51304566 | /gbdb/hg19/hg19.2bit |
| chr21                 |  48129895 | /gbdb/hg19/hg19.2bit |
| chr6_ssto_hap7        |   4928567 | /gbdb/hg19/hg19.2bit |
| chr6_mcf_hap5         |   4833398 | /gbdb/hg19/hg19.2bit |
| chr6_cox_hap2         |   4795371 | /gbdb/hg19/hg19.2bit |
| chr6_mann_hap4        |   4683263 | /gbdb/hg19/hg19.2bit |
| chr6_apd_hap1         |   4622290 | /gbdb/hg19/hg19.2bit |
| chr6_qbl_hap6         |   4611984 | /gbdb/hg19/hg19.2bit |
| chr6_dbb_hap3         |   4610396 | /gbdb/hg19/hg19.2bit |
| chr17_ctg5_hap1       |   1680828 | /gbdb/hg19/hg19.2bit |
| chr4_ctg9_hap1        |    590426 | /gbdb/hg19/hg19.2bit |
| chr1_gl000192_random  |    547496 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000225        |    211173 | /gbdb/hg19/hg19.2bit |
| chr4_gl000194_random  |    191469 | /gbdb/hg19/hg19.2bit |
| chr4_gl000193_random  |    189789 | /gbdb/hg19/hg19.2bit |
| chr9_gl000200_random  |    187035 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000222        |    186861 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000212        |    186858 | /gbdb/hg19/hg19.2bit |
| chr7_gl000195_random  |    182896 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000223        |    180455 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000224        |    179693 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000219        |    179198 | /gbdb/hg19/hg19.2bit |
| chr17_gl000205_random |    174588 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000215        |    172545 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000216        |    172294 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000217        |    172149 | /gbdb/hg19/hg19.2bit |
| chr9_gl000199_random  |    169874 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000211        |    166566 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000213        |    164239 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000220        |    161802 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000218        |    161147 | /gbdb/hg19/hg19.2bit |
| chr19_gl000209_random |    159169 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000221        |    155397 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000214        |    137718 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000228        |    129120 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000227        |    128374 | /gbdb/hg19/hg19.2bit |
| chr1_gl000191_random  |    106433 | /gbdb/hg19/hg19.2bit |
| chr19_gl000208_random |     92689 | /gbdb/hg19/hg19.2bit |
| chr9_gl000198_random  |     90085 | /gbdb/hg19/hg19.2bit |
| chr17_gl000204_random |     81310 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000233        |     45941 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000237        |     45867 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000230        |     43691 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000242        |     43523 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000243        |     43341 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000241        |     42152 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000236        |     41934 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000240        |     41933 | /gbdb/hg19/hg19.2bit |
| chr17_gl000206_random |     41001 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000232        |     40652 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000234        |     40531 | /gbdb/hg19/hg19.2bit |
| chr11_gl000202_random |     40103 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000238        |     39939 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000244        |     39929 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000248        |     39786 | /gbdb/hg19/hg19.2bit |
| chr8_gl000196_random  |     38914 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000249        |     38502 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000246        |     38154 | /gbdb/hg19/hg19.2bit |
| chr17_gl000203_random |     37498 | /gbdb/hg19/hg19.2bit |
| chr8_gl000197_random  |     37175 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000245        |     36651 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000247        |     36422 | /gbdb/hg19/hg19.2bit |
| chr9_gl000201_random  |     36148 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000235        |     34474 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000239        |     33824 | /gbdb/hg19/hg19.2bit |
| chr21_gl000210_random |     27682 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000231        |     27386 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000229        |     19913 | /gbdb/hg19/hg19.2bit |
| chrM                  |     16571 | /gbdb/hg19/hg19.2bit |
| chrUn_gl000226        |     15008 | /gbdb/hg19/hg19.2bit |
| chr18_gl000207_random |      4262 | /gbdb/hg19/hg19.2bit |
+-----------------------+-----------+----------------------+
93 rows in set (0.10 sec)