Skip to content
PowCoder代写
  • Home
  • 编程语言代写
    • Java代写
    • Python代写
    • Matlab代写
    • R语言代写代考
    • DrRacket-Scheme代写
    • Prolog代写
    • Haskell代写代考
    • OCaml代写
    • MIPS汇编代写
    • C语言/C++代写
    • Javascript代写
  • Computer Science科目
    • 数据结构算法代写代考 data structure algorithm
    • 机器学习代写代考 machine learning
    • 人工智能 AI Artificial Intelligence
    • 计算机体系结构代写代考 Computer Architecture
    • 操作系统OS代写代考 (Operating System)
    • 计算机网络 套接字编程 computer network socket programming
    • 数据库代写代考 DB Database SQL
    • 编译器原理 Compiler
    • 计算机图形学 Computer Graphics opengl webgl
    • 系统编程 System programming
    • 计算理论 Theory of Computation
    • 网页应用 Web Application
    • 自然语言处理 NLP natural language processing
    • 计算机安全密码学computer security cryptography
  • Demo
  • 成功案例
  • 视频演示
PowCoder代写
  • C语言/C++代写
  • Demo
  • DrRacket-Scheme代写
  • GRE代考
  • Haskell代写代做代考
  • Haskell代写代考
  • Javascript代写
  • Java代写
  • Matlab代写
  • MIPS汇编代写
  • MIPS汇编代写代做代考
  • OCaml代写代做代考
  • Prolog代写
  • Prolog代写代做代考
  • Python代写
  • R语言代写代考
  • 专业留学生程序代写

java生物并行计算代写:CSE 231 K-MerCounting Assignment

程序代写 CS代考 / Java代写代考, 并行编程代写代考
K-MerCounting Assignment
Contents
1Background
2Breakdown
3KMerUtils
3.1long calculatePossibleKMers(int kMerLength)
3.2int calculateSumOfAllKMers(List<byte[]> sequences, int kMerLength)
3.3String toString(byte[] sequence, int offset, int kMerLength)
3.4String toString(byte[] kMer)
3.5int toPackedInt(byte[] sequence, int offset, int kMerLength)
3.6int toPackedInt(byte[] kMer)
3.7byte[] unpackInt(int kMer, int kMerLength)
3.8long toPackedLong(byte[] sequence, int offset, int kMerLength)
3.9byte[] unpackLong(long kMer, int kMerLength)
4Warmups
4.1String HashMap Implementation
4.2String ConcurrentHashMap Implementation
4.3ByteArrayOffset Implementation
5Assignment
5.1ThresholdSlices
5.2LongConcurrentHashMapKMerCounter
5.3IntArrayKMerCounter
5.4AtomicIntegerArrayKMerCounter
5.5IsolatedBucketDictionaryKMerCounter
5.5.1IsolatedBucketDictionary
6Extra Credit
6.1Open-ended K-mer Counter
7Rubric
8Research
Background
The term k-mer refers to a substring of length k and k-mer counting refers to the process of finding the number of occurrences of a k-mer in a given string. This computational problem has many real-world applications, the most notable being in the field of computational genomics. In this assignment, you will design a program that will ultimately take in a human X-chromosome and count the number of k-mers in the string of DNA. You will be making use of everything you have learned thus far, including atomic variables. It is useful to know that given a string of length n, the amount of possible k-mers is equal to n – k + 1.
Breakdown
This assignment is broken into in-class studios, demos, and a take-home assignment. For this assignment, you will create a k-mer counting program using several different data structures. Links to the documentation for all of the java.util.concurrent data types can be found below.
ConcurrentHashMap and AtomicIntegerArray.
KMerUtils
The following methods should be useful as you build the assignment.
Check out the Javadocs either on the web, or in the comments in KMerUtils.java for further details on how they work.
long calculatePossibleKMers(int kMerLength)
public static long calculatePossibleKMers(int kMerLength) {
int bitsPerBase = 2;
int bitsPerKMer = kMerLength * bitsPerBase;
long possibleKMers = 1L << bitsPerKMer;
return possibleKMers;
}
int calculateSumOfAllKMers(List<byte[]> sequences, int kMerLength)
public static int calculateSumOfAllKMers(List<byte[]> sequences, int kMerLength) {
int sum = 0;
for (byte[] sequence : sequences) {
int i = (sequence.length – kMerLength + 1);
if( i > 0 ) {
sum += i;
}
}
return sum;
}
String toString(byte[] sequence, int offset, int kMerLength)
public static String toString(byte[] sequence, int offset, int kMerLength) {
return new String(sequence, offset, kMerLength, StandardCharsets.UTF_8);
}
String toString(byte[] kMer)
public static String toString(byte[] kMer) {
return new String(kMer, StandardCharsets.UTF_8);
}
int toPackedInt(byte[] sequence, int offset, int kMerLength)
public static int toPackedInt(byte[] sequence, int offset, int kMerLength) {
int result = 0;
for (int i = 0; i < kMerLength; i++) {
switch (sequence[offset + i]) {
// case A:
// result |= (A_MASK_INT << (i + i));
// break;
case T:
result |= (T_MASK_INT << (i + i));
break;
case C:
result |= (C_MASK_INT << (i + i));
break;
case G:
result |= (G_MASK_INT << (i + i));
break;
}
}
return result;
}
int toPackedInt(byte[] kMer)
public static int toPackedInt(byte[] kMer) {
return toPackedInt(kMer, 0, kMer.length);
}
byte[] unpackInt(int kMer, int kMerLength)
public static byte[] unpackInt(int kMer, int kMerLength) {
byte[] result = new byte[kMerLength];
for (int i = 0; i < kMerLength; i++) {
int v = (kMer >> (i + i)) & 0x3;
switch (v) {
case A_MASK_INT:
result[i] = A;
break;
case T_MASK_INT:
result[i] = T;
break;
case C_MASK_INT:
result[i] = C;
break;
case G_MASK_INT:
result[i] = G;
break;
}
}
return result;
}
long toPackedLong(byte[] sequence, int offset, int kMerLength)
public static long toPackedLong(byte[] sequence, int offset, int kMerLength) {
long result = 0;
for (int i = 0; i < kMerLength; i++) {
switch (sequence[offset + i]) {
// case A:
// result |= (A_MASK_LONG << (i + i));
// break;
case T:
result |= (T_MASK_LONG << (i + i));
break;
case C:
result |= (C_MASK_LONG << (i + i));
break;
case G:
result |= (G_MASK_LONG << (i + i));
break;
}
}
return result;
}
==long toPackedLong(byte[] sequence)==
 <nowiki>public static long toPackedLong(byte[] sequence) {
return toPackedLong(sequence, 0, sequence.length);
}
byte[] unpackLong(long kMer, int kMerLength)
public static byte[] unpackLong(long kMer, int kMerLength) {
byte[] result = new byte[kMerLength];
for (int i = 0; i < kMerLength; i++) {
long v = (kMer >> (i + i)) & 0x3;
switch ((int) v) {
case A_MASK_INT:
result[i] = A;
break;
case T_MASK_INT:
result[i] = T;
break;
case C_MASK_INT:
result[i] = C;
break;
case G_MASK_INT:
result[i] = G;
break;
}
}
return result;
}
Warmups
The warm-ups associated with this assignment can be found under the kmer.warmup packages.
String HashMap Implementation
In this completely sequential implementation, you will have to write the parse method. The method takes in a list of arrays of bytes and a k-mer length. It should return a StringMapKmerCount (which takes in a map), a class provided to you which does exactly what its name suggests. In order to implement this method, we recommend taking a look at the KMerUtils class and more specifically the toString method, which converts a given byte array into a String in order to make it compatible with the String HashMap.
parse should go through the amount of possible k-mers for every byte array in the list of sequences. As it goes through the bytes in the array, use the KMerUtils.toString() method to create a string to use for the HashMap. The map should take in a String as the key and an Integer as the value. We recommend using the map.compute() method and reviewing how to use lambdas.
String ConcurrentHashMap Implementation
This implementation will make your sequential String HashMap implementation into a parallel one. To do so, you will be making use of Java’s atomic version of a HashMap: a ConcurrentHashMap. Like before, you will be need to complete the parse method. You can choose to finish this method with a forall loop or a finish/async approach, which is completely up to you.
ByteArrayOffset Implementation
This parallel implementation will overlap greatly with your previous two implementations, but it will use ByteArrayOffsetKMer instead of Strings. Refer to the ByteArrayOffsetKMer class in order to get a better idea of how to implement this method and instantiate an instance of the class. The class closely resembles the Slice class you worked on earlier in the semester.
Assignment
The assignment itself can be found under the kmer.assignment packages.
ThresholdSlicesThis class is designed to take in a list of bytes and convert it into a list of slices. The slice class should be familiar to you from earlier in the semester, but feel free to refer back to the documentation on Slices for a quick recap. You should perform this task recursively and there are two methods you will need to implement. The createSlicesBelowThreshold method should call the addToCollectionKernel once, but the kernel should take over the recursion from there.
In the createSlicesBelowThreshold method, you should call the recursive kernel for each sequence in the list of sequences. When defining the min and max values, keep the information mentioned in the “Background” section in mind. As for the kernel, it should add a new slice to the collection of slices once the length of the slice dips below the provided threshold, but it should otherwise split itself into two smaller slices using recursion.
Hint: when defining the new slice, the sliceIndexId does not matter in this case.
for N=10 and K=3, you should be slicing like this:
LongConcurrentHashMapKMerCounter
This parallel implementation will also overlap greatly with everything you have done previously, but instead of using Strings or SequenceSlice, we will be using Longs to represent DNA data. In order to do this, refer to the KMerUtils class, more specifically the toPackedLong method which will convert a sequence of bytes into a Long for our purposes. As the data that this implementation can take is much larger than the previous implementations, consider how to instantiate the ConcurrentHashMap in order to avoid dynamically resizing the map. KMerUtils has a method which will calculate this number for you, but think about which one to use, why you would use it, and how that would differ from simply calculating n – k + 1 yourself.
Hint: calculatePossibleKmers and calculateSumOfAllKmers are distinct and take different input parameters. calculateSumOfAllKmers performs the n – k + 1 calculation on a list of arrays of sequences and sums the total. calculatePossibleKmers only takes the length of the k-mer (k) as the input parameter, meaning it will calculate possibilities independent of the size of the sequence. Thus, the n – k + 1 calculation would yield a value ranging from 0 (inclusive) to the result of calculatePossibleKmers (exclusive).
IntArrayKMerCounter
Like the String HashMap implementation, this implementation should be completed sequentially. However, unlike the previous implementations, we will be using an array instead of a map. This means that you must keep in mind what size to make the array in order to instantiate it. KMerUtils has a method which will calculate this number for you, but think about which one to use, why you would use it, and how that would differ from simply calculating n – k + 1 yourself. Think of using the int array as similar to using the map, but with ints as the key/value pairs. Use KMerUtils.toPackedInt() in order to convert the sequence into an int for our purposes.
Hint: calculatePossibleKmers and calculateSumOfAllKmers are distinct and take different input parameters. calculateSumOfAllKmers performs the n – k + 1 calculation on a list of arrays of sequences and sums the total. calculatePossibleKmers only takes the length of the k-mer (k) as the input parameter, meaning it will calculate possibilities independent of the size of the sequence. Thus, the n – k + 1 calculation would yield a value ranging from 0 (inclusive) to the result of calculatePossibleKmers (exclusive).
AtomicIntegerArrayKMerCounter
This implementation is simply the parallel equivalent of the int array implementation. To do so, you will be making use of Java’s atomic version of an array: an AtomicIntegerArray. Like before, you will be need to complete the parse method. You can choose to finish this method with a forall loop or a finish/async approach, which is completely up to you.
IsolatedBucketDictionaryKMerCounter
This parallel implementation will go a step beyond the previous implementations and divide entries into 1024 buckets. Although a different number of buckets can be used, we will simply stick to 1024. Every time an entry is added to the bucket dictionary, it will create a hash based on the entry’s key and place the entry into one of the 1024 buckets. Because we would only need to search one bucket for an entry based on a key’s hash, this mainly serves the purpose of cutting down on search time. The compute method of the IsolatedBucketDictionary should automatically compute the correct bucket for the entry, so this implementation should closely resemble the previous ConcurrentHashMap implementations. When instantiating the IsolatedBucketDictionary, we recommend making the key/value pairs a String and an Integer.
IsolatedBucketDictionary
 Warning: Do NOT call get() from compute(). It will result in a nonintuitive error. getEntry(bucket, key) gives you everything you need.
 Tip: Be sure to make your IsolatedBucketDictionary thread-safe. If any code is going to access shared mutable data, it needs to do so in a thread-safe manner.
A simple example of how to use the many incantations of isolated:
@ThreadSafe
public class IsolatedInteger {
    private int value;
    public IsolatedInteger(int value) {
        this.value = value;
    }
    public int getValue() {
        return isolatedWithReturn(objectBased(readMode(this)), () -> {
            return value;
        });
    }
    public void increment() {
        isolated(objectBased(this), () -> {
            value++;
        });
    }
}
Extra Credit
Open-ended K-mer Counter
This implementation is worth up to 4.5 extra credit points.
It is intended as an opportunity to build a high-performing, thread-safe data structure.
One possible path was outlined at a high level in class. However, students are encouraged to think of alternate solutions.
We have explicitly disallowed implementations akin to our matrix map reduce assignment, since we have already done the “give everyone their own data and combine in post” pattern to death.
Partial credit is possible for solutions which demonstrate significant ingenuity.
Rubric
Total points: 100
Correct ThresholdSlices (10)
Correct Long ConcurrentHashMap (15)
Correct Int Array (15)
Correct AtomicIntegerArray (15)
Correct BucketDictionaryCounter (15)
Correct BucketDictionary implementation (20)
Clarity and efficiency (10)
Research
wikipedia article on k-mers
paper: A fast, lock-free approach for efficient parallel counting of occurrences of k-mers (Jellyfish)
paper: Multiple comparative metagenomics using multiset k-mer counting
Collection of approaches to k-mer counting
Burrows–Wheeler transform
Significance of k-mer Counting
← Previous Post
Next Post →

Related Posts

matlab simulation

程序代写 CS代考 / matlab代写代考, simulation

AB202 Assignment 1: Arkapong

程序代写 CS代考 / c++代做

MSc C++ Programming

程序代写 CS代考 / c++代做

MSc Assessed Prolog Lab Exercise 2

程序代写 CS代考 / Prolog代写代考

Spring Session:2015:Assignment 1

程序代写 CS代考 / c++代做, UML

Assignment 2: "Inception and Elaboration"

程序代写 CS代考 / UML

android app

程序代写 CS代考 / android, Java代写代考

COMP220: Software Development Tools

程序代写 CS代考 / Java代写代考, junit

Contact

  • QQ: 1823890830
  • 微信号(WeChat): powcoder
  • myweixin
  • Email: powcoder@163.com
  • 请加微信或QQ发要求
  • Contact me through WeChat

Categories

  • 机器学习代写代考 machine learning
  • 数据库代写代考 DB Database SQL
  • 数据结构算法代写代考 data structure algorithm
  • 人工智能 AI Artificial Intelligence
  • 编译器原理 Compiler
  • 计算机网络 套接字编程 computer network socket programming
  • 大数据 Hadoop Map Reduce Spark HBase
  • 操作系统OS代写代考 (Operating System)
  • 计算机体系结构代写代考 Computer Architecture
  • 计算机图形学 Computer Graphics opengl webgl
  • 自然语言处理 NLP natural language processing
  • 并行计算
  • 计算理论 Theory of Computation
  • 计算机安全密码学computer security cryptography
  • 系统编程 System programming
  • 数值科学计算
  • 计算机视觉代写代考(Compute Vision)
  • 网页应用 Web Application
  • 分布式系统
  • 笔试面试
  • 函数式编程
  • 数据挖掘 Data Mining
  • 离散数学代写代考 (Discrete mathematics)
  • 软件工程
  • 编程语言 Programming Language
  • 统计代写代考
  • 运筹学 Operation Research

Tag

Algorithm算法代写代考Java代写代考databasedata structurePython代写代考compilerScheme代写代考C语言代写AI代写c++代写SQL代写代考Haskell代写代考javascriptconcurrencymatlab代写代考financeinterpreterMIPS汇编代写代考data miningdecision treedeep learning深度学习代写代考Prolog代写代考file systemc++代做computer architectureERguiGPUdata sciencex86汇编代写代考case studydistributed systemandroidkernelARM汇编代写代考
  • gatech CS 6035 信息安全代写
  • gatech cs7638 AI4R 代做辅导
  • 程序代写 CS7641 Assignment 4 Markov Decision Processes Fall 2024
  • CS代写 CS 0447 Computer Organization and Assembly Language Midterm Project – Conne
  • CS代考 CS 0447 Computer Organization and Assembly Language Midterm Project – Conne

Copyright © 2026 PowCoder代写 | Powered by Astra WordPress Theme